Rev the Engine (OpenAI Codex)
Idea in one line: Do multiple plan → critique → plan loops before you let the model touch your repo or tools. In OpenAI's stack you can make this deliberate "pre-execution thinking" explicit and reliable.
What this maps to in OpenAI
"Ultrathink" → Reasoning models & higher test-time compute. OpenAI's newer reasoning models (e.g., o3-mini / o3 family) are explicitly designed to spend more compute "thinking" at inference, i.e., deeper test-time reasoning than standard chat models.
"Plan Mode" → Plan-only turns (no tools, no edits). In the API, keep a turn strictly in planning by disabling tools:
tool_choice: "none"
(or the equivalent setting in the Responses/Assistants APIs). OpenAI's docs and changelog describetool_choice
control for tool calling. (OpenAI Platform)"Revving" → Iterated plan→critique cycles. Run several plan-only turns where each turn critiques and improves the previous plan (no side-effects yet). This mirrors research-backed patterns like ReAct (reasoning↔acting separation) and Reflexion (self-critique), which show measurable gains from iterative reasoning steps.
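As a minimal sketch of a plan-only turn (assuming the official Python SDK and Chat Completions; the model name, tool definition, and prompt are placeholders):

```python
# Plan-only turn: the model can see the tool definitions but is not allowed to call them.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "apply_patch",  # hypothetical executor, only used in later "act" turns
        "description": "Apply a unified diff to the repository",
        "parameters": {
            "type": "object",
            "properties": {"diff": {"type": "string"}},
            "required": ["diff"],
        },
    },
}]

plan_turn = client.chat.completions.create(
    model="o3-mini",                 # any reasoning-capable model
    messages=[{"role": "user",
               "content": "Plan the refactor of the auth module. Do not execute anything yet."}],
    tools=tools,
    tool_choice="none",              # planning only: no tool calls this turn
)
print(plan_turn.choices[0].message.content)
```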
A concrete recipe (API-level)
Plan (no tools):
- Call a reasoning model and disable tool calls (`tool_choice: "none"`).
- Force a structured plan (JSON) with `response_format` / a JSON schema so the plan is explicit, diff-able, and checkable. (Microsoft for Developers)
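A sketch of this step, assuming the Python SDK and a model that supports Structured Outputs; the schema and prompt are illustrative:

```python
# Planning turn with a machine-checkable plan: no tools passed, JSON-schema-constrained output.
import json
from openai import OpenAI

client = OpenAI()

PLAN_SCHEMA = {
    "type": "object",
    "properties": {
        "goal": {"type": "string"},
        "steps": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "id": {"type": "integer"},
                    "action": {"type": "string"},
                    "depends_on": {"type": "array", "items": {"type": "integer"}},
                    "risk": {"type": "string", "enum": ["low", "medium", "high"]},
                },
                "required": ["id", "action", "depends_on", "risk"],
                "additionalProperties": False,
            },
        },
    },
    "required": ["goal", "steps"],
    "additionalProperties": False,
}

resp = client.chat.completions.create(
    model="o3-mini",
    messages=[{"role": "user",
               "content": "Plan the migration of module X to async I/O. Respond as JSON."}],
    # No tools are passed on this turn (or pass them with tool_choice="none"), so nothing can execute.
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "plan", "schema": PLAN_SCHEMA, "strict": True},
    },
)
plan = json.loads(resp.choices[0].message.content)  # explicit, diff-able plan object
```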
Critique & refine (still no tools):
- Feed back the plan and prompt the model to find missing edge cases, risky steps, ordering inefficiencies, and produce a revised plan (again as JSON).
- Repeat this loop 2–3×. This is your "rev the engine" phase, grounding the practice in iterative reasoning (Reflexion) rather than one-shot planning. (DataCamp)
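A sketch of the rev loop itself, reusing the plan shape from the previous sketch; prompts are illustrative and the model is assumed to accept system messages and JSON mode:

```python
# Rev loop: 2-3 critique-and-revise passes over the plan, still with no tools enabled.
import json
from openai import OpenAI

client = OpenAI()

def rev_the_engine(task: str, plan: dict, model: str = "o3-mini", rounds: int = 3) -> dict:
    """Iteratively critique and revise a JSON plan before anything is allowed to execute."""
    for _ in range(rounds):
        resp = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system",
                 "content": ("You are reviewing an implementation plan. Find missing edge cases, "
                             "risky steps, and ordering problems, then return a revised plan as JSON "
                             "with the same shape.")},
                {"role": "user",
                 "content": f"Task: {task}\n\nCurrent plan:\n{json.dumps(plan, indent=2)}"},
            ],
            response_format={"type": "json_object"},  # keep the revised plan machine-readable
        )
        plan = json.loads(resp.choices[0].message.content)
    return plan
```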
(Optional) Best-of-N planning:
- Ask the API for multiple plan candidates and pick/merge the best parts (Chat Completions historically supported `n` for multiple candidates; many teams still use this pattern). (OpenAI Platform)
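A sketch, assuming a model that still accepts the `n` parameter (some reasoning models may not) and the plan/step shape from the schema above:

```python
# Best-of-N planning: request several plan candidates in one call, then pick or merge.
import json
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o",              # assumes a model that accepts n
    messages=[{"role": "user",
               "content": "Propose a step-by-step migration plan as JSON with goal and steps."}],
    n=3,                         # three independent plan candidates
    response_format={"type": "json_object"},
)
candidates = [json.loads(choice.message.content) for choice in resp.choices]

# Naive selection: prefer the candidate with the fewest high-risk steps;
# in practice you might merge the best parts by hand or with another critique turn.
best = min(candidates, key=lambda p: sum(s.get("risk") == "high" for s in p.get("steps", [])))
```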
Approve, then execute:
- Only after approval do you enable tool/function calls to run code, modify files, or hit external systems—i.e., switching from plan to act. (OpenAI function/tool calling docs.) (OpenAI Platform)
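A sketch of the switch from plan to act, assuming the approved plan produced by the rev loops and a hypothetical `apply_patch` tool:

```python
# Execution turn: the approved plan is in context and tool calls are now allowed.
import json
from openai import OpenAI

client = OpenAI()

approved_plan = {"goal": "...", "steps": []}   # output of the rev loops, after human approval

tools = [{
    "type": "function",
    "function": {
        "name": "apply_patch",   # hypothetical executor exposed to the model
        "description": "Apply a unified diff to the working tree",
        "parameters": {
            "type": "object",
            "properties": {"diff": {"type": "string"}},
            "required": ["diff"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Execute the approved plan below, step by step, using the tools."},
        {"role": "user", "content": json.dumps(approved_plan)},
    ],
    tools=tools,
    tool_choice="auto",          # act phase: the model may now request tool calls
)

for call in resp.choices[0].message.tool_calls or []:
    args = json.loads(call.function.arguments)
    # Dispatch to your own executor here, then append a {"role": "tool", ...} result
    # message and continue the conversation until the plan is done.
    print(f"model requested {call.function.name} with {args}")
```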
Parallelize safely (when needed):
- If the plan has separable work items, you can fan them out as parallel tool calls/subtasks—this follows the reason-then-act literature (ReAct) but keeps the risky steps after your robust plan is locked.
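One way to sketch that fan-out, assuming the step shape above (each step carries `id` and `depends_on`); the per-step execution turn is simplified:

```python
# Fan-out: run plan steps with no unmet dependencies as parallel subtasks.
import json
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI()

def run_step(step: dict) -> tuple[int, str]:
    """One (simplified) execution turn per independent step; tools would be enabled here as above."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Execute this approved step: {json.dumps(step)}"}],
    )
    return step["id"], resp.choices[0].message.content

def fan_out(plan: dict, max_workers: int = 4) -> dict:
    # Only steps whose dependencies are already satisfied are safe to run concurrently.
    ready = [s for s in plan["steps"] if not s["depends_on"]]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_step, ready))
```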
Why this works (and the evidence)
- More deliberate test-time compute helps on hard tasks → use OpenAI's reasoning models for the planning turns.
- Separate "thinking" from "doing." ReAct shows that interleaving/structuring reasoning and actions reduces hallucinations and improves robustness; your rev loops are the "reasoning" half done to convergence before any action.
- Self-critique improves plans. Reflexion-style prompts (critique → revise) reliably push solutions past one-shot baselines. (DataCamp)
Practical guardrails
- Keep planning turns tool-free. Use `tool_choice: "none"` so nothing executes while you're still refining. (OpenAI Platform)
- Enforce structure. Ask for plans that conform to a JSON schema (`response_format` with a schema) so you can diff/validate them mechanically. (Microsoft for Developers)
- Only then execute with tools/functions. Use function/tool calling after an approved plan. (OpenAI Platform)
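For the "diff/validate mechanically" point, a small helper sketch (assumes the third-party `jsonschema` package and the plan schema defined earlier):

```python
# Mechanical checks: validate each revised plan against the schema and diff it against the previous one.
import difflib
import json
import jsonschema   # third-party: pip install jsonschema

def check_and_diff(plan_before: dict, plan_after: dict, schema: dict) -> str:
    jsonschema.validate(instance=plan_after, schema=schema)       # reject malformed plans early
    before = json.dumps(plan_before, indent=2, sort_keys=True).splitlines()
    after = json.dumps(plan_after, indent=2, sort_keys=True).splitlines()
    return "\n".join(difflib.unified_diff(before, after, "plan_before", "plan_after", lineterm=""))
```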
When to "rev" vs. just run
- Rev the engine for: multi-step refactors, cross-module feature work, or anything with tricky dependencies.
- Single pass is fine for: small, localized edits where the failure surface is tiny.
TL;DR
On OpenAI, "Rev the Engine" = reasoning model + plan-only turns (tool_choice: "none"
) + 2–3 self-critique loops + JSON-structured plans → then execute with tools. This keeps risk low, improves solution quality, and uses test-time compute where it pays off most. (OpenAI Platform)