A.B.E — Always Be Experimenting (Codex CLI)
The best engineers don't wait for perfect docs or polished UX; they probe, measure, and iterate. With Codex, that mindset pays off because the system is designed to be steered and stress-tested: you can pin rules in `AGENTS.md`, toggle approval levels, switch models and reasoning, and even kick off parallel cloud tasks that keep working while you move on. (GitHub)
Why experimentation matters (now)
OpenAI's latest notes on Codex emphasize two things: (1) the agent can run locally or in the cloud, and (2) it is increasingly capable of taking on long, complex work (the GPT-5-Codex release details multi-hour autonomous execution and improved adherence to `AGENTS.md`). That means small tweaks to setup and prompts can yield large changes in throughput and quality, which is precisely what experimentation uncovers. (OpenAI)
- Local: Codex CLI can read, modify, and run your code in the chosen directory; it supports non-interactive runs via `codex exec` (see the sketch below). (OpenAI Developer)
- Cloud: Codex Cloud provisions a sandboxed container per task, so you can run many tasks in parallel, in the background, and trigger them from web/IDE/iOS/GitHub. (OpenAI Developer)
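A quick way to feel the local side of that split is to script a run. A minimal sketch: `codex exec` is the documented non-interactive entry point, while the task prompt and paths are placeholders to adapt to your repo:

```bash
# Non-interactive local run: Codex reads, edits, and runs code in the
# current directory, driven by a single prompt.
codex exec "Add unit tests for the date-parsing helpers in src/utils."

# Review what the agent changed before committing.
git diff --stat
```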
A.B.E playbook — concrete experiments you can run this week
1) Constraint vs. freedom: tune your AGENTS.md
Write explicit invariants, allowed edits, and examples; then compare results against a looser version (a scripted setup for both variants follows this list).
- Start with a minimal but precise `AGENTS.md` (OpenAI describes it as "a README for agents" that guides coding agents).
- Measure: diff size, test pass rate, and number of corrections you request.
- Expectation: a tighter `AGENTS.md` yields more consistent adherence and fewer off-track edits. (GitHub)
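One way to set up the comparison is to keep both variants side by side and copy one into place per run. A minimal sketch; the file names and rules are illustrative, not prescriptive:

```bash
# Strict variant: explicit invariants plus an allowlist of editable paths.
cat > AGENTS.strict.md <<'EOF'
# Agent instructions

## Invariants
- Do not change public API signatures.
- Every new function needs a unit test.
- Run the full test suite before finishing a task.

## Allowed edits
- Only modify files under src/ and tests/.
EOF

# Loose variant: a single high-level instruction, for contrast.
cat > AGENTS.loose.md <<'EOF'
# Agent instructions
Keep the code clean and tested.
EOF

# Activate the variant under test before starting a run.
cp AGENTS.strict.md AGENTS.md
```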
2) Plan first, then execute: approvals as a safety dial
Run the same task across approval modes:
- `/approvals` → Read Only to force planning (no edits or runs).
- Approve the plan, then switch to Auto for local edits and commands within your workspace.
- Use Full Access sparingly (it grants network and broader access).
- Measure: planning clarity, number of follow-up approvals, and rework; a scripted version of the sweep follows this list. (Approval levels and behavior are documented in the CLI page and the upgrade post.) (OpenAI Developer)
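The same dial can be scripted instead of toggled in the TUI. A sketch, assuming your CLI version exposes the sandbox levels via a `--sandbox` flag; the prompts are placeholders:

```bash
# Pass 1: read-only, so the agent can only plan, not edit or run anything.
codex exec --sandbox read-only \
  "Plan a refactor of the auth module; list the files you would touch and why."

# Pass 2: workspace-write, so edits and commands stay inside the repo.
codex exec --sandbox workspace-write \
  "Carry out the refactor plan for the auth module."
```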
3) Model & reasoning sweeps
In the CLI, try `/model` to toggle models and reasoning effort, or launch with `--model` (a scripted sweep follows this list):
- Compare default GPT-5 vs GPT-5-Codex for different task types
- Test o4-mini for simpler, faster iterations
- Measure: output quality, speed, and cost differences across models
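The sweep is easy to automate with `codex exec --model`. A minimal sketch using the models named above; the task is a placeholder, and the reset only restores tracked files:

```bash
# Run the same task under each model; save each diff for side-by-side review.
for m in gpt-5 gpt-5-codex o4-mini; do
  git checkout -- .    # discard the previous run's edits to tracked files
  codex exec --model "$m" "Add input validation to the signup form."
  git diff > "sweep-$m.patch"    # this model's changes, as a patch
done
```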
Minimal experiment harness (repeatable)
- Define a target (e.g., "add tests for module X").
- Lock an `AGENTS.md` variant (strict vs loose). (GitHub)
- Pick an approval mode (`/approvals`). (OpenAI Developer)
- Choose an engine (CLI `exec` vs a Cloud task). (OpenAI Developer)
- Log metrics: diff size, test outcomes, approvals needed, elapsed time, and any manual fixes.
- Iterate: change one variable at a time (the sketch below wires these steps together).
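Wired together, one run of the loop might look like this. A sketch under stated assumptions: `make test` stands in for your project's test command, and the variant files come from the `AGENTS.md` experiment above:

```bash
#!/usr/bin/env bash
# One harness run: execute the task, then log diff size, test outcome, and time.
set -euo pipefail

VARIANT="${1:-strict}"            # strict | loose
TASK="Add tests for module X."    # the locked experiment target
LOG="experiments.csv"

cp "AGENTS.$VARIANT.md" AGENTS.md    # lock the variant under test
start=$(date +%s)
codex exec "$TASK"
elapsed=$(( $(date +%s) - start ))

# Diff size: insertions plus deletions, as reported by git.
diff_lines=$(git diff --shortstat | awk '{print $4 + $6}')
if make test >/dev/null 2>&1; then tests=pass; else tests=fail; fi

echo "$VARIANT,${diff_lines:-0},$tests,${elapsed}s" >> "$LOG"
```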
Codex provides logs, diffs, and test results to support this evidence-based loop—use them. (OpenAI)
Culture: publish your findings
We're still early in the adoption curve. Small, public write-ups ("strict vs loose `AGENTS.md`", "Read-Only planning vs Auto", "CLI vs Cloud on long refactors") help the community converge on patterns faster, exactly how prior tooling waves matured. The platform is explicitly built to support parallel delegation, approvals, and project-level guidance, which makes it unusually fertile for controlled experiments. (OpenAI Developer)
Always Be Experimenting. Treat Codex like a system to be tuned: constrain, delegate, measure, repeat.
Sources
- Codex CLI docs: local agent, install, `exec`, `/model`, approval modes. (OpenAI Developer)
- Codex Cloud docs: background parallel tasks, sandboxed per-task containers, delegation from web/IDE/iOS/GitHub. (OpenAI Developer)
- AGENTS.md: official repo, "a README for agents" to guide coding agents. (GitHub)
- Introducing upgrades to Codex (OpenAI): GPT-5-Codex capabilities, adherence to `AGENTS.md`, approvals simplification, compaction, code review. (OpenAI)