A.B.E — Always Be Experimenting (Codex CLI)
The best engineers don't wait for perfect docs or polished UX; they probe, measure, and iterate. With Codex, that mindset pays off because the system is designed to be steered and stress-tested: you can pin rules in `AGENTS.md`, toggle approval levels, switch models and reasoning, and even kick off parallel cloud tasks that keep working while you move on. (GitHub)
Why experimentation matters (now)
OpenAI's latest notes on Codex emphasize two things: (1) the agent can run locally or in the cloud, and (2) it is increasingly capable of taking on long, complex work (the GPT-5-Codex release details multi-hour autonomous execution and improved adherence to `AGENTS.md`). That means small tweaks to setup and prompts can yield large changes in throughput and quality, which is precisely what experimentation uncovers. (OpenAI)
- Local: Codex CLI can read, modify, and run your code in the chosen directory; it supports non-interactive runs via `codex exec` (see the sketch below). (OpenAI Developer)
- Cloud: Codex Cloud provisions a sandboxed container per task, so you can run many tasks in parallel, in the background, and trigger them from web/IDE/iOS/GitHub. (OpenAI Developer)
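A quick way to feel the local side of that split is to script a run. A minimal sketch: `codex exec` is the documented non-interactive entry point, while the task prompt and paths are placeholders to adapt to your repo:

```bash
# Non-interactive local run: Codex reads, edits, and runs code in the
# current directory, driven by a single prompt.
codex exec "Add unit tests for the date-parsing helpers in src/utils."

# Review what the agent changed before committing.
git diff --stat
```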
A.B.E playbook — concrete experiments you can run this week
1) Constraint vs. freedom: tune your AGENTS.md
Write explicit invariants, allowed edits, and examples; then compare results against a looser version (a scripted setup for both variants follows this list).
- Start with a minimal but precise `AGENTS.md` (OpenAI describes it as "a README for agents" that guides coding agents).
- Measure: diff size, test pass rate, and number of corrections you request.
- Expectation: a tighter `AGENTS.md` yields more consistent adherence and fewer off-track edits. (GitHub)
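One way to set up the comparison is to keep both variants side by side and copy one into place per run. A minimal sketch; the file names and rules are illustrative, not prescriptive:

```bash
# Strict variant: explicit invariants plus an allowlist of editable paths.
cat > AGENTS.strict.md <<'EOF'
# Agent instructions

## Invariants
- Do not change public API signatures.
- Every new function needs a unit test.
- Run the full test suite before finishing a task.

## Allowed edits
- Only modify files under src/ and tests/.
EOF

# Loose variant: a single high-level instruction, for contrast.
cat > AGENTS.loose.md <<'EOF'
# Agent instructions
Keep the code clean and tested.
EOF

# Activate the variant under test before starting a run.
cp AGENTS.strict.md AGENTS.md
```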
2) Plan first, then execute: approvals as a safety dial
Run the same task across approval modes:
- `/approvals` → Read Only to force planning (no edits or runs).
- Approve the plan, then switch to Auto for local edits and commands within your workspace.
- Use Full Access sparingly (it grants network and broader access).
- Measure: planning clarity, number of follow-up approvals, and rework; a scripted version of the sweep follows this list. (Approval levels and behavior are documented in the CLI page and the upgrade post.) (OpenAI Developer)
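The same dial can be scripted instead of toggled in the TUI. A sketch, assuming your CLI version exposes the sandbox levels via a `--sandbox` flag; the prompts are placeholders:

```bash
# Pass 1: read-only, so the agent can only plan, not edit or run anything.
codex exec --sandbox read-only \
  "Plan a refactor of the auth module; list the files you would touch and why."

# Pass 2: workspace-write, so edits and commands stay inside the repo.
codex exec --sandbox workspace-write \
  "Carry out the refactor plan for the auth module."
```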
3) Model & reasoning sweeps
In the CLI, try `/model` to toggle models and reasoning effort, or launch with `--model` (a scripted sweep follows this list):
- Compare default GPT-5 vs GPT-5-Codex for different task types
- Test o4-mini for simpler, faster iterations
- Measure: output quality, speed, and cost differences across models
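The sweep is easy to automate with `codex exec --model`. A minimal sketch using the models named above; the task is a placeholder, and the reset only restores tracked files:

```bash
# Run the same task under each model; save each diff for side-by-side review.
for m in gpt-5 gpt-5-codex o4-mini; do
  git checkout -- .    # discard the previous run's edits to tracked files
  codex exec --model "$m" "Add input validation to the signup form."
  git diff > "sweep-$m.patch"    # this model's changes, as a patch
done
```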
Minimal experiment harness (repeatable)
- Define a target (e.g., "add tests for module X").
- Lock an `AGENTS.md` variant (strict vs loose). (GitHub)
- Pick an approval mode (`/approvals`). (OpenAI Developer)
- Choose an engine (CLI `exec` vs a Cloud task). (OpenAI Developer)
- Log metrics: diff size, test outcomes, approvals needed, elapsed time, and any manual fixes.
- Iterate: change one variable at a time (the sketch below wires these steps together).
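Wired together, one run of the loop might look like this. A sketch under stated assumptions: `make test` stands in for your project's test command, and the variant files come from the `AGENTS.md` experiment above:

```bash
#!/usr/bin/env bash
# One harness run: execute the task, then log diff size, test outcome, and time.
set -euo pipefail

VARIANT="${1:-strict}"            # strict | loose
TASK="Add tests for module X."    # the locked experiment target
LOG="experiments.csv"

cp "AGENTS.$VARIANT.md" AGENTS.md    # lock the variant under test
start=$(date +%s)
codex exec "$TASK"
elapsed=$(( $(date +%s) - start ))

# Diff size: insertions plus deletions, as reported by git.
diff_lines=$(git diff --shortstat | awk '{print $4 + $6}')
if make test >/dev/null 2>&1; then tests=pass; else tests=fail; fi

echo "$VARIANT,${diff_lines:-0},$tests,${elapsed}s" >> "$LOG"
```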
Codex provides logs, diffs, and test results to support this evidence-based loop—use them. (OpenAI)
Culture: publish your findings
We're still early in the adoption curve. Small, public write-ups ("strict vs loose `AGENTS.md`", "Read-Only planning vs Auto", "CLI vs Cloud on long refactors") help the community converge on patterns faster, exactly how prior tooling waves matured. The platform is explicitly built to support parallel delegation, approvals, and project-level guidance, which makes it unusually fertile for controlled experiments. (OpenAI Developer)
Always Be Experimenting. Treat Codex like a system to be tuned: constrain, delegate, measure, repeat.
Sources
- Codex CLI docs: local agent, install, `exec`, `/model`, approval modes. (OpenAI Developer)
- Codex Cloud docs: background parallel tasks, sandboxed per-task containers, delegation from web/IDE/iOS/GitHub. (OpenAI Developer)
- AGENTS.md: official repo, "a README for agents" to guide coding agents. (GitHub)
- Introducing upgrades to Codex (OpenAI): GPT-5-Codex capabilities, adherence to `AGENTS.md`, approvals simplification, compaction, code review. (OpenAI)