CodexLog

Experiments, insights & mechanics about OpenAI Codex by Kevin — builder of Manus AI outbound sites and ops engineer exploring Dev/Fin/ML/LLMOps.

Latest Post: The /approvals workflow in vX.Y.Z is the quiet superpower. Move between Auto, Read Only, and Full Access to keep Codex fast when it's safe and deliberate when it matters. Cut risk, keep velocity. Learn more about Approval-Driven Coding.

What is Codex?

Codex is OpenAI's agentic coding companion designed to work where developers actually live: the terminal, the editor, and the repo. It reads context, plans tasks, executes commands, and proposes multi-file edits you can review and gate. The goal is not to "chat about code" but to ship code with repeatable, auditable steps.

For the freshest model and CLI notes, see the official OpenAI docs. CodexLog focuses on the fieldcraft—what works in real projects, why it works, and how to reproduce it.

What is Codex CLI?

Codex CLI is the thin layer that binds your shell, git hygiene, and project context. It turns natural-language goals into concrete actions inside your workspace, so you set direction while Codex handles the mechanics.

Capabilities at a glance

Terminal-native control: Run inside your repo; no window-switching rituals.
Multi-file edits: Coherent refactors across modules with reasoning you can inspect.
Approval gates: Use /approvals to flip between Auto, Read Only, and Full Access.
Context discipline: Inspect what Codex sees; prune or pin files for clean prompts.
Git-first workflow: Branch, commit, test, and open PRs—auditable end-to-end.
Task memory: Persist goals and constraints so follow-ups compound, not repeat.
Execution hooks: Let Codex run scripts/tests and report artifacts back to you.

The Turning Point

The first time I pointed Codex at a stubborn VitePress theme, it felt like cheating. It generated a plan with checkboxes, asked before touching risky files, and even suggested a rollback strategy. Then CI failed. A tiny CSS variable leaked from a custom :root scope and broke dark mode in nested pages.

I wasn't disappointed—I was intrigued. If Codex could be steered like a junior teammate, then the fix was not more tokens; the fix was better operations. I rewrote my request as a procedure: task context, rules, numbered steps, and examples. On the next run, Codex isolated the theme variables, added tests for the color map, and the pipeline went green.

That moment set the tone for CodexLog. We don't worship prompts. We operationalize them—turning intent into procedures that survive CI, code review, and the passage of time.

Mechanics & Best Practices

1) Prompt as Procedure

Stop writing essays. Write procedures.

Task Context: What branch, what directories, what constraints.
Rules: Style guide, lint gates, performance budgets, i18n, accessibility.
Steps (numbered): Plan → Implement → Test → Explain diffs → Commit.
Examples: One good before/after is worth 100 adjectives.

Why it works: You're feeding Codex a runnable playbook, not vibes. Output becomes predictable, reviewable, and diff-friendly.
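To make the four parts concrete, here is a minimal sketch of a procedure rendered as plain text. The `Procedure` class, its field names, and the sample branch and rules are all hypothetical, chosen for illustration; nothing here is a Codex API. It only builds the string you would paste or pipe in as your request.

```python
from dataclasses import dataclass, field


@dataclass
class Procedure:
    """A prompt organised as a procedure rather than free-form prose (hypothetical helper)."""

    task_context: str
    rules: list[str] = field(default_factory=list)
    steps: list[str] = field(default_factory=list)
    examples: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the four sections as plain text, numbering the steps."""
        lines = ["Task Context:", self.task_context, "", "Rules:"]
        lines += [f"- {rule}" for rule in self.rules]
        lines += ["", "Steps:"]
        lines += [f"{i}. {step}" for i, step in enumerate(self.steps, start=1)]
        lines += ["", "Examples:"]
        lines += [f"- {example}" for example in self.examples]
        return "\n".join(lines)


# Sample values are made up for illustration.
procedure = Procedure(
    task_context="Branch refactor/theme-vars; only touch docs/.vitepress.",
    rules=["Follow the repo lint config.", "Do not rename public CSS classes."],
    steps=["Plan", "Implement", "Test", "Explain diffs", "Commit"],
    examples=["Before: --bg set on :root. After: --bg scoped to the theme wrapper."],
)
prompt_text = procedure.render()
print(prompt_text)
```

Keeping the procedure as data rather than a hand-typed paragraph means the same rules and steps can be reused across tasks, with only the task context and examples swapped out.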

2) Approval-Driven Coding

Default to Read Only while Codex maps the repo and drafts a plan. Flip to Auto for low-risk edits (docs, comments, local scripts). Use Full Access only when tests are green and you're watching the terminal.

Why it works: You preserve velocity without letting an LLM spray changes across the tree. Risk is gated to intent.
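The gating rule above can be written down as a small checklist function. This is a sketch under stated assumptions: `suggest_mode`, `is_low_risk`, and the low-risk suffixes and directories are hypothetical and should be tuned per repo. The function does not talk to Codex at all; it only tells you which mode to pick by hand in /approvals.

```python
from pathlib import PurePosixPath

# Hypothetical low-risk rules for illustration; adjust to your repository.
LOW_RISK_SUFFIXES = {".md", ".txt", ".rst"}
LOW_RISK_DIRS = {"docs", "scripts"}


def is_low_risk(path: str) -> bool:
    """Treat docs-like files and files under docs/ or scripts/ as low risk."""
    p = PurePosixPath(path)
    in_low_risk_dir = len(p.parts) > 1 and p.parts[0] in LOW_RISK_DIRS
    return p.suffix in LOW_RISK_SUFFIXES or in_low_risk_dir


def suggest_mode(planned_paths: list[str], tests_green: bool, human_watching: bool) -> str:
    """Map a planned change set to one of the three approval modes named in the text."""
    if not planned_paths:
        # Nothing planned yet: Codex is still mapping the repo.
        return "Read Only"
    if all(is_low_risk(p) for p in planned_paths):
        return "Auto"
    if tests_green and human_watching:
        return "Full Access"
    # Risky edits without green tests or supervision stay gated.
    return "Read Only"


print(suggest_mode(["docs/guide.md"], tests_green=False, human_watching=False))
print(suggest_mode(["src/app.ts"], tests_green=True, human_watching=True))
```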

3) Context Hygiene

Your context is a budget. Spend it where it changes decisions.

Include entry points, contracts, and failing tests.
Exclude vendored code, massive logs, and generated assets.
Pin critical files so Codex can't lose the thread mid-refactor.

Why it works: Cleaner context reduces hallucinated couplings and keeps Codex focused on surfaces that matter.
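One way to apply the include/exclude/pin rules before a session is to filter the file list yourself. This is a minimal sketch: the patterns, the pinned paths, and `select_context` are hypothetical, and how you hand the resulting list to Codex depends on your setup.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for illustration: vendored code, logs, generated assets.
EXCLUDE = ["vendor/*", "node_modules/*", "dist/*", "*.log", "*.min.js"]
# Hypothetical pinned files: an entry point and a failing test.
PINNED = ["src/index.ts", "tests/theme.test.ts"]


def select_context(paths: list[str], exclude=EXCLUDE, pinned=PINNED) -> list[str]:
    """Return pinned files first, then every other file that matches no exclude pattern.

    A pinned file is kept even if an exclude pattern would match it.
    """
    pinned_present = [p for p in pinned if p in paths]
    kept = [p for p in paths if not any(fnmatchcase(p, pat) for pat in exclude)]
    rest = [p for p in kept if p not in pinned_present]
    return pinned_present + rest


repo_files = [
    "src/index.ts",
    "vendor/lib/a.js",
    "logs/build.log",
    "src/util.ts",
    "tests/theme.test.ts",
]
context_files = select_context(repo_files)
print(context_files)
```

Note that `fnmatchcase` lets `*` match across `/`, so `vendor/*` covers nested paths as well.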

4) Diff-First Reviews

Ask Codex to explain diffs like a teammate: what changed, why, and how it affects callers. Reject vague rationales. If a diff can't be justified in two tight paragraphs, it probably needs revision.

Why it works: Narrative around diffs compresses your review time and forces consistent reasoning across files.

5) Test Harness Early, Not Late

Give Codex runnable tests before you ask for sweeping changes. Even one failing test with clear assertions is enough to anchor a refactor.

Why it works: Execution feedback closes the loop. Codex corrects course faster from the output of npm test than from three more paragraphs of wishful thinking.

6) Branch Discipline

Keep Codex work on scoped branches. Use conventional commits (feat:, fix:, refactor:). Let Codex propose messages but enforce your house style.

Why it works: You retain auditability, make reverts painless, and keep PRs focused for human reviewers.
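Enforcing house style on proposed messages is easy to automate. The sketch below checks a commit subject against a conventional-commit pattern. The allowed types beyond feat, fix, and refactor, the scope character set, and the 72-character limit are assumptions, not part of any standard tool; swap in your own rules.

```python
import re

# feat, fix, refactor come from the text; the other types are an assumed house list.
ALLOWED_TYPES = ("feat", "fix", "refactor", "docs", "test", "chore")

SUBJECT_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"
    r"(\((?P<scope>[a-z0-9_-]+)\))?"  # optional scope, e.g. (theme)
    r"(?P<breaking>!)?"               # optional breaking-change marker
    r": (?P<subject>\S.*)$"           # colon, space, non-empty subject
)


def check_commit_subject(line: str, max_len: int = 72) -> list[str]:
    """Return a list of problems; an empty list means the subject passes."""
    problems = []
    if SUBJECT_PATTERN.match(line) is None:
        problems.append("not in 'type(scope): subject' form")
    if len(line) > max_len:
        problems.append(f"longer than {max_len} characters")
    return problems


print(check_commit_subject("refactor(theme): isolate root vars"))
print(check_commit_subject("Updated stuff"))
```

Wired into a commit-msg hook, a check like this lets Codex propose messages freely while the gate stays yours.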

Enterprise & Team Notes

Compliance & Secrets: Keep Codex in Read Only until secret scanners and pre-commit hooks are verified. Make environment variables explicit; mask anything sensitive.
Least-Privilege Execution: In CI, run Codex with a service account that can read repos, run tests, and push branches—no production keys.
Policy as Code: Treat your rules (lint, license, dependency policy) as first-class context. When Codex proposes changes, those policies enforce the non-negotiables.
Traceability: Require Codex to annotate PRs with a work log: prompt hash, context summary, commands executed, and test results. Your auditors will thank you.
On-call Friendly: When incidents hit, Codex can summarize logs and propose patches, but ship through the same approval gates. Consistency beats cleverness at 3 a.m.
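For the traceability note, a work log can be as simple as a small JSON document attached to the PR. This is a sketch assuming you capture the prompt, context list, commands, and test summary yourself. `build_work_log` and the field names are hypothetical, and Codex is not claimed to emit this format.

```python
import hashlib
import json


def build_work_log(prompt: str, context_files: list[str],
                   commands: list[str], test_summary: str) -> dict:
    """Assemble the four items named above into one record (field names are hypothetical)."""
    # Hashing lets reviewers confirm which prompt was used without pasting it in full.
    prompt_hash = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return {
        "prompt_sha256": prompt_hash,
        "context_summary": sorted(context_files),
        "commands_executed": list(commands),
        "test_results": test_summary,
    }


# Sample values are made up for illustration.
log = build_work_log(
    prompt="Isolate theme variables under docs/.vitepress.",
    context_files=["docs/.vitepress/config.ts", "docs/.vitepress/theme/vars.css"],
    commands=["npm test"],
    test_summary="all tests passed",
)
pr_comment = json.dumps(log, indent=2)
print(pr_comment)
```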

Field Recipes (Copy, Adapt, Ship)

Refactor a Theme Safely

Read Only: Map docs/.vitepress and extract CSS variables.
Plan: Propose variable scopes; list risk areas.
Tests: Add minimal snapshot or visual regression hooks.
Auto: Apply scoping + run tests + explain diffs.
Full Access: Commit with refactor(theme): isolate root vars.

Kill a Flaky Test

Identify nondeterminism (timeouts, real time, random seeds).
Mock clocks/IO, inject fixed seeds.
Rerun test matrix; record pass rate.
Tighten assertions; document failure modes.
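The second step of that recipe, sketched in Python: pass the clock and the random source in as parameters so tests can fix them. `pick_retry_delay` and `is_expired` are hypothetical stand-ins for your own code under test.

```python
import random


def pick_retry_delay(rng: random.Random, base: float = 1.0) -> float:
    """Jittered delay; the random source is injected instead of using the global one."""
    return base + rng.random()


def is_expired(created_at: float, ttl: float, now) -> bool:
    """Expiry check; `now` is a callable so tests never touch the real clock."""
    return now() - created_at >= ttl


def test_delay_is_repeatable():
    # Two generators with the same fixed seed produce the same jitter.
    assert pick_retry_delay(random.Random(42)) == pick_retry_delay(random.Random(42))


def test_expiry_with_fixed_clock():
    def fixed_now() -> float:
        return 1_000.0

    assert is_expired(created_at=990.0, ttl=10.0, now=fixed_now)
    assert not is_expired(created_at=995.0, ttl=10.0, now=fixed_now)


test_delay_is_repeatable()
test_expiry_with_fixed_clock()
```

In production the same functions take `time.time` and an unseeded `random.Random()`; only the tests pin the values.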

Modernize a Build

Inventory plugins, lockfile age, Node/pnpm versions.
Stage upgrades in branches per subsystem (lint → test → build).
Run Codex to rewrite deprecated APIs with examples.
Add "guard tests" to freeze the new behavior.

Common Failure Patterns (and Fixes)

Symptom: Codex edits unrelated files. Fix: Narrow context; pin files; restate scope with explicit allowlist.
Symptom: Vague rationales in PR. Fix: Enforce diff explanations; reject and request concrete before/after.
Symptom: Green locally, red in CI. Fix: Include CI config in context; require Codex to run the same scripts.
Symptom: Endless back-and-forth on style. Fix: Put your style guide and lint rules into the prompt as rules, not hints.
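For the first symptom, the allowlist can also be checked after the fact against the files a change actually touched. This is a minimal sketch: `out_of_scope` and the sample patterns are hypothetical, and the list of changed paths is assumed to come from your own tooling (for example, the output of `git diff --name-only`).

```python
from fnmatch import fnmatchcase


def out_of_scope(changed_paths: list[str], allowlist: list[str]) -> list[str]:
    """Return changed files that match none of the allowlist patterns."""
    return [
        path
        for path in changed_paths
        if not any(fnmatchcase(path, pattern) for pattern in allowlist)
    ]


# Sample values are made up for illustration.
changed = ["docs/.vitepress/theme/vars.css", "package.json"]
violations = out_of_scope(changed, allowlist=["docs/.vitepress/*"])
print(violations)
```

A non-empty result is a reason to reject the change and restate the scope before the next run.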

The Journey Begins

CodexLog is a working notebook, not a press release. Every technique here has shipped in real repos or been stress-tested by the community. We'll keep publishing mechanics that favor procedures over prose and evidence over adjectives.

If you're new, start with Prompt as Procedure and Approval-Driven Coding. If you're leading a team, wire Codex into your existing gates—tests, lint, CI—and demand traceability in every PR. When it clicks, it feels like magic. It isn't. It's just good operations, applied to an AI teammate.

Learn more about Approval-Driven Coding • Learn more about Context Hygiene • Learn more about Diff-First Reviews
