
Agent Engineering with Codex CLI

Working with large language models has evolved through three clear phases:

  1. Prompt Engineering – crafting precise, one-off prompts.
  2. Context Engineering – structuring context through files like AGENTS.md to improve reliability.
  3. Agent Engineering – designing specialized, reusable, efficient agents within Codex CLI and Codex Cloud.

Each step has expanded what developers can share: from individual prompts → to project context setups → to portable agent definitions that work across projects. OpenAI's Codex CLI documentation explicitly supports this shift by encouraging AGENTS.md-based rules and YAML-style agent configs.
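
As a concrete anchor for the second phase, here is a minimal AGENTS.md sketch. The file name follows Codex convention; the contents below are purely illustrative:

```markdown
# AGENTS.md (illustrative project context)

## Build & Test
- Run `npm test` before proposing any change.

## Conventions
- TypeScript strict mode; no default exports.

## Boundaries
- Never modify files under vendor/.
```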


Understanding the Agent Ecosystem

| Aspect | Custom Agents (Codex CLI) | Split-Role Approaches | Task/Manual Tools |
| --- | --- | --- | --- |
| Activation | Automatic delegation | Manual orchestration | Manual invocation |
| System Prompt | Isolated / custom | Inherits global | Inherits global |
| Portability | High (file-based) | Low | Low |
| Configuration | YAML frontmatter + prompt | Task description | Command sequence |
| Tool Selection | Explicit, role-specific | Shared toolset | Shared toolset |
| Best For | Reusable domain experts | Multi-perspective runs | One-off operations |

Custom agents provide automatic activation, isolated contexts, and precise tool scoping. This eliminates the token bloat and manual overhead of older approaches.


Custom Agent Design Principles

One of the biggest challenges in Codex CLI is token efficiency. Every agent has an initialization overhead based on model, tool count, and system prompt size. Codex Cloud docs note that heavier setups can increase cost and latency, so optimization is critical.

Agent "Weight Classes"

  • Lightweight (<3k tokens): Fast, cheap, highly composable. Great for commit message bots, quick linting, or doc formatting.
  • Medium (10–15k tokens): Balanced performance for QA, performance review, or UX analysis.
  • Heavy (25k+ tokens): Costly, slower startup, best for deep research or cross-repo analysis.

As with big.LITTLE CPU architectures, the same strategy applies to agents: use heavy agents sparingly and pair them with multiple lightweight agents for throughput. A sketch of a lightweight agent follows.
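
To make the lightweight class concrete, here is a hypothetical sub-3k-token commit-message agent, using the same frontmatter schema as the configuration example later in this post (the name, tool list, and model are illustrative):

```yaml
---
name: commit-scribe
description: Draft a conventional-commit message from the staged diff.
tools: Read, Grep
model: gpt-4.5-codex
---
You write conventional commit messages (feat/fix/docs/chore).
- Read the staged diff only; never modify files.
- Output a single subject line, 72 characters max.
```

Keeping the prompt this terse is exactly what holds initialization under the lightweight token budget.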


Model Selection Strategy

Codex CLI supports model specification per agent (gpt-5-codex, gpt-4.5-codex, etc.). Official docs show you can set models at the config level or override at runtime.

Conventional pairings:

  • Light agent + GPT-4.5 Codex: cost-efficient automation.
  • Medium agent + GPT-5 Codex (Sonnet equivalent): balanced reasoning.
  • Heavy agent + GPT-5 Codex (Opus equivalent): deep analysis.

Experimental strategies:

  • Try GPT-5 Codex on lightweight agents for surprising insights at low token use.
  • Run GPT-4.5 on heavier configs for efficiency benchmarking.

The frontier is open: benchmark and share results, starting with a one-line model swap like the sketch below.
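
Since the model lives in each agent's frontmatter, trying a pairing is a one-line edit. A hypothetical variant, reusing the frontmatter schema from the configuration example below (the agent name and prompt are illustrative):

```yaml
---
name: perf-reviewer
description: Review hot paths for performance regressions.
tools: Read, Grep, Glob
model: gpt-4.5-codex   # swap to gpt-5-codex and compare cost/latency
---
Flag O(n^2) loops, redundant allocations, and repeated I/O in changed code.
```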


When to Use Custom Agents

Best applications include:

  • Code Review: security audits, performance checks, maintainability.
  • Research: API doc summarization, library comparison.
  • QA: automated test strategy suggestions, edge case identification.
  • Docs: SEO-aware tech writing, accessibility review.
  • Design: UX critiques, layout consistency audits.

Because agents are file-based (.codex/agents/*.md), they are:

  • Cross-project portable (drop them into new repos).
  • Team-shareable (commit to VCS, distribute internally).
  • Community-ready (easily publish recipes, similar to how r/ClaudeAI shares agent configs).
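
Concretely, a shared setup is nothing more than files committed alongside the code; a hypothetical repository layout:

```text
repo/
├── AGENTS.md                     # project-wide context rules
└── .codex/
    └── agents/
        ├── security-reviewer.md  # domain expert (example below)
        └── commit-scribe.md      # lightweight helper
```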

Configuration Example

A sample .codex/agents/security-reviewer.md:

```yaml
---
name: security-reviewer
description: Proactively review code for security risks after each commit.
tools: Read, Grep, Glob
model: gpt-5-codex
---
You are a senior security engineer.
- Focus on authentication, validation, and secret handling.
- Summarize risks, categorize by severity.
- Suggest minimal, safe fixes.
```

Codex will auto-route tasks to this agent when your request matches its description.


Design Best Practices

  • Token-first design: minimize initialization size for frequent agents.
  • Separation of concerns: keep roles specific (e.g., don't mix UX and security).
  • Examples help: LLMs respond well to concrete examples of good and bad output (see the sketch after this list).
  • Nickname agents: use short handles (S1, UX1) for manual invocation efficiency.
  • Benchmark chainability: test how multiple agents compose in a pipeline.
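
As a sketch of the "examples help" practice, here is a hypothetical doc-review agent whose prompt embeds paired good/bad calibration samples (schema and names illustrative, as above):

```yaml
---
name: doc-linter
description: Review documentation for clarity and consistent terminology.
tools: Read
model: gpt-4.5-codex
---
Rewrite unclear sentences. Calibrate against these examples:
- Bad: "The function maybe returns data sometimes."
- Good: "fetchUser() returns a User, or null when the ID is unknown."
```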

Community & Collaboration

Like prompts and config files before them, custom agents are shareable assets. The OpenAI ecosystem (forums, GitHub repos, CodexLog) is beginning to collect and refine agent libraries. The more we document initialization costs, performance trade-offs, and surprising model/agent synergies, the faster this field evolves.

Agent Engineering isn't just a workflow improvement—it's a community movement toward portable AI expertise.


Sources