Agent Engineering with Codex CLI
In the evolution of working with large language models, we've moved through three clear phases:
- Prompt Engineering – crafting precise, one-off prompts.
- Context Engineering – structuring context through files like `AGENTS.md` to improve reliability.
- Agent Engineering – designing specialized, reusable, efficient agents within Codex CLI and Codex Cloud.
Each step has expanded what developers can share: from individual prompts → to project context setups → to portable agent definitions that work across projects. OpenAI's Codex CLI documentation explicitly supports this shift by encouraging `AGENTS.md`-based rules and YAML-style agent configs.
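For context, an `AGENTS.md` is just Markdown guidance committed to the repo root; a minimal sketch (the specific rules here are invented for illustration):

```markdown
# AGENTS.md

## Conventions for agents working in this repo
- Run the test suite before proposing any commit.
- Never edit generated files under `dist/`.
- Keep public API changes behind a feature flag.
```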
Understanding the Agent Ecosystem
| Aspect | Custom Agents (Codex CLI) | Split-Role Approaches | Task/Manual Tools |
| --- | --- | --- | --- |
| Activation | Automatic delegation | Manual orchestration | Manual invocation |
| System Prompt | Isolated / custom | Inherits global | Inherits global |
| Portability | High (file-based) | Low | Low |
| Configuration | YAML frontmatter + prompt | Task description | Command sequence |
| Tool Selection | Explicit, role-specific | Shared toolset | Shared toolset |
| Best For | Reusable domain experts | Multi-perspective runs | One-off operations |
Custom agents provide automatic activation, isolated contexts, and precise tool scoping. This eliminates the token bloat and manual overhead of older approaches.
Custom Agent Design Principles
One of the biggest challenges in Codex CLI is token efficiency. Every agent has an initialization overhead based on model, tool count, and system prompt size. Codex Cloud docs note that heavier setups can increase cost and latency, so optimization is critical.
Agent "Weight Classes"
- Lightweight (<3k tokens): Fast, cheap, highly composable. Great for commit message bots, quick linting, or doc formatting.
- Medium (10–15k tokens): Balanced performance for QA, performance review, or UX analysis.
- Heavy (25k+ tokens): Costly, slower startup, best for deep research or cross-repo analysis.
As in CPU design, a big.LITTLE strategy applies: use heavy agents sparingly and pair them with multiple lightweight agents for throughput.
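As a concrete data point for the lightweight class, a commit-message agent can fit comfortably under 3k tokens. Here is a minimal sketch in the same file format as the example later in this piece (the tool names and model value are assumptions, mirroring that example):

```yaml
---
name: commit-writer
description: Draft a conventional commit message from the staged diff.
tools: Read, Grep            # a tiny toolset keeps initialization overhead low
model: gpt-4.5-codex         # cheap model; see the pairing strategy below
---
You write one-line Conventional Commits messages (feat/fix/chore/docs).
Read the staged diff, then output only the message, nothing else.
```

Keeping both the tool list and the prompt short is what holds an agent like this in the lightweight class.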
Model Selection Strategy
Codex CLI supports model specification per agent (`gpt-5-codex`, `gpt-4.5-codex`, etc.). Official docs show you can set models at the config level or override at runtime.
Conventional pairings:
- Light agent + GPT-4.5 Codex: cost-efficient automation.
- Medium agent + GPT-5 Codex (Sonnet equivalent): balanced reasoning.
- Heavy agent + GPT-5 Codex (Opus equivalent): deep analysis.
Experimental strategies:
- Try GPT-5 Codex on lightweight agents for surprising insights at low token use.
- Run GPT-4.5 on heavier configs for efficiency benchmarking.
The frontier is open—benchmark and share results.
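One low-friction way to run these experiments is to pin the model in the agent's frontmatter and swap it between runs while keeping everything else fixed (a sketch; the agent name and prompt are invented, and the field names mirror the configuration example below):

```yaml
---
name: perf-reviewer
description: Review diffs for performance regressions.
tools: Read, Grep, Glob
model: gpt-5-codex    # experiment: swap to gpt-4.5-codex and compare
                      # latency, token cost, and finding quality per run
---
You review diffs for algorithmic complexity, N+1 queries, and allocation
hot spots. Report findings ordered by likely impact.
```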
When to Use Custom Agents
Best applications include:
- Code Review: security audits, performance checks, maintainability.
- Research: API doc summarization, library comparison.
- QA: automated test strategy suggestions, edge case identification.
- Docs: SEO-aware tech writing, accessibility review.
- Design: UX critiques, layout consistency audits.
Because agents are file-based (`.codex/agents/*.md`), they are:
- Cross-project portable (drop them into new repos).
- Team-shareable (commit to VCS, distribute internally).
- Community-ready (easily publish recipes, similar to how r/ClaudeAI shares agent configs).
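In practice, sharing means committing one directory; a typical layout might look like this (file names are illustrative):

```text
repo/
├── AGENTS.md                      # project-wide context rules
└── .codex/
    └── agents/
        ├── security-reviewer.md   # heavy, commit-triggered audits
        └── commit-writer.md       # lightweight, message drafting
```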
Configuration Example
A sample `.codex/agents/security-reviewer.md`:
```yaml
---
name: security-reviewer
description: Proactively review code for security risks after each commit.
tools: Read, Grep, Glob
model: gpt-5-codex
---
You are a senior security engineer.
- Focus on authentication, validation, and secret handling.
- Summarize risks, categorize by severity.
- Suggest minimal, safe fixes.
```
Codex will auto-route tasks to this agent when your request matches its description.
Design Best Practices
- Token-first design: minimize initialization size for frequent agents.
- Separation of concerns: keep roles specific (e.g., don't mix UX and security).
- Examples help: LLMs respond well to included examples of good/bad output.
- Nickname agents: use short handles (`S1`, `UX1`) for manual invocation efficiency (see the sketch after this list).
- Benchmark chainability: test how multiple agents compose in a pipeline.
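Combining two of these practices, here is a sketch of a doc-formatting agent with a short handle and an embedded good/bad pair (all names and examples are invented for illustration):

```yaml
---
name: d1                   # short handle for quick manual invocation
description: Tighten and format Markdown docs after edits.
tools: Read, Grep
model: gpt-4.5-codex
---
You tighten Markdown prose without changing its meaning.
- Good: "Run `npm test` before committing."
- Bad: "It is generally advisable that tests be run prior to committing."
Prefer active voice, concrete commands, and short sentences.
```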
Community & Collaboration
Like prompts and config files before them, custom agents are shareable assets. The OpenAI ecosystem (forums, GitHub repos, CodexLog) is beginning to collect and refine agent libraries. The more we document initialization costs, performance trade-offs, and surprising model/agent synergies, the faster this field evolves.
Agent Engineering isn't just a workflow improvement—it's a community movement toward portable AI expertise.
Sources
- OpenAI Codex CLI documentation: Configuration & Approvals
- OpenAI Blog: Introducing Codex (on `AGENTS.md` and scoping)
- Codex Cloud documentation: sandboxing, background tasks, performance notes