10 Lessons for Writing a Good AGENTS.md: Get Codex and Claude Code to Understand Your Project
Written by
Voxyz AI

One markdown file plus a famous name, 162K stars (andrej-karpathy-skills). I stared at that number for a while. The name did a lot of the lifting. The same file from an unknown account would mostly get ignored. But it caught fire because it hit a real need: everyone's handing their code to AI now, and the thing that trips everyone up is the same. The model can write. It just doesn't know your project's rules. This file was the first to turn "how do I keep the agent in line" into something you can copy. And the slice that caught on is narrow. It's about behavior, how the model should act while it codes: ask when unsure, make the smallest change, don't refactor for fun. This isn't a post about AGENTS.md formatting. It's 10 lessons from actually shipping with these tools: which counterintuitive moves work better, and which traps you only need to hit once. I run both Codex and Claude Code, and everything below works for both.
- Shorter Is Better. 200 Lines Is the Ceiling.
You'd think more information means the tool understands you better. The opposite is true: the more you write, the easier it is for the tool to miss the few lines that actually matter. The entry file loads in full at the start of every session, so every wasted line there pushes out something the tool actually needs. Two numbers are worth knowing. Codex has a project_doc_max_bytes limit, 32 KiB by default, and it applies to the combined instructions rather than to any single file. Codex concatenates files from the global one, then the project root, down to your current directory, and once the combined size hits the cap it stops adding more. Write too much up front and the rules closest to the task, which come last, can get squeezed out of context. Claude Code's official guidance is to keep CLAUDE.md under 200 lines. The file loads in full no matter how long it is, so a longer file means worse adherence. Working rule: keep the root file between 100 and 200 lines. Past that, split it into docs/ or subdirectories.
An engineer who's never seen your project should be able to answer, within 30 seconds of reading it: what is this, how do I run it, where does code go, how do I verify a change?
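To make the two limits concrete, here's a minimal Python sketch that checks a chain of instruction files against them. It's a simplified model of the Codex behavior described above, not Codex's actual code. It assumes files arrive in concatenation order (global first, nearest directory last) and that a file which would push the total past project_doc_max_bytes is dropped along with everything after it. The names apply_byte_cap and root_file_warnings are hypothetical.

```python
from dataclasses import dataclass

# 32 KiB is the default project_doc_max_bytes quoted above; 200 lines is
# Claude Code's suggested ceiling for the root file.
DEFAULT_MAX_BYTES = 32 * 1024
MAX_ROOT_LINES = 200


@dataclass
class Budget:
    included: list[str]  # files that made it into context
    dropped: list[str]   # files left out once the cap was reached
    used_bytes: int


def apply_byte_cap(chain: list[tuple[str, str]], max_bytes: int = DEFAULT_MAX_BYTES) -> Budget:
    """Walk (path, text) pairs in concatenation order (global first, nearest last).

    Simplified model: once the next file would push the total past the cap,
    that file and every file after it are dropped.
    """
    included: list[str] = []
    used = 0
    for i, (path, text) in enumerate(chain):
        size = len(text.encode("utf-8"))
        if used + size > max_bytes:
            # The files nearest the task come last, so they are the ones lost.
            return Budget(included, [p for p, _ in chain[i:]], used)
        included.append(path)
        used += size
    return Budget(included, [], used)


def root_file_warnings(text: str, max_lines: int = MAX_ROOT_LINES) -> list[str]:
    """Warn when the root file alone grows past the line or byte budget."""
    warnings = []
    lines = len(text.splitlines())
    if lines > max_lines:
        warnings.append(f"root file is {lines} lines; split it into docs/ past {max_lines}")
    if len(text.encode("utf-8")) > DEFAULT_MAX_BYTES:
        warnings.append("root file alone exceeds the default 32 KiB cap")
    return warnings
```

Under this model, a 10,000-byte global file, a 20,000-byte root file, and a 5,000-byte services/payments/AGENTS.md total 35,000 bytes, past the 32,768-byte cap, so the payments file, the one closest to the task, is the one that gets left out.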
- What Not to Pull In Matters as Much as What to Use
You list your stack and assume the tool won't go rogue. But it doesn't know your project's baggage. It'll helpfully reach for the "best" option it knows, and that option may collide with your migrations and conventions. Codex and Claude Code both run commands, edit files, and ask for permissions on their own. In Codex's default Auto preset, local work usually runs as workspace-write plus on-request approvals: reads, writes inside the workspace, and routine commands run freely, but writing outside the workspace or hitting the network asks first. A file that only lists your stack won't stop any of that. So the do-not list has two layers: what the tool must never introduce, and what it isn't allowed to decide alone.
A do-not list isn't a mood. It's a compressed record of decisions. Give each entry a Reason and a Revisit, and the tool knows why the rule exists and when it can be loosened. The Stop-and-ask column matters more. It isn't a ban. It says: this call isn't yours to make alone.
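One way to keep both layers honest is to store them as data a script can read. The sketch below is a hypothetical Python example, not a standard format. The package names, reasons, revisit conditions, and action labels are placeholders, and a real check would read the declared dependencies from your manifest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DoNotIntroduce:
    package: str
    reason: str   # why the decision was made
    revisit: str  # the condition under which it may loosen


# Hypothetical entries; replace them with your own project's decisions.
DO_NOT_INTRODUCE = [
    DoNotIntroduce("moment", "standardized on date-fns", "if date-fns lacks something we need"),
    DoNotIntroduce("axios", "all HTTP goes through the shared fetch wrapper", "if we need upload progress"),
]

# Second layer: not banned, but not the tool's call to make alone (also hypothetical).
STOP_AND_ASK = {"add_dependency", "edit_migration", "change_public_api", "touch_ci_config"}


def violations(declared: set[str], rules: list[DoNotIntroduce] = DO_NOT_INTRODUCE) -> list[str]:
    """One message per forbidden package found in the declared dependencies."""
    return [f"{r.package}: {r.reason} (revisit: {r.revisit})" for r in rules if r.package in declared]


def needs_sign_off(action: str) -> bool:
    """True when the action belongs to the Stop-and-ask layer."""
    return action in STOP_AND_ASK
```

Because the Reason and Revisit text travel with each failure message, whoever hits the rule (human or tool) sees why it exists, not just that it fired.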
- Write Rules the Tool Can Actually Check
"Write clean code" sounds like a good rule. To the tool it says nothing: it can't check "clean," "simple," or "performant." It can check "use named exports," "components under 200 lines," and "async/await instead of .then() chains."
Codex and Claude Code both run commands to check their own work, so "what done means" has to be written down too, as commands the tool can run and must pass.
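A definition of done can be as small as an ordered list of commands that must all exit 0. This Python sketch runs them in order and reports the first failure. The npm commands are hypothetical placeholders for your repo's own lint, typecheck, and test steps.

```python
import subprocess
import sys
from typing import Optional

# Hypothetical commands: substitute your repo's own lint / typecheck / test steps.
DONE_CHECKS: list[list[str]] = [
    ["npm", "run", "lint"],
    ["npm", "run", "typecheck"],
    ["npm", "test"],
]


def first_failure(checks: list[list[str]]) -> Optional[str]:
    """Run checks in order; return a short report for the first failure, or None."""
    for cmd in checks:
        try:
            result = subprocess.run(cmd, capture_output=True, text=True)
        except FileNotFoundError:
            return f"FAILED: {cmd[0]} not found"
        if result.returncode != 0:
            # Keep only the last few lines of output so the report stays readable.
            tail = (result.stdout + result.stderr).strip().splitlines()[-5:]
            return "\n".join([f"FAILED ({result.returncode}): {' '.join(cmd)}", *tail])
    return None


def run_done_checks(checks: list[list[str]] = DONE_CHECKS) -> int:
    """Print a verdict and return a process exit code: 0 means done."""
    report = first_failure(checks)
    if report:
        print(report, file=sys.stderr)
        return 1
    print("All done checks passed.")
    return 0
```

Saved as a script that ends with sys.exit(run_done_checks()), this gives both tools a single command whose exit code means done or not done.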
Quick test: after reading a rule, can you judge in five seconds whether a piece of code follows it? If yes, the rule's good. If not, rewrite it.
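The same five-second test works for a script. Here's a rough Python sketch that checks the three example rules from this lesson with line-based regexes. It's a heuristic, not a TypeScript parser, and the *.tsx file layout is an assumption.

```python
import re
from pathlib import Path

MAX_COMPONENT_LINES = 200
THEN_CALL = re.compile(r"\.then\s*\(")
DEFAULT_EXPORT = re.compile(r"^\s*export\s+default\b")


def check_source(name: str, text: str) -> list[str]:
    """Apply the three example rules to one file's text (line-based heuristics)."""
    lines = text.splitlines()
    problems = []
    if len(lines) > MAX_COMPONENT_LINES:
        problems.append(f"{name}: {len(lines)} lines (limit {MAX_COMPONENT_LINES})")
    for lineno, line in enumerate(lines, start=1):
        if DEFAULT_EXPORT.search(line):
            problems.append(f"{name}:{lineno}: use a named export instead of export default")
        if THEN_CALL.search(line):
            problems.append(f"{name}:{lineno}: use async/await instead of .then()")
    return problems


def check_tree(root: Path, pattern: str = "*.tsx") -> list[str]:
    """Run check_source over every matching file under root (assumed layout)."""
    problems = []
    for path in sorted(root.rglob(pattern)):
        problems.extend(check_source(str(path), path.read_text(encoding="utf-8")))
    return problems
```

If a rule is this easy to script, the tool can judge it just as quickly; a rule you can't script this way probably needs rewriting.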
- Write Down How It Should Behave, Not Just What the Project Is
You assume the thing to write down is project knowledge. But where the tool actually goes off the rails is its behavior, not its knowledge. It doesn't ask when it's unsure; it picks one reading and barrels ahead. It doesn't stop when it should; it "improves" the code next door while it's at it. That 162K-star repo from the intro took off for exactly this layer. It describes no specific project; it just turns a few failure modes Karpathy observed into behavior rules. Worth stealing into your own file.
Hand it a vague task and watch the first move. If it restates the goal or asks a question instead of charging in, this layer is doing its job.
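Behavior rules are prose, but the most common violation, "improving" the code next door, shows up in the diff afterwards. Here is a minimal Python sketch, assuming the task's allowed paths are known up front. The 300-line budget and the example paths are hypothetical, and this check isn't part of the karpathy-skills repo.

```python
def out_of_scope(numstat: str, allowed_prefixes: tuple[str, ...], max_changed_lines: int = 300) -> list[str]:
    """Flag changed files outside the task's paths and diffs over a line budget.

    `numstat` is the text printed by `git diff --numstat`: added, deleted, path,
    tab-separated, one file per line. Renamed files are matched on the raw path
    text only, so treat this as a rough check.
    """
    problems = []
    total = 0
    for row in numstat.strip().splitlines():
        added, deleted, path = row.split("\t", 2)
        if added != "-":  # numstat prints "-" for binary files
            total += int(added) + int(deleted)
        if not path.startswith(allowed_prefixes):
            problems.append(f"outside task scope: {path}")
    if total > max_changed_lines:
        problems.append(f"diff is {total} lines; budget is {max_changed_lines}")
    return problems
```

After a run, feed it subprocess.run(["git", "diff", "--numstat"], capture_output=True, text=True).stdout and review anything it flags before merging.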
- AGENTS.md Is a Router, Not a Library
The temptation is to cram every architecture doc into this one file. But its job isn't storage; it points the tool to where the information actually lives. A regular user's entry file is a knowledge dump. A power user's is a router.
I added a PLANS.md pointer to that router, and it's one of the highest-leverage moves I've made. AGENTS.md doesn't hold the plan itself. It carries one line: for anything complex, write a plan in .agent/PLANS.md, split it into phases, and wait for my sign-off. The template lives in the repo. OpenAI's cookbook has a whole piece on using PLANS.md to carry multi-hour, multi-step work.
A goal is just the top-level objective you hand it. Paired with the phases laid out in PLANS.md, that's what lets it carry a task that runs for hours. My setup is simple: I run these in an isolated worktree (a worktree is just a separate checkout of your code for this one task, so a blowup doesn't touch your main branch), drop a clear goal before bed, and check a string of commits and verification notes in the morning. The longest single run I've had was 36 hours, where it took a full architecture problem from start to finish, and it came out decent. I've seen people run 6-day ones; I just haven't hit a task that needs that long. The premise comes first: tests run, the sandbox caps what it can touch, every phase is reversible, and there are no production credentials or prod write access on the machine. Without that, it isn't automation. It's locking an intern on the production box overnight. It can run that long only because the goal is written tight and the phases are cut fine. The template I reach for looks like this:
Those four prompts are just lesson 4's behavior rules pressed into a single task. One more thing: don't write a big goal as one big plan. Break it into stages, one plan per phase, and run each plan through an adversarial pass to confirm it's coherent and buildable before you let it go. Then even after an overnight run, what you wake up to is a clean string of commits, not a pile to roll back.
The pointer mechanics differ slightly between the two tools. Codex reads referenced docs on demand, only when the task needs them. Claude Code's @ import pulls the whole file in at launch, so don't hang big docs off it; use skills or path-scoped .claude/rules/ instead. If the root file has no big blocks of doc text, only "when you need X, go read Y," you've got it right.
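A router-style root file has two easy failure modes: pointers that no longer resolve, and docs pasted in where a pointer should be. This Python sketch checks for both. It assumes pointers are written as backticked repo-relative paths, and the 15-line block threshold is an arbitrary, hypothetical cutoff.

```python
import re
from pathlib import Path

# Assumed convention: pointers are backticked repo-relative paths to docs.
POINTER = re.compile(r"`([\w./-]+\.(?:md|txt|toml|json|ya?ml))`")
MAX_BLOCK_LINES = 15  # hypothetical threshold for "a pasted doc, not a pointer"


def check_router_text(text: str, repo: Path) -> list[str]:
    """Return broken pointers and overlong prose blocks in an AGENTS.md body."""
    problems = [f"broken pointer: {t}" for t in POINTER.findall(text) if not (repo / t).exists()]
    run = 0  # consecutive non-blank lines
    for line in text.splitlines():
        run = run + 1 if line.strip() else 0
        if run == MAX_BLOCK_LINES + 1:  # report each long block once
            problems.append(f"a block runs past {MAX_BLOCK_LINES} lines; move it to docs/")
    return problems


def check_router(agents_md: Path) -> list[str]:
    """Check a root AGENTS.md, resolving pointers relative to its directory."""
    return check_router_text(agents_md.read_text(encoding="utf-8"), agents_md.parent)
```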
- Give Sensitive Directories Their Own Local File
That goal from the last lesson, the one you can leave running all night? You don't let go of it because you trust the model. You let go because of the few layers of guardrails that start here. Some modules carry ten times the risk of the rest, so give them their own file. Both tools walk from the project root down to your current directory, and the closer a file is to the task, the more it counts. But the two handle priority differently. Codex behaves more like an override: it takes at most one file per directory, nearer files land later in the combined prompt and win on conflicts, and in the same directory AGENTS.override.md takes the place of AGENTS.md. Claude Code is closer to plain concatenation: every CLAUDE.md gets stitched into context in order, later ones carry more weight, and if rules contradict, the model can still waver. Subdirectory CLAUDE.md files load on demand, and you can scope rules to paths in .claude/rules/. Drop a local file in each high-risk directory and you've put a railing around the danger zone.
A subdirectory file should carry only that directory's local risks, not a copy of the root.
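To see the Codex side concretely, here's a simplified Python model of the discovery order described above: from the project root down to the current directory, at most one file per directory, with AGENTS.override.md checked before AGENTS.md. It's an illustration, not Codex's implementation. Fallback filenames and the global ~/.codex file are left out, and treating an empty file as absent is an assumption.

```python
import tempfile
from pathlib import Path

CANDIDATES = ("AGENTS.override.md", "AGENTS.md")  # checked in this order per directory


def pick_instruction_files(project_root: Path, cwd: Path) -> list[Path]:
    """Simplified model: root to cwd, at most one non-empty file per directory."""
    project_root, cwd = project_root.resolve(), cwd.resolve()
    rel = cwd.relative_to(project_root)  # ValueError if cwd is outside the project
    dirs = [project_root]
    for part in rel.parts:
        dirs.append(dirs[-1] / part)
    picked = []
    for d in dirs:
        for name in CANDIDATES:
            f = d / name
            if f.is_file() and f.stat().st_size > 0:  # empty treated as absent (assumption)
                picked.append(f)
                break  # one file per directory
    return picked  # nearest file last, so it has the final word


def demo() -> list[str]:
    """Build a throwaway repo and list what a session in services/payments picks up."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp).resolve()
        payments = root / "services" / "payments"
        payments.mkdir(parents=True)
        (root / "AGENTS.md").write_text("repo-wide rules\n")
        (payments / "AGENTS.md").write_text("payments rules\n")
        (payments / "AGENTS.override.md").write_text("temporary payments override\n")
        return [p.relative_to(root).as_posix() for p in pick_instruction_files(root, payments)]
```

In the demo, services/ has no file of its own, and in services/payments the override shadows the regular AGENTS.md, so the result is the root AGENTS.md followed by the payments override.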
- Let the File State Intent; Let hooks / sandbox / rules Enforce It
You write a red line into the file and assume the tool holds it. It won't always remember, so don't keep red lines in the file alone. Anthropic says it plainly: to actually block an action regardless of what the model decides, use a PreToolUse hook; writing it into CLAUDE.md doesn't count. The two tools line up roughly, but how hard each layer enforces differs.
Below are two concrete Codex configs. Copy them if they help, skip them if not. The point stands either way. These layers enforce the red lines. Don't count on the model to remember them. Codex hooks load from ~/.codex/hooks.json, /.codex/hooks.json, or the [hooks] table in config.toml, and support lifecycle events like PreToolUse, PostToolUse, and Stop.
Codex also has rules, which control which commands can run outside the sandbox, with allow, prompt, and forbidden as the actions. They're a good replacement for "hope the model remembers not to run the dangerous thing." One heads-up: rules are still marked experimental, so don't treat them as a stable standard yet, but they're the best place right now for command-level policy.
On the Claude Code side, the PreToolUse hook plus permissions.deny and sandbox.enabled in managed settings are the harder enforcement layer. On Codex, keep one thing in mind: the PreToolUse hook can intercept Bash, apply_patch, and MCP calls, which is useful, but OpenAI itself calls it a guardrail, not a complete enforcement boundary, and not every command gets caught. The real hard edges still come from the sandbox, permission profiles, rules, CI, an isolated worktree, and withholding production credentials. What you put in the file is a "please remember." Intercept what you can with hooks, govern what you can with rules, isolate what you can with the sandbox, and stop trusting the model to comply. The more dangerous and irreversible the action, the further down these layers it belongs.
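For the Claude Code side, here's a minimal PreToolUse hook in Python. It relies on the documented hook contract: the event arrives as JSON on stdin with tool_name and tool_input, and exit code 2 blocks the call and shows stderr to the model. The blocked patterns are hypothetical examples. Like any pattern list, it's a guardrail rather than a boundary, since a differently spelled command can slip past a regex. Codex also supports PreToolUse hooks, but check its documentation for the exact input shape before reusing this there.

```python
import json
import re
import sys

# Hypothetical red lines; each pattern is paired with a label for the block message.
BLOCKED = [
    (re.compile(r"\brm\s+-(?:[a-zA-Z]*r[a-zA-Z]*f|[a-zA-Z]*f[a-zA-Z]*r)"), "recursive force delete"),
    (re.compile(r"\bgit\s+push\b.*--force\b"), "force push"),
    (re.compile(r"\bDROP\s+(?:TABLE|DATABASE)\b", re.IGNORECASE), "destructive SQL"),
]


def decide(event: dict) -> tuple[int, str]:
    """Return (exit_code, message): 0 lets the tool call through, 2 blocks it."""
    if event.get("tool_name") != "Bash":
        return 0, ""
    command = (event.get("tool_input") or {}).get("command", "")
    for pattern, label in BLOCKED:
        if pattern.search(command):
            return 2, f"Blocked by PreToolUse hook ({label}). Ask the user before running this."
    return 0, ""


def main() -> int:
    """Read one hook event from stdin, print any block reason to stderr, return the exit code."""
    code, message = decide(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    return code
```

To use it as a hook, end the file with if __name__ == "__main__": sys.exit(main()) and register it as a command hook under hooks.PreToolUse with a Bash matcher in .claude/settings.json.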
- Long-Term Memory Has to Be Auditable. Don't Leave It All to the Tool.
Every new session, the tool meets your project fresh, as if it had amnesia. You don't need a vector database to fix this, and you shouldn't hand memory entirely to the tool's built-in system either. The two tools' auto-memory features sit in opposite states. Codex's Memories is off by default and not yet available in some regions. Claude Code's auto memory is on by default (since v2.1.59): Claude writes what it learns into MEMORY.md, and the first 200 lines or 25KB of it load every session. But both vendors flag the same thing: mandatory team rules belong in the instruction file, in Git, and auto-memory is only a backup.
Set one bar for what's worth saving, and most of the junk never lands.
If your memory is auditable, deletable, and shows up in a git diff, it's healthy. Otherwise long-term memory slowly turns into long-term pollution.
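As a minimal sketch of auditable memory: a plain Markdown file in the repo, appended one dated line at a time, so every entry shows up in a git diff and deleting one is an ordinary commit. The file location and the bar itself (cites a source, isn't a credential, isn't a duplicate) are hypothetical, and the secret check is a rough regex, not a real scanner.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional

# Rough heuristic for "this looks like a credential"; not a real secret scanner.
SECRET_HINT = re.compile(r"(api[_-]?key|secret|password|token)\s*[:=]", re.IGNORECASE)


def worth_saving(lesson: str, source: str, existing: str) -> tuple[bool, str]:
    """The bar (hypothetical): cites a source, isn't a credential, isn't a duplicate."""
    lesson = lesson.strip()
    if not lesson:
        return False, "empty"
    if not source.strip():
        return False, "no source: record where this was learned"
    if SECRET_HINT.search(lesson):
        return False, "looks like a credential"
    if lesson in existing:
        return False, "already recorded"
    return True, "ok"


def remember(memory_file: Path, lesson: str, source: str, today: Optional[date] = None) -> str:
    """Append one dated, sourced line to a Git-tracked Markdown file if it clears the bar."""
    existing = memory_file.read_text(encoding="utf-8") if memory_file.exists() else ""
    ok, why = worth_saving(lesson, source, existing)
    if ok:
        stamp = (today or date.today()).isoformat()
        with memory_file.open("a", encoding="utf-8") as f:
            f.write(f"- {stamp} ({source.strip()}): {lesson.strip()}\n")
    return why
```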
- Keep the Three Kinds of Files Apart
Mix personal preferences, team conventions, and machine permissions into one file to save effort, and what you're really building is a drawer nobody dares clean out.
Personal preferences go global, shared across every project (~/.codex/AGENTS.md for Codex, ~/.claude/CLAUDE.md for Claude Code).
The file in the project skips personal taste and carries only this repo's conventions.
- One Source of Truth, Feeding Both Tools
You use both Codex and Claude Code, so the natural move is to write a file for each. But two files drift, and in two months nobody can say which one is right. Anthropic is explicit: Claude Code reads CLAUDE.md, not AGENTS.md. The fix is simple: make AGENTS.md the single source of truth and let CLAUDE.md hold one line, the import @AGENTS.md.
Claude Code pulls the whole imported file in at launch, then appends CLAUDE.md's own lines. If you don't need Claude-specific content, a symlink works too (ln -s AGENTS.md CLAUDE.md).
Don't maintain two full sets of rules: AGENTS.md is the source, and CLAUDE.md keeps just the import plus the rare Claude-only addition.
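A small drift check can keep CLAUDE.md from quietly becoming a second rule set. This Python sketch accepts either setup from this lesson: a symlink to AGENTS.md, or a regular file that starts with the @AGENTS.md import. Requiring the import on the first line and the 20-line allowance for Claude-only notes are assumptions.

```python
from pathlib import Path

IMPORT_LINE = "@AGENTS.md"
MAX_CLAUDE_ONLY_LINES = 20  # hypothetical: beyond this, CLAUDE.md is becoming a second rule set


def check_import_text(text: str) -> list[str]:
    """Check a regular-file CLAUDE.md: import first, then only a few Claude-only lines."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    problems = []
    if not lines or lines[0] != IMPORT_LINE:
        problems.append(f"CLAUDE.md should start with the import line {IMPORT_LINE}")
    if len(lines) - 1 > MAX_CLAUDE_ONLY_LINES:
        problems.append("CLAUDE.md is growing its own rule set; move shared rules to AGENTS.md")
    return problems


def check_single_source(repo: Path) -> list[str]:
    """Accept either setup: CLAUDE.md as a symlink to AGENTS.md, or a one-line import."""
    agents, claude = repo / "AGENTS.md", repo / "CLAUDE.md"
    if not agents.is_file():
        return ["AGENTS.md is missing; it should be the source of truth"]
    if claude.is_symlink():
        if claude.resolve() == agents.resolve():
            return []
        return ["CLAUDE.md is a symlink, but not to AGENTS.md"]
    if not claude.is_file():
        return ["CLAUDE.md is missing; Claude Code won't pick up AGENTS.md on its own"]
    return check_import_text(claude.read_text(encoding="utf-8"))
```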
A Skeleton You Can Copy Straight In
Don't want to think it through from scratch? Paste this into AGENTS.md and adjust. Ten lessons, compressed into one file:
Don't let this file carry the dangerous stuff alone: back it with hooks / rules / sandbox. On Claude Code, add a CLAUDE.md that imports it in one line, plus any Claude-only notes. One source, both tools read it.
Where to Start
- Run /init for a first draft: Codex produces AGENTS.md, and Claude Code produces CLAUDE.md (reading an existing AGENTS.md if there is one).
- Cut the root file to under 200 lines / 32 KiB, and move big docs into docs/ and PLANS.md.
- Add Do NOT introduce and Stop and ask, and hand dangerous actions to rules / hooks / sandbox instead of just writing them down.
- Make AGENTS.md the source of truth and import it into CLAUDE.md with a one-line import. Don't maintain two.
To Close
The entry file isn't write-once-and-forget. It should grow the way a test suite does: every time the tool gets something wrong. Each time the tool repeats a mistake, turn that mistake into a more specific rule. Each process you have to explain by hand, turn into a doc pointer, a hook, a rule, or a test command. The entry file isn't the agent's knowledge base. It's the agent's working contract. It answers the agent's four questions: where am I and how does the code run, how should I act when I'm unsure, how do I prove I'm done, and which calls aren't mine to make? AGENTS.md and CLAUDE.md answer the first three. The fourth goes to config, rules, hooks, sandbox, and CI.
In a month, Codex and Claude Code won't have gotten smarter. You'll just have turned your project's implicit knowledge into something they read, run, and verify before every job.
If this helped:
→ Repost it to someone whose AGENTS.md is already too stuffed to touch
→ Bookmark the skeleton above and copy it next time you write an AGENTS.md
Everything I'm writing as I build: voxyz.ai/insights.
References
- andrej-karpathy-skills (the 162K-star CLAUDE.md repo from the intro): github.com/multica-ai/andrej-karpathy-skills
- AGENTS.md (Codex official guide): developers.openai.com/codex/guides/agents-md
- Codex Best Practices: developers.openai.com/codex/learn/best-practices
- Codex Hooks: developers.openai.com/codex/hooks
- Codex Rules: developers.openai.com/codex/rules
- Codex Agent Approvals & Security (sandbox / approval): developers.openai.com/codex/agent-approvals-security
- Codex Memories: developers.openai.com/codex/memories
- PLANS.md (OpenAI Cookbook): developers.openai.com/cookbook/articles/codex_exec_plans
- Claude Code Memory (CLAUDE.md / AGENTS.md import / auto memory): code.claude.com/docs/en/memory
