Stop Using /init for AGENTS.md

Author: Addy Osmani | 2026-02-23

TL;DR: A good mental model is to treat AGENTS.md as a living list of codebase smells you haven’t fixed yet, not a permanent configuration. Auto-generated AGENTS.md files hurt agent performance and inflate costs by 20%+ because they duplicate what agents can already discover. Human-written files help only when they contain non-discoverable information - tooling gotchas, non-obvious conventions, landmines. Every other line is noise.

What the research actually says

Lulla et al. (ICSE JAWs 2026) ran a paired experiment on 124 real GitHub pull requests, with and without AGENTS.md. They found that AGENTS.md reduced median wall-clock runtime by 28.64% and output token consumption by 16.58%.

A separate study from ETH Zurich tested four agents across SWE-bench. Finding: LLM-generated context files reduced task success by 2-3% while increasing cost by over 20%. Developer-written files improved success by about 4% - but also increased cost by up to 19%.

Key insight from ETH Zurich: when they stripped all documentation from repos and then tested with LLM-generated context files, those files improved performance by 2.7%. The auto-generated content isn’t useless. It’s redundant. The agent could find all of it anyway by reading the repo.

The Lulla paper used human-authored AGENTS.md files with real project-specific knowledge. Non-obvious tooling requirements. Actual gotchas. That’s the context that saves the agent time.

The pink elephant problem

If your AGENTS.md mentions tRPC - even as a passing note - the model has tRPC in context for every prompt. If tRPC is only used in legacy endpoints, you’ve biased your agent toward the wrong pattern. LLMs don’t distinguish between “this is what we used to do” and “this is what you should do.”

Research on LLM context shows: more context often degrades performance. Liu et al.’s “Lost in the Middle” (2024) showed LLMs struggle with information in the middle of long contexts. Levy et al. showed longer context degrades task performance even when content is perfectly relevant.

What actually earns a line

When a developer-written context file mentioned uv, agents used it 1.6 times per task. When not mentioned: fewer than 0.01 times. The practical filter: can the agent discover this on its own by reading your code? If yes, delete it.

AGENTS.md should contain:

Use uv for package management
Always run tests with --no-cache or you’ll get false positives from fixture setup
The auth module uses a custom middleware pattern; do not refactor to standard Express middleware
The legacy/ directory is deprecated but imported by three production modules - don’t delete anything

And almost nothing else.
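The discoverability filter can be roughly approximated in code. The sketch below is a crude heuristic, not a method from either study: it assumes the key terms in each AGENTS.md line are wrapped in backticks, and it treats a line as discoverable when every such term already appears somewhere in the repo text. The function name and the sample inputs are hypothetical.

```python
import re


def discoverable_lines(agents_md: str, repo_text: str) -> list[str]:
    """Return AGENTS.md lines whose backticked terms all appear in repo_text."""
    haystack = repo_text.lower()
    flagged = []
    for line in agents_md.splitlines():
        # Assumption: the terms worth checking are the ones in backticks.
        terms = re.findall(r"`([^`]+)`", line)
        if terms and all(term.lower() in haystack for term in terms):
            flagged.append(line)
    return flagged


# Hypothetical inputs for illustration only.
AGENTS_MD = "Use `uv` for package management\nTests live in `tests/`"
REPO_TEXT = "README: run pytest against tests/ before opening a pull request"

candidates_for_deletion = discoverable_lines(AGENTS_MD, REPO_TEXT)
```

Here the line about `tests/` is flagged because the README already mentions it, while the `uv` line survives because nothing in the repo text reveals it. Substring matching will produce false positives on short terms, so the output is a list of candidates to review, not an automatic delete list.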

The static file problem

A flat instruction set can’t condition on what kind of task is being run. An agent doing a documentation change faithfully runs the full test suite. Tokens burned, minutes wasted.

The ACE framework (Agentic Context Engineering, ICLR 2026) treats context as an evolving playbook maintained through a generator/reflector/curator pipeline. It outperformed static approaches by 12.3%.

The better architecture: 3-layer AGENTS.md

Layer 1: Protocol file — Not a codebase overview. A routing document. Available personas and when to invoke them. Available skills and task classes. Available MCP connections. Minimum essential repo facts the agent genuinely cannot discover.

Layer 2: Focused persona/skill files — Each loaded selectively based on task type. UX-focused agent loads different context than backend agent. Total context per task stays bounded.

Layer 3: Maintenance subagent — Keeps the protocol file accurate as the codebase evolves. Documentation rots.
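A minimal sketch of how Layers 1 and 2 might fit together, assuming the protocol file can be reduced to a table of skills with trigger keywords. The `Skill` type, the `PROTOCOL` table, the keywords, and the `agents/*.md` paths are all hypothetical, and keyword matching stands in for whatever task classification a real setup would use.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    keywords: tuple[str, ...]
    context_file: str  # hypothetical path to a Layer 2 file


# Layer 1: a routing table, not a codebase overview.
PROTOCOL = (
    Skill("docs", ("readme", "docs", "typo"), "agents/docs.md"),
    Skill("backend", ("api", "endpoint", "database"), "agents/backend.md"),
    Skill("ux", ("component", "layout", "css"), "agents/ux.md"),
)


def select_context(task: str, protocol=PROTOCOL, limit: int = 2) -> list[str]:
    """Pick at most `limit` Layer 2 files whose keywords match the task."""
    words = set(re.findall(r"[a-z]+", task.lower()))
    scored = [(sum(k in words for k in s.keywords), s) for s in protocol]
    hits = sorted((pair for pair in scored if pair[0] > 0), key=lambda pair: -pair[0])
    return [skill.context_file for _, skill in hits[:limit]]
```

With this table, `select_context("Fix typo in README")` loads only the docs file, and a task that matches nothing loads no extra context at all. The `limit` parameter is what keeps total context per task bounded.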

Automated optimization

Arize AI’s prompt learning work replaced hand-written CLAUDE.md instructions with an automated optimization loop: run the agent on training tasks, evaluate the output, generate LLM feedback on why solutions failed, and use meta-prompting to refine the instructions. Results: +5.19% accuracy on the cross-repo split, +10.87% on the in-repo split.

What helps a human understand a codebase and what helps an LLM navigate it are often different things. The optimizer figures out the delta.
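The shape of such a loop can be sketched as follows. This is not Arize AI's implementation: the four callables (`run_agent`, `passed`, `critique`, `refine`) are hypothetical placeholders for the agent run, the evaluator, the LLM feedback step, and the meta-prompting step, and the toy stand-ins at the bottom exist only so the loop can be exercised without a model.

```python
from typing import Callable


def optimize_instructions(
    instructions: str,
    tasks: list[str],
    run_agent: Callable[[str, str], str],
    passed: Callable[[str, str], bool],
    critique: Callable[[str, str], str],
    refine: Callable[[str, list[str]], str],
    rounds: int = 3,
) -> str:
    """Refine instructions until every training task passes or rounds run out."""
    for _ in range(rounds):
        feedback = []
        for task in tasks:
            output = run_agent(instructions, task)
            if not passed(task, output):
                feedback.append(critique(task, output))
        if not feedback:
            break  # nothing failed, so there is nothing to learn from
        instructions = refine(instructions, feedback)
    return instructions


# Toy stand-ins (assumptions, no real agent or LLM involved).
def toy_agent(instructions: str, task: str) -> str:
    return "uv" if "uv" in instructions.lower() else "pip"


def toy_refine(instructions: str, feedback: list[str]) -> str:
    unique = list(dict.fromkeys(feedback))  # drop duplicate feedback lines
    return "\n".join(filter(None, [instructions, *unique]))


result = optimize_instructions(
    "",
    ["install deps"],
    run_agent=toy_agent,
    passed=lambda task, output: output == "uv",
    critique=lambda task, output: "Use uv for package management",
    refine=toy_refine,
)
```

Starting from an empty file, the toy run fails once, adds the single instruction that fixes the failure, then stops. Only failures generate new lines, which is why the loop tends to produce short files.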

AGENTS.md as diagnostic tool

Think of AGENTS.md as a living document of friction you haven’t fixed yet. Every line signals something confusing enough to trip an AI agent — probably confusing to new human contributors too. The right response isn’t to grow the context file. It’s to fix the actual problem.

Start nearly empty with one instruction: “If you encounter something surprising or confusing, flag it as a comment.” Fix the underlying issues. Keep the file minimal.

Practical takeaways

Stop running /init. Auto-generated output is redundant with existing documentation.
Before adding any line: can the agent find this by reading the code? If yes, don’t write it.
When an agent struggles repeatedly, treat it as a codebase problem before a context problem.
If running agents at scale in CI/CD, the cost overhead from context files (up to 19% for developer-written files, over 20% for LLM-generated ones in the ETH Zurich results) compounds across thousands of runs.
Consider building a maintenance agent for keeping context files accurate.
Hold your intuitions about what the agent needs loosely.
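To get a feel for how the overhead compounds at scale, here is a back-of-the-envelope calculation. The run count and base cost per run are made-up numbers for illustration; only the 20% overhead figure comes from the results cited above.

```python
def added_cost(runs: int, base_cost_per_run: float, overhead: float) -> float:
    """Extra spend attributable to the context file across all runs."""
    return runs * base_cost_per_run * overhead


# Assumptions: 10,000 CI runs per month at $0.50 each without a context file.
monthly_extra = added_cost(runs=10_000, base_cost_per_run=0.50, overhead=0.20)
print(f"Extra spend per month: ${monthly_extra:,.2f}")
```

Under these assumed inputs the context file adds about $1,000 a month, money spent on having the agent re-read what it could have discovered on its own.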

Coding agents aren’t new hires. They can grep the entire codebase before you finish typing. What they need isn’t a map. They need to know where the landmines are.
