Summary

Coding agents are AI agents that autonomously write, modify, and test code. Their effectiveness depends heavily on the “harness”: the controls, guides, and sensors surrounding the model. Key examples: Claude Code, Cursor, GitHub Copilot agent mode.

Key Points

  • Agents need both feedforward (guides like CLAUDE.md, architecture docs) and feedback (tests, linters, code review) controls

  • Harnessability of a codebase matters: strong types, clear module boundaries, and framework abstractions make the codebase easier for agents to work in

  • Harness controls run in two execution modes: computational (deterministic) and inferential (AI-based, probabilistic)

  • Human developers still provide “implicit harness” — absorbed conventions, organizational alignment, experienced judgment

  • The role of developers shifts from writing code to engineering the harness and validating high-level decisions

  • OpenAI Codex case study: 3 engineers, 5 months, 1M lines, zero handwritten code — proof point for agent-first development at scale

  • Anthropic multi-agent harness: Generator-Evaluator separation (GAN-inspired) for long-running tasks; self-evaluation bias is a real problem, because agents overpraise their own work

  • Context anxiety: Claude Sonnet 4.5 wraps up work prematurely as context limits approach; context resets outperform compaction for long tasks

  • Harness simplification: As models improve (Opus 4.6), harness scaffolding can be reduced; sprint decomposition was removed once the model could stay coherent over longer tasks

  • Two failure modes for long tasks: Over-ambition (context exhaustion mid-feature) and premature completion (sees progress, declares done)

  • Session continuity via files: progress.txt + feature list JSON + init.sh = structured handoff between agent sessions

  • Meta-Harness: Automated harness search using a coding agent as optimizer — discovered harnesses outperform hand-engineered solutions and transfer across models (see Meta-Harness)

  • Six agentic patterns (Schluntz & Zhang): Augmented LLM → Prompt Chaining → Routing → Parallelization → Orchestrator-Workers → Evaluator-Optimizer — increasing complexity, start simple (see Agentic Patterns)

  • Tool design = prompt engineering: Anthropic spent more time on tool definitions than system prompts for SWE-bench agent

  • Tool use as meta-ability: Coding/scripting ability is an agent’s “meta-ability” — offloading deterministic logic (math, ETL, file ops) to reliable tools rather than relying on neural inference. Even non-coding agents benefit from Bash tools (see Tool Use as Meta-Ability)
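The file-based session handoff listed above (progress.txt plus a feature list JSON) can be sketched in a few lines. This is a minimal illustration, not the layout from the source article: the file name `features.json`, the `name`/`done` schema, and the helpers `save_handoff` and `load_handoff` are all hypothetical, and `init.sh` is not modelled here.

```python
import json
from pathlib import Path


def save_handoff(workdir: Path, features: list[dict], note: str) -> None:
    """Persist state so a fresh agent session can resume without prior context."""
    workdir.mkdir(parents=True, exist_ok=True)
    # Feature list as JSON; the {"name", "done"} schema is an assumption.
    (workdir / "features.json").write_text(json.dumps(features, indent=2))
    # Append-only progress log in plain text, one note per line.
    with (workdir / "progress.txt").open("a") as log:
        log.write(note.rstrip() + "\n")


def load_handoff(workdir: Path) -> dict:
    """Read the handoff files and pick the next unfinished feature."""
    features = json.loads((workdir / "features.json").read_text())
    progress = (workdir / "progress.txt").read_text().splitlines()
    pending = [f["name"] for f in features if not f["done"]]
    return {
        "next_feature": pending[0] if pending else None,
        "remaining": len(pending),
        "last_note": progress[-1] if progress else None,
    }
```

Under these assumptions, a new session would call `load_handoff` first, work on `next_feature` only, and treat the task as finished only when `remaining` is 0. That is one plausible way to guard against both failure modes named above: taking on too much at once, and declaring completion early.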

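The Generator-Evaluator separation in the Key Points can be sketched as a loop in which one component drafts and a different component judges, so the generator never grades its own work. This is a hedged sketch only: `generate_and_evaluate`, the callable signatures, and the stub functions are hypothetical stand-ins for model calls, not Anthropic's implementation.

```python
from typing import Callable


def generate_and_evaluate(
    generate: Callable[[str, str], str],
    evaluate: Callable[[str, str], tuple[bool, str]],
    task: str,
    max_rounds: int = 3,
) -> tuple[str, bool, int]:
    """Alternate generator and evaluator until the evaluator accepts or rounds run out."""
    feedback = ""
    draft = ""
    for round_no in range(1, max_rounds + 1):
        # The generator sees the task plus the evaluator's last feedback.
        draft = generate(task, feedback)
        # A separate evaluator decides acceptance; the generator does not self-assess.
        accepted, feedback = evaluate(task, draft)
        if accepted:
            return draft, True, round_no
    return draft, False, max_rounds


def stub_generator(task: str, feedback: str) -> str:
    """Hypothetical stand-in for a model call: revises only when given feedback."""
    return f"{task} [revised]" if feedback else task


def stub_evaluator(task: str, draft: str) -> tuple[bool, str]:
    """Hypothetical stand-in for a second model: accepts only revised drafts."""
    if draft.endswith("[revised]"):
        return True, ""
    return False, "needs revision"
```

With the stubs, `generate_and_evaluate(stub_generator, stub_evaluator, "add login")` is rejected in the first round and accepted in the second. In a real harness both callables would wrap model calls, with the evaluator given its own prompt and context.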
Open Questions

  • What’s the right level of autonomy for different types of tasks?
  • How to build reliable behaviour validation beyond test suites?
  • How do coding agents change team structures and skill requirements?

Evidence Timeline

  • 2026-04-10: “Claude Code from Source” book ingested — 18-chapter deep dive into Claude Code’s architecture (6 abstractions, tool execution pipeline, multi-agent orchestration, memory system, permission model). The most comprehensive technical analysis of a production AI coding agent to date.

  • 2026-04-07: Initial compilation from Böckeler’s harness engineering article

  • 2026-04-07: Updated with OpenAI Codex case study data (1M lines, zero handwritten code)

  • 2026-04-07: Updated with Anthropic’s multi-agent harness, context anxiety finding, and harness simplification insight

  • 2026-04-07: Updated with Justin Young’s session continuity patterns — two failure modes and file-based handoff

  • 2026-04-07: Added Meta-Harness — automated harness discovery outperforming manual engineering

  • 2026-04-07: Meta-Harness open-source library available: superagentic-metaharness (filesystem-first harness optimization)

  • 2026-04-07: Added six agentic patterns from Schluntz & Zhang (Anthropic, 2024-12-19) — composable building blocks from augmented LLM to evaluator-optimizer

  • 2026-04-07: Added tool use as meta-ability from rosa’s article — coding/scripting as foundational agent capability, Bash tools for deterministic offloading

  • 2026-04-07: LangChain Terminal Bench 2.0 evidence: same model (GPT-5.2-Codex) jumped Top 30 → Top 5 with harness-only changes (Chaofa Yuan)

  • 2026-04-07: OpenClaw architecture analysis (rosa): 7 core patterns shared by modern agent frameworks — Gateway, Agentic Loop (ReAct), Skills (on-demand loading), MCP (tool portability), Memory (markdown files + SQLite), Heartbeat (proactive cron-triggered behavior)

Related Pages

agentic-patterns, meta-harness, tool-use-as-meta-ability, justin-young, msitarzewski