Harness Engineering

Summary

Harness engineering is the practice of building systematic controls around AI coding agents — everything in an agent setup except the model itself. It uses feedforward controls (guides) to steer behavior before code generation, and feedback controls (sensors) to monitor and correct after generation.

Key Points

Agent = Model + Harness — harness is everything except the model itself
Guides (Feedforward): CLAUDE.md, prompts, architecture docs, coding standards — steer the agent before it acts
Sensors (Feedback): Tests, linters, type checkers, code review agents — validate after action
Both are needed: Feedback-only → repeated errors; feedforward-only → no validation
Computational vs Inferential: Deterministic checks (tests/lints) are fast and reliable; AI-based checks (code review agents) are richer but probabilistic
Three dimensions: Maintainability (code quality), Architecture Fitness (system properties), Behaviour (correctness) — behaviour is hardest
Timing matters: Shift checks left (pre-commit > pre-integration > pipeline > monitoring)
Harnessability: Strong types, clear module boundaries, framework abstractions make codebases more agent-tractable
Ashby’s Law applied: Constrain the solution space (e.g., predefined service topology) to make comprehensive harnesses feasible
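
The guide/sensor split above can be sketched as a small loop. This is a minimal illustration, not a reference implementation: the names `Harness`, `Sensor`, and `syntax_sensor`, and the idea of passing the model in as a plain callable, are all assumptions made for the example. It shows guides being placed in front of the task (feedforward) and sensor findings being fed into the next attempt (feedback).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# A sensor inspects the agent's output and returns a finding, or None if it passes.
# (Hypothetical type alias for this sketch.)
Sensor = Callable[[str], Optional[str]]


def syntax_sensor(code: str) -> Optional[str]:
    """Computational sensor: a deterministic check that the code parses."""
    try:
        compile(code, "<agent-output>", "exec")
    except SyntaxError as exc:
        return f"syntax error: {exc.msg} (line {exc.lineno})"
    return None


@dataclass
class Harness:
    guides: List[str]      # feedforward: steer the agent before it acts
    sensors: List[Sensor]  # feedback: validate after it acts
    max_attempts: int = 3

    def run(self, task: str, model: Callable[[str], str]) -> str:
        # Guides go in front of the task, so they shape the first attempt.
        prompt = "\n".join(self.guides + [task])
        for _ in range(self.max_attempts):
            output = model(prompt)
            findings = [f for f in (s(output) for s in self.sensors) if f is not None]
            if not findings:
                return output
            # Feed sensor findings back so the next attempt can correct them.
            prompt += "\nFix these issues:\n" + "\n".join(findings)
        raise RuntimeError("sensors still failing after max_attempts")
```

With an empty `sensors` list the loop accepts the first output unchecked (feedforward-only); with an empty `guides` list the agent only learns the rules by failing checks (feedback-only). An inferential sensor, such as a code review agent, would fit the same `Sensor` signature but return probabilistic findings.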

Engineering Hierarchy (Chaofa Yuan)

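Prompt Engineering → Context Engineering → Harness Engineering form a progression in which each stage extends the scope of the previous one:

Prompt Engineering: focuses on the wording of instructions
Context Engineering: manages the entire input window (what goes into the context)
Harness Engineering: controls the execution environment and system constraints (everything outside model weights)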

Transient vs Persistent Harness

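Not all harness designs have the same lifespan:

Transient: designs that compensate for current model limitations (e.g., forced self-verification, the reasoning sandwich); they may become obsolete as models improve
Persistent: architectural decisions driven by physical constraints (persistent storage, sandboxing, version control); they are independent of model capability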
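
One way to make this distinction actionable is to tag each harness component with its expected lifespan, so the transient ones can be revisited whenever the model is upgraded. The sketch below is only an illustration: `Lifespan`, `Component`, and `review_candidates` are hypothetical names, and the component list simply restates the examples given above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Lifespan(Enum):
    TRANSIENT = "transient"    # compensates for a current model limitation
    PERSISTENT = "persistent"  # driven by physical constraints


@dataclass(frozen=True)
class Component:
    name: str
    lifespan: Lifespan


# Examples taken from the two categories above.
COMPONENTS = [
    Component("forced_self_verification", Lifespan.TRANSIENT),
    Component("reasoning_sandwich", Lifespan.TRANSIENT),
    Component("persistent_storage", Lifespan.PERSISTENT),
    Component("sandbox", Lifespan.PERSISTENT),
    Component("version_control", Lifespan.PERSISTENT),
]


def review_candidates(components: List[Component]) -> List[str]:
    """Return the components worth re-evaluating after a model upgrade."""
    return [c.name for c in components if c.lifespan is Lifespan.TRANSIENT]
```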

Harness-Model Co-evolution

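Harness execution traces become training data → the model improves → the harness can be simplified → new traces are produced → the cycle repeats as continuous co-evolution. This is consistent with Anthropic's observation that "the stronger the model, the simpler the harness" (Opus 4.6 removed sprint decomposition).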

Practical Implications

This directly relates to how we set up this knowledge base:

CLAUDE.md = a guide (feedforward control)
The maintain/lint workflow = a sensor (feedback control)
The schema constrains the solution space (Ashby’s Law)
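
As a rough sketch of what the lint sensor could look like for this knowledge base, the check below reports pages that are missing required sections. The knowledge base's real schema is not described here, so `REQUIRED_SECTIONS` is an assumption modelled on the headings of this page, and `lint_page` is a hypothetical helper rather than the actual maintain/lint workflow.

```python
from typing import List

# Assumed schema: section headings every page must contain.
REQUIRED_SECTIONS = ["Summary", "Key Points", "Evidence Timeline"]


def lint_page(text: str) -> List[str]:
    """Return one finding per required section heading missing from the page."""
    # Headings in this wiki are bare lines, so compare against whole stripped lines.
    lines = {line.strip() for line in text.splitlines()}
    return [
        f"missing section: {name}"
        for name in REQUIRED_SECTIONS
        if name not in lines
    ]
```

Because the schema limits pages to a known set of sections, a check this small can cover the whole structural space. That is the Ashby's Law point in miniature: the narrower the allowed variety, the simpler the sensor that regulates it.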

Open Questions

How to measure harness coverage and quality?
How to resolve conflicts between contradictory guidance signals?
What does a good behaviour harness look like beyond tests?
Can automated harness search (Meta-Harness) work for open-ended tasks where evaluation is harder?

Evidence Timeline

2026-04-07: Compiled from Birgitta Böckeler’s article on martinfowler.com (published 2026-04-02)
2026-04-07: Added Chaofa Yuan’s engineering hierarchy (prompt → context → harness), transient vs persistent harness distinction, harness-model co-evolution
2026-06-08: Split case studies, frameworks, and sandbox sections into harness-engineering-case-studies
