Summary

Engineer at Anthropic Labs. Authored “Harness Design for Long-Running Application Development” (2026-03-24), demonstrating GAN-inspired generator-evaluator separation for complex frontend tasks.

Key Points

  • Proposed three-agent architecture: Planner → Generator → Evaluator with structured file communication
  • Identified self-evaluation bias as a core problem — agents overpraise their own output
  • Demonstrated GAN-style feedback loop: evaluator biased toward skepticism via iterative prompt tuning
  • Showed cost-quality tradeoff: Retro Game Maker — 200/6h (polished), 20x cost for qualitative leap
  • Key insight: as model capabilities improve (Opus 4.6), harness scaffolding can be simplified — removed sprint decomposition

Open Questions

None currently


Evidence Timeline

  • 2026-04-07: Created from “Harness Design for Long-Running Application Development” article (published 2026-03-24)

相关页面

barry-zhang, erik-schluntz, rosa