Summary
A practice introduced by Garry Tan in which every AI agent failure is turned into a permanent, tested skill. The core idea: “Every failure becomes a skill. Every skill has tests. Every eval runs daily.” Recurring errors are then prevented structurally, by code and tests, rather than by the agent promising to do better.
Key Points
- Definition: Turning ad-hoc fixes into durable, tested infrastructure
- Trigger: When an agent makes a mistake that shouldn’t happen again
- Process: 10-step checklist from failure to permanent skill
- Outcome: Bugs become structurally impossible to repeat
- Philosophy: The agent’s judgment improves permanently, not just for the current session
- Verb usage: “Skillify it” - one command to make a prototype permanent
- Contrast with typical AI: Normal AI apologizes, promises to do better, then repeats the error weeks later
The 10-Step Checklist
- SKILL.md — The contract (name, triggers, rules)
- Deterministic code — scripts/*.mjs (no LLM for what code can do)
- Unit tests — vitest for deterministic functions
- Integration tests — Live endpoints and real data
- LLM evals — Quality + correctness with LLM-as-judge
- Resolver trigger — Entry in AGENTS.md routing table
- Resolver eval — Verify the trigger actually routes correctly
- Check-resolvable + DRY audit — Find unreachable skills and duplicates
- E2E smoke test — Full pipeline verification
- Brain filing rules — Knowledge base organization standards
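The resolver trigger, resolver eval, and check-resolvable + DRY audit items above can be illustrated with a minimal sketch. The source does not show the actual AGENTS.md format, so the routing table shape, the trigger phrases, and the function names `resolve`, `findUnreachable`, and `findDuplicateTriggers` are all hypothetical.

```javascript
// Hypothetical routing table. In practice this would be derived from AGENTS.md;
// the entry shape { skill, triggers } is an assumption for illustration.
const routingTable = [
  { skill: "calendar-recall", triggers: ["trip", "years ago", "calendar history"] },
  { skill: "context-now", triggers: ["what time", "timezone", "deadline"] },
];

// Resolver: return the first skill whose trigger phrase appears in the request.
function resolve(request, table) {
  const text = request.toLowerCase();
  const hit = table.find((entry) =>
    entry.triggers.some((trigger) => text.includes(trigger.toLowerCase()))
  );
  return hit ? hit.skill : null;
}

// Check-resolvable: installed skills that no routing entry can ever reach.
function findUnreachable(installedSkills, table) {
  const routed = new Set(
    table.filter((entry) => entry.triggers.length > 0).map((entry) => entry.skill)
  );
  return installedSkills.filter((name) => !routed.has(name));
}

// DRY audit: trigger phrases claimed by more than one skill.
function findDuplicateTriggers(table) {
  const owners = new Map();
  for (const { skill, triggers } of table) {
    for (const trigger of triggers) {
      const key = trigger.toLowerCase();
      owners.set(key, [...(owners.get(key) ?? []), skill]);
    }
  }
  return [...owners].filter(([, skills]) => skills.length > 1);
}

// Resolver eval: each case pairs a request with the skill it should route to.
const resolverCases = [
  { request: "Where did I stay on that trip to Kyoto?", expected: "calendar-recall" },
  { request: "What time is it in PT right now?", expected: "context-now" },
];
const failures = resolverCases.filter(
  (c) => resolve(c.request, routingTable) !== c.expected
);

console.log("resolver eval failures:", failures.length);
console.log(
  "unreachable skills:",
  findUnreachable(["calendar-recall", "context-now", "orphan-skill"], routingTable)
);
```

Substring matching keeps the sketch deterministic and easy to test. A real resolver may route on richer signals, in which case the eval cases matter even more.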
Examples from Garry’s Practice
Example 1: Calendar Recall
- Failure: Agent searched live APIs for a 10-year-old trip instead of the local knowledge base
- Skill: calendar-recall with rule: “Historical events go through local knowledge base first”
- Script: calendar-recall.mjs (sub-millisecond grep vs minutes of API calls)
- Result: Old failure path becomes structurally unreachable
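A minimal sketch of the local-first rule follows. The source does not include the contents of calendar-recall.mjs, so this is only an illustration: the in-memory `knowledgeBase` array, its sample entries, and the `recall` and `searchLive` names are hypothetical stand-ins for the real knowledge base files and API client.

```javascript
// Hypothetical stand-in for the local knowledge base (the real one would be files on disk).
const knowledgeBase = [
  { date: "2015-06-14", text: "Flew to Lisbon for a conference" },
  { date: "2023-11-02", text: "Team offsite in Austin" },
];

// Local-first recall: search the knowledge base, and call the live API
// only when nothing local matches. `searchLive` is injected so the
// fallback path can be observed (or forbidden) in tests.
function recall(query, searchLive) {
  const needle = query.toLowerCase();
  const localHits = knowledgeBase.filter((entry) =>
    entry.text.toLowerCase().includes(needle)
  );
  if (localHits.length > 0) {
    return { source: "local", hits: localHits };
  }
  return { source: "live", hits: searchLive(query) };
}

// A local hit never touches the live API, so this throwing stub is never invoked.
const result = recall("Lisbon", () => {
  throw new Error("live API must not be called for a local hit");
});
console.log(result.source, result.hits[0].date);
```

Passing the live lookup in as a function makes the rule testable: a unit test can supply a stub that throws, which proves the old failure path is not taken when local data exists.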
Example 2: Timezone Math
- Failure: Agent miscalculated UTC→PT conversion by 1 hour
- Skill: context-now with rule: “ALWAYS run context-now.mjs before time-sensitive claims”
- Script: context-now.mjs (50ms precise calculation vs mental math)
- Result: A deterministic tool does the arithmetic, so the answer no longer depends on the model’s latent reasoning
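The source does not show what context-now.mjs contains. As a hedged sketch of the idea, the conversion can be delegated to the runtime's time zone database through the standard `Intl.DateTimeFormat` API instead of subtracting a fixed offset. The function name `contextNow`, its return shape, and the choice of `America/Los_Angeles` for "PT" are assumptions. A one-hour error is what a fixed offset typically produces across daylight saving changes, though the source does not say that was the cause here.

```javascript
// Format an instant in a named IANA time zone using the runtime's tz data,
// so daylight saving offsets are never computed by hand.
function contextNow(instant = new Date(), timeZone = "America/Los_Angeles") {
  const formatter = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
    hour: "2-digit",
    minute: "2-digit",
    hourCycle: "h23", // 00-23, avoids AM/PM ambiguity
    timeZoneName: "short",
  });
  // Collapse the formatted parts into a lookup by part type.
  const parts = Object.fromEntries(
    formatter.formatToParts(instant).map((part) => [part.type, part.value])
  );
  return {
    utc: instant.toISOString(),
    local: `${parts.year}-${parts.month}-${parts.day} ${parts.hour}:${parts.minute}`,
    zone: parts.timeZoneName,
  };
}

// The same UTC clock time maps to different Pacific times in summer and winter.
const summer = contextNow(new Date("2024-07-01T19:30:00Z"));
const winter = contextNow(new Date("2024-01-15T19:30:00Z"));
console.log(summer.local); // 2024-07-01 12:30
console.log(winter.local); // 2024-01-15 11:30
```

The agent then quotes the tool's output verbatim in any time-sensitive claim rather than doing the conversion itself.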
Impact
- Before: Agent apologizes, error recurs weeks later
- After: Error becomes structurally impossible
- Scale: Garry has 179 unit tests across 5 suites, 35+ daily LLM evals
- Adoption: “Skillify it” became a verb in daily workflow
Open Questions
- How to balance skill creation overhead vs error prevention value?
- When does skill proliferation become maintenance burden?
- Can skills become too rigid for novel situations?
Evidence Timeline
- 2026-04-22: Introduced in “How to really stop your agents from making the same mistakes” article with 10-step checklist and examples
Related Pages
diarization, latent-vs-deterministic, resolver, curator, self-evolving-harness
- compound-engineering: the compound step in CE achieves the same end as skillify’s “every failure becomes a permanent fix”, by a different route