Skillify

Summary

A practice introduced by Garry Tan in which every AI agent failure is turned into a permanent, tested skill. The core idea: “Every failure becomes a skill. Every skill has tests. Every eval runs daily.” Recurring errors are then prevented by structure (code, tests, and routing) rather than by the agent’s promise to do better.

Key Points

Definition: Turning ad-hoc fixes into durable, tested infrastructure
Trigger: When an agent makes a mistake that shouldn’t happen again
Process: 10-step checklist from failure to permanent skill
Outcome: Bugs become structurally impossible to repeat
Philosophy: The agent’s judgment improves permanently, not just for the current session
Verb usage: “Skillify it” is a single command that turns a prototype into a permanent skill
Contrast with typical AI: A typical agent apologizes, promises to do better, then repeats the error weeks later
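One way to picture “Skillify it” as a single command is a status check over the 10-step checklist: it lists the steps a skill still lacks, and the skill counts as permanent only when none remain. This is a minimal sketch under assumptions; the function names (`missingSteps`, `isSkillified`) are hypothetical, and the step labels are copied from the checklist in this note.

```typescript
// Hypothetical "skillify it" status check. Step labels are copied from the
// 10-step checklist in this note; the function names are illustrative only.
const STEPS = [
  "SKILL.md",
  "Deterministic code",
  "Unit tests",
  "Integration tests",
  "LLM evals",
  "Resolver trigger",
  "Resolver eval",
  "Check-resolvable + DRY audit",
  "E2E smoke test",
  "Brain filing rules",
] as const;

type Step = (typeof STEPS)[number];

// Steps the skill still lacks, in checklist order.
function missingSteps(done: ReadonlySet<Step>): Step[] {
  return STEPS.filter((step) => !done.has(step));
}

// A prototype counts as a permanent skill only when no step is missing.
function isSkillified(done: ReadonlySet<Step>): boolean {
  return missingSteps(done).length === 0;
}

// Example: a prototype with a contract, a script, and unit tests so far.
const draft = new Set<Step>(["SKILL.md", "Deterministic code", "Unit tests"]);
console.log(missingSteps(draft).length); // 7
console.log(isSkillified(draft)); // false
```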

The 10-Step Checklist

1. SKILL.md: the contract (name, triggers, rules)
2. Deterministic code: scripts/*.mjs (no LLM for anything code can do)
3. Unit tests: vitest tests for the deterministic functions
4. Integration tests: run against live endpoints and real data
5. LLM evals: quality and correctness, scored by an LLM-as-judge
6. Resolver trigger: an entry in the AGENTS.md routing table
7. Resolver eval: verifies that the trigger actually routes to the skill
8. Check-resolvable + DRY audit: finds unreachable skills and duplicates
9. E2E smoke test: verifies the full pipeline end to end
10. Brain filing rules: organization standards for the knowledge base
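The resolver steps of the checklist (resolver trigger, resolver eval, check-resolvable + DRY audit) can be sketched under one loud assumption: the AGENTS.md routing table is modeled here as a list of regex triggers mapped to skill names, which may not match its real format. `resolve`, `resolverEval`, `unreachableSkills`, and `overlappingRequests` are hypothetical names, the skill "weather-brief" is made up, and the overlap check is only a simple proxy for a DRY audit.

```typescript
// ASSUMPTION: the AGENTS.md routing table is modeled as regex triggers mapped
// to skill names. All function names below are hypothetical.
interface Route {
  trigger: RegExp; // no "g" flag, so test() keeps no state between calls
  skill: string;
}

const routes: Route[] = [
  { trigger: /\b(trip|visited|years ago)\b/i, skill: "calendar-recall" },
  { trigger: /\b(time|timezone|today|now)\b/i, skill: "context-now" },
];

// Resolver trigger: the first matching row decides which skill runs.
function resolve(request: string, table: Route[]): string | undefined {
  return table.find((r) => r.trigger.test(request))?.skill;
}

// Resolver eval: each case names the skill that should handle the request.
function resolverEval(cases: Array<[string, string]>, table: Route[]): string[] {
  const failures: string[] = [];
  for (const [request, expected] of cases) {
    const got = resolve(request, table);
    if (got !== expected) {
      failures.push(`"${request}" routed to ${got ?? "nothing"}, expected ${expected}`);
    }
  }
  return failures;
}

// Check-resolvable: a skill that no routing row points to is unreachable.
function unreachableSkills(skills: string[], table: Route[]): string[] {
  const routed = new Set(table.map((r) => r.skill));
  return skills.filter((s) => !routed.has(s));
}

// Simplified DRY proxy: a request matched by several rows means two skills
// claim the same job and may duplicate each other.
function overlappingRequests(requests: string[], table: Route[]): string[] {
  return requests.filter((q) => table.filter((r) => r.trigger.test(q)).length > 1);
}

// Example run.
const skills = ["calendar-recall", "context-now", "weather-brief"];
const cases: Array<[string, string]> = [
  ["Where did I go on my 2016 trip to Japan?", "calendar-recall"],
  ["What time is it in PT right now?", "context-now"],
];
const evalFailures = resolverEval(cases, routes); // []
const unreachable = unreachableSkills(skills, routes); // ["weather-brief"]
const overlaps = overlappingRequests(["What time was my trip?"], routes); // ["What time was my trip?"]
```

Leaving the `g` flag off the trigger regexes matters: with `g`, `RegExp.prototype.test` remembers `lastIndex` between calls, so routing would depend on call order.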

Examples from Garry’s Practice

Example 1: Calendar Recall

Failure: The agent searched live APIs for a 10-year-old trip instead of checking the local knowledge base
Skill: calendar-recall, with the rule “Historical events go through local knowledge base first”
Script: calendar-recall.mjs (a sub-millisecond grep instead of minutes of API calls)
Result: The old failure path becomes structurally unreachable
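The calendar-recall rule can be sketched as a router that sends historical queries to a local search first and only falls back to a live API when the knowledge base has nothing. Everything here is hypothetical: the `Note` shape, the 90-day cutoff for “historical”, the in-memory substring match standing in for grep, and the `liveSearch` callback standing in for real calendar APIs. The source's actual calendar-recall.mjs is not shown.

```typescript
// Hypothetical sketch of the calendar-recall rule: historical events are
// looked up in the local knowledge base before any live API is touched.
interface Note {
  date: string; // ISO date, e.g. "2016-05-02"
  text: string;
}

type LiveSearch = (query: string) => string[];

// ASSUMPTION: "historical" means more than 90 days before now.
const HISTORICAL_MS = 90 * 24 * 60 * 60 * 1000;

// In-memory stand-in for grep: case-insensitive substring match.
function grepNotes(notes: Note[], query: string): Note[] {
  const q = query.toLowerCase();
  return notes.filter((n) => n.text.toLowerCase().includes(q));
}

function calendarRecall(
  query: string,
  eventDate: Date,
  now: Date,
  notes: Note[],
  liveSearch: LiveSearch,
): { source: "local" | "live"; results: string[] } {
  const historical = now.getTime() - eventDate.getTime() > HISTORICAL_MS;
  if (historical) {
    const hits = grepNotes(notes, query);
    if (hits.length > 0) {
      // Local hit: the live API is never called.
      return { source: "local", results: hits.map((n) => `${n.date}: ${n.text}`) };
    }
  }
  // Recent events, or nothing found locally: fall back to the live search.
  return { source: "live", results: liveSearch(query) };
}

// Example with a counting stub in place of real calendar APIs.
let apiCalls = 0;
const stubLiveSearch: LiveSearch = (q) => {
  apiCalls += 1;
  return [`live result for ${q}`];
};
const notes: Note[] = [{ date: "2016-05-02", text: "Family trip to Kyoto" }];
const old = calendarRecall("kyoto", new Date("2016-05-02"), new Date("2026-04-22"), notes, stubLiveSearch);
console.log(old.source, apiCalls); // local 0
```

The design point is the early return: for a historical query the knowledge base can answer, the code path that calls the live API is never reached.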

Example 2: Timezone Math

Failure: The agent got a UTC→PT conversion wrong by one hour
Skill: context-now, with the rule “ALWAYS run context-now.mjs before time-sensitive claims”
Script: context-now.mjs (a precise 50 ms calculation instead of mental math)
Result: The deterministic tool constrains the model’s latent space, so the answer comes from computation rather than estimation
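A context-now-style conversion can lean on the runtime's IANA time zone data through the standard `Intl.DateTimeFormat` API instead of a hand-applied offset, which is where one-hour errors usually come from: daylight saving time moves Pacific time between UTC-8 and UTC-7. The helper name `toPacific` is hypothetical, and this is a sketch, not the source's context-now.mjs. It assumes a runtime with full ICU data (the default in current Node.js).

```typescript
// Hypothetical context-now-style helper: the runtime's IANA time zone data
// decides whether Pacific time is PST (UTC-8) or PDT (UTC-7).
function toPacific(utc: Date): string {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone: "America/Los_Angeles",
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
    hour: "2-digit",
    minute: "2-digit",
    hourCycle: "h23", // hours 00-23, so midnight never prints as 24
    timeZoneName: "short",
  }).formatToParts(utc);
  // Pull one field (year, hour, timeZoneName, ...) out of the formatted parts.
  const get = (type: string): string => parts.find((p) => p.type === type)?.value ?? "";
  return `${get("year")}-${get("month")}-${get("day")} ${get("hour")}:${get("minute")} ${get("timeZoneName")}`;
}

console.log(toPacific(new Date("2026-04-22T19:00:00Z"))); // 2026-04-22 12:00 PDT
console.log(toPacific(new Date("2026-01-15T19:00:00Z"))); // 2026-01-15 11:00 PST
```

On 2026-04-22 Pacific time is on PDT, so 19:00 UTC is 12:00 local; a fixed UTC-8 offset would give 11:00, the same kind of one-hour slip the skill is meant to rule out.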

Impact

Before: The agent apologizes, and the error recurs weeks later
After: The error becomes structurally impossible
Scale: Garry has 179 unit tests across 5 suites and 35+ LLM evals that run daily
Adoption: “Skillify it” became a verb in his daily workflow
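A daily LLM eval run reduces to a loop: run the skill on each case, ask a judge whether the output meets the case's rubric, and report the failures. In this sketch the judge is a plain substring check standing in for an LLM-as-judge call, the skill is a stub with a fixed answer, and all names (`runEvals`, `EvalCase`, `substringJudge`, `contextNowStub`) are hypothetical. The daily schedule would come from whatever job runner is already in use.

```typescript
// Hypothetical daily eval harness. The judge here is a substring check that
// stands in for an LLM-as-judge call; a real judge would be asynchronous.
interface EvalCase {
  name: string;
  input: string;
  rubric: string;
}

interface Verdict {
  pass: boolean;
  reason: string;
}

type SkillFn = (input: string) => string;
type Judge = (output: string, rubric: string) => Verdict;

// Run every case through the skill and let the judge grade each output.
function runEvals(skill: SkillFn, cases: EvalCase[], judge: Judge) {
  const failures: string[] = [];
  for (const c of cases) {
    const verdict = judge(skill(c.input), c.rubric);
    if (!verdict.pass) failures.push(`${c.name}: ${verdict.reason}`);
  }
  return { passed: cases.length - failures.length, total: cases.length, failures };
}

// Stand-in judge: passes when the output mentions the rubric text.
const substringJudge: Judge = (output, rubric) => ({
  pass: output.includes(rubric),
  reason: `expected output to mention "${rubric}"`,
});

// Stub skill with a fixed answer, so the example is deterministic.
const contextNowStub: SkillFn = () => "It is 12:00 PDT.";

const report = runEvals(
  contextNowStub,
  [
    { name: "names-zone", input: "What time is it?", rubric: "PDT" },
    { name: "names-date", input: "What time is it?", rubric: "2026" },
  ],
  substringJudge,
);
console.log(`${report.passed}/${report.total} passed`); // 1/2 passed
```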

Open Questions

How to balance skill creation overhead vs error prevention value?
When does skill proliferation become maintenance burden?
Can skills become too rigid for novel situations?

Evidence Timeline

2026-04-22: Introduced in the article “How to really stop your agents from making the same mistakes”, with the 10-step checklist and examples
