Building Effective Agents
Published: 2024-12-19
Authors: Erik Schluntz, Barry Zhang (Anthropic)
Overview
Anthropic shares insights from working with dozens of teams building LLM agents across industries. The key finding: the most successful implementations use simple, composable patterns rather than complex frameworks.
Key Distinction: Workflows vs Agents
- Workflows: LLMs and tools orchestrated through predefined code paths
- Agents: Systems where LLMs dynamically direct their own processes and tool usage
When to Use Agents
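Start with the simplest solution possible and add complexity only when it measurably improves results; sometimes that means not building an agentic system at all. Agentic systems trade increased latency and cost for better task performance, so the tradeoff has to be worth it. Workflows offer predictability for well-defined tasks, while agents are the better choice when flexibility and model-driven decision-making are needed at scale.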
Six Core Agentic Patterns
1. Augmented LLM
An LLM enhanced with retrieval, tools, and memory. This is the basic building block: a model with access to external capabilities.
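A minimal sketch of this building block, under loud assumptions: `LLM` is a hypothetical "prompt in, text out" callable, retrieval is naive word overlap, and `AugmentedLLM`, `echo_llm`, and the sample documents are invented for illustration rather than taken from the article.

```python
from dataclasses import dataclass, field
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: prompt in, completion out


@dataclass
class AugmentedLLM:
    """An LLM wrapped with retrieval, tools, and memory (illustrative only)."""
    llm: LLM
    documents: list[str]
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)
    memory: list[str] = field(default_factory=list)

    def retrieve(self, query: str) -> list[str]:
        # Toy retrieval: keep documents sharing at least one word with the query.
        words = set(query.lower().split())
        return [d for d in self.documents if words & set(d.lower().split())]

    def ask(self, query: str) -> str:
        # Assemble tool names, past turns, and retrieved context into one prompt.
        context = "\n".join(self.retrieve(query))
        history = "\n".join(self.memory)
        prompt = (
            f"Tools: {sorted(self.tools)}\n"
            f"History:\n{history}\n"
            f"Context:\n{context}\n"
            f"Question: {query}"
        )
        answer = self.llm(prompt)
        self.memory.append(f"Q: {query} A: {answer}")  # remember the turn
        return answer


def echo_llm(prompt: str) -> str:
    # Stand-in model so the sketch runs offline: returns the prompt's last line.
    return prompt.splitlines()[-1]


assistant = AugmentedLLM(
    llm=echo_llm,
    documents=["refund policy: 30 days", "shipping takes 5 days"],
)
```

In a real system the stub would be replaced by a model call and the word-overlap lookup by a proper retriever; the point is only that the model sees tools, history, and retrieved context together.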
2. Prompt Chaining
Decomposing tasks into sequential steps with programmatic checks between them. Each step processes the output of the previous one. Good for tasks that can be cleanly decomposed into fixed subtasks.
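The chain-with-checks idea can be sketched as follows. This is an illustration, not the authors' implementation: `LLM` is a hypothetical "prompt in, text out" callable, `run_chain` and `fake_llm` are invented names, and each "programmatic check" is modeled as a plain Python predicate.

```python
from typing import Callable, Optional

LLM = Callable[[str], str]    # hypothetical: prompt in, completion out
Gate = Callable[[str], bool]  # programmatic check on a step's output


def run_chain(llm: LLM, steps: list[tuple[str, Optional[Gate]]], task: str) -> str:
    """Feed each step's output into the next; stop if a gate rejects an output."""
    current = task
    for template, gate in steps:
        current = llm(template.format(input=current))
        if gate is not None and not gate(current):
            raise ValueError(f"Gate failed after step: {template!r}")
    return current


def fake_llm(prompt: str) -> str:
    # Stand-in model so the sketch runs offline: wraps the prompt in brackets.
    return f"[{prompt}]"


steps = [
    ("Outline: {input}", lambda out: len(out) < 200),  # gate: outline stays short
    ("Draft from: {input}", None),                     # no check on the last step
]
result = run_chain(fake_llm, steps, "agents")
```

Raising on a failed gate is one design choice among several; retrying the step or falling back to a simpler path would fit the same structure.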
3. Routing
Classifying inputs and directing them to specialized follow-up tasks. A single LLM decides which path to take, then routes to specialized handlers.
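A minimal routing sketch, assuming a hypothetical classifier callable that returns a bare label. The names `route`, `keyword_classifier`, and the two handlers are invented for illustration; a keyword match stands in for the classifying LLM call.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: prompt in, completion out


def route(
    classifier: LLM,
    handlers: dict[str, Callable[[str], str]],
    query: str,
    default: str,
) -> str:
    """Classify the query, then dispatch it to the matching specialized handler."""
    label = classifier(
        f"Classify the request as one of {sorted(handlers)}. "
        f"Reply with the label only.\n{query}"
    ).strip().lower()
    # Unknown labels fall back to the default handler instead of failing.
    handler = handlers.get(label, handlers[default])
    return handler(query)


def keyword_classifier(prompt: str) -> str:
    # Stand-in for the classifying LLM: inspects only the request line.
    request = prompt.splitlines()[-1].lower()
    return "refund" if "refund" in request else "general"


handlers = {
    "refund": lambda q: f"refund-handler: {q}",
    "general": lambda q: f"general-handler: {q}",
}
```

Each handler could carry its own prompt, tools, or even a different model, which is what makes the classification step worth its extra call.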
4. Parallelization
Running LLM tasks simultaneously through:
- Sectioning: Breaking a task into independent subtasks run in parallel
- Voting: Running the same task multiple times for diverse outputs or consensus
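Both variants can be sketched with a thread pool from the standard library. The function names and stub models are assumptions made for illustration. In particular, `voting` takes a list of callables so the example is deterministic; in practice the "voters" would often be the same model sampled several times or prompted differently.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: prompt in, completion out


def sectioning(llm: LLM, subtasks: list[str]) -> list[str]:
    """Run independent subtasks concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(subtasks))) as pool:
        return list(pool.map(llm, subtasks))


def voting(voters: list[LLM], prompt: str) -> str:
    """Ask every voter the same question and return the most common answer.

    Assumes at least one voter is supplied.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(voters))) as pool:
        answers = list(pool.map(lambda voter: voter(prompt), voters))
    return Counter(answers).most_common(1)[0][0]


def shout(prompt: str) -> str:
    # Stand-in model: upper-cases its input.
    return prompt.upper()


# Three stand-in reviewers, two of which flag the input.
reviewers = [lambda p: "unsafe", lambda p: "safe", lambda p: "unsafe"]
```

Majority vote is only one aggregation rule; a stricter threshold (for example, flag if any voter objects) trades false negatives for false positives.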
5. Orchestrator-Workers
A central LLM dynamically breaks down tasks and delegates to worker LLMs. Unlike prompt chaining, the subtasks are not predetermined — the orchestrator decides based on the input.
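The difference from a fixed chain is that the plan is produced at run time. A rough sketch, assuming hypothetical "prompt in, text out" callables and a one-subtask-per-line plan format (both invented here, along with the stub models):

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: prompt in, completion out


def orchestrate(orchestrator: LLM, worker: LLM, task: str) -> str:
    # 1. The orchestrator decides the subtasks at run time, one per line.
    plan = orchestrator(f"List the subtasks needed for this task, one per line:\n{task}")
    subtasks = [line.strip() for line in plan.splitlines() if line.strip()]
    # 2. Each subtask is delegated to a worker call.
    results = [worker(f"Task: {task}\nSubtask: {sub}") for sub in subtasks]
    # 3. The orchestrator synthesizes the worker outputs.
    return orchestrator("Combine these results:\n" + "\n".join(results))


def fake_orchestrator(prompt: str) -> str:
    # Stand-in model: plans one edit per .py file named in the task,
    # and "synthesizes" by joining worker results with semicolons.
    if prompt.startswith("List the subtasks"):
        task = prompt.splitlines()[-1]
        return "\n".join(f"edit {w}" for w in task.split() if w.endswith(".py"))
    return prompt.replace("Combine these results:\n", "").replace("\n", "; ")


def fake_worker(prompt: str) -> str:
    # Stand-in worker: reports the subtask it was given as done.
    return "done: " + prompt.splitlines()[-1].removeprefix("Subtask: ")
```

The number of worker calls depends on the input: a task naming two files yields two subtasks, a task naming one file yields one.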
6. Evaluator-Optimizer
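One LLM generates responses while another provides evaluation and feedback in a loop. The pairing loosely resembles the generator and discriminator of a GAN, although nothing is trained here: the feedback is passed back as text. The loop continues until the evaluator is satisfied or a maximum iteration count is reached.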
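The loop and its two stopping conditions can be sketched as below. The signatures are assumptions: the generator is modeled as `(task, feedback) -> draft` and the evaluator as `(task, draft) -> (accepted, feedback)`, with stub functions standing in for the two LLM calls.

```python
from typing import Callable

Generator = Callable[[str, str], str]               # (task, feedback) -> draft
Evaluator = Callable[[str, str], tuple[bool, str]]  # (task, draft) -> (accepted, feedback)


def refine(
    generate: Generator, evaluate: Evaluator, task: str, max_iters: int = 3
) -> tuple[str, int]:
    """Regenerate until the evaluator accepts or the iteration cap is hit.

    Returns the last draft and the number of iterations used.
    """
    draft, feedback = "", ""
    for iteration in range(1, max_iters + 1):
        draft = generate(task, feedback)
        accepted, feedback = evaluate(task, draft)
        if accepted:
            return draft, iteration
    return draft, max_iters


def fake_generate(task: str, feedback: str) -> str:
    # Stand-in generator: adds a citation only once the feedback asks for one.
    return task + (" [source]" if "cite" in feedback else "")


def fake_evaluate(task: str, draft: str) -> tuple[bool, str]:
    # Stand-in evaluator: accepts only drafts that end with a citation.
    ok = draft.endswith("[source]")
    return ok, ("" if ok else "please cite a source")
```

With these stubs the first draft is rejected and the second is accepted; with `max_iters=1` the loop returns the rejected first draft, which is why callers may want to check whether the cap was reached.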
Agent Implementation
True agents operate autonomously based on environmental feedback loops. They require:
- Clear success criteria
- Feedback loops for course correction
- Meaningful human oversight at appropriate checkpoints
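These three requirements map onto a simple control loop. The sketch below is illustrative only: `policy` stands in for the LLM choosing an action, `env` for the environment that returns ground-truth feedback, `is_done` for the success criterion, and `approve` for a human checkpoint. All names are hypothetical.

```python
from typing import Callable

Policy = Callable[[str], str]       # hypothetical LLM call: observation -> action
Environment = Callable[[str], str]  # executes an action, returns feedback


def run_agent(
    policy: Policy,
    env: Environment,
    goal: str,
    is_done: Callable[[str], bool],
    approve: Callable[[str], bool],
    max_steps: int = 10,
) -> list[str]:
    """Act, observe, and repeat until success, a human veto, or the step cap."""
    observation = goal
    trace: list[str] = []
    for _ in range(max_steps):
        action = policy(observation)
        if not approve(action):            # human oversight checkpoint
            trace.append(f"blocked: {action}")
            break
        observation = env(action)          # feedback from the environment
        trace.append(f"{action} -> {observation}")
        if is_done(observation):           # clear success criterion
            break
    return trace


def make_counter_env() -> Environment:
    # Toy environment: a counter that goes up on each "increment" action.
    state = {"n": 0}

    def env(action: str) -> str:
        if action == "increment":
            state["n"] += 1
        return str(state["n"])

    return env


def always_increment(observation: str) -> str:
    # Stand-in policy: always picks the same action.
    return "increment"
```

The step cap matters as much as the success check: without it, a policy that never satisfies `is_done` would loop indefinitely and keep accruing cost.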
Three Core Principles
- Simplicity: keep the agent's design as simple as the task allows
- Transparency: show the agent's planning steps explicitly
- Agent-Computer Interface (ACI): document and test tools thoroughly
Tool Design Best Practices
Tool definitions deserve as much prompt engineering attention as the overall prompt. Effective tool formats should:
- Give the model enough tokens to reason before it commits to an answer
- Stay close to formats the model has seen in naturally occurring text
- Avoid formatting overhead, such as keeping accurate line counts or string-escaping code
- Include example usage and clear boundaries between similar tools
Anthropic spent more optimization time on tools than overall prompts when building their SWE-bench agent.
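As an illustration of these practices, here is a hypothetical tool definition with a description that states purpose, limits, and an example, plus a small argument check. The `read_file` tool, its field layout (a name, a description, and a JSON-Schema-style `input_schema`), and the absolute-path rule are assumptions chosen for the sketch; check your provider's documentation for the exact schema it expects.

```python
# Hypothetical tool definition; field names follow a common JSON-Schema-style layout.
READ_FILE_TOOL = {
    "name": "read_file",
    "description": (
        "Read a UTF-8 text file and return its contents. "
        "Use for source files and configs, not for directories or binaries. "
        "The path must be absolute. "
        'Example input: {"path": "/repo/src/main.py"}'
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {
                "type": "string",
                "description": "Absolute POSIX path, e.g. /repo/src/main.py",
            },
        },
        "required": ["path"],
    },
}


def validate_read_file_args(args: dict) -> list[str]:
    """Return a list of problems with the arguments; an empty list means valid."""
    problems = []
    path = args.get("path")
    if not isinstance(path, str):
        problems.append("path is required and must be a string")
    elif not path.startswith("/"):  # POSIX-style absolute path check
        problems.append("path must be absolute")
    return problems
```

Requiring an unambiguous argument format makes a whole class of mistakes impossible to express, and returning readable error messages gives the model something concrete to correct on its next attempt.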
Practical Applications
Customer Support
Combines a chat interface with tools for retrieving customer data, updating tickets, and processing refunds. Success can be measured directly through resolution outcomes.
Coding Agents
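Coding agents use automated tests to verify their output and iterate on failures. Agents can now solve real GitHub issues from the SWE-bench Verified benchmark using only the pull request description, though human review remains essential for checking that solutions fit broader system requirements.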