What Multi-Agent Means in Practice
A multi-agent system is any architecture where more than one AI model call contributes to a final result. In the simplest case, that's two sequential calls — one to plan, one to execute. In more complex cases, it's a network of specialized agents working in parallel, coordinated by an orchestrating agent.
The main reasons to use multiple agents rather than one large prompt:
- Context window limits — a single agent can only hold so much in context. Breaking work across agents sidesteps this.
- Specialization — a focused agent with a tight system prompt outperforms a generalist agent on specific subtasks.
- Parallelism — independent subtasks can run simultaneously, reducing total wall-clock time.
- Error isolation — if one step fails, you retry only that step rather than restarting the whole pipeline.
Multi-agent systems are harder to debug, cost more in total tokens, and fail in more ways than single-call patterns. Start with the simplest approach that works. Add agents when you have a concrete reason, not because the architecture sounds impressive.
Tool Use: Giving Agents Real-World Capabilities
Tool use (also called function calling) lets an AI model request actions your code then executes — database queries, API calls, file reads, calculations. The model decides when to use a tool and what arguments to pass; your code runs the tool and returns the result.
Defining Tools
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

const tools: Anthropic.Tool[] = [
  {
    name: 'get_customer',
    description: 'Look up a customer by ID. Returns name, email, and account status.',
    input_schema: {
      type: 'object',
      properties: {
        customer_id: {
          type: 'string',
          description: 'The unique customer identifier (UUID)',
        },
      },
      required: ['customer_id'],
    },
  },
  {
    name: 'update_account_status',
    description: 'Set a customer account status to active, suspended, or closed.',
    input_schema: {
      type: 'object',
      properties: {
        customer_id: { type: 'string' },
        status: {
          type: 'string',
          enum: ['active', 'suspended', 'closed'],
        },
      },
      required: ['customer_id', 'status'],
    },
  },
];
The Tool Use Loop
The model returns a tool_use block when it wants to call a tool. You execute the call, return the result as a tool_result message, and continue the conversation. Repeat until the model returns a final text response.
async function runAgentLoop(userMessage: string): Promise<string> {
  const messages: Anthropic.MessageParam[] = [
    { role: 'user', content: userMessage },
  ];

  while (true) {
    const response = await anthropic.messages.create({
      model: 'claude-sonnet-4-6',
      max_tokens: 2048,
      tools,
      messages,
    });

    // Add assistant response to conversation history
    messages.push({ role: 'assistant', content: response.content });

    // Done: any stop reason other than tool_use (end_turn, max_tokens, and so on)
    // means there are no tool calls to run, so return the text. Checking only for
    // end_turn would send an empty tool_result message after a max_tokens stop.
    if (response.stop_reason !== 'tool_use') {
      const textBlock = response.content.find((b) => b.type === 'text');
      return textBlock?.text ?? '';
    }

    // Execute all tool calls in this response
    const toolResults: Anthropic.ToolResultBlockParam[] = [];
    for (const block of response.content) {
      if (block.type !== 'tool_use') continue;
      const result = await executeTool(block.name, block.input);
      toolResults.push({
        type: 'tool_result',
        tool_use_id: block.id,
        content: JSON.stringify(result),
      });
    }

    // Add tool results and continue
    messages.push({ role: 'user', content: toolResults });
  }
}
Executing Tools Safely
// `db` stands in for your application's data layer (not shown here)
async function executeTool(name: string, input: unknown): Promise<unknown> {
  switch (name) {
    case 'get_customer':
      return db.customers.findById((input as { customer_id: string }).customer_id);
    case 'update_account_status': {
      const { customer_id, status } = input as { customer_id: string; status: string };
      return db.customers.updateStatus(customer_id, status);
    }
    default:
      return { error: `Unknown tool: ${name}` };
  }
}
The model's tool inputs are structured but not guaranteed to be valid for your system. Validate them before execution — check that IDs exist, enum values are in range, and required fields are present. Return a descriptive error in tool_result content rather than throwing; the model can often recover by trying a different argument.
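The validation step described above could look roughly like the following sketch for the update_account_status tool. The names parseUpdateStatusInput and Validated are illustrative assumptions, not part of the SDK, and the UUID check only tests the format. Whether the customer actually exists is still a lookup in your own data layer.

```typescript
// Hypothetical helper: narrows the untyped tool input before it reaches the database.
type AccountStatus = 'active' | 'suspended' | 'closed';
type Validated<T> = { ok: true; value: T } | { ok: false; error: string };

const STATUSES: readonly AccountStatus[] = ['active', 'suspended', 'closed'];
// Loose UUID shape check (assumption: customer IDs are UUIDs, as the tool description says).
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function parseUpdateStatusInput(
  input: unknown
): Validated<{ customer_id: string; status: AccountStatus }> {
  if (typeof input !== 'object' || input === null) {
    return { ok: false, error: 'Input must be a JSON object.' };
  }
  const { customer_id, status } = input as Record<string, unknown>;
  if (typeof customer_id !== 'string' || !UUID_RE.test(customer_id)) {
    return { ok: false, error: 'customer_id must be a UUID string.' };
  }
  // Compare against the allowed values instead of trusting a type cast.
  const match = STATUSES.find((s) => s === status);
  if (match === undefined) {
    return { ok: false, error: `status must be one of: ${STATUSES.join(', ')}.` };
  }
  return { ok: true, value: { customer_id, status: match } };
}
```

When the result is not ok, the error string can be returned as the tool_result content so the model sees exactly which argument to correct.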
Orchestrator–Subagent Pattern
In this pattern, one "orchestrator" agent breaks down a task and delegates subtasks to specialized "subagents." Each subagent has its own system prompt focused on a narrow domain.
// Orchestrator decides what to do
async function orchestrate(task: string): Promise<string> {
  const plan = await callModel({
    system: `You are a task planner. Given a task, break it into subtasks.
Return JSON: { "steps": [{ "agent": "researcher|writer|reviewer", "input": "..." }] }`,
    user: task,
  });

  const steps = JSON.parse(plan).steps;
  let context = '';

  for (const step of steps) {
    const result = await callSubagent(step.agent, step.input, context);
    context += `\n\n## ${step.agent} output:\n${result}`;
  }

  return context;
}

// Subagent has a specialized system prompt
async function callSubagent(
  agentType: 'researcher' | 'writer' | 'reviewer',
  input: string,
  context: string
): Promise<string> {
  const systemPrompts = {
    researcher: 'You are a precise researcher. Extract and summarize factual information.',
    writer: 'You are a technical writer. Write clear, concise prose from provided facts.',
    reviewer: 'You are a code reviewer. Identify bugs, security issues, and improvements.',
  };

  return callModel({
    system: systemPrompts[agentType],
    user: context ? `Context:\n${context}\n\nTask: ${input}` : input,
  });
}
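The orchestrator above trusts the planner to return well-formed JSON with known agent names. One way to guard that boundary is sketched below. parsePlan and the Step type are hypothetical names introduced for illustration, and the sketch assumes the model returns bare JSON with no surrounding prose.

```typescript
type AgentType = 'researcher' | 'writer' | 'reviewer';

interface Step {
  agent: AgentType;
  input: string;
}

const AGENTS: readonly AgentType[] = ['researcher', 'writer', 'reviewer'];

// Hypothetical helper: throws a descriptive error up front instead of letting
// a malformed plan surface later as an undefined system prompt.
function parsePlan(raw: string): Step[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error('Planner did not return valid JSON');
  }

  const steps = (parsed as { steps?: unknown } | null)?.steps;
  if (!Array.isArray(steps) || steps.length === 0) {
    throw new Error('Plan must contain a non-empty "steps" array');
  }

  return steps.map((step: unknown, i: number) => {
    const { agent, input } = (step ?? {}) as Record<string, unknown>;
    const known = AGENTS.find((a) => a === agent);
    if (known === undefined || typeof input !== 'string') {
      throw new Error(`Step ${i} has an unknown agent or a non-string input`);
    }
    return { agent: known, input };
  });
}
```

Inside orchestrate, the JSON.parse(plan).steps expression could then be swapped for parsePlan(plan), with a failed parse handled by retrying the planning call or aborting.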
Parallel Agent Execution
When subtasks are independent, run them simultaneously. Promise.all starts every call at once and resolves when all of them have completed; if any call rejects, it rejects immediately:
async function analyzeCodebase(
  files: string[]
): Promise<{ analyses: string[]; summary: string }> {
  // Analyze all files in parallel (each entry in `files` is one file's contents)
  const analyses = await Promise.all(
    files.map((file) =>
      callModel({
        system: 'You are a code reviewer. Identify issues, suggest improvements.',
        user: `Review this file:\n\`\`\`\n${file}\n\`\`\``,
      })
    )
  );

  // Synthesize results in a single final call
  const summary = await callModel({
    system: 'You are a technical lead. Summarize findings across multiple code reviews.',
    user: analyses.map((a, i) => `File ${i + 1}:\n${a}`).join('\n\n'),
  });

  // The return type above matches this object: per-file reviews plus one summary
  return { analyses, summary };
}
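If partial results are acceptable when one review fails, Promise.allSettled is one alternative to Promise.all. The sketch below is generic over the task. settleAll is an illustrative name, and the task argument stands in for the per-file callModel call.

```typescript
interface Settled<T> {
  succeeded: { index: number; value: T }[];
  failed: { index: number; reason: string }[];
}

// Runs one async task per item and separates successes from failures
// instead of rejecting on the first error.
async function settleAll<I, T>(
  items: I[],
  task: (item: I) => Promise<T>
): Promise<Settled<T>> {
  const results = await Promise.allSettled(items.map((item) => task(item)));
  const out: Settled<T> = { succeeded: [], failed: [] };

  results.forEach((r, index) => {
    if (r.status === 'fulfilled') {
      out.succeeded.push({ index, value: r.value });
    } else {
      const reason = r.reason instanceof Error ? r.reason.message : String(r.reason);
      out.failed.push({ index, reason });
    }
  });

  return out;
}
```

The failed entries keep their original index, so the caller can retry only those files or mention the gaps in the synthesis prompt.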
Be aware of rate limits: firing 50 parallel requests at once can exceed per-minute request or token limits. Use a concurrency limiter for large batches:
import pLimit from 'p-limit';

const limit = pLimit(5); // max 5 concurrent AI calls
const results = await Promise.all(
  items.map((item) => limit(() => callModel({ ... })))
);
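To show what such a limiter does, or to avoid the dependency, a small worker-pool sketch follows. mapWithConcurrency is a hypothetical helper, not part of p-limit. It preserves input order and rejects on the first failure, although tasks already in flight still run to completion.

```typescript
// Runs `task` over `items` with at most `limit` calls in flight at a time.
async function mapWithConcurrency<I, T>(
  items: I[],
  limit: number,
  task: (item: I, index: number) => Promise<T>
): Promise<T[]> {
  const results: T[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each worker repeatedly claims the next item until none are left.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++; // claimed synchronously, before any await
      results[index] = await task(items[index] as I, index);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

A call such as mapWithConcurrency(items, 5, (item) => review(item)), where review is your own model-calling function, would behave like the pLimit(5) example.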
Handling Failures in Agent Pipelines
Agent pipelines fail in unique ways: the model misuses a tool, an API call returns an error, or the loop runs longer than expected. Plan for these explicitly.
Loop Termination Guard
An agent loop that calls tools repeatedly should have a hard iteration limit. Without one, a confused model can loop indefinitely:
const MAX_ITERATIONS = 10;
let iterations = 0;

while (true) {
  if (++iterations > MAX_ITERATIONS) {
    throw new Error(`Agent exceeded ${MAX_ITERATIONS} iterations — aborting`);
  }
  // ... rest of loop
}
Tool Error Recovery
Return errors as tool_result content rather than throwing exceptions. The model can often recover — it may retry with different arguments, use a different tool, or explain to the user what went wrong:
try {
  const result = await executeTool(block.name, block.input);
  toolResults.push({
    type: 'tool_result',
    tool_use_id: block.id,
    content: JSON.stringify(result),
  });
} catch (err) {
  toolResults.push({
    type: 'tool_result',
    tool_use_id: block.id,
    content: `Error: ${err instanceof Error ? err.message : 'Unknown error'}`,
    is_error: true,
  });
}
Timeouts on Individual Steps
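async function callWithTimeout<T>(fn: () => Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error('Step timed out')), timeoutMs);
  });
  try {
    return await Promise.race([fn(), timeout]);
  } finally {
    // Clear the timer so it does not keep the process alive after fn() settles
    clearTimeout(timer);
  }
}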
// Wrap each subagent call
const result = await callWithTimeout(() => callSubagent('researcher', input, ''), 30_000);
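Promise.race stops waiting for the slow step, but it does not stop the underlying work, so the request keeps running in the background. If the wrapped operation accepts an AbortSignal (the standard fetch API does, for example), the timeout can cancel it as well. In the sketch below, callWithAbort is a hypothetical name, and the wrapped function is assumed to honor the signal. If it ignores the signal, no timeout takes effect.

```typescript
// Cancels the wrapped operation via AbortSignal once the deadline passes.
async function callWithAbort<T>(
  fn: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(
    () => controller.abort(new Error('Step timed out')),
    timeoutMs
  );
  try {
    return await fn(controller.signal);
  } finally {
    clearTimeout(timer); // no stray timer once the step settles
  }
}
```

The design trade-off is that cancellation becomes the callee's responsibility, so this version only suits operations known to respect the signal.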
Designing Good Tool Descriptions
The model decides which tools to call based on their name and description. Weak descriptions lead to wrong tool choices or missing required arguments.
| Weak description | Better description |
|---|---|
| "Get user data" | "Look up a user by their UUID. Returns name, email, role, and created_at timestamp." |
| "Run query" | "Execute a read-only SQL SELECT query against the analytics database. Returns up to 1,000 rows as JSON." |
| "Send message" | "Send a Slack message to a channel. Use only for non-urgent notifications, not for DMs or critical alerts." |
Include constraints in tool descriptions: what the tool cannot do, its side effects, and any limits (row caps, rate limits, scoped permissions). The model uses this to make better decisions.
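As an illustration, the "Run query" description could be carried into a full definition like the one below. The tool name run_analytics_query, the statement restrictions, and the truncation behavior are hypothetical details added for the example. ToolDefinition is a local stand-in for the SDK's tool type so the snippet is self-contained.

```typescript
// Local stand-in for the SDK's tool type (assumption, for self-containment).
interface ToolDefinition {
  name: string;
  description: string;
  input_schema: {
    type: 'object';
    properties: Record<string, { type: string; description?: string }>;
    required: string[];
  };
}

const runAnalyticsQuery: ToolDefinition = {
  name: 'run_analytics_query',
  // One sentence each for: what it does, its limits, what it cannot do, side effects.
  description: [
    'Execute a read-only SQL SELECT query against the analytics database.',
    'Returns up to 1,000 rows as JSON; larger result sets are truncated.',
    'Cannot run INSERT, UPDATE, DELETE, or DDL statements.',
    'Has no side effects.',
  ].join(' '),
  input_schema: {
    type: 'object',
    properties: {
      sql: {
        type: 'string',
        description: 'A single SELECT statement. Add your own LIMIT for small samples.',
      },
    },
    required: ['sql'],
  },
};
```

Writing the description as a list of sentences keeps each constraint easy to review and update on its own.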
When Not to Use Agents
Multi-agent patterns add cost, latency, and failure modes. Use a single prompt when:
- The task fits in one context window and doesn't require external data.
- The output is simple and structured (classification, extraction, summarization).
- Latency matters — each agent hop adds 1–5 seconds.
- You're still figuring out what the right output looks like — iterate on a single prompt first.
Add tool use when the model needs real-world data it can't infer. Add multiple agents when a single agent's context or focus is genuinely insufficient.
Multi-Agent Checklist
- Tool descriptions — specific, including constraints and side effects. The model's tool selection quality depends on them.
- Input validation — validate tool inputs before execution; return errors as tool_result content, not exceptions.
- Iteration limit — every agent loop needs a hard cap on iterations to prevent runaway execution.
- Parallelism — use Promise.all for independent subtasks; add a concurrency limiter for large batches.
- Timeouts — set per-step timeouts, especially for external tool calls that can hang.
- Start simple — try a single prompt before adding agents. Add complexity only when you have a concrete reason.
Related Guides
Claude API for Developers
Full API reference: system prompts, streaming, tool use deep-dive, and prompt caching.
Streaming AI Responses
SSE, real-time UI, and streaming tool use — including handling tool calls mid-stream.