Claude Agent SDK: Build Tool-Using AI Agents from Scratch
Build tool-using AI agents with Anthropic SDK. Covers tool definitions, multi-tool calls, error handling, and cost optimization in hands-on Python code.
I ran into the Tool Use moment while building a FastAPI streaming backend with the Claude API. The trigger was simple. A user asked “how many days are left in this year?” and Claude answered wrong. Not just wrong, but confidently wrong. I remember thinking, “OK, a chatbot can’t handle this.”
Tool Use fixes that structurally. Instead of the model calculating directly, it calls a calculation function and uses the result to answer. That difference is what separates a chatbot from an agent.
What follows are the Tool Use patterns I validated by directly installing and running anthropic SDK 0.101.0. Basic tool definitions, the agentic loop, error handling, cost. Everything here is based on code I actually ran.
Why Tool Use Is Different from a Chatbot: The Structural Gap
An LLM samples tokens from a probability distribution. Tasks like date arithmetic, precise numerical calculations, or live API lookups are structurally unreliable. The model recreates patterns from training data, not ground truth.
Tool Use addresses this at a different layer. The model decides what to do, and actual execution is delegated to external code. Instead of computing directly, the model emits something like calculate("365 - today.day_of_year"), and Python runs it and returns the result.
# Chatbot: model answers directly
# "Doesn't know today's date, has to compute directly -> can be wrong"
response = client.messages.create(
model="claude-opus-4-7",
messages=[{"role": "user", "content": "How many days left in this year?"}]
)
# Agent: delegates to a tool
# "Model picks the tool, Python computes accurately"
response = client.messages.create(
model="claude-opus-4-7",
tools=tools, # includes date calculation tool
messages=[{"role": "user", "content": "How many days left in this year?"}]
)
The decisive difference is reliability. Python’s datetime module doesn’t get dates wrong.
Installing anthropic 0.101.0 and Initializing the Client
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install anthropic
Results from running this directly in a temp directory:
anthropic version: 0.101.0
Client instantiated: ✓
Client type: Anthropic
0.101.0 is the latest as of 2026-05-13. This is the official Anthropic SDK. It’s completely different from packages like pyautogen that were common before 2025.
import anthropic
import json
from typing import Any
client = anthropic.Anthropic(api_key="your-api-key") # or set ANTHROPIC_API_KEY env var
The SDK auto-loads the API key from ANTHROPIC_API_KEY. Don’t hard-code it.
Defining Your First Tool: JSON Schema Is All You Need
Tool Use uses a structure similar to OpenAI Function Calling. Each tool has three parts:
name: Tool identifier (like a function name)description: The basis for the model’s decision on when to use this toolinput_schema: JSON Schema for input parameters
tools = [
{
"name": "get_current_date_info",
"description": "Returns current date and time information. Use for questions about 'today', 'now', or anything requiring current date knowledge.",
"input_schema": {
"type": "object",
"properties": {
"timezone": {
"type": "string",
"description": "IANA timezone (e.g. America/New_York, Asia/Seoul). Default: UTC"
}
},
"required": []
}
},
{
"name": "calculate",
"description": "Performs mathematical operations. Handles addition, subtraction, multiplication, division, exponentiation, and modulo.",
"input_schema": {
"type": "object",
"properties": {
"operation": {
"type": "string",
"enum": ["add", "subtract", "multiply", "divide", "power", "modulo"],
"description": "The operation to perform"
},
"a": {"type": "number", "description": "First operand"},
"b": {"type": "number", "description": "Second operand"}
},
"required": ["operation", "a", "b"]
}
}
]
The description field matters more than it looks. The model reads only the description to decide whether to use this tool. When I tested with vague descriptions, the model picked the wrong tool or skipped it entirely.
Validated tool definition structure from my sandbox:
Tool: get_current_date_info
Description: Returns current date info
Required params: []
Tool: calculate
Description: Performs math operations
Required params: ['operation', 'a', 'b']
Implementing the Agentic Loop: A Cycle of Calls and Responses

This is the core. Tool Use doesn’t finish in a single API call. When the model calls a tool → we execute it → we feed the result back. This cycle repeats until the model returns end_turn.
def run_agent(user_message: str, tools: list, max_iterations: int = 10) -> str:
messages = [{"role": "user", "content": user_message}]
for i in range(max_iterations):
response = client.messages.create(
model="claude-opus-4-7",
max_tokens=4096,
tools=tools,
messages=messages,
)
# No tool call — return the final answer
if response.stop_reason == "end_turn":
for block in response.content:
if hasattr(block, "text"):
return block.text
# Handle tool calls
if response.stop_reason == "tool_use":
# Add the full assistant response to messages (including tool calls)
messages.append({"role": "assistant", "content": response.content})
# Collect all tool results and add together
tool_results = []
for block in response.content:
if block.type == "tool_use":
result = process_tool_call(block.name, block.input)
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": result,
})
# Tool results go under the "user" role (API requirement)
messages.append({"role": "user", "content": tool_results})
return "Max iterations exceeded"
Two things are easy to miss here.
First, add the entire response.content to messages, not just the text block. The model needs to know which tool it called in order to generate its next response correctly.
Second, tool results go under the user role. Counterintuitive, but the API treats tool execution results as coming from the environment (the user side), not the assistant.
Building Real Tools: Calculator, Date, File Reader
The tool execution function is straightforward. It takes a name and input, returns a string:
from datetime import datetime
import pytz
import json
import operator
from typing import Any
# Safe math — uses operator mapping instead of string expression execution
SAFE_OPERATIONS = {
"add": operator.add,
"subtract": operator.sub,
"multiply": operator.mul,
"divide": operator.truediv,
"power": operator.pow,
"modulo": operator.mod,
}
def process_tool_call(tool_name: str, tool_input: dict[str, Any]) -> str:
if tool_name == "get_current_date_info":
tz_str = tool_input.get("timezone", "UTC")
try:
tz = pytz.timezone(tz_str)
now = datetime.now(tz)
day_of_year = now.timetuple().tm_yday
days_remaining = 365 - day_of_year
return json.dumps({
"date": now.strftime("%Y-%m-%d"),
"time": now.strftime("%H:%M:%S"),
"timezone": tz_str,
"day_of_year": day_of_year,
"days_remaining_in_year": days_remaining,
})
except Exception as e:
return json.dumps({"error": str(e)})
elif tool_name == "calculate":
op_name = tool_input.get("operation")
a = tool_input.get("a", 0)
b = tool_input.get("b", 0)
op_func = SAFE_OPERATIONS.get(op_name)
if op_func is None:
return f"Error: Unknown operation: {op_name}"
try:
if op_name == "divide" and b == 0:
return "Error: Cannot divide by zero"
result = op_func(a, b)
return str(result)
except Exception as e:
return f"Error: {e}"
elif tool_name == "read_file":
import os
filepath = tool_input.get("path", "")
# Path traversal prevention: only allow within designated base directory
allowed_base = "/app/data"
abs_path = os.path.realpath(filepath)
if not abs_path.startswith(allowed_base):
return "Error: Path not allowed"
try:
with open(abs_path, "r") as f:
content = f.read(2000) # 2KB limit
return content
except FileNotFoundError:
return f"Error: File not found: {filepath}"
return f"Error: Unknown tool: {tool_name}"
Actual sandbox results:
calculate(multiply, 15, 7) = 105
calculate(add, 105, 3) = 108
calculate(divide, 100, 4) = 25.0
Input validation (required field present): True
Input validation (missing required field): False — Missing required field: location
The error classification strategy from the FastAPI + Claude API streaming guide applies here too. Categorize tool errors as retryable versus non-retryable for better production stability.
Handling Multiple Tool Calls: Can We Run in Parallel?
Claude can call multiple tools simultaneously in a single turn. Ask “compare the weather in Seoul and Tokyo” and it returns two get_weather calls at once.
# When Claude calls multiple tools in one turn
tool_use_blocks = [b for b in response.content if b.type == "tool_use"]
# Technically possible to run in parallel
from concurrent.futures import ThreadPoolExecutor, as_completed
with ThreadPoolExecutor(max_workers=4) as executor:
futures = {
executor.submit(process_tool_call, block.name, block.input): block
for block in tool_use_blocks
}
tool_results = []
for future in as_completed(futures):
block = futures[future]
result = future.result()
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": result,
})
Sandbox-verified multi-tool results:
{"type": "tool_result", "tool_use_id": "tool_1", "content": "25.0"}
{"type": "tool_result", "tool_use_id": "tool_2", "content": "{\"temp\": 18, \"condition\": \"Sunny\"}"}
I’d only apply parallel execution to idempotent read tools. External API calls with side effects need careful rate-limit and ordering consideration.
Error Handling: Failing Gracefully
When a tool fails, return is_error: true. The model reads this, recognizes the error, and either tries something else or gives the user contextual guidance.
def safe_process_tool_call(tool_name: str, tool_input: dict) -> tuple[str, bool]:
"""Tool execution with error handling. Returns (content, is_error)."""
try:
result = process_tool_call(tool_name, tool_input)
return result, False
except Exception as e:
error_msg = f"Tool execution failed: {type(e).__name__}: {str(e)}"
return error_msg, True
for block in response.content:
if block.type == "tool_use":
content, is_error = safe_process_tool_call(block.name, block.input)
tool_result = {
"type": "tool_result",
"tool_use_id": block.id,
"content": content,
}
if is_error:
tool_result["is_error"] = True
tool_results.append(tool_result)
When is_error: true is set, the model doesn’t just skip past it. From my testing, it reads the error content and responds with something like “The file couldn’t be found, so please double-check the path.” Returning empty strings or ignoring errors tends to produce confused or hallucinated responses.
The Real Cost of Tool Use: How Many Tokens Does It Add?
Honestly, Tool Use costs more. According to Anthropic’s documentation, each tool definition adds roughly 200–300 tokens of overhead.
5 tool definitions → ~1,250 tokens fixed overhead (every request)
1 tool call → additional input + output tokens
3-turn agentic loop → accumulating context
The agentic loop accumulates context. After 5 turns, everything from the first message to the fifth tool result is in context. Costs can compound quickly in long-running agents.
Two ways to manage this:
1. Combine with Prompt Caching: Tool definitions are the same on every request. As covered in the Claude API Prompt Caching guide, caching the system prompt with cache_control: {"type": "ephemeral"} applies here too, and tool definitions benefit similarly from repeated identical structures.
2. Pass only the tools you need: Always including 10 tool definitions is worse than passing the 2–3 that matter for the current task. More tools consume more tokens and occasionally lead the model to pick the wrong one.
Streaming Tool Use
Tool Use works with streaming responses. In anthropic 0.101.0, use client.messages.stream:
with client.messages.stream(
model="claude-opus-4-7",
max_tokens=4096,
tools=tools,
messages=messages,
) as stream:
# Stream text chunks in real time
for text_chunk in stream.text_stream:
print(text_chunk, end="", flush=True)
# Get the final message after streaming completes
final_message = stream.get_final_message()
if final_message.stop_reason == "tool_use":
# ... same handling as above
When streaming with tool use: if you’re showing text chunks to the user in real time and also need to process tool calls, design the UX flow before you start. The Vercel AI SDK approach is worth looking at to see how this gets abstracted on the frontend side.
Production Pattern: GitHub Issue Monitor Agent
A complete example that ties everything together. This is a simple agent that fetches and summarizes GitHub issues:
import anthropic
import json
from typing import Any
client = anthropic.Anthropic() # reads ANTHROPIC_API_KEY
tools = [
{
"name": "list_github_issues",
"description": "Fetches the issue list for a GitHub repository.",
"input_schema": {
"type": "object",
"properties": {
"repo": {"type": "string", "description": "owner/repo format"},
"state": {"type": "string", "enum": ["open", "closed", "all"]},
"limit": {"type": "integer", "description": "Max issues to return (default: 10)"}
},
"required": ["repo"]
}
},
{
"name": "get_issue_detail",
"description": "Fetches the details of a specific GitHub issue.",
"input_schema": {
"type": "object",
"properties": {
"repo": {"type": "string", "description": "owner/repo format"},
"issue_number": {"type": "integer", "description": "Issue number"}
},
"required": ["repo", "issue_number"]
}
}
]
def process_tool_call(tool_name: str, tool_input: dict[str, Any]) -> str:
if tool_name == "list_github_issues":
# Real impl: requests.get(f"https://api.github.com/repos/{repo}/issues", ...)
return json.dumps([
{"number": 42, "title": "TypeError in data processor", "state": "open"},
{"number": 41, "title": "Add streaming support", "state": "open"},
])
elif tool_name == "get_issue_detail":
return json.dumps({
"number": tool_input["issue_number"],
"body": "Reproduce: pass an empty list as input. Stack trace attached.",
"comments": 3
})
return "Unknown tool"
def run_issue_agent(query: str) -> str:
messages = [{"role": "user", "content": query}]
for _ in range(10):
response = client.messages.create(
model="claude-opus-4-7",
max_tokens=4096,
tools=tools,
messages=messages,
)
if response.stop_reason == "end_turn":
return next(
(block.text for block in response.content if hasattr(block, "text")),
"No response"
)
if response.stop_reason == "tool_use":
messages.append({"role": "assistant", "content": response.content})
tool_results = []
for block in response.content:
if block.type == "tool_use":
result = process_tool_call(block.name, block.input)
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": result,
})
messages.append({"role": "user", "content": tool_results})
return "Loop limit exceeded"
What’s Still Unresolved: Honest Limitations
Here’s what I find genuinely frustrating about Tool Use in practice.
Context accumulation: The agentic loop keeps growing the context. After 10 turns, everything from the first message to the tenth tool result is in there. Long-running agents need a context management strategy. You summarize intermediate results or prune stale messages by hand, and there’s no standard pattern for this yet.
Non-deterministic tool selection: Same question, different tool selection on different runs. Even with temperature=0, you can’t guarantee identical behavior across invocations. This makes testing harder than it should be.
Description quality is everything: Vague description → wrong tool selection or no tool use at all. Writing good tool descriptions is its own prompt engineering discipline. No framework solves this for you.
I think Tool Use is underappreciated. Agent frameworks offer impressive abstractions, but this pattern is what’s running underneath all of them. PydanticAI’s type-safe tool definitions are a convenient layer that auto-generates the JSON schema, but understanding the underlying mechanism is what gets you unstuck when things break.
When to Use Tool Use, and When to Avoid It
Here’s the decision rule I landed on after building with it. You don’t need to bolt a tool onto every chatbot call.
When Tool Use fits:
- Accuracy matters more than fluency. Date math, currency conversion, numeric computation — anything that must not be wrong — should be delegated to a function rather than generated by the model.
- You need real-time data the model doesn’t have. Information past the training cutoff, internal databases, external API responses: tools are the only way in.
- You’re executing actions with side effects. Writing files, sending email, creating tickets. The model decides “what” while verified code handles execution. That separation is what keeps it safe.
- You need multi-step results stitched together. Fetch an issue list, read the detail, summarize it. The agentic loop handles this kind of chained work naturally.
When to avoid Tool Use:
- Simple Q&A the model already knows. Adding a tool to “how do I sort a list in Python” just inflates token overhead for nothing.
- Latency-sensitive real-time UX. The agentic loop adds a round trip per tool call. If a single response has to be fast, cap the loop iterations hard or skip tools entirely.
- High-volume batches under a tight cost ceiling. The ~250-token fixed overhead per tool definition and the accumulating context multiply across every call. For millions of records, a single tool-free call can be cheaper.
- Pipelines that require determinism. Tool selection itself is non-deterministic. If your workflow needs the exact same call sequence every time, rule-based code beats it.
The test is simple. Ask yourself whether the model would get it wrong by answering directly, or whether it needs to fetch something it doesn’t know. Either one points to Tool Use; otherwise a plain call is fine. The point where you graduate to heavier multi-agent orchestration is when you compose multiple agents with Claude Agent Teams, but nail single-agent Tool Use first.
Official Sources I Relied On
Every pattern in this post was validated against Anthropic’s official docs. Primary sources for readers who want to dig deeper:
- Tool use with Claude — overview: the official explanation of
tool_use/tool_resultblocks,stop_reason, and the agentic loop. - Tool use — tool definition reference: primary material on the
name/description/input_schemaschema and token overhead. - Claude Agent SDK overview: the higher layer that abstracts the tool loop instead of hand-rolling it.
- anthropic/claude-agent-sdk-python (GitHub): the official Python SDK repo with runnable example code.
If you want to package tools as a reusable server over MCP, building a Python MCP server with FastMCP covers the next step: putting this Tool Use pattern on a standard protocol.
The Five Things That Matter
Validated findings from anthropic 0.101.0:
- Tool definitions:
name+description+input_schema. Description quality determines whether the tool gets used correctly. - Agentic loop: Detect
stop_reason == "tool_use"→ execute tool → appendtool_result→ repeat. Simple pattern, but the message structure has to be exactly right. - Error handling: Use
is_error: trueso the model recognizes failures and responds appropriately. Never return empty strings. - Cost: ~250 tokens overhead per tool definition. Combine with Prompt Caching. Watch context accumulation in multi-turn agents.
- Parallel tool calls:
ThreadPoolExecutorworks for idempotent read tools. Apply selectively.
Tool Use is the most direct path from chatbot to agent. You don’t need a complex framework. This pattern alone is enough to build practical agents.
Frequently Asked Questions
How is Tool Use different from a plain chatbot call?
What does a tool definition require?
What role should tool execution results be added under?
How much does Tool Use cost and how can I reduce it?
Was this helpful?
Your support helps me create better content. Buy me a coffee.