Anthropic's Code Execution with MCP: A Paradigm Shift in AI Tool Integration
Explore how Anthropic's Code Execution with MCP achieves 98.7% token reduction and 60% faster execution through sandboxed code-based tool orchestration.
Overview
In November 2025, Anthropic unveiled a groundbreaking innovation that fundamentally changes how AI agents interact with external tools and systems. Code Execution with MCP (Model Context Protocol) represents a paradigm shift from traditional sequential tool calling to code-based orchestration, achieving remarkable efficiency gains: a 98.7% reduction in token consumption (from 150,000 to just 2,000 tokens) and 60% faster execution in complex multi-tool workflows.
This innovation addresses one of the most critical bottlenecks in AI agent systems: the exponential growth of context window consumption as agents orchestrate multiple tools. By allowing AI models to write and execute code that calls tools directly within a secure sandbox, Anthropic has effectively solved the “context explosion” problem that has plagued enterprise AI implementations.
For developers building production AI systems, this represents more than just an optimization—it’s a fundamental architectural shift that enables entirely new classes of applications, from complex data processing pipelines to privacy-preserving enterprise workflows.
What is the Model Context Protocol (MCP)?
Before diving into Code Execution, it’s essential to understand the foundation: the Model Context Protocol.
Launched in November 2024, MCP is an open-source standard created by Anthropic to provide a unified interface for connecting AI models to external data sources, tools, and systems. Think of it as a “USB-C for AI”—a universal connector that eliminates the need for custom integrations for each tool or data source.
The protocol standardizes how AI assistants discover, authenticate, and interact with external systems through three core primitives:
- Resources: Data sources (files, databases, APIs) that models can read
- Tools: Functions that models can execute to take actions
- Prompts: Reusable templates for common interactions
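The three primitives can be pictured as plain data. A minimal sketch (the class and field names here are illustrative, not the actual MCP schema):

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    """A readable data source, addressed by URI."""
    uri: str
    description: str

@dataclass
class Tool:
    """An executable function the model may invoke."""
    name: str
    description: str
    parameters: dict = field(default_factory=dict)

@dataclass
class Prompt:
    """A reusable interaction template."""
    name: str
    template: str

# A toy server capability listing built from the three primitives
capabilities = {
    "resources": [Resource("file:///data/users.csv", "User export")],
    "tools": [Tool("database.query", "Run a SQL query", {"sql": "string"})],
    "prompts": [Prompt("summarize", "Summarize the following: {text}")],
}
```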
MCP quickly gained traction, with over 10,000 community-built servers and integration into major platforms including Zed, Replit, Codeium, and Sourcegraph. However, as usage scaled, a critical limitation became apparent: the traditional tool-calling approach simply couldn’t scale efficiently for complex, multi-step workflows.
Limitations of Traditional Tool Calling
To understand why Code Execution represents such a breakthrough, we need to examine the fundamental limitations of traditional tool calling.
The Sequential Call Problem
In the conventional approach, when an AI model needs to use multiple tools, it follows this pattern:
- Model analyzes the task and decides which tool to call
- Model sends the tool call request to the application
- Application executes the tool and returns results
- Results are added to the conversation context
- Model reads all previous context plus new results
- Model decides on the next action
- Repeat for each tool call
For a simple task requiring 15 tool calls, this creates a cascade of problems:
Token Explosion: Each tool’s output is added to the context window, which must be re-read on every subsequent call. A workflow that processes 100 database records can balloon to 150,000 tokens—consuming the entire context window of most models.
Latency Multiplication: Each tool call requires a full model inference cycle. With network latency and processing time, 15 tool calls can take 30-45 seconds, making real-time applications impossible.
Context Pollution: Intermediate results (like individual database rows or file contents) permanently occupy context space, even when only summary information is needed for the final output.
Control Flow Limitations: Complex logic like loops, conditionals, and error handling must be implemented through sequential model decisions, which is both slow and unreliable.
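A back-of-the-envelope model shows why context re-reading dominates: if every call re-reads the full conversation so far, cumulative tokens processed grow quadratically with the number of calls. A rough sketch (all figures illustrative):

```python
def cumulative_tokens(n_calls, base_context=5_000, tokens_per_result=500):
    """Estimate total tokens processed when each tool call re-reads
    the full context (base prompt plus all earlier tool results)."""
    total = 0
    context = base_context
    for _ in range(n_calls):
        total += context              # model re-reads everything so far
        context += tokens_per_result  # result is appended to context
    return total

# 15 calls already processes far more than the visible context suggests
print(cumulative_tokens(15))  # 127500
```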
The Real-World Impact
In production environments, these limitations manifest in critical ways:
- Cost: Processing large datasets can cost hundreds of dollars in API calls due to repeated context processing
- Reliability: Long chains of sequential calls compound failure probability, since each additional step is another chance for the workflow to break
- Privacy: All intermediate data must pass through the model provider’s API, creating compliance issues for sensitive data
- Scalability: Context window limits create hard caps on workflow complexity
These aren’t theoretical problems—they’re the primary reasons why many ambitious AI agent projects fail to reach production.
The Core Innovation of Code Execution with MCP
Code Execution with MCP introduces a fundamentally different approach: instead of the model making sequential tool calls, it writes code that orchestrates the tools, and that code executes in a secure sandbox.
Here’s the revolutionary workflow:
- Tool Discovery: Available tools are discovered from the filesystem, not registered through API calls
- Code Generation: The AI model writes Python or TypeScript code that uses the tools
- Sandboxed Execution: Code runs in an isolated environment with access to tool wrappers
- Summary Return: Only the final summary or result is returned to the model—not intermediate outputs
This seemingly simple change unlocks massive benefits:
Progressive Disclosure
Tools are loaded on-demand as the code imports them, rather than all tools being described upfront. For a system with 100 available tools, only the 3-5 actually used consume context tokens.
Local Control Flow
Loops, conditionals, error handling, and complex logic execute locally in code—not through sequential model decisions. This is both faster and more reliable.
# Traditional approach: 15+ model calls to process 100 records
# Code Execution approach: single code generation, local execution
success_count = 0
for record in database.query("SELECT * FROM users LIMIT 100"):
    if record['status'] == 'active':
        result = api.update_user(record['id'], {'last_checked': now()})
        if result.error:
            log_error(result.error)
        else:
            success_count += 1
return f"Updated {success_count} active users"
Privacy Preservation
Intermediate data (like individual database records or file contents) never leaves the sandbox. Only the final summary is sent to the model provider, enabling compliant processing of sensitive data.
State Persistence
Variables and data structures persist within the execution session, enabling complex multi-stage workflows without re-fetching data.
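A toy model of that session behavior: variables set by one code execution remain visible to the next, so later turns can reuse earlier work. This sketch uses a bare `exec` with a shared namespace purely for illustration; the real sandbox is far more restricted.

```python
class ExecutionSession:
    """Toy model of a sandbox session: globals set by one execution
    remain visible to the next (names here are illustrative)."""
    def __init__(self):
        self.globals = {}

    def execute(self, code: str):
        exec(code, self.globals)
        return self.globals.get("result")

session = ExecutionSession()
session.execute("catalog = list(range(10_000)); result = len(catalog)")
# Second execution reuses `catalog` without re-fetching it
cheap = session.execute("result = len([p for p in catalog if p < 100])")
print(cheap)  # 100
```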
Technical Architecture
Understanding the technical implementation reveals how Anthropic achieved these dramatic improvements while maintaining security and developer ergonomics.
Filesystem-Based Tool Discovery
Traditional MCP servers register tools through API declarations. Code Execution instead uses filesystem-based discovery:
mcp-server/
├── tools/
│   ├── database/
│   │   ├── query.ts
│   │   └── update.ts
│   ├── api/
│   │   └── fetch.ts
│   └── file/
│       ├── read.ts
│       └── write.ts
└── index.ts
Each .ts file exports a function with standardized metadata:
// tools/database/query.ts
export async function query(sql: string): Promise<Record[]> {
  // Implementation
}

query.description = "Execute SQL query and return records";
query.parameters = {
  sql: { type: "string", description: "SQL query to execute" }
};
This approach enables progressive loading: tools are only loaded when imported, reducing initial context consumption by 60-80%.
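The discovery step itself is simple: walk the `tools/` tree and map `namespace.tool` names to files, loading nothing until a tool is actually imported. A minimal sketch of that walk (the layout matches the tree above; the function is illustrative, not the SDK's API):

```python
import os
import tempfile

def discover_tools(root: str) -> dict:
    """Map 'namespace.tool' -> file path by walking a tools/ tree,
    mirroring the filesystem layout shown above."""
    tools = {}
    for dirpath, _dirs, files in os.walk(root):
        for f in files:
            if f.endswith(".ts"):
                ns = os.path.relpath(dirpath, root).replace(os.sep, ".")
                name = f[:-3] if ns in ("", ".") else f"{ns}.{f[:-3]}"
                tools[name] = os.path.join(dirpath, f)
    return tools

# Build a throwaway tree matching the layout above and discover it
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "database"))
    open(os.path.join(root, "database", "query.ts"), "w").close()
    print(discover_tools(root))  # maps 'database.query' to its file path
```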
Tool Wrapper Generation
When code execution begins, the sandbox automatically generates lightweight wrappers for each tool:
// Auto-generated wrapper for database.query
import { mcp } from '@anthropic/sandbox-runtime';

export async function query(sql: string): Promise<Record[]> {
  return await mcp.callTool('database.query', { sql });
}
These wrappers provide a native programming interface while routing calls through the MCP protocol under the hood. Developers write normal code; the sandbox handles protocol translation.
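The wrapper-generation idea reduces to a closure over the tool name and the transport. A minimal sketch (`make_wrapper` and `fake_call_tool` are illustrative stand-ins, not the sandbox runtime's API):

```python
def make_wrapper(tool_name: str, call_tool):
    """Generate a native-looking function that routes every call
    through the protocol layer, like the auto-generated wrappers above."""
    def wrapper(**params):
        return call_tool(tool_name, params)
    wrapper.__name__ = tool_name.replace(".", "_")
    return wrapper

# Stand-in for the sandbox's protocol transport
def fake_call_tool(name, params):
    return {"tool": name, "params": params}

query = make_wrapper("database.query", fake_call_tool)
print(query(sql="SELECT 1"))
```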
Sandboxed Execution Environment
Security is paramount when executing AI-generated code. Anthropic’s sandbox runtime provides multiple layers of isolation:
Process Isolation: Uses bubblewrap (Linux) or seatbelt (macOS) to create a restricted process namespace
Filesystem Restrictions: Code can only access whitelisted paths, preventing unauthorized file access
Network Control: Outbound connections are limited to approved MCP servers
Resource Limits: CPU, memory, and execution time are capped to prevent resource exhaustion
API Rate Limiting: Tool calls are throttled to prevent abuse
Example sandbox configuration:
import { createSandbox } from '@anthropic/sandbox-runtime';

const sandbox = createSandbox({
  runtime: 'node',
  timeout: 30000, // 30-second max execution
  memory: '512MB',
  filesystem: {
    readOnly: ['/tools'],
    readWrite: ['/tmp']
  },
  network: {
    allowedHosts: ['mcp.example.com']
  }
});

const result = await sandbox.execute(code, {
  tools: ['database', 'api']
});
Dramatic Performance Improvements
Anthropic’s benchmarks demonstrate the real-world impact of this architectural shift.
98.7% Token Reduction
In a representative workflow processing 100 database records with filtering and API updates:
Traditional Tool Calling:
- Initial context: 5,000 tokens
- 100 database reads: 50,000 tokens (500 tokens per record)
- 15 API updates: 15,000 tokens
- Model reasoning between calls: 80,000 tokens
- Total: 150,000 tokens
Code Execution:
- Initial context: 1,000 tokens
- Generated code: 500 tokens
- Final summary: 500 tokens
- Total: 2,000 tokens
This 98.7% reduction translates directly to:
- 75x lower API costs ($7.50 vs $0.10 per workflow at Claude 3.5 Sonnet pricing)
- Ability to process 75x more data within the same context window
- Support for longer conversations without context overflow
60% Faster Execution
Latency improvements are equally dramatic:
Traditional Approach:
- 15 sequential tool calls × 2 seconds each = 30 seconds
- Plus network latency and queuing: 35-45 seconds total
Code Execution:
- Code generation: 3 seconds
- Local execution: 10 seconds (tools called in parallel/locally)
- Summary generation: 2 seconds
- Total: 15 seconds (60% faster)
For user-facing applications, this transforms experiences that feel broken (45-second waits) into responsive interactions (15 seconds).
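The 60% figure checks out against the midpoint of the traditional range:

```python
# Figures from the latency breakdown above (seconds)
sequential = 15 * 2 + 7.5  # 15 calls at ~2 s each, plus queuing (midpoint of 35-45 s)
code_exec = 3 + 10 + 2     # generation + local execution + summary

speedup = (sequential - code_exec) / sequential
print(f"{speedup:.0%} faster")  # 60% faster
```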
Key Features and Benefits
Beyond raw performance, Code Execution with MCP enables capabilities that were previously impractical or impossible.
Progressive Tool Loading
Instead of describing all available tools upfront, tools are loaded only when imported:
// Only loads and describes the 'database' tool
import { query } from './tools/database';
const records = await query("SELECT * FROM users");
For enterprise systems with hundreds of available tools, this reduces initial context consumption by 80-95%, making comprehensive tool libraries practical.
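The mechanism behind that saving is lazy loading: a tool's description enters the context only when the tool is first used. A toy model (the `LazyToolbox` class and its loader functions are illustrative):

```python
class LazyToolbox:
    """Load a tool's description only on first use; a toy model of
    progressive disclosure."""
    def __init__(self, loaders):
        self._loaders = loaders  # name -> zero-arg loader function
        self._loaded = {}

    def get(self, name):
        if name not in self._loaded:
            self._loaded[name] = self._loaders[name]()
        return self._loaded[name]

    @property
    def context_cost(self):
        # Only tools actually loaded consume context tokens
        return sum(len(desc) for desc in self._loaded.values())

toolbox = LazyToolbox(
    {f"tool{i}": (lambda i=i: f"description of tool{i}") for i in range(100)}
)
toolbox.get("tool3")
print(len(toolbox._loaded))  # 1 of 100 descriptions loaded
```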
Local Control Flow
Complex programming constructs execute locally without model involvement:
// This entire loop executes locally—not 100 sequential model calls
const results = [];
for (const user of users) {
  try {
    if (user.status === 'active' && user.lastLogin < thirtyDaysAgo) {
      const updated = await api.updateUser(user.id, {
        status: 'inactive'
      });
      results.push(updated);
    }
  } catch (error) {
    logger.error(`Failed to update user ${user.id}: ${error.message}`);
  }
}
return `Deactivated ${results.length} inactive users`;
This is not only faster but also more reliable—control flow is deterministic code execution, not probabilistic model decisions.
Privacy Preservation
Sensitive data never leaves the sandbox:
// Medical records are processed locally, only counts are returned
import { queryDatabase } from './tools/database';
import { encryptData } from './tools/crypto';

const patients = await queryDatabase("SELECT * FROM patients WHERE condition = 'diabetes'");

// Patient data stays in the sandbox
const encrypted = patients.map(p => encryptData(p.medicalRecord));

// Only a summary is returned to the model
return {
  totalPatients: patients.length,
  avgAge: patients.reduce((sum, p) => sum + p.age, 0) / patients.length,
  encryptedRecordsStored: encrypted.length
};
This enables HIPAA, GDPR, and other compliance requirements that prohibit sending sensitive data to third-party APIs.
State Persistence
Variables persist across the execution session:
// First execution: Load and cache data
const productCatalog = await database.loadProducts();
return "Loaded 10,000 products";
// Second execution: Reuse cached data (no re-fetch)
const filteredProducts = productCatalog.filter(p => p.price < 100);
return `Found ${filteredProducts.length} products under $100`;
This enables multi-turn conversations that build on previous work without redundant data fetching.
Real-World Use Cases
Early adopters have deployed Code Execution with MCP across diverse domains.
Development Tool Integration
Zed Editor: Uses Code Execution to implement “AI Pair Programmer” that can refactor entire codebases. By generating code that traverses file trees, applies transformations, and runs tests, Zed achieves complex refactoring in seconds rather than minutes.
Replit: Enables “natural language deployments” where users describe infrastructure requirements and the AI generates Terraform code that provisions resources, all within a secure sandbox.
Sourcegraph: Implements semantic code search across millions of repositories by generating code that queries their graph database and post-processes results locally.
Enterprise Data Processing
Block (formerly Square): Processes financial transactions for fraud detection. Code Execution allows their AI to analyze thousands of transactions locally, apply custom business logic, and return only risk scores—keeping transaction details private.
Apollo GraphQL: Generates API integration code that queries GraphQL endpoints, transforms data, and updates internal systems—automating workflows that previously required manual engineering work.
Cognizant: Built an “Intelligent Document Processor” that extracts data from invoices, validates against business rules, and updates ERP systems—processing 100+ documents per run with 98% token savings.
Document and Database Operations
A common pattern is the “cross-system workflow”:
// Read from Google Drive
import { listFiles, readFile } from './tools/google-drive';

const invoices = await listFiles({ folder: 'Invoices', since: '2025-01-01' });

// Process and transform
const processed = [];
for (const invoice of invoices) {
  const content = await readFile(invoice.id);
  const data = parseInvoice(content);
  if (data.amount > 10000) {
    // Write to Salesforce
    const sfRecord = await salesforce.createOpportunity({
      name: `Large Invoice ${data.invoiceNumber}`,
      amount: data.amount,
      stage: 'Proposal'
    });
    processed.push(sfRecord);
  }
}
return `Processed ${invoices.length} invoices, created ${processed.length} opportunities`;
This workflow would require 200+ tool calls traditionally (list files, read each, create records). With Code Execution, it’s a single code generation plus local execution.
Security Considerations
Executing AI-generated code introduces significant security challenges that must be addressed systematically.
Key Risk Factors
Anthropic’s security research identified critical vulnerabilities:
Command Injection (43% vulnerability rate): AI models frequently generate code vulnerable to injection attacks:
// Vulnerable code AI might generate
const userInput = request.params.filename;
await exec(`cat ${userInput}`); // Can execute arbitrary commands

// Secure version
const allowedFiles = ['data.csv', 'report.txt'];
if (!allowedFiles.includes(userInput)) {
  throw new Error('Invalid file');
}
await readFile(userInput);
Context Window Attacks: Malicious tool descriptions can inject code into the generation process
Data Leakage: Without proper sandboxing, code could exfiltrate sensitive data
Resource Exhaustion: Infinite loops or memory-intensive operations can DoS the system
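One simple mitigation for runaway loops is to enforce a work budget inside the execution itself, in addition to the sandbox's hard CPU and memory caps. A minimal sketch (the class and limits are illustrative, not the sandbox runtime's mechanism):

```python
class BudgetExceeded(RuntimeError):
    pass

class ExecutionBudget:
    """Abort when a step budget is exhausted; a crude stand-in for
    the CPU/memory/time caps a real sandbox enforces."""
    def __init__(self, max_steps):
        self.remaining = max_steps

    def tick(self):
        self.remaining -= 1
        if self.remaining < 0:
            raise BudgetExceeded("step budget exhausted")

budget = ExecutionBudget(max_steps=1_000)
try:
    while True:  # a would-be infinite loop
        budget.tick()
except BudgetExceeded as e:
    print(f"stopped: {e}")
```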
Essential Security Measures
Anthropic recommends a defense-in-depth approach:
Sandbox Isolation: Mandatory for production deployments
const sandbox = createSandbox({
  runtime: 'deno', // Secure-by-default runtime
  permissions: {
    read: ['/app/data'],
    write: ['/tmp'],
    net: ['mcp.example.com'],
    env: false,
    run: false // No subprocess execution
  }
});
Input Validation: All tool parameters must be validated before execution
export async function query(sql: string): Promise<Record[]> {
  // Whitelist-based validation
  if (!sql.match(/^SELECT .+ FROM (users|products|orders)$/i)) {
    throw new Error('Invalid SQL query');
  }
  return executeQuery(sql);
}
Rate Limiting: Prevent abuse through API quotas
const limiter = createRateLimiter({
  toolCalls: { max: 100, window: '1m' },
  execution: { max: 10, window: '1m' },
  tokens: { max: 1000000, window: '1h' }
});
Audit Logging: All tool calls and code executions must be logged for security monitoring
Centralized Governance: MCP server registry with approved tools only
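Audit logging in particular is easy to bolt onto the tool layer: wrap each tool so every invocation records its name, arguments, and duration. A minimal sketch (the decorator and log schema are illustrative):

```python
import functools
import time

audit_log = []

def audited(tool_name):
    """Record every tool invocation with arguments and duration,
    the kind of trail security monitoring needs."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(**params):
            start = time.time()
            result = fn(**params)
            audit_log.append({
                "tool": tool_name,
                "params": params,
                "duration_s": round(time.time() - start, 3),
            })
            return result
        return wrapper
    return decorate

@audited("database.query")
def query(sql):
    return [{"id": 1}]  # stand-in for a real database call

query(sql="SELECT 1")
print(audit_log[0]["tool"])  # database.query
```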
Implementation Guide
Getting started with Code Execution requires setting up both server and client components.
Setting Up MCP Server
Create a basic MCP server with filesystem-based tools:
// server/index.ts
import { MCPServer } from '@anthropic/mcp-sdk';
import { loadTools } from '@anthropic/mcp-tools';

const server = new MCPServer({
  name: 'enterprise-tools',
  version: '1.0.0'
});

// Automatically load tools from filesystem
const tools = await loadTools('./tools');
server.registerTools(tools);

server.listen(3000);
Creating Tool Wrappers
Define tools with proper metadata:
// tools/database/query.ts
import { z } from 'zod';
import { createTool } from '@anthropic/mcp-tools';

export const query = createTool({
  name: 'database.query',
  description: 'Execute SQL query against production database',
  parameters: z.object({
    sql: z.string().describe('SQL SELECT query'),
    limit: z.number().max(1000).default(100)
  }),
  async execute({ sql, limit }) {
    // Validate query
    if (!sql.toUpperCase().startsWith('SELECT')) {
      throw new Error('Only SELECT queries allowed');
    }
    // Execute with safety limits
    const results = await db.query(sql + ` LIMIT ${limit}`);
    return results;
  }
});
Enabling Sandboxing
Configure the sandbox runtime:
// client/sandbox.ts
import { createSandbox } from '@anthropic/sandbox-runtime';
import Anthropic from '@anthropic/sdk';

const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY
});

const sandbox = createSandbox({
  runtime: 'node',
  timeout: 30000,
  memory: '1GB',
  filesystem: {
    readOnly: ['/tools', '/node_modules'],
    readWrite: ['/tmp', '/workspace']
  },
  network: {
    allowedHosts: [
      'mcp.company.com',
      'api.internal.company.com'
    ]
  },
  env: {
    NODE_ENV: 'production',
    MCP_SERVER_URL: 'http://mcp.company.com'
  }
});

// Execute AI-generated code
const response = await client.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 4096,
  messages: [{
    role: 'user',
    content: 'Query the database for all active users and update their last_checked timestamp'
  }],
  tools: [{
    type: 'code_execution',
    sandbox: sandbox.config
  }]
});

// Extract and execute the generated code
const codeBlock = response.content.find(c => c.type === 'code');
const result = await sandbox.execute(codeBlock.code);
console.log(result.output);
Current Limitations
While Code Execution represents a major advancement, several limitations exist in the current implementation:
Infrastructure Complexity: Running sandboxed execution requires additional infrastructure (container runtime, resource management) compared to simple API-based tool calling. This increases operational overhead.
Performance Overhead for Simple Cases: For workflows requiring only 1-2 tool calls, the code generation and sandbox setup overhead can make Code Execution slower than direct tool calling.
Security Vulnerabilities: Despite sandboxing, AI-generated code has a 43% vulnerability rate in Anthropic’s testing. Production deployments require additional security measures beyond the sandbox.
Remote Server Limitations: Currently, Code Execution works best with local MCP servers. Remote server support (planned for 2025) requires additional authentication and networking complexity.
Debugging Challenges: When generated code fails, debugging requires examining AI-generated code, which can be complex and non-idiomatic.
Language Support: Initial release supports Python and TypeScript/Node.js. Other languages (Go, Rust, Java) are planned but not yet available.
Future Outlook
Anthropic’s roadmap for Code Execution and MCP indicates significant expansion in 2025-2026.
2025 Roadmap
Remote Server Support: Full support for remote MCP servers with OAuth 2.1 authentication, enabling enterprise SaaS integrations.
Multi-Language Execution: SDKs for Go, Rust, and Java are in development, enabling code generation in the best language for each task.
Enhanced Sandboxing: Integration with Firecracker and gVisor for improved isolation and performance.
Enterprise Features:
- Centralized tool governance and approval workflows
- Audit logging and compliance reporting
- Role-based access control for tools
- Cost management and chargeback
Observability: Distributed tracing, metrics, and debugging tools for production deployments.
Industry Adoption
Early indicators suggest rapid adoption:
- 10,000+ MCP servers built by the community
- Major IDE integrations: Zed, Replit, Codeium, Cursor
- Enterprise partnerships: Block, Apollo, Cognizant, BrowserBase
- Analyst predictions: Gartner forecasts 60% of enterprise AI projects will use code execution patterns by end of 2026
The protocol’s open-source nature and compelling performance benefits suggest it could become the de facto standard for AI-system integration—similar to how REST APIs became ubiquitous for web services.
Conclusion
Anthropic’s Code Execution with MCP represents a fundamental paradigm shift in how AI agents interact with tools and external systems. By moving from sequential tool calling to sandboxed code orchestration, it achieves:
- 98.7% token reduction, slashing costs by 75x
- 60% faster execution for complex workflows
- Privacy preservation through local data processing
- Enhanced reliability via deterministic control flow
- Progressive scaling from simple to complex tool ecosystems
For developers and enterprises building production AI systems, this innovation removes critical bottlenecks that previously limited agent capabilities. Workflows that were impossible due to context limits or cost prohibitive due to token consumption are now practical.
However, success requires careful attention to security (sandboxing, input validation, rate limiting), thoughtful tool design (clear interfaces, proper error handling), and operational maturity (monitoring, debugging, governance).
As the ecosystem matures and remote server support arrives in 2025, Code Execution with MCP has the potential to become the foundational layer for AI-powered automation—enabling the next generation of intelligent applications that seamlessly orchestrate complex, multi-system workflows while maintaining security, privacy, and cost-efficiency.
For teams considering adoption, the recommendation is clear: start with well-defined, high-value use cases (data processing, cross-system workflows), implement robust sandboxing and monitoring, and progressively expand as the platform matures. The performance benefits alone justify the investment, and the architectural advantages position you for the AI-native future.