Anthropic's Code Execution with MCP: A Paradigm Shift in AI Tool Integration

Explore how Anthropic's Code Execution with MCP achieves 98.7% token reduction and 60% faster execution through sandboxed code-based tool orchestration.

Overview

In November 2025, Anthropic unveiled a groundbreaking innovation that fundamentally changes how AI agents interact with external tools and systems. Code Execution with MCP (Model Context Protocol) represents a paradigm shift from traditional sequential tool calling to code-based orchestration, achieving remarkable efficiency gains: a 98.7% reduction in token consumption (from 150,000 to just 2,000 tokens) and 60% faster execution in complex multi-tool workflows.

This innovation addresses one of the most critical bottlenecks in AI agent systems: the exponential growth of context window consumption as agents orchestrate multiple tools. By allowing AI models to write and execute code that calls tools directly within a secure sandbox, Anthropic has effectively solved the “context explosion” problem that has plagued enterprise AI implementations.

For developers building production AI systems, this represents more than just an optimization—it’s a fundamental architectural shift that enables entirely new classes of applications, from complex data processing pipelines to privacy-preserving enterprise workflows.

What is the Model Context Protocol (MCP)?

Before diving into Code Execution, it’s essential to understand the foundation: the Model Context Protocol.

Launched in November 2024, MCP is an open-source standard created by Anthropic to provide a unified interface for connecting AI models to external data sources, tools, and systems. Think of it as a “USB-C for AI”—a universal connector that eliminates the need for custom integrations for each tool or data source.

The protocol standardizes how AI assistants discover, authenticate, and interact with external systems through three core primitives:

  • Resources: Data sources (files, databases, APIs) that models can read
  • Tools: Functions that models can execute to take actions
  • Prompts: Reusable templates for common interactions

MCP quickly gained traction, with over 10,000 community-built servers and integration into major platforms including Zed, Replit, Codeium, and Sourcegraph. However, as usage scaled, a critical limitation became apparent: the traditional tool-calling approach simply couldn’t scale efficiently for complex, multi-step workflows.

Limitations of Traditional Tool Calling

To understand why Code Execution represents such a breakthrough, we need to examine the fundamental limitations of traditional tool calling.

The Sequential Call Problem

In the conventional approach, when an AI model needs to use multiple tools, it follows this pattern:

  1. Model analyzes the task and decides which tool to call
  2. Model sends the tool call request to the application
  3. Application executes the tool and returns results
  4. Results are added to the conversation context
  5. Model reads all previous context plus new results
  6. Model decides on the next action
  7. Repeat for each tool call
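
The loop above can be sketched in code. This is an illustrative stand-in, not a real SDK: `callModel` and `executeTool` are hypothetical stubs for an LLM inference call and a tool runtime.

```typescript
// Sketch of the traditional sequential agent loop. callModel and executeTool
// are hypothetical stand-ins for an LLM API call and a tool runtime.
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelTurn = { toolCall?: ToolCall; finalAnswer?: string };

function runAgentLoop(
  callModel: (context: string[]) => ModelTurn,
  executeTool: (call: ToolCall) => string
): { answer: string; inferences: number } {
  const context: string[] = []; // grows with every tool result
  let inferences = 0;
  for (;;) {
    inferences++; // each step is a full inference over ALL accumulated context
    const turn = callModel(context);
    if (turn.finalAnswer !== undefined) {
      return { answer: turn.finalAnswer, inferences };
    }
    // Steps 3-4: execute the tool and append the result to the context,
    // which the model must re-read on every subsequent call.
    context.push(executeTool(turn.toolCall!));
  }
}
```

With 15 tool calls, this loop performs 16 model inferences, each re-reading every earlier tool result.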

For a simple task requiring 15 tool calls, this creates a cascade of problems:

Token Explosion: Each tool’s output is added to the context window, which must be re-read on every subsequent call. A workflow that processes 100 database records can balloon to 150,000 tokens—consuming the entire context window of most models.

Latency Multiplication: Each tool call requires a full model inference cycle. With network latency and processing time, 15 tool calls can take 30-45 seconds, making real-time applications impossible.

Context Pollution: Intermediate results (like individual database rows or file contents) permanently occupy context space, even when only summary information is needed for the final output.

Control Flow Limitations: Complex logic like loops, conditionals, and error handling must be implemented through sequential model decisions, which is both slow and unreliable.

The Real-World Impact

In production environments, these limitations manifest in critical ways:

  • Cost: Processing large datasets can cost hundreds of dollars in API calls due to repeated context processing
  • Reliability: Long chains of sequential calls increase failure points exponentially
  • Privacy: All intermediate data must pass through the model provider’s API, creating compliance issues for sensitive data
  • Scalability: Context window limits create hard caps on workflow complexity

These aren’t theoretical problems—they’re the primary reasons why many ambitious AI agent projects fail to reach production.

The Core Innovation of Code Execution with MCP

Code Execution with MCP introduces a fundamentally different approach: instead of the model making sequential tool calls, it writes code that orchestrates the tools, and that code executes in a secure sandbox.

Here’s the revolutionary workflow:

  1. Tool Discovery: Available tools are discovered from the filesystem, not registered through API calls
  2. Code Generation: The AI model writes Python or TypeScript code that uses the tools
  3. Sandboxed Execution: Code runs in an isolated environment with access to tool wrappers
  4. Summary Return: Only the final summary or result is returned to the model—not intermediate outputs

This seemingly simple change unlocks massive benefits:

Progressive Disclosure

Tools are loaded on-demand as the code imports them, rather than all tools being described upfront. For a system with 100 available tools, only the 3-5 actually used consume context tokens.

Local Control Flow

Loops, conditionals, error handling, and complex logic execute locally in code—not through sequential model decisions. This is both faster and more reliable.

# Traditional approach: 15+ model calls to process 100 records
# Code Execution approach: single code generation, local execution

# database, api, log_error, and now are tool wrappers provided by the sandbox
success_count = 0
for record in database.query("SELECT * FROM users LIMIT 100"):
    if record['status'] == 'active':
        result = api.update_user(record['id'], {'last_checked': now()})
        if result.error:
            log_error(result.error)
        else:
            success_count += 1

return f"Updated {success_count} active users"

Privacy Preservation

Intermediate data (like individual database records or file contents) never leaves the sandbox. Only the final summary is sent to the model provider, enabling compliant processing of sensitive data.

State Persistence

Variables and data structures persist within the execution session, enabling complex multi-stage workflows without re-fetching data.

Technical Architecture

Understanding the technical implementation reveals how Anthropic achieved these dramatic improvements while maintaining security and developer ergonomics.

Filesystem-Based Tool Discovery

Traditional MCP servers register tools through API declarations. Code Execution instead uses filesystem-based discovery:

mcp-server/
├── tools/
│   ├── database/
│   │   ├── query.ts
│   │   └── update.ts
│   ├── api/
│   │   └── fetch.ts
│   └── file/
│       ├── read.ts
│       └── write.ts
└── index.ts

Each .ts file exports a function with standardized metadata:

// tools/database/query.ts
export async function query(sql: string): Promise<Record<string, unknown>[]> {
  // Implementation
}

query.description = "Execute SQL query and return records";
query.parameters = {
  sql: { type: "string", description: "SQL query to execute" }
};

This approach enables progressive loading: tools are only loaded when imported, reducing initial context consumption by 60-80%.
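
A minimal sketch of what progressive loading buys (the class and method names here are illustrative, not the actual runtime): tool descriptions are read lazily, so unused tools never consume context.

```typescript
// Lazy tool registry: a tool's description is loaded (and costs context)
// only the first time generated code imports it. Illustrative sketch only.
class LazyToolRegistry {
  private loaded = new Map<string, string>();

  constructor(private catalog: Map<string, () => string>) {}

  // Called when generated code imports a tool by name.
  describe(name: string): string {
    const cached = this.loaded.get(name);
    if (cached !== undefined) return cached;
    const loader = this.catalog.get(name);
    if (!loader) throw new Error(`unknown tool: ${name}`);
    const description = loader(); // context cost is paid here, on first use
    this.loaded.set(name, description);
    return description;
  }

  loadedCount(): number {
    return this.loaded.size;
  }
}
```

A catalog of 100 tools in which generated code imports only three leaves the other 97 descriptions unloaded.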

Tool Wrapper Generation

When code execution begins, the sandbox automatically generates lightweight wrappers for each tool:

// Auto-generated wrapper for database.query
import { mcp } from '@anthropic/sandbox-runtime';

export async function query(sql: string): Promise<Record<string, unknown>[]> {
  return await mcp.callTool('database.query', { sql });
}

These wrappers provide a native programming interface while routing calls through the MCP protocol under the hood. Developers write normal code; the sandbox handles protocol translation.

Sandboxed Execution Environment

Security is paramount when executing AI-generated code. Anthropic’s sandbox runtime provides multiple layers of isolation:

Process Isolation: Uses bubblewrap (Linux) or seatbelt (macOS) to create a restricted process namespace

Filesystem Restrictions: Code can only access whitelisted paths, preventing unauthorized file access

Network Control: Outbound connections are limited to approved MCP servers

Resource Limits: CPU, memory, and execution time are capped to prevent resource exhaustion

API Rate Limiting: Tool calls are throttled to prevent abuse

Example sandbox configuration:

import { createSandbox } from '@anthropic/sandbox-runtime';

const sandbox = createSandbox({
  runtime: 'node',
  timeout: 30000,  // 30 second max execution
  memory: '512MB',
  filesystem: {
    readOnly: ['/tools'],
    readWrite: ['/tmp']
  },
  network: {
    allowedHosts: ['mcp.example.com']
  }
});

const result = await sandbox.execute(code, {
  tools: ['database', 'api']
});

Dramatic Performance Improvements

Anthropic’s benchmarks demonstrate the real-world impact of this architectural shift.

98.7% Token Reduction

In a representative workflow processing 100 database records with filtering and API updates:

Traditional Tool Calling:

  • Initial context: 5,000 tokens
  • 100 database reads: 50,000 tokens (500 tokens per record)
  • 15 API updates: 15,000 tokens
  • Model reasoning between calls: 80,000 tokens
  • Total: 150,000 tokens

Code Execution:

  • Initial context: 1,000 tokens
  • Generated code: 500 tokens
  • Final summary: 500 tokens
  • Total: 2,000 tokens

This 98.7% reduction translates directly to:

  • 75x lower API costs ($7.50 vs $0.10 per workflow at Claude 3.5 Sonnet pricing)
  • Ability to process 75x more data within the same context window
  • Support for longer conversations without context overflow
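
These figures can be sanity-checked with simple arithmetic; all per-step numbers below are the ones quoted above.

```typescript
// Re-deriving the headline numbers from the per-step figures above.
const traditionalTokens = 5_000 + 50_000 + 15_000 + 80_000; // = 150,000
const codeExecTokens = 1_000 + 500 + 500;                   // = 2,000

const reductionPct = (1 - codeExecTokens / traditionalTokens) * 100; // ≈ 98.7
const costRatio = traditionalTokens / codeExecTokens;                // = 75

console.log(`${reductionPct.toFixed(1)}% fewer tokens, ${costRatio}x cheaper`);
// → 98.7% fewer tokens, 75x cheaper
```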

60% Faster Execution

Latency improvements are equally dramatic:

Traditional Approach:

  • 15 sequential tool calls × 2 seconds each = 30 seconds
  • Plus network latency and queuing: 35-45 seconds total

Code Execution:

  • Code generation: 3 seconds
  • Local execution: 10 seconds (tools called in parallel/locally)
  • Summary generation: 2 seconds
  • Total: 15 seconds (60% faster)
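
The same back-of-the-envelope check works for latency, using the midpoint of the quoted 35-45 second range as the traditional baseline.

```typescript
// Latency comparison from the figures above.
const traditionalSeconds = (35 + 45) / 2; // midpoint of the quoted range = 40
const codeExecSeconds = 3 + 10 + 2;       // generation + execution + summary = 15

const speedupPct = (1 - codeExecSeconds / traditionalSeconds) * 100; // = 62.5
console.log(`${speedupPct}% faster`); // roughly the quoted ~60%
```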

For user-facing applications, this transforms experiences that feel broken (45-second waits) into responsive interactions (15 seconds).

Key Features and Benefits

Beyond raw performance, Code Execution with MCP enables capabilities that were previously impractical or impossible.

Progressive Tool Loading

Instead of describing all available tools upfront, tools are loaded only when imported:

// Only loads and describes the 'database' tool
import { query } from './tools/database';

const records = await query("SELECT * FROM users");

For enterprise systems with hundreds of available tools, this reduces initial context consumption by 80-95%, making comprehensive tool libraries practical.

Local Control Flow

Complex programming constructs execute locally without model involvement:

// This entire loop executes locally—not 100 sequential model calls
const results = [];
for (const user of users) {
  try {
    if (user.status === 'active' && user.lastLogin < thirtyDaysAgo) {
      const updated = await api.updateUser(user.id, {
        status: 'inactive'
      });
      results.push(updated);
    }
  } catch (error) {
    logger.error(`Failed to update user ${user.id}: ${error.message}`);
  }
}

return `Deactivated ${results.length} inactive users`;

This is not only faster but also more reliable—control flow is deterministic code execution, not probabilistic model decisions.

Privacy Preservation

Sensitive data never leaves the sandbox:

// Medical records are processed locally, only counts are returned
import { queryDatabase } from './tools/database';
import { encryptData } from './tools/crypto';

const patients = await queryDatabase("SELECT * FROM patients WHERE condition = 'diabetes'");

// Patient data stays in sandbox
const encrypted = patients.map(p => encryptData(p.medicalRecord));

// Only summary is returned to model
return {
  totalPatients: patients.length,
  avgAge: patients.reduce((sum, p) => sum + p.age, 0) / patients.length,
  encryptedRecordsStored: encrypted.length
};

This enables HIPAA, GDPR, and other compliance requirements that prohibit sending sensitive data to third-party APIs.

State Persistence

Variables persist across the execution session:

// First execution: Load and cache data
const productCatalog = await database.loadProducts();
return "Loaded 10,000 products";

// Second execution: Reuse cached data (no re-fetch)
const filteredProducts = productCatalog.filter(p => p.price < 100);
return `Found ${filteredProducts.length} products under $100`;

This enables multi-turn conversations that build on previous work without redundant data fetching.

Real-World Use Cases

Early adopters have deployed Code Execution with MCP across diverse domains.

Development Tool Integration

Zed Editor: Uses Code Execution to implement an “AI Pair Programmer” that can refactor entire codebases. By generating code that traverses file trees, applies transformations, and runs tests, Zed achieves complex refactoring in seconds rather than minutes.

Replit: Enables “natural language deployments” where users describe infrastructure requirements and the AI generates Terraform code that provisions resources, all within a secure sandbox.

Sourcegraph: Implements semantic code search across millions of repositories by generating code that queries their graph database and post-processes results locally.

Enterprise Data Processing

Block (formerly Square): Processes financial transactions for fraud detection. Code Execution allows their AI to analyze thousands of transactions locally, apply custom business logic, and return only risk scores—keeping transaction details private.

Apollo GraphQL: Generates API integration code that queries GraphQL endpoints, transforms data, and updates internal systems—automating workflows that previously required manual engineering work.

Cognizant: Built an “Intelligent Document Processor” that extracts data from invoices, validates against business rules, and updates ERP systems—processing 100+ documents per run with 98% token savings.

Document and Database Operations

A common pattern is the “cross-system workflow”:

// Read from Google Drive
import { listFiles, readFile } from './tools/google-drive';

const invoices = await listFiles({ folder: 'Invoices', since: '2025-01-01' });

// Process and transform
const processed = [];
for (const invoice of invoices) {
  const content = await readFile(invoice.id);
  const data = parseInvoice(content);

  if (data.amount > 10000) {
    // Write to Salesforce
    const sfRecord = await salesforce.createOpportunity({
      name: `Large Invoice ${data.invoiceNumber}`,
      amount: data.amount,
      stage: 'Proposal'
    });
    processed.push(sfRecord);
  }
}

return `Processed ${invoices.length} invoices, created ${processed.length} opportunities`;

This workflow would require 200+ tool calls traditionally (list files, read each, create records). With Code Execution, it’s a single code generation plus local execution.
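
The call count follows directly from the workflow shape (assuming a 100-invoice batch and, in the worst case, every invoice exceeding the $10,000 threshold):

```typescript
// Tool-call count for the traditional approach to the workflow above.
const invoiceCount = 100;   // assumed batch size
const largeInvoices = 100;  // worst case: every invoice exceeds $10,000
const traditionalCalls = 1 + invoiceCount + largeInvoices; // list + read each + create each
console.log(traditionalCalls); // → 201
```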

Security Considerations

Executing AI-generated code introduces significant security challenges that must be addressed systematically.

Key Risk Factors

Anthropic’s security research identified critical vulnerabilities:

Command Injection (43% vulnerability rate): AI models frequently generate code vulnerable to injection attacks:

// Vulnerable code AI might generate
const userInput = request.params.filename;
await exec(`cat ${userInput}`);  // Can execute arbitrary commands

// Secure version
const allowedFiles = ['data.csv', 'report.txt'];
if (!allowedFiles.includes(userInput)) {
  throw new Error('Invalid file');
}
await readFile(userInput);

Context Window Attacks: Malicious tool descriptions can inject code into the generation process

Data Leakage: Without proper sandboxing, code could exfiltrate sensitive data

Resource Exhaustion: Infinite loops or memory-intensive operations can DoS the system

Essential Security Measures

Anthropic recommends a defense-in-depth approach:

Sandbox Isolation: Mandatory for production deployments

const sandbox = createSandbox({
  runtime: 'deno',  // Secure-by-default runtime
  permissions: {
    read: ['/app/data'],
    write: ['/tmp'],
    net: ['mcp.example.com'],
    env: false,
    run: false  // No subprocess execution
  }
});

Input Validation: All tool parameters must be validated before execution

export async function query(sql: string): Promise<Record[]> {
  // Whitelist-based validation
  if (!sql.match(/^SELECT .+ FROM (users|products|orders)$/i)) {
    throw new Error('Invalid SQL query');
  }
  return executeQuery(sql);
}

Rate Limiting: Prevent abuse through API quotas

const limiter = createRateLimiter({
  toolCalls: { max: 100, window: '1m' },
  execution: { max: 10, window: '1m' },
  tokens: { max: 1000000, window: '1h' }
});

Audit Logging: All tool calls and code executions must be logged for security monitoring
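
A hedged sketch of what such logging can look like at the tool-wrapper level; `withAuditLog` is illustrative and not part of any published SDK.

```typescript
// Wraps a tool function so every call, success or failure, is recorded.
type AuditEntry = { tool: string; args: unknown; at: number; ok: boolean };

function withAuditLog<A, R>(
  tool: string,
  fn: (args: A) => R,
  log: AuditEntry[]
): (args: A) => R {
  return (args: A): R => {
    const at = Date.now();
    try {
      const result = fn(args);
      log.push({ tool, args, at, ok: true });
      return result;
    } catch (err) {
      log.push({ tool, args, at, ok: false }); // failures are logged too
      throw err;
    }
  };
}
```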

Centralized Governance: MCP server registry with approved tools only

Implementation Guide

Getting started with Code Execution requires setting up both server and client components.

Setting Up MCP Server

Create a basic MCP server with filesystem-based tools:

// server/index.ts
import { MCPServer } from '@anthropic/mcp-sdk';
import { loadTools } from '@anthropic/mcp-tools';

const server = new MCPServer({
  name: 'enterprise-tools',
  version: '1.0.0'
});

// Automatically load tools from filesystem
const tools = await loadTools('./tools');

server.registerTools(tools);

server.listen(3000);

Creating Tool Wrappers

Define tools with proper metadata:

// tools/database/query.ts
import { z } from 'zod';
import { createTool } from '@anthropic/mcp-tools';

export const query = createTool({
  name: 'database.query',
  description: 'Execute SQL query against production database',

  parameters: z.object({
    sql: z.string().describe('SQL SELECT query'),
    limit: z.number().max(1000).default(100)
  }),

  async execute({ sql, limit }) {
    // Validate query
    if (!sql.toUpperCase().startsWith('SELECT')) {
      throw new Error('Only SELECT queries allowed');
    }

    // Execute with safety limits
    const results = await db.query(sql + ` LIMIT ${limit}`);

    return results;
  }
});

Enabling Sandboxing

Configure the sandbox runtime:

// client/sandbox.ts
import { createSandbox } from '@anthropic/sandbox-runtime';
import Anthropic from '@anthropic/sdk';

const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY
});

const sandbox = createSandbox({
  runtime: 'node',
  timeout: 30000,
  memory: '1GB',

  filesystem: {
    readOnly: ['/tools', '/node_modules'],
    readWrite: ['/tmp', '/workspace']
  },

  network: {
    allowedHosts: [
      'mcp.company.com',
      'api.internal.company.com'
    ]
  },

  env: {
    NODE_ENV: 'production',
    MCP_SERVER_URL: 'http://mcp.company.com'
  }
});

// Execute AI-generated code
const response = await client.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 4096,
  messages: [{
    role: 'user',
    content: 'Query the database for all active users and update their last_checked timestamp'
  }],
  tools: [{
    type: 'code_execution',
    sandbox: sandbox.config
  }]
});

// Extract and execute code
const codeBlock = response.content.find(c => c.type === 'code');
if (!codeBlock) throw new Error('No code block in model response');
const result = await sandbox.execute(codeBlock.code);

console.log(result.output);

Current Limitations

While Code Execution represents a major advancement, several limitations exist in the current implementation:

Infrastructure Complexity: Running sandboxed execution requires additional infrastructure (container runtime, resource management) compared to simple API-based tool calling. This increases operational overhead.

Performance Overhead for Simple Cases: For workflows requiring only 1-2 tool calls, the code generation and sandbox setup overhead can make Code Execution slower than direct tool calling.

Security Vulnerabilities: Despite sandboxing, AI-generated code has a 43% vulnerability rate in Anthropic’s testing. Production deployments require additional security measures beyond the sandbox.

Remote Server Limitations: Currently, Code Execution works best with local MCP servers. Remote server support (planned for 2025) requires additional authentication and networking complexity.

Debugging Challenges: When generated code fails, debugging requires examining AI-generated code, which can be complex and non-idiomatic.

Language Support: Initial release supports Python and TypeScript/Node.js. Other languages (Go, Rust, Java) are planned but not yet available.

Future Outlook

Anthropic’s roadmap for Code Execution and MCP indicates significant expansion in 2025-2026.

2025 Roadmap

Remote Server Support: Full support for remote MCP servers with OAuth 2.1 authentication, enabling enterprise SaaS integrations.

Multi-Language Execution: SDKs for Go, Rust, and Java are in development, enabling code generation in the best language for each task.

Enhanced Sandboxing: Integration with Firecracker and gVisor for improved isolation and performance.

Enterprise Features:

  • Centralized tool governance and approval workflows
  • Audit logging and compliance reporting
  • Role-based access control for tools
  • Cost management and chargeback

Observability: Distributed tracing, metrics, and debugging tools for production deployments.

Industry Adoption

Early indicators suggest rapid adoption:

  • 10,000+ MCP servers built by the community
  • Major IDE integrations: Zed, Replit, Codeium, Cursor
  • Enterprise partnerships: Block, Apollo, Cognizant, BrowserBase
  • Analyst predictions: Gartner forecasts 60% of enterprise AI projects will use code execution patterns by end of 2026

The protocol’s open-source nature and compelling performance benefits suggest it could become the de facto standard for AI-system integration—similar to how REST APIs became ubiquitous for web services.

Conclusion

Anthropic’s Code Execution with MCP represents a fundamental paradigm shift in how AI agents interact with tools and external systems. By moving from sequential tool calling to sandboxed code orchestration, it achieves:

  • 98.7% token reduction, slashing costs by 75x
  • 60% faster execution for complex workflows
  • Privacy preservation through local data processing
  • Enhanced reliability via deterministic control flow
  • Progressive scaling from simple to complex tool ecosystems

For developers and enterprises building production AI systems, this innovation removes critical bottlenecks that previously limited agent capabilities. Workflows that were impossible due to context limits or cost prohibitive due to token consumption are now practical.

However, success requires careful attention to security (sandboxing, input validation, rate limiting), thoughtful tool design (clear interfaces, proper error handling), and operational maturity (monitoring, debugging, governance).

As the ecosystem matures and remote server support arrives in 2025, Code Execution with MCP has the potential to become the foundational layer for AI-powered automation—enabling the next generation of intelligent applications that seamlessly orchestrate complex, multi-system workflows while maintaining security, privacy, and cost-efficiency.

For teams considering adoption, the recommendation is clear: start with well-defined, high-value use cases (data processing, cross-system workflows), implement robust sandboxing and monitoring, and progressively expand as the platform matures. The performance benefits alone justify the investment, and the architectural advantages position you for the AI-native future.

About the Author

Kim Jangwook

Full-Stack Developer specializing in AI/LLM

Building AI agent systems, LLM applications, and automation solutions with 10+ years of web development experience. Sharing practical insights on Claude Code, MCP, and RAG systems.