mcp2cli — Cut MCP Token Costs by 96–99% with CLI-Based Tool Discovery

Connecting MCP servers injects all tool schemas into context every turn—362,000 tokens wasted for 120 tools over 25 turns. mcp2cli solves this with CLI-based on-demand discovery, cutting costs by 96–99%. Here's how it works and when to use it.

Overview

As Model Context Protocol (MCP) becomes the standard for connecting AI agents to external tools and APIs, a new bottleneck has emerged: tool schema token waste.

When you connect MCP servers, every tool’s JSON schema is injected into the LLM’s context window on every single conversation turn—whether or not the model uses those tools. With 30 tools at roughly 121 tokens per schema, that’s about 3,630 tokens burned per turn doing nothing. Scale to 120 tools over a 25-turn conversation and you’re looking at 362,000 tokens consumed by schemas alone.

mcp2cli solves this with CLI-based on-demand tool discovery. Instead of preloading all schemas upfront, the model queries --list and --help only when needed—cutting token waste by 96–99%.

The Problem: Cost of Upfront Schema Injection

How Traditional MCP Integration Works

[Conversation Start]
System Prompt + ALL tool schemas (30 tools × 121 tokens = 3,630 tokens)

Turn 1: User message + ALL schemas re-injected
Turn 2: User message + ALL schemas re-injected
Turn 3: User message + ALL schemas re-injected
...
Turn 15: User message + ALL schemas re-injected

30 tools × 121 tokens × 15 turns = 54,450 tokens consumed by schemas alone—regardless of whether the model called any tool on that turn.
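The arithmetic behind the figure above can be checked in a few lines (assuming, as this article does, ~121 tokens per tool schema):

```python
# Per-turn and total schema cost under traditional MCP preloading.
# Assumed constant: ~121 tokens per tool schema (from this article).
tools, tokens_per_schema, turns = 30, 121, 15

per_turn = tools * tokens_per_schema  # schema tokens re-injected every turn
total = per_turn * turns              # cost over the whole conversation

print(per_turn, total)  # 3630 54450
```

Because the schemas are re-sent on every turn, the total is the product of tool count and turn count—doubling either one doubles the waste.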

Measured Token Costs

Scenario                     Native MCP       mcp2cli        Savings
30 tools, 15 turns           54,525 tokens    2,309 tokens   96%
80 tools, 20 turns           193,240 tokens   3,871 tokens   98%
120 tools, 25 turns          362,350 tokens   5,181 tokens   99%
200-endpoint API, 25 turns   358,425 tokens   3,925 tokens   99%

The more tools you have and the longer the conversation, the greater the savings. At enterprise scale, this changes the cost structure entirely.

How mcp2cli Works

Core Idea: Schema Preload → On-Demand Discovery

[Traditional approach]
All tool schemas → always in context

[mcp2cli approach]
Tool list only (~16 tokens/tool) → model calls --help only when needed (~120 tokens/tool)

The LLM receives only tool names and brief descriptions via mcp2cli --list, then calls mcp2cli [tool-name] --help only when it actually wants to use that tool.
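The on-demand cost model above can be sketched numerically. The constants come from this article (~16 tokens per tool in the `--list` output, ~120 tokens per `--help` call); the function itself is illustrative, not part of mcp2cli:

```python
# Illustrative token-cost model for on-demand discovery.
# Assumed constants from this article: ~16 tokens/tool listed,
# ~120 tokens per --help call (paid once per tool actually used).
def on_demand_tokens(total_tools, tools_used, list_tokens=16, help_tokens=120):
    # The model reads the tool listing, then fetches --help only as needed.
    return total_tools * list_tokens + tools_used * help_tokens

# 30 tools available, of which the model only ever calls 3:
print(on_demand_tokens(30, tools_used=3))  # 840
```

Compare 840 tokens with the ~54,450 that preloading 30 schemas over 15 turns would cost: the savings come almost entirely from not paying for tools the model never touches.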

Four-Stage Processing Pipeline

1. Spec Loading
   Read MCP server URL or OpenAPI spec file

2. Tool Definition Extraction
   Parse tool names, parameters, and descriptions from schema

3. Argument Parser Generation
   Dynamically create CLI commands for each tool (no codegen, runtime only)

4. Execution
   Forward as HTTP or tool-call request to the MCP server
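Stages 2–3 can be sketched in Python. The tool-definition shape below is a simplified assumption for illustration; the real mcp2cli parses full JSON Schema from the MCP server or OpenAPI spec:

```python
import argparse

# Hypothetical, simplified tool definitions as stage 2 might extract them.
# Real mcp2cli derives these from the server's JSON Schema at runtime.
TOOLS = [
    {"name": "search-files", "description": "Search files by pattern",
     "params": {"pattern": {"type": str, "required": True},
                "limit":   {"type": int, "required": False}}},
]

def build_parser(tools):
    """Stage 3: dynamically create one CLI subcommand per tool (no codegen)."""
    parser = argparse.ArgumentParser(prog="mcp2cli-sketch")
    sub = parser.add_subparsers(dest="tool", required=True)
    for tool in tools:
        p = sub.add_parser(tool["name"], help=tool["description"])
        for pname, spec in tool["params"].items():
            p.add_argument(f"--{pname}", type=spec["type"],
                           required=spec["required"])
    return parser

args = build_parser(TOOLS).parse_args(["search-files", "--pattern", "*.py"])
print(args.tool, args.pattern)  # search-files *.py
```

Because the parser is rebuilt from the spec on each run, a new tool on the server becomes a new subcommand with no generated files to maintain—which is exactly what makes the zero-codegen property possible.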

Installation and Basic Usage

# Install
pip install mcp2cli

# List available tools (~16 tokens/tool)
mcp2cli --mcp https://server.url/sse --list

# Get specific tool details (~120 tokens, only when needed)
mcp2cli --mcp https://server.url/sse search-files --help

# Use with OpenAPI spec
mcp2cli --spec api.json --base-url https://api.com list-items

# TOON format (Token-Optimized Output Notation)
mcp2cli --mcp https://server.url/sse search-files --toon

Zero Codegen: Why It Matters

mcp2cli reads specs at runtime and generates the CLI dynamically. No code generation means:

  • New tools added to the MCP server appear automatically on the next invocation
  • No spec files to commit or maintain
  • Intelligent 1-hour TTL caching prevents unnecessary reloads

Engineering Manager’s Perspective: Adoption Strategy

Calculating the Business Impact

Assume your team operates an AI agent with 100 MCP tools integrated.

Native MCP (1,000 conversations/day, 20 turns avg):
  100 tools × 121 tokens × 20 turns × 1,000 conversations = 242,000,000 tokens/day

After mcp2cli (98% savings):
  ~4,840,000 tokens/day

Difference: 237,160,000 tokens/day
At Claude Sonnet 4.6 pricing ($3/MTok): ~$711/day saved, ~$21,000/month
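The back-of-envelope estimate above, reproduced as code (assumed inputs: 121 tokens per schema, a 98% savings rate from the benchmark table, and $3 per million input tokens):

```python
# Back-of-envelope business impact estimate (all constants are the
# article's assumptions, not measured values for any specific deployment).
tools, tokens_per_schema, turns, conversations = 100, 121, 20, 1_000

native = tools * tokens_per_schema * turns * conversations  # tokens/day
after = round(native * 0.02)        # ~2% of native at 98% savings
saved = native - after
cost_saved = saved / 1_000_000 * 3.0  # $3 per million input tokens

print(native, after, saved, round(cost_saved))
# 242000000 4840000 237160000 711
```

At ~$711/day that is roughly $21,000/month—sensitive mostly to conversation volume and tool count, so rerun the numbers with your own traffic before quoting a figure.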

Beyond cost, keeping the context window clean directly affects model reasoning quality and latency.

Understanding the Trade-offs

mcp2cli isn’t a silver bullet. The Hacker News discussion (133 upvotes, 92 comments) surfaced key concerns:

Additional round-trips: The model needs a separate --help call the first time it uses a tool. For short tasks, this can actually increase latency.

Discovery error potential: The model might try incorrect tool names or misinterpret --help output.

Optimal use cases: 20+ tools, conversations of 10+ turns, and most tools going unused on most turns.
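That guidance implies a rough token-only break-even check. The sketch below uses this article's assumed constants and deliberately ignores the latency cost of extra `--help` round-trips, which is the other half of the trade-off:

```python
# Rough token-only break-even heuristic (assumed constants from this
# article: ~121 tokens/schema, ~16 tokens/tool listed, ~120 tokens/--help).
# Latency from extra round-trips is NOT modeled here.
def lazy_discovery_saves_tokens(tools, turns, tools_used):
    preload = tools * 121 * turns             # every schema, every turn
    lazy = tools * 16 + tools_used * 120      # one listing + used --help calls
    return lazy < preload

print(lazy_discovery_saves_tokens(5, 1, 5),    # tiny task, all tools used
      lazy_discovery_saves_tokens(30, 15, 3))  # long session, few tools used
# False True
```

On short tasks that touch every tool, preloading can actually be cheaper—which is the scenario the Hacker News critics had in mind.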

Adoption Roadmap

Step 1: Measure
   Track actual tool schema token consumption in current AI agents
   (Check system prompt token count in conversation logs)

Step 2: Pilot
   Apply mcp2cli to the one agent with the most MCP tool integrations
   A/B test: compare cost, accuracy, and latency

Step 3: Analyze
   Identify which tools are actually used frequently
   Consider hybrid: preload frequent tools, on-demand for the rest

Step 4: Scale
   Roll out to all agents after validating effectiveness

What the Hacker News Community Said

Reactions were mixed, which is worth understanding:

Positive responses:

  • “Applying the lazy loading pattern to LLM tool discovery is elegant”
  • “This could be a game changer for large-scale MCP environments”

Critical responses:

  • “Token savings don’t automatically guarantee better outputs”
  • “Extra round-trips for tool discovery increase latency and introduce potential for errors”
  • “Benchmarks skew toward ideal scenarios”

In practice, validate against your actual workloads rather than trusting benchmarks at face value.

Production Considerations

MCP Server Type Compatibility

✅ HTTP/SSE MCP servers: Full support
✅ stdio MCP servers: Supported
✅ OpenAPI JSON/YAML: Supported
⚠️  Auth-required servers: Built-in OAuth support, requires configuration

Caching Strategy

# Default caching: 1-hour TTL
mcp2cli --mcp server.url --cache-ttl 3600 --list

# Force refresh
mcp2cli --mcp server.url --no-cache --list

Use --no-cache in development where specs change frequently; increase TTL in stable production environments.
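The TTL behavior described above can be sketched as follows. This is a hypothetical illustration of a time-based spec cache; mcp2cli's actual implementation may differ:

```python
import time

# Hypothetical sketch of a TTL spec cache like the one described above.
# Not mcp2cli's actual code; the injectable clock exists only for testing.
class SpecCache:
    def __init__(self, ttl=3600, clock=time.monotonic):
        self.ttl, self.clock, self._store = ttl, clock, {}

    def get(self, url, fetch):
        entry = self._store.get(url)
        if entry and self.clock() - entry[0] < self.ttl:
            return entry[1]                  # fresh: reuse the cached spec
        spec = fetch(url)                    # stale or missing: reload
        self._store[url] = (self.clock(), spec)
        return spec

# Demonstrate expiry with a fake clock instead of waiting an hour.
now = [0.0]
cache = SpecCache(ttl=3600, clock=lambda: now[0])
calls = []
fetch = lambda url: calls.append(url) or {"tools": []}

cache.get("https://server.url/sse", fetch)   # first call: fetches
cache.get("https://server.url/sse", fetch)   # within TTL: cached
now[0] = 4000.0
cache.get("https://server.url/sse", fetch)   # TTL expired: refetches

print(len(calls))  # 2
```

Setting `ttl=0` here degenerates to the `--no-cache` behavior (every call refetches), which is why a short or zero TTL suits development while a long TTL suits stable production specs.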

Takeaway

The problem mcp2cli solves is simple but real. As the MCP ecosystem matures and the number of integrated servers and tools grows, schema injection costs compound: they scale with the product of tool count and conversation length, growing with every tool and every turn.

  • 30 tools: May not justify the change
  • 80+ tools: Monthly costs start to look noticeably different
  • 120+ tools: This becomes a survival strategy, not just optimization

Beyond token savings, keeping the context window clean has a positive effect on actual model reasoning quality. Reducing noise in the context window is becoming as important as prompt engineering itself.



About the Author

Kim Jangwook

Full-Stack Developer specializing in AI/LLM

Building AI agent systems, LLM applications, and automation solutions with 10+ years of web development experience. Sharing practical insights on Claude Code, MCP, and RAG systems.