Code execution with MCP is powerful—agents that write code to call tools directly can be 100x cheaper and handle workflows that would blow past context limits. But there's a gap many teams miss: observability doesn't come for free.
This example demonstrates the Traced Gateway Pattern—a simple instrumentation layer that gives you full visibility without sacrificing efficiency.
The Anthropic article Code execution with MCP explains how to make agents more efficient. What it doesn't cover is what happens when you lose visibility into your agent's operations.
The Anthropic article teaches:
- Progressive disclosure (tools as files on filesystem)
- On-demand tool loading
- Code generation for tool calling
- Context efficiency through local data processing
This example adds:
- The Traced Gateway Pattern for instrumenting code execution
- How to trace what happens inside the "black box"
- A toggle (`--no-mcp-tracing`) to demonstrate the observability gap
This demo simulates the MCP pattern for educational purposes. Here's what's actually happening:
Real:
- Filesystem exploration of the `servers/` directory (reads actual local files)
- LLM code generation via the Anthropic API (a real Claude call)
- Langfuse tracing (creates real observable traces)
- The execution pattern and architecture
Simulated/Mocked:
- MCP tool implementations - no actual MCP server connections
- Tool data - `get_sheet.py` generates random mock data (100 fake rows) with `time.sleep()` to simulate API latency
- No actual Google Drive or Salesforce API calls
The purpose is to demonstrate the pattern - progressive disclosure, on-demand tool loading, code generation, and local data processing - without requiring real external service credentials. The observability instrumentation is fully real and shows exactly how you would trace a production implementation.
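As a point of reference, a mocked tool like `get_sheet.py` can be as small as the sketch below (the field names and latency value are illustrative, not taken from the repo):

```python
import random
import time

def get_sheet(input_data: dict) -> dict:
    """Mocked Google Drive tool: returns 100 fake spreadsheet rows."""
    time.sleep(0.1)  # simulate API latency
    rows = [
        {"id": i, "amount": round(random.uniform(10, 500), 2)}
        for i in range(100)
    ]
    return {"sheet_id": input_data.get("sheet_id"), "rows": rows}
```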
With traditional tool calling, every operation flows through the LLM:
```
LLM → Tool Call → Result → LLM → Tool Call → Result → LLM
          ↑          ↑                ↑
      (visible)  (visible)        (visible)
```
Your tracing tool sees everything. With code execution:
```
LLM → Generate Code → [SANDBOX EXECUTION] → Summary
                              ↑
                   (invisible black box)
```
You've traded visibility for efficiency. And that trade has real consequences:
- Silent runaway costs — An agent with a bug calls Salesforce 1,000 times instead of 10. You don't notice until the API bill arrives.
- Undetectable regressions — Tool usage patterns shift after a prompt change, but your evals only check final outputs.
- Unanswerable questions — Finance asks why costs spiked. Your traces show "code_execution: success" with no children.
- Impossible debugging — A user reports wrong records updated. The trace shows success but no details about what actually happened.
This isn't a minor inconvenience—it's a production risk.
The fix is simple: enforce a traced gateway at the execution boundary. Every MCP tool call must pass through an instrumented wrapper:
```python
def call_mcp_tool(tool_name: str, input_data: dict) -> dict:
    """Every MCP call passes through here—with tracing."""
    with langfuse.start_as_current_observation(
        as_type="tool",
        name=f"mcp.{tool_name}",
        input=input_data,
    ) as span:
        result = TOOL_IMPLEMENTATIONS[tool_name](input_data)
        span.update(output={"success": True})
        return result
```

The agent's generated code calls `call_mcp_tool()`, which is injected into the sandbox. No direct tool access—everything flows through the Traced Gateway.
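The full loop can be seen end to end in a self-contained sketch. Here the Langfuse span is replaced by a simple call log, and `TOOL_IMPLEMENTATIONS` and the "generated" code are illustrative stand-ins, not the repo's actual implementations:

```python
call_log = []

# Stand-in tool registry; real implementations live in servers/
TOOL_IMPLEMENTATIONS = {
    "google_drive.get_sheet": lambda input_data: {"rows": [1, 2, 3]},
}

def call_mcp_tool(tool_name: str, input_data: dict) -> dict:
    call_log.append(tool_name)  # stand-in for a Langfuse span
    return TOOL_IMPLEMENTATIONS[tool_name](input_data)

# Code the LLM might generate — it only knows call_mcp_tool()
generated_code = """
result = call_mcp_tool("google_drive.get_sheet", {"sheet_id": "abc"})
row_count = len(result["rows"])
"""

exec_globals = {"__builtins__": __builtins__, "call_mcp_tool": call_mcp_tool}
exec(generated_code, exec_globals)

print(call_log)                   # → ['google_drive.get_sheet']
print(exec_globals["row_count"])  # → 3
```

Every tool invocation inside the sandbox leaves a record in `call_log` — in the real pattern, that record is a Langfuse observation.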
With the pattern implemented, your traces show the full hierarchy:
```
Agent: efficient-mcp-workflow
├─ Tool: explore_servers              ← Progressive disclosure
├─ Tool: read_tool_definition         ← On-demand loading (x2)
├─ Generation: generate_code          ← LLM generates code
└─ Span: code_execution               ← Sandbox execution
   ├─ Tool: mcp.google_drive.get_sheet   ← MCP tool VISIBLE!
   └─ Tool: mcp.salesforce.batch_update  ← MCP tool VISIBLE!
```
Without instrumentation, the code_execution span would show no children.
```bash
# Install dependencies
pip install langfuse anthropic python-dotenv

# Configure environment
cp .env.example .env
# Edit .env with your API keys:
# LANGFUSE_SECRET_KEY=sk-lf-...
# LANGFUSE_PUBLIC_KEY=pk-lf-...
# ANTHROPIC_API_KEY=sk-ant-...

# Run with full tracing (default)
python main.py

# Run WITHOUT MCP tracing to see the black box
python main.py --no-mcp-tracing
```

Every MCP tool must be called through the instrumented gateway. No exceptions:
```python
# mcp_client.py
def call_mcp_tool(tool_name: str, input_data: dict) -> dict:
    with langfuse.start_as_current_observation(
        as_type="tool",
        name=f"mcp.{tool_name}",
        input=input_data,
    ) as span:
        # ... execute and trace
```

The sandbox execution environment needs the tracing context. We inject `call_mcp_tool` into the exec globals:
```python
exec_globals = {
    "__builtins__": __builtins__,
    "call_mcp_tool": call_mcp_tool,  # Injected with tracing
}
exec(generated_code, exec_globals)
```

Wrapping code execution in a parent span ensures MCP calls nest correctly:
```python
with langfuse.start_as_current_observation(name="code_execution"):
    exec(code, exec_globals)  # MCP calls inside become children
```

The `--no-mcp-tracing` flag demonstrates what happens without instrumentation:
```
# With tracing
code_execution
├─ mcp.google_drive.get_sheet
└─ mcp.salesforce.batch_update

# Without tracing
code_execution
└─ (nothing visible)
```
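One way such a flag could be wired is to swap the Langfuse span for a no-op context when tracing is off. The helper name `execution_span` below is ours, not from the repo:

```python
from contextlib import nullcontext

def execution_span(langfuse, mcp_tracing: bool):
    """Return a context manager for the sandbox run.

    With tracing on, wrap execution in a Langfuse span so MCP calls
    nest as children; with tracing off, use a no-op context instead.
    """
    if mcp_tracing and langfuse is not None:
        return langfuse.start_as_current_observation(name="code_execution")
    return nullcontext()

# Usage (sketch):
# with execution_span(langfuse, mcp_tracing=not args.no_mcp_tracing):
#     exec(code, exec_globals)
```

Because the gateway's spans attach to whatever observation is current, disabling the parent span is enough to make the children vanish from the trace tree.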
```
mcp-tracing-example/
├── main.py          # Workflow demonstrating all 4 steps
├── mcp_client.py    # Instrumented MCP client + code execution
├── servers/         # MCP tools as explorable files
│   ├── google_drive/
│   │   ├── get_sheet.py
│   │   └── list_files.py
│   └── salesforce/
│       ├── batch_update.py
│       └── update_record.py
├── .env.example
└── README.md
```
The Anthropic article's efficiency claims hold:
| Approach | Tokens | Why |
|---|---|---|
| Traditional | ~11,300 | All tools + all data through model |
| Code Execution | ~1,500 | Only needed tools + summary only |
Savings: ~87% - and with this example, you maintain full observability.
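The savings figure follows directly from the table's token counts:

```python
traditional = 11_300  # tokens (from the comparison table)
code_exec = 1_500
savings = (traditional - code_exec) / traditional * 100
print(f"savings ≈ {savings:.0f}%")  # → savings ≈ 87%
```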
MCP code execution makes agents cheaper and more powerful. But observability doesn't come for free—you have to re-introduce it deliberately at the execution boundary.
If code executes outside the LLM, tracing must move there too.
The Traced Gateway pattern solves this. Efficiency and observability aren't a trade-off—you can have both.
- Code execution with MCP: Building more efficient agents - Anthropic (Nov 2025)
- Model Context Protocol - MCP Documentation
- Langfuse - Observability Platform