annabellscha/mcp-code-execution-agent-tracing

Observing Code Execution Agents with MCP

Code execution with MCP is powerful—agents that write code to call tools directly can be 100x cheaper and handle workflows that would blow past context limits. But there's a gap many teams miss: observability doesn't come for free.

This example demonstrates the Traced Gateway Pattern—a simple instrumentation layer that gives you full visibility without sacrificing efficiency.

What This Example Adds Beyond the Anthropic Article

The Anthropic article Code execution with MCP explains how to make agents more efficient. What it doesn't cover is what happens when you lose visibility into your agent's operations.

The Anthropic article teaches:

  • Progressive disclosure (tools as files on filesystem)
  • On-demand tool loading
  • Code generation for tool calling
  • Context efficiency through local data processing

This example adds:

  • The Traced Gateway Pattern for instrumenting code execution
  • How to trace what happens inside the "black box"
  • A toggle (--no-mcp-tracing) to demonstrate the observability gap

What's Real vs. Simulated

This demo simulates the MCP pattern for educational purposes. Here's what's actually happening:

Real:

  • Filesystem exploration of servers/ directory (reads actual local files)
  • LLM code generation via Anthropic API (real Claude call)
  • Langfuse tracing (creates real observable traces)
  • The execution pattern and architecture

Simulated/Mocked:

  • MCP tool implementations - no actual MCP server connections
  • Tool data - get_sheet.py generates random mock data (100 fake rows) with time.sleep() to simulate API latency
  • No actual Google Drive or Salesforce API calls

The purpose is to demonstrate the pattern - progressive disclosure, on-demand tool loading, code generation, and local data processing - without requiring real external service credentials. The observability instrumentation is fully real and shows exactly how you would trace a production implementation.
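For illustration, a mocked tool such as servers/google_drive/get_sheet.py could look like the following sketch (the function signature, field names, and delay value are assumptions based on the description above, not the repo's exact code):

```python
import random
import time

def get_sheet(input_data: dict) -> dict:
    """Mock Google Drive tool: returns fake rows instead of calling a real API."""
    time.sleep(0.2)  # simulate API latency, as the demo does
    rows = [
        {"id": i, "value": random.randint(0, 1000)}
        for i in range(100)  # 100 fake rows, as in the demo
    ]
    return {"sheet_id": input_data.get("sheet_id"), "rows": rows}
```

Because the mock returns a plain dict, it can be swapped for a real MCP client call later without changing the gateway or the generated code.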

The Observability Problem

With traditional tool calling, every operation flows through the LLM:

LLM → Tool Call → Result → LLM → Tool Call → Result → LLM
        ↑                    ↑                    ↑
    (visible)            (visible)           (visible)

Your tracing tool sees everything. With code execution:

LLM → Generate Code → [SANDBOX EXECUTION] → Summary
                            ↑
                    (invisible black box)

You've traded visibility for efficiency. And that trade has real consequences:

  • Silent runaway costs — An agent with a bug calls Salesforce 1,000 times instead of 10. You don't notice until the API bill arrives.
  • Undetectable regressions — Tool usage patterns shift after a prompt change, but your evals only check final outputs.
  • Unanswerable questions — Finance asks why costs spiked. Your traces show "code_execution: success" with no children.
  • Impossible debugging — A user reports wrong records updated. The trace shows success but no details about what actually happened.

This isn't a minor inconvenience—it's a production risk.

The Solution: The Traced Gateway Pattern

The fix is simple: enforce a traced gateway at the execution boundary. Every MCP tool call must pass through an instrumented wrapper:

def call_mcp_tool(tool_name: str, input_data: dict) -> dict:
    """Every MCP call passes through here—with tracing."""
    with langfuse.start_as_current_observation(
        as_type="tool",
        name=f"mcp.{tool_name}",
        input=input_data,
    ) as span:
        result = TOOL_IMPLEMENTATIONS[tool_name](input_data)
        span.update(output={"success": True})
        return result

The agent's generated code calls call_mcp_tool(), which is injected into the sandbox. No direct tool access—everything flows through the Traced Gateway.
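Inside the sandbox, the LLM-generated code might look like the snippet below. The tool names follow the demo's servers/ layout; the stub gateway at the top exists only so the snippet runs stand-alone — in the demo, the traced call_mcp_tool is injected instead:

```python
# Stub gateway so this snippet is self-contained; the demo injects the
# traced call_mcp_tool into the sandbox globals instead.
def call_mcp_tool(tool_name: str, input_data: dict) -> dict:
    if tool_name == "google_drive.get_sheet":
        return {"rows": [{"id": 1, "status": "stale"}, {"id": 2, "status": "fresh"}]}
    return {"success": True}

# What LLM-generated code might look like (it varies run to run):
sheet = call_mcp_tool("google_drive.get_sheet", {"sheet_id": "q3-pipeline"})

# Filter locally -- the full row data never enters the model's context.
stale = [row for row in sheet["rows"] if row.get("status") == "stale"]

call_mcp_tool("salesforce.batch_update", {"records": stale})
print(f"Updated {len(stale)} record(s)")
```

Note that both tool calls go through call_mcp_tool, so both appear as child spans of the code execution in the trace.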

What the Traced Gateway Reveals

With the pattern implemented, your traces show the full hierarchy:

Agent: efficient-mcp-workflow
├─ Tool: explore_servers              ← Progressive disclosure
├─ Tool: read_tool_definition         ← On-demand loading (x2)
├─ Generation: generate_code          ← LLM generates code
└─ Span: code_execution               ← Sandbox execution
    ├─ Tool: mcp.google_drive.get_sheet   ← MCP tool VISIBLE!
    └─ Tool: mcp.salesforce.batch_update  ← MCP tool VISIBLE!

Without instrumentation, the code_execution span would show no children.

Quick Start

# Install dependencies
pip install langfuse anthropic python-dotenv

# Configure environment
cp .env.example .env
# Edit .env with your API keys:
#   LANGFUSE_SECRET_KEY=sk-lf-...
#   LANGFUSE_PUBLIC_KEY=pk-lf-...
#   ANTHROPIC_API_KEY=sk-ant-...

# Run with full tracing (default)
python main.py

# Run WITHOUT MCP tracing to see the black box
python main.py --no-mcp-tracing

Key Learnings

1. The Traced Gateway is Non-Negotiable

Every MCP tool must be called through the instrumented gateway. No exceptions:

# mcp_client.py
def call_mcp_tool(tool_name: str, input_data: dict) -> dict:
    with langfuse.start_as_current_observation(
        as_type="tool",
        name=f"mcp.{tool_name}",
        input=input_data,
    ) as span:
        # ... execute and trace

2. Context Propagation Through Sandbox

The sandbox execution environment needs the tracing context. We inject call_mcp_tool into the exec globals:

exec_globals = {
    "__builtins__": __builtins__,
    "call_mcp_tool": call_mcp_tool,  # Injected with tracing
}
exec(generated_code, exec_globals)
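If you want the sandbox slightly tighter, one common hardening step is to pass a restricted builtins dict instead of the full __builtins__. This is a sketch, not the demo's code, and exec-based restrictions are not a real security boundary — treat it as a guard rail only:

```python
import builtins

# Allow only a small whitelist of builtins inside the sandbox. NOT a real
# security boundary (exec sandboxes are escapable), just a guard rail.
SAFE_BUILTINS = {
    name: getattr(builtins, name)
    for name in ("len", "range", "print", "min", "max", "sum",
                 "sorted", "dict", "list", "enumerate", "zip")
}

def run_sandboxed(code: str, call_mcp_tool) -> dict:
    """Execute generated code with restricted builtins and the traced gateway."""
    exec_globals = {
        "__builtins__": SAFE_BUILTINS,
        "call_mcp_tool": call_mcp_tool,  # the traced gateway stays injected
    }
    exec(code, exec_globals)
    return exec_globals
```

For production, a process-level sandbox (container, gVisor, or a dedicated code-execution service) is the appropriate boundary; the whitelist above only catches accidents.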

3. Hierarchical Spans Show Causality

Wrapping code execution in a parent span ensures MCP calls nest correctly:

with langfuse.start_as_current_observation(name="code_execution"):
    exec(code, exec_globals)  # MCP calls inside become children

4. Toggle Tracing to Prove the Gap

The --no-mcp-tracing flag demonstrates what happens without instrumentation:

# With tracing
code_execution
├─ mcp.google_drive.get_sheet
└─ mcp.salesforce.batch_update

# Without tracing
code_execution
└─ (nothing visible)
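One way to implement such a toggle (a hypothetical sketch, not necessarily the demo's exact flag handling) is to build the gateway as either a traced or a pass-through wrapper:

```python
def make_gateway(langfuse, tool_impls: dict, tracing: bool = True):
    """Return a call_mcp_tool function, traced or pass-through."""
    def traced(tool_name: str, input_data: dict) -> dict:
        with langfuse.start_as_current_observation(
            as_type="tool", name=f"mcp.{tool_name}", input=input_data
        ) as span:
            result = tool_impls[tool_name](input_data)
            span.update(output={"success": True})
            return result

    def untraced(tool_name: str, input_data: dict) -> dict:
        # Identical behavior, but no span is created -- the black box.
        return tool_impls[tool_name](input_data)

    return traced if tracing else untraced
```

Because both variants have the same signature, the generated code is unchanged; only the trace output differs, which is exactly the gap the flag is meant to demonstrate.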

File Structure

mcp-tracing-example/
├── main.py              # Workflow demonstrating all 4 steps
├── mcp_client.py        # Instrumented MCP client + code execution
├── servers/             # MCP tools as explorable files
│   ├── google_drive/
│   │   ├── get_sheet.py
│   │   └── list_files.py
│   └── salesforce/
│       ├── batch_update.py
│       └── update_record.py
├── .env.example
└── README.md

Token Efficiency (for context)

The Anthropic article's efficiency claims hold:

Approach          Tokens     Why
Traditional       ~11,300    All tools + all data pass through the model
Code Execution    ~1,500     Only the needed tools + a final summary

Savings: ~87% - and with this example, you maintain full observability.

Key Takeaway

MCP code execution makes agents cheaper and more powerful. But observability doesn't come for free—you have to re-introduce it deliberately at the execution boundary.

If code executes outside the LLM, tracing must move there too.

The Traced Gateway pattern solves this. Efficiency and observability aren't a trade-off—you can have both.
