## Problem
The current workflow for debugging a failed agent session is manual: run `agent-strace replay` or `agent-strace stats`, read the output, and reason about what went wrong. For complex sessions with hundreds of tool calls this is slow: you're scanning a trace log the same way you'd scan a stack trace without a debugger.
The gap: traces are captured but not queryable. You can't ask "why did the agent call the same tool three times?" or "what was in context when it made the wrong file edit?" without reading the full replay.
## Proposed solution
Expose the `.agent-traces/` session store as an MCP server. This lets any MCP-compatible client (Claude Code, Cursor, VS Code Copilot) query traces conversationally — the agent reads its own execution history and surfaces what went wrong.
Example interaction:
**User:** Look at the most recent session and tell me why it called `bash` three times in a row.

**Agent (via MCP):** The agent called `bash` at t=12s, t=34s, and t=41s.
- First call: `npm test` — exit code 1, stderr: "Cannot find module './auth'"
- Second call: `ls src/` — checking if the file exists
- Third call: `npm test` again — same failure
It never resolved the missing module. The session ended without a fix.
No manual log reading. The debugging agent does the analysis.
## MCP tools to expose
| Tool | Description |
| --- | --- |
| `list_sessions` | List captured sessions with metadata (timestamp, tool call count, cost, duration) |
| `get_session` | Return the full event stream for a session as structured JSON |
| `search_events` | Filter events by tool name, exit code, file path, or time range |
| `get_session_summary` | Plain-English summary of what the agent did (wraps `agent-strace explain`) |
| `diff_sessions` | Compare two sessions: what changed between runs |
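As a sketch of the `search_events` semantics: assuming each trace event is a flat JSON object with `tool`, `exit_code`, `path`, and `t` (seconds) fields — assumed names, the real event schema may differ — the filter is a simple conjunction of whichever criteria were supplied:

```python
# Sketch of search_events filtering. The event shape ({"tool", "exit_code",
# "path", "t"}) is an assumption about the trace format, not the real schema.
def search_events(events, tool=None, exit_code=None, path=None, t_range=None):
    """Return events matching every filter that was given; omitted filters match all."""
    def matches(e):
        if tool is not None and e.get("tool") != tool:
            return False
        if exit_code is not None and e.get("exit_code") != exit_code:
            return False
        if path is not None and path not in e.get("path", ""):
            return False
        if t_range is not None:
            lo, hi = t_range
            if not (lo <= e.get("t", 0) <= hi):
                return False
        return True
    return [e for e in events if matches(e)]

events = [
    {"tool": "bash", "exit_code": 1, "path": "", "t": 12},
    {"tool": "bash", "exit_code": 0, "path": "src/", "t": 34},
    {"tool": "edit", "exit_code": 0, "path": "src/auth.js", "t": 40},
]
failed = search_events(events, tool="bash", exit_code=1)
# failed -> only the t=12 bash event
```

This is what lets the debugging agent ask "show me every `bash` call that exited non-zero" instead of replaying the whole session.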
## Implementation sketch
```python
# agent_strace/mcp_server.py
from mcp.server import Server
from mcp.types import Tool, TextContent

from .store import SessionStore

server = Server("agent-trace")
store = SessionStore()


@server.list_tools()
async def list_tools():
    return [
        Tool(name="list_sessions", description="List captured agent sessions", inputSchema={...}),
        Tool(name="get_session", description="Get full event stream for a session", inputSchema={...}),
        Tool(name="search_events", description="Filter events by tool, file, or exit code", inputSchema={...}),
        Tool(name="get_session_summary", description="Plain-English summary of a session", inputSchema={...}),
        Tool(name="diff_sessions", description="Compare two sessions", inputSchema={...}),
    ]


@server.call_tool()
async def call_tool(name: str, arguments: dict):
    if name == "list_sessions":
        sessions = store.list()
        return [TextContent(type="text", text=format_sessions(sessions))]
    # ...
```
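The `format_sessions` helper used in the sketch is left undefined. A minimal version might look like this, assuming session records expose `id`, `started_at`, `tool_calls`, `cost_usd`, and `duration_s` fields — assumed names, not the real `SessionStore` schema:

```python
def format_sessions(sessions):
    """Render session metadata as one line per session for the MCP client.
    Field names (id, started_at, tool_calls, cost_usd, duration_s) are
    assumptions -- adapt to the actual SessionStore record shape."""
    lines = [
        f"{s['id']}  {s['started_at']}  "
        f"{s['tool_calls']} calls  "
        f"${s['cost_usd']:.2f}  {s['duration_s']}s"
        for s in sessions
    ]
    return "\n".join(lines) or "No sessions captured."
```

Plain text (rather than JSON) keeps the output directly readable by the client-side agent without another parsing step.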
CLI entry point:

```shell
agent-strace mcp              # start MCP server (stdio transport)
agent-strace mcp --port 8080  # SSE transport
```
Claude Code config:

```json
{
  "mcpServers": {
    "agent-trace": {
      "command": "agent-strace",
      "args": ["mcp"]
    }
  }
}
```
## Why this matters
- Debugging a failed agent session currently requires reading raw JSON or replay output. An MCP interface lets the debugging agent do that work.
- `diff_sessions` is particularly useful for iterative agent development: run the agent, change a prompt or tool, run again, ask "what changed?"
- Zero new dependencies — the MCP SDK is already a common dependency in this ecosystem, and the session store is already structured JSON.
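As a starting point, `diff_sessions` could diff per-tool call counts between the two event streams; the `{"tool": ...}` event shape here is an assumption about the trace format:

```python
from collections import Counter

def diff_sessions(events_a, events_b):
    """Compare two sessions by per-tool call counts.
    Returns {tool: (count_in_a, count_in_b)} for every tool whose usage
    changed. Event shape ({"tool": ...}) is assumed, not the real schema."""
    a = Counter(e["tool"] for e in events_a)
    b = Counter(e["tool"] for e in events_b)
    return {
        tool: (a[tool], b[tool])
        for tool in sorted(a.keys() | b.keys())
        if a[tool] != b[tool]
    }

run1 = [{"tool": "bash"}, {"tool": "bash"}, {"tool": "edit"}]
run2 = [{"tool": "bash"}, {"tool": "edit"}, {"tool": "read"}]
# diff_sessions(run1, run2) -> {"bash": (2, 1), "read": (0, 1)}
```

A richer diff (changed files, exit codes, cost delta) can layer on top of the same two-stream comparison.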
## Acceptance criteria
- `agent-strace mcp` starts an MCP server over stdio
- `list_sessions`, `get_session`, `search_events`, `get_session_summary` tools implemented
- `diff_sessions` compares two sessions and returns a structured diff