Problem
Callers building real-time context fill indicators (fuel gauges) must handle backend-specific differences that should be abstracted by the library. A broader review of the MCP diagnostic server (the first non-trivial consumer of agentrun) reveals additional abstraction leaks beyond context fill that force every consumer to manage engine-level complexity.
Current state — context fill
| Aspect | CLI (Claude) | ACP (OpenCode) |
| --- | --- | --- |
| Message type carrying signal | MessageText, MessageThinking | MessageContextWindow |
| ContextUsedTokens | Per-call fill (input+cache) | Session-level fill |
| ContextSizeTokens | 0 (not reported) | Reported (e.g., 1048576) |
| Can compute percentage? | No | Yes |
What callers have to do today
    // Must handle two different message types
    switch msg.Type {
    case agentrun.MessageContextWindow:
        // ACP: dedicated message, has both size and used
        pct := float64(msg.Usage.ContextUsedTokens) / float64(msg.Usage.ContextSizeTokens)
        updateFuelGauge(pct)
    case agentrun.MessageText, agentrun.MessageThinking:
        // CLI: usage piggybacks on content messages, no ContextSizeTokens
        if msg.Usage != nil && msg.Usage.ContextUsedTokens > 0 {
            // absolute number only; can't compute percentage
        }
    }
What callers should be able to do
    switch msg.Type {
    case agentrun.MessageContextWindow:
        pct := float64(msg.Usage.ContextUsedTokens) / float64(msg.Usage.ContextSizeTokens)
        updateFuelGauge(pct)
    }
Same message type, same fields, all backends.
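Even with a unified message type, a consumer still has to handle a size that is not reported (0). The following is a minimal sketch of such a guard, not the library's API: the `Usage` struct is a local stand-in that models only the two fields named in this issue, and `fillFraction` is a hypothetical helper name.

```go
package main

import "fmt"

// Usage is a local stand-in for the library's usage struct; only the two
// fields named in this issue are modeled (assumption).
type Usage struct {
	ContextUsedTokens int
	ContextSizeTokens int
}

// fillFraction (hypothetical helper) returns used/size and true when a
// percentage can be computed. It returns false when usage is missing or the
// size is not reported (0); callers then fall back to the absolute count.
func fillFraction(u *Usage) (float64, bool) {
	if u == nil || u.ContextSizeTokens <= 0 {
		return 0, false
	}
	return float64(u.ContextUsedTokens) / float64(u.ContextSizeTokens), true
}

func main() {
	samples := []*Usage{
		{ContextUsedTokens: 262144, ContextSizeTokens: 1048576},
		{ContextUsedTokens: 50000}, // size not reported
		nil,                        // no usage on this message
	}
	for _, u := range samples {
		if frac, ok := fillFraction(u); ok {
			fmt.Printf("fill: %.0f%%\n", frac*100)
		} else if u != nil {
			fmt.Printf("fill: %d tokens (size unknown)\n", u.ContextUsedTokens)
		} else {
			fmt.Println("fill: no usage")
		}
	}
}
```

Returning a boolean instead of NaN or a sentinel keeps the "no denominator" case explicit at the call site.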
Abstraction leaks found in the MCP server
Reviewing cmd/agentrun-mcp/ as the first real consumer exposed four complexity leaks; the first three go beyond context fill:
1. spawnPerTurn branching — duplicated in every consumer
The MCP server branches on spawnPerTurn in three places (doRunTurn, doSessionStart, doSessionSend):
    if spawnPerTurn {
        drainSpawnPerTurn(ctx, proc, handler)
    } else {
        turnErr = agentrun.RunTurn(ctx, proc, input.Prompt, handler)
    }
Every consumer (MCP server, Foundry, future orchestrators) must know whether a backend is spawn-per-turn vs streaming and branch accordingly. RunTurn should handle both engine types internally.
2. makeEngine returns a boolean consumers must carry forever
makeEngine returns (agentrun.Engine, bool, error) where the bool is spawnPerTurn. This gets stored on sessionEntry and checked on every session_send. The engine knows its own semantics, but that knowledge doesn't survive the agentrun.Engine interface boundary — consumers track it externally.
3. No unified turn summary
After a turn, callers want: text, thinking, tool calls, usage, stop reason, denials. Currently they iterate all messages and switch on type. The MCP server returns raw message arrays with no summarization. A TurnSummary type or helper would let consumers avoid reimplementing message-iteration logic.
4. Context fill requires filtering on 3+ message types
Per the core issue — CLI puts ContextUsedTokens on MessageText/MessageThinking, ACP puts it on MessageContextWindow. The MCP server passes these through without normalization. A caller monitoring fill has to know which types carry the signal per backend.
Underlying causes — context fill
- Different message types carry the signal. ACP emits a dedicated MessageContextWindow. CLI piggybacks ContextUsedTokens onto content messages. Callers must know which types to listen on per backend.
- CLI lacks ContextSizeTokens. Claude CLI doesn't report context window capacity mid-turn. Without the denominator, callers can't compute fill percentage. However, Claude CLI's result event includes modelUsage.<model>.contextWindow (e.g., 200000), which could be captured from init or first-event metadata.
- Semantic gap. CLI's ContextUsedTokens is per-API-call input-side fill (InputTokens + CacheReadTokens + CacheWriteTokens). ACP's is session-level context fill. Both answer "how full is the context?" but the measurement differs.
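The per-call arithmetic behind the CLI figure can be made concrete with a short worked example. This is a sketch only: `callUsage` is a hypothetical stand-in for the three counters named above, the token counts are made-up illustrative values, and 200000 is the example contextWindow value quoted in the text.

```go
package main

import "fmt"

// callUsage is a hypothetical stand-in holding the per-API-call counters
// named in this issue.
type callUsage struct {
	InputTokens      int
	CacheReadTokens  int
	CacheWriteTokens int
}

// contextUsed is the input-side fill for a single API call:
// InputTokens + CacheReadTokens + CacheWriteTokens.
func contextUsed(u callUsage) int {
	return u.InputTokens + u.CacheReadTokens + u.CacheWriteTokens
}

func main() {
	// Illustrative numbers, not measured data.
	u := callUsage{InputTokens: 1200, CacheReadTokens: 45000, CacheWriteTokens: 3800}
	used := contextUsed(u)

	const contextWindow = 200000 // example denominator from the text
	fmt.Printf("used=%d of %d (%.1f%%)\n", used, contextWindow, 100*float64(used)/contextWindow)
	// prints "used=50000 of 200000 (25.0%)"
}
```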
Design considerations
Context fill unification
- CLI backends that report per-call usage mid-turn (Claude) should synthesize MessageContextWindow messages, as ACP already does natively.
- ContextSizeTokens could be populated from model metadata if available (Claude CLI contextWindow field, ACP usage_update.size). When not available, consumers fall back to absolute values.
- The synthesized MessageContextWindow should not duplicate data already on content messages; it's a separate signal.
- Backends that don't report per-call usage mid-turn (Codex, OpenCode CLI) would simply not emit MessageContextWindow, same as today.
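One possible shape for the synthesis step is sketched below. It is not the actual engine/cli code: the types are reduced local stand-ins, the string values of the message-type constants are assumptions, and `withContextWindow` is a hypothetical name. The sketch emits a usage-only MessageContextWindow after each content message that carries a non-zero ContextUsedTokens, and leaves ContextSizeTokens at 0 when no model metadata is available.

```go
package main

import "fmt"

// MessageType and its values are local stand-ins; the string values are
// assumptions made for this sketch.
type MessageType string

const (
	MessageText          MessageType = "text"
	MessageThinking      MessageType = "thinking"
	MessageContextWindow MessageType = "context_window"
)

type Usage struct {
	ContextUsedTokens int
	ContextSizeTokens int
}

type Message struct {
	Type  MessageType
	Usage *Usage
}

// withContextWindow (hypothetical) returns msgs with a synthesized
// MessageContextWindow inserted after every content message that reports
// usage. windowSize is the capacity from model metadata, or 0 when unknown.
func withContextWindow(msgs []Message, windowSize int) []Message {
	out := make([]Message, 0, len(msgs))
	for _, m := range msgs {
		out = append(out, m)
		isContent := m.Type == MessageText || m.Type == MessageThinking
		if !isContent || m.Usage == nil || m.Usage.ContextUsedTokens <= 0 {
			continue
		}
		out = append(out, Message{
			Type: MessageContextWindow,
			Usage: &Usage{
				ContextUsedTokens: m.Usage.ContextUsedTokens,
				ContextSizeTokens: windowSize,
			},
		})
	}
	return out
}

func main() {
	stream := []Message{
		{Type: MessageThinking, Usage: &Usage{ContextUsedTokens: 48000}},
		{Type: MessageText, Usage: &Usage{ContextUsedTokens: 50000}},
		{Type: MessageText}, // no usage: nothing is synthesized
	}
	for _, m := range withContextWindow(stream, 200000) {
		if m.Type == MessageContextWindow {
			fmt.Printf("%s used=%d size=%d\n", m.Type, m.Usage.ContextUsedTokens, m.Usage.ContextSizeTokens)
		}
	}
}
```

A real implementation would more likely emit on the process's output channel as messages arrive; the slice form here only keeps the example self-contained.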
RunTurn unification
- RunTurn should accept all engine types and handle the send+drain vs drain-only branching internally.
- The Process interface or engine metadata should expose send semantics so that RunTurn can branch on them; consumers should not have to.
- This eliminates spawnPerTurn as an external concept entirely.
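A rough sketch of what internal branching could look like follows. Everything here is an assumption made for illustration: the reduced `Process` interface, the `NeedsSend` method name, and the simplification that a turn ends when the output channel closes. The real interface and turn-termination rules may differ.

```go
package main

import (
	"context"
	"fmt"
)

// Message is a reduced stand-in for the library's message type.
type Message struct{ Text string }

// Process is a hypothetical, reduced process interface. NeedsSend is the
// assumed addition: true for streaming backends (send, then drain), false
// for spawn-per-turn backends (drain only).
type Process interface {
	Send(ctx context.Context, prompt string) error
	Output() <-chan Message
	NeedsSend() bool
}

// RunTurn branches on the send semantics internally, so callers do not.
// Simplification: the turn ends when the output channel is closed.
func RunTurn(ctx context.Context, p Process, prompt string, handle func(Message) error) error {
	if p.NeedsSend() {
		if err := p.Send(ctx, prompt); err != nil {
			return err
		}
	}
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case m, ok := <-p.Output():
			if !ok {
				return nil
			}
			if err := handle(m); err != nil {
				return err
			}
		}
	}
}

// fakeProc is a test double that replays canned replies.
type fakeProc struct {
	needsSend bool
	out       chan Message
	sent      []string
}

func newFake(needsSend bool, replies ...string) *fakeProc {
	out := make(chan Message, len(replies))
	for _, r := range replies {
		out <- Message{Text: r}
	}
	close(out)
	return &fakeProc{needsSend: needsSend, out: out}
}

func (f *fakeProc) Send(_ context.Context, prompt string) error {
	f.sent = append(f.sent, prompt)
	return nil
}
func (f *fakeProc) Output() <-chan Message { return f.out }
func (f *fakeProc) NeedsSend() bool        { return f.needsSend }

// runFake drives one turn against a fake and reports what was received and sent.
func runFake(needsSend bool, prompt string, replies ...string) (got, sent []string, err error) {
	p := newFake(needsSend, replies...)
	err = RunTurn(context.Background(), p, prompt, func(m Message) error {
		got = append(got, m.Text)
		return nil
	})
	return got, p.sent, err
}

func main() {
	for _, needsSend := range []bool{true, false} {
		got, sent, err := runFake(needsSend, "hello", "a", "b")
		fmt.Println(needsSend, got, sent, err)
	}
}
```

With this shape, the call site is the same single RunTurn call for both engine kinds, and nothing like spawnPerTurn has to be stored on a session entry.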
Turn summary
- A TurnSummary helper or type in the root package would collect text, thinking, tool calls, usage, stop reason, and denials from a message stream.
- Both the MCP server and Foundry currently reimplement this iteration logic.
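To make the idea concrete, here is a minimal sketch of such a helper under stated assumptions. The `Message` fields, the `MessageToolUse` and `MessageResult` names, and the `Summarize` function are all hypothetical; only MessageText, MessageThinking, and the usage field names come from this issue. Denials are left out to keep the example short.

```go
package main

import (
	"fmt"
	"strings"
)

type MessageType string

const (
	MessageText     MessageType = "text"
	MessageThinking MessageType = "thinking"
	MessageToolUse  MessageType = "tool_use" // hypothetical name
	MessageResult   MessageType = "result"   // hypothetical name
)

type Usage struct{ ContextUsedTokens, ContextSizeTokens int }

// Message is a reduced stand-in; ToolName and StopReason are assumed fields.
type Message struct {
	Type       MessageType
	Content    string
	ToolName   string
	StopReason string
	Usage      *Usage
}

// TurnSummary collects the parts of a turn that callers usually want.
type TurnSummary struct {
	Text, Thinking string
	ToolCalls      []string
	StopReason     string
	Usage          *Usage // last usage seen during the turn
}

// Summarize walks the message stream once and builds a TurnSummary.
func Summarize(msgs []Message) TurnSummary {
	var s TurnSummary
	var text, thinking strings.Builder
	for _, m := range msgs {
		switch m.Type {
		case MessageText:
			text.WriteString(m.Content)
		case MessageThinking:
			thinking.WriteString(m.Content)
		case MessageToolUse:
			s.ToolCalls = append(s.ToolCalls, m.ToolName)
		case MessageResult:
			s.StopReason = m.StopReason
		}
		if m.Usage != nil {
			s.Usage = m.Usage // later reports overwrite earlier ones
		}
	}
	s.Text, s.Thinking = text.String(), thinking.String()
	return s
}

func main() {
	s := Summarize([]Message{
		{Type: MessageThinking, Content: "plan. "},
		{Type: MessageToolUse, ToolName: "read_file"},
		{Type: MessageText, Content: "Done.", Usage: &Usage{ContextUsedTokens: 50000}},
		{Type: MessageResult, StopReason: "end_turn"},
	})
	fmt.Printf("text=%q tools=%v stop=%s used=%d\n", s.Text, s.ToolCalls, s.StopReason, s.Usage.ContextUsedTokens)
}
```

Whether the last usage report or an aggregate belongs in the summary is a design choice this sketch does not settle.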
Proposed simplification summary
| Change | Where | Impact |
| --- | --- | --- |
| Synthesize MessageContextWindow from CLI | engine/cli/process.go | Unified message type for context fill |
| Populate ContextSizeTokens from model metadata | engine/cli/claude/parse.go | Percentage-based fuel gauge for CLI |
| RunTurn handles all engine types | runturn.go | Eliminates spawnPerTurn branching in every consumer |
| Expose send semantics on Process/Engine | agentrun.go | Consumers don't carry external booleans |
| TurnSummary helper | root package | Consumers iterate once, get structured result |
Related
- ContextUsedTokens on mid-turn messages (implemented, immediate improvement)