Unify real-time context fill signal across CLI and ACP backends #40

@dmora

Problem

Callers building real-time context fill indicators (fuel gauges) must handle backend-specific differences that should be abstracted by the library. A broader review of the MCP diagnostic server (the first non-trivial consumer of agentrun) reveals additional abstraction leaks beyond context fill that force every consumer to manage engine-level complexity.

Current state — context fill

| Aspect | CLI (Claude) | ACP (OpenCode) |
| --- | --- | --- |
| Message type carrying signal | MessageText, MessageThinking | MessageContextWindow |
| ContextUsedTokens | Per-call fill (input+cache) | Session-level fill |
| ContextSizeTokens | 0 (not reported) | Reported (e.g., 1048576) |
| Can compute percentage? | No | Yes |

What callers have to do today

// Must handle two different message types
switch msg.Type {
case agentrun.MessageContextWindow:
    // ACP: dedicated message, has both size and used
    pct := float64(msg.Usage.ContextUsedTokens) / float64(msg.Usage.ContextSizeTokens)

case agentrun.MessageText, agentrun.MessageThinking:
    // CLI: usage piggybacks on content messages, no ContextSizeTokens
    if msg.Usage != nil && msg.Usage.ContextUsedTokens > 0 {
        // absolute number only — can't compute percentage
    }
}

What callers should be able to do

case agentrun.MessageContextWindow:
    pct := float64(msg.Usage.ContextUsedTokens) / float64(msg.Usage.ContextSizeTokens)
    updateFuelGauge(pct)

Same message type, same fields, all backends.
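
Even with a unified message type, ContextSizeTokens may stay 0 for backends that cannot report capacity, so a gauge still needs a guard before dividing. A minimal consumer-side sketch follows; the Usage struct here is a local stand-in that mirrors the field names in this issue, and fillFraction is a hypothetical helper, not part of agentrun.

```go
package main

import "fmt"

// Usage is a local stand-in for the usage fields named in this issue;
// it is NOT the real agentrun type.
type Usage struct {
	ContextUsedTokens int
	ContextSizeTokens int
}

// fillFraction (hypothetical helper) returns used/size and true when the
// capacity is known. When ContextSizeTokens is 0 it returns 0 and false so
// the caller can fall back to the absolute token count. Without this guard
// the float division would yield +Inf or NaN rather than panic.
func fillFraction(u *Usage) (float64, bool) {
	if u == nil || u.ContextSizeTokens <= 0 {
		return 0, false
	}
	return float64(u.ContextUsedTokens) / float64(u.ContextSizeTokens), true
}

func main() {
	// Capacity reported: a percentage can be shown.
	withSize := &Usage{ContextUsedTokens: 262144, ContextSizeTokens: 1048576}
	if f, ok := fillFraction(withSize); ok {
		fmt.Printf("fill: %.0f%%\n", f*100) // prints "fill: 25%"
	}

	// Capacity missing: show the absolute value instead.
	noSize := &Usage{ContextUsedTokens: 50000}
	if _, ok := fillFraction(noSize); !ok {
		fmt.Printf("fill: %d tokens (capacity unknown)\n", noSize.ContextUsedTokens)
	}
}
```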

Abstraction leaks found in the MCP server

Reviewing cmd/agentrun-mcp/ as the first real consumer exposed four complexity leaks, three of them beyond context fill:

1. spawnPerTurn branching — duplicated in every consumer

The MCP server branches on spawnPerTurn in three places (doRunTurn, doSessionStart, doSessionSend):

if spawnPerTurn {
    drainSpawnPerTurn(ctx, proc, handler)
} else {
    turnErr = agentrun.RunTurn(ctx, proc, input.Prompt, handler)
}

Every consumer (MCP server, Foundry, future orchestrators) must know whether a backend is spawn-per-turn vs streaming and branch accordingly. RunTurn should handle both engine types internally.

2. makeEngine returns a boolean consumers must carry forever

makeEngine returns (agentrun.Engine, bool, error) where the bool is spawnPerTurn. This gets stored on sessionEntry and checked on every session_send. The engine knows its own semantics, but that knowledge doesn't survive the agentrun.Engine interface boundary — consumers track it externally.

3. No unified turn summary

After a turn, callers want: text, thinking, tool calls, usage, stop reason, denials. Currently they iterate all messages and switch on type. The MCP server returns raw message arrays with no summarization. A TurnSummary type or helper would let consumers avoid reimplementing message-iteration logic.

4. Context fill requires filtering on 3+ message types

Per the core issue — CLI puts ContextUsedTokens on MessageText/MessageThinking, ACP puts it on MessageContextWindow. The MCP server passes these through without normalization. A caller monitoring fill has to know which types carry the signal per backend.

Underlying causes — context fill

  1. Different message types carry the signal. ACP emits a dedicated MessageContextWindow. CLI piggybacks ContextUsedTokens onto content messages. Callers must know which types to listen on per backend.

  2. CLI lacks ContextSizeTokens. Claude CLI doesn't report context window capacity mid-turn. Without the denominator, callers can't compute fill percentage. However, Claude CLI's result event includes modelUsage.<model>.contextWindow (e.g., 200000), which could be captured from init or first-event metadata.

  3. Semantic gap. CLI's ContextUsedTokens is per-API-call input-side fill (InputTokens + CacheReadTokens + CacheWriteTokens). ACP's is session-level context fill. Both answer "how full is the context?" but the measurement differs.
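
To make the CLI side of the semantic gap concrete, the per-call figure is just the sum of the three input-side counters. The sketch below assumes a CallUsage struct whose field names follow the terms used above; the struct and the cliContextUsed helper are illustrative, not the library's actual types.

```go
package main

import "fmt"

// CallUsage is a hypothetical per-API-call usage record. Field names follow
// the terms in this issue, not a confirmed agentrun struct.
type CallUsage struct {
	InputTokens      int
	CacheReadTokens  int
	CacheWriteTokens int
	OutputTokens     int
}

// cliContextUsed computes the input-side fill for a single API call:
// InputTokens + CacheReadTokens + CacheWriteTokens. Output tokens are not
// part of this sum.
func cliContextUsed(u CallUsage) int {
	return u.InputTokens + u.CacheReadTokens + u.CacheWriteTokens
}

func main() {
	u := CallUsage{
		InputTokens:      1200,
		CacheReadTokens:  40000,
		CacheWriteTokens: 800,
		OutputTokens:     500,
	}
	fmt.Println(cliContextUsed(u)) // prints 42000
}
```

An ACP backend would report a single session-level number instead, so the two values are comparable as a gauge reading but are not computed the same way.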

Design considerations

Context fill unification

  • CLI backends that report per-call usage mid-turn (Claude) should synthesize MessageContextWindow messages — same as ACP already does natively.
  • ContextSizeTokens could be populated from model metadata if available (Claude CLI contextWindow field, ACP usage_update.size). When not available, consumers fall back to absolute values.
  • The synthesized MessageContextWindow should not duplicate data already on content messages — it's a separate signal.
  • Backends that don't report per-call usage mid-turn (Codex, OpenCode CLI) would simply not emit MessageContextWindow, same as today.
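
One possible shape for the synthesis step is sketched below. All types are local stand-ins, and withContextWindow is a hypothetical helper. It reads "should not duplicate data" as meaning the synthesized message carries only usage, with no text, and the content message passes through unchanged; that reading is an assumption.

```go
package main

import "fmt"

// Local stand-ins for the message types named in this issue.
type MessageType string

const (
	MessageText          MessageType = "text"
	MessageThinking      MessageType = "thinking"
	MessageContextWindow MessageType = "context_window"
)

type Usage struct {
	ContextUsedTokens int
	ContextSizeTokens int
}

type Message struct {
	Type  MessageType
	Text  string
	Usage *Usage
}

// withContextWindow (hypothetical) returns msg unchanged, followed by a
// synthesized MessageContextWindow when msg is a content message carrying a
// fill reading. contextSize is the capacity learned from model metadata, or
// 0 when unknown.
func withContextWindow(msg Message, contextSize int) []Message {
	out := []Message{msg}
	isContent := msg.Type == MessageText || msg.Type == MessageThinking
	if !isContent || msg.Usage == nil || msg.Usage.ContextUsedTokens <= 0 {
		return out
	}
	return append(out, Message{
		Type: MessageContextWindow,
		Usage: &Usage{
			ContextUsedTokens: msg.Usage.ContextUsedTokens,
			ContextSizeTokens: contextSize,
		},
	})
}

func main() {
	in := Message{Type: MessageText, Text: "hello", Usage: &Usage{ContextUsedTokens: 42000}}
	for _, m := range withContextWindow(in, 200000) {
		fmt.Println(m.Type)
	}
}
```

With this in place, a consumer would listen only for MessageContextWindow regardless of backend.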

RunTurn unification

  • RunTurn should accept all engine types and handle the send+drain vs drain-only branching internally.
  • The Process interface or engine metadata should expose send semantics so RunTurn can branch — consumers should not.
  • This eliminates spawnPerTurn as an external concept entirely.

Turn summary

  • A TurnSummary helper or type in the root package would collect text, thinking, tool calls, usage, stop reason, and denials from a message stream.
  • Both the MCP server and Foundry currently reimplement this iteration logic.
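
As an illustration of what such a helper might collect, here is a reduced sketch. The message types, the field names, and the Summarize function are assumptions made for this example. It covers text, thinking, tool calls, usage, and stop reason, and omits denials for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// Local stand-ins; MessageToolUse and MessageResult are hypothetical names.
type MessageType string

const (
	MessageText     MessageType = "text"
	MessageThinking MessageType = "thinking"
	MessageToolUse  MessageType = "tool_use"
	MessageResult   MessageType = "result"
)

type Usage struct{ ContextUsedTokens int }

type Message struct {
	Type       MessageType
	Text       string
	ToolName   string
	StopReason string
	Usage      *Usage
}

// TurnSummary is a hypothetical structured view of one turn.
type TurnSummary struct {
	Text       string
	Thinking   string
	ToolCalls  []string
	StopReason string
	Usage      Usage // last usage reading seen in the turn
}

// Summarize walks the messages once and folds them into a TurnSummary.
func Summarize(msgs []Message) TurnSummary {
	var text, thinking strings.Builder
	var s TurnSummary
	for _, m := range msgs {
		switch m.Type {
		case MessageText:
			text.WriteString(m.Text)
		case MessageThinking:
			thinking.WriteString(m.Text)
		case MessageToolUse:
			s.ToolCalls = append(s.ToolCalls, m.ToolName)
		case MessageResult:
			s.StopReason = m.StopReason
		}
		if m.Usage != nil {
			s.Usage = *m.Usage
		}
	}
	s.Text = text.String()
	s.Thinking = thinking.String()
	return s
}

func main() {
	s := Summarize([]Message{
		{Type: MessageThinking, Text: "plan"},
		{Type: MessageText, Text: "Hello, ", Usage: &Usage{ContextUsedTokens: 100}},
		{Type: MessageToolUse, ToolName: "read_file"},
		{Type: MessageText, Text: "world", Usage: &Usage{ContextUsedTokens: 180}},
		{Type: MessageResult, StopReason: "end_turn"},
	})
	fmt.Printf("%q %q %v %s %d\n", s.Text, s.Thinking, s.ToolCalls, s.StopReason, s.Usage.ContextUsedTokens)
}
```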

Proposed simplification summary

| Change | Where | Impact |
| --- | --- | --- |
| Synthesize MessageContextWindow from CLI | engine/cli/process.go | Unified message type for context fill |
| Populate ContextSizeTokens from model metadata | engine/cli/claude/parse.go | Percentage-based fuel gauge for CLI |
| RunTurn handles all engine types | runturn.go | Eliminates spawnPerTurn branching in every consumer |
| Expose send semantics on Process/Engine | agentrun.go | Consumers don't carry external booleans |
| TurnSummary helper | root package | Consumers iterate once, get structured result |
