Summary
When OpenHands drives Claude Code, Gemini CLI, or Codex via ACP, the UsageUpdate.cost and PromptResponse.usage fields we receive are unreliable, so the metrics we propagate into SDK agent reports are wrong. Benchmarks work around this by routing ACP traffic through a LiteLLM proxy and using virtual-key spend as ground truth (--use-proxy-costs).
This tracker collects the upstream issues and PRs I've filed so we can see where each provider stands. Only the three ACP providers OpenHands actually supports are in scope: Claude Code, Gemini CLI, Codex (no Amp/Cursor).
Situation per provider
Codex (zed-industries/codex-acp + openai/codex)
The least broken path. Codex CLI emits TokenCountEvent with token counts but not cost. codex-acp currently populates neither PromptResponse.usage nor UsageUpdate.cost. PRs are open to wire both up; the cost side depends on Codex CLI exposing session cost upstream.
Claude Code (anthropics/claude-code via agentclientprotocol/claude-agent-acp)
Telemetry is populated but priced wrong: UsageUpdate.cost uses Sonnet rates even for Opus 4.6 runs, so reported Opus costs come to only ~3/5 of the true figure. Evidence: benchmarks#583, where Opus 4.6 token accumulation doesn't match the reported cost. Retroactive correction of historical artifacts depends on an upstream fix, because we need to know what the real rate should have been.
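To make the size of the mispricing concrete, here is a minimal sketch that prices the same token counts under two rate cards. The per-million-token rates are illustrative assumptions, not values taken from this tracker or from the ACP payload, and the `Rates` and `cost_usd` names are hypothetical. Cache reads and writes are ignored for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per million tokens (illustrative assumptions, not authoritative)."""
    input_per_mtok: float
    output_per_mtok: float


def cost_usd(input_tokens: int, output_tokens: int, rates: Rates) -> float:
    """Price a run from its token counts under the given rate card."""
    total = (input_tokens * rates.input_per_mtok
             + output_tokens * rates.output_per_mtok)
    return total / 1_000_000


# Assumed rate cards; check current provider pricing before relying on them.
SONNET = Rates(input_per_mtok=3.0, output_per_mtok=15.0)
OPUS = Rates(input_per_mtok=5.0, output_per_mtok=25.0)

# Hypothetical token totals for one Opus run.
reported = cost_usd(1_000_000, 200_000, SONNET)  # what a Sonnet-priced report shows
actual = cost_usd(1_000_000, 200_000, OPUS)      # what Opus rates would give
ratio = reported / actual
```

Under these assumed rates the reported figure is 0.6 of the Opus-priced one. A flat rescale like this only works if input and output rates differ by the same factor, which is one reason historical correction needs the real rates from upstream.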
Gemini CLI (google-gemini/gemini-cli)
No telemetry at all over ACP — response.usage is None and there's no UsageUpdate. Tokens are available only in a non-standard _meta.quota.token_count field that the ACP SDK strips on serialization.
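As a stopgap, the token count could in principle be read from the raw payload before it passes through the ACP SDK models. The sketch below assumes the payload is available as a plain `dict` and that `_meta` sits at its top level. Both are assumptions, and `read_meta_token_count` is a hypothetical helper rather than part of any SDK. The value is returned as-is because the tracker does not state its shape.

```python
from typing import Any, Optional


def read_meta_token_count(raw_payload: dict[str, Any]) -> Optional[Any]:
    """Return _meta.quota.token_count from a raw payload, or None if absent.

    Walks the nested dicts defensively so a missing or malformed
    `_meta` block yields None instead of raising.
    """
    node: Any = raw_payload
    for key in ("_meta", "quota", "token_count"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Hypothetical payloads for illustration only.
with_meta = {"stopReason": "end_turn", "_meta": {"quota": {"token_count": 1234}}}
without_meta = {"stopReason": "end_turn"}

found = read_meta_token_count(with_meta)
missing = read_meta_token_count(without_meta)
```

This has to run on the deserialized JSON itself; once the payload has been round-tripped through the SDK's models the field is gone.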
Spec (context)
Ground-truth workaround today
Benchmarks route ACP traffic through a LiteLLM proxy and use virtual-key spend as the authoritative cost number (enabled with --use-proxy-costs).
SDK-reported ACP costs should be treated as indicative only until all three upstreams ship.
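The precedence described above can be sketched as a small resolver: prefer the change in virtual-key spend across a run, and fall back to the SDK-reported figure only as a non-authoritative estimate. `CostReport` and `resolve_cost` are hypothetical names. How the spend values are fetched from the proxy is left out, and this is not the benchmarks' actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CostReport:
    cost_usd: Optional[float]
    source: str          # "proxy", "sdk", or "none"
    authoritative: bool  # True only for proxy-derived spend


def resolve_cost(
    sdk_cost: Optional[float],
    spend_before: Optional[float],
    spend_after: Optional[float],
) -> CostReport:
    """Pick the cost to report for one run.

    spend_before / spend_after are the virtual key's cumulative spend
    sampled around the run; their difference is the run's cost.
    """
    if spend_before is not None and spend_after is not None:
        delta = spend_after - spend_before
        if delta >= 0:  # a negative delta suggests a key reset; do not trust it
            return CostReport(delta, "proxy", True)
    if sdk_cost is not None:
        return CostReport(sdk_cost, "sdk", False)
    return CostReport(None, "none", False)


# Hypothetical numbers: proxy spend wins over the SDK-reported figure.
with_proxy = resolve_cost(sdk_cost=0.6, spend_before=1.0, spend_after=2.5)
sdk_only = resolve_cost(sdk_cost=0.6, spend_before=None, spend_after=None)
```

Carrying the `source` and `authoritative` fields alongside the number lets downstream reports flag SDK-only costs as indicative instead of silently mixing the two.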
Replaces