Tracker: upstream ACP cost and token accuracy gaps

## Summary

When OpenHands drives Claude Code, Gemini CLI, or Codex via ACP, the `UsageUpdate.cost` and `PromptResponse.usage` fields we receive are unreliable, so the metrics we propagate into SDK agent reports are wrong. Benchmarks work around this by routing ACP traffic through a LiteLLM proxy and using virtual-key spend as ground truth (`--use-proxy-costs`).

This tracker collects the upstream issues and PRs I've filed so we can see where each provider stands. Only the three ACP providers OpenHands actually supports are in scope: **Claude Code, Gemini CLI, Codex** (no Amp/Cursor).

## Situation per provider

### Codex (`zed-industries/codex-acp` + `openai/codex`)

The least broken path. Codex CLI emits `TokenCountEvent` with token counts but not cost. codex-acp currently populates neither `PromptResponse.usage` nor `UsageUpdate.cost`. PRs are open to wire both up; the cost side depends on Codex CLI exposing session cost upstream.

- [openai/codex#16258](https://github.com/openai/codex/issues/16258) — Expose cumulative session cost in TokenCountEvent / event stream *(root dep)*
- [zed-industries/codex-acp#209](https://github.com/zed-industries/codex-acp/issues/209) — UsageUpdate.cost is still not populated for codex-acp
- [zed-industries/codex-acp#210](https://github.com/zed-industries/codex-acp/pull/210) — Populate PromptResponse.usage from Codex token usage *(draft PR)*
- [zed-industries/codex-acp#216](https://github.com/zed-industries/codex-acp/pull/216) — feat: populate UsageUpdate.cost with estimated session cost *(open PR)*

### Claude Code (`anthropics/claude-code` via `agentclientprotocol/claude-agent-acp`)

Telemetry is populated, but priced wrong: `UsageUpdate.cost` uses Sonnet rates even for Opus 4.6 runs, so Opus costs are under-reported by ~3/5. Evidence: benchmarks#583 — Opus 4.6 token accumulation doesn't match reported cost. Retroactive correction of historical artifacts depends on an upstream fix, because we need to know what the real rate should have been.

- [anthropics/claude-code#39850](https://github.com/anthropics/claude-code/issues/39850) — ACP UsageUpdate.cost reports Sonnet pricing for Opus 4.6 model

### Gemini CLI (`google-gemini/gemini-cli`)

No telemetry at all over ACP — `response.usage` is None and there's no `UsageUpdate`. Tokens are available only in a non-standard `_meta.quota.token_count` field that the ACP SDK strips on serialization.

- [google-gemini/gemini-cli#24280](https://github.com/google-gemini/gemini-cli/issues/24280) — ACP: populate PromptResponse.usage and send UsageUpdate with cost

### Spec (context)

- [agentclientprotocol/agent-client-protocol#890](https://github.com/agentclientprotocol/agent-client-protocol/issues/890) — Clarify that PromptResponse.usage and UsageUpdate.cost are expected for all servers *(closed)*

## Ground-truth workaround today

Benchmarks route ACP traffic through a LiteLLM proxy and use virtual-key spend as the authoritative cost number:
- Canonical divergence write-up: [benchmarks#603](https://github.com/OpenHands/benchmarks/issues/603)
- Implementation: [benchmarks#595](https://github.com/OpenHands/benchmarks/pull/595), follow-ups #605/#606/#611/#638
- SDK proxy routing: [sdk#2326](https://github.com/OpenHands/software-agent-sdk/pull/2326)
- Eval surface: [evaluation#432](https://github.com/OpenHands/evaluation/pull/432)

SDK-reported ACP costs should be treated as indicative only until all three upstreams ship.

## Replaces

- #2583 (acp-codex zero telemetry) — subsumed by the Codex section above
- #2592 (Opus 4.6 retroactive cost fix) — subsumed by the Claude Code section above

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tracker: upstream ACP cost and token accuracy gaps #2880

Summary

Situation per provider

Codex (`zed-industries/codex-acp` + `openai/codex`)

Claude Code (`anthropics/claude-code` via `agentclientprotocol/claude-agent-acp`)

Gemini CLI (`google-gemini/gemini-cli`)

Spec (context)

Ground-truth workaround today

Replaces

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Tracker: upstream ACP cost and token accuracy gaps #2880

Description

Summary

Situation per provider

Codex (zed-industries/codex-acp + openai/codex)

Claude Code (anthropics/claude-code via agentclientprotocol/claude-agent-acp)

Gemini CLI (google-gemini/gemini-cli)

Spec (context)

Ground-truth workaround today

Replaces

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Codex (`zed-industries/codex-acp` + `openai/codex`)

Claude Code (`anthropics/claude-code` via `agentclientprotocol/claude-agent-acp`)

Gemini CLI (`google-gemini/gemini-cli`)