Skip to content

fix(zen): stop double-counting reasoning_tokens in oa-compat usage#24441

Open
tiffanychum wants to merge 1 commit intoanomalyco:devfrom
tiffanychum:fix/24268-zen-reasoning-tokens-double-count
Open

fix(zen): stop double-counting reasoning_tokens in oa-compat usage#24441
tiffanychum wants to merge 1 commit intoanomalyco:devfrom
tiffanychum:fix/24268-zen-reasoning-tokens-double-count

Conversation

@tiffanychum
Copy link
Copy Markdown

@tiffanychum tiffanychum commented Apr 26, 2026

Issue for this PR

Closes #24268

Re-opens #24367 (auto-closed by needs:compliance for missing template sections); also addresses follow-up review feedback from the reporter.

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

Stops Zen from double-counting reasoning tokens for OpenAI-compatible providers (Moonshot, Kimi, etc.) so the user-facing usage panel and cost calculation match what the upstream API actually billed.

Per the OpenAI chat-completions usage spec, completion_tokens already includes completion_tokens_details.reasoning_tokens. Zen's oaCompatHelper.normalizeUsage (in packages/console/app/src/routes/zen/util/provider/openai-compatible.ts) was reporting outputTokens = completion_tokens and reasoningTokens = reasoning_tokens without subtracting, so when downstream calculateCost bills outputCost + reasoningCost separately at the same cost.output rate, reasoning was billed twice.

The reporter's evidence:

  • API: prompt_tokens=22, completion_tokens=77, completion_tokens_details.reasoning_tokens=78 for a single "Hi"
  • Local opencode db: output: 436, reasoning: 790 for a real session (correctly excludes reasoning from output)
  • Zen Usage console (broken): output: 1226, reasoning: 790, total: 2016 — i.e. completion_tokens (1226) + reasoning_tokens (790), double-counting
  • Cost was being computed off the inflated 2016 total

The fix mirrors openaiHelper.normalizeUsage from the OpenAI Responses helper (openai.ts) and subtracts reasoning_tokens from completion_tokens before returning. It then enforces the invariant outputTokens + reasoningTokens === completion_tokens, which is what the upstream API actually charges against. To make that hold even under the OA-compat provider quirk where reasoning_tokens > completion_tokens (e.g. Moonshot Kimi K2.6 returning reasoning=78, completion=77), the PR clamps reasoningTokens down to completion_tokens. So for the reporter's "Hi" case:

  • Before this PR: outputTokens=77, reasoningTokens=78 → bills 155 (double-counts; also exceeds the 77 the upstream API charged)
  • Subtracting only: outputTokens=0, reasoningTokens=78 → bills 78 (still 1 unit over what upstream charged)
  • This PR: outputTokens=0, reasoningTokens=77 → bills 77 (matches upstream exactly)

Clamping reasoning (not just flooring output at 0) is the refinement raised by the reporter @ceshine in review of the previous attempt at this fix in #24367, and it keeps the invariant the upstream API charges against. Once reasoning is clamped, the Math.max(0, …) on output is no longer needed since reasoning <= completion guarantees output >= 0.

I deliberately did not also touch openaiHelper.normalizeUsage (which already subtracts but does not clamp), since the OpenAI Responses API doesn't exhibit the reasoning > completion quirk and changing it is out of scope.

How did you verify your code works?

Added 4 unit tests in packages/console/app/test/zen-usage.test.ts that lock in the wire-level invariant; all 4 fail against the old code and pass after this fix:

  • Reporter's real session: completion=1226, reasoning=790outputTokens=436, reasoningTokens=790, sum=1226
  • Reporter's "Hi" with inverted ratio: completion=77, reasoning=78outputTokens=0, reasoningTokens=77, sum=77 (clamped, matches upstream)
  • No reasoning at all: outputTokens=77 left untouched, reasoningTokens undefined
  • Parity with openaiHelper.normalizeUsage for the same logical usage

Then ran the full console/app suite and full repo typecheck:

  • bun test in packages/console/app/ — 7/7 pass
  • bun turbo typecheck from repo root — all 13 tasks successful, no new lint errors on the modified file

Screenshots / recordings

N/A — backend billing-math change, no UI surface.

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

…nomalyco#24268)

The OpenAI chat-completions usage spec says `completion_tokens` already
includes `completion_tokens_details.reasoning_tokens`. Zen's downstream
`calculateCost` bills `outputCost + reasoningCost` separately, so when
the oa-compat normalizer reported `outputTokens = completion_tokens` and
`reasoningTokens = reasoning_tokens`, reasoning was billed twice.

Mirror the OpenAI Responses helper (openai.ts) and subtract reasoning
from completion before returning. Clamp at 0 because some providers
(e.g. Moonshot Kimi K2.6) report `reasoning_tokens > completion_tokens`.

Adds unit tests for the reporter's exact payloads:
- Kimi K2.6 "Hi": prompt 22 / completion 77 / reasoning 78 -> output 0
- Real session: prompt N / completion 1226 / reasoning 790 -> output 436
- No-reasoning case: outputTokens unchanged
- Parity with `openaiHelper.normalizeUsage` for the same logical usage
@github-actions github-actions Bot added needs:compliance This means the issue will auto-close after 2 hours. and removed needs:compliance This means the issue will auto-close after 2 hours. labels Apr 26, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Thanks for updating your PR! It now meets our contributing guidelines. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: OpenCode Go/Zen appears to overcount output tokens when reasoning tokens are present

1 participant