Bug Description
When streaming is enabled (stream: true), the final SSE chunk always returns usage: {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0} instead of actual token counts. Non-streaming responses work correctly.
This breaks token/cost tracking in downstream clients like OpenCode.
Root Cause
Two issues in server.py (_stream_litellm function):
1. Missing stream_options parameter (line ~1448)
The LiteLLM acompletion() call sets stream: True but never passes stream_options: {"include_usage": True}. Without this, the upstream provider does not include token counts in streaming chunks.
Before:
call_kwargs: Dict[str, Any] = {"model": litellm_model, "messages": messages, "stream": True}
Fix:
call_kwargs: Dict[str, Any] = {
    "model": litellm_model,
    "messages": messages,
    "stream": True,
    "stream_options": {"include_usage": True},
}
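As a minimal sketch of the same idea, the kwargs could be assembled by a small helper that always requests usage while keeping any stream options the caller already supplied. The helper name build_stream_kwargs and the model string in the usage note are hypothetical and are not part of server.py.

```python
from typing import Any, Dict, List


def build_stream_kwargs(
    model: str, messages: List[Dict[str, Any]], **extra: Any
) -> Dict[str, Any]:
    """Hypothetical helper: assemble kwargs for a streaming acompletion() call."""
    call_kwargs: Dict[str, Any] = {"model": model, "messages": messages, "stream": True}
    call_kwargs.update(extra)
    # Merge instead of overwriting so caller-supplied stream options survive.
    stream_options = dict(call_kwargs.get("stream_options") or {})
    stream_options["include_usage"] = True
    call_kwargs["stream_options"] = stream_options
    return call_kwargs
```

Usage would look like build_stream_kwargs("example-model", messages), with the result passed on as keyword arguments to the acompletion() call.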
2. Usage-only final chunks are silently dropped (lines ~1485-1488)
Some providers send a final chunk with usage data but empty choices. The current code skips these with if choice is None: continue, discarding the usage data.
Before:
async for chunk in response:
    choice = chunk.choices[0] if chunk.choices else None
    if choice is None:
        continue
    # ... usage extraction happens AFTER this skip
Fix: Extract usage BEFORE the choice is None check, and yield usage-only chunks:
async for chunk in response:
    usage = None
    if hasattr(chunk, "usage") and chunk.usage:
        usage = {
            "prompt_tokens": chunk.usage.prompt_tokens or 0,
            "completion_tokens": chunk.usage.completion_tokens or 0,
        }
    choice = chunk.choices[0] if chunk.choices else None
    if choice is None:
        if usage:
            yield {}, usage, None
        continue
    # ... rest unchanged
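The reordering can be checked without a provider by running it against fake chunks. The sketch below is a stand-alone approximation, not the code from server.py: the (delta, usage, finish_reason) tuple shape is inferred from the yield in the fix, and stream_chunks, extract_usage and fake_response are hypothetical names built on SimpleNamespace stand-ins for LiteLLM chunk objects.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, AsyncIterator, Dict, List, Optional, Tuple

Item = Tuple[Dict[str, Any], Optional[Dict[str, int]], Optional[str]]


def extract_usage(chunk: Any) -> Optional[Dict[str, int]]:
    """Return token counts from a chunk, or None if it carries no usage."""
    usage = getattr(chunk, "usage", None)
    if not usage:
        return None
    return {
        "prompt_tokens": getattr(usage, "prompt_tokens", 0) or 0,
        "completion_tokens": getattr(usage, "completion_tokens", 0) or 0,
    }


async def stream_chunks(response: AsyncIterator[Any]) -> AsyncIterator[Item]:
    async for chunk in response:
        # Usage is read first, so a chunk with empty choices cannot hide it.
        usage = extract_usage(chunk)
        choice = chunk.choices[0] if chunk.choices else None
        if choice is None:
            if usage:
                yield {}, usage, None
            continue
        yield {"content": choice.delta.content}, usage, choice.finish_reason


async def fake_response() -> AsyncIterator[Any]:
    """Assumed provider behaviour: content, stop, then a usage-only chunk."""
    text = SimpleNamespace(delta=SimpleNamespace(content="Hi"), finish_reason=None)
    stop = SimpleNamespace(delta=SimpleNamespace(content=None), finish_reason="stop")
    yield SimpleNamespace(choices=[text], usage=None)
    yield SimpleNamespace(choices=[stop], usage=None)
    yield SimpleNamespace(
        choices=[], usage=SimpleNamespace(prompt_tokens=40, completion_tokens=10)
    )


async def collect() -> List[Item]:
    return [item async for item in stream_chunks(fake_response())]


def run_demo() -> List[Item]:
    return asyncio.run(collect())
```

With the original ordering, the third fake chunk would hit the continue before its usage was read, so only two items would come out and the token counts would be lost.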
Verification
After applying both fixes, streaming responses correctly return token counts:
Before: "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
After: "usage": {"prompt_tokens": 40, "completion_tokens": 10, "total_tokens": 50}
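One way to repeat this check from the client side is to scan the raw SSE lines for the last chunk that carries a non-empty usage object. The sketch below assumes the usual OpenAI-compatible framing ("data: <json>" lines ending with "data: [DONE]"); last_usage_from_sse is a hypothetical helper, not part of NadirClaw.

```python
import json
from typing import Dict, Iterable, Optional


def last_usage_from_sse(lines: Iterable[str]) -> Optional[Dict[str, int]]:
    """Return the last non-empty usage object seen in an SSE stream, if any."""
    usage: Optional[Dict[str, int]] = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage
```

Note that this returns any non-empty usage object, including the all-zero one from the bug, so the caller still needs to compare the counts against the expected values.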
Impact
- OpenCode TUI context meter shows "0 tokens, 0% used"
- Cost tracking shows $0.00 for all sessions
- Any OpenAI-compatible client relying on streaming usage data is affected
Affected Version
NadirClaw v0.13.0