Streaming responses return zero token counts #2

@eugenio

Bug Description

When streaming is enabled (stream: true), the final SSE chunk always returns usage: {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0} instead of actual token counts. Non-streaming responses work correctly.

This breaks token/cost tracking in downstream clients like OpenCode.

Root Cause

Two issues in server.py (_stream_litellm function):

1. Missing stream_options parameter (line ~1448)

The LiteLLM acompletion() call sets stream: True but never passes stream_options: {"include_usage": True}. Without this, the upstream provider does not include token counts in streaming chunks.

Before:

call_kwargs: Dict[str, Any] = {"model": litellm_model, "messages": messages, "stream": True}

Fix:

call_kwargs: Dict[str, Any] = {
    "model": litellm_model,
    "messages": messages,
    "stream": True,
    "stream_options": {"include_usage": True},
}

2. Usage-only final chunks are silently dropped (lines ~1485-1488)

Some providers send a final chunk that carries usage data but has an empty choices list. The current code skips these with if choice is None: continue. Because usage extraction only happens after that check, the usage data in such chunks is discarded.

Before:

async for chunk in response:
    choice = chunk.choices[0] if chunk.choices else None
    if choice is None:
        continue
    # ... usage extraction happens AFTER this skip

Fix: Extract usage BEFORE the choice is None check, and yield usage-only chunks:

async for chunk in response:
    # Read usage first so it survives even when the chunk has no choices.
    usage = None
    if hasattr(chunk, "usage") and chunk.usage:
        usage = {
            "prompt_tokens": chunk.usage.prompt_tokens or 0,
            "completion_tokens": chunk.usage.completion_tokens or 0,
        }

    choice = chunk.choices[0] if chunk.choices else None
    if choice is None:
        # Usage-only chunk: forward the counts with an empty delta.
        if usage:
            yield {}, usage, None
        continue
    # ... rest unchanged
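
The reordering can be checked in isolation with a small simulation. This is only a sketch: fake_response and iter_chunks are hypothetical names, the chunks are plain SimpleNamespace objects standing in for whatever LiteLLM returns, and the third element of the yielded tuple is assumed to be a finish reason (the issue does not say what it is).

```python
import asyncio
from types import SimpleNamespace
from typing import Any, AsyncIterator, Dict, Optional, Tuple


async def fake_response() -> AsyncIterator[Any]:
    """Hypothetical stand-in for the stream: a content chunk, then a usage-only chunk."""
    yield SimpleNamespace(
        choices=[SimpleNamespace(delta={"content": "hi"}, finish_reason=None)],
        usage=None,
    )
    # Final chunk: empty choices, usage populated.
    yield SimpleNamespace(
        choices=[],
        usage=SimpleNamespace(prompt_tokens=40, completion_tokens=10),
    )


async def iter_chunks(
    response: AsyncIterator[Any],
) -> AsyncIterator[Tuple[Any, Optional[Dict[str, int]], Optional[str]]]:
    async for chunk in response:
        # Extract usage BEFORE looking at choices.
        usage = None
        raw_usage = getattr(chunk, "usage", None)
        if raw_usage:
            usage = {
                "prompt_tokens": raw_usage.prompt_tokens or 0,
                "completion_tokens": raw_usage.completion_tokens or 0,
            }

        choice = chunk.choices[0] if chunk.choices else None
        if choice is None:
            if usage:
                yield {}, usage, None  # usage-only chunk is forwarded, not dropped
            continue
        # Assumed shape of the regular yield; the real function may differ.
        yield choice.delta, usage, choice.finish_reason


async def collect() -> list:
    return [item async for item in iter_chunks(fake_response())]


results = asyncio.run(collect())
print(results[-1][1])
```

With the original ordering, the second chunk would be skipped and the loop would yield only one item, with no usage attached.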

Verification

After applying both fixes, streaming responses correctly return token counts:

Before: "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
After:  "usage": {"prompt_tokens": 40, "completion_tokens": 10, "total_tokens": 50}
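
One way to check this from the client side is to scan the SSE stream for the last chunk that carries a non-empty usage object. The sketch below assumes the OpenAI-style wire format (data: lines holding JSON, terminated by data: [DONE]); last_usage is a hypothetical helper, and the sample lines are made up to mirror the "After" values above rather than captured from a real session.

```python
import json
from typing import Dict, Iterable, Optional


def last_usage(sse_lines: Iterable[str]) -> Optional[Dict[str, int]]:
    """Return the last non-empty usage object seen in an SSE stream, if any."""
    usage = None
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage


# Illustrative sample, not real server output.
sample = [
    'data: {"choices": [{"delta": {"content": "hi"}}], "usage": null}',
    "",
    'data: {"choices": [], "usage": {"prompt_tokens": 40, "completion_tokens": 10, "total_tokens": 50}}',
    "",
    "data: [DONE]",
]

print(last_usage(sample))
```

Against the unfixed server, a stream whose only usage object holds zeros would return those zeros here, and a stream with no usage field at all would return None, which separates the two failure modes described under Root Cause.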

Impact

  • OpenCode TUI context meter shows "0 tokens, 0% used"
  • Cost tracking shows $0.00 for all sessions
  • Any OpenAI-compatible client relying on streaming usage data is affected

Affected Version

NadirClaw v0.13.0
