
Context Window Exhaustion Causes Unterminated Tool Calls #48

@yqxu

Symptom
When the accumulated conversation fills the context window (ctx=32768), the model occasionally opens a <|DSML|tool_calls> block but hits the context limit before closing it. The server then rejects the generation:

ctx=22528..31488:8960 gen=1280 DSML_START finish=error error="unterminated tool call"
Generation started at position 31488 and produced 1280 tokens, so the total context position at exit is 31488 + 1280 = 32768, exactly the ctx hard limit.

What happens in code
1. The decode while loop at ds4_server.c:5409 exits when ds4_session_pos >= ds4_session_ctx.
2. The post-loop safety check at ds4_server.c:5583-5589 (added in 404f0a1) sees saw_tool_start && !saw_tool_end and flags the generation as an error.
3. The user receives an "unterminated tool call" failure with no clear signal that the context window was exhausted.
Why it happens
Tool calls in ds4 are represented as plain text strings (<|DSML|tool_calls>...</|DSML|tool_calls>). The server detects them via post-hoc strstr() scanning on the generated text (ds4_server.c:5525-5533). The model samples tokens freely with zero awareness of its remaining context budget. When it starts a tool call late in the generation, nothing prevents it from running out of room before writing the closing tag.

Affected code paths
- ds4_server.c:5390-5574: main decode loop, context budget gating
- ds4_server.c:5523-5534: text-based tool_calls_started / tool_calls_finished detection
- ds4_server.c:5582-5589: unterminated tool call rejection
