Symptom
When the accumulated conversation nearly fills the context window (ctx=32768), the model occasionally opens a <|DSML|tool_calls> block but reaches the context limit before closing it. The server then rejects the generation:
ctx=22528..31488:8960 gen=1280 DSML_START finish=error error="unterminated tool call"
Total context position at exit: 31488 + 1280 = 32768 — exactly the ctx hard limit.
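The arithmetic above can be checked with a small helper. This is only an illustrative sketch: ctx_tokens_remaining and its parameter names are hypothetical and do not come from ds4_server.c.

```c
#include <assert.h>

/* Hypothetical helper (not part of ds4_server.c): number of tokens still
 * available in the context window.
 *   ctx        - context hard limit (32768 in the log above)
 *   start_pos  - context position when generation began (31488 in the log)
 *   generated  - tokens sampled so far (gen=1280 in the log) */
int ctx_tokens_remaining(int ctx, int start_pos, int generated) {
    assert(ctx >= 0 && start_pos >= 0 && generated >= 0);
    int used = start_pos + generated;
    /* Clamp at zero so callers never see a negative budget. */
    return used >= ctx ? 0 : ctx - used;
}
```

With the values from the log line, ctx_tokens_remaining(32768, 31488, 1280) returns 0: the generation had no room left for a closing tag.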
What happens in code
The decode while loop at ds4_server.c:5409 exits when ds4_session_pos >= ds4_session_ctx.
The post-loop safety check at ds4_server.c:5583-5589 (added in 404f0a1) detects saw_tool_start && !saw_tool_end and flags it as an error.
The user sees the request fail, but the "unterminated tool call" error does not say that the context limit cut the tool call short.
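The condition described above can be sketched as a standalone function. This is an assumption-laden illustration, not the code at ds4_server.c:5583-5589: the enum, its values, and classify_finish are hypothetical names, and only the saw_tool_start && !saw_tool_end test and the pos >= ctx exit condition are taken from the description. The sketch also shows how the two facts could be combined to tell a context-limit truncation apart from other unterminated tool calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical finish reasons; the real server's names may differ. */
enum finish_kind {
    FINISH_OK,
    FINISH_LENGTH,                       /* context exhausted, no open tool call */
    FINISH_UNTERMINATED_TOOL_CALL,       /* open tool call, context not exhausted */
    FINISH_UNTERMINATED_TOOL_CALL_AT_CTX /* open tool call cut off by the ctx limit */
};

/* Hypothetical post-loop classification. pos and ctx stand in for the
 * session position and context size that the decode loop compares. */
enum finish_kind classify_finish(bool saw_tool_start, bool saw_tool_end,
                                 int pos, int ctx) {
    assert(pos >= 0 && ctx > 0);
    bool hit_ctx_limit = pos >= ctx; /* same condition that ends the decode loop */
    if (saw_tool_start && !saw_tool_end) {
        return hit_ctx_limit ? FINISH_UNTERMINATED_TOOL_CALL_AT_CTX
                             : FINISH_UNTERMINATED_TOOL_CALL;
    }
    return hit_ctx_limit ? FINISH_LENGTH : FINISH_OK;
}
```

For the failing request in the log, classify_finish(true, false, 32768, 32768) yields FINISH_UNTERMINATED_TOOL_CALL_AT_CTX, which would give the user a clearer signal than a bare "unterminated tool call".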
Why it happens
Tool calls in ds4 are represented as plain text strings (<|DSML|tool_calls>...</|DSML|tool_calls>). The server detects them via post-hoc strstr() scanning on the generated text (ds4_server.c:5525-5533). The model samples tokens freely with zero awareness of its remaining context budget. When it starts a tool call late in the generation, nothing prevents it from running out of room before writing the closing tag.
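A minimal sketch of text-based detection with strstr(), assuming the tags appear literally in the generated text as shown above. The function names dsml_tool_started and dsml_tool_finished are hypothetical and this is not the code at ds4_server.c:5525-5533; it only illustrates why the scan can see an opening tag with no closing tag when generation stops early.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DSML_TOOL_OPEN  "<|DSML|tool_calls>"
#define DSML_TOOL_CLOSE "</|DSML|tool_calls>"

/* Hypothetical: true if the generated text contains the opening tag. */
bool dsml_tool_started(const char *text) {
    assert(text != NULL);
    return strstr(text, DSML_TOOL_OPEN) != NULL;
}

/* Hypothetical: true if a closing tag follows the first opening tag. */
bool dsml_tool_finished(const char *text) {
    assert(text != NULL);
    const char *open = strstr(text, DSML_TOOL_OPEN);
    if (open == NULL) {
        return false; /* no tool call was started at all */
    }
    /* Search only after the opening tag so a stray earlier close is ignored. */
    return strstr(open + strlen(DSML_TOOL_OPEN), DSML_TOOL_CLOSE) != NULL;
}
```

A generation truncated at the context limit, such as "<|DSML|tool_calls>{\"name\":", makes dsml_tool_started return true and dsml_tool_finished return false, which is the state the post-loop check rejects.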
Affected code paths
ds4_server.c:5390-5574 — main decode loop, context budget gating
ds4_server.c:5523-5534 — text-based tool_calls_started / tool_calls_finished detection
ds4_server.c:5582-5589 — unterminated tool call rejection