fix(watchdog): two-phase timeout + consume prompt_progress keepalives for llama.cpp#946
Merged
Aaronontheweb merged 4 commits into netclaw-dev:dev on May 9, 2026
Conversation
The processing watchdog was killing healthy LLM requests during slot contention and cold prefill (91K tokens, ~10 min of silence). Three fixes:

1. Split the watchdog into PrefillTimeout (1800s default) and InterDeltaTimeout (FirstTokenTimeout, 600s). Start with the generous budget and promote to the tighter one on the first delta.
2. Request `return_progress: true` from llama-server and fix ParseStreamingUpdates to yield keepalives for content-less data events (e.g. prompt_progress) instead of silently dropping them.
3. Forward SSE comment lines as keepalives and send a watchdog refresh on the first buffered text/thinking delta.
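The two-phase timeout in fix (1) can be sketched roughly as follows. This is a simplified stand-in, not the actual ProcessingWatchdog: the class name, the `OnDelta` entry point, and the timer plumbing are assumptions; only the budget names and defaults come from the PR.

```csharp
using System;
using System.Threading;

// Sketch of the two-phase watchdog: start with the generous prefill budget,
// then promote to the tighter inter-delta budget on the first streaming
// delta. Subsequent deltas (and keepalives) just restart the current timer.
public sealed class TwoPhaseWatchdog : IDisposable
{
    private readonly TimeSpan _prefillTimeout;
    private readonly TimeSpan _interDeltaTimeout;
    private readonly Timer _timer;
    private bool _promoted;

    public TwoPhaseWatchdog(Action onTimeout,
        TimeSpan? prefillTimeout = null, TimeSpan? interDeltaTimeout = null)
    {
        _prefillTimeout = prefillTimeout ?? TimeSpan.FromSeconds(1800);
        _interDeltaTimeout = interDeltaTimeout ?? TimeSpan.FromSeconds(600);
        // Arm with the generous budget; covers queue wait + cold prefill.
        _timer = new Timer(_ => onTimeout(), null, _prefillTimeout,
            Timeout.InfiniteTimeSpan);
    }

    // Which budget the timer is currently running under.
    public TimeSpan CurrentBudget => _promoted ? _interDeltaTimeout : _prefillTimeout;

    // Called once per streaming delta or keepalive. The first call promotes
    // to the tight inter-delta budget; every call restarts the timer once
    // (avoiding the double-restart the follow-up commit fixed).
    public void OnDelta()
    {
        _promoted = true;
        _timer.Change(_interDeltaTimeout, Timeout.InfiniteTimeSpan);
    }

    public void Dispose() => _timer.Dispose();
}
```

Keepalive events route through the same single restart path, so a server that is merely busy keeps resetting whichever budget is active.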
…restart: Calling Promote() then Refresh() with the same timeout restarted the timer twice on the first delta. Restructured to call only one of them per delta, and extracted a shared RestartLlmTimer() to eliminate identical method bodies.
…ants: Extract LlmCall, ToolExecution, and Compaction constants on ProcessingWatchdog and replace all 7 call sites in LlmSessionActor.
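The constants commit presumably amounts to something like the sketch below. The constant names and string values come from the PR summary; the holder class shown here is hypothetical (the real constants live on ProcessingWatchdog itself).

```csharp
// Sketch: magic strings hoisted into constants so the 7 call sites in
// LlmSessionActor reference one definition instead of repeating literals.
public static class ProcessingWatchdogConstants // hypothetical holder class
{
    public const string LlmCall = "llm-call";
    public const string ToolExecution = "tool-execution";
    public const string Compaction = "compaction";
}
```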
Aaronontheweb
commented
May 9, 2026
```csharp
body["stream_options"] = new JsonObject { ["include_usage"] = true };
// llama-server sends prefill progress as SSE data events when enabled.
// Harmless on servers that don't support it (unknown fields are ignored).
body["return_progress"] = true;
```
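For reference, the resulting streaming request body would carry fields along these lines. Only `stream_options.include_usage` and `return_progress` are shown in the diff; the other fields are illustrative:

```json
{
  "stream": true,
  "stream_options": { "include_usage": true },
  "return_progress": true
}
```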
Encourages llama-server to send us progress updates during prefill and similar long server-side phases.
```csharp
{
    // Content-less data events (e.g. prompt_progress during prefill) — yield
    // keepalive so the watchdog timer resets while the server is working.
    yield return KeepaliveUpdate;
```
Should prevent netclaw's watchdog from aggressively nuking sessions when the model is sending back progress reports, thinking-only updates, etc.
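A minimal sketch of the guard change being discussed. The real ParseStreamingUpdates and its update types are more involved; the chunk shape, the string updates, and `KeepaliveUpdate` here are simplified stand-ins:

```csharp
using System.Collections.Generic;

public static class SseParserSketch
{
    // Stand-in for the real keepalive update type.
    public const string KeepaliveUpdate = "keepalive";

    // Before the fix, chunks with no content and no finish reason were
    // dropped at the guard, so prompt_progress events never reached the
    // watchdog. After the fix, such chunks yield a keepalive so the timer
    // resets while the server is still working.
    public static IEnumerable<string> ParseChunks(
        IEnumerable<(string? Content, string? FinishReason)> chunks)
    {
        foreach (var (content, finishReason) in chunks)
        {
            if (string.IsNullOrEmpty(content) && finishReason is null)
            {
                // Content-less data event (e.g. prompt_progress): keepalive.
                yield return KeepaliveUpdate;
                continue;
            }
            yield return content ?? finishReason!;
        }
    }
}
```

Feeding a content-less progress chunk followed by a text delta now produces a keepalive then the delta, instead of the delta alone.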
Summary

- Split the single FirstTokenTimeout (600s) into PrefillTimeout (1800s, covers queue wait + prefill) and FirstTokenTimeout (600s, inter-delta silence). The watchdog starts generous and promotes to the tighter budget on the first streaming delta.
- Consume prompt_progress events: request `return_progress: true` in streaming payloads and fix ParseStreamingUpdates to yield keepalives for content-less data events instead of silently dropping them.
- Forward non-`data:` SSE lines (comment keepalives, event-type lines) as keepalives so the watchdog resets during prefill/queuing.
- Replace the "llm-call", "tool-execution", "compaction" literals with constants on ProcessingWatchdog.

Context

Session D0AC6CKBK5K/1778174942.852979 hit two consecutive 600s watchdog timeouts on 2026-05-08. Server-side logs proved the server was healthy: 7m15s queued (slot contention), 2m44s of cold prefill over 91K tokens (a KV cache bug forced full re-processing), cancelled at 84.4% — ~18 seconds from completing. The watchdog couldn't distinguish "server is busy prefilling" from "server is dead" because both look like SSE silence. llama-server already sends prompt_progress events during prefill (PR #15827), but ParseStreamingUpdates was silently dropping them at the `contents.Count == 0 && finishReason is null` guard.

Test plan

- Tests for PrefillTimeout added and passing
- StreamsReasoningAndTextDeltas_FromOfficialSpectrum updated to assert a keepalive from the content-less initial chunk
- `dotnet slopwatch analyze` clean
- `./scripts/Add-FileHeaders.ps1 -Verify` clean
- Manual: llama-server with `--parallel 1` + concurrent requests to verify progress events refresh the watchdog
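The non-`data:` forwarding in the summary amounts to classifying raw SSE lines before JSON parsing. A rough sketch, with line shapes per the SSE wire format; the classifier name and the surrounding keepalive plumbing are assumptions:

```csharp
public static class SseLineClassifier
{
    public enum Kind { Data, Keepalive, Ignore }

    // In the SSE wire format, lines starting with ':' are comments (servers
    // often use them as keepalives) and 'event:' lines name the next data
    // event. Neither carries content, but both prove the server is alive,
    // so the watchdog should treat them as keepalives rather than discard
    // them.
    public static Kind Classify(string line)
    {
        if (line.StartsWith("data:")) return Kind.Data;
        if (line.StartsWith(":") || line.StartsWith("event:")) return Kind.Keepalive;
        return Kind.Ignore; // blank separators, fields we don't consume
    }
}
```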