fix(rag): add sse keepalive heartbeat to prevent nginx 60s timeout #434
Merged
the embedding phase (multi-query expansion via cpu-only ollama) can take several minutes. nginx's proxy_read_timeout (60s default) was killing the idle sse connection before any event was flushed, leaving the ui stuck on "retrieving context…" forever. send an sse comment (: keepalive) every 20s to keep the connection alive for the entire pipeline duration.
AndreLiar added a commit that referenced this pull request on Jun 15, 2026 (#435)
AndreLiar added a commit that referenced this pull request on Jun 15, 2026:
* fix(rag): add sse keepalive heartbeat to prevent nginx 60s timeout (#434)
* feat(settings): render qr code in mfa setup flow (#441) the setup form was showing the raw otpauth:// url as plain text. users had to manually copy a long secret into their authenticator app. now shows a scannable qr code image (qrcode.react) + keeps the secret as a manual-entry fallback below. also clarifies the step 1 label to mention microsoft authenticator / google authenticator explicitly.
Root cause
The RAG /stream endpoint uses SSE. The embedding phase (multi-query expansion via CPU-only ollama) takes 3–5 minutes on the current prod infra. Nginx's default proxy_read_timeout is 60 seconds: it closes any proxied connection on which the upstream writes no bytes within that window.
Result: the SSE stream was dropped silently 60s after opening. The frontend stayed on "Retrieving context…" indefinitely, then reset to 0 messages. The backend actually completed the pipeline successfully (confirmed in logs: totalTime: 323330ms, messages saved to DB), but no client was still connected to receive the events.
Fix
Send an SSE comment (: keepalive\n\n) every 20 seconds from the moment the stream opens. SSE comments are valid per spec, ignored by browsers, and count as body bytes, so nginx resets its idle timer on each one, keeping the connection alive for the full pipeline duration.
Test plan
Notes
The underlying performance issue (CPU-only ollama at ~5 min/query) is a separate infra concern. This fix makes the feature functional while that is addressed.
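For reference, the heartbeat described under Fix might look roughly like the sketch below. It is not the code from this PR: the names SseSink, startKeepalive, formatEvent, and streamWithKeepalive are hypothetical, and the sketch only assumes a response object with a write(string) method (Node's http.ServerResponse has one).

```typescript
// Hypothetical minimal shape of an SSE response stream (assumption, not from the PR).
interface SseSink {
  write(chunk: string): unknown;
}

// An SSE comment: a line starting with ':' followed by a blank line.
// Browsers' EventSource ignores it, but it still counts as bytes on the wire.
const KEEPALIVE_FRAME = ": keepalive\n\n";

// Writes a keepalive comment every intervalMs (20s here, well under nginx's
// 60s proxy_read_timeout). Returns a function that stops the heartbeat.
function startKeepalive(sink: SseSink, intervalMs: number = 20_000): () => void {
  const timer = setInterval(() => {
    sink.write(KEEPALIVE_FRAME);
  }, intervalMs);
  return () => clearInterval(timer);
}

// Formats a regular named SSE event with a JSON payload.
function formatEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Runs a long pipeline while the heartbeat keeps the connection from going idle.
// The heartbeat is always stopped, even if the pipeline throws.
async function streamWithKeepalive(
  sink: SseSink,
  pipeline: () => Promise<void>,
): Promise<void> {
  const stop = startKeepalive(sink);
  try {
    await pipeline();
  } finally {
    stop();
  }
}
```

Stopping the timer in a finally block matters here: a leaked interval would keep writing to a response that has already ended.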