Skip to content

feat: replace WebSocket push channel with SSE#94

Merged
TerrifiedBug merged 23 commits intomainfrom
sse-push-channel-ca4
Mar 11, 2026
Merged

feat: replace WebSocket push channel with SSE#94
TerrifiedBug merged 23 commits intomainfrom
sse-push-channel-ca4

Conversation

@TerrifiedBug
Copy link
Owner

Summary

  • Replace WebSocket push with Server-Sent Events — new SSE route handler at /api/agent/push, in-memory PushRegistry with keepalive, and Go SSE client with exponential backoff reconnection
  • Migrate all server-side callers (fleet.ts, deploy.ts, pipeline.ts, deploy-agent.ts) from wsRegistrypushRegistry; update config endpoint to return pushUrl instead of websocketUrl
  • Delete all WebSocket artifactsserver.ts custom server wrapper, ws-registry.ts, ws-auth.ts, ws-types.ts, agent/internal/ws/ package, gorilla/websocket dependency
  • Clean up dependencies and Docker build — remove ws, @types/ws, esbuild, tsx; revert dev script to next dev; remove esbuild bundling step from Dockerfile

Net result: -1,072 lines removed, +535 added (27 files changed). Eliminates the custom server wrapper, esbuild bundling, and gorilla/websocket dependency. SSE works through standard Next.js route handlers with no special infrastructure.

Test Plan

  • TypeScript compiles cleanly (tsc --noEmit)
  • ESLint passes (pnpm lint)
  • Go agent builds (go build ./...)
  • All Go tests pass (21 tests across 4 packages) — includes 4 new SSE client integration tests
  • Next.js production build succeeds (pnpm build)
  • No remaining WebSocket references in source (verified via grep)
  • No leftover /api/agent/ws route directory
  • Manual: Deploy and verify agent connects via SSE push channel
  • Manual: Verify fleet dashboard shows "Live" badge for connected agents
  • Manual: Trigger config change and verify agent receives push notification

@github-actions github-actions bot added documentation Improvements or additions to documentation dependencies Pull requests that update a dependency file docker agent feature labels Mar 11, 2026
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 11, 2026

Greptile Summary

This PR replaces the WebSocket push channel with Server-Sent Events (SSE), eliminating the custom server.ts wrapper, esbuild bundling step, and gorilla/websocket dependency. The result is a standard Next.js route handler at /api/agent/push backed by an in-memory PushRegistry with keepalive, and a new Go SSE client with exponential backoff reconnection. All server-side callers (deploy.ts, fleet.ts, pipeline.ts, deploy-agent.ts) are correctly migrated from wsRegistrypushRegistry.

Key observations:

  • The PushRegistry correctly handles reconnects: it closes the old controller before registering the new one, and the unregister guard (current?.controller !== controller) prevents stale close-handlers from removing a newer connection.
  • Authentication on /api/agent/push uses the existing authenticateAgent() function, consistent with all other agent endpoints.
  • The SSE route response is missing an X-Accel-Buffering: no header — without it, Nginx deployments that have not applied the updated proxy_buffering off config will buffer SSE chunks, causing events to be delayed or never delivered.
  • The Go SSE client uses http.DefaultClient with no connect/response-header timeout, whereas the previous WebSocket client had a HandshakeTimeout: 10s. If the server is partially unavailable (accepts TCP but stalls on HTTP headers), the SSE goroutine will block until Close() is called.

Confidence Score: 4/5

  • Safe to merge — the SSE implementation is architecturally sound and well-tested; two minor hardening gaps noted.
  • The migration is clean and correct end-to-end: auth, registry lifecycle, keepalive, and reconnection all work as expected. The two issues (missing X-Accel-Buffering header and no HTTP connect timeout) are hardening gaps rather than correctness bugs — SSE will work in most deployments, but may misbehave behind default Nginx configs or under partial server outages.
  • src/app/api/agent/push/route.ts (missing X-Accel-Buffering header) and agent/internal/push/client.go (no HTTP response-header timeout).

Important Files Changed

Filename Overview
src/app/api/agent/push/route.ts New SSE route handler — auth, registration, and cleanup are correct; missing X-Accel-Buffering: no header could break streaming behind default Nginx.
src/server/services/push-registry.ts Well-implemented in-memory SSE registry with keepalive, correct stale-connection guard in unregister, and graceful old-connection eviction on re-register.
agent/internal/push/client.go Correct SSE client with exponential backoff; uses http.DefaultClient with no connect/response-header timeout (regression from WebSocket's 10s HandshakeTimeout).
agent/internal/push/client_test.go Good integration tests covering message parsing, auth header, reconnect, and keepalive; slightly time-based (sleep-driven) but functional.
src/app/api/agent/config/route.ts Correctly switches from wss://…/api/agent/ws to http(s)://…/api/agent/push URL construction; no WebSocket protocol conversion needed.
docker/server/Dockerfile Correctly removes esbuild bundling step and the server.ts rename trick; build simplifies to standard pnpm build without the custom server wrapper.
agent/internal/agent/agent.go Clean migration of ws → push package references; Close() and channel handling unchanged in behavior.

Sequence Diagram

sequenceDiagram
    participant Agent as Go Agent
    participant Config as /api/agent/config
    participant Push as /api/agent/push
    participant Registry as PushRegistry
    participant Callers as deploy/fleet/pipeline

    Agent->>Config: GET (Bearer token)
    Config-->>Agent: { pushUrl, pipelines, ... }

    Agent->>Push: GET (Bearer token, long-lived)
    Push->>Registry: register(nodeId, controller)
    Push-->>Agent: SSE stream open (": connected")

    loop every 30s
        Registry->>Push: enqueue(": keepalive")
        Push-->>Agent: ": keepalive"
    end

    Callers->>Registry: send(nodeId, { type: "config_changed", ... })
    Registry->>Push: controller.enqueue(event)
    Push-->>Agent: event: config_changed\ndata: {...}

    Agent->>Config: GET (re-poll triggered)
    Config-->>Agent: updated pipeline config

    Note over Agent,Push: On disconnect
    Agent--xPush: TCP close
    Push->>Registry: unregister(nodeId, controller)

    Note over Agent,Push: On reconnect
    Agent->>Push: GET (new connection)
    Push->>Registry: register(nodeId, newController)
    Registry->>Push: close(oldController)
Loading

Last reviewed commit: dd82ba4

Comment on lines +36 to +42
return new Response(stream, {
headers: {
"Content-Type": "text/event-stream",
"Cache-Control": "no-cache, no-transform",
Connection: "keep-alive",
},
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing X-Accel-Buffering: no header

Nginx buffers upstream responses by default. Without the X-Accel-Buffering: no response header, SSE events will be held in Nginx's buffer and not forwarded to the agent until the buffer is flushed or the connection closes — effectively breaking real-time push.

The docs PR updates the Nginx config with proxy_buffering off, but that requires every operator to update their reverse-proxy config. Setting X-Accel-Buffering: no in the response headers makes SSE work correctly out-of-the-box regardless of the proxy config.

Suggested change
return new Response(stream, {
headers: {
"Content-Type": "text/event-stream",
"Cache-Control": "no-cache, no-transform",
Connection: "keep-alive",
},
});
return new Response(stream, {
headers: {
"Content-Type": "text/event-stream",
"Cache-Control": "no-cache, no-transform",
Connection: "keep-alive",
"X-Accel-Buffering": "no",
},
});
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/app/api/agent/push/route.ts
Line: 36-42

Comment:
**Missing `X-Accel-Buffering: no` header**

Nginx buffers upstream responses by default. Without the `X-Accel-Buffering: no` response header, SSE events will be held in Nginx's buffer and not forwarded to the agent until the buffer is flushed or the connection closes — effectively breaking real-time push.

The docs PR updates the Nginx config with `proxy_buffering off`, but that requires every operator to update their reverse-proxy config. Setting `X-Accel-Buffering: no` in the response headers makes SSE work correctly out-of-the-box regardless of the proxy config.

```suggestion
  return new Response(stream, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache, no-transform",
      Connection: "keep-alive",
      "X-Accel-Buffering": "no",
    },
  });
```

How can I resolve this? If you propose a fix, please make it concise.

}
req.Header.Set("Authorization", "Bearer "+c.token)

resp, err := http.DefaultClient.Do(req)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No connect timeout on the HTTP client

The old WebSocket client used HandshakeTimeout: 10 * time.Second. http.DefaultClient has no timeout, so if the server accepts the TCP connection but never sends an HTTP response (partial outage, overloaded proxy), this goroutine will block indefinitely on Do(req) until Close() is called.

Consider using a custom client with a Timeout for the initial response headers, or at least a ResponseHeaderTimeout:

var sseHttpClient = &http.Client{
    Transport: &http.Transport{
        ResponseHeaderTimeout: 15 * time.Second,
    },
}

Then replace http.DefaultClient.Do(req) with sseHttpClient.Do(req). Note: the body-read timeout should remain context-driven (so Close() can cancel a live connection), which is already handled correctly by the context cancellation.

Prompt To Fix With AI
This is a comment left during a code review.
Path: agent/internal/push/client.go
Line: 82

Comment:
**No connect timeout on the HTTP client**

The old WebSocket client used `HandshakeTimeout: 10 * time.Second`. `http.DefaultClient` has no timeout, so if the server accepts the TCP connection but never sends an HTTP response (partial outage, overloaded proxy), this goroutine will block indefinitely on `Do(req)` until `Close()` is called.

Consider using a custom client with a `Timeout` for the initial response headers, or at least a `ResponseHeaderTimeout`:

```go
var sseHttpClient = &http.Client{
    Transport: &http.Transport{
        ResponseHeaderTimeout: 15 * time.Second,
    },
}
```

Then replace `http.DefaultClient.Do(req)` with `sseHttpClient.Do(req)`. Note: the body-read timeout should remain context-driven (so `Close()` can cancel a live connection), which is already handled correctly by the context cancellation.

How can I resolve this? If you propose a fix, please make it concise.

@TerrifiedBug TerrifiedBug merged commit 725d23c into main Mar 11, 2026
11 checks passed
@TerrifiedBug TerrifiedBug deleted the sse-push-channel-ca4 branch March 11, 2026 13:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent dependencies Pull requests that update a dependency file docker documentation Improvements or additions to documentation feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant