Skip to content

[serve] Record WebSocket disconnect proxy metrics#64185

Open
RitwijParmar wants to merge 3 commits into
ray-project:masterfrom
RitwijParmar:codex/ws-disconnect-proxy-metrics
Open

[serve] Record WebSocket disconnect proxy metrics#64185
RitwijParmar wants to merge 3 commits into
ray-project:masterfrom
RitwijParmar:codex/ws-disconnect-proxy-metrics

Conversation

@RitwijParmar

Copy link
Copy Markdown

Why are these changes needed?

Fixes #64077.

When an ASGI sender closes a WebSocket mid stream, GenericProxy.proxy_request can be finalized at yield message before it sees the final ResponseStatus. The ongoing request gauge is decremented, but request counters, error counters, latency metrics, and access logs are skipped.

This moves proxy request accounting into the wrapper finally block, so a finalized WebSocket stream is still counted once. If no status reached the wrapper, it records WebSocket code 1006.

The HTTP proxy also now lets GeneratorExit propagate instead of routing it through the normal exception response path. That avoids yielding while the async generator is being closed.

Related issue number

#64077

Checks

  • python3 -m ruff check python/ray/serve/_private/proxy.py python/ray/serve/tests/unit/test_proxy.py
  • python3 -m ruff format --check python/ray/serve/_private/proxy.py python/ray/serve/tests/unit/test_proxy.py
  • python3 -m compileall -q python/ray/serve/_private/proxy.py python/ray/serve/tests/unit/test_proxy.py
  • git diff --check

I tried the focused pytest for the new test, but this local checkout cannot import ray._raylet because the compiled Ray extension is not built here.

@RitwijParmar RitwijParmar requested a review from a team as a code owner June 17, 2026 18:11
@RitwijParmar RitwijParmar force-pushed the codex/ws-disconnect-proxy-metrics branch from 3b31aea to 469024f Compare June 17, 2026 18:11

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request ensures that proxy metrics and access logs are recorded correctly when a client disconnects, particularly for WebSocket connections (which are assigned a status code of "1006"). It also handles GeneratorExit explicitly to prevent it from being treated as an unexpected server error. The review feedback highlights a critical issue where raising GeneratorExit leaves status as None for non-websocket requests, potentially causing an AttributeError when status.code is accessed later. A suggestion is provided to defensively set status to a "499" status code (client closed request) when catching GeneratorExit.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +1366 to +1367
except GeneratorExit:
raise

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

By adding except GeneratorExit: raise to prevent GeneratorExit from being caught as an unexpected 500 error, status remains None if the generator is closed before the response starts (e.g., when an HTTP client disconnects early).

In the finally block of send_request_to_replica, if status is None and the request type is not "websocket", it falls through to the else block:

else:
    status_code = status.code

This will raise an unhandled AttributeError: 'NoneType' object has no attribute 'code', masking the original GeneratorExit and potentially causing resource leaks or broken cleanup.

To fix this, we should defensively set status to an appropriate status code (such as "499" for client closed request) when catching GeneratorExit before re-raising it.

Suggested change
except GeneratorExit:
raise
except GeneratorExit:
if status is None:
status = ResponseStatus(code="499", is_error=True)
raise

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Fix All in Cursor

Reviewed by Cursor Bugbot for commit 469024f. Configure here.

Comment thread python/ray/serve/_private/proxy.py
Signed-off-by: Ritwij Aryan Parmar <ritwij.aryan.parmar@gmail.com>
@RitwijParmar RitwijParmar force-pushed the codex/ws-disconnect-proxy-metrics branch from 469024f to e52b9e5 Compare June 17, 2026 18:19
@ray-gardener ray-gardener Bot added serve Ray Serve Related Issue community-contribution Contributed by the community labels Jun 17, 2026
Signed-off-by: Ritwij Aryan Parmar <ritwij.aryan.parmar@gmail.com>
@RitwijParmar RitwijParmar force-pushed the codex/ws-disconnect-proxy-metrics branch from e15b5f2 to c973a14 Compare June 17, 2026 20:56
Signed-off-by: Ritwij Aryan Parmar <ritwij.aryan.parmar@gmail.com>
@RitwijParmar RitwijParmar force-pushed the codex/ws-disconnect-proxy-metrics branch from 48a7592 to 0923bc3 Compare June 20, 2026 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-contribution Contributed by the community serve Ray Serve Related Issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Serve] WebSocket requests dropped from proxy request metrics when client disconnects mid-stream (+ "async generator ignored GeneratorExit" log spam)

1 participant