Rescue subscription errors instead of killing WebSocket connection #4
bradgessler wants to merge 2 commits into
ActionCable::Connection::Subscriptions::Error (e.g. AlreadySubscribedError) can be raised when a client sends a duplicate subscribe command, which is common during Turbo morph/page refresh cycles. Previously this propagated to the generic rescue clause, tearing down the entire WebSocket connection. This is especially problematic with the PostgreSQL subscription adapter, where connection teardown removes all LISTEN subscriptions. With rapid reconnect/resubscribe cycles, the LISTEN thread never stays alive long enough to receive NOTIFY broadcasts from other processes. The fix wraps connection.handle_incoming in a targeted rescue so subscription errors are logged but the connection stays alive.

rails/rails@7a8f26dcff changed this behaviour to raise an exception, but it seems like some adapters legitimately raise this error in some cases. WDYT?

I think this is better solved upstream, e.g. rails/rails#57504

@bradgessler As a quick workaround, you can add
Or switch to @anycable/web and forget about that 🙂.
@samuel-williams-shopify In general, I think it's worth adding exception handling to the message processing loop (as Rails does). Maybe not Exception, but just StandardError. For example, we don't want to drop the connection because of ActiveRecord::NotFoundError, right? (We lack a proper mechanism to communicate errors to clients 🤷♂️)

This is huge! Thanks to all who landed it!
@palkan I actually gave up on ActionCable because it doesn't really have a protocol and drops messages 🤣 A few months ago I finally got mad at ActionCable & SolidCable for quietly dropping messages and built out v2 of https://github.com/firehoseio/next instead (I built v1 almost 15 years ago!) using @ioquatix's async-web-socket library. I have a weird thing where I want the least number of moving parts possible in my envs, and this was the most minimal thing I could think of since it runs on Postgres and doesn't require running any additional processes. It's running at https://opengraphplus.com if you want to see it in action. AnyCable is great, but I had the curse of knowledge of Firehose v1 and wanted fewer moving parts than AnyCable. I still can't believe how much stuff runs on straight-up ActionCable without any protocol for detecting dropped messages.

Problem
ActionCable::Connection::Subscriptions::Error (specifically AlreadySubscribedError) can be raised when a client sends a duplicate subscribe command. This is common during Turbo morph/page refresh cycles, where the <turbo-cable-stream-source> element is re-inserted into the DOM and triggers a re-subscribe.

Currently, this exception propagates to the generic rescue => error clause in handle_incoming_websocket, which logs "Abnormal client failure!" and tears down the entire WebSocket connection in the ensure block.

This is especially problematic with the PostgreSQL subscription adapter, where connection teardown removes all LISTEN subscriptions. With rapid reconnect/resubscribe cycles, the LISTEN thread never stays alive long enough to receive NOTIFY broadcasts from other processes (e.g., background job workers), effectively breaking cross-process ActionCable broadcasts.

Fix
Wrap connection.handle_incoming in a targeted rescue ActionCable::Connection::Subscriptions::Error so subscription-level errors are logged as warnings but the WebSocket connection stays alive.

Testing
Observed in production at https://og.plus with:
broadcasts_refreshes

Before the fix: zero LISTEN connections in pg_stat_activity, broadcasts silently lost.
After the fix: WebSocket connections survive duplicate subscribe attempts, LISTEN stays active, broadcasts delivered.
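
The shape of the fix described above can be sketched roughly as follows. This is not the PR's actual diff: the `Sketch` module, its error classes, the `Connection` stand-in, and the `process` helper are all hypothetical names, used so the example runs without Rails or ActionCable. It only illustrates the idea of an inner, targeted rescue for subscription errors nested inside the generic rescue that still handles everything else.

```ruby
require "logger"
require "stringio"

# Hypothetical stand-ins so the sketch runs without Rails. In the real code
# the targeted class is ActionCable::Connection::Subscriptions::Error.
module Sketch
  class SubscriptionError < StandardError; end
  class AlreadySubscribedError < SubscriptionError; end

  # Minimal fake connection: raises on a duplicate subscribe command.
  class Connection
    attr_reader :subscriptions

    def initialize
      @subscriptions = []
    end

    def handle_incoming(message)
      identifier = message.fetch("identifier")
      case message.fetch("command")
      when "subscribe"
        if @subscriptions.include?(identifier)
          raise AlreadySubscribedError, "already subscribed to #{identifier}"
        end
        @subscriptions << identifier
      else
        raise ArgumentError, "unknown command"
      end
    end
  end

  # Processes messages in order. Subscription errors are logged as warnings
  # and skipped; any other StandardError falls through to the generic rescue,
  # which stands in for the teardown path. Returns true if the connection
  # survived every message, false otherwise.
  def self.process(connection, messages, logger)
    messages.each do |message|
      begin
        connection.handle_incoming(message)
      rescue SubscriptionError => error
        logger.warn("Subscription error: #{error.message}")
      end
    end
    true
  rescue => error
    logger.error("Abnormal client failure! #{error.message}")
    false
  end
end

# Usage: a duplicate subscribe is logged, and the loop keeps going.
log = StringIO.new
connection = Sketch::Connection.new
subscribe = { "command" => "subscribe", "identifier" => "chat" }
Sketch.process(connection, [subscribe, subscribe], Logger.new(log))
```

Rescuing only the subscription error class keeps the change narrow: unexpected failures still reach the generic handler, so this sketch does not settle the broader question raised in the thread of whether the message loop should rescue StandardError.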