Tolerate connection close failures in #restart#16
Conversation
ActionCable::Server::Base#restart iterates @connections and calls the iteration and propagates out. Since restart is wired to Rails' before_class_unload reloader callback, that raise tears down the unload callback chain and 500s the request that triggered the reload. We hit this in development under async-cable, whose Socket#transmit pushes to a Thread::Queue and raises ClosedQueueError when the queue is already closed (stale connection, second restart in flight, client disconnect racing with restart): async-cable/lib/async/cable/socket.rb:52:in `Thread::Queue#push` async-cable/lib/async/cable/socket.rb:52:in `transmit` action_cable/connection/base.rb:117:in `transmit` action_cable/connection/base.rb:126:in `close` action_cable/server/base.rb:75:in `block in restart` action_cable/server/base.rb:74:in `each` action_cable/server/base.rb:74:in `restart` The root cause is adapter-specific, but the fix belongs at the restart boundary too: it runs from a reloader callback and shouldn't propagate per-connection failures regardless of which adapter is serving the socket. We catch and log around each close so one bad connection cannot abort the iteration. The remaining connections still close and the worker pool, executor, and pub-sub adapter still shut down.
|
Thanks for the detailed report and the proposed fix.
I'm not sure about that. I believe that |
|
Yeah maybe it's not up to this to adopt to ”miss behaving” adapters. For context I was planning to suggest something to async-cable as well. My current wall of hacks for local dev with falcon / async-cable includes module CableSocketSafeTransmit
def transmit(data)
super
rescue ClosedQueueError
end
endpaired with a debouncer, some variant of rails/rails#57423 and a hack on hotwire spark to make it thread safe. So I've been heavily in ”just make the damn dev reload thing work”–mode and tbh have not deeply looked into the design / intentions. |
Added something here socketry/async-cable#6 and i'll let you peeps 🧠 over what is the correct thing =) also thanks so much for the effort of pushing fiber–native things. So nice to be running on falcon in prod now. |
|
There is something fundamentally wrong with this order of operations: Trying to transmit after a socket has been closed is objectively a logic bug. |
|
Closed by #17 |
|
Thanks everyone and thanks @joeljunstrom for proposing a fix. |
ActionCable::Server::Base#restartiterates@connectionsand calls the iteration and propagates outSince restart is wired to Rails
before_class_unloadreloader callback, that raise tears down the unload callback chain and 500s the request that triggered the reload.We hit this in development under async-cable, whose Socket#transmit pushes to a
Thread::Queueand raisesClosedQueueErrorwhen the queue is already closed (stale connection, second restart in flight, client disconnect racing with restart):The root cause is adapter-specific, but the fix probably belongs at the restart boundary too.
It runs from a reloader callback and shouldn't propagate per-connection failures regardless of which adapter is serving the socket.
We catch and log around each close so one bad connection cannot abort the iteration.
The remaining connections still close and the worker pool, executor, and pub-sub adapter still shut down.