Skip to content

Tolerate connection close failures in #restart#16

Closed
joeljunstrom wants to merge 1 commit into
anycable:mainfrom
joeljunstrom:fault-tolerant-restart
Closed

Tolerate connection close failures in #restart#16
joeljunstrom wants to merge 1 commit into
anycable:mainfrom
joeljunstrom:fault-tolerant-restart

Conversation

@joeljunstrom
Copy link
Copy Markdown
Contributor

@joeljunstrom joeljunstrom commented May 21, 2026

ActionCable::Server::Base#restart iterates @connections and calls the iteration and propagates out

Since restart is wired to Rails before_class_unload reloader callback, that raise tears down the unload callback chain and 500s the request that triggered the reload.

We hit this in development under async-cable, whose Socket#transmit pushes to a Thread::Queue and raises ClosedQueueError when the queue is already closed (stale connection, second restart in flight, client disconnect racing with restart):

  async-cable/lib/async/cable/socket.rb:52:in `Thread::Queue#push`
  async-cable/lib/async/cable/socket.rb:52:in `transmit`
  action_cable/connection/base.rb:117:in `transmit`
  action_cable/connection/base.rb:126:in `close`
  action_cable/server/base.rb:75:in `block in restart`
  action_cable/server/base.rb:74:in `each`
  action_cable/server/base.rb:74:in `restart`

The root cause is adapter-specific, but the fix probably belongs at the restart boundary too.
It runs from a reloader callback and shouldn't propagate per-connection failures regardless of which adapter is serving the socket.

We catch and log around each close so one bad connection cannot abort the iteration.

The remaining connections still close and the worker pool, executor, and pub-sub adapter still shut down.

ActionCable::Server::Base#restart iterates @connections and calls
the iteration and propagates out. Since restart is wired to Rails'
before_class_unload reloader callback, that raise tears down the
unload callback chain and 500s the request that triggered the
reload.

We hit this in development under async-cable, whose Socket#transmit
pushes to a Thread::Queue and raises ClosedQueueError when the
queue is already closed (stale connection, second restart in
flight, client disconnect racing with restart):

  async-cable/lib/async/cable/socket.rb:52:in `Thread::Queue#push`
  async-cable/lib/async/cable/socket.rb:52:in `transmit`
  action_cable/connection/base.rb:117:in `transmit`
  action_cable/connection/base.rb:126:in `close`
  action_cable/server/base.rb:75:in `block in restart`
  action_cable/server/base.rb:74:in `each`
  action_cable/server/base.rb:74:in `restart`

The root cause is adapter-specific, but the fix belongs at the
restart boundary too: it runs from a reloader callback and
shouldn't propagate per-connection failures regardless of which
adapter is serving the socket.

We catch and log around each close so one bad connection cannot
abort the iteration.

The remaining connections still close and the worker pool, executor,
and pub-sub adapter still shut down.
@palkan
Copy link
Copy Markdown
Member

palkan commented May 21, 2026

Thanks for the detailed report and the proposed fix.

The root cause is adapter-specific, but the fix probably belongs at the restart boundary too.

I'm not sure about that. I believe that #close should tolerate "already closed" scenarios and raise exceptions only if there is something fundamentally wrong (ideally, never). Writing and closing could happen concurrently in other scenarios, too, not only during restarts, so adding a guard (return unless alive?, similarly how it's done in Action Cable's server/socket.rb) would make sense at the async-cable level. /cc @ioquatix

@joeljunstrom
Copy link
Copy Markdown
Contributor Author

joeljunstrom commented May 21, 2026

Yeah maybe it's not up to this to adopt to ”miss behaving” adapters.

For context I was planning to suggest something to async-cable as well.

My current wall of hacks for local dev with falcon / async-cable includes

  module CableSocketSafeTransmit
    def transmit(data)
      super
    rescue ClosedQueueError
    end
  end

paired with a debouncer, some variant of rails/rails#57423 and a hack on hotwire spark to make it thread safe.

So I've been heavily in ”just make the damn dev reload thing work”–mode and tbh have not deeply looked into the design / intentions.

@joeljunstrom
Copy link
Copy Markdown
Contributor Author

(return unless alive?, similarly how it's done in Action Cable's server/socket.rb) would make sense at the async-cable level. /cc @ioquatix

Added something here socketry/async-cable#6 and i'll let you peeps 🧠 over what is the correct thing =)

also thanks so much for the effort of pushing fiber–native things. So nice to be running on falcon in prod now.

@ioquatix
Copy link
Copy Markdown
Contributor

There is something fundamentally wrong with this order of operations:

socket.close
socket.transmit("closing")
socket.close

Trying to transmit after a socket has been closed is objectively a logic bug.

@palkan
Copy link
Copy Markdown
Member

palkan commented May 22, 2026

Closed by #17

@palkan palkan closed this May 22, 2026
@joeljunstrom joeljunstrom deleted the fault-tolerant-restart branch May 22, 2026 08:23
@ioquatix
Copy link
Copy Markdown
Contributor

Thanks everyone and thanks @joeljunstrom for proposing a fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants