fix(noq): close the connection when a Connecting is dropped before handshake#682
Open
drlukeangel wants to merge 1 commit into
Open
fix(noq): close the connection when a Connecting is dropped before handshake#682drlukeangel wants to merge 1 commit into
drlukeangel wants to merge 1 commit into
Conversation
A Connecting dropped before handshake completion left its connection task, packet spaces, and channels stranded in EndpointInner forever (before handshake completion max_idle_timeout is not yet enabled and there is no handshake timeout). Drop now takes the inner connection and calls implicit_close() (guarded by !is_closed) to drain and release. The connected oneshot Receiver is wrapped in Option so poll/into_0rtt can take() it without moving out of a type implementing Drop (E0509).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
While running noq (via iroh) under sustained churn in a downstream project — peers abruptly killed
without a graceful
CONNECTION_CLOSE— we found in-progress connections were never reclaimed.Connectinghad noDropimplementation, so dropping aConnectingfuture before the handshakecompleted didn't signal the connection task. Before handshake completion the idle timeout isn't yet
in effect and there's no handshake timeout, and the background
ConnectionDriverplus theEndpointInnerevent-channel sender keep the connection alive — so the half-open connection statelingered in the endpoint maps indefinitely, once per aborted inbound handshake.
(Thanks for noq, too — having a workable multipath QUIC we could build on underneath iroh is a big
deal, and the
Connecting/driver/endpoint separation made the right fix point clear.)Fix
Implement
DropforConnecting: on drop, if the handshake hasn't completed, take the innerconnection reference, and if it isn't already closed, call
implicit_close(). That transitions theconnection to drained, dispatches the close to the endpoint (which removes the handle from its
active maps), and wakes the driver to exit cleanly — releasing the connection, its channels, and its
packet spaces.
To satisfy the borrow checker (E0509: cannot move out of a type implementing
Drop), the oneshotconnectedreceiver is wrapped inOption, andinto_0rtt/Future::polltake()it (pollrestores it on
Pending).Breaking Changes
None.
Connecting's public API is unchanged; this only adds cleanup on the drop path.Notes & open questions
Connecting, assert theconnection is removed from the endpoint maps) needs a connected-endpoint harness; glad to add one
in whatever style fits your test suite. We validated the behavior via a downstream chaos soak
(continuous kill/respawn): with the fix, the previously monotonic per-connection growth flattens
and connection structures no longer accumulate in heap profiles.
drop-driven cleanup the right scope here? Happy to follow your lead.
Checklist
Option-wrap reasoning (E0509) and the drop-path intentRelated
Found together with two other churn-related leak fixes in the iroh stack:
mapped_addrseviction on actor shutdownResolves #681