after minutes
sendFrame held state.mu during sourceConn.WriteFrame (no timeout), and handleData held state.mu during target.Write. When tunnel TCP writes got slow, all frame dispatch for that tunnel froze: SYN frames couldn't be processed, so new connections failed while existing ones continued.
- Add 10s write timeout to sourceConn.WriteFrame
- Refactor sendFrame to pick source under lock, write outside lock
- Refactor handleData/handleFIN to drain reorderer under lock, write outside
- Add 10s write timeout for upstream (Xray) writes
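The "pick source under lock, write outside lock" refactor described above could look roughly like the following. This is a minimal sketch, not the code from this PR: `tunnelState`, `pickSource`, the round-robin selection, and the use of plain `net.Conn.Write` in place of `sourceConn.WriteFrame` are all assumptions made for illustration. Only the idea (hold the mutex just long enough to choose a connection, then write with a deadline) comes from the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"sync"
	"time"
)

// tunnelState is a hypothetical stand-in for the real per-tunnel state.
type tunnelState struct {
	mu      sync.Mutex
	sources []net.Conn
	next    int
}

// pickSource chooses a connection while holding the lock. Round-robin is an
// assumption; the real selection logic is not shown in the source.
func (s *tunnelState) pickSource() (net.Conn, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if len(s.sources) == 0 {
		return nil, errors.New("no source connection")
	}
	c := s.sources[s.next%len(s.sources)]
	s.next++
	return c, nil
}

// sendFrame writes after the lock has been released, bounded by a write
// deadline, so a slow peer cannot stall other users of s.mu.
// Note: this relies on each frame going out in a single Write call; a frame
// written in several calls would need its own per-connection write mutex.
func (s *tunnelState) sendFrame(frame []byte, timeout time.Duration) error {
	conn, err := s.pickSource()
	if err != nil {
		return err
	}
	if err := conn.SetWriteDeadline(time.Now().Add(timeout)); err != nil {
		return err
	}
	_, err = conn.Write(frame)
	return err
}

// demoSlowWrite simulates a peer that never reads. It reports whether the
// write timed out and whether s.mu was free while the write was pending.
func demoSlowWrite() (timedOut, lockFree bool) {
	local, remote := net.Pipe()
	defer local.Close()
	defer remote.Close()

	s := &tunnelState{sources: []net.Conn{local}}
	done := make(chan error, 1)
	go func() { done <- s.sendFrame([]byte("frame"), 50*time.Millisecond) }()

	time.Sleep(10 * time.Millisecond) // let the write start blocking
	if s.mu.TryLock() {               // TryLock needs Go 1.18 or newer
		lockFree = true
		s.mu.Unlock()
	}
	timedOut = errors.Is(<-done, os.ErrDeadlineExceeded)
	return timedOut, lockFree
}

func main() {
	timedOut, lockFree := demoSlowWrite()
	fmt.Println("write timed out:", timedOut, "| lock free during write:", lockFree)
}
```

The demo uses a 50ms deadline so it finishes quickly; the commit message specifies 10s for the real timeout.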
handleData wrote to upstream (Xray) synchronously inside the sequential frame dispatch loop. A slow upstream write blocked ALL frame processing on that tunnel, including SYN frames for new connections, so new connections failed while existing ones continued.

Each connState now has a writeCh + upstreamWriter goroutine. handleData inserts into the reorderer and sends chunks to writeCh without blocking, then returns immediately so the frame loop can process the next frame.
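A per-connection writer goroutine fed by a buffered channel might be sketched as below. The names `connState`, `writeCh`, and `upstreamWriter` come from the commit message; everything else (the `enqueue` and `closeWrite` helpers, the queue length, and what happens when the queue is full) is an assumption for illustration. In particular, the source does not say how a full `writeCh` is handled; this sketch simply reports it to the caller.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// connState holds the per-connection write queue (simplified, hypothetical).
type connState struct {
	writeCh chan []byte
	done    chan struct{} // closed when upstreamWriter exits
}

func newConnState(upstream io.Writer, queueLen int) *connState {
	cs := &connState{
		writeCh: make(chan []byte, queueLen),
		done:    make(chan struct{}),
	}
	go cs.upstreamWriter(upstream)
	return cs
}

// upstreamWriter is the only goroutine that touches upstream, so chunks are
// written in the order they were queued.
func (cs *connState) upstreamWriter(upstream io.Writer) {
	defer close(cs.done)
	for chunk := range cs.writeCh {
		if _, err := upstream.Write(chunk); err != nil {
			return
		}
	}
}

// enqueue never blocks the frame loop. It returns false when the queue is
// full; what the caller does then is left open here.
func (cs *connState) enqueue(chunk []byte) bool {
	select {
	case cs.writeCh <- chunk:
		return true
	default:
		return false
	}
}

// closeWrite stops accepting chunks and waits for the writer to exit.
func (cs *connState) closeWrite() {
	close(cs.writeCh)
	<-cs.done
}

// gateWriter simulates a stalled upstream: Write blocks until gate is closed.
type gateWriter struct{ gate chan struct{} }

func (g gateWriter) Write(p []byte) (int, error) {
	<-g.gate
	return len(p), nil
}

// demoOrdered queues three chunks and returns what reached upstream.
func demoOrdered() string {
	var buf bytes.Buffer
	cs := newConnState(&buf, 8)
	for _, c := range []string{"a", "b", "c"} {
		cs.enqueue([]byte(c))
	}
	cs.closeWrite()
	return buf.String()
}

// demoStalled queues four chunks against a stalled upstream with room for
// two, and returns how many were rejected instead of blocking the caller.
func demoStalled() (rejected int) {
	gw := gateWriter{gate: make(chan struct{})}
	cs := newConnState(gw, 2)
	for i := 0; i < 4; i++ {
		if !cs.enqueue([]byte{byte(i)}) {
			rejected++
		}
	}
	close(gw.gate) // release the stalled writer so it can drain and exit
	cs.closeWrite()
	return rejected
}

func main() {
	fmt.Println("ordered output:", demoOrdered())
	fmt.Println("rejected while stalled:", demoStalled())
}
```

With a stalled upstream, at most one chunk is in the writer's hands and two are buffered, so at least one of the four attempts is rejected while the caller itself never waits.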
connections
Root cause: health checker created SEPARATE TCP connections to instances for framing protocol probes, concurrent with TunnelPool's persistent connection. This likely disrupted the DNS tunnel, breaking the shared persistent connection that all packet_split traffic flows through.
- Skip probeFramingProtocol when TunnelPool has active connection to instance
- Add keepalive frames (10s interval) to detect dead tunnels and keep DNS sessions alive
- Add max-age (3min) forced reconnect to prevent long-lived connection degradation
- Reduce stale threshold from 20s to 15s for faster dead tunnel detection
- Ignore ConnID 0 (keepalive) on both client and CentralServer
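The timing rules in the list above can be read as a small decision function. The following is only a sketch of that reading: the durations and the ConnID 0 convention come from the commit message, while `tunnelClock`, `action`, `next`, `shouldProbeFraming`, the `uint32` type for ConnID, and the order in which the checks are evaluated are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

const (
	keepaliveInterval = 10 * time.Second // from the commit message
	staleThreshold    = 15 * time.Second // reduced from 20s
	maxAge            = 3 * time.Minute  // forced reconnect
	keepaliveConnID   = 0                // frames with this ConnID carry no data
)

type action int

const (
	actNone action = iota
	actSendKeepalive
	actReconnect
)

// tunnelClock is a hypothetical record of the timestamps the policy needs.
type tunnelClock struct {
	connectedAt   time.Time
	lastRecv      time.Time
	lastKeepalive time.Time
}

// next decides what to do at time now. The check order is an assumption.
func (c tunnelClock) next(now time.Time) action {
	switch {
	case now.Sub(c.connectedAt) >= maxAge:
		return actReconnect // connection is too old
	case now.Sub(c.lastRecv) >= staleThreshold:
		return actReconnect // nothing received recently: treat as dead
	case now.Sub(c.lastKeepalive) >= keepaliveInterval:
		return actSendKeepalive
	default:
		return actNone
	}
}

// isKeepalive reports whether a frame should be ignored by data handling.
// The uint32 type is an assumption; the source only names "ConnID 0".
func isKeepalive(connID uint32) bool { return connID == keepaliveConnID }

// shouldProbeFraming skips the separate probe connection while the pool
// already has an active tunnel to the instance.
func shouldProbeFraming(tunnelActive bool) bool { return !tunnelActive }

// demoPolicy evaluates the policy at a few fixed offsets from connect time.
func demoPolicy() []action {
	start := time.Unix(0, 0)
	quiet := tunnelClock{connectedAt: start, lastRecv: start, lastKeepalive: start}
	busy := tunnelClock{
		connectedAt:   start,
		lastRecv:      start.Add(maxAge - time.Second),
		lastKeepalive: start.Add(maxAge - time.Second),
	}
	return []action{
		quiet.next(start.Add(5 * time.Second)),  // nothing due yet
		quiet.next(start.Add(12 * time.Second)), // keepalive due
		quiet.next(start.Add(16 * time.Second)), // stale
		busy.next(start.Add(maxAge)),            // healthy but too old
	}
}

func main() {
	fmt.Println(demoPolicy())
	fmt.Println(isKeepalive(0), shouldProbeFraming(true))
}
```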
fix: prevent health probe interference with TunnelPool persistent connections
Root cause: health checker created SEPARATE TCP connections to instances
for framing protocol probes, concurrent with TunnelPool's persistent
connection. This likely disrupted the DNS tunnel, breaking the shared
persistent connection that all packet_split traffic flows through.