refactor: harden DNS recovery and interactive transport behavior #76
Open
ebpfx wants to merge 1 commit into
Tighten the client and server DNS transport path for shutdown-style networks where resolver behavior is bursty, forged, delayed, or partially dead.

Key changes:
- rebuild the transport stack when session creation fails instead of retrying on a poisoned resolver path
- track per-query UDP health and retire stale sessions only when streams actually need transport
- retire idle poisoned sessions after repeated stream-open failures without dropping busy sessions
- lower the server response hold default from 1s to 200ms and default to two response workers for better interactive behavior without hurting downstream bundling too aggressively
- make client polling stream-aware and tunable, with a 200ms active poll cap and a 2s idle max backoff
- enable TCP_NODELAY on client local TCP sockets and server upstream TCP sockets to reduce small-packet latency
- fix fractional -rps handling so rates below 1 query/sec still work correctly
- document the new tuning knobs and defaults

This keeps the single-session model and tunnel wire format intact while improving recovery, stability, and responsiveness under censored or unreliable resolver paths.
Summary
This PR consolidates the reliability and transport-tuning work needed to make the DNS tunnel more usable on degraded or shutdown-style networks.
The main goals are:
Why This Patch Is Needed
The original code path assumed a relatively normal DNS environment. In real shutdown conditions that assumption breaks:
This patch set addresses those problems without changing the tunnel wire format or the single-session model.
Files
- README.md
- client/client.go
- client/dns.go
- client/udp.go
- man/vaydns-client.1
- man/vaydns-server.1
- vaydns-client/main.go
- vaydns-server/main.go
What Changed
1. Rebuild transport on failed session creation
Changes:
- When `createSession` fails, the client now tears down and rebuilds the resolver/DNS transport stack before retrying session creation.
- `session ready` is emitted only after a session is actually usable.
Reason:
Impact:
2. Detect stale per-query UDP transport from the session layer
Changes:
- `UDPPacketConn` now tracks the time of the last successful DNS response.
Reason:
Impact:
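One way to picture the staleness check, as a sketch under stated assumptions. The `healthTracker` type and its methods are hypothetical; the PR keeps the timestamp on `UDPPacketConn`, and the exact rule it applies may differ. The 3s timeout mirrors the `udp-transport-stale-timeout` default.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// healthTracker records when the last DNS response arrived (hypothetical type).
type healthTracker struct {
	lastResponse int64 // UnixNano of the last successful response, accessed atomically
}

// markResponse is called whenever a valid DNS response is received.
func (h *healthTracker) markResponse(now time.Time) {
	atomic.StoreInt64(&h.lastResponse, now.UnixNano())
}

// stale reports whether the transport looks dead. It only says yes when
// streams actually need transport and nothing has come back within timeout.
func (h *healthTracker) stale(now time.Time, timeout time.Duration, activeStreams int) bool {
	if activeStreams == 0 {
		return false // idle sessions are not judged by response age
	}
	last := time.Unix(0, atomic.LoadInt64(&h.lastResponse))
	return now.Sub(last) > timeout
}

// demo runs three checks against a fixed clock.
func demo() [3]bool {
	var h healthTracker
	start := time.Unix(1000, 0)
	h.markResponse(start)
	timeout := 3 * time.Second
	return [3]bool{
		h.stale(start.Add(5*time.Second), timeout, 0), // idle: not stale
		h.stale(start.Add(5*time.Second), timeout, 2), // busy and silent: stale
		h.stale(start.Add(1*time.Second), timeout, 2), // busy and recent: not stale
	}
}

func main() {
	fmt.Println(demo()) // [false true false]
}
```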
3. Retire only idle poisoned sessions after repeated stream-open failures
Changes:
- The client retires a session after repeated `OpenStream` failures when there are no active streams.
Reason:
Impact:
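The retire-only-when-idle rule could look roughly like this. The `sessionGuard` type is hypothetical, and resetting the counter on a successful open is an assumption rather than something the PR text states. The limit of 3 mirrors the `open-stream-failure-limit` default.

```go
package main

import (
	"errors"
	"fmt"
)

// errOpen is a placeholder for an OpenStream failure.
var errOpen = errors.New("open stream failed")

// sessionGuard counts consecutive stream-open failures (hypothetical type).
type sessionGuard struct {
	failures int
	limit    int
}

// onOpenResult records one OpenStream outcome and reports whether the
// session should be retired. A session with active streams is never retired.
func (g *sessionGuard) onOpenResult(err error, activeStreams int) bool {
	if err == nil {
		g.failures = 0 // assumption: a success clears the failure streak
		return false
	}
	g.failures++
	return g.failures >= g.limit && activeStreams == 0
}

func main() {
	idle := &sessionGuard{limit: 3}
	for i := 0; i < 3; i++ {
		fmt.Println(idle.onOpenResult(errOpen, 0)) // false, false, true
	}
	busy := &sessionGuard{limit: 3}
	for i := 0; i < 3; i++ {
		fmt.Println(busy.onOpenResult(errOpen, 1)) // false every time
	}
}
```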
4. Make client polling stream-aware and tunable
Changes:
- `-poll-delay`
- `-active-poll-delay`
- `-poll-max-delay`
- `-udp-transport-stale-timeout`
- `-open-stream-failure-limit`
- `Handle()` paths now both contribute to active-stream tracking.
Reason:
Impact:
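A sketch of how the three polling knobs might interact. The `pollConfig` type and `nextDelay` function are hypothetical, and the doubling backoff is an assumption; only the 500ms, 200ms, and 2s values come from the defaults in this PR.

```go
package main

import (
	"fmt"
	"time"
)

// pollConfig groups the tunable delays (hypothetical type).
type pollConfig struct {
	initial   time.Duration // -poll-delay
	activeCap time.Duration // -active-poll-delay
	idleMax   time.Duration // -poll-max-delay
}

// defaultPollConfig returns the client defaults listed in this PR.
func defaultPollConfig() pollConfig {
	return pollConfig{
		initial:   500 * time.Millisecond,
		activeCap: 200 * time.Millisecond,
		idleMax:   2 * time.Second,
	}
}

// nextDelay doubles the previous delay (an assumed backoff rule) and clamps
// it: to activeCap while streams are open, to idleMax otherwise.
func nextDelay(cfg pollConfig, prev time.Duration, activeStreams int) time.Duration {
	next := prev * 2
	if prev <= 0 {
		next = cfg.initial
	}
	limit := cfg.idleMax
	if activeStreams > 0 {
		limit = cfg.activeCap
	}
	if next > limit {
		next = limit
	}
	return next
}

func main() {
	cfg := defaultPollConfig()
	d := time.Duration(0)
	for i := 0; i < 4; i++ {
		d = nextDelay(cfg, d, 0)
		fmt.Println("idle:", d) // 500ms, 1s, 2s, 2s
	}
	fmt.Println("active:", nextDelay(cfg, d, 1)) // 200ms
}
```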
5. Fix fractional `-rps` handling
Changes:
- `NewRateLimiter` now uses a minimum bucket capacity of `1.0` token even when `rps < 1`.
Reason:
- `-rps` values below `1` were effectively broken.
Impact:
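To see why the minimum capacity matters, here is a small token-bucket sketch. It is not the project's `NewRateLimiter`; the `rateLimiter` type, the full-bucket start, and the explicit clock argument are assumptions made for illustration. If the capacity were allowed to equal `rps` when `rps < 1`, the bucket could never hold a whole token and no query would ever be allowed.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// rateLimiter is a simple token bucket (hypothetical stand-in type).
type rateLimiter struct {
	rate, capacity, tokens float64
	last                   time.Time
}

// newRateLimiter clamps the bucket capacity to at least one token so that
// fractional rates such as 0.5 queries/sec can still accumulate a full token.
func newRateLimiter(rps float64, now time.Time) *rateLimiter {
	capacity := math.Max(rps, 1.0)
	return &rateLimiter{rate: rps, capacity: capacity, tokens: capacity, last: now}
}

// allow refills the bucket for the elapsed time and spends one token if available.
func (r *rateLimiter) allow(now time.Time) bool {
	elapsed := now.Sub(r.last).Seconds()
	r.tokens = math.Min(r.capacity, r.tokens+elapsed*r.rate)
	r.last = now
	if r.tokens >= 1 {
		r.tokens--
		return true
	}
	return false
}

// demo checks a 0.5 rps limiter at t=0s, t=1s, and t=2s.
func demo() [3]bool {
	start := time.Unix(1000, 0)
	rl := newRateLimiter(0.5, start)
	return [3]bool{
		rl.allow(start),                      // initial token available
		rl.allow(start.Add(1 * time.Second)), // only half a token refilled
		rl.allow(start.Add(2 * time.Second)), // a full token again
	}
}

func main() {
	fmt.Println(demo()) // [true false true]
}
```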
6. Make server response queue overload visible
Changes:
- `-response-queue-size`
Reason:
Impact:
7. Reduce server response hold from 1s to 200ms
Changes:
- Replaces the `1s` downstream response wait with a configurable response delay.
- The default is now `200ms`.
Reason:
- `200ms` keeps the wait window comfortably below the client UDP timeout while still leaving enough room to bundle downstream packets.
Impact:
8. Reduce edge TCP latency with `TCP_NODELAY`
Changes:
- Enables `TCP_NODELAY` on the client local TCP side.
- Enables `TCP_NODELAY` on the server upstream TCP side.
Reason:
Impact:
Defaults After This Patch
Client
- `poll-delay = 500ms`
- `active-poll-delay = 200ms`
- `poll-max-delay = 2s`
- `udp-transport-stale-timeout = 3s`
- `open-stream-failure-limit = 3`
Server
- `response-delay = 200ms`
- `response-workers = 2`
- `response-queue-size = queue-size` (default `0`, effective default `512`)
Backward Compatibility
Expected Improvements