Fix fallback chain skipped when initial server stalls after accepting connection#1821
Open
WouterGritter wants to merge 1 commit into
Open
Fix fallback chain skipped when initial server stalls after accepting connection#1821WouterGritter wants to merge 1 commit into
WouterGritter wants to merge 1 commit into
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When a player joins via a forced host whose first backend accepts the TCP connection but then never completes login (e.g. a firewall/anti-DDoS front that answers pings but black-holes logins), they get disconnected with "An internal error occurred in your connection" instead of being moved to the next server in the fallback chain. This issue surfaced on an issue on Velocity-CTD: GemstoneGG#938. I was able to recreate this failure state (simulating a "firewall" that answers pings but black-holes logins) with a Python script that does precisely this.
While the player idles in config waiting for the backend, two
ReadTimeoutHandlers with the same duration race:The player-side handler is created first, so it fires first and wins -> disconnect instead of failover.
This is also why just lowering
read-timeoutdidn't help (see referenced issue's chain): it shortens both equally, keeping the race tied (though this is an actual issue, see #1819 and #1820). Both this PR and reducingread-timeoutis needed to fix this problem properly; without reducing the read timeout, the client would just disconnect itself after 30 seconds, still not allowing Velocity to properly handle the exception (by, after this PR, failing over to the next server in the fallback chain).Fix
Suspend the player connection's read-timeout while establishing the initial connection (and through the fallback chain), restore it once a server is reached. The client can no longer time itself out during limbo, so the backend timeout fires and drives the existing fallback chain. As a bonus,
read-timeouttuning now actually controls failover speed.