Fix fallback chain skipped when initial server stalls after accepting connection #1821

Open
WouterGritter wants to merge 1 commit into PaperMC:dev/3.0.0 from WouterGritter:fallback-read-timeout-race

@WouterGritter WouterGritter commented Jun 8, 2026

Contributor

When a player joins via a forced host whose first backend accepts the TCP connection but then never completes login (e.g. a firewall/anti-DDoS front that answers pings but black-holes logins), they are disconnected with "An internal error occurred in your connection" instead of being moved to the next server in the fallback chain. This surfaced in an issue on Velocity-CTD: GemstoneGG#938. I was able to reproduce the failure with a Python script that simulates such a "firewall": it answers pings but black-holes logins.

While the player idles in config waiting for the backend, two ReadTimeoutHandlers with the same duration race:

  • backend connection timeout -> drives the fallback chain (correct)
  • player connection timeout -> hard-disconnects (wrong)

The player-side handler is created first, so it fires first and wins -> disconnect instead of failover.
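The ordering argument above can be illustrated without Netty. The following is only a minimal sketch, assuming both timeouts run on a single timer thread; `TimeoutRaceSketch` and `firingOrder` are hypothetical names, and a `ScheduledExecutorService` stands in for the two `ReadTimeoutHandler`s rather than reproducing Velocity's actual pipeline.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only: not a Velocity or Netty class.
final class TimeoutRaceSketch {

    /** Schedules two timers with the same duration and returns the order in which they fired. */
    static List<String> firingOrder(long timeoutMillis) {
        List<String> fired = new CopyOnWriteArrayList<>();
        // Assumption: one timer thread stands in for the event loop.
        ScheduledExecutorService loop = Executors.newSingleThreadScheduledExecutor();
        try {
            // The player-side timeout is created first...
            ScheduledFuture<?> player = loop.schedule(
                    () -> { fired.add("player: hard disconnect"); },
                    timeoutMillis, TimeUnit.MILLISECONDS);
            // ...and the backend-side timeout second, with the same duration.
            ScheduledFuture<?> backend = loop.schedule(
                    () -> { fired.add("backend: try next server"); },
                    timeoutMillis, TimeUnit.MILLISECONDS);
            // Wait until both timers have fired.
            player.get();
            backend.get();
            return fired;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            loop.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(firingOrder(50));
    }
}
```

In this sketch the timer created first has the earlier deadline, so the player-side entry is recorded before the backend-side one, mirroring the disconnect-before-failover behaviour described above.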

This is also why just lowering read-timeout didn't help (see the referenced issue's comment chain): it shortens both timeouts equally, so the race stays tied (though this is an actual issue, see #1819 and #1820). Both this PR and a reduced read-timeout are needed to fix the problem properly: without reducing the read timeout, the client would simply disconnect itself after 30 seconds, still preventing Velocity from properly handling the exception (which, after this PR, means failing over to the next server in the fallback chain).

Fix

Suspend the player connection's read-timeout while establishing the initial connection (and through the fallback chain), and restore it once a server is reached. The player connection can no longer time out during limbo, so the backend timeout fires and drives the existing fallback chain. As a bonus, read-timeout tuning now actually controls failover speed.
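The suspend-and-restore idea can be sketched in plain Java. This is not the PR's diff: `SuspendedTimeoutSketch`, `PlayerReadTimeout`, and `connect` are hypothetical names, the timers are simulated with a `ScheduledExecutorService` instead of Netty's `ReadTimeoutHandler`, and the stalled backend is assumed to simply time out after the same duration as the player side.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only: not a Velocity or Netty class.
final class SuspendedTimeoutSketch {

    /** Stand-in for the player-side read timeout; it can be suspended and restored. */
    static final class PlayerReadTimeout {
        private final ScheduledExecutorService loop;
        private final long millis;
        private final Runnable onTimeout;
        private ScheduledFuture<?> pending;

        PlayerReadTimeout(ScheduledExecutorService loop, long millis, Runnable onTimeout) {
            this.loop = loop;
            this.millis = millis;
            this.onTimeout = onTimeout;
            restore(); // armed as soon as the player connection exists
        }

        void suspend() {
            if (pending != null) {
                pending.cancel(false);
                pending = null;
            }
        }

        void restore() {
            suspend(); // never keep two timers armed at once
            pending = loop.schedule(onTimeout, millis, TimeUnit.MILLISECONDS);
        }

        boolean isActive() {
            return pending != null;
        }
    }

    /** Simulates a stalled first backend; returns "disconnected" or "failed over". */
    static String connect(long timeoutMillis, boolean suspendWhileConnecting) {
        AtomicReference<String> outcome = new AtomicReference<>();
        ScheduledExecutorService loop = Executors.newSingleThreadScheduledExecutor();
        try {
            PlayerReadTimeout playerTimeout = new PlayerReadTimeout(
                    loop, timeoutMillis, () -> { outcome.compareAndSet(null, "disconnected"); });
            if (suspendWhileConnecting) {
                playerTimeout.suspend(); // the fix: no player-side timeout during limbo
            }
            // Backend-side timeout with the same duration; whichever timer fires first wins.
            ScheduledFuture<?> backendTimeout = loop.schedule(
                    () -> { outcome.compareAndSet(null, "failed over"); },
                    timeoutMillis, TimeUnit.MILLISECONDS);
            backendTimeout.get();
            if (suspendWhileConnecting) {
                playerTimeout.restore(); // a server from the fallback chain has been reached
            }
            return outcome.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            loop.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("without suspension: " + connect(50, false));
        System.out.println("with suspension:    " + connect(50, true));
    }
}
```

With the player-side timer suspended, only the backend timer can fire, so the simulated connection fails over instead of being dropped. Restoring the timer afterwards keeps the usual read-timeout protection once a server is reached.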
