Fix tunnels staying offline at the edge after a connector connects #211
Merged
…#209)
When a connector comes online (Ready False→True), its liveness reaches the edge via the upstream-status annotation, but Envoy Gateway did not re-translate against it. EG watches Connector with a generation-only predicate, so the annotation change is ignored, and the previous project-side Gateway annotation touch raced the annotation's hub→edge propagation. EG could translate while the edge still saw the connector offline, serving 503 with no recovery.
Add an edge-local controller in the extension-server process that watches the replicated Connector and touches the owning Gateway when liveness changes (after the new liveness is already in the shared cache), forcing EG to re-translate against fresh data. Remove the racy project-side touch.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
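To illustrate why a generation-only predicate misses the liveness update, here is a minimal, stdlib-only sketch. It is not Envoy Gateway's code: the `Meta` type, both predicate functions, and the annotation key are hypothetical stand-ins, loosely modeled on the generation-changed and annotation-changed predicates that controller-runtime provides. It relies on the Kubernetes behavior that annotation-only updates do not bump `metadata.generation`.

```go
package main

import "fmt"

// Meta is a hypothetical, pared-down stand-in for Kubernetes object metadata.
type Meta struct {
	Generation  int64
	Annotations map[string]string
}

// generationChanged mimics a generation-only update predicate:
// it accepts an update only when metadata.generation differs.
func generationChanged(oldObj, newObj Meta) bool {
	return oldObj.Generation != newObj.Generation
}

// annotationChanged mimics an annotation-aware predicate for a single key:
// it accepts an update when that annotation's value differs.
func annotationChanged(oldObj, newObj Meta, key string) bool {
	return oldObj.Annotations[key] != newObj.Annotations[key]
}

func main() {
	// Hypothetical annotation key; the real upstream-status key is not shown in this PR text.
	const key = "example.invalid/upstream-status"

	// The connector comes online: only the annotation changes, the generation does not.
	before := Meta{Generation: 3, Annotations: map[string]string{key: "offline"}}
	after := Meta{Generation: 3, Annotations: map[string]string{key: "online"}}

	fmt.Println("generation-only predicate fires:", generationChanged(before, after))
	fmt.Println("annotation-aware predicate fires:", annotationChanged(before, after, key))
}
```

Under these assumptions the generation-only check rejects the update while the annotation-aware one accepts it, which matches the symptom described in the commit message: the liveness change arrives, but nothing triggers a re-translation.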
This was referenced Jun 23, 2026
What this fixes
A tunnel connector could come online but the edge would keep serving HTTP 503 for it — the data plane never picked up that the tunnel was ready. The interim workaround was a CLI re-patch to force it.
Fixes #209.
Why it happened
The connector's live status reaches the edge correctly, but Envoy Gateway wasn't re-translating against it. EG only re-runs its translation (and the extension hook that programs tunnels) when a resource it watches changes through an annotation-aware path, and a `Connector` update doesn't qualify: EG watches `Connector` with a generation-only predicate, so the annotation-only liveness change is ignored. So fresh liveness sat in the extension server's cache while EG kept serving the stale (offline) program.
What changed
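The change summarized under this heading can be sketched roughly as follows. This is an in-memory illustration under stated assumptions, not the PR's implementation: `Connector`, `Gateway`, `Reconciler`, and the `touchKey` annotation are hypothetical names, and a real controller would read the `Connector` from its local cache and patch the `Gateway` through the Kubernetes API rather than mutate a map.

```go
package main

import (
	"fmt"
	"time"
)

// Connector and Gateway are hypothetical, simplified stand-ins for the real resources.
type Connector struct {
	Name    string
	Gateway string // name of the owning Gateway
	Ready   bool   // liveness as observed in the local cache
}

type Gateway struct {
	Name        string
	Annotations map[string]string
}

// touchKey is a hypothetical annotation used only to force a re-translation.
const touchKey = "example.invalid/liveness-touched-at"

// Reconciler remembers the last liveness it acted on for each Connector.
type Reconciler struct {
	lastReady map[string]bool
	gateways  map[string]*Gateway
	now       func() time.Time
}

func NewReconciler(gateways map[string]*Gateway) *Reconciler {
	return &Reconciler{lastReady: map[string]bool{}, gateways: gateways, now: time.Now}
}

// Reconcile touches the owning Gateway when the Connector's liveness differs
// from the last value acted on (or has never been seen). It reports whether
// a touch happened. Because it runs on the value already read locally, the
// touch cannot get ahead of the liveness it refers to.
func (r *Reconciler) Reconcile(c Connector) bool {
	if prev, seen := r.lastReady[c.Name]; seen && prev == c.Ready {
		return false // liveness unchanged: nothing to do
	}
	gw, ok := r.gateways[c.Gateway]
	if !ok {
		return false // owning Gateway unknown: leave state unrecorded so a later call retries
	}
	if gw.Annotations == nil {
		gw.Annotations = map[string]string{}
	}
	gw.Annotations[touchKey] = r.now().UTC().Format(time.RFC3339Nano)
	r.lastReady[c.Name] = c.Ready
	return true
}

func main() {
	gw := &Gateway{Name: "edge-gw"}
	r := NewReconciler(map[string]*Gateway{gw.Name: gw})
	events := []Connector{
		{Name: "conn-a", Gateway: "edge-gw", Ready: false},
		{Name: "conn-a", Gateway: "edge-gw", Ready: false},
		{Name: "conn-a", Gateway: "edge-gw", Ready: true},
	}
	for _, c := range events {
		fmt.Printf("ready=%v touched=%v\n", c.Ready, r.Reconcile(c))
	}
}
```

The design point the sketch tries to capture is ordering: the touch is derived from liveness the edge has already observed, so a re-translation triggered by it sees fresh data instead of racing the annotation's propagation.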
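- Adds an edge-local controller in the extension-server process that watches the replicated `Connector` and, when its liveness changes, nudges the owning `Gateway` so EG re-translates, after the fresh liveness is already in the local cache.
- Removes the racy project-side Gateway touch (`patchGateways`).
Impact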
- The `refresh_connection_details()` workaround is no longer required (safe to leave or remove).
Testing
Follow-ups (not in this PR)
🤖 Generated with Claude Code