Summary
When a tunnel connector's Ready condition transitions False → True (heartbeat renews the lease), the downstream connector's networking.datumapis.com/upstream-status annotation is not updated. The extension server reads this stale annotation and classifies the connector as offline, causing persistent HTTP 503 responses.
Sequence
- Connector created: the replicator mirrors status into the downstream annotation. At this moment the connector is Ready: False (lease not yet renewed).
- Heartbeat connects: the upstream connector becomes Ready: True, with connectionDetails populated.
- Replicator does not re-reconcile: skipUpstreamStatusSync: true suppresses downstream→upstream sync, and there is no watch on upstream status changes to re-trigger the mirror path.
- Downstream connector's annotation still contains Ready: False with stale connectionDetails.
- Extension server reads the annotation, sees Ready: False, and marks all routes for that connector as offline → HTTP 503.
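The last step can be illustrated with a minimal sketch of how a reader of the annotation might classify a connector. This is not the extension server's actual code: the function name connectorOffline, the struct names, and the choice to treat an unparsable or missing Ready condition as offline are all assumptions. Only the JSON field names are taken from the annotation payload shown under Evidence.

```go
package main

import "encoding/json"

// condition mirrors one entry of the "conditions" array in the
// upstream-status annotation payload (field names from the Evidence section).
type condition struct {
	Type    string `json:"type"`
	Status  string `json:"status"`
	Reason  string `json:"reason"`
	Message string `json:"message"`
}

// upstreamStatus is a hypothetical type holding only the part of the
// annotation this sketch needs.
type upstreamStatus struct {
	Conditions []condition `json:"conditions"`
}

// connectorOffline is a hypothetical helper: it reports true unless the
// annotation carries a Ready condition with status "True".
func connectorOffline(annotation string) bool {
	var st upstreamStatus
	if err := json.Unmarshal([]byte(annotation), &st); err != nil {
		// Assumption: an unreadable annotation is treated as offline.
		return true
	}
	for _, c := range st.Conditions {
		if c.Type == "Ready" {
			return c.Status != "True"
		}
	}
	// Assumption: no Ready condition at all also counts as offline.
	return true
}
```

Under this model, a reader that only ever sees the annotation keeps returning offline for as long as the mirrored Ready condition stays False, regardless of the upstream connector's real state.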
Evidence
Downstream connector annotation captured while upstream connector was Ready: True:
{
"conditions": [
{ "type": "Ready", "status": "False", "reason": "ConnectorNotReady",
"message": "Connector lease has expired. Agent may be offline." }
],
"connectionDetails": { "publicKey": { "id": "378843c806c8c93c5770abaa19bc47e04e9f56977c6e7cc28044a09ef5a1cd23", ... } }
}
Upstream connector at the same moment:
{ "type": "Ready", "status": "True", "reason": "ConnectorReady",
"message": "The connector is ready to tunnel traffic." }
Extension server log confirms all vhosts for this connector remain offline:
connector_offline_routes:26 clusters_replaced:1
Root Cause
replicationResourceConfig for the Connector type sets:
mirrorStatusToAnnotation: true,
skipUpstreamStatusSync: true,
The replicator reconciles on spec changes but has no watch/enqueue path for upstream status changes. When Ready flips after the initial replication, nothing re-queues the replicator to update the annotation.
touchDownstreamGatewayAnnotations in the connector controller fires to trigger EG re-translation, but the stale annotation means the extension server still classifies the connector as offline even after re-translation.
Fix
The replicator should watch upstream connector status and re-mirror the annotation whenever it changes. Alternatively, the connector controller should enqueue a replicator reconcile after writing Ready: True.
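A minimal sketch of the first option, assuming the replicator decides whether to enqueue by comparing the old and new object on each update event. The types Connector and Condition and both filter functions are hypothetical and stand in for the real API types. Whether the replicator currently filters on generation is also an assumption. It is used here because Kubernetes bumps metadata.generation on spec changes but not on status-subresource updates, which matches the observed behavior.

```go
package main

import "reflect"

// Condition is a hypothetical, trimmed-down status condition.
type Condition struct{ Type, Status, Reason string }

// Connector is a hypothetical stand-in for the upstream connector object.
type Connector struct {
	Generation int64       // changes only when the spec changes
	Conditions []Condition // written through status updates
}

// specOnlyFilter models a replicator that enqueues only on spec changes.
// A Ready flip leaves Generation untouched, so the event is dropped.
func specOnlyFilter(oldObj, newObj Connector) bool {
	return oldObj.Generation != newObj.Generation
}

// statusAwareFilter additionally enqueues when the conditions differ,
// which would let the replicator re-mirror the annotation.
func statusAwareFilter(oldObj, newObj Connector) bool {
	return specOnlyFilter(oldObj, newObj) ||
		!reflect.DeepEqual(oldObj.Conditions, newObj.Conditions)
}
```

With a filter of the second kind, the False → True transition produces a reconcile, while a no-op update with identical conditions is still ignored and does not add reconcile load.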