The E2E test "can lease and connect to exporters" is failing in PR #535. It logs repeated "Dial rejected due to exporter status" errors, followed by authentication failures with context cancellation.
Test: Core E2E Tests > Lease and connect > can lease and connect to exporters
Location: /home/runner/work/jumpstarter/jumpstarter/e2e/test/e2e_test.go:406
Duration: Failed after 31.5 seconds
Key error logs:
1. Repeated Dial rejections with Available status:
2026-04-09T16:50:00Z INFO Dial rejected due to exporter status
{"peer": "10.244.0.1:21066",
"client": {"name":"test-client-oidc","namespace":"jumpstarter-lab"},
"lease": {"name":"019d7326-6a7f-711e-bfed-b5bd8bbd90f4","namespace":"jumpstarter-lab"},
"status": "Available",
"error": "rpc error: code = FailedPrecondition desc = exporter is not ready (status: Available)"}
The context is canceled before the retries can complete.
Expected Behavior
The server-side retry logic from PR #440 should handle the transient Available status by retrying the Dial request until the exporter transitions to LeaseReady (within the 3-second retry window).
Actual Behavior
The Dial requests are rejected repeatedly over a 20-second period, all with the same Available status error. After about 20 seconds, authentication attempts fail with "context canceled", and the connection ultimately fails.
Questions to Investigate
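Is the context deadline too short? The "context canceled" error suggests the context might end before the exporter can transition to LeaseReady. (If this message comes from Go's context package, an expired deadline would normally read "context deadline exceeded", so "context canceled" points to an explicit cancellation, for example the caller giving up.)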
Is the server-side retry logic actually executing? (Check controller logs for retry attempts)
Why is the exporter remaining in Available status for 20+ seconds?
Is the authentication failure a symptom or cause of the connection failure?
Is there a timing issue in the client?
Are multiple concurrent Dial attempts exhausting the context deadline?
Failed run: https://github.com/jumpstarter-dev/jumpstarter/actions/runs/24201578910/job/70646355079?pr=535#step:6:698
Context
Related: PR #440 (merged)
PR #440 specifically addressed this race condition by adding a server-side retry to the controller's Dial handler for the Available status, which is a transient state during lease setup. However, the error logs show that the Dial is still being rejected. One possible explanation is that the exporter remains in Available status longer than expected.
Steps to Reproduce
Run the e2e tests on PR #535:
The test should fail with the Dial rejection errors followed by authentication failures as shown above.
Environment
METHOD=operator (from CI logs)