Fix Flaky e2e tests - Update the Cleanup order#31347
Fix Flaky e2e tests - Update the Cleanup order#31347YamunadeviShanmugam wants to merge 1 commit into
Conversation
Improve the cleanup order for fixing flakiness
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository YAML (base), Central YAML (inherited) Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
WalkthroughIn ChangesCLI Resource Lifecycle Fixes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 15✅ Passed checks (15 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: gangwgr, YamunadeviShanmugam The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Scheduling required tests: |
|
/retest |
|
@YamunadeviShanmugam: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Fix resourcesToDelete accumulation and premature OAuth token deletion in CLI
This PR fixes two related bugs in test/extended/util/client.go that affect test reliability when multiple tests share a CLI instance.
Bug 1 — resourcesToDelete never cleared after TeardownProject
TeardownProject iterates over c.resourcesToDelete to delete cluster resources at the end of each test, but the slice was never reset. Because a single *CLI instance is shared across all It blocks in a Describe, the slice accumulated every resource registered across the entire suite lifetime. Each subsequent test's TeardownProject would redundantly attempt to delete resources from all prior tests, generating 404 noise in logs and performing unnecessary API calls to the cluster.
Fix:
Reset c.resourcesToDelete to nil at the end of TeardownProject. The slice is always rebuilt from scratch via append when the next test registers resources.
Bug 2 — OAuthAccessToken deleted before framework DeferCleanup runs
GetClientConfigForUser created an OAuthAccessToken and registered it in resourcesToDelete. This meant it was deleted during TeardownProject, which is registered as a g.AfterEach. The Kubernetes framework's namespace cleanup, however, runs via ginkgo.DeferCleanup (registered inside f.BeforeEach), and the documented execution order is:
It block
→ all AfterEach nodes (including TeardownProject) ← token deleted here
→ all DeferCleanups (LIFO) ← framework cleanup here
→ f.AfterEach (namespace deletion)
This meant the bearer token embedded in the returned *rest.Config became invalid before any DeferCleanup callbacks had a chance to run. Any deferred cleanup that tried to make authenticated API calls as that user would receive 401 Unauthorized.
Fix:
Remove the token from resourcesToDelete and instead set ExpiresIn: 86400 (24 hours). The associated User object is still explicitly deleted in TeardownProject, which is sufficient to revoke effective access. The ExpiresIn TTL ensures the token is eventually cleaned up by the OpenShift OAuth token controller without needing explicit deletion.
Latent OIDC test bug fixed as a consequence
test/extended/authentication/oidc.go calls GetClientConfigForUser in a g.BeforeAll and stores the result in oauthUserConfig. This config is then used across two separate ordered It blocks:
"should not accept tokens provided by the OAuth server" (expects Unauthorized)
"should accept tokens provided by the OpenShift OAuth server" (expects 200 OK)
With the old code, TeardownProject deleted the OAuthAccessToken after the first It block. The first test passed for the wrong reason (deleted token → Unauthorized, which matched the expectation, but for the wrong cause). The second test silently used an invalid bearer token. As the current fix keeps the token alive across all ordered It blocks as intended, so both tests validate the correct behavior.
Summary by CodeRabbit