
test: fix flaky CI failures for the arc-geneva-poc branch #1271

Open
ryanzhang-oss wants to merge 1 commit into Azure:sehobbs/fleet-arc-geneva-poc from ryanzhang-oss:fix/ci-for-arc-geneva-poc

@ryanzhang-oss
Contributor

Applies the two CI flakiness fixes from #1270 onto the sehobbs/fleet-arc-geneva-poc branch so that PR #1269 CI passes.

Fixes included

1. e2e cost property tolerance (test/e2e/utils_test.go)
The e2e-tests (custom) BeforeSuite was failing with:

member cluster per CPU core cost property diff: got=0.141000, want=0.143000, diff=0.002000

The reported diff of 0.002 landed on the strict > 0.002 boundary, where floating-point rounding can tip the comparison either way. Widened the tolerance to 0.005 to absorb Azure Retail Prices API fluctuations.
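
A minimal sketch of why a diff that prints as 0.002000 can still fail a strict > 0.002 check. It assumes the comparison is done on float64 values; exceedsTolerance is a hypothetical helper written for illustration, not the function in test/e2e/utils_test.go, and the real got/want values may carry more digits than the six decimals shown in the log.

```go
package main

import (
	"fmt"
	"math"
)

// exceedsTolerance reports whether |got-want| is strictly greater than tol.
// Hypothetical helper for illustration only.
func exceedsTolerance(got, want, tol float64) bool {
	return math.Abs(got-want) > tol
}

func main() {
	// Values taken from the failure message. They are held in float64
	// variables so the subtraction happens in float64 arithmetic rather
	// than in Go's exact constant arithmetic.
	got, want := 0.141, 0.143

	// Neither 0.141 nor 0.143 is exactly representable in binary, and
	// their float64 difference comes out slightly above 0.002.
	fmt.Println(exceedsTolerance(got, want, 0.002)) // true: the old threshold trips
	fmt.Println(exceedsTolerance(got, want, 0.005)) // false: the widened threshold passes
}
```

With the wider margin the check no longer depends on how the last bits of the two prices round.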

2. workapplier AfterSuite teardown timeout (pkg/controllers/workapplier/suite_test.go)
All 290 specs pass but AfterSuite fails with:

failed waiting for all runnables to end within grace period of 30s: context deadline exceeded

Four concurrent managers can't drain within 30s on a loaded CI runner. Set GracefulShutdownTimeout: 2*time.Minute on all 4 managers.

cc @Ealianis — this should unblock the CI on #1269.

Two separate CI flakiness fixes:

1. pkg/controllers/workapplier/suite_test.go:
   Increase GracefulShutdownTimeout from the default 30s to 2 minutes
   for all four controller managers in the integration test suite. With
   four managers running concurrently (each with multiple controllers),
   the default 30s grace period is insufficient to drain all runnables
   on a loaded CI runner, causing AfterSuite teardown to fail with
   'context deadline exceeded' even though all 290 specs pass.

2. test/e2e/utils_test.go:
   Widen the per-CPU-core and per-GB-memory cost property tolerance
   from 0.002 to 0.005. The Azure Retail Prices API can return values
   that differ from the locally computed expected value by exactly
   0.002 (e.g. got=0.141, want=0.143), which hits the strict boundary
   of the original threshold and causes BeforeSuite to fail, aborting
   the entire custom e2e suite. A margin of 0.005 provides sufficient
   headroom while still catching genuine property provider bugs.