Flaky test report: committed-code failures on 2026-05-05
Summary
Seven distinct tests failed against committed code (Timer or Post Merge Action builds) in the past 24 hours. None reproduced locally with the original seed, which is consistent with non-deterministic (timing- or environment-dependent) flakes rather than deterministic regressions.
Summary Table (sorted by total builds affected)
| Test | Builds Affected | First Failure | Recent Build | Reproduced? | Pattern |
|---|---|---|---|---|---|
| IndexingIT.testIndexingWithSegRep | 250 | 2024-03-25 | 75816 | No | Stable/chronic |
| SearchRestCancellationIT.testAutomaticCancellationDuringFetchPhase | 177 | 2024-04-04 | 75732 | No | Worsening (spike Nov 2025, elevated since) |
| ShardIndexingPressureSettingsIT.testShardIndexingPressureEnforcedEnabledDisabledSetting | 151 | 2024-03-26 | 75748 | No | Stable/chronic |
| FlightMetricsTests.testComprehensiveMetrics | 70 | 2025-07-25 | 75757 | No | Worsening (increasing since Mar 2026) |
| EhCacheDiskCacheTests.testComputeIfAbsentConcurrently | 58 | 2024-03-28 | 75765 | No | Worsening (spike Apr 2026, likely CPU-speed amplification) |
| CloneSnapshotIT.testCloneAfterRepoShallowSettingDisabled | 27 | 2024-04-11 | 75726 | No | Stable/low-rate chronic |
| SimpleSearchIT.testIndexOnlyFloatField | 14 | 2026-04-17 | 75821 | No | New (appeared after m7a.8xlarge migration) |
Detailed Findings
1. IndexingIT.testIndexingWithSegRep
- Module: `qa/rolling-upgrade`
- Build: 75816, 75822
- Error: `java.lang.AssertionError: expected:<0> but was:<1>`
- Seed: `4844AA16BEBC4FA6:C73DBC5B4AE5BA67` (build 75816)
- Reproduced locally: No
- Pattern: Chronic flake since March 2024. Consistently 4-18 builds/month. This is a rolling-upgrade BWC test that exercises segment replication during version upgrades — inherently timing-sensitive.
2. SearchRestCancellationIT.testAutomaticCancellationDuringFetchPhase
- Module: `qa/smoke-test-http`
- Build: 75732, 75748
- Error: `java.lang.AssertionError` (assertBusy timeout waiting for task cancellation)
- Seed: `6A36228CCEF16B01:BC2F83475106354E` (build 75732)
- Reproduced locally: No
- Pattern: Chronic since April 2024. Notable spike to 41 builds in Nov 2025. Elevated at 16 builds in Apr 2026. The test uses `assertBusy` to wait for search task cancellation, so the failure is a race between cancellation propagation and the assertion timeout.
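The race described for this test can be sketched with a small poll-until-timeout helper. This is a minimal Python illustration, not the OpenSearch `assertBusy` implementation; the names `assert_busy`, `make_cancellation_check`, and `passes`, as well as the backoff schedule and timings, are assumptions made for the example. It shows that the same check passes or fails depending only on whether the awaited event lands inside the timeout budget.

```python
import time


def assert_busy(check, timeout_s=10.0, initial_delay_s=0.001):
    """Retry `check` with exponential backoff until it passes or time runs out.

    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        try:
            check()
            return
        except AssertionError:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise  # the condition never became true within the budget
            time.sleep(min(delay, remaining))
            delay *= 2


def make_cancellation_check(completes_after_s):
    """Simulate a task whose cancellation becomes visible after a delay."""
    start = time.monotonic()

    def check():
        elapsed = time.monotonic() - start
        assert elapsed >= completes_after_s, "task still running"

    return check


def passes(completes_after_s, timeout_s):
    """Return True if the simulated cancellation is observed before the timeout."""
    try:
        assert_busy(make_cancellation_check(completes_after_s), timeout_s=timeout_s)
        return True
    except AssertionError:
        return False


# Same check, different outcome depending on propagation time vs. timeout.
print(passes(completes_after_s=0.05, timeout_s=1.0))
print(passes(completes_after_s=0.5, timeout_s=0.05))
```

In this model a slower runner, or a longer cancellation path, only has to push the event past the deadline once for the build to fail, which matches the intermittent pattern above.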
3. ShardIndexingPressureSettingsIT.testShardIndexingPressureEnforcedEnabledDisabledSetting
- Module: `server` (internalClusterTest)
- Build: 75748
- Error: `java.lang.AssertionError: expected:<0> but was:<2>` (in `waitForTwoOutstandingRequests`)
- Seed: `7524E0B2F5E0DDEE`
- Reproduced locally: No
- Pattern: Chronic since March 2024. Steady 1-13 builds/month. The test waits for a specific number of outstanding indexing requests — a classic timing-dependent assertion.
4. FlightMetricsTests.testComprehensiveMetrics
- Module: `plugins/arrow-flight-rpc`
- Build: 75757
- Error: `org.opensearch.transport.BindTransportException: Failed to bind to [/0:0:0:0:0:0:0:1%lo, /127.0.0.1]:PortsRange{portRange='29301'}`
- Seed: `5B36E5FD98D10D33:754D814B831D2E8D`
- Reproduced locally: No
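- Pattern: Present since July 2025. Increasing from 4-6 builds/month to 9-11 builds/month in Mar-Apr 2026. The bind failure on the single port 29301 suggests resource contention on CI runners (for example, another process already holding that port) rather than a logic bug in the code under test.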
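The failure mode for a single-port range can be reproduced in a few lines. The sketch below uses plain Python sockets, not the OpenSearch transport layer; `bind_fixed`, `bind_ephemeral`, and `fixed_port_collides` are hypothetical names for this example. It shows why a hard-coded port fails as soon as anything else on the host holds it, while asking the OS for port 0 avoids the collision. Whether the Flight test can use an OS-assigned port is not something the report establishes.

```python
import socket


def bind_fixed(port, host="127.0.0.1"):
    """Bind to one specific port; raises OSError if it is already in use."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
    except OSError:
        sock.close()
        raise
    return sock


def bind_ephemeral(host="127.0.0.1"):
    """Bind to port 0 so the operating system picks any free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))
    sock.listen(1)
    return sock


def fixed_port_collides():
    """Return True if binding a port that is already held fails."""
    holder = bind_ephemeral()  # stands in for "some other process" on the runner
    taken_port = holder.getsockname()[1]
    try:
        bind_fixed(taken_port).close()
        return False
    except OSError:
        return True
    finally:
        holder.close()


print(fixed_port_collides())
```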
5. EhCacheDiskCacheTests.testComputeIfAbsentConcurrently
- Module: `plugins/cache-ehcache`
- Build: 75765
- Error: `java.lang.AssertionError: expected:<1> but was:<2>`
- Seed: `664318BFDBF94843:E7F86F8B25B6002D`
- Reproduced locally: No
- Pattern: Chronic since March 2024 at low rate (1-5 builds/month), but spiked to 14 builds in April 2026. The April spike correlates with the m7a.8xlarge CI runner migration — faster CPUs likely amplify the concurrency race in this test.
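For readers unfamiliar with this class of bug, here is a generic check-then-act race in a compute-if-absent cache. It is a Python sketch, not the EhCache disk cache code, and the report does not say what the failing assertion actually counts; the assumption that it counts loader invocations, and the names `RacyCache`, `LockedCache`, and `run_two_threads`, are made up for illustration. A barrier forces both threads into the loader at once so the race shows up on every run.

```python
import threading


class RacyCache:
    """Compute-if-absent with an unguarded check-then-act sequence."""

    def __init__(self):
        self._data = {}
        self.loads = 0
        self._count_lock = threading.Lock()

    def compute_if_absent(self, key, loader):
        if key not in self._data:        # check
            value = loader(key)          # a second thread can get here too
            with self._count_lock:
                self.loads += 1
            self._data[key] = value      # act
        return self._data[key]


class LockedCache(RacyCache):
    """Same logic, but the whole check-then-act runs under one lock."""

    def __init__(self):
        super().__init__()
        self._lock = threading.Lock()

    def compute_if_absent(self, key, loader):
        with self._lock:
            return super().compute_if_absent(key, loader)


def run_two_threads(cache):
    """Call compute_if_absent for the same key from two threads; return load count."""
    barrier = threading.Barrier(2)

    def slow_loader(key):
        try:
            # Wait for the other thread to enter the loader as well, if it can.
            barrier.wait(timeout=0.5)
        except threading.BrokenBarrierError:
            pass  # the other thread never arrived (it was blocked on the lock)
        return f"value-for-{key}"

    threads = [
        threading.Thread(target=cache.compute_if_absent, args=("k", slow_loader))
        for _ in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cache.loads


print(run_two_threads(RacyCache()))    # loader runs once per thread
print(run_two_threads(LockedCache()))  # loader runs once in total
```

In real code the overlap window is tiny rather than forced, so how often it is hit depends on scheduling and core count.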
6. CloneSnapshotIT.testCloneAfterRepoShallowSettingDisabled
- Module: `server` (internalClusterTest)
- Build: 75726
- Error: `java.lang.AssertionError: Expected: is <10> but: was <9>`
- Seed: `86DB09E0377233F2:5628D27E227A2FB7`
- Reproduced locally: No
- Pattern: Low-rate chronic flake since April 2024. Only 1-4 builds/month. The assertion expects exactly 10 documents in a snapshot but gets 9 — likely a race between indexing and snapshot creation.
7. SimpleSearchIT.testIndexOnlyFloatField
- Module: `server` (internalClusterTest)
- Build: 75821
- Error: `java.lang.AssertionError: expected:<1> but was:<0>`
- Seed: `CA6DEBF7601D0A1C:A84306B916CF2687`
- Reproduced locally: No
- Pattern: New flake — first appeared 2026-04-17, exactly when CI runners moved to m7a.8xlarge. 9 builds in April, 5 in first 5 days of May. This is a strong candidate for CPU-speed amplification causing a refresh/visibility race.
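The hypothesized refresh/visibility race can be pictured with a toy model in which indexed documents only become searchable after a refresh. This is a deliberately simplified Python sketch, not OpenSearch code; `ToyIndex`, the field name `price`, and the document values are hypothetical. The root cause of this test's failure has not been confirmed.

```python
class ToyIndex:
    """Toy index: documents sit in a buffer until refresh() makes them searchable."""

    def __init__(self):
        self._buffer = []
        self._searchable = []

    def index(self, doc):
        self._buffer.append(doc)

    def refresh(self):
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def count(self, predicate):
        """Count searchable documents matching the predicate."""
        return sum(1 for doc in self._searchable if predicate(doc))


idx = ToyIndex()
idx.index({"price": 1.5})  # hypothetical document with a float field


def has_float_field(doc):
    return isinstance(doc.get("price"), float)


hits_before_refresh = idx.count(has_float_field)  # search wins the race
idx.refresh()
hits_after_refresh = idx.count(has_float_field)   # refresh wins the race

print(hits_before_refresh, hits_after_refresh)
```

A search that runs before the refresh sees zero hits where one is expected, which has the same shape as the `expected:<1> but was:<0>` error above.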
Reproduction Method
Each test was run locally with its original seed:
./gradlew <module>:<task> --tests "<class>.<method>" -Dtests.seed=<SEED>
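All 7 passed on first attempt, which points to environment-dependent failures (thread scheduling, port availability, or timing windows that differ between CI and local dev). A single passing run cannot rule out a low-probability race, so this is supporting evidence rather than proof.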
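To put a number on how much a single passing run tells us, the following Python sketch builds the reproduction command from the template above and estimates how many repeated runs would be needed to see a flake at least once. The 2% per-run failure rate is a made-up figure for illustration, not a measured rate, and `server:internalClusterTest` is an assumed reading of the "server (internalClusterTest)" module entry. The function names are hypothetical.

```python
import math
import shlex


def build_command(module_task, test_class, method, seed):
    """Fill in the ./gradlew template from this report as an argument list."""
    return [
        "./gradlew",
        module_task,
        "--tests",
        f"{test_class}.{method}",
        f"-Dtests.seed={seed}",
    ]


def pass_probability(failure_rate, runs):
    """Probability that `runs` independent runs all pass."""
    return (1.0 - failure_rate) ** runs


def runs_needed(failure_rate, confidence=0.95):
    """Smallest n such that P(at least one failure in n runs) >= confidence."""
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


cmd = build_command(
    "server:internalClusterTest",          # assumed module:task path
    "SimpleSearchIT",
    "testIndexOnlyFloatField",
    "CA6DEBF7601D0A1C:A84306B916CF2687",   # seed from finding 7
)
print(shlex.join(cmd))

hypothetical_rate = 0.02  # illustrative only
print(pass_probability(hypothetical_rate, 1))  # chance one local run passes
print(runs_needed(hypothetical_rate))          # runs for a 95% chance of a failure
```

Under that assumed rate, one local run passes 98% of the time, and roughly 149 runs are needed for a 95% chance of observing the failure, so repeating each seed many times locally would give a stronger signal than a single attempt.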
Notes
- The April 2026 m7a.8xlarge CI runner migration correlates with increased failure rates for EhCacheDiskCacheTests and the emergence of SimpleSearchIT as a new flake.
- FlightMetricsTests failures are port-binding issues (infrastructure), not logic bugs.
- The top 3 tests (IndexingIT, SearchRestCancellationIT, ShardIndexingPressureSettingsIT) are chronic flakes that have been failing for over 2 years.