Flaky test report: committed-code failures on 2026-05-05 #256

@andrross

Summary

7 distinct tests failed against committed code (Timer or Post Merge Action builds) in the past 24 hours. None reproduced locally with its original seed, which suggests these are non-deterministic (timing- or environment-dependent) flakes rather than seed-dependent bugs.

Summary Table (sorted by total builds affected)

| Test | Builds Affected | First Failure | Recent Build | Reproduced? | Pattern |
|---|---|---|---|---|---|
| IndexingIT.testIndexingWithSegRep | 250 | 2024-03-25 | 75816 | No | Stable/chronic |
| SearchRestCancellationIT.testAutomaticCancellationDuringFetchPhase | 177 | 2024-04-04 | 75732 | No | Worsening (spike Nov 2025, elevated since) |
| ShardIndexingPressureSettingsIT.testShardIndexingPressureEnforcedEnabledDisabledSetting | 151 | 2024-03-26 | 75748 | No | Stable/chronic |
| FlightMetricsTests.testComprehensiveMetrics | 70 | 2025-07-25 | 75757 | No | Worsening (increasing since Mar 2026) |
| EhCacheDiskCacheTests.testComputeIfAbsentConcurrently | 58 | 2024-03-28 | 75765 | No | Worsening (spike Apr 2026, likely CPU-speed amplification) |
| CloneSnapshotIT.testCloneAfterRepoShallowSettingDisabled | 27 | 2024-04-11 | 75726 | No | Stable/low-rate chronic |
| SimpleSearchIT.testIndexOnlyFloatField | 14 | 2026-04-17 | 75821 | No | New (appeared after m7a.8xlarge migration) |

Detailed Findings

1. IndexingIT.testIndexingWithSegRep

  • Module: qa/rolling-upgrade
  • Builds: 75816, 75822
  • Error: java.lang.AssertionError: expected:<0> but was:<1>
  • Seed: 4844AA16BEBC4FA6:C73DBC5B4AE5BA67 (build 75816)
  • Reproduced locally: No
  • Pattern: Chronic flake since March 2024. Consistently 4-18 builds/month. This is a rolling-upgrade BWC test that exercises segment replication during version upgrades — inherently timing-sensitive.

2. SearchRestCancellationIT.testAutomaticCancellationDuringFetchPhase

  • Module: qa/smoke-test-http
  • Builds: 75732, 75748
  • Error: java.lang.AssertionError (assertBusy timeout waiting for task cancellation)
  • Seed: 6A36228CCEF16B01:BC2F83475106354E (build 75732)
  • Reproduced locally: No
  • Pattern: Chronic since April 2024. Notable spike to 41 builds in Nov 2025. Elevated at 16 builds in Apr 2026. The test uses assertBusy to wait for search task cancellation — a race between cancellation propagation and the assertion timeout.

3. ShardIndexingPressureSettingsIT.testShardIndexingPressureEnforcedEnabledDisabledSetting

  • Module: server (internalClusterTest)
  • Build: 75748
  • Error: java.lang.AssertionError: expected:<0> but was:<2> (in waitForTwoOutstandingRequests)
  • Seed: 7524E0B2F5E0DDEE
  • Reproduced locally: No
  • Pattern: Chronic since March 2024. Steady 1-13 builds/month. The test waits for a specific number of outstanding indexing requests — a classic timing-dependent assertion.

4. FlightMetricsTests.testComprehensiveMetrics

  • Module: plugins/arrow-flight-rpc
  • Build: 75757
  • Error: org.opensearch.transport.BindTransportException: Failed to bind to [/0:0:0:0:0:0:0:1%lo, /127.0.0.1]:PortsRange{portRange='29301'}
  • Seed: 5B36E5FD98D10D33:754D814B831D2E8D
  • Reproduced locally: No
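  • Pattern: Present since July 2025. Increased from 4-6 builds/month to 9-11 builds/month in Mar-Apr 2026. The error shows a bind attempt on a single fixed port (29301), which suggests a port collision or other resource contention on the CI runners rather than a logic bug in the code under test.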

5. EhCacheDiskCacheTests.testComputeIfAbsentConcurrently

  • Module: plugins/cache-ehcache
  • Build: 75765
  • Error: java.lang.AssertionError: expected:<1> but was:<2>
  • Seed: 664318BFDBF94843:E7F86F8B25B6002D
  • Reproduced locally: No
  • Pattern: Chronic since March 2024 at low rate (1-5 builds/month), but spiked to 14 builds in April 2026. The April spike correlates with the m7a.8xlarge CI runner migration — faster CPUs likely amplify the concurrency race in this test.

6. CloneSnapshotIT.testCloneAfterRepoShallowSettingDisabled

  • Module: server (internalClusterTest)
  • Build: 75726
  • Error: java.lang.AssertionError: Expected: is <10> but: was <9>
  • Seed: 86DB09E0377233F2:5628D27E227A2FB7
  • Reproduced locally: No
  • Pattern: Low-rate chronic flake since April 2024. Only 1-4 builds/month. The assertion expects exactly 10 documents in a snapshot but gets 9 — likely a race between indexing and snapshot creation.

7. SimpleSearchIT.testIndexOnlyFloatField

  • Module: server (internalClusterTest)
  • Build: 75821
  • Error: java.lang.AssertionError: expected:<1> but was:<0>
  • Seed: CA6DEBF7601D0A1C:A84306B916CF2687
  • Reproduced locally: No
  • Pattern: New flake — first appeared 2026-04-17, exactly when CI runners moved to m7a.8xlarge. 9 builds in April, 5 in first 5 days of May. This is a strong candidate for CPU-speed amplification causing a refresh/visibility race.
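As a rough check on the trend for this test, the two counts above can be normalized to a per-day rate. This is a sketch only: it assumes the April window runs from 2026-04-17 through 2026-04-30 (14 days) and that the May window is the 5 days stated above. The `rate` helper is a hypothetical name introduced for illustration.

```shell
# Hypothetical helper: affected builds per day, printed with two decimals.
rate() {
  awk -v builds="$1" -v days="$2" 'BEGIN { printf "%.2f\n", builds / days }'
}

# Assumption: April window is the 17th through the 30th inclusive (14 days).
april_rate=$(rate 9 14)
# May window: the first 5 days of the month, as stated in the report.
may_rate=$(rate 5 5)

echo "April: $april_rate builds/day, May: $may_rate builds/day"
```

With so few data points this only indicates that the failure rate has not dropped since the test first failed; it is not a measured trend.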

Reproduction Method

Each test was run locally with its original seed:

./gradlew <module>:<task> --tests "<class>.<method>" -Dtests.seed=<SEED>

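All 7 passed on the first attempt, which points to environment-dependent failures (thread scheduling, port availability, or timing windows that differ between CI and local development) rather than seed-dependent ones. A single passing run cannot rule out a low-rate flake that would also occur locally.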
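Because a timing-dependent flake can pass on any single run, repeating the same command gives a better estimate of whether it reproduces locally. The following is a minimal sketch, not part of the method used for this report: `rerun_count_failures` is a hypothetical helper that reruns an arbitrary command a fixed number of times and prints how many runs exited non-zero.

```shell
# Hypothetical helper: rerun a command N times and print the failure count.
# Usage: rerun_count_failures <iterations> <command> [args...]
rerun_count_failures() {
  iterations=$1
  shift
  failures=0
  i=1
  while [ "$i" -le "$iterations" ]; do
    # A non-zero exit status counts as one failed run; output is discarded.
    if ! "$@" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "$failures"
}

# Example invocation (placeholders as in the command above):
# rerun_count_failures 50 ./gradlew <module>:<task> --tests "<class>.<method>" -Dtests.seed=<SEED>
```

Depending on the build configuration, Gradle may treat a repeated test task with unchanged inputs as up to date and skip it, so a real loop may need `--rerun-tasks` or an equivalent. Even a clean result from many local runs would not rule out a race that only appears under CI load.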

Notes

  • The April 2026 m7a.8xlarge CI runner migration correlates with increased failure rates for EhCacheDiskCacheTests and the emergence of SimpleSearchIT as a new flake.
  • FlightMetricsTests failures are port-binding issues (infrastructure), not logic bugs.
  • The top 3 tests (IndexingIT, SearchRestCancellationIT, ShardIndexingPressureSettingsIT) are chronic flakes that have been failing for over 2 years.
