Fix flaky awaitIdle in integration tests #1558

Open
gnodet wants to merge 1 commit into master from fix-flaky-awaitIdle

gnodet (Contributor) commented Mar 17, 2026

Summary

  • Add 100ms sleep between polling iterations in TestRegistry.awaitIdle() to avoid tight busy-loop spinning on file lock acquisition
  • Increase timeout from 5s to 30s to accommodate slow CI environments
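The two changes above can be sketched as a paced polling loop. This is a minimal illustration, not the actual TestRegistry code: the class name, the `BooleanSupplier` standing in for the registry check, and the method signature are all assumptions made for the example. Only the 100ms interval, the 30s timeout, and the error message wording come from this PR.

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for the polling logic in TestRegistry.awaitIdle().
final class AwaitIdleSketch {
    static final long POLL_INTERVAL_MS = 100;   // sleep between polls (from this PR)
    static final long TIMEOUT_MS = 30_000;      // overall timeout (from this PR)

    /**
     * Polls isIdle until it returns true, sleeping pollIntervalMs between
     * attempts, and fails with an AssertionError once timeoutMs has elapsed.
     * isIdle is an assumed abstraction over the real registry lookup.
     */
    static void awaitIdle(String daemonId, BooleanSupplier isIdle,
                          long timeoutMs, long pollIntervalMs) {
        final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!isIdle.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new AssertionError(
                        "Daemon " + daemonId + " should have become idle within " + timeoutMs);
            }
            try {
                // Yield the CPU and the registry lock between polls.
                Thread.sleep(pollIntervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new AssertionError("Interrupted while waiting for daemon " + daemonId);
            }
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // Simulated daemon that becomes idle roughly 250ms after start.
        awaitIdle("example", () -> System.nanoTime() - start > 250_000_000L,
                TIMEOUT_MS, POLL_INTERVAL_MS);
        System.out.println("daemon became idle");
    }
}
```

In this sketch a daemon that is already idle returns on the first check without sleeping, and a full 30s wait performs roughly 300 polls rather than an unbounded number.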

Problem

StopStatusTest.stopStatus, along with other tests that use awaitIdle, intermittently fails on macOS ARM64 CI runners with:

java.lang.AssertionError: Daemon <id> should have become idle within 5000

Root cause: awaitIdle() was a tight busy-loop that called getAll() on every iteration with no pause in between. Each getAll() call acquires a file lock on the daemon registry, so the loop caused lock contention. In addition, the 5000ms timeout was too short for CI runners under load.
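To give a rough sense of the difference in polling rate, the following sketch counts loop iterations over a fixed window with and without a sleep. It does not touch the real registry or any file lock: each iteration only stands in for one getAll() call, and the class and method names are hypothetical.

```java
// Hypothetical demo: compares how often a tight loop and a paced loop poll.
final class PollRateSketch {
    /**
     * Counts loop iterations within windowMs. A sleepMs of 0 reproduces the
     * tight busy-loop; a positive sleepMs paces the loop.
     */
    static long countPolls(long windowMs, long sleepMs) {
        final long deadline = System.nanoTime() + windowMs * 1_000_000L;
        long polls = 0;
        while (System.nanoTime() - deadline < 0) {
            polls++; // stands in for one getAll() call, i.e. one lock acquisition
            if (sleepMs > 0) {
                try {
                    Thread.sleep(sleepMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return polls;
    }

    public static void main(String[] args) {
        long busy = countPolls(250, 0);     // no sleep: spins as fast as the CPU allows
        long paced = countPolls(250, 100);  // 100ms sleep: only a handful of polls
        System.out.println("busy-loop polls in 250ms: " + busy);
        System.out.println("paced polls in 250ms:     " + paced);
    }
}
```

The exact busy-loop count depends on the machine, but it is typically several orders of magnitude higher than the paced count, and in the real test each of those iterations would have been a lock acquisition.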

Test plan

  • Builds successfully
  • The fix reduces lock contention and provides a more generous timeout, making the test robust on slow CI environments without significantly increasing test duration (the sleep only applies while waiting)

🤖 Generated with Claude Code

The awaitIdle() method had two problems causing flaky test failures
(especially on macOS ARM64 CI runners):

1. Tight busy-loop with no sleep - spins continuously on getAll()
   which acquires a file lock each iteration, causing contention
   and wasted CPU
2. 5000ms timeout too short for CI environments under load

Fix by adding a 100ms sleep between polls and increasing the timeout
to 30 seconds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>