fix(test): stop stress_resize_extremes hanging CI forever (phux-s2iw) by phall1 · Pull Request #49 · phall1/phux

phall1 · 2026-06-03T23:30:23Z

Summary

The e2e CI lane intermittently hung for >100 minutes on both_axes_shrink_storm_under_output_does_not_panic (and latently on resize_degenerate_viewports_do_not_panic), then got cancelled — the main cause of the flaky/failing e2e lane.

Root cause (it was never a deadlock)

The final cap.attach_screen(client.screenshot().await...) drains output "until 20ms of quiet." But these tests' seeds emit forever, one every 5ms (printf; sleep 0.005) and one every 20ms (stty size; sleep 0.02), so screenshot()'s drain loop never sees a 20ms gap and keeps looping indefinitely. It's flaky (not 100%) only because occasional scheduler jitter yields a >20ms gap that lets the loop break.
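The termination condition above can be modelled on a recorded trace of frame arrival times. This is a minimal sketch, not the harness code: `frames_until_quiet` and `periodic_arrivals` are hypothetical names, and the trace stands in for the real output stream. With a steady 5ms seed no gap ever reaches the 20ms quiet window; a single jitter-induced stall is enough to let the loop exit.

```rust
/// Hypothetical model of a "drain until quiet" loop over a trace of frame
/// arrival times in milliseconds. Returns `Some(n)` if the loop would stop
/// after consuming `n` frames because the next frame arrives at least
/// `quiet_ms` later, or `None` if no such gap exists in the trace (with a
/// seed that emits forever, the loop would keep going).
fn frames_until_quiet(arrivals_ms: &[u64], quiet_ms: u64) -> Option<usize> {
    for (i, pair) in arrivals_ms.windows(2).enumerate() {
        // Gap between frame i and frame i + 1.
        if pair[1].saturating_sub(pair[0]) >= quiet_ms {
            return Some(i + 1);
        }
    }
    None
}

/// Arrival times for a seed that emits one frame every `period_ms`.
fn periodic_arrivals(period_ms: u64, count: usize) -> Vec<u64> {
    (0..count).map(|i| i as u64 * period_ms).collect()
}
```

In this model a 5ms seed over any number of frames yields `None`, which matches the hang; delaying every frame from some point onward by 25ms yields `Some(..)`, which matches the occasional passing run.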

Confirmed by stack sample of a hung process: the runtime thread is stuck in ClientHandle::screenshot (builder.rs:430), ~4.6% CPU (parked, not spinning the server). Two earlier hypotheses (client-read backpressure, server resize deadlock) were disproved — draining the client didn't help; the hang is purely the screenshot loop.

Fix (test-harness only, no production code)

Add ClientHandle::drain_output_bounded(max_frames) — a count-bounded drain, safe against a continuously-emitting seed where screenshot() cannot terminate.
Add ClientHandle::snapshot_text() — read the oracle's text without draining.
Replace the hanging screenshot() in both tests with drain_output_bounded(32) + snapshot_text().
The resize storm loops are untouched — they never deadlocked (resize_raw is send-only and always reached the final screenshot).
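The shape of the two helpers can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `SketchClient` is a hypothetical synchronous stand-in for `ClientHandle` (the real harness is async), the frame channel and the `String` oracle are invented for the example, and the real drain may wait briefly per frame rather than using a non-blocking receive.

```rust
use std::sync::mpsc::{Receiver, TryRecvError};

/// Hypothetical stand-in for the harness client; not the real ClientHandle.
struct SketchClient {
    frames: Receiver<Vec<u8>>,
    oracle_text: String,
}

impl SketchClient {
    fn new(frames: Receiver<Vec<u8>>) -> Self {
        Self { frames, oracle_text: String::new() }
    }

    /// Feed at most `max_frames` pending frames into the oracle and return
    /// how many were consumed. The count bound guarantees termination even
    /// if the sender never stops emitting.
    fn drain_output_bounded(&mut self, max_frames: usize) -> usize {
        let mut drained = 0;
        while drained < max_frames {
            match self.frames.try_recv() {
                Ok(frame) => {
                    self.oracle_text.push_str(&String::from_utf8_lossy(&frame));
                    drained += 1;
                }
                // Nothing pending, or the sender is gone: stop early.
                Err(TryRecvError::Empty) | Err(TryRecvError::Disconnected) => break,
            }
        }
        drained
    }

    /// Read the oracle's current text without consuming any frames.
    fn snapshot_text(&self) -> &str {
        &self.oracle_text
    }
}
```

The key design point is that the loop's exit condition depends on a counter the test controls, not on the seed going quiet, so a call such as `drain_output_bounded(32)` followed by `snapshot_text()` always returns.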

Validation

both_axes_shrink_storm: 25/25 pass, slowest 1s (was a 100-min hang).
Full file (both tests): 10/10, slowest 2s.
just ci green.

Follow-up (noted in bead phux-s2iw)

The same screenshot()-on-a-fast-continuous-seed anti-pattern is latent in stress_resize_storm.rs and stress_lifecycle_churn.rs (not currently failing CI). Separately, this run surfaced that the CI workflow recompiles the workspace ~4× across jobs and the FlakeHub/nix cache auth is failing — tracked separately.

🤖 Generated with Claude Code

The e2e lane intermittently hung for >100min on `both_axes_shrink_storm_under_output_does_not_panic` (and latently `resize_degenerate_viewports_do_not_panic`).

Root cause is NOT a server deadlock: the final `cap.attach_screen(client.screenshot()...)` call drains output "until 20ms of quiet", but these tests' seeds emit every 5ms / 20ms forever, so screenshot's drain loop never sees a gap and spins indefinitely. Flaky only because occasional scheduler jitter yields a >20ms gap. Confirmed by stack sample: runtime thread stuck in `ClientHandle::screenshot` (builder.rs).

Add two harness helpers: `drain_output_bounded(max_frames)` (a count-bounded drain that is safe against a continuously-emitting seed) and `snapshot_text()` (read the oracle without draining). Replace the hanging `screenshot()` in both tests with a bounded drain + snapshot. The resize storm loops are left untouched -- they never deadlocked (`resize_raw` is send-only and always reached the final screenshot).

Validation: both_axes 25/25 pass, slowest 1s (was a 100min hang); full file both tests 10/10, slowest 2s.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

phall1 merged commit a1c6b18 into main Jun 3, 2026
5 of 6 checks passed

phall1 deleted the fix/s2iw-stress-resize-screenshot-hang branch June 3, 2026 23:38
