Skip to content

fix: de-flake CI test suite (5 root causes)#382

Open
danshapiro wants to merge 1 commit into
mainfrom
fix/terminal-directory-refresh-unhandled-rejection
Open

fix: de-flake CI test suite (5 root causes)#382
danshapiro wants to merge 1 commit into
mainfrom
fix/terminal-directory-refresh-unhandled-rejection

Conversation

@danshapiro
Copy link
Copy Markdown
Owner

@danshapiro danshapiro commented Jun 2, 2026

Summary

The Electron Build matrix's "Run tests" step was failing intermittently, which kept the matrix red and prevented the "Build Electron app" step from running. Root-caused five independent flakes (three of them genuine production bugs) and fixed each at the source.

Root causes fixed

  1. Unhandled rejection crashes the whole run (production bug)createTerminalInvalidationHandler dispatched the refresh thunks fire-and-forget from a timer, but those thunks re-throw on failure, so a transient background-refresh failure became an unhandled promise rejection that fails npm test even though every test "passed". The rejection is now contained in the handler (new onRefreshError dep) and at the App.tsx inventory dispatch site. Each contained rejection is tagged with its source (terminal-directory vs session-window) so the shared onRefreshError logs an accurate label.

  2. AgentChatView reload test: mock-queue leak (test isolation) — the "server-restart recovery" describe had no beforeEach, so a test's getAgentTimelinePage.mockResolvedValueOnce queue leaked into the next test under shuffled order. Added the same mock reset the first describe uses.

  3. AgentChatView reload test: LazyMarkdown Suspense swap race (test only) — assistant bodies render through React.lazy + Suspense; the first render shows a fallback <p> that the lazy chunk later replaces, detaching the node findByText captured before toBeInTheDocument runs. Mocked LazyMarkdown to render MarkdownRenderer synchronously, matching the existing MessageBubble / tool-coalesce / agent-chat-polish-flow tests. Production behaviour is correct.

  4. FreshAgentView redundant snapshot fetch (production bug) — the snapshot-load effect dispatches updatePaneContent to persist its own resolved resumeSessionId/status, and the whole paneContent object (plus those output fields) was in its dependency array, so the self-update retriggered the effect, firing a second snapshot fetch for the same session on every load. Now depends only on the fields that identify which snapshot to load; non-identity fields are read live via paneContentRef.current.

  5. sessions.changed background-refresh unhandled rejection (production bug) — same failure class as fix: upgrade node-pty for Node.js 25 compatibility #1: the App.tsx sessions.changed websocket handler dispatched queueActiveSessionWindowRefresh() fire-and-forget with no .catch(). That thunk re-throws when it falls through to fetchSessionWindow() without committed window data (e.g. a sessions.changed before the sidebar window commits, or after a failed direct fetch retry), so a transient refresh failure leaked an unhandled rejection. Contained with the same .catch(log.debug) guard the inventory site uses, and covered with a regression test that captures unhandledRejection. (Found by independent review of the original four-fix commit.)

Verification

Out of scope (follow-up)

Sidebar.tsx and HistoryView.tsx have pre-existing uncontained void dispatch(fetchSessionWindow(...)) fire-and-forget sites (untouched by this branch). They are the same rejection class as #1/#5 but are not implicated in the observed CI flakes. A root fix — making fetchSessionWindow resolve a result instead of throwing — is planned as a separate PR.

CI note

The flaky "Run tests" step lives in electron-build.yml, which only triggers on v* tags or PRs touching electron/** / electron-builder.yml / two bundled-node scripts. This PR touches only src/** and test/**, so the Electron Build matrix does not run here (only typecheck-client does). The fix is therefore verified locally (looped shuffled runs + full default-config suite) and will be exercised in CI on the next electron-path PR or release tag after merge.

Test plan

  • npm run test:unit full default-config suite (305 files / 3526 tests passed)
  • e2e sidebar/pane flows (18 passed)
  • typecheck-client CI check green on this PR
  • Electron Build matrix "Run tests" step green (runs on next electron-path PR / v* tag)

🤖 Generated with Claude Code

The Electron Build matrix's "Run tests" step failed intermittently from
several independent flakes. Root-caused and fixed five:

1. Unhandled rejection crashes the whole run (production bug).
   createTerminalInvalidationHandler dispatched the refresh thunks
   fire-and-forget from a timer, but those thunks re-throw on failure, so a
   transient background-refresh failure became an unhandled promise rejection
   that fails `npm test` even though every test "passed". Contain the
   rejection in the handler (new onRefreshError dep) and at the App.tsx
   inventory dispatch site. Each contained rejection is tagged with its source
   (terminal-directory vs session-window) so the shared onRefreshError logs an
   accurate label.

2. AgentChatView reload test: mock-queue leak (test isolation).
   The "server-restart recovery" describe had no beforeEach, so a test's
   getAgentTimelinePage.mockResolvedValueOnce queue leaked into the next test
   under shuffled order. Add the same mock reset the first describe uses.

3. AgentChatView reload test: LazyMarkdown Suspense swap race (test only).
   Assistant bodies render through React.lazy + Suspense; the first render
   shows a fallback <p> that the lazy chunk later replaces, detaching the node
   `findByText` captured before toBeInTheDocument runs. Mock LazyMarkdown to
   render MarkdownRenderer synchronously, matching MessageBubble/tool-coalesce/
   agent-chat-polish-flow tests. (Production behaviour is correct.)

4. FreshAgentView redundant snapshot fetch (production bug).
   The snapshot-load effect dispatches updatePaneContent to persist its own
   resolved resumeSessionId/status, and the whole paneContent object (plus
   those output fields) was in its dependency array — so the self-update
   retriggered the effect, firing a second snapshot fetch for the same session
   on every load. Depend only on the fields that identify which snapshot to
   load; read non-identity fields live via paneContentRef.current.

5. sessions.changed background-refresh unhandled rejection (production bug).
   Same failure class as #1: the App.tsx sessions.changed websocket handler
   dispatched queueActiveSessionWindowRefresh() fire-and-forget with no
   .catch(). That thunk re-throws when it falls through to fetchSessionWindow()
   without committed window data (e.g. a sessions.changed before the sidebar
   window commits, or after a failed direct fetch retry), so a transient
   refresh failure leaked an unhandled rejection. Contain it with the same
   .catch(log.debug) guard the inventory site uses, and cover it with a
   regression test that captures unhandledRejection. (Found by independent
   review of the original four-fix commit.)

All proven with looped shuffled runs (e.g. #3: 0/120, #4: 0/40 + RED->GREEN
single-fetch assertion, #5: RED leak captured -> GREEN, 20/20 shuffled).
Full default-config unit suite green (305 files / 3526 tests). Typecheck
clean; no new lint warnings. Independently reviewed (gpt-5.5, read-only):
PASSED, one debug-log labeling nit, fixed above.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants