fix: de-flake CI test suite (5 root causes) by danshapiro · Pull Request #382 · danshapiro/freshell

danshapiro · 2026-06-02T07:10:50Z

Summary

The Electron Build matrix's "Run tests" step was failing intermittently, which kept the matrix red and prevented the "Build Electron app" step from running. Root-caused five independent flakes (three of them genuine production bugs) and fixed each at the source.

Root causes fixed

Unhandled rejection crashes the whole run (production bug) — createTerminalInvalidationHandler dispatched the refresh thunks fire-and-forget from a timer, but those thunks re-throw on failure, so a transient background-refresh failure became an unhandled promise rejection that fails npm test even though every test "passed". The rejection is now contained in the handler (new onRefreshError dep) and at the App.tsx inventory dispatch site. Each contained rejection is tagged with its source (terminal-directory vs session-window) so the shared onRefreshError logs an accurate label.
AgentChatView reload test: mock-queue leak (test isolation) — the "server-restart recovery" describe had no beforeEach, so a test's getAgentTimelinePage.mockResolvedValueOnce queue leaked into the next test under shuffled order. Added the same mock reset the first describe uses.
AgentChatView reload test: LazyMarkdown Suspense swap race (test only) — assistant bodies render through React.lazy + Suspense; the first render shows a fallback <p> that the lazy chunk later replaces, detaching the node findByText captured before toBeInTheDocument runs. Mocked LazyMarkdown to render MarkdownRenderer synchronously, matching the existing MessageBubble / tool-coalesce / agent-chat-polish-flow tests. Production behaviour is correct.
FreshAgentView redundant snapshot fetch (production bug) — the snapshot-load effect dispatches updatePaneContent to persist its own resolved resumeSessionId/status, and the whole paneContent object (plus those output fields) was in its dependency array, so the self-update retriggered the effect, firing a second snapshot fetch for the same session on every load. Now depends only on the fields that identify which snapshot to load; non-identity fields are read live via paneContentRef.current.
sessions.changed background-refresh unhandled rejection (production bug) — same failure class as fix: upgrade node-pty for Node.js 25 compatibility #1: the App.tsx sessions.changed websocket handler dispatched queueActiveSessionWindowRefresh() fire-and-forget with no .catch(). That thunk re-throws when it falls through to fetchSessionWindow() without committed window data (e.g. a sessions.changed before the sidebar window commits, or after a failed direct fetch retry), so a transient refresh failure leaked an unhandled rejection. Contained with the same .catch(log.debug) guard the inventory site uses, and covered with a regression test that captures unhandledRejection. (Found by independent review of the original four-fix commit.)

Verification

All proven with looped shuffled runs (e.g. fix: resolve package.json path dynamically for production builds #3: 0/120, Claude indexer startup refresh is slow on large datasets #4: 0/40 plus a RED→GREEN single-fetch assertion, Coding CLI indexer refresh runs for minutes and spikes memory #5: RED leak captured → GREEN, 20/20 shuffled).
Full default-config unit suite green: 305 files / 3526 tests.
E2e (sidebar visibility + pane activity): 18 tests green.
Typecheck (client + server) clean; no new lint warnings.
Independently reviewed (gpt-5.5, read-only): PASSED — one debug-log labeling nit, fixed (source-tagged onRefreshError, see fix: upgrade node-pty for Node.js 25 compatibility #1).

Out of scope (follow-up)

Sidebar.tsx and HistoryView.tsx have pre-existing uncontained void dispatch(fetchSessionWindow(...)) fire-and-forget sites (untouched by this branch). They are the same rejection class as #1/#5 but are not implicated in the observed CI flakes. A root fix — making fetchSessionWindow resolve a result instead of throwing — is planned as a separate PR.

CI note

The flaky "Run tests" step lives in electron-build.yml, which only triggers on v* tags or PRs touching electron/** / electron-builder.yml / two bundled-node scripts. This PR touches only src/** and test/**, so the Electron Build matrix does not run here (only typecheck-client does). The fix is therefore verified locally (looped shuffled runs + full default-config suite) and will be exercised in CI on the next electron-path PR or release tag after merge.

Test plan

npm run test:unit full default-config suite (305 files / 3526 tests passed)
e2e sidebar/pane flows (18 passed)
typecheck-client CI check green on this PR
Electron Build matrix "Run tests" step green (runs on next electron-path PR / v* tag)

🤖 Generated with Claude Code

The Electron Build matrix's "Run tests" step failed intermittently from several independent flakes. Root-caused and fixed five: 1. Unhandled rejection crashes the whole run (production bug). createTerminalInvalidationHandler dispatched the refresh thunks fire-and-forget from a timer, but those thunks re-throw on failure, so a transient background-refresh failure became an unhandled promise rejection that fails `npm test` even though every test "passed". Contain the rejection in the handler (new onRefreshError dep) and at the App.tsx inventory dispatch site. Each contained rejection is tagged with its source (terminal-directory vs session-window) so the shared onRefreshError logs an accurate label. 2. AgentChatView reload test: mock-queue leak (test isolation). The "server-restart recovery" describe had no beforeEach, so a test's getAgentTimelinePage.mockResolvedValueOnce queue leaked into the next test under shuffled order. Add the same mock reset the first describe uses. 3. AgentChatView reload test: LazyMarkdown Suspense swap race (test only). Assistant bodies render through React.lazy + Suspense; the first render shows a fallback <p> that the lazy chunk later replaces, detaching the node `findByText` captured before toBeInTheDocument runs. Mock LazyMarkdown to render MarkdownRenderer synchronously, matching MessageBubble/tool-coalesce/ agent-chat-polish-flow tests. (Production behaviour is correct.) 4. FreshAgentView redundant snapshot fetch (production bug). The snapshot-load effect dispatches updatePaneContent to persist its own resolved resumeSessionId/status, and the whole paneContent object (plus those output fields) was in its dependency array — so the self-update retriggered the effect, firing a second snapshot fetch for the same session on every load. Depend only on the fields that identify which snapshot to load; read non-identity fields live via paneContentRef.current. 5. sessions.changed background-refresh unhandled rejection (production bug). Same failure class as #1: the App.tsx sessions.changed websocket handler dispatched queueActiveSessionWindowRefresh() fire-and-forget with no .catch(). That thunk re-throws when it falls through to fetchSessionWindow() without committed window data (e.g. a sessions.changed before the sidebar window commits, or after a failed direct fetch retry), so a transient refresh failure leaked an unhandled rejection. Contain it with the same .catch(log.debug) guard the inventory site uses, and cover it with a regression test that captures unhandledRejection. (Found by independent review of the original four-fix commit.) All proven with looped shuffled runs (e.g. #3: 0/120, #4: 0/40 + RED->GREEN single-fetch assertion, #5: RED leak captured -> GREEN, 20/20 shuffled). Full default-config unit suite green (305 files / 3526 tests). Typecheck clean; no new lint warnings. Independently reviewed (gpt-5.5, read-only): PASSED, one debug-log labeling nit, fixed above. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: de-flake CI test suite (5 root causes)#382

fix: de-flake CI test suite (5 root causes)#382
danshapiro wants to merge 1 commit into
mainfrom
fix/terminal-directory-refresh-unhandled-rejection

danshapiro commented Jun 2, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

danshapiro commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root causes fixed

Verification

Out of scope (follow-up)

CI note

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

danshapiro commented Jun 2, 2026 •

edited

Loading