fix: de-flake CI test suite (5 root causes)#382
Open
danshapiro wants to merge 1 commit into
Open
Conversation
The Electron Build matrix's "Run tests" step failed intermittently from several independent flakes. Root-caused and fixed five: 1. Unhandled rejection crashes the whole run (production bug). createTerminalInvalidationHandler dispatched the refresh thunks fire-and-forget from a timer, but those thunks re-throw on failure, so a transient background-refresh failure became an unhandled promise rejection that fails `npm test` even though every test "passed". Contain the rejection in the handler (new onRefreshError dep) and at the App.tsx inventory dispatch site. Each contained rejection is tagged with its source (terminal-directory vs session-window) so the shared onRefreshError logs an accurate label. 2. AgentChatView reload test: mock-queue leak (test isolation). The "server-restart recovery" describe had no beforeEach, so a test's getAgentTimelinePage.mockResolvedValueOnce queue leaked into the next test under shuffled order. Add the same mock reset the first describe uses. 3. AgentChatView reload test: LazyMarkdown Suspense swap race (test only). Assistant bodies render through React.lazy + Suspense; the first render shows a fallback <p> that the lazy chunk later replaces, detaching the node `findByText` captured before toBeInTheDocument runs. Mock LazyMarkdown to render MarkdownRenderer synchronously, matching MessageBubble/tool-coalesce/ agent-chat-polish-flow tests. (Production behaviour is correct.) 4. FreshAgentView redundant snapshot fetch (production bug). The snapshot-load effect dispatches updatePaneContent to persist its own resolved resumeSessionId/status, and the whole paneContent object (plus those output fields) was in its dependency array — so the self-update retriggered the effect, firing a second snapshot fetch for the same session on every load. Depend only on the fields that identify which snapshot to load; read non-identity fields live via paneContentRef.current. 5. sessions.changed background-refresh unhandled rejection (production bug). Same failure class as #1: the App.tsx sessions.changed websocket handler dispatched queueActiveSessionWindowRefresh() fire-and-forget with no .catch(). That thunk re-throws when it falls through to fetchSessionWindow() without committed window data (e.g. a sessions.changed before the sidebar window commits, or after a failed direct fetch retry), so a transient refresh failure leaked an unhandled rejection. Contain it with the same .catch(log.debug) guard the inventory site uses, and cover it with a regression test that captures unhandledRejection. (Found by independent review of the original four-fix commit.) All proven with looped shuffled runs (e.g. #3: 0/120, #4: 0/40 + RED->GREEN single-fetch assertion, #5: RED leak captured -> GREEN, 20/20 shuffled). Full default-config unit suite green (305 files / 3526 tests). Typecheck clean; no new lint warnings. Independently reviewed (gpt-5.5, read-only): PASSED, one debug-log labeling nit, fixed above. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The Electron Build matrix's "Run tests" step was failing intermittently, which kept the matrix red and prevented the "Build Electron app" step from running. Root-caused five independent flakes (three of them genuine production bugs) and fixed each at the source.
Root causes fixed
Unhandled rejection crashes the whole run (production bug) —
createTerminalInvalidationHandlerdispatched the refresh thunks fire-and-forget from a timer, but those thunks re-throw on failure, so a transient background-refresh failure became an unhandled promise rejection that failsnpm testeven though every test "passed". The rejection is now contained in the handler (newonRefreshErrordep) and at theApp.tsxinventory dispatch site. Each contained rejection is tagged with its source (terminal-directoryvssession-window) so the sharedonRefreshErrorlogs an accurate label.AgentChatView reload test: mock-queue leak (test isolation) — the "server-restart recovery" describe had no
beforeEach, so a test'sgetAgentTimelinePage.mockResolvedValueOncequeue leaked into the next test under shuffled order. Added the same mock reset the first describe uses.AgentChatView reload test: LazyMarkdown Suspense swap race (test only) — assistant bodies render through
React.lazy+Suspense; the first render shows a fallback<p>that the lazy chunk later replaces, detaching the nodefindByTextcaptured beforetoBeInTheDocumentruns. MockedLazyMarkdownto renderMarkdownRenderersynchronously, matching the existing MessageBubble / tool-coalesce / agent-chat-polish-flow tests. Production behaviour is correct.FreshAgentView redundant snapshot fetch (production bug) — the snapshot-load effect dispatches
updatePaneContentto persist its own resolvedresumeSessionId/status, and the wholepaneContentobject (plus those output fields) was in its dependency array, so the self-update retriggered the effect, firing a second snapshot fetch for the same session on every load. Now depends only on the fields that identify which snapshot to load; non-identity fields are read live viapaneContentRef.current.sessions.changedbackground-refresh unhandled rejection (production bug) — same failure class as fix: upgrade node-pty for Node.js 25 compatibility #1: theApp.tsxsessions.changedwebsocket handler dispatchedqueueActiveSessionWindowRefresh()fire-and-forget with no.catch(). That thunk re-throws when it falls through tofetchSessionWindow()without committed window data (e.g. asessions.changedbefore the sidebar window commits, or after a failed direct fetch retry), so a transient refresh failure leaked an unhandled rejection. Contained with the same.catch(log.debug)guard the inventory site uses, and covered with a regression test that capturesunhandledRejection. (Found by independent review of the original four-fix commit.)Verification
onRefreshError, see fix: upgrade node-pty for Node.js 25 compatibility #1).Out of scope (follow-up)
Sidebar.tsxandHistoryView.tsxhave pre-existing uncontainedvoid dispatch(fetchSessionWindow(...))fire-and-forget sites (untouched by this branch). They are the same rejection class as #1/#5 but are not implicated in the observed CI flakes. A root fix — makingfetchSessionWindowresolve a result instead of throwing — is planned as a separate PR.CI note
The flaky "Run tests" step lives in
electron-build.yml, which only triggers onv*tags or PRs touchingelectron/**/electron-builder.yml/ two bundled-node scripts. This PR touches onlysrc/**andtest/**, so the Electron Build matrix does not run here (onlytypecheck-clientdoes). The fix is therefore verified locally (looped shuffled runs + full default-config suite) and will be exercised in CI on the next electron-path PR or release tag after merge.Test plan
npm run test:unitfull default-config suite (305 files / 3526 tests passed)typecheck-clientCI check green on this PRv*tag)🤖 Generated with Claude Code