fix: skip hydrateSessionsWithAgentSessions when input sessions is empty#137
Open
chuqk wants to merge 1 commit into
Open
fix: skip hydrateSessionsWithAgentSessions when input sessions is empty#137chuqk wants to merge 1 commit into
chuqk wants to merge 1 commit into
Conversation
5834fa0 to
6dff55f
Compare
When the tmux-query path that produces the `sessions` argument transiently returns an empty array (e.g. a brief tmux server hiccup, or an upstream helper returning [] on a parse path), every active session in the DB fails the `windowSet.has(currentWindow)` check inside hydrateSessionsWithAgentSessions and is treated as orphaned. The post-orphan branch then calls sessionManager.killWindow(currentWindow) for each, which mass-kills the user's working tmux windows in one pass. Observed in production: ten windows were closed inside a 9-second window when `sessions` arrived empty; the agentboard log shows ten back-to-back `session_orphaned` events all reporting `windowSetSize: 0, windowSetSample: []`, each immediately followed by `window_killed`. Add an early-return guard that mirrors the one already present in completeStartupVerification() (only proceed with hydration when the local side is non-empty, otherwise wait for the next refresh to observe a real state). Export the function so a regression test can target it directly, and add the regression test: it seeds five active sessions in the DB and asserts that no `tmux kill-window` is issued when `sessions` is passed in as [].
6dff55f to
99557eb
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
hydrateSessionsWithAgentSessions(insrc/server/index.ts, ~line 599) walks every active DB session and, for each one whosecurrentWindowis not present in the inputwindowSet, callssessionManager.killWindow(currentWindow).If the
sessionsargument arrives empty while the DB still tracks live sessions, every active session fails the membership check at once and the post-orphan branch issuestmux kill-windowfor each. The user's working tmux windows are closed in a single pass.What I saw in production
I run agentboard against a long-lived tmux server with several tracked Claude Code windows. In one stretch of about nine seconds, agentboard's own log records ten consecutive
session_orphanedevents all reporting:Each is immediately followed by
event: "window_killed". tmux loses all ten windows in a single pass. The DB rows are then orphaned (current_window = NULL).I couldn't reproduce the original empty-input condition deterministically — it appears to be a transient tmux-query failure — but the downstream behaviour (mass kill when the input happens to be
[]) is reproducible in isolation.Fix
Add an early-return guard at the top of
hydrateSessionsWithAgentSessions, mirroring the one already present incompleteStartupVerification()(~ line 1014). That existing guard only proceeds with hydration when bothactiveSessionsandlocalSessionsare non-empty. The runtime refresh paths (lines 825, 852, 894) didn't have the equivalent check.Reproduction test
src/server/__tests__/hydrateSessionsEmptyGuard.test.ts:Bun.spawnSyncso all tmux calls are observable.current_windowvalues.hydrateSessionsWithAgentSessions([])to simulate an empty query result.tmux kill-windowis issued.On the un-patched code this test fails with
Expected length: 0 / Received length: 5— fivekill-windowcalls were issued for the seeded windows. After the guard it passes.The function is exported as part of this PR so the test can target it directly. The new test file is also registered in
scripts/test-runner.ts'sISOLATED_FILESset so itsBun.spawnSync/Bun.serve/setIntervalmocks can't race with other test files at module-load time.Test plan
bun test src/server/__tests__/hydrateSessionsEmptyGuard.test.ts— fails without the guard, passes with it.bun scripts/test-runner.ts— full suite passes locally (no regressions).A note on the failing CI checks
The 8 failures in the
cijob look like pre-existing flakes unrelated to this PR:hydrateSessionsEmptyGuard.test.tstoISOLATED_FILESinscripts/test-runner.ts— isolation reduced the count by 0.bun scripts/test-runner.tsagainst cleanupstream/master(no patch, no new test file) produces 0 failures. Applying the patch + the new test on top still produces 0 failures locally; only the unrelateddouble-attach dedup integrationtest occasionally flakes.That points at something CI-specific (Linux + headless + coverage flag + test-file enumeration order) rather than anything in this PR. The most decisive check would be a fresh CI run on
masterHEAD in this repo — happy to be told I'm missing something obvious.Disclosure
This is a vibe-coder pull request: both the investigation and the patch were driven by an AI coding agent working from my agentboard log and the source. I've read the diff and it looks correct, but I'd appreciate any pushback on the shape of the guard, the wording of the comment, or the scope of the test.
Thank you for maintaining this project — finding the root cause in the logs is exactly the kind of debugging story I love. Happy to iterate on the patch.