fix: skip hydrateSessionsWithAgentSessions when input sessions is empty #137

Open
chuqk wants to merge 1 commit into gbasin:master from chuqk:fix/hydrate-empty-windowset-guard
Conversation

chuqk (Contributor) commented May 19, 2026

Summary

hydrateSessionsWithAgentSessions (in src/server/index.ts, ~line 599) walks every active DB session and, for each one whose currentWindow is missing from the windowSet built from the input sessions, calls sessionManager.killWindow(currentWindow).

If the sessions argument arrives empty while the DB still tracks live sessions, every active session fails the membership check at once and the post-orphan branch issues tmux kill-window for each. The user's working tmux windows are closed in a single pass.
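
For illustration, here is a minimal sketch of that failure mode. It is a simplified model, not the real function: the `TrackedSession` and `LiveSession` types, the `tmuxWindow` field, and the `findOrphanedWindows` helper are hypothetical names. It collects the windows that would be killed instead of killing them.

```typescript
// Simplified model of the orphan check. All names here are hypothetical
// stand-ins; agentboard's real types and function differ.
interface TrackedSession {
  sessionId: string;
  currentWindow: string | null;
}

interface LiveSession {
  tmuxWindow: string;
}

// Returns the windows the orphan branch would kill for a given input.
function findOrphanedWindows(
  activeSessions: TrackedSession[],
  sessions: LiveSession[],
): string[] {
  const windowSet = new Set(sessions.map((s) => s.tmuxWindow));
  const orphaned: string[] = [];
  for (const active of activeSessions) {
    const win = active.currentWindow;
    // An empty windowSet makes this membership check fail for every row.
    if (win !== null && !windowSet.has(win)) {
      orphaned.push(win);
    }
  }
  return orphaned;
}

const tracked: TrackedSession[] = [
  { sessionId: "a", currentWindow: "main:1" },
  { sessionId: "b", currentWindow: "main:2" },
  { sessionId: "c", currentWindow: null },
];

// Healthy refresh: both tracked windows are reported, nothing is orphaned.
const healthy = findOrphanedWindows(tracked, [
  { tmuxWindow: "main:1" },
  { tmuxWindow: "main:2" },
]);

// Transient empty input: every tracked window is flagged at once.
const emptyInput = findOrphanedWindows(tracked, []);

console.log(healthy); // []
console.log(emptyInput); // [ "main:1", "main:2" ]
```

The model cannot tell "tmux reported no windows" apart from "the query failed and returned nothing", which is why an empty input is so destructive.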

What I saw in production

I run agentboard against a long-lived tmux server with several tracked Claude Code windows. In one stretch of about nine seconds, agentboard's own log records ten consecutive session_orphaned events all reporting:

windowSetSize: 0
windowSetSample: []
event: "session_orphaned"

Each is immediately followed by event: "window_killed". tmux loses all ten windows in a single pass. The DB rows are then orphaned (current_window = NULL).

I couldn't reproduce the original empty-input condition deterministically — it appears to be a transient tmux-query failure — but the downstream behaviour (mass kill when the input happens to be []) is reproducible in isolation.

Fix

Add an early-return guard at the top of hydrateSessionsWithAgentSessions, mirroring the one already present in completeStartupVerification() (~line 1014). That existing guard only proceeds with hydration when both activeSessions and localSessions are non-empty. The runtime refresh paths (lines 825, 852, 894) had no equivalent check.

if (activeSessions.length > 0 && sessions.length === 0) {
  logger.warn('hydrate_sessions_skipped_empty_input', {
    activeSessionCount: activeSessions.length,
  })
  return sessions
}
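
To make the guard's scope explicit, the sketch below pulls the condition into a small predicate and evaluates its four input combinations. The helper name `shouldSkipHydration` is hypothetical (the patch inlines the check). It shows that the guard fires only when the DB tracks sessions and the input is empty, so a fresh install with an empty DB still hydrates normally.

```typescript
// Hypothetical helper: the PR inlines this condition rather than naming it.
function shouldSkipHydration(
  activeSessionCount: number,
  inputSessionCount: number,
): boolean {
  return activeSessionCount > 0 && inputSessionCount === 0;
}

const guardCases = [
  { active: 0, input: 0 }, // nothing tracked, nothing reported
  { active: 0, input: 3 }, // fresh DB, tmux has windows
  { active: 5, input: 3 }, // normal refresh
  { active: 5, input: 0 }, // suspicious: tracked sessions, empty input
].map((c) => ({ ...c, skip: shouldSkipHydration(c.active, c.input) }));

for (const c of guardCases) {
  console.log(`active=${c.active} input=${c.input} skip=${c.skip}`);
}
```

Only the last case is skipped. The trade-off is that a genuine "every window was closed" state is also skipped until a later refresh reports at least one window.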

Reproduction test

src/server/__tests__/hydrateSessionsEmptyGuard.test.ts:

  1. Mock Bun.spawnSync so all tmux calls are observable.
  2. Seed five rows into the DB with non-null current_window values.
  3. Call hydrateSessionsWithAgentSessions([]) to simulate an empty query result.
  4. Assert: no tmux kill-window is issued.

On the un-patched code this test fails with Expected length: 0 / Received length: 5 — five kill-window calls were issued for the seeded windows. After the guard it passes.
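
The shape of that test can be sketched without Bun or a real DB. Everything below is an illustrative stand-in rather than the actual test file: `SpawnRecorder` plays the role of the mocked Bun.spawnSync, `seedRows` replaces DB seeding, and `hydrate` is a reduced model with a flag to toggle the guard. The argv layout `tmux kill-window -t <target>` is assumed.

```typescript
// Illustrative stand-ins only; the real test mocks Bun.spawnSync and seeds
// the DB. None of these names come from agentboard.
type Argv = string[];

class SpawnRecorder {
  readonly calls: Argv[] = [];

  // Records every spawned command instead of running it.
  spawnSync(argv: Argv): { exitCode: number } {
    this.calls.push(argv);
    return { exitCode: 0 };
  }

  killWindowCalls(): Argv[] {
    return this.calls.filter(
      (argv) => argv[0] === "tmux" && argv[1] === "kill-window",
    );
  }
}

interface Row {
  sessionId: string;
  currentWindow: string | null;
}

function seedRows(count: number): Row[] {
  return Array.from({ length: count }, (_, i) => ({
    sessionId: `s${i}`,
    currentWindow: `main:${i}`,
  }));
}

// Reduced model of the hydrate step, with the guard switchable.
function hydrate(
  rows: Row[],
  sessions: { tmuxWindow: string }[],
  recorder: SpawnRecorder,
  guardEnabled: boolean,
): void {
  if (guardEnabled && rows.length > 0 && sessions.length === 0) return;
  const windowSet = new Set(sessions.map((s) => s.tmuxWindow));
  for (const row of rows) {
    if (row.currentWindow !== null && !windowSet.has(row.currentWindow)) {
      recorder.spawnSync(["tmux", "kill-window", "-t", row.currentWindow]);
    }
  }
}

const unpatched = new SpawnRecorder();
hydrate(seedRows(5), [], unpatched, false);

const patched = new SpawnRecorder();
hydrate(seedRows(5), [], patched, true);

console.log(unpatched.killWindowCalls().length); // 5
console.log(patched.killWindowCalls().length); // 0
```

Filtering the recorded calls by subcommand keeps the assertion narrow, so unrelated tmux queries made during hydration would not cause a false failure.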

The function is exported as part of this PR so the test can target it directly. The new test file is also registered in scripts/test-runner.ts's ISOLATED_FILES set so its Bun.spawnSync / Bun.serve / setInterval mocks can't race with other test files at module-load time.

Test plan

  • bun test src/server/__tests__/hydrateSessionsEmptyGuard.test.ts — fails without the guard, passes with it.
  • bun scripts/test-runner.ts — full suite passes locally (no regressions).

A note on the failing CI checks

The 8 failures in the ci job look like pre-existing flakes unrelated to this PR:

  • The same 8 tests fail at the same call sites both before and after adding hydrateSessionsEmptyGuard.test.ts to ISOLATED_FILES in scripts/test-runner.ts, so isolating the new file did not change the failure count.
  • On my local machine, bun scripts/test-runner.ts against clean upstream/master (no patch, no new test file) produces 0 failures. Applying the patch and the new test on top also produces 0 failures locally, apart from an occasional flake in the unrelated double-attach dedup integration test.
  • Running each failing test file individually with the patch applied also produces 0 failures.

That points at something CI-specific (Linux + headless + coverage flag + test-file enumeration order) rather than anything in this PR. The most decisive check would be a fresh CI run on master HEAD in this repo — happy to be told I'm missing something obvious.

Disclosure

This is a vibe-coder pull request: both the investigation and the patch were driven by an AI coding agent working from my agentboard log and the source. I've read the diff and it looks correct, but I'd appreciate any pushback on the shape of the guard, the wording of the comment, or the scope of the test.

Thank you for maintaining this project — finding the root cause in the logs is exactly the kind of debugging story I love. Happy to iterate on the patch.

@chuqk chuqk force-pushed the fix/hydrate-empty-windowset-guard branch from 5834fa0 to 6dff55f Compare May 19, 2026 13:32
When the tmux-query path that produces the `sessions` argument transiently
returns an empty array (e.g. a brief tmux server hiccup, or an upstream
helper returning [] on a parse path), every active session in the DB fails
the `windowSet.has(currentWindow)` check inside
hydrateSessionsWithAgentSessions and is treated as orphaned. The post-orphan
branch then calls sessionManager.killWindow(currentWindow) for each, which
mass-kills the user's working tmux windows in one pass.

Observed in production: ten windows were closed inside a 9-second window
when `sessions` arrived empty; the agentboard log shows ten back-to-back
`session_orphaned` events all reporting `windowSetSize: 0,
windowSetSample: []`, each immediately followed by `window_killed`.

Add an early-return guard that mirrors the one already present in
completeStartupVerification() (only proceed with hydration when the local
side is non-empty, otherwise wait for the next refresh to observe a real
state). Export the function so a regression test can target it directly,
and add the regression test: it seeds five active sessions in the DB and
asserts that no `tmux kill-window` is issued when `sessions` is passed in
as [].
@chuqk chuqk force-pushed the fix/hydrate-empty-windowset-guard branch from 6dff55f to 99557eb Compare May 19, 2026 13:41