Skip to content

feat(server): single-window tab-group routing + simplified ScreencastManager#1074

Merged
shivammittal274 merged 3 commits into
mainfrom
feat/single-window-tabgroups
May 30, 2026
Merged

feat(server): single-window tab-group routing + simplified ScreencastManager#1074
shivammittal274 merged 3 commits into
mainfrom
feat/single-window-tabgroups

Conversation

@shivammittal274
Copy link
Copy Markdown
Contributor

Summary

Two commits enabling the agent-company "one window, N agents, N tab groups" host model:

  • feat(server): per-agent tab group routing for page-creating toolsnew_page, new_hidden_page, show_page, move_page now accept an optional tabGroupId and call groupTabs after page creation. Pairs with X-BrowserOS-Default-Tab-Group-Id so hosts can pin every page-creating tool call from a given agent into a specific tab group without the agent picking groups itself.
  • refactor(server): simplify ScreencastManager to a singleton + bringToFront wake-up — drops the sessions map / pendingStarts / primeWithScreenshot / lastFrame / retry-loop complexity; one active session at a time, tearDown displaces the prior subscriber, and Page.bringToFront is called right before Page.startScreencast so backgrounded tabs actually composite. Subscribe also catches Unknown page and emits a clean detached status instead of letting the route close with 1011 (which spirals EventSource reconnects).

Why bringToFront

Page.startScreencast on a tab whose RenderWidget isn't visible returns zero frames — the compositor is paused. POC confirmed setWebLifecycleState('active') alone is not sufficient; bringToFront is the lightest wake-up that delivers frames. Side-effect: BrowserOS visibly switches its frontmost tab and steals macOS focus, so the connected status is emitted right after bringToFront to give the agent-company SSE proxy an early signal to counter-activate Electron.

Test plan

  • bun run typecheck clean
  • WS subscribe to a backgrounded tab → frames flow after bringToFront (was 0 before)
  • WS subscribe to a non-existent pageId → single detached status, no spiral
  • Cross-target subscribe sequence (A → B → A) → singleton displacement, no cross-talk
  • Tested end-to-end with agent-company's feat/single-window-and-screencast PR

Related

Companion PR: browseros-ai/agent-company#104 — single app window + per-pane screencast + visibility toggle that consume this server-side foundation.

🤖 Generated with Claude Code

shivammittal274 and others added 2 commits May 29, 2026 22:57
When the host pins the request to a specific tab group via the new
X-BrowserOS-Default-Tab-Group-Id header, every tab created by the
agent is deterministically placed in that group. Pair with the
existing X-BrowserOS-Default-Window-Id, hosts can model "one window,
N agents, N tab groups" without the agent picking groups itself.

Wiring:
- Four page-creating tools (new_page, new_hidden_page, show_page,
  move_page) accept an optional tabGroupId and call groupTabs after
  their main op. Best-effort: a stale groupId logs a warn but does
  not fail the tool.
- routes/mcp.ts reads the header and threads it through
  createMcpServer → registerTools.
- registerTools auto-injects tabGroupId the same way it injects
  windowId today: only when the host set it, the tool's schema
  accepts it, and the caller didn't explicitly pick a value.
- When defaultWindowId is set (single-window mode), four
  window-mutating tools (create_window, create_hidden_window,
  close_window, set_window_visibility) are filtered out of the
  registered surface so an agent can't break the host's invariant.

Existing callers that don't set either header keep their behaviour
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Front wake-up

The previous manager maintained a sessions map keyed by targetId with
shared subscribers, a pendingStarts dedupe, a primeWithScreenshot
lane, lastFrame cache, and a `disposed` retry loop. In practice the
agent-company renderer subscribes one pane at a time per install, and
shared sessions surfaced subtle frame cross-talk while late
subscribers got blank canvases on static pages.

Replaces all of that with a singleton — `active: ScreencastSession |
null`. A new subscribe `tearDown`s any prior session (sending
`detached` to the displaced ws) and registers a fresh
`Page.screencastFrame` listener bound to the new ws. Hidden /
backgrounded tabs don't composite — `Page.startScreencast` returns
zero frames on a tab whose RenderWidget isn't visible. So just before
`Page.startScreencast`, the manager calls `Page.bringToFront` to
foreground the tab in its window. `setWebLifecycleState('active')` is
not sufficient — verified empirically.

Subscribe also catches `Unknown page` from the resolve step and
emits a clean `detached` status instead of letting the route close
with 1011 (which would spiral EventSource reconnects).

The `connected` status is emitted AFTER `bringToFront` resolves so
the agent-company SSE proxy can use the first WS message as the
focus-restore signal — bringToFront steals macOS focus to BrowserOS
and the proxy fires a counter-activate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 30, 2026

❌ Tests failed — 1/1064 failed

Suite Passed Failed Skipped
agent 91/91 0 0
build 10/10 0 0
eval 95/95 0 0
server-agent 246/246 0 0
server-api 63/64 1 0
server-browser 4/4 0 0
server-integration 9/10 0 1
server-lib 252/253 0 1
server-root 59/62 0 3
server-tools 229/229 0 0
Failed tests
  • server-apiScreencastManager > subscribes, emits frames, displaces a prior subscriber, and stops on unsubscribe

View workflow run

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 30, 2026

Greptile Summary

This PR introduces two capabilities: per-agent tab-group routing for page-creating MCP tools (new_page, new_hidden_page, show_page, move_page), and a simplified singleton ScreencastManager that replaces the sessions-map/pendingStarts/lastFrame/retry-loop model with a single active session, bringToFront compositor wake-up, and clean detached error handling.

  • Tab-group routing: X-BrowserOS-Default-Tab-Group-Id is read at the MCP route, threaded through McpServiceDeps, and injected into any tool whose Zod schema has a tabGroupId field via the generalised inputHasField helper; window-mutating tools are suppressed in single-window mode via SINGLE_WINDOW_BLOCKED_TOOLS.
  • ScreencastManager refactor: subscribe now displaces any prior subscriber via tearDown('replaced'), calls bringToFront before startScreencast to wake the compositor on backgrounded tabs, and catches resolution errors to emit a clean detached status rather than a 1011 close.
  • Tests: The existing screencast integration test is ported to a visible window (required for the bringToFront compositor path) and gains an assertion that the displaced subscriber receives detached.

Confidence Score: 3/5

The tab-group routing additions are safe; the ScreencastManager rewrite carries three unresolved concurrency hazards from the prior review round that can leave CDP screencast sessions running with no subscriber.

The three ScreencastManager concurrency issues remain unaddressed — session leak on mid-flight WS close, concurrent subscribe race producing an orphaned frame listener, and windowId:0 in the displaced detach message. Each can leave a CDP stream running with no cleanup path until a subsequent subscribe displaces it.

packages/browseros-agent/apps/server/src/api/services/screencast/screencast-manager.ts

Important Files Changed

Filename Overview
packages/browseros-agent/apps/server/src/api/services/screencast/screencast-manager.ts Simplified to singleton pattern — drops sessions map, pendingStarts, lastFrame, and primeWithScreenshot; bringToFront wake-up added; three concurrency/state issues flagged in prior review remain unresolved.
packages/browseros-agent/apps/server/src/tools/navigation.ts Adds optional tabGroupId to new_page, new_hidden_page, show_page, move_page with best-effort joinTabGroup; new_hidden_page docstring says 'when later shown' but the call runs immediately.
packages/browseros-agent/apps/server/src/api/services/mcp/register-mcp.ts inputHasWindowIdField generalised to inputHasField; SINGLE_WINDOW_BLOCKED_TOOLS set added; defaultTabGroupId injection follows the same schema-driven pattern as defaultWindowId.
packages/browseros-agent/apps/server/src/api/routes/mcp.ts Reads X-BrowserOS-Default-Tab-Group-Id header and forwards to createMcpServer, mirroring the existing defaultWindowId pattern cleanly.
packages/browseros-agent/apps/server/src/api/services/mcp/mcp-server.ts Adds defaultTabGroupId to McpServiceDeps interface and passes it through to registerTools.
packages/browseros-agent/apps/server/tests/api/screencast-manager.test.ts Test updated from hidden-window to visible-window; adds assertion that displaced subscriber receives detached status.
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
packages/browseros-agent/apps/server/src/tools/navigation.ts:208-211
The `tabGroupId` description for `new_hidden_page` says "Tab group the page joins **when later shown**", but `joinTabGroup` is called immediately after the hidden page is created — not deferred to `show_page`. An agent reading the description may infer the grouping is a no-op until `show_page` is called and add redundant logic; or, if the hidden tab's group assignment matters early (e.g. for a host-side UI reflecting pending tabs), the current behaviour is correct but the doc sets the wrong expectation.

```suggestion
    tabGroupId: z
      .string()
      .optional()
      .describe('Tab group to place the new hidden tab into'),
```

Reviews (2): Last reviewed commit: "test(server): update screencast-manager ..." | Re-trigger Greptile

The old test created a hidden window and asserted a frame arrives —
that worked because the previous manager primed via captureScreenshot,
which bypasses Chromium's compositor-visibility check. The new manager
uses Page.bringToFront + Page.startScreencast, which only reliably
emits frames for visible windows. Switch the test to a visible window
and add an assertion that a displaced subscriber receives `detached`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@shivammittal274
Copy link
Copy Markdown
Contributor Author

@greptile review

@shivammittal274
Copy link
Copy Markdown
Contributor Author

Note on Tests / server-api failure

The failing test is ScreencastManager > subscribes, emits frames, displaces a prior subscriber, and stops on unsubscribe, but the failure mode is CDP failed to start on port XXXX within timeout in withBrowser's spawn step — i.e. the test code never runs its assertions. The BrowserOS binary fails to bring up CDP within the 15s wait in this CI environment.

This is pre-existing — PR #1060 (which originally added this test file) had the exact same CDP failed to start failure on Tests / server-api when it was merged (job 78501091904). My commit 5dc8626e updates the test to match the new singleton manager's contract (visible window + displacement-emits-detached); the test passes locally:

tests/api/screencast-manager.test.ts:
Starting BrowserOS on CDP port 9005...
CDP is ready
 1 pass
 0 fail

I'll leave the test fix in. If the reviewer wants the CI environment issue addressed in this PR, happy to dig — but it looks like a runner/binary problem orthogonal to the manager rewrite.

@shivammittal274 shivammittal274 merged commit b0ea3bf into main May 30, 2026
26 of 28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant