Skip to content

feat(server): /screencast WebSocket route + ScreencastManager#1060

Merged
shivammittal274 merged 5 commits into
mainfrom
feat/screencast
May 29, 2026
Merged

feat(server): /screencast WebSocket route + ScreencastManager#1060
shivammittal274 merged 5 commits into
mainfrom
feat/screencast

Conversation

@shivammittal274
Copy link
Copy Markdown
Contributor

@shivammittal274 shivammittal274 commented May 27, 2026

Summary

  • New /screencast WebSocket route on the agent server. Streams the agent's browser as base64 JPEG frames over Page.startScreencast, so callers can render a live canvas without popping the native window.
  • Subscribers are ack-gated and reference-counted per CDP target: first viewer triggers Page.startScreencast, last triggers stopScreencast. A slow consumer can't queue frames unboundedly.
  • ?pageId=N follows a specific Browser-internal page; omitted, it falls back to the window's active tab. Sessions are keyed off targetId, so two viewers of the same tab share one CDP stream regardless of how each requested it.
  • Late subscribers get the cached lastFrame immediately; if none exists (idle page that hasn't repainted), Page.captureScreenshot primes the canvas synthetically so the pane never sits black.
  • Knobs (q=80, 1280×800, every-nth-frame) live in @browseros/shared/constants/limits as SCREENCAST_LIMITS.

How it fits

agent-company mounts this WebSocket via an SSE proxy (paired PR https://github.com/browseros-ai/agent-company/pull/76) and renders frames into a canvas in the chat surface. The agent-side renderer lifts `pageId` off the latest browseros tool's input/output and threads it here, so the canvas follows whichever tab the agent is acting on.

API

  • `ws /screencast?windowId=N` — stream the window's active tab.
  • `ws /screencast?windowId=N&pageId=M` — stream a specific Browser-internal pageId; falls back to active tab if the page can't be resolved.

Test plan

  • `bun --env-file=.env.development test apps/server/tests/api/screencast-manager.test.ts` — integration test against live CDP: subscribe → frame within ~8s on a hidden window, second subscriber gets status + cached lastFrame, unsubscribe cleans up.
  • `bun run lint` and `bun run typecheck` clean for `apps/server`.
  • Manual: opening, closing, and reopening the agent-company pane against the same window — frames resume on each subscribe (no black canvas on idle pages).
  • Manual: agent calls `new_page google.com` after the pane is already streaming example.com — pane follows to google.com.

🤖 Generated with Claude Code

Exposes a per-windowId screencast feed over a Hono WebSocket so external
consumers (agent-company first) can render the agent's live browser as
JPEG frames without a separate CDP attach.

- New `/screencast?windowId=N` route uses hono/bun `upgradeWebSocket`.
- `ScreencastManager` ref-counts subscribers per windowId: first viewer
  starts `Page.startScreencast` against the active page in that window,
  last viewer triggers `Page.stopScreencast`. Frames broadcast as
  `{type:"frame", data:<base64 jpeg>, metadata}`; ack-gated so a slow
  consumer can't queue frames unboundedly.
- `Browser.getActivePageForWindow(windowId)` resolves windowId to a
  ProtocolApi session via `Browser.getActiveTab({windowId})` —  works
  for hidden windows where `Browser.getTabs({includeHidden:true})` may
  not yet expose the tab.
- Quality / frame-size knobs live in `@browseros/shared/constants/limits`
  as `SCREENCAST_LIMITS` (q=80, 1280x800, every nth frame).
- Integration test against live CDP covers subscribe → frame arrives →
  unsubscribe cleanup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 27, 2026

❌ Tests failed — 1/1064 failed

Suite Passed Failed Skipped
agent 91/91 0 0
build 10/10 0 0
eval 95/95 0 0
server-agent 246/246 0 0
server-api 63/64 1 0
server-browser 4/4 0 0
server-integration 9/10 0 1
server-lib 252/253 0 1
server-root 59/62 0 3
server-tools 229/229 0 0
Failed tests
  • server-apiScreencastManager > subscribes, emits frames for a hidden window, and stops on last unsubscribe

View workflow run

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 27, 2026

Greptile Summary

This PR adds a /screencast WebSocket endpoint that streams the browser's active tab as base64 JPEG frames via Page.startScreencast, with CDP sessions reference-counted per targetId so multiple viewers share one stream. browser.ts is extended with getActivePageForWindow, getPageSession, and ensurePageIdForTarget to route screencast through the same full attachToPage path agent tools use, ensuring consoleCollector and all domain enables are registered.

  • ScreencastManager handles subscriber lifecycle (pending-start deduplication, zombie-ws guard, primeWithScreenshot for idle pages) with stopIfIdle ensuring stopScreencast is called when the last viewer detaches.
  • Route (screencast.ts) parses windowId/pageId query params, delegates entirely to ScreencastManager, and handles onClose/onError symmetrically.
  • SCREENCAST_LIMITS centralizes quality, resolution, and frame-rate knobs in the shared constants package.

Confidence Score: 3/5

Safe to merge for single-subscriber use; the concurrent-subscriber race in ScreencastManager can silently leave a second subscriber connected but receiving no frames.

The concurrent-subscribe path shares a single pending startSession Promise across all callers for the same targetId. If the first caller's WebSocket closes before the Promise resolves, its continuation calls stopSession synchronously — removing the frame listener and deleting the session from the map — before the second caller's continuation adds its socket to the now-orphaned session. The second subscriber receives a status:connected message and possibly a cached frame, but live frames never arrive and no error is surfaced.

packages/browseros-agent/apps/server/src/api/services/screencast/screencast-manager.ts — specifically the concurrent subscribe path through getOrStartSession when a pending start is shared between two callers and one disconnects mid-flight.

Important Files Changed

Filename Overview
packages/browseros-agent/apps/server/src/api/services/screencast/screencast-manager.ts New ScreencastManager: handles session lifecycle, frame broadcasting, and subscriber reference counting. Contains a concurrent-subscribe race where a closing first subscriber can orphan a second concurrent subscriber with a dead frame listener.
packages/browseros-agent/apps/server/src/api/routes/screencast.ts New WebSocket route: parses windowId/pageId query params, delegates lifecycle to ScreencastManager. The zombie-subscriber guard is in place; cleanup in onClose and onError is symmetric.
packages/browseros-agent/apps/server/src/browser/browser.ts Adds getActivePageForWindow, getPageSession, and ensurePageIdForTarget. Routes screencast through the full attachToPage path, correctly registering consoleCollector and all domain enables.
packages/browseros-agent/packages/shared/src/constants/limits.ts Adds SCREENCAST_LIMITS constants. SUBSCRIBER_BACKPRESSURE_BYTES is defined but unused — dead code.
packages/browseros-agent/apps/server/src/api/server.ts Mounts /screencast WebSocket route. Minimal and correct.
packages/browseros-agent/apps/server/tests/api/screencast-manager.test.ts Integration test against live CDP: subscribe, wait for frames, add second subscriber, unsubscribe both. Covers the happy path and session sharing.

Sequence Diagram

sequenceDiagram
    participant Client as WebSocket Client
    participant Route as screencast.ts
    participant Manager as ScreencastManager
    participant Browser as Browser
    participant CDP as CDP / Chromium

    Client->>Route: "WS connect ?windowId=N&pageId=M"
    Route->>Manager: subscribe(windowId, pageId, ws)
    Manager->>Browser: getActivePageForWindow / getPageSession
    Browser->>CDP: Target.attachToTarget + domain enables
    CDP-->>Browser: sessionId
    Browser-->>Manager: "{targetId, session, url}"
    Manager->>CDP: Page.startScreencast (first subscriber)
    CDP-->>Manager: ack
    Manager->>Client: "status {connected, url}"
    Manager->>Client: lastFrame (cached) OR captureScreenshot (idle page)

    loop Each compositor frame
        CDP->>Manager: screencastFrame event
        Manager->>Client: "{type:frame, data, metadata}"
        Manager->>CDP: Page.screencastFrameAck
    end

    Client->>Route: WS close
    Route->>Manager: unsubscribe(handle, ws)
    alt last subscriber
        Manager->>CDP: Page.stopScreencast
    end
Loading
Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
packages/browseros-agent/apps/server/src/api/services/screencast/screencast-manager.ts:73-107
**Concurrent subscriber orphaned when first subscriber closes during startup**

When two callers concurrently `subscribe` to the same `targetId`, they both await the same `startPromise` (via `pendingStarts`). If subscriber A's WebSocket closes while that Promise is in-flight, A's continuation runs first, passes the `readyState` guard, calls `stopIfIdle``stopSession`. `stopSession` synchronously calls `session.unsubscribeFrame()` (removing the CDP frame listener) and `sessions.delete(targetId)`, then yields to await `stopScreencast`. B's continuation then runs, adds `wsB` to the orphaned session, and receives a `status: connected` + `lastFrame`. But because the frame listener was removed by A's `stopSession`, live frames never arrive for B — the stream silently goes dark with no error surfaced to the client.

### Issue 2 of 3
packages/browseros-agent/packages/shared/src/constants/limits.ts:84-90
`SUBSCRIBER_BACKPRESSURE_BYTES` is exported but never imported or used anywhere in the screencast implementation — backpressure logic is not implemented. Per the dead code guideline, unused constants should be removed rather than left as aspirational scaffolding.

```suggestion
export const SCREENCAST_LIMITS = {
  DEFAULT_JPEG_QUALITY: 80,
  EVERY_NTH_FRAME: 1,
  MAX_WIDTH: 1280,
  MAX_HEIGHT: 800,
} as const
```

### Issue 3 of 3
packages/browseros-agent/apps/server/src/api/services/screencast/screencast-manager.ts:27-33
The `'detached'` variant of `ScreencastStatusMessage` is defined but never emitted — no call site in `ScreencastManager` ever sends `status: 'detached'`. Per the dead code rule, unused type variants should be removed rather than left as dead scaffolding.

```suggestion
export interface ScreencastStatusMessage {
  type: 'status'
  status: 'connected'
  windowId: number
  pageId?: number
  url?: string
}
```

Reviews (3): Last reviewed commit: "fix(server): address Greptile P1s on scr..." | Re-trigger Greptile

Comment thread packages/browseros-agent/apps/server/src/api/routes/screencast.ts
shivammittal274 and others added 2 commits May 28, 2026 11:27
Chromium's Page.startScreencast only emits frames on compositor
invalidation — a static page produces one frame on attach, then
nothing. Late subscribers to an existing session (or any subscriber on
an idle page) saw status:"connected" and a blank canvas forever.

- Cache the most recent screencastFrame on each ScreencastSession and
  replay it to every new subscriber on subscribe.
- If no frame has been cached yet (first subscriber to a quiet page),
  fall back to a one-shot Page.captureScreenshot and inject it as a
  synthetic frame so the canvas paints immediately. Best-effort —
  errors are logged but don't break the subscription.

Verified against a static hidden chrome://newtab/: before the fix a
late subscriber saw 0 frames over 4s; after, the primed screenshot
lands within 16ms and the canvas paints.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`/screencast?windowId=N` always streamed the window's active tab, which
loses the agent's intended target as soon as a new tab opens. Sessions
now key off `targetId` and the WS route accepts an optional `pageId`
query — the manager resolves it via `getPageSession(pageId)` (with a
fallback `listPages()` refresh if the cache is stale) and falls back
to the window's active tab when omitted.

`attachTab` is shared between the active-tab and pageId paths so both
go through the same CDP enable + Page.startScreencast pipeline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@shivammittal274
Copy link
Copy Markdown
Contributor Author

@greptile-ai review

Comment thread packages/browseros-agent/apps/server/src/browser/browser.ts Outdated
Three issues caught by review:

1. Console collection silently broken for screencast-first tabs.
   `attachTab` cached a Page.enable-only session in `Browser.sessions`.
   Agent tools later resolving the same `targetId` short-circuited on
   the cached entry and skipped the DOM/Runtime/Log/Accessibility
   enables + `consoleCollector.attach()`. `getConsoleLogs` returned
   nothing for those tabs with no error surfaced. Removed `attachTab`
   and routed both `getActivePageForWindow` and `getPageSession`
   through the canonical `attachToPage`. Active-tab path resolves the
   Browser-internal pageId via a new `ensurePageIdForTarget` helper
   (with a `listPages` refresh fallback).

2. Zombie subscriber when WS closes during startup. If the client
   disconnected while `subscribe`'s awaits were in flight, the route's
   `onClose` saw `handle === null` and skipped unsubscribe. `subscribe`
   then added the dead ws to `subscribers`; broadcast skipped it but
   never decremented, so `stopSession` never fired and the CDP
   screencast leaked. Now `subscribe` checks `ws.readyState` after the
   awaits and bails (calling `stopIfIdle` to clean up a freshly
   started session with no real subscribers).

3. `broadcast` dropped a failed subscriber without checking for empty
   set. If the failed send was the last subscriber, the session
   leaked. Extracted the empty-check into `stopIfIdle` and call it
   from `broadcast`, `unsubscribe`, and the close-during-startup
   path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@shivammittal274
Copy link
Copy Markdown
Contributor Author

@greptile review

When two subscribers share a pendingStart for the same targetId and
the first one's ws closes mid-flight, the first continuation runs
first, sees ws closed, and stopIfIdle → stopSession synchronously
removes the frame listener and deletes the session from the map.
The second continuation then adds its ws to the orphaned session
object, sends `status:connected`, and never receives a frame.

Mark sessions `disposed: true` inside stopSession and have subscribe
loop back to getOrStartSession when it sees a disposed session —
which is no longer in `this.sessions` so the next attempt opens a
fresh stream.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@shivammittal274 shivammittal274 merged commit 8160875 into main May 29, 2026
15 of 16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant