fix(judge): make Playwright browser review actually work end-to-end by jacsamell · Pull Request #271 · aetheronhq/agent-cube

jacsamell · 2026-06-03T00:48:33Z

Problem

The browser-judge pipeline looked wired but never drove a browser. On a visual PR with the dev server up, judge_4 made zero browser calls and captured zero screenshots, every time. Dogfooding traced it to four distinct breaks.

Root causes + fixes

1. MCP startup is non-blocking in the Agent SDK. The judge's turn-1 prompt was built before npx @playwright/mcp finished its connect handshake, so the tools were never in the prompt and the server sat pending forever. The bridge now shapes forwarded stdio servers with type: stdio + alwaysLoad: true, which forces the tools into the turn-1 prompt and blocks startup until connected. Verified: status flips pending → connected, all mcp__playwright__browser_* tools exposed, and the judge actually navigates.
2. @playwright/mcp launches headed by default. A judge subprocess or the cube-runner container has no display to draw on, so it hung. Inject --headless, plus --isolated (ephemeral profile) and --output-dir <screenshots_dir> so browser_take_screenshot writes where the summary block looks.
3. Chromium was never installed. Added _ensure_playwright_browsers (mirrors the #268 node_modules fallback): lazy npx playwright install chromium when a browser-judge needs it and the cache is missing. Non-fatal.
4. The addendum named tools that don't exist (playwright_navigate vs the real mcp__playwright__browser_navigate), so even a connected judge was told the wrong API. Corrected to the real browser_* names, fixed the screenshot instructions to use the server's filename + --output-dir, and made the browser pass mandatory for frontend PRs (skip only with explicit justification).
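The lazy, non-fatal Chromium install in the third fix could look roughly like the sketch below. This is not the PR's actual _ensure_playwright_browsers; the helper names, the cache-detection heuristic (looking for a chromium* directory in Playwright's default cache location), and the timeout are assumptions made for illustration.

```python
import os
import subprocess
import sys
from pathlib import Path


def default_browser_cache() -> Path:
    """Best-guess Playwright browser cache directory (assumed heuristic)."""
    override = os.environ.get("PLAYWRIGHT_BROWSERS_PATH")
    if override and override != "0":
        return Path(override)
    if sys.platform == "darwin":
        return Path.home() / "Library" / "Caches" / "ms-playwright"
    if sys.platform == "win32":
        local = os.environ.get("LOCALAPPDATA", str(Path.home()))
        return Path(local) / "ms-playwright"
    return Path.home() / ".cache" / "ms-playwright"


def chromium_cached(cache_dir: Path) -> bool:
    """True if the cache holds at least one chromium* build directory."""
    return cache_dir.is_dir() and any(cache_dir.glob("chromium*"))


def ensure_playwright_browsers(cache_dir=None, timeout=600, run=subprocess.run) -> bool:
    """Install Chromium only when the cache is missing; never raise.

    `run` is injectable so the function can be exercised without npx.
    """
    cache_dir = cache_dir or default_browser_cache()
    if chromium_cached(cache_dir):
        return True  # nothing to do, skip the slow install
    try:
        result = run(
            ["npx", "playwright", "install", "chromium"],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Non-fatal: the judge can still run, just without a browser.
        print(f"playwright install skipped: {exc}", file=sys.stderr)
        return False
    if result.returncode != 0:
        print(f"playwright install failed: {result.stderr or ''}", file=sys.stderr)
        return False
    return True
```

Returning a boolean rather than raising keeps the failure non-fatal, so the caller can decide whether to downgrade the judge to a non-browser review.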

Tests

New unit tests for _augment_playwright_args (inject / idempotent / operator-override-wins / ignore non-playwright / empty). Full automation + cli suites green (409 passed).
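The behaviours those tests cover can be illustrated with a minimal sketch. This is not the code from judge_panel.py: the signature, the _has_flag helper, and the rule for recognising a Playwright server (any argument containing @playwright/mcp) are assumptions. Only the three flags come from the PR description.

```python
from typing import List, Sequence

PLAYWRIGHT_PACKAGE = "@playwright/mcp"


def _has_flag(args: Sequence[str], flag: str) -> bool:
    """True if `flag` is present as `--flag` or `--flag=value`."""
    return any(a == flag or a.startswith(flag + "=") for a in args)


def augment_playwright_args(args: Sequence[str], screenshots_dir: str) -> List[str]:
    """Return a copy of `args` with headless/isolated/output-dir injected.

    Non-Playwright servers and empty arg lists pass through unchanged.
    Flags the operator already set are left alone, so overrides win and
    calling the function twice adds nothing new.
    """
    out = list(args)
    if not any(PLAYWRIGHT_PACKAGE in a for a in out):
        return out
    if not _has_flag(out, "--headless"):
        out.append("--headless")
    if not _has_flag(out, "--isolated"):
        out.append("--isolated")
    if not _has_flag(out, "--output-dir"):
        out.extend(["--output-dir", screenshots_dir])
    return out


if __name__ == "__main__":
    once = augment_playwright_args(["-y", "@playwright/mcp@latest"], "/tmp/shots")
    print(once)
    # ['-y', '@playwright/mcp@latest', '--headless', '--isolated', '--output-dir', '/tmp/shots']
    print(augment_playwright_args(once, "/tmp/shots") == once)  # True (idempotent)
```

Checking for each flag before appending is what makes the helper both idempotent and override-friendly: an operator-supplied --output-dir is kept instead of being shadowed by the injected one.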

Note

For heavily-authed apps (e.g. aetheron-connect-v2, which needs AWS Chamber/SSM secrets + localstack to boot), the cleanest path is cube's existing reuse-already-running behaviour (#266): run the app's normal dev stack and cube points Playwright at it. The auto-bringup is the fallback for apps that boot cleanly.

🤖 Generated with Claude Code

The browser-judge pipeline looked wired but never drove a browser: judge_4 captured zero screenshots and made zero browser calls, even on a visual PR with the dev server up.

Root causes, all fixed:

1. MCP startup is non-blocking in the Agent SDK, so the judge's turn-1 prompt was built before `npx @playwright/mcp` finished connecting; the tools were never in the prompt and the server sat 'pending' forever. The bridge now shapes forwarded stdio servers with `type: stdio` + `alwaysLoad: true`, which forces the tools into the turn-1 prompt and blocks startup until the server connects. Verified: status flips 'pending' -> 'connected', all mcp__playwright__browser_* tools exposed.
2. @playwright/mcp launches *headed* by default, with nowhere to draw in a judge subprocess (or the cube-runner container), so it hung. Inject --headless, plus --isolated (ephemeral profile) and --output-dir <screenshots_dir> so browser_take_screenshot writes where the summary block looks.
3. Chromium was never installed. Added _ensure_playwright_browsers (mirrors the #268 node_modules fallback): lazy `npx playwright install chromium` when a browser-judge needs it and the cache is missing. Non-fatal.
4. The addendum named tools that don't exist (playwright_navigate vs the real mcp__playwright__browser_navigate), so even a connected judge was told the wrong API. Corrected to the real browser_* names, fixed the screenshot instructions to use the server's filename+output-dir, and made the browser pass mandatory for frontend PRs (skip only with explicit justification).

Tests: _augment_playwright_args (inject/idempotent/operator-override/ignore non-playwright/empty). Full automation+cli suites green (409 passed).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai · 2026-06-03T00:48:40Z

Warning

Review limit reached

@jacsamell, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 31 minutes and 19 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7c185944-3963-4436-861f-6ddbf32f689f

📥 Commits

Reviewing files that changed from the base of the PR and between 372f5ce and 636418a.

⛔ Files ignored due to path filters (1)

sdk-bridge/dist/cli.js is excluded by !**/dist/**

📒 Files selected for processing (3)

python/cube/automation/judge_panel.py
sdk-bridge/src/providers/claude.ts
tests/automation/test_judge_panel_browser_gating.py

jacsamell merged commit f3db0a5 into main Jun 3, 2026
1 of 2 checks passed

jacsamell merged 1 commit into main from fix/playwright-judge-browser-review
