fix(judge): make Playwright browser review actually work end-to-end #271

Merged: jacsamell merged 1 commit into main from fix/playwright-judge-browser-review on Jun 3, 2026

@jacsamell (Contributor)

Problem

The browser-judge pipeline looked wired but never drove a browser. On a visual PR with the dev server up, judge_4 made zero browser calls and captured zero screenshots, every time. Dogfooding traced it to four distinct breaks.

Root causes + fixes

  1. MCP startup is non-blocking in the Agent SDK. The judge's turn-1 prompt was built before `npx @playwright/mcp` finished its connect handshake, so the tools never made it into the prompt and the server sat `pending` forever. The bridge now shapes forwarded stdio servers with `type: stdio` + `alwaysLoad: true`, which forces the tools into the turn-1 prompt and blocks startup until the server connects. Verified: status flips `pending` → `connected`, all `mcp__playwright__browser_*` tools are exposed, and the judge actually navigates.

  2. `@playwright/mcp` launches headed by default, and a judge subprocess or the cube-runner container has no display to draw on, so it hung. Inject `--headless`, plus `--isolated` (ephemeral profile) and `--output-dir <screenshots_dir>` so `browser_take_screenshot` writes where the summary block looks.

  3. Chromium was never installed. Added `_ensure_playwright_browsers` (mirrors the #268 `node_modules` fallback): a lazy `npx playwright install chromium` when a browser-judge needs it and the cache is missing. Non-fatal.

  4. The addendum named tools that don't exist (`playwright_navigate` instead of the real `mcp__playwright__browser_navigate`), so even a connected judge was told the wrong API. Corrected to the real `browser_*` names, fixed the screenshot instructions to use the server's `filename` + `--output-dir`, and made the browser pass mandatory for frontend PRs (skip only with explicit justification).
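Fix 1's config change lives in the TypeScript bridge (`sdk-bridge/src/providers/claude.ts`), which this page doesn't show. As a rough illustration only, here is a Python sketch of the same transform: every forwarded server launched by a `command` (a stdio server) gets `type: "stdio"` and `alwaysLoad: True`, while URL-based servers pass through untouched. `shape_mcp_servers`, `shape_forwarded_server`, and the "has a `command`" heuristic are hypothetical, not the bridge's actual code.

```python
from typing import Any, Dict


def shape_forwarded_server(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of one MCP server config, tagged to load eagerly if it is stdio."""
    # Assumption: a server launched via `command` (and not declared as another
    # transport) is a stdio server; URL-based servers are returned unchanged.
    if "command" not in cfg or cfg.get("type", "stdio") != "stdio":
        return dict(cfg)
    shaped = dict(cfg)
    shaped["type"] = "stdio"  # explicit transport
    # Per the PR, this puts the tools in the turn-1 prompt and makes startup
    # wait for the server to connect.
    shaped["alwaysLoad"] = True
    return shaped


def shape_mcp_servers(servers: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Shape every forwarded server without mutating the caller's mapping."""
    return {name: shape_forwarded_server(cfg) for name, cfg in servers.items()}
```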
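A minimal sketch of the flag injection in fix 2, assuming the helper receives the server's argument list and the screenshots directory. The name `augment_playwright_args` and its signature are hypothetical (the real `_augment_playwright_args` isn't shown here), but it is written to satisfy the five cases the Tests section lists: inject, idempotent, operator override wins, ignore non-Playwright servers, and empty input.

```python
from typing import List

# Assumption: a Playwright MCP server is recognised by its npm package name in args.
PLAYWRIGHT_MCP_PACKAGE = "@playwright/mcp"


def _has_flag(args: List[str], flag: str) -> bool:
    """True if `flag` appears bare (`--x`) or in `--x=value` form."""
    return any(a == flag or a.startswith(flag + "=") for a in args)


def augment_playwright_args(args: List[str], screenshots_dir: str) -> List[str]:
    """Hypothetical sketch: append the flags a judge subprocess needs, if missing."""
    if not any(PLAYWRIGHT_MCP_PACKAGE in a for a in args):
        return list(args)  # not Playwright (this also covers an empty list)
    out = list(args)  # never mutate the caller's list
    if not _has_flag(out, "--headless"):
        out.append("--headless")  # no display in a judge subprocess or container
    if not _has_flag(out, "--isolated"):
        out.append("--isolated")  # ephemeral browser profile
    if not _has_flag(out, "--output-dir"):
        # An operator-supplied --output-dir wins; otherwise point at the summary's dir.
        out += ["--output-dir", screenshots_dir]
    return out
```

Only appending flags that are missing is what makes repeated calls idempotent and lets an operator's own `--output-dir` take precedence.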
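Fix 3 can be sketched as follows, assuming Playwright's default Linux browser cache (`~/.cache/ms-playwright`, overridable with `PLAYWRIGHT_BROWSERS_PATH`) and that a `chromium*` entry there means Chromium is installed. `ensure_playwright_browsers` is a hypothetical stand-in for `_ensure_playwright_browsers`; the injectable `run` callable exists only so the sketch can be exercised without a network install. Any failure is logged and swallowed, matching the PR's non-fatal behaviour.

```python
import logging
import os
import subprocess
from pathlib import Path
from typing import Callable, Optional, Sequence

log = logging.getLogger(__name__)


def default_browsers_cache() -> Path:
    """PLAYWRIGHT_BROWSERS_PATH if set, else the Linux default cache (assumption)."""
    override = os.environ.get("PLAYWRIGHT_BROWSERS_PATH")
    return Path(override) if override else Path.home() / ".cache" / "ms-playwright"


def chromium_cached(cache_dir: Path) -> bool:
    """Assumption: any `chromium*` entry in the cache means Chromium is installed."""
    return cache_dir.is_dir() and any(p.name.startswith("chromium") for p in cache_dir.iterdir())


def _run_install(cmd: Sequence[str]) -> None:
    # check=True turns a non-zero exit into CalledProcessError; the timeout is arbitrary.
    subprocess.run(list(cmd), check=True, timeout=600)


def ensure_playwright_browsers(
    cache_dir: Optional[Path] = None,
    run: Callable[[Sequence[str]], None] = _run_install,
) -> bool:
    """Install Chromium lazily if missing; return False instead of raising on failure."""
    cache = cache_dir if cache_dir is not None else default_browsers_cache()
    if chromium_cached(cache):
        return True  # already installed: nothing to do
    try:
        run(["npx", "playwright", "install", "chromium"])
    except (OSError, subprocess.SubprocessError) as exc:
        # Non-fatal, as in the PR: log it and carry on without a browser.
        log.warning("npx playwright install chromium failed (non-fatal): %s", exc)
        return False
    return True
```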
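Fix 4 comes down to the Agent SDK's naming convention for MCP tools, `mcp__<server>__<tool>`. A minimal sketch of building the addendum from that convention, so the prompt can't drift from the real names; `build_browser_addendum`, its wording, and the choice of three tools are illustrative assumptions, not the PR's actual addendum text.

```python
from typing import List

SERVER = "playwright"
# Real @playwright/mcp tool names; the subset chosen here is illustrative.
TOOLS = ("browser_navigate", "browser_snapshot", "browser_take_screenshot")


def mcp_tool_name(server: str, tool: str) -> str:
    """Agent SDK convention for MCP tools: mcp__<server>__<tool>."""
    return f"mcp__{server}__{tool}"


def build_browser_addendum(dev_url: str, frontend_pr: bool) -> str:
    """Hypothetical addendum text, generated from the real tool names."""
    nav, snap, shot = (mcp_tool_name(SERVER, t) for t in TOOLS)
    lines: List[str] = [
        f"Open {dev_url} with {nav}, then inspect the page with {snap}.",
        f"Capture screenshots with {shot}, passing a `filename`; "
        "the server saves it under its --output-dir.",
    ]
    if frontend_pr:
        lines.append("A browser pass is mandatory for this PR; skip it only with an explicit justification.")
    return "\n".join(lines)
```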

Tests

New unit tests for _augment_playwright_args (inject / idempotent / operator-override-wins / ignore non-playwright / empty). Full automation + cli suites green (409 passed).

Note

For heavily-authed apps (e.g. aetheron-connect-v2, which needs AWS Chamber/SSM secrets + localstack to boot), the cleanest path is cube's existing reuse-already-running behaviour (#266): run the app's normal dev stack and cube points Playwright at it. The auto-bringup is the fallback for apps that boot cleanly.
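One way to express the reuse-first policy from this note, as a sketch under stated assumptions: probe the dev server's port and point Playwright at it if something is already listening, otherwise fall back to auto-bringup. `port_is_listening`, `resolve_dev_server`, and the `bring_up` callback are hypothetical; cube's actual #266 logic isn't shown in this PR.

```python
import socket
from typing import Callable


def port_is_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def resolve_dev_server(host: str, port: int, bring_up: Callable[[], str]) -> str:
    """Reuse an already-running dev server if one answers; otherwise auto-bring-up."""
    if port_is_listening(host, port):
        return f"http://{host}:{port}"  # point Playwright at the existing stack
    return bring_up()  # fallback for apps that boot cleanly
```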

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai Bot commented Jun 3, 2026

Warning

Review limit reached

@jacsamell, we couldn't start this review because you've reached your PR review rate limit.

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 372f5ce and 636418a.

⛔ Files ignored due to path filters (1)
  • sdk-bridge/dist/cli.js is excluded by !**/dist/**
📒 Files selected for processing (3)
  • python/cube/automation/judge_panel.py
  • sdk-bridge/src/providers/claude.ts
  • tests/automation/test_judge_panel_browser_gating.py

jacsamell merged commit f3db0a5 into main on Jun 3, 2026
1 of 2 checks passed