fix(judge): make Playwright browser review actually work end-to-end #271
The browser-judge pipeline looked wired but never drove a browser: judge_4 captured zero screenshots and made zero browser calls, even on a visual PR with the dev server up. Root causes, all fixed:
1. MCP startup is non-blocking in the Agent SDK, so the judge's turn-1 prompt was built before `npx @playwright/mcp` finished connecting. The tools were never in the prompt and the server sat 'pending' forever. The bridge now shapes forwarded stdio servers with `type: stdio` + `alwaysLoad: true`, which forces the tools into the turn-1 prompt and blocks startup until the server connects. Verified: status flips 'pending' -> 'connected', and all mcp__playwright__browser_* tools are exposed.
2. @playwright/mcp launches *headed* by default. There is nowhere to draw in a judge subprocess (or the cube-runner container), so it hung. Inject --headless, plus --isolated (ephemeral profile) and --output-dir <screenshots_dir> so browser_take_screenshot writes where the summary block looks.
3. Chromium was never installed. Added _ensure_playwright_browsers (mirrors the #268 node_modules fallback): lazy `npx playwright install chromium` when a browser-judge needs it and the cache is missing. Non-fatal.
4. The addendum named tools that don't exist (playwright_navigate vs the real mcp__playwright__browser_navigate), so even a connected judge was told the wrong API. Corrected to the real browser_* names, fixed the screenshot instructions to use the server's filename + output-dir, and made the browser pass mandatory for frontend PRs (skip only with explicit justification).
Tests: _augment_playwright_args (inject/idempotent/operator-override/ignore non-playwright/empty). Full automation+cli suites green (409 passed).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
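The first two fixes can be pictured as one shaping step that the bridge applies to each forwarded MCP server. This is a minimal sketch, not the PR's code. `shape_forwarded_servers` and `augment_playwright_args` are hypothetical stand-ins (the real `_augment_playwright_args` signature is not shown), and treating "has a `command` and no non-stdio `type`" as the test for a stdio server is an assumption. The `type: stdio` / `alwaysLoad: true` keys and the `--headless`, `--isolated` and `--output-dir` flags are the ones named above. The argument logic mirrors the listed test cases: inject, idempotent, operator override wins, ignore non-playwright, empty.

```python
from __future__ import annotations

from typing import Any, Sequence

# Hypothetical sketch: names and signatures are illustrative, not cube's actual API.
PLAYWRIGHT_MCP_PACKAGE = "@playwright/mcp"


def _has_flag(args: Sequence[str], flag: str) -> bool:
    """True if `flag` is present bare (`--x`) or as `--x=value`."""
    return any(a == flag or a.startswith(flag + "=") for a in args)


def augment_playwright_args(command: str, args: Sequence[str], screenshots_dir: str) -> list[str]:
    """Append headless/isolated/output-dir to a Playwright MCP launch.

    Non-Playwright servers come back unchanged. Flags the operator already set
    are left alone, so a second call is a no-op.
    """
    result = list(args)  # never mutate the caller's list
    if not (PLAYWRIGHT_MCP_PACKAGE in command or any(PLAYWRIGHT_MCP_PACKAGE in a for a in result)):
        return result
    for flag in ("--headless", "--isolated"):  # no display; ephemeral profile
        if not _has_flag(result, flag):
            result.append(flag)
    if not _has_flag(result, "--output-dir"):  # operator-supplied dir wins
        result += ["--output-dir", screenshots_dir]
    return result


def shape_forwarded_servers(
    servers: dict[str, dict[str, Any]], screenshots_dir: str
) -> dict[str, dict[str, Any]]:
    """Shape stdio servers as {type: stdio, ..., alwaysLoad: True}; pass others through."""
    shaped: dict[str, dict[str, Any]] = {}
    for name, spec in servers.items():
        if spec.get("type", "stdio") != "stdio" or "command" not in spec:
            shaped[name] = dict(spec)  # e.g. http/sse servers: forwarded as-is
            continue
        entry: dict[str, Any] = {
            "type": "stdio",
            "command": spec["command"],
            "args": augment_playwright_args(spec["command"], spec.get("args", []), screenshots_dir),
            # Per the PR: puts the tools in the turn-1 prompt and blocks startup until connected.
            "alwaysLoad": True,
        }
        if spec.get("env"):
            entry["env"] = dict(spec["env"])
        shaped[name] = entry
    return shaped
```

Keeping the flag injection in its own pure function makes each listed test case a one-line assertion, and it means an operator who passes `--output-dir=...` explicitly is never overridden.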
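The lazy Chromium install from fix 3 follows a check-then-install pattern. The sketch below is hypothetical. `ensure_playwright_browsers` and its injectable `run` callable are illustrative, not the real `_ensure_playwright_browsers`. It assumes the Linux cache layout (`~/.cache/ms-playwright/chromium-<rev>`), honours `PLAYWRIGHT_BROWSERS_PATH` as an override, and ignores that variable's special value `0`. Failures are logged and reported as `False` rather than raised, which matches the non-fatal behaviour described.

```python
from __future__ import annotations

import logging
import os
import subprocess
from pathlib import Path
from typing import Callable, Sequence

log = logging.getLogger(__name__)

# Hypothetical runner signature: (command, install_dir) -> None, raising on failure.
Runner = Callable[[Sequence[str], Path], None]


def default_browsers_path() -> Path:
    """PLAYWRIGHT_BROWSERS_PATH if set, else the Linux default cache (assumption)."""
    override = os.environ.get("PLAYWRIGHT_BROWSERS_PATH")
    return Path(override) if override else Path.home() / ".cache" / "ms-playwright"


def chromium_installed(cache: Path) -> bool:
    """Playwright keeps each browser revision in its own `chromium-<rev>` directory."""
    return any(p.is_dir() for p in cache.glob("chromium-*"))


def _run_npx(cmd: Sequence[str], cache: Path) -> None:
    # Install into the same directory we check afterwards.
    env = {**os.environ, "PLAYWRIGHT_BROWSERS_PATH": str(cache)}
    subprocess.run(list(cmd), check=True, timeout=600, env=env)


def ensure_playwright_browsers(cache: Path | None = None, run: Runner = _run_npx) -> bool:
    """Install Chromium only if missing. Never raises; returns whether it is available."""
    cache = cache if cache is not None else default_browsers_path()
    if chromium_installed(cache):
        return True  # cache hit: no npx call at all
    try:
        run(["npx", "playwright", "install", "chromium"], cache)
    except (OSError, subprocess.SubprocessError) as exc:
        # Non-fatal: log and report unavailability instead of raising.
        log.warning("chromium install failed (%s); browser judge may be unavailable", exc)
        return False
    return chromium_installed(cache)
```

Passing the runner in keeps the install path testable without network access. Re-checking the cache after `npx` returns guards against an install that exits cleanly but writes elsewhere.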
Problem
The browser-judge pipeline looked wired but never drove a browser. On a visual PR with the dev server up, judge_4 made zero browser calls and captured zero screenshots, every time. Dogfooding traced it to four distinct breaks.
Root causes + fixes
1. MCP startup is non-blocking in the Agent SDK. The judge's turn-1 prompt was built before `npx @playwright/mcp` finished its connect handshake, so the tools were never in the prompt and the server sat `pending` forever. The bridge now shapes forwarded stdio servers with `type: stdio` + `alwaysLoad: true`, which forces the tools into the turn-1 prompt and blocks startup until connected. Verified: status flips `pending` → `connected`, all `mcp__playwright__browser_*` tools are exposed, and the judge actually navigates.
2. `@playwright/mcp` launches headed by default, and there is nowhere to draw in a judge subprocess or the cube-runner container, so it hung. Inject `--headless`, plus `--isolated` (ephemeral profile) and `--output-dir <screenshots_dir>` so `browser_take_screenshot` writes where the summary block looks.
3. Chromium was never installed. Added `_ensure_playwright_browsers` (mirrors the #268 `node_modules` fallback): lazy `npx playwright install chromium` when a browser-judge needs it and the cache is missing. Non-fatal.
4. The addendum named tools that don't exist (`playwright_navigate` vs the real `mcp__playwright__browser_navigate`), so even a connected judge was told the wrong API. Corrected to the real `browser_*` names, fixed the screenshot instructions to use the server's `filename` + `--output-dir`, and made the browser pass mandatory for frontend PRs (skip only with explicit justification).
Tests
New unit tests for `_augment_playwright_args` (inject / idempotent / operator-override-wins / ignore non-playwright / empty). Full automation + cli suites green (409 passed).
Note
For heavily-authed apps (e.g. aetheron-connect-v2, which needs AWS Chamber/SSM secrets + localstack to boot), the cleanest path is cube's existing reuse-already-running behaviour (#266): run the app's normal dev stack and cube points Playwright at it. The auto-bringup is the fallback for apps that boot cleanly.
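The reuse-first ordering in the note can be sketched as a TCP probe that runs before any bring-up. The mechanism here is an assumption: the PR does not say how cube's #266 behaviour detects a running app. `port_is_open`, `resolve_judge_base_url` and the `bring_up` callback are all hypothetical names.

```python
from __future__ import annotations

import socket
from typing import Callable


def port_is_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def resolve_judge_base_url(host: str, port: int, bring_up: Callable[[], str]) -> str:
    """Hypothetical: reuse an app the operator already started, else auto-bring-up."""
    if port_is_open(host, port):
        # e.g. the app's normal dev stack, already running with its secrets/localstack
        return f"http://{host}:{port}"
    return bring_up()  # fallback for apps that boot cleanly on their own
```

Probing first means the auto-bringup path only runs when nothing is listening, so an operator-started stack is never shadowed.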
🤖 Generated with Claude Code