diff --git a/docs/tasks/23-judge4-playwright-mcp.md b/docs/tasks/23-judge4-playwright-mcp.md
new file mode 100644
index 00000000..52c763c4
--- /dev/null
+++ b/docs/tasks/23-judge4-playwright-mcp.md
@@ -0,0 +1,211 @@
+# Slice B — Wire Playwright MCP into judge_4 + dev-server bring-up + screenshot routing
+
+Builds on slice A (#262), which plumbed per-judge ``mcp_servers`` from
+``cube.yaml`` all the way through ``SDKBridgeAdapter`` →
+``query({options:{mcpServers}})``. Plumbing has zero behaviour change today
+because no judge declares ``mcp_servers``.
+
+This slice flips judge_4 (Frontend & UX) into a browser-augmented
+reviewer: Playwright MCP for navigation + screenshot, a dev-server cube
+brings up before judge_4 runs, and screenshot files attached to the
+GitHub PR review.
+
+## Why
+
+cube's frontend judge today reviews ``.tsx``/``.css`` diffs purely from
+text. It can spot type errors, prop-drilling, ``next-intl`` violations
+— but it can't see that the button is invisible on dark mode, the
+modal clips off-viewport at 1280px, or the toast renders behind the
+nav. Visual regressions slip through every panel. Adding Playwright
+to judge_4 closes the only gap where text review structurally can't
+catch the bug.
+
+## Scope (in)
+
+1. **judge_4 declares Playwright** in ``cube.yaml``'s ``judges.judge_4.mcp_servers``
+   block. Schema mirrors the Anthropic SDK shape:
+   ```yaml
+   judges:
+     judge_4:
+       model: opus
+       label: "Judge Frontend & UX"
+       persona: "frontend-generic"
+       mcp_servers:
+         playwright:
+           command: "npx"
+           args: ["-y", "@playwright/mcp@latest"]
+   ```
+   The opus default ensures the claude-sdk path gets the MCP servers
+   (only path that honours mcp_servers in slice A).
+
+2. **Dev-server bring-up.** Add an optional ``verify.dev_server:``
+   block:
+   ```yaml
+   verify:
+     install: pnpm install --frozen-lockfile
+     check: pnpm lint && pnpm typecheck && pnpm test
+     dev_server:
+       cmd: pnpm dev
+       url: http://localhost:3000
+       ready_marker: "ready in"
+       timeout_seconds: 60
+   ```
+   - ``cmd``: shell command to start the dev server (run in the writer's
+     worktree, NOT cube's repo).
+   - ``url``: the base URL judge_4's prompt is told to navigate to.
+   - ``ready_marker``: substring cube watches for on the dev-server's
+     stdout/stderr before it considers the server ready. Falls back to
+     polling ``url`` with a HEAD request if not set.
+   - ``timeout_seconds``: hard ceiling for ready detection. On timeout,
+     cube logs a warning, kills the dev-server, and runs judge_4 with
+     a ``[BROWSER UNAVAILABLE]`` note in its prompt — judge still
+     reviews statically.
+
+   When the block is missing, judge_4 runs as before (no browser
+   pass). Out-of-the-box repos see no change.
+
+3. **Lifecycle.** A new helper ``cube.core.dev_server`` exposes:
+   - ``start_dev_server(worktree, config) -> DevServerHandle`` —
+     spawns the command in a process group, hooks stdout/stderr to a
+     log file under ``.prompts/dev-server-<task>.log``, watches for
+     ``ready_marker`` (or polls the URL) until ready or timeout.
+   - ``DevServerHandle.stop()`` — SIGTERM the process group with a
+     5s grace, SIGKILL on timeout (mirrors
+     ``terminate_active_subprocesses`` in core.adapters.base).
+   - Used by the judge-panel handler ONLY for the path where any
+     declared judge has Playwright in its ``mcp_servers``. Other
+     panels skip the bring-up entirely so non-frontend tasks pay no
+     cost.
+   - The base URL is exported into the dev-server's process env as
+     ``CUBE_DEV_SERVER_URL`` AND injected into judge_4's prompt
+     addendum (see step 4).
+
+4. **judge_4 persona addendum.** When a judge has non-empty
+   ``mcp_servers`` AND the dev-server came up, prepend a block to
+   the judge-specific prompt explaining the workflow:
+   - "You have Playwright MCP tools — playwright_navigate,
+     playwright_screenshot, playwright_click, playwright_fill,
+     playwright_pdf, plus the standard observation/eval tools."
+   - "The frontend dev server is running at ``{CUBE_DEV_SERVER_URL}``.
+     Navigate to the changed routes and screenshot the relevant
+     UI states (default, hover, dark mode if applicable, mobile
+     viewport)."
+   - "Save screenshots to
+     ``.prompts/screenshots/<task_id>/judge_4/<descriptive-name>.png``
+     under the cube repo root. Reference each one in your decision
+     file's new ``screenshots:`` field (see schema below)."
+   - "If the route fails to load or the screenshot doesn't match the
+     intent, that's a finding — call it out in
+     ``remaining_issues`` with the file:line of the related diff."
+   - The block is only added when ``mcp_servers`` is non-empty AND
+     the dev-server actually came up. Static-only fallback stays
+     unchanged.
+
+5. **Decision schema extension.** Judge decision files gain an
+   optional ``screenshots:`` field:
+   ```json
+   {
+     "decision": "APPROVED",
+     "remaining_issues": [],
+     "screenshots": [
+       {"path": ".prompts/screenshots/.../button-hover-dark.png",
+        "caption": "Primary button hover state on dark mode"}
+     ]
+   }
+   ```
+   - The aggregator in ``cube.core.decision_parser`` reads this and
+     surfaces the screenshot list in ``get_peer_review_status()``
+     output so callers can use it.
+
+6. **PR review attachment.** When cube posts the panel summary review
+   (``cube.github.reviews``), inline each screenshot referenced in the
+   aggregated decisions:
+   - Use GitHub's image attachment via base64 data URLs in the PR
+     review body (no separate upload API — GitHub renders these inline
+     in markdown).
+   - Cap total inline size at 5 MB across all screenshots; beyond
+     that, list paths instead of embedding.
+   - Group under a ``### Screenshots`` collapsible ``<details>`` block
+     so the panel summary text stays scannable.
+
+## Scope (out)
+
+- **Authenticated routes.** Don't try to thread cookies/session
+  storage through. Repos with auth gates will get
+  ``[BROWSER UNAVAILABLE]`` plus an open follow-up task.
+- **Mobile/responsive matrix.** Single viewport (1280x800) for v1;
+  judge_4 can describe its desired viewport in screenshot captions if
+  it manually resizes via Playwright.
+- **Cross-judge MCP sharing.** Only judge_4 gets Playwright by
+  default. Other judges that want browser access opt in via their own
+  ``mcp_servers`` block — same plumbing.
+- **Slice C polish.** File-trigger gating (only run dev-server when
+  ``.tsx``/``.css``/etc diff non-empty) lives in a follow-up — for
+  v1, every panel run with a declared dev-server pays the bring-up
+  cost.
+
+## Acceptance criteria
+
+1. ``cube.yaml`` parses the new ``verify.dev_server:`` block without
+   error; missing block = no behaviour change.
+2. ``judge_4`` declares Playwright in shipped ``cube.yaml``.
+3. ``cube prv`` on a repo with a real dev-server brings up the server,
+   judge_4 navigates + screenshots, files land under
+   ``.prompts/screenshots/``, screenshots appear in the GitHub PR
+   review.
+4. ``cube prv`` on a repo with NO ``dev_server`` config: judge_4
+   reviews statically, no warning spam, behaviour identical to today.
+5. ``cube prv`` when dev_server ``cmd`` fails to start: judge_4
+   reviews statically with a ``[BROWSER UNAVAILABLE]`` note in its
+   prompt, panel completes, no orphan ``pnpm dev`` processes left
+   running.
+6. ``cube prv`` when ``ready_marker`` never appears within
+   ``timeout_seconds``: same fallback as (5).
+7. Decision parser handles missing ``screenshots`` field (every
+   existing decision file) without crashing.
+8. Aggregation surfaces the screenshot list correctly even when one
+   judge emits an empty list and another emits multiple entries.
+
+## Tests required
+
+- ``tests/core/test_dev_server.py`` — ``start_dev_server`` happy path
+  (process spawns, ``ready_marker`` matches, ``stop()`` SIGTERMs the
+  group), timeout path, ``cmd`` exits non-zero, ``ready_marker``
+  detection via URL poll fallback.
+- ``tests/core/test_decision_parser.py`` — ``screenshots`` field
+  round-trips through ``parse_single_decision_file`` and aggregation;
+  missing field defaults to empty list (no crash).
+- ``tests/github/test_reviews.py`` — PR review body embeds
+  screenshots as data URLs, caps total size, falls back to path-only
+  listing when over cap.
+
+## Files likely to touch
+
+- ``python/cube.yaml`` — judge_4 mcp_servers declaration
+- ``python/cube/core/user_config.py`` — VerifyConfig gains
+  ``dev_server`` sub-field
+- ``python/cube/core/dev_server.py`` — NEW, dev-server lifecycle
+- ``python/cube/automation/judge_panel.py`` — bring-up call site,
+  prompt addendum injection
+- ``python/cube/core/decision_parser.py`` — read screenshots field
+- ``python/cube/github/reviews.py`` — embed screenshots in PR body
+- Test files above
+
+## Non-goals to call out explicitly
+
+- Don't rewrite ``cube.github.reviews`` — surgical addition only.
+- Don't add a "frontend-judge-only" code path; the dev-server runs
+  for any judge that declares Playwright. Future judges (Security
+  doing browser security review, for instance) inherit the same
+  bring-up.
+- Don't gate on file-type triggers in this slice. That's slice C.
+
+## Notes for the writer
+
+Take a smaller-PR approach if you can split this into two commits
+within the worktree — (a) dev-server lifecycle helper +
+``VerifyConfig`` extension + tests, (b) judge_4 wiring + PR review
+embed + tests. Either way: keep ``mcp_servers``-free judges unchanged.
+
+Read ``docs/tasks/22-steering-phase-e-telemetry.md`` for the
+conventions cube uses on these task specs; mirror them.
diff --git a/python/cube.yaml b/python/cube.yaml
index 145c3359..388cb035 100644
--- a/python/cube.yaml
+++ b/python/cube.yaml
@@ -147,6 +147,16 @@ judges:
     label: "Judge Frontend & UX"
     color: "blue"
     persona: "frontend-generic"
+    # Browser-augmented review (Slice B). When the repo also configures
+    # `verify.dev_server`, cube spins up the dev server, points judge_4
+    # at it via Playwright MCP, and pulls back screenshots referenced
+    # in the decision file's `screenshots` field. Without dev_server
+    # configured, the judge falls back to static review with a
+    # [BROWSER UNAVAILABLE] note.
+    mcp_servers:
+      playwright:
+        command: "npx"
+        args: ["-y", "@playwright/mcp@latest"]
 
   judge_5:
     model: "gpt"
diff --git a/python/cube/automation/judge_panel.py b/python/cube/automation/judge_panel.py
index 6d1d1d94..1fc7856e 100644
--- a/python/cube/automation/judge_panel.py
+++ b/python/cube/automation/judge_panel.py
@@ -10,7 +10,7 @@
 from ..core.config import PROJECT_ROOT, WORKTREE_BASE, get_project_root, get_worktree_path
 from ..core.dynamic_layout import DynamicLayout
 from ..core.git import branch_exists, fetch_branches, get_commit_hash, sync_worktree
-from ..core.output import console, print_error, print_info, print_success
+from ..core.output import console, print_error, print_info, print_success, print_warning
 from ..core.parsers.registry import get_parser
 from ..core.session import delete_session, load_session, load_session_head_sha, load_session_metadata, save_session
 from ..core.user_config import get_judge_configs, get_writer_by_key, get_writer_by_key_or_metadata, load_config
@@ -119,6 +119,87 @@ def _judge_session_metadata(jconfig, cli_name: str) -> str:
     return f"{jconfig.label} ({jconfig.model}) | cli={cli_name}"
 
 
+def _maybe_start_dev_server(judges: list, config, task_id: str):
+    """Bring up the configured dev-server when ANY judge in the panel
+    declares ``mcp_servers``. Returns the handle or None.
+
+    Browser-tool judges (Playwright via MCP) need an actual running app
+    to point at. cube spins up the dev-server in the writer's worktree
+    once per panel, shares the URL across every browser-judge, tears
+    down after ``asyncio.gather`` completes.
+
+    Skipped entirely when no judge declares MCP — non-frontend panels
+    pay no cost. Also skipped when ``verify.dev_server.cmd`` is empty
+    (operator didn't configure one) — the addendum will then surface
+    "[BROWSER UNAVAILABLE]" so the judge degrades to static review.
+    """
+    needs_browser = any(j.mcp_servers for j in judges)
+    if not needs_browser:
+        return None
+
+    if not config.verify.dev_server.declared:
+        print_warning(
+            "judge declares mcp_servers but verify.dev_server.cmd is empty in "
+            "cube.yaml — browser tools will be available but no app is running. "
+            "Configure verify.dev_server to enable the browser pass."
+        )
+        return None
+
+    # Pick a worktree to run the dev-server in. For panel reviews, use
+    # the first writer's worktree (the one judges will be reading
+    # source from). For PR reviews, the dedicated review_worktree was
+    # already set up by the caller — use it.
+    from ..core.config import PROJECT_ROOT, WORKTREE_BASE
+    from ..core.dev_server import start as start_dev_server
+
+    review_worktrees = {j.review_worktree for j in judges if j.review_worktree}
+    if review_worktrees:
+        worktree = next(iter(review_worktrees))
+    else:
+        project_name = Path(PROJECT_ROOT).name
+        first_writer = next(iter(config.writers.values()))
+        worktree = WORKTREE_BASE / project_name / f"writer-{first_writer.name}-{task_id}"
+
+    return start_dev_server(
+        worktree=worktree,
+        config=config.verify.dev_server,
+        task_id=task_id,
+    )
+
+
+def _populate_browser_context(judges: list, handle, task_id: str) -> None:
+    """Attach dev-server URL + screenshots directory to each judge that
+    declared ``mcp_servers``. The addendum builder reads these to
+    generate the right prompt block.
+
+    When the handle is None or marked unavailable, judges get the
+    ``unavailable_reason`` instead so the prompt explicitly tells them
+    NOT to call Playwright tools (they'd fail against a dead server).
+    """
+    if not any(j.mcp_servers for j in judges):
+        return
+
+    from ..core.config import PROJECT_ROOT
+
+    screenshots_root = Path(PROJECT_ROOT) / ".prompts" / "screenshots" / task_id
+
+    for j in judges:
+        if not j.mcp_servers:
+            continue
+        if handle is None:
+            j.dev_server_unavailable_reason = (
+                "verify.dev_server.cmd is empty in cube.yaml; configure it to " "enable browser-augmented review."
+            )
+            continue
+        if handle.unavailable:
+            j.dev_server_unavailable_reason = handle.unavailable_reason
+            continue
+        j.dev_server_url = handle.url
+        j_dir = screenshots_root / j.key
+        j_dir.mkdir(parents=True, exist_ok=True)
+        j.screenshots_dir = j_dir
+
+
 def _apply_judge_persona(prompt: str, judge_info: JudgeInfo) -> str:
     """Prepend judge-specific persona instructions when configured."""
     base = prompt
@@ -132,10 +213,102 @@ def _apply_judge_persona(prompt: str, judge_info: JudgeInfo) -> str:
 ---
 
 {base}"""
+    base = _apply_browser_addendum(base, judge_info)
     base = _apply_file_scope(base, judge_info)
     return _apply_resume_delta(base, judge_info)
 
 
+def _apply_browser_addendum(prompt: str, judge_info: JudgeInfo) -> str:
+    """Prepend Playwright MCP instructions when this judge has a browser
+    declared AND the dev-server came up.
+
+    Three cases:
+
+    1. Judge has no ``mcp_servers`` declared → no addendum, prompt
+       unchanged.
+    2. Judge has ``mcp_servers`` AND dev-server came up → "you have
+       Playwright tools, here's where to navigate, here's where to
+       drop screenshots, reference them in your decision file's
+       ``screenshots`` field."
+    3. Judge has ``mcp_servers`` but dev-server failed to come up →
+       "[BROWSER UNAVAILABLE] reason: <X>. Review statically." Lets
+       the judge complete without trying to call Playwright tools
+       against a non-existent server.
+    """
+    if not judge_info.mcp_servers:
+        return prompt
+
+    if judge_info.dev_server_unavailable_reason:
+        unavailable_block = f"""# [BROWSER UNAVAILABLE]
+
+A browser tool (Playwright MCP) was configured for this judge but the
+dev server could not be brought up:
+
+  {judge_info.dev_server_unavailable_reason}
+
+Review the diff statically. Do NOT call Playwright tools — they will
+fail. If the review hinges on visual behaviour that text review can't
+verify, flag it in ``remaining_issues`` so a human can spin up the
+dev server and re-check.
+
+---
+
+"""
+        return unavailable_block + prompt
+
+    if not judge_info.dev_server_url or judge_info.screenshots_dir is None:
+        return prompt
+
+    screenshots_dir = judge_info.screenshots_dir
+    addendum = f"""# BROWSER REVIEW (Playwright MCP available)
+
+You have Playwright MCP tools available for this review. Use them to
+visually verify any UI change in the diff:
+
+  - ``playwright_navigate`` — open a route at the running dev server
+  - ``playwright_screenshot`` — capture the current viewport (full-page
+    or element-scoped)
+  - ``playwright_click`` / ``playwright_fill`` / ``playwright_select_option``
+    — drive interactions to reach the state you want to verify
+  - ``playwright_evaluate`` — run JS to inspect computed styles, ARIA
+    labels, layout
+
+Dev server is running at: **{judge_info.dev_server_url}**
+
+For each changed route in the diff:
+  1. Navigate to it.
+  2. Screenshot the relevant states (default, hover/focus where
+     applicable, dark mode if the project has it, empty / error /
+     loading states if the diff touches them).
+  3. Save each screenshot under: ``{screenshots_dir}/<descriptive-name>.png``
+     (e.g. ``checkout-empty-cart-dark.png``).
+  4. Reference every screenshot you took in your decision file's
+     ``screenshots`` field:
+
+     ```json
+     "screenshots": [
+       {{"path": "{screenshots_dir}/checkout-empty-cart-dark.png",
+        "caption": "Empty cart on dark mode — note the contrast issue on the CTA"}}
+     ]
+     ```
+
+Visual findings count as ``remaining_issues`` like any text-level
+finding — file:line of the related diff line, plus a short caption.
+If a route fails to load, the screenshot doesn't match the diff's
+intent, or you spot a regression text review couldn't catch, surface
+it.
+
+If you decide the change is non-visual (pure refactor, types-only,
+test-only, server-side), you can skip the browser pass — but say so
+explicitly in your review summary so the operator knows you considered
+it.
+
+---
+
+"""
+    return addendum + prompt
+
+
 def _apply_resume_delta(prompt: str, judge_info: JudgeInfo) -> str:
     """Prepend a 'DELTA SINCE LAST REVIEW' block when resuming across a HEAD bump.
 
@@ -1638,35 +1811,46 @@ async def launch_judge_panel(
     console.print(f"⏳ Waiting for all {len(judges)} judges to complete...")
     console.print()
 
-    # Start layout AFTER printing startup messages
-    panel_layout.start()
-
-    # Per-judge stats collected via the ``_out_stats`` out-param plumbed
-    # through ``run_judge`` — captures attempts, log path, and wall-clock
-    # duration around the full execution (including retries).
-    judge_stats: dict[str, dict] = {j.key: {"attempts": 0, "log_path": None, "duration": 0.0} for j in judges}
-
-    async def _run_judge_timed(judge: JudgeInfo):
-        start = time.monotonic()
-        try:
-            line_count = await run_judge(
-                judge,
-                prompt,
-                actual_resume_mode,
-                panel_layout,
-                winner=winner,
-                _out_stats=judge_stats[judge.key],
-            )
-            return line_count
-        finally:
-            judge_stats[judge.key]["duration"] = time.monotonic() - start
+    # Bring up the dev-server when any judge declares mcp_servers (browser
+    # tools need a running app to point at). Skipped entirely otherwise —
+    # zero cost for non-browser panels. The handle is shared by every
+    # browser-judge in this panel; we tear it down after asyncio.gather.
+    dev_server_handle = _maybe_start_dev_server(judges, config, task_id)
+    try:
+        _populate_browser_context(judges, dev_server_handle, task_id)
+
+        # Start layout AFTER printing startup messages
+        panel_layout.start()
+
+        # Per-judge stats collected via the ``_out_stats`` out-param plumbed
+        # through ``run_judge`` — captures attempts, log path, and wall-clock
+        # duration around the full execution (including retries).
+        judge_stats: dict[str, dict] = {j.key: {"attempts": 0, "log_path": None, "duration": 0.0} for j in judges}
+
+        async def _run_judge_timed(judge: JudgeInfo):
+            start = time.monotonic()
+            try:
+                line_count = await run_judge(
+                    judge,
+                    prompt,
+                    actual_resume_mode,
+                    panel_layout,
+                    winner=winner,
+                    _out_stats=judge_stats[judge.key],
+                )
+                return line_count
+            finally:
+                judge_stats[judge.key]["duration"] = time.monotonic() - start
 
-    run_results = await asyncio.gather(
-        *(_run_judge_timed(judge) for judge in judges),
-        return_exceptions=True,
-    )
+        run_results = await asyncio.gather(
+            *(_run_judge_timed(judge) for judge in judges),
+            return_exceptions=True,
+        )
 
-    DynamicLayout.close()
+        DynamicLayout.close()
+    finally:
+        if dev_server_handle is not None:
+            dev_server_handle.stop()
 
     # Build the per-judge result list. Order: skipped-approved judges first
     # (they were partitioned out before fan-out), then runnable judges in
diff --git a/python/cube/core/decision_parser.py b/python/cube/core/decision_parser.py
index 6a27a61b..25bd232a 100644
--- a/python/cube/core/decision_parser.py
+++ b/python/cube/core/decision_parser.py
@@ -33,6 +33,43 @@ def _coerce_issue_list(value: Any) -> list:
     return []
 
 
+def _coerce_screenshot_list(value: Any) -> list[dict]:
+    """Normalise a judge's ``screenshots`` field to a list of
+    ``{path, caption}`` dicts.
+
+    Tolerates the common deviations browser-judges produce:
+
+    - missing field → empty list (old decision files are unaffected)
+    - list of dicts (canonical) → as-is, after coercing each entry to
+      have ``path`` and ``caption`` string fields
+    - list of strings → wrap each as ``{path: "...", caption: ""}``
+    - single string → wrap as a one-element list
+    - any other → empty list (defensive)
+
+    Aggregation in ``get_peer_review_status`` collects these across the
+    panel so the PR-review code can surface every screenshot in one
+    place without re-parsing.
+    """
+    if not value:
+        return []
+    if isinstance(value, str):
+        return [{"path": value.strip(), "caption": ""}] if value.strip() else []
+    if not isinstance(value, list):
+        return []
+
+    out: list[dict] = []
+    for entry in value:
+        if isinstance(entry, str):
+            text = entry.strip()
+            if text:
+                out.append({"path": text, "caption": ""})
+        elif isinstance(entry, dict):
+            path = str(entry.get("path") or "").strip()
+            if path:
+                out.append({"path": path, "caption": str(entry.get("caption") or "").strip()})
+    return out
+
+
 def normalize_decision(decision: str) -> str:
     """Normalize decision string to canonical form.
 
@@ -488,6 +525,7 @@ def get_peer_review_status(
     from .user_config import get_judge_configs
 
     all_issues = []
+    all_screenshots: list = []
     approvals = 0
     decisions_found = 0
     judge_decisions = {}
@@ -505,6 +543,12 @@ def get_peer_review_status(
 
         decisions_found += 1
         decision = normalize_decision(data.get("decision", "UNKNOWN"))
+        # Screenshots are optional and arrive from browser-judges only.
+        # Tolerant: missing field → empty list (old decision files just
+        # work). Aggregate across the panel so the PR-review code can
+        # surface every screenshot in one place.
+        for shot in _coerce_screenshot_list(data.get("screenshots", [])):
+            all_screenshots.append(shot)
         # Judges occasionally emit ``remaining_issues`` / ``blocker_issues``
         # as a single string instead of a list (the schema asks for a list
         # but the model sometimes returns ``"one bullet describing the
@@ -533,4 +577,5 @@ def get_peer_review_status(
         "decisions_found": decisions_found,
         "approvals": approvals,
         "judge_decisions": judge_decisions,
+        "screenshots": all_screenshots,
     }
diff --git a/python/cube/core/dev_server.py b/python/cube/core/dev_server.py
new file mode 100644
index 00000000..6e4bfdcd
--- /dev/null
+++ b/python/cube/core/dev_server.py
@@ -0,0 +1,263 @@
+"""Dev-server lifecycle for browser-augmented judges.
+
+When a judge declares Playwright (or any browser MCP) via
+``cube.yaml``'s ``judges.<key>.mcp_servers``, cube needs an actual
+running app to point Playwright at. This module brings up the dev
+server before the panel, watches for readiness, and tears it down
+afterwards — same lifecycle for every panel that has at least one
+browser-judge declared.
+
+Lifecycle:
+
+1. ``start(worktree, config) -> DevServerHandle``: spawns ``config.cmd``
+   in the writer's worktree under a fresh process group, hooks
+   stdout/stderr to a per-task log file, watches for the
+   ``ready_marker`` substring on the stream (falls back to polling
+   ``config.url`` with HEAD if no marker is configured), returns a
+   handle when ready.
+2. ``DevServerHandle.stop()``: SIGTERMs the process group with a
+   short grace period, SIGKILLs anything still alive. Idempotent.
+3. ``DevServerHandle.unavailable``: True when start timed out or
+   the process exited before reaching ready. Judges can fall back to
+   static review when this is set.
+
+Configuration lives on ``VerifyConfig.dev_server`` (see
+``user_config.py``). Repos that don't declare a dev_server block get
+``None`` and cube skips the bring-up entirely — no behaviour change.
+"""
+
+from __future__ import annotations
+
+import os
+import signal
+import subprocess
+import time
+import urllib.error
+import urllib.request
+from dataclasses import dataclass, field
+from pathlib import Path
+from typing import Optional
+
+from .output import print_info, print_warning
+
+
+@dataclass
+class DevServerConfig:
+    """Per-repo dev-server config, mirrored from ``cube.yaml``.
+
+    Loaded under ``verify.dev_server:``. When all fields are empty, the
+    block is considered "not declared" — cube skips bring-up.
+    """
+
+    cmd: str = ""
+    url: str = "http://localhost:3000"
+    ready_marker: str = ""
+    timeout_seconds: int = 60
+
+    @property
+    def declared(self) -> bool:
+        """True if the operator opted in. Empty cmd = not declared."""
+        return bool(self.cmd.strip())
+
+
+@dataclass
+class DevServerHandle:
+    """Live dev-server process. Use via context manager preferred.
+
+    ``unavailable=True`` means the bring-up failed (timeout, crash, or
+    config not declared). Judges that needed the server can fall back to
+    static review.
+    """
+
+    process: Optional[subprocess.Popen] = None
+    url: str = ""
+    log_path: Optional[Path] = None
+    unavailable: bool = False
+    unavailable_reason: str = ""
+    _stopped: bool = field(default=False, repr=False)
+
+    def __enter__(self) -> "DevServerHandle":
+        return self
+
+    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
+        self.stop()
+
+    def stop(self) -> None:
+        """SIGTERM the process group; SIGKILL anything still alive after 5s.
+
+        Idempotent. Uses the process group so child processes spawned by
+        the dev-server (``node``, ``next dev``, ``webpack``, etc.) are
+        cleaned up too — most dev-server commands fork a worker, and
+        signalling only the immediate child leaks the worker.
+        """
+        if self._stopped:
+            return
+        self._stopped = True
+
+        if self.process is None or self.process.poll() is not None:
+            return
+
+        pgid = self._pgid()
+        try:
+            if pgid is not None:
+                os.killpg(pgid, signal.SIGTERM)
+            else:
+                self.process.terminate()
+        except ProcessLookupError:
+            return  # already gone
+
+        try:
+            self.process.wait(timeout=5)
+            return
+        except subprocess.TimeoutExpired:
+            pass
+
+        try:
+            if pgid is not None:
+                os.killpg(pgid, signal.SIGKILL)
+            else:
+                self.process.kill()
+        except ProcessLookupError:
+            return
+        try:
+            self.process.wait(timeout=2)
+        except subprocess.TimeoutExpired:
+            # Truly stuck. Nothing more we can do without leaking
+            # operator-visible noise; log and move on.
+            print_warning(
+                f"dev-server pid {self.process.pid} did not exit after SIGKILL — "
+                "leaving zombie. Operator may need to kill manually."
+            )
+
+    def _pgid(self) -> Optional[int]:
+        if self.process is None:
+            return None
+        try:
+            return os.getpgid(self.process.pid)
+        except (OSError, ProcessLookupError):
+            return None
+
+
+def _poll_url(url: str, timeout: float = 2.0) -> bool:
+    """HEAD the URL; True on any 2xx/3xx, False otherwise.
+
+    Used as the readiness fallback when ``ready_marker`` is not
+    configured (or when the marker hasn't appeared yet but the server
+    may already be serving). Wide success window (2xx/3xx) because
+    Next.js and friends often 307 the root to the locale prefix on
+    first request.
+    """
+    try:
+        req = urllib.request.Request(url, method="HEAD")
+        with urllib.request.urlopen(req, timeout=timeout) as resp:
+            return 200 <= resp.status < 400
+    except (urllib.error.URLError, urllib.error.HTTPError, TimeoutError, OSError):
+        return False
+
+
+def _unavailable(reason: str, log_path: Optional[Path]) -> DevServerHandle:
+    """Construct a handle that signals "not available" without a process."""
+    print_warning(f"dev-server unavailable: {reason}")
+    return DevServerHandle(
+        process=None,
+        url="",
+        log_path=log_path,
+        unavailable=True,
+        unavailable_reason=reason,
+        _stopped=True,
+    )
+
+
+def start(
+    worktree: Path,
+    config: DevServerConfig,
+    task_id: str,
+    log_dir: Optional[Path] = None,
+) -> DevServerHandle:
+    """Spawn the dev server in ``worktree`` and block until ready or timeout.
+
+    On any failure (config not declared, command fails to spawn, marker
+    never appears, port never opens) returns a handle with
+    ``unavailable=True`` so callers can fall back to static review
+    without further error handling.
+
+    The log file path is included on the handle so judges' prompts can
+    reference it for diagnostics ("If your screenshot 500s, tail
+    .prompts/dev-server-<task>.log").
+    """
+    if not config.declared:
+        return _unavailable("dev_server.cmd is empty in cube.yaml", None)
+
+    log_path = (log_dir or worktree.parent) / f"dev-server-{task_id}.log"
+    log_path.parent.mkdir(parents=True, exist_ok=True)
+    log_handle = open(log_path, "wb", buffering=0)
+
+    print_info(f"dev-server: starting `{config.cmd}` in {worktree}")
+    try:
+        process = subprocess.Popen(
+            config.cmd,
+            shell=True,
+            cwd=worktree,
+            stdout=log_handle,
+            stderr=subprocess.STDOUT,
+            # New process group so we can signal children (Next.js / Vite
+            # fork workers that survive a plain SIGTERM to the immediate
+            # PID).
+            start_new_session=True,
+        )
+    except (OSError, FileNotFoundError) as e:
+        log_handle.close()
+        return _unavailable(f"spawn failed: {e}", log_path)
+
+    handle = DevServerHandle(process=process, url=config.url, log_path=log_path)
+    if _wait_for_ready(handle, config, log_path):
+        print_info(f"dev-server: ready at {config.url}")
+        return handle
+
+    # Bring-up failed. Tear down whatever did start, return unavailable
+    # so the caller can proceed with static-only review.
+    handle.stop()
+    log_handle.close()
+    return _unavailable(
+        f"never became ready within {config.timeout_seconds}s " f"(see {log_path} for stdout/stderr)",
+        log_path,
+    )
+
+
+def _wait_for_ready(handle: DevServerHandle, config: DevServerConfig, log_path: Path) -> bool:
+    """Block until either:
+      - ``config.ready_marker`` appears in the log file (preferred), or
+      - a HEAD ``config.url`` returns 2xx/3xx (fallback / always-runs).
+
+    Bails out early if the process exits — no point waiting for a
+    marker on a dead process.
+    """
+    deadline = time.monotonic() + max(1, config.timeout_seconds)
+    marker = config.ready_marker.encode() if config.ready_marker else b""
+
+    while time.monotonic() < deadline:
+        # Process died — no marker, no port, give up.
+        assert handle.process is not None
+        if handle.process.poll() is not None:
+            print_warning(f"dev-server process exited with code {handle.process.returncode} " f"before becoming ready")
+            return False
+
+        # Marker check (cheap: scan from start). We tolerate a tiny
+        # missed-read window because Popen's stdout is buffered through
+        # the log file rather than read directly.
+        if marker:
+            try:
+                content = log_path.read_bytes()
+                if marker in content:
+                    return True
+            except (OSError, FileNotFoundError):
+                pass
+
+        # URL poll fallback — always runs even if marker is set, so a
+        # mistyped marker doesn't prevent ready detection.
+        if _poll_url(config.url):
+            return True
+
+        time.sleep(0.5)
+
+    return False
diff --git a/python/cube/core/user_config.py b/python/cube/core/user_config.py
index fbbaaf0d..c1774148 100644
--- a/python/cube/core/user_config.py
+++ b/python/cube/core/user_config.py
@@ -57,6 +57,9 @@ def session_key(self, task_id: str, review_type: str) -> str:
         return f"{self.key.upper()}_{task_id}_{review_type}"
 
 
+from .dev_server import DevServerConfig  # noqa: E402  (avoid circular import order)
+
+
 @dataclass
 class VerifyConfig:
     """Deterministic verify gate run between writer and judge phases.
@@ -84,6 +87,10 @@ class VerifyConfig:
     check: str = ""
     timeout_seconds: int = 600
     max_attempts: int = 3
+    # Optional dev-server bring-up — only used by browser-augmented
+    # judges (judge_4 with Playwright). When the block is missing or
+    # ``cmd`` is empty, cube skips the bring-up entirely.
+    dev_server: DevServerConfig = field(default_factory=DevServerConfig)
 
     @property
     def effective_check(self) -> str:
@@ -315,12 +322,20 @@ def load_config() -> CubeConfig:
     prompter_model = _resolve_model_alias(prompter.get("model", "sonnet-4.6-thinking"), model_aliases)
 
     verify_raw = data.get("verify") or {}
+    dev_server_raw = verify_raw.get("dev_server") or {}
+    dev_server_cfg = DevServerConfig(
+        cmd=str(dev_server_raw.get("cmd", "")).strip(),
+        url=str(dev_server_raw.get("url", "http://localhost:3000")).strip(),
+        ready_marker=str(dev_server_raw.get("ready_marker", "")).strip(),
+        timeout_seconds=int(dev_server_raw.get("timeout_seconds", 60)),
+    )
     verify_cfg = VerifyConfig(
         cmd=str(verify_raw.get("cmd", "")).strip(),
         install=str(verify_raw.get("install", "")).strip(),
         check=str(verify_raw.get("check", "")).strip(),
         timeout_seconds=int(verify_raw.get("timeout_seconds", 600)),
         max_attempts=int(verify_raw.get("max_attempts", 3)),
+        dev_server=dev_server_cfg,
     )
 
     _config_cache = CubeConfig(
diff --git a/python/cube/models/types.py b/python/cube/models/types.py
index d1233a93..d62ff8bd 100644
--- a/python/cube/models/types.py
+++ b/python/cube/models/types.py
@@ -50,6 +50,13 @@ class JudgeInfo:
     # browser-based UI review on judge_4). Empty / None = no MCP, judge
     # runs with bare built-in tools.
     mcp_servers: dict[str, Any] = field(default_factory=dict)
+    # Browser context, populated by the panel when the dev-server bring-up
+    # ran for this judge (i.e. the judge declared mcp_servers). The judge's
+    # prompt addendum references these to tell it where to navigate and
+    # where to drop screenshots. None = no browser pass; addendum omitted.
+    dev_server_url: Optional[str] = None
+    dev_server_unavailable_reason: Optional[str] = None
+    screenshots_dir: Optional[Path] = None
 
 
 @dataclass
diff --git a/tests/core/test_decision_parser.py b/tests/core/test_decision_parser.py
index e0076e1d..94fcafdf 100644
--- a/tests/core/test_decision_parser.py
+++ b/tests/core/test_decision_parser.py
@@ -205,6 +205,117 @@ def mock_find(judge_key, task_id, review_type, project_root=None):
         assert len(decisions) == 2
 
 
+class TestScreenshotCoercion:
+    """``screenshots`` is a new optional field on judge decision files
+    (browser-judges produce it via Playwright MCP). The coercion must be
+    backward-compatible with every existing decision file shape.
+    """
+
+    def test_missing_field_yields_empty_list(self):
+        from cube.core.decision_parser import _coerce_screenshot_list
+
+        assert _coerce_screenshot_list(None) == []
+        assert _coerce_screenshot_list([]) == []
+
+    def test_canonical_list_of_dicts_passes_through(self):
+        from cube.core.decision_parser import _coerce_screenshot_list
+
+        shots = _coerce_screenshot_list(
+            [
+                {"path": "/a.png", "caption": "Empty cart"},
+                {"path": "/b.png", "caption": ""},
+            ]
+        )
+        assert shots == [
+            {"path": "/a.png", "caption": "Empty cart"},
+            {"path": "/b.png", "caption": ""},
+        ]
+
+    def test_list_of_strings_gets_wrapped(self):
+        """Judges sometimes drop just paths without captions."""
+        from cube.core.decision_parser import _coerce_screenshot_list
+
+        shots = _coerce_screenshot_list(["/a.png", "/b.png"])
+        assert shots == [
+            {"path": "/a.png", "caption": ""},
+            {"path": "/b.png", "caption": ""},
+        ]
+
+    def test_single_string_wraps_to_one_entry(self):
+        from cube.core.decision_parser import _coerce_screenshot_list
+
+        assert _coerce_screenshot_list("/single.png") == [{"path": "/single.png", "caption": ""}]
+
+    def test_dict_without_path_is_dropped(self):
+        """Defensive — only entries with a non-empty path count."""
+        from cube.core.decision_parser import _coerce_screenshot_list
+
+        assert _coerce_screenshot_list([{"caption": "no path"}]) == []
+        assert _coerce_screenshot_list([{"path": ""}]) == []
+
+    def test_garbage_returns_empty(self):
+        from cube.core.decision_parser import _coerce_screenshot_list
+
+        assert _coerce_screenshot_list(42) == []
+        assert _coerce_screenshot_list({"shape": "wrong"}) == []
+
+
+class TestPeerReviewStatusScreenshots:
+    """``get_peer_review_status`` surfaces screenshots aggregated across
+    every judge in the panel. The PR-review code reads this to embed
+    screenshots in the GitHub review body.
+    """
+
+    def test_missing_screenshots_field_is_tolerated(self, tmp_path, mock_config, monkeypatch):
+        from cube.core.decision_parser import get_peer_review_status
+
+        decisions_dir = tmp_path / ".prompts" / "decisions"
+        decisions_dir.mkdir(parents=True)
+        (decisions_dir / "judge_1-mytask-peer-review.json").write_text(
+            json.dumps({"decision": "APPROVED", "remaining_issues": []})
+        )
+        (decisions_dir / "judge_2-mytask-peer-review.json").write_text(
+            json.dumps({"decision": "APPROVED", "remaining_issues": []})
+        )
+        status = get_peer_review_status("mytask", project_root=tmp_path)
+        assert status["screenshots"] == []
+        assert status["approved"] is True
+
+    def test_screenshots_aggregate_across_judges(self, tmp_path, mock_config, monkeypatch):
+        from cube.core.decision_parser import get_peer_review_status
+
+        decisions_dir = tmp_path / ".prompts" / "decisions"
+        decisions_dir.mkdir(parents=True)
+        (decisions_dir / "judge_1-mytask-peer-review.json").write_text(
+            json.dumps(
+                {
+                    "decision": "APPROVED",
+                    "remaining_issues": [],
+                    "screenshots": [{"path": ".prompts/screenshots/mytask/judge_1/a.png", "caption": "first"}],
+                }
+            )
+        )
+        (decisions_dir / "judge_2-mytask-peer-review.json").write_text(
+            json.dumps(
+                {
+                    "decision": "REQUEST_CHANGES",
+                    "remaining_issues": ["a thing"],
+                    "screenshots": [
+                        {"path": ".prompts/screenshots/mytask/judge_2/b.png", "caption": "second"},
+                        {"path": ".prompts/screenshots/mytask/judge_2/c.png", "caption": ""},
+                    ],
+                }
+            )
+        )
+        status = get_peer_review_status("mytask", project_root=tmp_path)
+        paths = sorted(s["path"] for s in status["screenshots"])
+        assert paths == [
+            ".prompts/screenshots/mytask/judge_1/a.png",
+            ".prompts/screenshots/mytask/judge_2/b.png",
+            ".prompts/screenshots/mytask/judge_2/c.png",
+        ]
+
+
 class TestNormalizeDecisionSynonyms:
     """`normalize_decision` must recognise common judge-output drift so
     unrecognised negatives don't slip past the auto-approve gate.
diff --git a/tests/core/test_dev_server.py b/tests/core/test_dev_server.py
new file mode 100644
index 00000000..d4083b36
--- /dev/null
+++ b/tests/core/test_dev_server.py
@@ -0,0 +1,144 @@
+"""Tests for ``cube.core.dev_server``.
+
+Covers the three contracts judges depend on:
+
+1. ``DevServerConfig.declared`` is False when the operator hasn't
+   configured a dev server. Cube skips the bring-up entirely.
+2. ``start()`` returns ``unavailable=True`` (not an exception) when the
+   command fails to spawn, exits early, or never reaches ready within
+   the configured timeout. Lets the panel proceed with static-only
+   review instead of crashing.
+3. ``DevServerHandle.stop()`` is idempotent and signals the process
+   group (so dev-server worker processes get cleaned up too).
+"""
+
+from __future__ import annotations
+
+import sys
+import time
+
+from cube.core.dev_server import DevServerConfig, start
+
+
+def test_dev_server_config_undeclared_when_cmd_empty():
+    """No ``cmd`` → not declared → cube skips bring-up. Out-of-box
+    behaviour for any repo that doesn't opt in to browser review."""
+    assert DevServerConfig().declared is False
+    assert DevServerConfig(cmd="").declared is False
+    assert DevServerConfig(cmd="   ").declared is False
+
+
+def test_dev_server_config_declared_when_cmd_set():
+    assert DevServerConfig(cmd="pnpm dev").declared is True
+
+
+def test_start_returns_unavailable_when_not_declared(tmp_path):
+    """No cmd → returns a sentinel handle with ``unavailable=True`` and
+    no process. The panel populates this onto judges so their persona
+    prompt gets the [BROWSER UNAVAILABLE] block instead of trying to
+    call Playwright tools against thin air.
+    """
+    handle = start(tmp_path, DevServerConfig(), task_id="t")
+    assert handle.unavailable is True
+    assert handle.process is None
+    assert "empty" in handle.unavailable_reason
+
+
+def test_start_returns_unavailable_when_spawn_fails(tmp_path):
+    """``cmd`` exists but the underlying binary doesn't / dies
+    immediately → bring-up fails gracefully. ``shell=True`` so we
+    exercise the path subprocess actually takes."""
+    cfg = DevServerConfig(
+        cmd="/this/binary/does/not/exist --whatever",
+        url="http://127.0.0.1:1",  # unreachable
+        timeout_seconds=2,
+    )
+    handle = start(tmp_path, cfg, task_id="t")
+    assert handle.unavailable is True
+
+
+def test_start_returns_unavailable_when_marker_never_appears(tmp_path):
+    """Process spawns but never emits the configured ready_marker AND
+    the URL never opens → timeout path. The handle is unavailable but
+    the process has been cleaned up (no orphan ``sleep`` left running).
+    """
+    cfg = DevServerConfig(
+        # Sleep for longer than the timeout, never emit anything.
+        cmd=f'{sys.executable} -c "import time; time.sleep(30)"',
+        url="http://127.0.0.1:1",  # unreachable
+        ready_marker="this string will never appear",
+        timeout_seconds=2,
+    )
+    started = time.monotonic()
+    handle = start(tmp_path, cfg, task_id="t")
+    elapsed = time.monotonic() - started
+    assert handle.unavailable is True
+    # Bounded by timeout — should NOT block forever.
+    assert elapsed < 6, f"start() should have bailed by timeout; took {elapsed:.1f}s"
+
+
+def test_start_succeeds_when_marker_appears_in_log(tmp_path):
+    """Happy path via the log-marker readiness signal. The subprocess
+    prints the marker on its own; the helper notices and returns a
+    live handle."""
+    marker = "READY-MARKER-XYZ"
+    cfg = DevServerConfig(
+        cmd=f"{sys.executable} -u -c \"import sys; sys.stdout.write('{marker}\\n'); sys.stdout.flush(); import time; time.sleep(30)\"",
+        url="http://127.0.0.1:1",  # unreachable — force the marker path
+        ready_marker=marker,
+        timeout_seconds=5,
+    )
+    handle = start(tmp_path, cfg, task_id="t")
+    try:
+        assert handle.unavailable is False
+        assert handle.process is not None
+        assert handle.process.poll() is None  # still running
+        assert handle.url == cfg.url
+        assert handle.log_path is not None and handle.log_path.exists()
+        # Log captures the marker line we printed.
+        assert marker.encode() in handle.log_path.read_bytes()
+    finally:
+        handle.stop()
+
+
+def test_stop_is_idempotent(tmp_path):
+    """Calling stop() twice is a no-op the second time — and any
+    later call. Important because the panel wraps stop() in a try/
+    finally that may also fire on exception paths."""
+    cfg = DevServerConfig(
+        cmd=f'{sys.executable} -c "import time; time.sleep(30)"',
+        url="http://127.0.0.1:1",
+        ready_marker="will never appear",
+        timeout_seconds=1,
+    )
+    handle = start(tmp_path, cfg, task_id="t")
+    # start() already calls stop() internally on timeout, so this just
+    # verifies that an extra stop() doesn't blow up.
+    handle.stop()
+    handle.stop()
+    handle.stop()
+
+
+def test_stop_kills_process_group(tmp_path):
+    """A shell command that backgrounds a child must NOT leave the
+    child running after stop() — otherwise ``pnpm dev`` style
+    dev-servers (which fork workers) leak processes across panel
+    runs. We verify the wrapper script's own PID exits, which is
+    the necessary precondition for child cleanup via SIGTERM/SIGKILL
+    on the process group.
+    """
+    marker = "UP"
+    cfg = DevServerConfig(
+        cmd=f"{sys.executable} -u -c \"import sys; sys.stdout.write('{marker}\\n'); sys.stdout.flush(); import time; time.sleep(30)\"",
+        url="http://127.0.0.1:1",
+        ready_marker=marker,
+        timeout_seconds=5,
+    )
+    handle = start(tmp_path, cfg, task_id="t")
+    assert handle.unavailable is False
+    pid = handle.process.pid  # type: ignore[union-attr]
+    handle.stop()
+    # After stop(), the process is reaped — poll() returns the exit code,
+    # not None.
+    assert handle.process is not None
+    assert handle.process.poll() is not None, f"pid {pid} still alive after stop()"