research: agent runtime architecture — from embedded component to browser-native runtime #72

@viv

Summary

Research into evolving Review Loop from an embedded dev-only component (Shadow DOM overlay injected via Vite/Astro/Express) to an agent runtime — enabling visual regression, autonomous annotation verification, multi-page crawling, and deep AI agent integration.

Full research report: docs/reports/agent-runtime-research.md

Prior Art: Tidewave and Runtime-Embedded Agents

José Valim's Tidewave demonstrates a compelling alternative to the "agent controls a browser externally" approach. Tidewave runs an MCP server inside the web application runtime (Phoenix/Rails), giving agents direct access to the live system — DOM-to-source mapping, database queries, background jobs, REPL execution, and correlated error context.

Valim's vertical integration essay identifies three frustrations with current coding agents that directly mirror Review Loop's challenges:

  1. Agent can't verify its own work — it says a feature is complete but can't see the browser to confirm
  2. Error context requires human mediation — developers copy-paste stacktraces from browser to agent
  3. No UI-to-source mapping — developers must manually translate "this dropdown" to the source file that renders it

Point 3 is the exact problem our source file mapping feature (#66) targets. Point 1 is what the Playwright-based verification in this research aims to solve.

Key Insight: Review Loop Already Has Runtime Access

The critical realisation from studying Tidewave is that Review Loop's Vite plugin is already running inside the dev server. The existing middleware has access to Vite's module graph, which records the source file behind each served module and the import relationships between modules. This means Review Loop is already in a Tidewave-like position; it just doesn't exploit it yet.
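
As a rough illustration of what "exploiting the module graph" could mean, the sketch below walks import edges from a page's entry module and lists the source files behind it. `ModuleNodeLike` is a local stand-in that mirrors only the `url`, `file`, and `importedModules` fields of Vite's module nodes, and `collectSourceFiles` is a hypothetical helper, not an existing Review Loop function. In a real plugin the entry node would presumably be looked up on the dev server's module graph; that wiring is assumed and not shown.

```typescript
// Local stand-in for the fields of a Vite module node that this sketch uses.
interface ModuleNodeLike {
  url: string;
  file: string | null; // null for virtual modules with no file on disk
  importedModules: Set<ModuleNodeLike>;
}

// Hypothetical helper: list every source file reachable from an entry module.
function collectSourceFiles(entry: ModuleNodeLike): string[] {
  const seen = new Set<ModuleNodeLike>();
  const files = new Set<string>();
  const stack: ModuleNodeLike[] = [entry];

  while (stack.length > 0) {
    const node = stack.pop()!;
    if (seen.has(node)) continue; // guards against circular imports
    seen.add(node);
    if (node.file !== null) files.add(node.file);
    node.importedModules.forEach((dep) => stack.push(dep));
  }
  return Array.from(files).sort();
}
```

The traversal only follows static import edges, so it answers "which files contribute to this page", not "which file rendered this element"; the latter would need additional per-framework information.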

| | Tidewave | Review Loop (current) | Review Loop (proposed) |
|---|---|---|---|
| Runtime position | MCP server inside the app | Vite middleware inside the dev server | Vite middleware + Playwright |
| Source mapping | Runtime knows which component renders each element | Has Vite module graph access but doesn't use it | Could use module graph for annotation → source hints |
| Verification | Agent has REPL + live system access | None (agent is blind to rendered output) | Playwright screenshots + DOM inspection |
| Framework scope | Deep per-framework integration | Framework-agnostic via adapters | Same, with optional deeper integration |

Implications for Architecture Choice

Tidewave's approach validates Architecture D (Hybrid) as the right starting point, but reframes why:

  • Don't abandon the embedded plugin — it's the runtime integration point. Enhance it with source mapping, error context, and richer MCP tools
  • Add Playwright for what the plugin can't do — visual verification, multi-page crawling, screenshot diffing
  • The Vite plugin should become smarter, not be replaced — more like Tidewave's runtime MCP server, less like a passive UI injector

This is a both/and approach rather than either/or: runtime intelligence from the plugin, visual intelligence from the browser.

Motivation

The current embedded model has limitations that runtime-aware tooling can address:

  • No visual verification — agents can't screenshot, diff, or verify their own changes
  • No autonomous navigation — agents can't browse the site to check their work
  • Same-page constraint — can only annotate pages the user navigates to manually
  • No source mapping — agents must grep for source files (see feat: source file mapping for agent annotations #66)
  • Unexploited runtime access — the Vite plugin sits inside the dev server but doesn't expose module graph, error context, or build diagnostics to agents
  • Can't review deployed sites — limited to local dev environments

Architectures Researched

Four architectures were evaluated:

| Architecture | Approach | Install Friction | Framework Adapters | Visual Verification | Works on Any Site |
|---|---|---|---|---|---|
| A: Playwright + Extension | Launch Chromium via Playwright, sideload review extension | Medium | None | Yes | Yes |
| B: Electron App | Self-contained review browser with webview + panel | Medium | None | Yes | Yes |
| C: Chrome Extension + Native Host | Extension in user's browser, native messaging to MCP | Medium | None | Partial | Yes |
| D: Hybrid | Keep Vite plugin + add agent browser mode | Low-Medium | Yes (embedded mode) | Yes (agent mode) | Partial |

Recommended Phased Approach

Phase 1: Runtime Intelligence (Tidewave-Inspired) — Enhance the Vite Plugin

Before adding browser automation, exploit the runtime access we already have:

  • Source file mapping via Vite's module graph — annotation → source file hints (feat: source file mapping for agent annotations #66)
  • Build error context — surface Vite compilation errors/warnings to agents via MCP
  • Module dependency info — which components contribute to a page
  • HMR-aware feedback — notify agents when their changes trigger successful or failed hot reloads

This is low-risk, high-value, and requires no new infrastructure.
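
To make the build-error and HMR bullets more concrete, here is a minimal sketch of the state an agent-facing build status could be derived from. `BuildEvent`, `BuildStatus`, and `BuildStatusTracker` are hypothetical names invented for this example, and how the events would be fed in from Vite's plugin hooks is an assumption left out of the sketch.

```typescript
// Hypothetical event shape: a compile error, or a successful hot update.
type BuildEvent =
  | { kind: "error"; file: string; message: string }
  | { kind: "update"; file: string };

interface BuildStatus {
  ok: boolean;
  errors: { file: string; message: string }[];
  lastUpdatedFile: string | null;
}

class BuildStatusTracker {
  private errors = new Map<string, string>();
  private lastUpdatedFile: string | null = null;

  record(event: BuildEvent): void {
    if (event.kind === "error") {
      // Keep only the most recent error per file.
      this.errors.set(event.file, event.message);
    } else {
      // A successful update for a file clears any earlier error on it.
      this.errors.delete(event.file);
      this.lastUpdatedFile = event.file;
    }
  }

  status(): BuildStatus {
    const errors: { file: string; message: string }[] = [];
    this.errors.forEach((message, file) => errors.push({ file, message }));
    return { ok: errors.length === 0, errors, lastUpdatedFile: this.lastUpdatedFile };
  }
}
```

Keying errors by file means an agent that fixes its own mistake sees the error disappear on the next successful reload, which is the feedback loop the HMR bullet describes.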

Phase 2: Visual Verification — Add Playwright

Add a CLI command, npx review-loop verify, that launches Chromium via Playwright to:

  • Navigate to each annotated page
  • Take screenshots and verify DOM state
  • Report results via new MCP tools (verify_annotation, screenshot_page, crawl_site)

This phase is non-breaking: existing users are unaffected.
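
A per-annotation check along these lines might look like the sketch below. `PageLike` is a structural subset of Playwright's `Page` (its `goto`, `locator(...).count()`, and `screenshot` methods), declared locally so the example does not depend on Playwright being installed. `AnnotationTarget` and `VerifyResult` are hypothetical shapes; the real annotation schema in inline-review.json may differ.

```typescript
// Structural subset of Playwright's Page that this sketch relies on.
interface PageLike {
  goto(url: string): Promise<unknown>;
  locator(selector: string): { count(): Promise<number> };
  screenshot(options?: { path?: string }): Promise<Uint8Array>;
}

// Hypothetical annotation shape, for illustration only.
interface AnnotationTarget {
  id: string;
  pageUrl: string;
  selector: string;
}

interface VerifyResult {
  id: string;
  found: boolean; // whether the annotated element exists on the page
  screenshotBytes: number; // size of the captured image
}

async function verifyAnnotation(page: PageLike, target: AnnotationTarget): Promise<VerifyResult> {
  await page.goto(target.pageUrl);
  const matches = await page.locator(target.selector).count();
  const image = await page.screenshot();
  return { id: target.id, found: matches > 0, screenshotBytes: image.length };
}
```

With Playwright installed, a page obtained from a launched Chromium browser should satisfy `PageLike` as written. Checking that the selector still matches is only a weak proxy for "the change rendered"; a fuller check would compare the screenshot or DOM state against what the annotation asked for.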

Phase 3: Chrome Extension — Replace Embedded UI

Build a Chrome extension (content script + side panel + native messaging host) that eliminates framework adapters entirely. Works on any site, dev or production.

Phase 4: Full Agent Runtime

Unify all phases: runtime intelligence from the plugin, visual verification from Playwright, and human review from the extension, with MCP orchestrating everything.

Technologies Evaluated

  • Tidewave — José Valim's runtime-embedded MCP server for Phoenix/Rails (blog)
  • Browser-Use — Python AI browser agent framework (Playwright-based)
  • Playwright MCP — Microsoft's MCP server for browser automation via accessibility tree
  • Chrome DevTools MCP — Google's AI DevTools integration (26 tools via CDP)
  • Stagehand — Browserbase's act/extract/observe primitives
  • Vercel Agent-Browser — daemon architecture with Rust CDP client
  • Electron / Tauri v2 / Puppeteer / WebDriver BiDi — embedding technologies

New MCP Tools (proposed)

Runtime intelligence (Phase 1)

  • get_source_hint(annotationId) — source file + line range for an annotation
  • get_build_status() — current Vite build errors/warnings
  • get_module_graph(pageUrl) — which source modules contribute to a page
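
One way the three Phase 1 tools could be declared is sketched below. The `name`, `description`, and `inputSchema` fields and the text-content result follow the general shape of MCP tool definitions and tool results; the descriptions, the parameter schemas, the canned handler, and the `callTool` dispatcher are assumptions made for illustration, not Review Loop's existing MCP code.

```typescript
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string; description?: string }>;
    required?: string[];
  };
}

interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

type ToolHandler = (args: Record<string, unknown>) => ToolResult;

const phase1Tools: ToolDefinition[] = [
  {
    name: "get_source_hint",
    description: "Source file and line range for an annotation",
    inputSchema: { type: "object", properties: { annotationId: { type: "string" } }, required: ["annotationId"] },
  },
  {
    name: "get_build_status",
    description: "Current Vite build errors and warnings",
    inputSchema: { type: "object", properties: {} },
  },
  {
    name: "get_module_graph",
    description: "Source modules that contribute to a page",
    inputSchema: { type: "object", properties: { pageUrl: { type: "string" } }, required: ["pageUrl"] },
  },
];

function textResult(payload: unknown): ToolResult {
  return { content: [{ type: "text", text: JSON.stringify(payload) }] };
}

// Placeholder handler returning canned data; a real one would query the dev server.
const handlers: Record<string, ToolHandler> = {
  get_build_status: () => textResult({ ok: true, errors: [] }),
};

function callTool(name: string, args: Record<string, unknown>): ToolResult {
  const handler = handlers[name];
  if (!handler) {
    return { content: [{ type: "text", text: "Unknown tool: " + name }], isError: true };
  }
  return handler(args);
}
```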

Visual verification (Phase 2)

  • screenshot_page(url) — capture page screenshot
  • verify_annotation(id) — navigate to annotation's page, check if change rendered
  • get_accessibility_tree(url) — structured page representation
  • crawl_site(baseUrl, depth?) — discover and catalogue all pages
  • visual_diff(url, before?, after?) — compare page states
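
The `crawl_site(baseUrl, depth?)` tool could be built on a depth-limited breadth-first traversal, sketched below. Link discovery is injected as a `LinkExtractor` callback (a hypothetical name) so the traversal can be shown without a browser; in practice it would presumably read anchors from a Playwright page. The default depth of 2 and the prefix-based same-site check are assumptions of this sketch.

```typescript
// Hypothetical callback: resolves to the absolute URLs a page links to.
type LinkExtractor = (url: string) => Promise<string[]>;

async function crawlSite(baseUrl: string, getLinks: LinkExtractor, depth: number = 2): Promise<string[]> {
  const visited = new Set<string>([baseUrl]);
  let frontier: string[] = [baseUrl];

  // Each pass expands the pages discovered in the previous pass.
  for (let level = 0; level < depth && frontier.length > 0; level++) {
    const next: string[] = [];
    for (const url of frontier) {
      const links = await getLinks(url);
      for (const link of links) {
        // Stay on the target site and skip pages that are already known.
        if (!link.startsWith(baseUrl) || visited.has(link)) continue;
        visited.add(link);
        next.push(link);
      }
    }
    frontier = next;
  }
  return Array.from(visited); // discovery order, starting with baseUrl
}
```

The prefix check is a crude same-site test and would need URL normalisation (trailing slashes, fragments, query strings) before it could be relied on.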

Open Questions

  1. Should the agent runtime be a separate package (review-loop-browser)?
  2. Chrome extension distribution: Web Store, enterprise sideload, or both?
  3. Should Playwright be a peer dependency or bundled (~50MB)?
  4. How to handle authentication on target sites in agent browser mode?
  5. Should the extension support Firefox via WebExtensions API?
  6. Visual diff engine: custom, pixelmatch, or existing tools?
  7. Is the Electron app (Architecture B) worth pursuing given the extension approach?
  8. How deep should per-framework runtime integration go? Tidewave goes very deep (REPL, DB access); Review Loop's zero-config principle may favour a lighter touch
  9. Should Review Loop adopt ACP (Agent Client Protocol) alongside MCP for multi-agent portability?

What Stays Unchanged

  • ReviewStorage class and inline-review.json format
  • MCP protocol and tool interface (extended, not replaced)
  • Shared types (src/shared/types.ts)
  • Export functionality
