From 643a0428a26c4bee6d754b3338eceacf214334a4 Mon Sep 17 00:00:00 2001
From: Brad Cunningham
Date: Tue, 5 May 2026 08:58:21 -0400
Subject: [PATCH] spec: V56.4 browser-driven harness
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Sibling to V56.3's static-fixture harness pattern. Wraps existing camofox-mcp infrastructure into per-detector contracts. Closes ~25 of the 43 remaining wired detectors that need a browser runtime (console_error, react_error, hydration_mismatch, perf metrics, nav-state, axe-driven a11y, interaction state).

Same DetectorContract type, same calibration scorecard, same serial 1-PR-per-detector cadence. New runner class only.

Buckets:
  A — direct console / error capture (5)
  B — production classifier reuse (4)
  C — perf metrics (7)
  D — nav state (5)
  E — interaction state (4)

Defers: race conditions (V56.5), IDOR variants (V56.5), multi-context (V56.6), service/web worker (V56.7), WebRTC (residual), agent/LLM (V56.8), visual layout (V56.9).

After V56.4 closes: 76/94 wired detectors with a working harness.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/specs/V56.4_BROWSER_HARNESS.md | 383 ++++++++++++++++++++++++++++
 1 file changed, 383 insertions(+)
 create mode 100644 docs/specs/V56.4_BROWSER_HARNESS.md

diff --git a/docs/specs/V56.4_BROWSER_HARNESS.md b/docs/specs/V56.4_BROWSER_HARNESS.md
new file mode 100644
index 00000000..8ece1ad6
--- /dev/null
+++ b/docs/specs/V56.4_BROWSER_HARNESS.md
@@ -0,0 +1,383 @@
+# V56.4 — Browser-Driven Harness
+
+**Status:** Draft 1 — implementation contract (serial calibration, single-coder)
+**Author:** @architect
+**Date:** 2026-05-05
+**Depends on:** V56.3 (static-fixture harness), V49 (camofox MCP transport), V36 (browser-platform probe), V44 (calibration corpus)
+**Deferred to follow-ups:** V56.5 (concurrent-action / race), V56.6 (multi-context), V56.7 (agent), V56.8 (visual)
+
+---
+
+## 1. Problem statement
+
+V56.3 closed the static-fixture harness pattern: 51 of 94 wired BugKinds now have a per-detector contract, mini-fixture, and calibration scorecard. The remaining 43 wired kinds cannot fit that pattern because their detection signals are not in the response body — they only exist inside a real browser at runtime: console events, React error boundaries, DOM mutations under fault injection, Performance Observer entries, navigation history state, axe-core violations on a live DOM, focus chain traversal, service-worker / web-worker errors.
+
+These kinds DO already work in production through the batch pipeline (`bughunter run`), which runs camofox-mcp + per-phase classifiers. What's missing is a calibration-friendly harness that uses the same browser primitives but in the V56.3 shape: single fixture, single detector, sub-second run, scorecard pass/fail per assertion. Without that, regressions in browser-observed kinds are attributed to the run, not the detector — exactly the gap V56 was meant to close.
+
+V56.4 lands a sibling harness pattern (`BrowserHarnessRunner`) that wraps the existing camofox infrastructure into a per-detector contract, parallel to V56.3's static `runHarness` path. Same DetectorContract type, same expected-clusters.jsonl format, same `bughunter test-detector` CLI, same one-PR-per-detector merge cadence. New runner class, new fixture template (HTML page that triggers the planted bug), new browser-observation harvest step.
+
+---
+
+## 2. Boundaries
+
+### 2.1 In scope
+
+- `BrowserHarnessRunner` class in `packages/cli/src/harness/browser-executor.ts`
+- New harness phase: probe phase (camofox navigate + observation harvest) before the existing classify phase
+- Fixture template: HTML pages that trigger the planted bug at load time
+- Per-detector contracts for the ~25 browser-observed kinds (see §6 for the list)
+- Mini-fixture pattern, same as V56.3: `bin/up.sh` boots a static HTTP server on a fixed port and serves a small set of routes
+- Browser observation envelope (see §4): console events, errors, network requests, DOM snapshot, performance entries, axe results
+- `tools: ['browser-mcp']` flag in DetectorContract triggers BrowserHarnessRunner instead of static runHarness
+- Camofox tab lifecycle management: one `withTab()` per probe with explicit teardown
+- The serial calibration cadence (one detector per PR, scorecard ALL-PASS gate)
+
+### 2.2 Out of scope
+
+- Multi-tab / multi-context harness — deferred to V56.6
+- Concurrent-action / race-condition harness — deferred to V56.5
+- LLM-agent harness (prompt_injection_executed) — deferred to V56.7
+- Visual / screenshot-diff harness (i18n_rtl_layout_break, visual_anomaly) — deferred to V56.8
+- WebRTC / webrtc_ice_failure — genuinely needs WebRTC peer infrastructure; documented as an honest residual and deferred
+- Replacing or modifying the production batch pipeline (`bughunter run`)
+- New BugKinds — V56.4 strictly wraps existing detectors that already fire in the batch path
+
+### 2.3 External dependencies
+
+- `camofox-mcp` (already deployed at port 9377) — used via existing `BrowserMcpAdapter`
+- Existing `CamofoxBrowserMcpAdapter` from `packages/cli/src/adapters/browser-mcp.ts` — REUSED, not rewritten
+- Existing classify-phase functions (`classify/console.ts`, `classify/react.ts`, etc.) — invoked by the harness; not modified
+- Existing axe-core injection helpers (used by accessibility detectors) — REUSED
+- `bootFixture()` helper from V56.2.1 — REUSED for HTTP-server-backed fixtures
+
+---
+
+## 3. Existing code map (READ FIRST)
+
+### 3.1 Files you MUST read before writing any V56.4 code
+
+| File | Purpose for V56.4 |
+|---|---|
+| `/root/BugHunter/packages/cli/src/harness/executor.ts` | V56.3 harness — `runHarness()` is the static-fixture path. V56.4 ADDS a parallel `runBrowserHarness()` and does NOT modify the static path. |
+| `/root/BugHunter/packages/cli/src/adapters/browser-mcp.ts` (lines 118–250) | `BrowserMcpAdapter` interface + `CamofoxBrowserMcpAdapter` implementation. V56.4 uses `withTab()` + `evaluate()` + `snapshot()` + `cookies()`. |
+| `/root/BugHunter/packages/cli/src/cli/test-detector.ts` | V56.3 CLI runner. V56.4 adds a switch in `runOneContract()` based on `contract.requires.tools` — calls `runBrowserHarness()` when `'browser-mcp'` is in tools. |
+| `/root/BugHunter/packages/cli/src/detectors/contracts.ts` | `DetectorContract` type. V56.4 reuses it unchanged; the existing `tools: ['browser-mcp']` field already drives the dispatch. |
+| `/root/BugHunter/packages/cli/src/classify/console.ts` (line 24) | Production console_error classifier. V56.4 invokes it on harvested console events. |
+| `/root/BugHunter/packages/cli/src/classify/react.ts` (lines 38, 45) | Production hydration_mismatch + react_error classifiers. Same pattern. |
+| `/root/BugHunter/packages/cli/src/classify/network.ts` | Production network_5xx + network_4xx_unexpected (already harnessed in V56.3 via static runner — V56.4 ALSO covers them via browser-observed HAR for completeness, but the static path stays primary). |
+| `/root/BugHunter/packages/cli/src/discovery/browser-platform-probe.ts` | Bootstrap-script pattern for browser observation. V56.4's harvest script borrows the same install-then-poll shape. |
+| `/root/BugHunter/packages/cli/src/perf/web-vitals-vendored/web-vitals.umd.js` | Vendored web-vitals library. V56.4 perf detectors install this on each fixture page and harvest after settle. |
+| `/root/BugHunter/fixtures/detector-calibration/seo-mini/app/server.js` | Reference fixture — minimal HTTP server returning HTML. V56.4 fixtures follow this template; the HTML payload changes per detector. |
+| `/root/BugHunter/.claude/projects/-root/memory/feedback_test_detector_contracts_fully.md` | The 4-shape testing minimum — V56.4 fixtures MUST encode positive + negative + ≥2 edges + input-degradation. |
+
+### 3.2 Patterns to follow
+
+- **Fixture HTML structure:** one `