tangle-network · drewstone · Apr 26, 2026
diff --git a/.changeset/design-audit-8-layer-architecture.md b/.changeset/design-audit-8-layer-architecture.md
@@ -0,0 +1,56 @@
+---
+'@tangle-network/browser-agent-driver': minor
+---
+
+feat(design-audit): 8-layer architecture (Layers 1-4 and 6-7 shipped; Layers 5 and 8 scaffold)
+
+Full implementation of RFC-002: World-Class Design Audit. Primary consumer is coding agents (Claude Code, Codex, OpenCode, Pi); the architecture is JSON-first, tool-callable, and self-explaining when uncertain.
+
+**Layer 1 — Multi-dimensional scoring** _(shipped)_
+- Ensemble classifier (URL pattern + DOM heuristic + LLM tiebreaker) with `ensembleConfidence`, `signalsAgreed`, `dissent`.
+- Five universal dimensions: `product_intent / visual_craft / trust_clarity / workflow / content_ia`.
+- Per-page-type rollup weights (saas-app, marketing, dashboard, docs, ecommerce, social, tool, blog, utility).
+- Per-page-type calibration anchors (`rubric/anchors/*.yaml`) so app surfaces aren't judged against marketing-site polish.
+- `AuditResult_v2` emitted alongside v1 shape; v1 deprecated with one-release lag.
+
+**Layer 2 — Patch primitives** _(shipped)_
+- Every major/critical finding now ships `patches[]` with `target`, `diff.before`/`after`, `testThatProves`, `rollback`, `estimatedDelta`, and `estimatedDeltaConfidence`.
+- `diff.before` is validated as a substring of the page snapshot at parse time — agents apply patches literally without re-authoring.
+- Severity enforcement: findings without valid patches are downgraded from major/critical to minor.
+- `patches/render.ts`: renders `unifiedDiff` from before/after when `target.filePath` is known (`git apply`-able).
+
+**Layer 3 — First-principles fallback** _(shipped)_
+- Fires when `ensembleConfidence < 0.6`, signals disagree, or page type is `unknown`.
+- Scores against 5 universal product principles only (primary-job clarity, action obviousness, state preview, trust-before-commitment, recovery-from-failure).
+- Sets `rollup.confidence = 'low'`; emits `NovelPatternObservation` to `~/.bad/novel-patterns/` for fleet mining.
+- New rubric fragment `first-principles.md` carries the exact prompt that fires in this mode.
+
+**Layer 4 — Outcome attribution** _(shipped)_
+- `bad design-audit ack-patch <patchId> --pre-run-id <runId>` — records that an agent applied a patch.
+- `bad design-audit --post-patch <patchId>` on re-audit — computes observed delta vs predicted, writes `agreementScore`.
+- JSONL store at `~/.bad/attribution/applications/`. Append-only — outcomes are new events, not mutations.
+- `aggregatePatchReliability()` cross-tenant rollup: groups by `patchHash = sha256(before+after+scope).slice(0,16)`. After N≥30 / ≥5 tenants / replicationRate≥0.7 → `recommendation: 'recommended'`.
+
+**Layer 5 — Pattern library** _(scaffold)_
+- `patterns/{store,mine,match}.ts` + `cli-patterns.ts` (`bad patterns query|show`).
+- Cold-start: library is empty until ~6 weeks of attribution data accumulates. Mine threshold: N≥30, ≥5 tenants, replicationRate≥0.7. Mining impl is a TODO; the query API and types are stable.
+
+**Layer 6 — Composable predicates** _(shipped)_
+- `AppliesWhen` extended with `audience`, `modality`, `regulatoryContext`, `audienceVulnerability`.
+- 9 new rubric fragments: `audience-{clinician,kids,developer}.md`, `regulatory-{hipaa,gdpr,coppa}.md`, `modality-{mobile,tablet}.md`, `audience-vulnerability-minor-facing.md`.
+- Rubric loader matches new predicates when context provided via `--audience`, `--modality`, `--regulatory`, `--audience-vulnerability` CLI flags.
+
+**Layer 7 — Domain ethics gate** _(shipped)_
+- 4 rule files (medical, kids, finance, legal) with citation-backed rules (FDA 21 CFR 201.57, COPPA 16 CFR 312.5, TILA/Reg Z, GDPR).
+- Hard rollup floor: a `critical-floor` violation caps the rollup at 4, a `major-floor` violation caps it at 6. `preEthicsScore` preserves the LLM's uncapped score.
+- `--skip-ethics` bypass (test-only, logged + warned), `--ethics-rules-dir` override.
+- 8 fixtures (4 pass/fail pairs) in `bench/design/ethics-fixtures/`.
+
+**Layer 8 — Modality adapters** _(scaffold)_
+- `modality/{types,html,ios,android,index}.ts`. HTML adapter wraps existing Playwright pipeline. iOS and Android throw `NotImplementedError` with clear message. `--modality html|ios|android` dispatches to the right adapter.
+
+**Skill contract updates:**
+- `~/code/dotfiles/claude/skills/bad/SKILL.md`: patch consumption loop, Layer 3-8 contract, ack-patch / --post-patch close-the-loop, ethics floor priority rule.
+- `skills/design-evolve/SKILL.md`: Phase 3 (apply fixes) now patch-first; Phase 4 includes attribution close-the-loop.
+
+**Tests:** +40 new tests across `design-audit-patch-{parse,validate}`, `design-audit-first-principles`, `design-audit-attribution`. Total: 1393 passing.
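
The Layer 2 rules in this changeset (validate `diff.before` as a substring of the page snapshot, then downgrade major/critical findings that have no valid patch) can be sketched roughly as follows. This is a minimal illustration, not the shipped code: the `Finding` and `Patch` shapes are simplified, and `isPatchValid` and `enforceSeverity` are hypothetical names.

```typescript
// Hypothetical, simplified shapes; the real types carry more fields
// (target, testThatProves, rollback, estimatedDelta, ...).
type Severity = 'minor' | 'major' | 'critical';

interface Patch {
  diff: { before: string; after: string };
}

interface Finding {
  id: string;
  severity: Severity;
  patches: Patch[];
}

// A patch counts as valid only when `diff.before` occurs verbatim in the snapshot,
// so an agent can apply it literally without re-authoring.
function isPatchValid(patch: Patch, snapshot: string): boolean {
  return patch.diff.before.length > 0 && snapshot.includes(patch.diff.before);
}

// Drop invalid patches, then downgrade major/critical findings left with none.
function enforceSeverity(finding: Finding, snapshot: string): Finding {
  const patches = finding.patches.filter((p) => isPatchValid(p, snapshot));
  const needsPatch = finding.severity !== 'minor';
  const severity: Severity =
    needsPatch && patches.length === 0 ? 'minor' : finding.severity;
  return { ...finding, severity, patches };
}
```

Under this sketch a critical finding whose only patch quotes markup that is absent from the snapshot comes back as `minor` with an empty `patches` array.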
diff --git a/.changeset/design-audit-layer-1-foundation.md b/.changeset/design-audit-layer-1-foundation.md
@@ -0,0 +1,19 @@
+---
+'@tangle-network/browser-agent-driver': minor
+---
+
+feat(design-audit): Layer 1 — multi-dim scoring foundation
+
+Land the first layer of the world-class 8-layer design-audit architecture (RFC `docs/rfc/design-audit-world-class.md`). This release ships:
+
+- **Ensemble classifier** (`src/design/audit/classify-ensemble.ts`) — three-signal vote (URL pattern + DOM heuristic + LLM tiebreaker) with explicit `ensembleConfidence`, `signalsAgreed`, and `dissent` records. URL+DOM agreement above the 0.7 threshold skips the LLM call entirely.
+- **Per-page-type rollup weights** (`src/design/audit/rubric/rollup-weights.ts`) — saas-app, marketing, dashboard, docs, ecommerce, social, tool, blog, utility, plus `default`/`unknown` fallbacks. Module-load invariant: every weight set sums to 1.0 ± 1e-6.
+- **Per-page-type calibration anchors** (`src/design/audit/rubric/anchors/*.yaml`) — 9 anchor files referencing real product 9-10 examples (Linear's app, Figma, Notion, Stripe, MDN, Apple Store, Threads, Stratechery, Vercel deploys, etc.) so saas-app surfaces are no longer judged against marketing-site polish.
+- **Multi-dim scoring** (`src/design/audit/v2/score.ts`) — five universal dimensions (product_intent / visual_craft / trust_clarity / workflow / content_ia) each with `score`, `range`, `confidence`. Rollup is a weighted aggregate with conservative confidence (any dim `low` → rollup `low`).
+- **`AuditResult_v2`** — emitted alongside the v1 shape in `report.json` under a top-level `v2` block. One-release deprecation window before v1 is removed.
+- **`--audit-passes auto`** — new default that runs the ensemble classifier first, then picks the focused pass bundle for that classification.
+- **CLI summary** — per-page console output now prints the 5-dimension breakdown plus rollup formula.
+
+Backwards compat: all existing v1 fields (`score`, `findings`, `summary`, `strengths`, etc.) remain on `PageAuditResult` and `report.json`. Consumers should migrate to `report.v2.pages[].scores` over the next release.
+
+Skill update: `skills/bad/SKILL.md` documents the new JSON shape with an agent-side worked example for choosing which dimension to invest in based on `score × weight` leverage.
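
As a rough sketch of the rollup described above (a weighted aggregate over the five dimensions, a sum-to-1.0 invariant on each weight set, and conservative confidence), something like the following would fit the description. The weights shown are illustrative, not the shipped `saas-app` values. The three-level `Confidence` type and the "take the lowest confidence" rule are assumptions: the changeset only states that any `low` dimension makes the rollup `low`.

```typescript
const DIMENSIONS = [
  'product_intent',
  'visual_craft',
  'trust_clarity',
  'workflow',
  'content_ia',
] as const;
type Dimension = (typeof DIMENSIONS)[number];

// Assumption: three confidence levels; the changeset only names 'low'.
type Confidence = 'low' | 'medium' | 'high';

interface DimScore {
  score: number;
  confidence: Confidence;
}
type Weights = Record<Dimension, number>;

// Invariant from the changeset: every weight set sums to 1.0 within 1e-6.
function assertWeightsSumToOne(weights: Weights): void {
  const sum = DIMENSIONS.reduce((acc, d) => acc + weights[d], 0);
  if (Math.abs(sum - 1) > 1e-6) {
    throw new Error(`weights sum to ${sum}, expected 1.0`);
  }
}

// Weighted aggregate; the rollup confidence is the lowest dimension confidence,
// which implies "any dim low -> rollup low".
function rollup(scores: Record<Dimension, DimScore>, weights: Weights): DimScore {
  assertWeightsSumToOne(weights);
  const score = DIMENSIONS.reduce((acc, d) => acc + scores[d].score * weights[d], 0);
  const order: Confidence[] = ['low', 'medium', 'high'];
  const lowest = Math.min(...DIMENSIONS.map((d) => order.indexOf(scores[d].confidence)));
  return { score, confidence: order[lowest] };
}

// Illustrative weights only, not the values in rollup-weights.ts.
const exampleWeights: Weights = {
  product_intent: 0.3,
  visual_craft: 0.15,
  trust_clarity: 0.15,
  workflow: 0.3,
  content_ia: 0.1,
};
```

With these example weights, dimension scores of 8, 6, 7, 5, and 9 (in the order listed) roll up to 6.75.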
diff --git a/.changeset/design-audit-layer-7-ethics-gate.md b/.changeset/design-audit-layer-7-ethics-gate.md
@@ -0,0 +1,16 @@
+---
+'@tangle-network/browser-agent-driver': minor
+---
+
+feat(design-audit): Layer 7 — domain ethics gate (+ Layer 6 composable predicates)
+
+Adds a hard score floor for pages that fail domain-specific ethics rules and the predicate vocabulary that lets those rules target the right audience/modality/regulatory context. RFC: `docs/rfc/design-audit-world-class.md`.
+
+- **Ethics rule set** (`src/design/audit/ethics/rules/{medical,kids,finance,legal}.yaml`) — curated, citation-backed rules covering medication dosage disclosure (FDA 21 CFR 201.57), kid-facing dark-pattern guards (COPPA, FTC Endorsement Guides), finance fee disclosure (TILA / Reg Z), and legal disclaimer presence.
+- **Detector kinds** (`src/design/audit/ethics/check.ts`) — `pattern-absent`, `pattern-present`, `llm-classifier`. Pattern checks are case-insensitive against page text; the LLM classifier asks for a single yes/no token to keep latency + cost predictable.
+- **Hard rollup floor** — a `critical-floor` violation caps the rollup at 4; `major-floor` caps at 6. `PageAuditResult.preEthicsScore` preserves the LLM's pre-cap score so reports can show "would have scored 8, capped at 4 — fix the dosage disclosure".
+- **Composable predicates (Layer 6)** — extends `AppliesWhen` with `audience`, `modality`, `regulatoryContext`, and `audienceVulnerability`. A pediatric medical app on tablet for clinicians now matches the medical *and* kids rule sets simultaneously instead of forcing one classification.
+- **CLI flags**: `--skip-ethics` (test-only bypass, audited + warned), `--ethics-rules-dir <path>` (override the builtin yaml), `--audience`, `--modality`, `--audience-vulnerability` (comma-separated tag lists threaded into rule matching).
+- **Fixtures** (`bench/design/ethics-fixtures/`) — paired pass/fail HTML for each rule category, used by `tests/design-audit-ethics-{rules,check}.test.ts`.
+
+Backwards compat: rules ship empty by default for any classification not on the curated list, so existing audits see no change unless they opt in via `--audience`/`--modality` or land on a covered domain. `EthicsViolation` is exported from both `src/design/audit/types.ts` and `v2/types.ts`; `PageAuditResult.ethicsViolations` is optional.
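
The hard rollup floor described above can be sketched as a cap applied after scoring. This is a minimal illustration under stated assumptions: the `EthicsViolation` shape here is simplified, and its `ruleId` and `floor` field names, along with `applyEthicsFloor`, are hypothetical.

```typescript
type FloorKind = 'critical-floor' | 'major-floor';

// Simplified, hypothetical shape; the exported EthicsViolation may differ.
interface EthicsViolation {
  ruleId: string;
  floor: FloorKind;
}

// Caps taken from the changeset: critical-floor -> 4, major-floor -> 6.
const ROLLUP_CAP: Record<FloorKind, number> = {
  'critical-floor': 4,
  'major-floor': 6,
};

interface CappedScore {
  score: number;
  preEthicsScore: number;
}

// The strictest violation wins; the uncapped score is kept for reporting.
function applyEthicsFloor(llmScore: number, violations: EthicsViolation[]): CappedScore {
  const cap = violations.reduce(
    (lowest, v) => Math.min(lowest, ROLLUP_CAP[v.floor]),
    Infinity,
  );
  return { score: Math.min(llmScore, cap), preEthicsScore: llmScore };
}
```

For the "would have scored 8, capped at 4" case, `applyEthicsFloor(8, [{ ruleId: 'dosage', floor: 'critical-floor' }])` yields `score: 4` and `preEthicsScore: 8`. A page that already scores below the cap is left unchanged.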
diff --git a/.changeset/jobs-reports-content-engine.md b/.changeset/jobs-reports-content-engine.md
@@ -0,0 +1,28 @@
+---
+'@tangle-network/browser-agent-driver': minor
+---
+
+feat(jobs+reports): comparative-audit jobs API + AI SDK report tool surface
+
+Three new modules layered cleanly on top of the existing audit pipeline. Lets you declaratively audit N URLs (optionally expanded into M historical wayback snapshots each), aggregate the results, and emit shareable markdown reports — or expose the same data as AI SDK tools so a browser-side agent can answer ad-hoc questions.
+
+**`src/jobs/`** — declarative comparative-audit jobs.
+- `JobSpec` JSON describes targets + audit options + cost cap; `createJob` mints and persists; `runJob` fans out with bounded concurrency and crash-safe per-result writes to `~/.bad/jobs/`.
+- Pre-flight cost estimate (`estimateCost`) refuses jobs that would silently spend more than `maxCostUSD`.
+- `AuditFn` injection keeps the queue decoupled from Playwright/LLM for tests.
+- CLI: `bad jobs create --spec <file.json>`, `bad jobs status <id>`, `bad jobs list`, `bad jobs estimate --spec <file.json>`.
+
+**`src/discover/`** — turn a `DiscoverSpec` into audit targets.
+- `wayback` source uses archive.org's CDX API to list captures, then samples `count` evenly across the time range.
+- `list` source is a pass-through.
+- Pluggable `fetch` for tests; status-200-only filter on by default so 4xx snapshots don't poison the job.
+
+**`src/reports/`** — turn a job into an artifact.
+- `aggregateJob` reads each per-target `report.json`, projects to `AggregateRow` (rollup, dimensions, ethics count). All numbers in any report flow through this — never an LLM.
+- `leaderboard`, `longitudinalFor`, `compareRuns`, `tierBuckets` are pure functions over rows.
+- `renderLeaderboard` / `renderLongitudinal` / `renderBatchComparison` produce deterministic markdown.
+- `narrateReport(brain, body)` optionally prepends an LLM exec-summary; without `brain`, returns the deterministic body unchanged. Same contract as the audit-patches layer: agent narrates, code computes.
+- `buildReportTools()` exposes a 7-tool AI SDK surface (`queryJob`, `fetchAudit`, `compareRuns`, `longitudinal`, `tierBuckets`, `renderTemplate`, `runFreshAudit`) so a browser-side agent can interrogate jobs without re-implementing aggregation.
+- CLI: `bad reports generate --job <id> --template <leaderboard|longitudinal|batch-comparison> [--top N --by-type X --buckets 10,100 --narrate --out file.md]`.
+
+**Tests:** +55 across `jobs-store`, `jobs-queue`, `jobs-cost-estimate`, `discover-wayback`, `reports-aggregate`, `reports-templates`, `reports-tools`. Total: 1448 passing.
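
One way the `wayback` source's "sample `count` evenly" step might look is sketched below. It assumes the captures are already sorted by timestamp and samples by position in that list, which only approximates even spacing in time when captures are clustered. `sampleEvenly` is a hypothetical helper, not a function named in this PR.

```typescript
// Pick `count` items spread evenly by index across an already-sorted list.
// The first and last items are always included when count >= 2.
function sampleEvenly<T>(items: readonly T[], count: number): T[] {
  if (count <= 0 || items.length === 0) return [];
  if (count >= items.length) return [...items];
  if (count === 1) return [items[0]];
  // step > 1 here, so the rounded indices are strictly increasing (no duplicates).
  const step = (items.length - 1) / (count - 1);
  return Array.from({ length: count }, (_, i) => items[Math.round(i * step)]);
}
```

Given ten captures and `count = 4`, this selects the captures at indices 0, 3, 6, and 9.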
diff --git a/bench/design/ethics-fixtures/finance-disclosed-fees.html b/bench/design/ethics-fixtures/finance-disclosed-fees.html
@@ -0,0 +1,24 @@
+<!doctype html>
+<html lang="en">
+<head>
+  <meta charset="utf-8" />
+  <title>SwiftPay — review and confirm</title>
+</head>
+<body>
+  <main>
+    <h1>Review your payment</h1>
+    <dl>
+      <dt>You send</dt><dd>$1,000.00 USD</dd>
+      <dt>Network fee</dt><dd>$2.50</dd>
+      <dt>FX rate</dt><dd>1 USD = 0.92 EUR</dd>
+      <dt>Conversion fee (1.5%)</dt><dd>$15.00</dd>
+      <dt>Tax (VAT)</dt><dd>$3.40</dd>
+      <dt>Total charged</dt><dd>$1,020.90 USD</dd>
+      <dt>Recipient gets</dt><dd>€905.30 EUR</dd>
+      <dt>APR (effective annual percentage rate, if held)</dt><dd>0.0%</dd>
+    </dl>
+    <p>All fees, taxes, and FX are disclosed above. By tapping Pay you authorize SwiftPay to charge $1,020.90.</p>
+    <button>Pay now</button>
+  </main>
+</body>
+</html>
diff --git a/bench/design/ethics-fixtures/finance-hidden-fees.html b/bench/design/ethics-fixtures/finance-hidden-fees.html
@@ -0,0 +1,16 @@
+<!doctype html>
+<html lang="en">
+<head>
+  <meta charset="utf-8" />
+  <title>SwiftPay — Send money</title>
+</head>
+<body>
+  <main>
+    <h1>Send $1,000 to Alex</h1>
+    <p>You send: $1,000.00</p>
+    <p>Recipient gets: $1,000.00</p>
+    <button>Pay now</button>
+    <p style="font-size:6px;color:#fff">Network, FX, conversion, and processing fees may apply. See terms.</p>
+  </main>
+</body>
+</html>
diff --git a/bench/design/ethics-fixtures/gdpr-no-consent.html b/bench/design/ethics-fixtures/gdpr-no-consent.html
@@ -0,0 +1,16 @@
+<!doctype html>
+<html lang="en">
+<head>
+  <meta charset="utf-8" />
+  <title>EuroShop — Home</title>
+</head>
+<body>
+  <header><h1>EuroShop</h1></header>
+  <main>
+    <h2>Best deals in Berlin today</h2>
+    <p>Trending products curated for the EU market.</p>
+    <button>Shop now</button>
+  </main>
+  <footer><p>EuroShop GmbH 2026</p></footer>
+</body>
+</html>
diff --git a/bench/design/ethics-fixtures/gdpr-with-consent.html b/bench/design/ethics-fixtures/gdpr-with-consent.html
@@ -0,0 +1,31 @@
+<!doctype html>
+<html lang="en">
+<head>
+  <meta charset="utf-8" />
+  <title>EuroShop — Home</title>
+</head>
+<body>
+  <div role="dialog" aria-label="Cookie consent">
+    <h2>We use cookies</h2>
+    <p>We use necessary and optional cookies. Manage your preferences below.</p>
+    <fieldset>
+      <legend>Cookie consent preferences</legend>
+      <label><input type="checkbox" checked disabled /> Necessary</label>
+      <label><input type="checkbox" /> Analytics</label>
+      <label><input type="checkbox" /> Marketing</label>
+    </fieldset>
+    <button>Accept selected</button>
+    <button>Reject all</button>
+  </div>
+  <header><h1>EuroShop</h1></header>
+  <main>
+    <h2>Best deals in Berlin today</h2>
+    <p>Trending products curated for the EU market.</p>
+    <button>Shop now</button>
+  </main>
+  <footer>
+    <a href="/privacy">Privacy policy</a> ·
+    <a href="/privacy/choices">Your privacy choices (Do Not Sell or Share)</a>
+  </footer>
+</body>
+</html>
diff --git a/bench/design/ethics-fixtures/kids-age-gated.html b/bench/design/ethics-fixtures/kids-age-gated.html
@@ -0,0 +1,19 @@
+<!doctype html>
+<html lang="en">
+<head>
+  <meta charset="utf-8" />
+  <title>SuperKids — verify your age</title>
+</head>
+<body>
+  <main>
+    <h1>Welcome to SuperKids</h1>
+    <form>
+      <label for="dob">Date of birth (we verify your age before letting you play):</label>
+      <input id="dob" name="dob" type="date" required />
+      <p>Enter your age so we can keep things appropriate. Parents will be notified.</p>
+      <button type="submit">Continue</button>
+    </form>
+    <p>We only collect what is strictly necessary for the activity. Optional fields are clearly marked.</p>
+  </main>
+</body>
+</html>
diff --git a/bench/design/ethics-fixtures/kids-dark-pattern.html b/bench/design/ethics-fixtures/kids-dark-pattern.html
@@ -0,0 +1,17 @@
+<!doctype html>
+<html lang="en">
+<head>
+  <meta charset="utf-8" />
+  <title>SuperKids Coins!</title>
+</head>
+<body>
+  <main>
+    <h1>Hi friend! Get 100 SuperCoins NOW!</h1>
+    <p style="color:red;font-size:32px"><strong>HURRY! Only 3 left!</strong></p>
+    <p>Tap YES or your unicorn will be sad forever 🥺</p>
+    <button style="background:lime;font-size:48px">YES, give me coins!</button>
+    <a href="#" style="font-size:8px;color:#ccc">no thanks, I hate fun</a>
+    <p style="font-size:8px;color:#ddd">Cost: $4.99 charged to mom's card. Auto-renews monthly. Cancel by writing a letter.</p>
+  </main>
+</body>
+</html>
diff --git a/bench/design/ethics-fixtures/medical-no-dosage.html b/bench/design/ethics-fixtures/medical-no-dosage.html
@@ -0,0 +1,18 @@
+<!doctype html>
+<html lang="en">
+<head>
+  <meta charset="utf-8" />
+  <title>RxMed — Order amoxicillin</title>
+</head>
+<body>
+  <header><h1>RxMed Pharmacy</h1></header>
+  <main>
+    <h2>Amoxicillin 500mg</h2>
+    <p>Take this medication as your doctor recommends.</p>
+    <p>Available in 30-tablet packs.</p>
+    <button>Add to cart</button>
+    <button>Refill prescription</button>
+  </main>
+  <footer><p>RxMed &copy; 2026</p></footer>
+</body>
+</html>
diff --git a/bench/design/ethics-fixtures/medical-with-dosage.html b/bench/design/ethics-fixtures/medical-with-dosage.html
@@ -0,0 +1,24 @@
+<!doctype html>
+<html lang="en">
+<head>
+  <meta charset="utf-8" />
+  <title>RxMed — Order amoxicillin (Rx)</title>
+</head>
+<body>
+  <header><h1>RxMed Pharmacy</h1></header>
+  <main>
+    <h2>Amoxicillin 500mg — Prescribing Information</h2>
+    <section aria-labelledby="dose-h">
+      <h3 id="dose-h">Dosage and administration</h3>
+      <p>Adults: 500 mg orally every 8 hours. Adjust dosage for renal impairment.</p>
+    </section>
+    <section aria-labelledby="warn-h">
+      <h3 id="warn-h">Warnings and contraindications</h3>
+      <p>Contraindication: hypersensitivity to penicillin.</p>
+      <p>Adverse effects: nausea, diarrhea, rare anaphylaxis. Report any side effect to MedWatch (FDA 1088).</p>
+    </section>
+    <button>Add to cart</button>
+    <p><a href="/medwatch">Report a side effect</a> (MedWatch).</p>
+  </main>
+</body>
+</html>
diff --git a/package.json b/package.json
@@ -133,6 +133,7 @@
     "pixelmatch": "^7.1.0",
     "playwright": "^1.40.0",
     "pngjs": "^7.0.0",
+    "tsx": "^4.21.0",
     "typescript": "^5.3.0",
     "vitest": "^4.0.18"
   }