Luis85 · Luis85 · May 17, 2026 · May 17, 2026 · May 17, 2026 · May 17, 2026
@@ -13,6 +13,7 @@ Detailed documentation for the POC. Start with the project [README](../README.md
 |---|---|
 | [`architecture.md`](architecture.md) | You want the full architecture picture — system map, user flow, data flow, engine internals, module graph, with Mermaid diagrams. **Start here.** |
 | [`workflow.md`](workflow.md) | You want to run the `plan` → paste → `validate` → `report` loop end-to-end. |
+| [`report-reference.md`](report-reference.md) | You want a single-page overview of the HTML report — section-by-section walkthrough, the five perspectives that shaped it, what's still open. |
 | [`dsl-reference.md`](dsl-reference.md) | You're writing or reading a rule file and need the full YAML grammar — every operator, every grouping construct. |
 | [`audit-trail.md`](audit-trail.md) | You need to replay a verdict, diff two verdicts, or map the audit trail to EU AI Act / ISO 42001 requirements. |
 | [`compliance.md`](compliance.md) | You're scoping the pattern against a regulation or standard — what the POC ticks natively, what the adopter must still provide, what's out of scope. |

@@ -0,0 +1,170 @@
+---
+title: Report reference
+folder: experiments/rule-engine-poc/docs
+description: A single-page overview of the HTML report — what each section is for, what the five product perspectives that shaped it found, what's implemented, and what's still open.
+entry_point: false
+---
+
+# Report reference
+
+This is the meta-doc for the HTML report itself — the user-facing artifact of the POC. It consolidates what `architecture.md`, `workflow.md`, `audit-trail.md`, and the five wave-4 research artifacts (`research/17`–`21`) say about the report into one place.
+
+## Contents
+
+1. [What the report is](#1-what-the-report-is)
+2. [Section-by-section walkthrough](#2-section-by-section-walkthrough)
+3. [The five product perspectives that shaped it](#3-the-five-product-perspectives-that-shaped-it)
+4. [What got implemented (wave-4 delta)](#4-what-got-implemented-wave-4-delta)
+5. [What's still open](#5-whats-still-open)
+6. [Generating one](#6-generating-one)
+7. [Reading one](#7-reading-one)
+
+---
+
+## 1. What the report is
+
+A self-contained HTML file rendered by `src/html-report.ts` from a `VerdictResult`. One report per target; one file per `npm run report` invocation. Inline CSS, no JavaScript, no external assets — it survives email forwarding, Slack attachment, S3 retention, and offline viewing.
+
+The report is the *only* document most readers will see. The terminal output and the JSON `--json` mode are for CI and operators; the HTML is for everyone else (PR reviewer, PM, EM, QA, compliance officer, auditor, the author a week later).
+
+Three committed samples under [`research/sample-reports/`](../research/sample-reports/) show the three primary verdict shapes — `blocked`, `needs-attention`, `ready-to-progress`.
+
+## 2. Section-by-section walkthrough
+
+The report renders top-to-bottom in this order. The order matters: it follows the reader's actual scan path established by `research/17` (UX audit) and `research/20` (auditor reading path).
+
+| Section | Job | Section header at render | Source |
+|---|---|---|---|
+| **System-identity header** | Tell a cold reader *what this is* — engine version + prominent timestamp | (no header — runs above the verdict card) | `research/20` Art. 13 "provider identity" gap |
+| **Verdict tile** | Categorical tier in 2-second-scan colour: blocked / needs-attention / ready-to-progress / unknown | (the headline) | engine `verdict` |
+| **Stats line** | "N rule(s) fired · M action(s) to take" — quantifies how contested the decision is | (under the tile) | `result.evaluations`, `result.actions` |
+| **Blocker-by-absence banner** | "X rules could not be evaluated because the LLM did not supply Y, Z, W" — yellow, adjacent to verdict | (conditional banner) | `research/21` skim-trap finding |
+| **Skip-validate banner** | "WARNING: validation gate was skipped" — when `--skip-validate` was set | (conditional banner) | `research/18` + `research/21` trust calibration |
+| **Verdict-tier + glyph legend** | Collapsed `<details>` explaining blocked / needs-attention / ready-to-progress / unknown + `[+] / [-] / [?]` glyph meanings | "Glossary." | `research/17` + `research/20` |
+| **Weighted tally** | Per-tier weight totals, side-by-side with actions | "Weighted tally." | engine `weightedTally` |
+| **Suggested actions** | Imperative-voice action sentences in priority-of-cause order (not alphabetical) | "Take these actions." | engine `evaluations` walked in priority desc; `rules/action-glossary.yaml` if present |
+| **Extraction flags** | The LLM's structured output as a table | "Extraction flags." | `ctx.flags` |
+| **What fired** | Matched rules only, in priority order, each with rule id + description + flags it matched on + actions it contributed | "What fired." | `result.evaluations` filtered to `matched === true` |
+| **Audit trail** | Every rule evaluation, matched + skipped. Skipped rules collapse to `<details>` summary by default | "Audit trail." | `result.evaluations` |
+| **Reproduce block** | Shell-quoted command + the three replay anchors (engine version + ruleset hash + flags hash) | "How to reproduce." | `research/20` Art. 12 replay manifest |
+| **Provenance** | Hash preamble + 12-char truncated hashes + file paths | "Provenance." | `result.rulesetHash`, `result.flagsHash`, `result.engineVersion` |
+
+The CSS is inline and uses a 3-tier severity palette (red / amber / green) with non-colour-only signals (glyphs, row washes, section headings). A `@media (max-width: 540px)` rule collapses the summary grid to a single column on phones.
+
+## 3. The five product perspectives that shaped it
+
+Wave 4 dispatched five subagents in parallel against three committed sample renders. Each reviewed the report through a different lens. The convergent findings drove the v3 rebuild in the wave-4 implementer pass.
+
+### UX (`research/17-report-ux-audit.md`)
+
+> "The audit trail buries what matters. With ~21 rules and 1–2 matches per sample, the page is ~95% 'did not match' content."
+
+Top recommendations: a **What fired** section above the full audit trail; collapse-by-default for skipped rules; replace "X of 21 rules matched" coverage text with per-tier-appropriate phrasing; sort actions by priority-of-cause not alphabetical; a `cond--miss` row-wash to match `cond--missing`'s amber; a `@media (max-width: 540px)` single-column fallback.
+
+### Stakeholder strategy (`research/18-report-stakeholders.md`)
+
+> "The report is one artifact serving six different first-fields — PR-reviewers want the verdict pill, compliance wants the provenance block, PMs want suggested actions, auditors want the hashes, authors want the flags."
+
+Top recommendations: expand action slugs (`kick-ci`) to human sentences (`"Re-run the failing CI job."`) via an `actions[].human` field — implemented as a sidecar glossary in `rules/action-glossary.yaml`; introduce a `label_set` config (default `dev`; `pm`, `qa`, `compliance` as presentational overrides) so headline labels match the reader; flag a portfolio dashboard as the first hosted-SaaS gravity seam — *do not build it yet*.
+
+### Brand (`research/19-report-brand-review.md`)
+
+Verdict: **pass-with-findings** (not S1-blocking under the sandbox scope). On-temperament (no emoji, no gradients, no icons, restrained density, ASCII `[+]/[-]/[?]` glyphs correctly used as monospace-iconography). Off-token: 18 distinct literal hex values, hand-picked font stacks, near-white page background where Specorator calls for cream. Section headers should be sentence-case with periods — implemented. Open ADR-shaped decision: Specorator has no red token; the `blocked` tier currently uses literal `#fdecea / #d8281b / #7a160d` and stays that way until graduation.
+
+### Auditor readability (`research/20-report-auditor-readability.md`)
+
+> "30-second test passes... what the 30-second test fails on: the report does not name what kind of system this is, who built it, what version of the workflow it governs, or what the verdict is binding against."
+
+Implemented: system-identity header (engine version + prominent timestamp), reproduce-block with the three replay anchors, verdict-tier glossary, glyph legend. Still open: provider identity / contact, model-card link, capabilities-and-limitations sentence, expected-lifetime statement, reviewer-of-record field. Closes `research/02`'s "human-readable rationale presentation" open item; the remainder is governance, not engineering.
+
+### Misread risks (`research/21-report-misread-risks.md`)
+
+Three flagged misread paths:
+
+1. **The skim trap** — a busy reader looks only at the verdict tile and action list, missing context. Implemented mitigation: blocker-by-absence banner is rendered at the same visual weight as the verdict card.
+2. **`verified` badge as trust trap** — a green pill saying `verified` will read as "extraction verified" when it only means "extraction is bound to current inputs". Implemented mitigation: tooltip on the badge explicitly says *bound to current inputs, not flag-correctness*. Plus the `--skip-validate` banner is now visually equal-weight to the verdict so a skipped-validation run cannot pass undetected.
+3. **Blocker-by-absence is the most dangerous skim path** — a high-priority blocker whose input flag was never extracted simply doesn't fire. Implemented mitigation: dedicated banner naming the missing flags AND the count of un-evaluable rules (counted using `matched === false`, fixed in Codex round 12 to exclude rules that fired via `when.any` despite one missing branch).
+
+## 4. What got implemented (wave-4 delta)
+
+The 12 changes that landed across the wave-4 implementer pass (Agent B's RALPH loop) and the subsequent Codex round 11–14 hardening:
+
+| # | Change | Research source |
+|---|---|---|
+| 1 | "What fired" section above audit trail | `17`, `20`, `21` |
+| 2 | Non-matched rules collapsed via `<details>` | `17` |
+| 3 | Blocker-by-absence banner naming missing flags | `21`, `17` |
+| 4 | Suggested actions in priority-of-cause order | `17` |
+| 5 | Action human-sentences from glossary | `18` |
+| 6 | Provenance preamble + reproduce block + 12-char hash truncation | `17`, `20`, `18` |
+| 7 | System-identity header + prominent timestamp | `20` |
+| 8 | Verdict-tier + glyph legend in collapsed `<details>` | `20`, `17` |
+| 9 | `cond--miss` row wash matching `cond--missing` amber | `17` |
+| 10 | `@media (max-width: 540px)` single-column fallback | `17` |
+| 11 | `--skip-validate` banner + `verified` badge tooltip | `18`, `21` |
+| 12 | Sentence-case headers + imperative voice ("Take these actions.") | `19` |
+
+Plus three follow-up Codex hardenings that were caught after the wave-4 push:
+
+- **Round 11** (`90f3fe1`) — `openInBrowser` waits for the spawned process to exit cleanly (not just `spawn`); `takeOpt` rejects missing values for `--config` / `--target`.
+- **Round 12** (`eb01077`) — `missingFlagNames` only counts rules whose final outcome was determined by absence (excludes `when.any` rules that matched another branch); reproduce-command paths are shell-quoted.
+- **Round 13** (`003a05e`) — single-shot `cli.ts::takeOption` rejects missing `--html` values.
+
+Test surface for the report: 28 tests in `test/html-report.test.ts` plus the report-flow integration tests in `test/report-flow.test.ts`.
+
+## 5. What's still open
+
+Deferred, with the bucket each lives in:
+
+| Item | Source | Bucket |
+|---|---|---|
+| `label_set` config (dev / pm / qa / compliance presets) | `research/18` | Strategy slice 14–18 |
+| Reader-specific export modes (PDF, markdown, Slack-friendly text) | `research/18` | Strategy slice |
+| Portfolio dashboard (one HTML across N targets) | `research/18` | Hosted-SaaS gravity seam — explicitly *do not build* per strategist |
+| Provider identity / model card / capabilities-and-limitations sentence | `research/20` | Governance (compliance.md "what's not in this POC") |
+| Reviewer-of-record field, override workflow | `research/20` | Governance |
+| Brand-token migration (replace 18 literal hex values with vars) | `research/19` | ADR at graduation |
+| Diff-against-previous-run | `research/21` | Production prep |
+| Confidence / uncertainty surface | `research/20` + `research/21` | Calibration study first |
+| RAT-A / RAT-B / RAT-C / RAT-D / RAT-E / RAT-F | `research/07`, `research/14` | Discovery activity — needs users, not more engineering |
+
+## 6. Generating one
+
+```bash
+# Plan (writes the prompt + sidecar)
+npm run plan -- --target <id>
+
+# User pastes the prompt into Claude / ChatGPT / Gemini, saves JSON to
+# extractions/<id>.json.
+
+# Validate (optional sanity check)
+npm run validate -- --target <id>
+
+# Report (renders HTML, opens browser best-effort)
+npm run report -- --target <id>
+# Or without opening a browser:
+npm run report -- --target <id> --no-open
+```
+
+Exit codes: `0` no blockers, `1` at least one `blocked` verdict, `2` missing / malformed extraction.
+
+For testing without the AI loop (single-shot fixture flow):
+
+```bash
+npx tsx src/cli.ts rules/quality-gates.yaml fixtures/blocked-missing-ears.json --html /tmp/preview.html
+```
+
+## 7. Reading one
+
+For the **report consumer** (not the POC operator):
+
+1. **Verdict tile** — colour and label. That's the answer.
+2. **Stats line** — how many rules fired? How many actions to take?
+3. **Banners** (if present) — blocker-by-absence flags missing inputs; skip-validate flag means the validation gate was bypassed (treat the verdict as advisory).
+4. **Take these actions** — the imperative-voice human sentences; if `[code](#)` slug is shown, hover for the technical name.
+5. **What fired** — the rules that drove the verdict, in priority-of-cause order. Each is a 1-paragraph card; expand the audit trail for everything that *didn't* fire.
+6. **Provenance** — the three hashes (engine version, ruleset, flags) let you verify the report came from a specific tuple. The reproduce-command block lets you re-run it locally.
+
+A reader who reads only steps 1–4 should still get the answer correct. The rest is depth on demand.
+
+See [`docs/audit-trail.md`](audit-trail.md) for replay mechanics and [`docs/compliance.md`](compliance.md) for which sections speak to which regulation.
@@ -1,8 +1,10 @@
 import { readdirSync } from "node:fs";
 import { join } from "node:path";
+import { fileURLToPath } from "node:url";
 import { spawnSync } from "node:child_process";
 
-const fixturesDir = new URL("../fixtures/", import.meta.url).pathname;
+// fileURLToPath handles percent-decoding and Windows drive paths.
+const fixturesDir = fileURLToPath(new URL("../fixtures/", import.meta.url));
 const rules = "rules/quality-gates.yaml";
 
 const files = readdirSync(fixturesDir)

@@ -1,8 +1,11 @@
 import { readdirSync, mkdirSync } from "node:fs";
 import { join, basename } from "node:path";
+import { fileURLToPath } from "node:url";
 import { spawnSync } from "node:child_process";
 
-const fixturesDir = new URL("../fixtures/", import.meta.url).pathname;
+// fileURLToPath handles percent-decoding and Windows drive paths;
+// `.pathname` alone breaks for both (Codex round 15 P2).
+const fixturesDir = fileURLToPath(new URL("../fixtures/", import.meta.url));
 const rules = "rules/quality-gates.yaml";
 const reportsDir = "reports";
 

@@ -268,13 +268,24 @@ export function renderHtmlReport(
       ? `<div class="banner banner--skip" role="alert"><strong>WARNING:</strong> validation gate was skipped (<code>--skip-validate</code>). Verdict and provenance are NOT verified against the flag schema or forbidden-fields policy.</div>`
       : "";
 
-  // Reproduce command: assembled from the same fields plan/report use.
-  // Codex round 12 P2: quote paths so paths with spaces or shell
-  // metacharacters (e.g., "My Projects/rules.yaml") don't break the
-  // command. Single-quote shell-escape: replace any ' inside the
-  // path with the four-char sequence '\'' .
-  const shellQuote = (s: string): string => `'${s.replace(/'/g, "'\\''")}'`;
-  const reproCmd = `npx tsx src/cli.ts ${shellQuote(ctx.rulesPath)} ${shellQuote(ctx.flagsPath)} --html <out.html> --quiet`;
+  // Reproduce command: render three flavours because the supported
+  // shells disagree on quoting AND on variable expansion:
+  // - POSIX (bash / zsh / sh): single-quote escape (' becomes '\'').
+  //   Single quotes suppress $var expansion.
+  // - cmd.exe: double-quote escape (" becomes ""). cmd doesn't
+  //   expand $var; it expands %VAR%, but our paths don't carry %.
+  // - PowerShell: single-quote escape (' becomes ''). Double quotes
+  //   in PowerShell EXPAND $var and $(), which can mutate the path
+  //   (Codex round 19 P2).
+  const posixQuote = (s: string): string => `'${s.replace(/'/g, "'\\''")}'`;
+  const cmdQuote = (s: string): string => `"${s.replace(/"/g, '""')}"`;
+  const psQuote = (s: string): string => `'${s.replace(/'/g, "''")}'`;
+  // Use a literal filename, NOT `<out.html>`: angle brackets are shell
+  // I/O redirection on all three, so a copy-paste would silently send
+  // --html no value (Codex round 18 P2).
+  const reproCmdPosix = `npx tsx src/cli.ts ${posixQuote(ctx.rulesPath)} ${posixQuote(ctx.flagsPath)} --html out.html --quiet`;
+  const reproCmdCmd = `npx tsx src/cli.ts ${cmdQuote(ctx.rulesPath)} ${cmdQuote(ctx.flagsPath)} --html out.html --quiet`;
+  const reproCmdPwsh = `npx tsx src/cli.ts ${psQuote(ctx.rulesPath)} ${psQuote(ctx.flagsPath)} --html out.html --quiet`;
 
   return `<!doctype html>
 <html lang="en">
@@ -491,7 +502,12 @@ export function renderHtmlReport(
     </p>
     <div class="reproduce">
       <p>How to reproduce &mdash; run from <code>experiments/rule-engine-poc/</code>:</p>
-      <pre><code>${esc(reproCmd)}</code></pre>
+      <p class="repro-label">POSIX (macOS, Linux, WSL, Git Bash):</p>
+      <pre><code>${esc(reproCmdPosix)}</code></pre>
+      <p class="repro-label">Windows cmd.exe:</p>
+      <pre><code>${esc(reproCmdCmd)}</code></pre>
+      <p class="repro-label">PowerShell (Windows / cross-platform):</p>
+      <pre><code>${esc(reproCmdPwsh)}</code></pre>
       <p>Then verify the three hashes above match the values in the regenerated report.</p>
     </div>
   </section>

@@ -137,6 +137,15 @@ function validate(
   if (typeof rule.priority !== "number") {
     throw new Error(`Rule '${rule.id}' missing numeric 'priority'`);
   }
+  // Codex round 15 P2: NaN/Infinity priorities silently break the
+  // documented sort order (b.priority - a.priority returns NaN, treated
+  // as 0), reordering the audit trail unpredictably. Same fail-fast
+  // discipline as weight + gt + lt.
+  if (!Number.isFinite(rule.priority)) {
+    throw new Error(
+      `Rule '${rule.id}' has non-finite 'priority' (got ${String(rule.priority)})`,
+    );
+  }
 }
 
 const CONDITION_OPS = [

@@ -104,13 +104,20 @@ export function validateExtraction(
       continue;
     }
     if (value === null) {
-      warnings.push({
-        severity: "warning",
-        code: "null-value-omit-instead",
+      // Codex round 16 P1: previously a warning, but the engine's
+      // `hasOwnProperty` presence check treats null as PRESENT — so
+      // `exists`/`ne` rules behave differently against {flag: null}
+      // than against {} despite the validator's old "null ≈ missing"
+      // claim. The right fix is to refuse null at the gate so the
+      // engine never sees it; LLMs are instructed to omit unknowns.
+      errors.push({
+        severity: "error",
+        code: "null-value-not-allowed",
         path: key,
         message:
-          `Flag '${key}' is null; prefer omitting unknowns over emitting null. ` +
-          `The engine will treat null and missing identically.`,
+          `Flag '${key}' is null; omit the field instead. ` +
+          `The engine's presence check treats null as PRESENT, which can ` +
+          `silently change verdicts for rules using 'exists' or 'ne'.`,
       });
       continue;
     }