diff --git a/experiments/rule-engine-poc/docs/README.md b/experiments/rule-engine-poc/docs/README.md
index a7c1b6e24..168dfd941 100644
--- a/experiments/rule-engine-poc/docs/README.md
+++ b/experiments/rule-engine-poc/docs/README.md
@@ -13,6 +13,7 @@ Detailed documentation for the POC. Start with the project [README](../README.md
|---|---|
| [`architecture.md`](architecture.md) | You want the full architecture picture — system map, user flow, data flow, engine internals, module graph, with Mermaid diagrams. **Start here.** |
| [`workflow.md`](workflow.md) | You want to run the `plan` → paste → `validate` → `report` loop end-to-end. |
+| [`report-reference.md`](report-reference.md) | You want a single-page overview of the HTML report — section-by-section walkthrough, the five perspectives that shaped it, what's still open. |
| [`dsl-reference.md`](dsl-reference.md) | You're writing or reading a rule file and need the full YAML grammar — every operator, every grouping construct. |
| [`audit-trail.md`](audit-trail.md) | You need to replay a verdict, diff two verdicts, or map the audit trail to EU AI Act / ISO 42001 requirements. |
| [`compliance.md`](compliance.md) | You're scoping the pattern against a regulation or standard — what the POC ticks natively, what the adopter must still provide, what's out of scope. |
diff --git a/experiments/rule-engine-poc/docs/report-reference.md b/experiments/rule-engine-poc/docs/report-reference.md
new file mode 100644
index 000000000..f3e2919cd
--- /dev/null
+++ b/experiments/rule-engine-poc/docs/report-reference.md
@@ -0,0 +1,170 @@
+---
+title: Report reference
+folder: experiments/rule-engine-poc/docs
+description: A single-page overview of the HTML report — what each section is for, what the five product perspectives that shaped it found, what's implemented, and what's still open.
+entry_point: false
+---
+
+# Report reference
+
+This is the meta-doc for the HTML report itself — the user-facing artifact of the POC. It consolidates what `architecture.md`, `workflow.md`, `audit-trail.md`, and the five wave-4 research artifacts (`research/17`–`21`) say about the report into one place.
+
+## Contents
+
+1. [What the report is](#1-what-the-report-is)
+2. [Section-by-section walkthrough](#2-section-by-section-walkthrough)
+3. [The five product perspectives that shaped it](#3-the-five-product-perspectives-that-shaped-it)
+4. [What got implemented (wave-4 delta)](#4-what-got-implemented-wave-4-delta)
+5. [What's still open](#5-whats-still-open)
+6. [Generating one](#6-generating-one)
+7. [Reading one](#7-reading-one)
+
+---
+
+## 1. What the report is
+
+A self-contained HTML file rendered by `src/html-report.ts` from a `VerdictResult`. One report per target; one file per `npm run report` invocation. Inline CSS, no JavaScript, no external assets — it survives email forwarding, Slack attachment, S3 retention, and offline viewing.
+
+The report is the *only* document most readers will see. The terminal output and the `--json` mode serve CI and operators; the HTML serves everyone else (PR reviewer, PM, EM, QA, compliance officer, auditor, and the author a week later).
+
+Three committed samples under [`research/sample-reports/`](../research/sample-reports/) show the three primary verdict shapes — `blocked`, `needs-attention`, `ready-to-progress`.
+
+## 2. Section-by-section walkthrough
+
+The report renders top-to-bottom in this order. The order matters: it follows the reader's actual scan path established by `research/17` (UX audit) and `research/20` (auditor reading path).
+
+| Section | Job | Section header at render | Source |
+|---|---|---|---|
+| **System-identity header** | Tell a cold reader *what this is* — engine version + prominent timestamp | (no header — runs above the verdict card) | `research/20` Art. 13 "provider identity" gap |
+| **Verdict tile** | Categorical tier in 2-second-scan colour: blocked / needs-attention / ready-to-progress / unknown | (the headline) | engine `verdict` |
+| **Stats line** | "N rule(s) fired · M action(s) to take" — quantifies how contested the decision is | (under the tile) | `result.evaluations`, `result.actions` |
+| **Blocker-by-absence banner** | "X rules could not be evaluated because the LLM did not supply Y, Z, W" — yellow, adjacent to verdict | (conditional banner) | `research/21` skim-trap finding |
+| **Skip-validate banner** | "WARNING: validation gate was skipped" — when `--skip-validate` was set | (conditional banner) | `research/18` + `research/21` trust calibration |
+| **Verdict-tier + glyph legend** | Collapsed `` explaining blocked / needs-attention / ready-to-progress / unknown + `[+] / [-] / [?]` glyph meanings | "Glossary." | `research/17` + `research/20` |
+| **Weighted tally** | Per-tier weight totals, side-by-side with actions | "Weighted tally." | engine `weightedTally` |
+| **Suggested actions** | Imperative-voice action sentences in priority-of-cause order (not alphabetical) | "Take these actions." | engine `evaluations` walked in priority desc; `rules/action-glossary.yaml` if present |
+| **Extraction flags** | The LLM's structured output as a table | "Extraction flags." | `ctx.flags` |
+| **What fired** | Matched rules only, in priority order, each with rule id + description + flags it matched on + actions it contributed | "What fired." | `result.evaluations` filtered to `matched === true` |
+| **Audit trail** | Every rule evaluation, matched + skipped. Skipped rules collapse to `` summary by default | "Audit trail." | `result.evaluations` |
+| **Reproduce block** | Shell-quoted commands for POSIX, cmd.exe, and PowerShell + the three replay anchors (engine version + ruleset hash + flags hash) | "How to reproduce." | `research/20` Art. 12 replay manifest |
+| **Provenance** | Hash preamble + 12-char truncated hashes + file paths | "Provenance." | `result.rulesetHash`, `result.flagsHash`, `result.engineVersion` |
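The **Suggested actions** row above orders actions by walking evaluations in descending priority instead of alphabetically. A minimal sketch of that ordering, assuming a simplified, hypothetical `Evaluation` shape (`ruleId`, `priority`, `matched`, `actions`) rather than the engine's real types, and assuming a repeated action keeps the position of its highest-priority cause:

```typescript
// Hypothetical, simplified shape of one rule evaluation; the engine's
// real type may carry more fields.
interface Evaluation {
  ruleId: string;
  priority: number;
  matched: boolean;
  actions: string[];
}

// Collect actions from matched rules, highest priority first, keeping
// only the first occurrence of each action slug.
function actionsByPriorityOfCause(evaluations: Evaluation[]): string[] {
  const ordered = [...evaluations]
    .filter((e) => e.matched)
    .sort((a, b) => b.priority - a.priority);
  const seen = new Set<string>();
  const out: string[] = [];
  for (const e of ordered) {
    for (const action of e.actions) {
      if (!seen.has(action)) {
        seen.add(action);
        out.push(action);
      }
    }
  }
  return out;
}

const actions = actionsByPriorityOfCause([
  { ruleId: "low", priority: 10, matched: true, actions: ["add-tests", "kick-ci"] },
  { ruleId: "skipped", priority: 99, matched: false, actions: ["ignored"] },
  { ruleId: "high", priority: 90, matched: true, actions: ["kick-ci"] },
]);
console.log(actions.join(", ")); // kick-ci, add-tests
```

Sorted alphabetically, `add-tests` would come first; priority-of-cause puts `kick-ci` first because the priority-90 rule asked for it. `Array.prototype.sort` is stable in current JavaScript engines, so rules with equal priority keep their input order.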
+
+The CSS is inline and uses a 3-tier severity palette (red / amber / green), backed by signals that do not rely on colour alone (glyphs, row washes, section headings). A `@media (max-width: 540px)` rule collapses the summary grid to a single column on phones.
+
+## 3. The five product perspectives that shaped it
+
+Wave 4 dispatched five subagents in parallel against three committed sample renders. Each reviewed the report through a different lens. The convergent findings drove the v3 rebuild in the wave-4 implementer pass.
+
+### UX (`research/17-report-ux-audit.md`)
+
+> "The audit trail buries what matters. With ~21 rules and 1–2 matches per sample, the page is ~95% 'did not match' content."
+
+Top recommendations: a **What fired** section above the full audit trail; collapse-by-default for skipped rules; replace "X of 21 rules matched" coverage text with per-tier-appropriate phrasing; sort actions by priority-of-cause rather than alphabetically; a `cond--miss` row-wash to match `cond--missing`'s amber; a `@media (max-width: 540px)` single-column fallback.
+
+### Stakeholder strategy (`research/18-report-stakeholders.md`)
+
+> "The report is one artifact serving six different first-fields — PR-reviewers want the verdict pill, compliance wants the provenance block, PMs want suggested actions, auditors want the hashes, authors want the flags."
+
+Top recommendations: expand action slugs (`kick-ci`) to human sentences (`"Re-run the failing CI job."`) via an `actions[].human` field — implemented as a sidecar glossary in `rules/action-glossary.yaml`; introduce a `label_set` config (default `dev`; `pm`, `qa`, `compliance` as presentational overrides) so headline labels match the reader; flag a portfolio dashboard as the first hosted-SaaS gravity seam — *do not build it yet*.
+
+### Brand (`research/19-report-brand-review.md`)
+
+Verdict: **pass-with-findings** (not S1-blocking under the sandbox scope). On-temperament (no emoji, no gradients, no icons, restrained density, ASCII `[+]/[-]/[?]` glyphs correctly used as monospace-iconography). Off-token: 18 distinct literal hex values, hand-picked font stacks, near-white page background where Specorator calls for cream. Section headers should be sentence-case with periods — implemented. Open ADR-shaped decision: Specorator has no red token; the `blocked` tier currently uses literal `#fdecea / #d8281b / #7a160d` and stays that way until graduation.
+
+### Auditor readability (`research/20-report-auditor-readability.md`)
+
+> "30-second test passes... what the 30-second test fails on: the report does not name what kind of system this is, who built it, what version of the workflow it governs, or what the verdict is binding against."
+
+Implemented: system-identity header (engine version + prominent timestamp), reproduce-block with the three replay anchors, verdict-tier glossary, glyph legend. Still open: provider identity / contact, model-card link, capabilities-and-limitations sentence, expected-lifetime statement, reviewer-of-record field. Closes `research/02`'s "human-readable rationale presentation" open item; the remainder is governance, not engineering.
+
+### Misread risks (`research/21-report-misread-risks.md`)
+
+Three flagged misread paths:
+
+1. **The skim trap** — a busy reader looks only at the verdict tile and action list, missing context. Implemented mitigation: blocker-by-absence banner is rendered at the same visual weight as the verdict card.
+2. **`verified` badge as trust trap** — a green pill saying `verified` reads as "extraction verified" when it only means "extraction is bound to current inputs". Implemented mitigation: the badge's tooltip says explicitly *bound to current inputs, not flag-correctness*. The `--skip-validate` banner is also rendered at the same visual weight as the verdict, so a run with validation skipped cannot pass unnoticed.
+3. **Blocker-by-absence is the most dangerous skim path** — a high-priority blocker whose input flag was never extracted simply doesn't fire. Implemented mitigation: a dedicated banner naming the missing flags AND the count of un-evaluable rules. The count includes only rules with `matched === false`; Codex round 12 fixed it to exclude rules that fired via `when.any` even though one branch's flag was missing.
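The blocker-by-absence count in item 3 can be sketched as below. This is an illustration, not the POC's `missingFlagNames`: it assumes each evaluation carries a hypothetical `missingFlags` list (flags the rule referenced that were absent from the extraction) and approximates "outcome determined by absence" as "did not match and referenced at least one missing flag". Rule and flag names are made up.

```typescript
// Hypothetical evaluation shape: `missingFlags` lists flags the rule
// referenced that were absent from the extraction.
interface AbsenceAwareEvaluation {
  ruleId: string;
  matched: boolean;
  missingFlags: string[];
}

interface AbsenceSummary {
  unevaluableRules: number;
  missingFlags: string[];
}

// Only rules that did NOT match and had at least one missing flag count
// as blocked by absence. A `when.any` rule that matched through another
// branch is skipped even though one of its flags is missing.
function summariseAbsence(evaluations: AbsenceAwareEvaluation[]): AbsenceSummary {
  const flags = new Set<string>();
  let unevaluableRules = 0;
  for (const e of evaluations) {
    if (e.matched || e.missingFlags.length === 0) continue;
    unevaluableRules += 1;
    for (const f of e.missingFlags) flags.add(f);
  }
  return { unevaluableRules, missingFlags: Array.from(flags).sort() };
}

const summary = summariseAbsence([
  { ruleId: "ears-missing", matched: false, missingFlags: ["ears_present"] },
  { ruleId: "any-branch", matched: true, missingFlags: ["coverage_pct"] },
  { ruleId: "plain-miss", matched: false, missingFlags: [] },
  { ruleId: "ci-and-tests", matched: false, missingFlags: ["tests_added", "ears_present"] },
]);
console.log(summary.unevaluableRules, summary.missingFlags.join(", ")); // 2 ears_present, tests_added
```

The `any-branch` rule shows the round 12 fix: it matched, so its missing `coverage_pct` neither counts toward the banner nor appears in it.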
+
+## 4. What got implemented (wave-4 delta)
+
+The 12 changes that landed across the wave-4 implementer pass (Agent B's RALPH loop) and the subsequent Codex round 11–14 hardening:
+
+| # | Change | Research source |
+|---|---|---|
+| 1 | "What fired" section above audit trail | `17`, `20`, `21` |
+| 2 | Non-matched rules collapsed via `` | `17` |
+| 3 | Blocker-by-absence banner naming missing flags | `21`, `17` |
+| 4 | Suggested actions in priority-of-cause order | `17` |
+| 5 | Action human-sentences from glossary | `18` |
+| 6 | Provenance preamble + reproduce block + 12-char hash truncation | `17`, `20`, `18` |
+| 7 | System-identity header + prominent timestamp | `20` |
+| 8 | Verdict-tier + glyph legend in collapsed `` | `20`, `17` |
+| 9 | `cond--miss` row wash matching `cond--missing` amber | `17` |
+| 10 | `@media (max-width: 540px)` single-column fallback | `17` |
+| 11 | `--skip-validate` banner + `verified` badge tooltip | `18`, `21` |
+| 12 | Sentence-case headers + imperative voice ("Take these actions.") | `19` |
+
+Three follow-up Codex hardening rounds then fixed issues caught after the wave-4 push:
+
+- **Round 11** (`90f3fe1`) — `openInBrowser` waits for the spawned process to exit cleanly (not just `spawn`); `takeOpt` rejects missing values for `--config` / `--target`.
+- **Round 12** (`eb01077`) — `missingFlagNames` only counts rules whose final outcome was determined by absence (excludes `when.any` rules that matched another branch); reproduce-command paths are shell-quoted.
+- **Round 13** (`003a05e`) — single-shot `cli.ts::takeOption` rejects missing `--html` values.
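The round 12 quoting now renders in three forms (POSIX, cmd.exe, PowerShell). The three helpers below are copied from `src/html-report.ts` in this change; the sample path is hypothetical and exists only to show how each form treats a space, a single quote, and a `$`:

```typescript
// The three quoting helpers, as in src/html-report.ts.
const posixQuote = (s: string): string => `'${s.replace(/'/g, "'\\''")}'`;
const cmdQuote = (s: string): string => `"${s.replace(/"/g, '""')}"`;
const psQuote = (s: string): string => `'${s.replace(/'/g, "''")}'`;

// Hypothetical path containing a space, a single quote, and a `$`.
const samplePath = "My Projects/bob's $rules.yaml";

console.log(posixQuote(samplePath)); // 'My Projects/bob'\''s $rules.yaml'
console.log(cmdQuote(samplePath)); // "My Projects/bob's $rules.yaml"
console.log(psQuote(samplePath)); // 'My Projects/bob''s $rules.yaml'
```

POSIX shells and PowerShell both keep `$rules` literal inside single quotes; they differ only in how an embedded `'` is escaped.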
+
+Test surface for the report: 28 tests in `test/html-report.test.ts` plus the report-flow integration tests in `test/report-flow.test.ts`.
+
+## 5. What's still open
+
+Deferred, with the bucket each lives in:
+
+| Item | Source | Bucket |
+|---|---|---|
+| `label_set` config (dev / pm / qa / compliance presets) | `research/18` | Strategy slice 14–18 |
+| Reader-specific export modes (PDF, markdown, Slack-friendly text) | `research/18` | Strategy slice |
+| Portfolio dashboard (one HTML across N targets) | `research/18` | Hosted-SaaS gravity seam — explicitly *do not build* per strategist |
+| Provider identity / model card / capabilities-and-limitations sentence | `research/20` | Governance (compliance.md "what's not in this POC") |
+| Reviewer-of-record field, override workflow | `research/20` | Governance |
+| Brand-token migration (replace 18 literal hex values with vars) | `research/19` | ADR at graduation |
+| Diff-against-previous-run | `research/21` | Production prep |
+| Confidence / uncertainty surface | `research/20` + `research/21` | Calibration study first |
+| RAT-A / RAT-B / RAT-C / RAT-D / RAT-E / RAT-F | `research/07`, `research/14` | Discovery activity — needs users, not more engineering |
+
+## 6. Generating one
+
+```bash
+# Plan (writes the prompt + sidecar)
+npm run plan -- --target
+
+# User pastes the prompt into Claude / ChatGPT / Gemini, saves JSON to
+# extractions/.json.
+
+# Validate (optional sanity check)
+npm run validate -- --target
+
+# Report (renders HTML, opens browser best-effort)
+npm run report -- --target
+# Or without opening a browser:
+npm run report -- --target --no-open
+```
+
+Exit codes: `0` no blockers, `1` at least one `blocked` verdict, `2` missing / malformed extraction.
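Read literally, the exit-code rule above maps as follows. The `ReportOutcome` type and `exitCodeFor` function are hypothetical illustrations, not the CLI's code; in particular, treating `needs-attention` and `unknown` as exit `0` follows only from "`0` no blockers" and is an assumption:

```typescript
// Verdict tiers as named in this document.
type Verdict = "blocked" | "needs-attention" | "ready-to-progress" | "unknown";

// Hypothetical result of a `report` run: either the extraction loaded
// and produced verdicts, or it was missing / malformed.
type ReportOutcome =
  | { kind: "evaluated"; verdicts: Verdict[] }
  | { kind: "extraction-error"; reason: "missing" | "malformed" };

// 2 = missing / malformed extraction, 1 = at least one `blocked`
// verdict, 0 = no blockers (needs-attention and unknown included).
function exitCodeFor(outcome: ReportOutcome): 0 | 1 | 2 {
  if (outcome.kind === "extraction-error") return 2;
  return outcome.verdicts.includes("blocked") ? 1 : 0;
}

console.log(exitCodeFor({ kind: "evaluated", verdicts: ["needs-attention"] })); // 0
console.log(exitCodeFor({ kind: "evaluated", verdicts: ["ready-to-progress", "blocked"] })); // 1
console.log(exitCodeFor({ kind: "extraction-error", reason: "malformed" })); // 2
```

Checking the extraction error first keeps exit `2` distinct: a run with no usable extraction never reports "no blockers".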
+
+For testing without the AI loop (single-shot fixture flow):
+
+```bash
+npx tsx src/cli.ts rules/quality-gates.yaml fixtures/blocked-missing-ears.json --html /tmp/preview.html
+```
+
+## 7. Reading one
+
+For the **report consumer** (not the POC operator):
+
+1. **Verdict tile** — colour and label. That's the answer.
+2. **Stats line** — how many rules fired? How many actions to take?
+3. **Banners** (if present) — blocker-by-absence flags missing inputs; skip-validate flag means the validation gate was bypassed (treat the verdict as advisory).
+4. **Take these actions** — the human sentences, in imperative voice; where an action slug appears in code formatting, hover over it for the technical name.
+5. **What fired** — the rules that drove the verdict, in priority-of-cause order. Each is a 1-paragraph card; expand the audit trail for everything that *didn't* fire.
+6. **Provenance** — the three replay anchors (engine version, ruleset hash, flags hash) let you verify the report came from a specific tuple. The reproduce-command block lets you re-run it locally.
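Step 6 amounts to comparing anchors between the original report and a regenerated one. A minimal sketch, assuming a hypothetical `ReplayAnchors` record filled in by hand from the two Provenance sections; hashes are compared on their 12-character rendered prefix so a truncated value can be checked against a full one. Version strings and hash values are made up:

```typescript
// Hypothetical shape of the three replay anchors as read off a report.
interface ReplayAnchors {
  engineVersion: string;
  rulesetHash: string; // full hex digest or the 12-char rendered prefix
  flagsHash: string;
}

// The report renders hashes truncated to 12 hex characters.
const PREFIX_LEN = 12;

const hashPrefix = (hash: string): string => hash.slice(0, PREFIX_LEN).toLowerCase();

// Returns the names of anchors that differ; an empty list means both
// reports came from the same (engine, ruleset, flags) tuple.
function diffAnchors(a: ReplayAnchors, b: ReplayAnchors): string[] {
  const diffs: string[] = [];
  if (a.engineVersion !== b.engineVersion) diffs.push("engineVersion");
  if (hashPrefix(a.rulesetHash) !== hashPrefix(b.rulesetHash)) diffs.push("rulesetHash");
  if (hashPrefix(a.flagsHash) !== hashPrefix(b.flagsHash)) diffs.push("flagsHash");
  return diffs;
}

const originalAnchors: ReplayAnchors = {
  engineVersion: "0.1.0",
  rulesetHash: "3f9a1c0b7d2e",
  flagsHash: "a41be09c55f1",
};
const regeneratedAnchors: ReplayAnchors = {
  engineVersion: "0.1.0",
  rulesetHash: "3f9a1c0b7d2e8841c0ffee",
  flagsHash: "b07c3d2a9e14",
};
console.log(diffAnchors(originalAnchors, regeneratedAnchors).join(",")); // flagsHash
```

A 12-hex-character prefix carries 48 bits, which is plenty to catch an accidental mismatch but is not a collision-resistant proof of identity.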
+
+A reader who stops after step 4 should still come away with the correct answer. The rest is depth on demand.
+
+See [`docs/audit-trail.md`](audit-trail.md) for replay mechanics and [`docs/compliance.md`](compliance.md) for which sections speak to which regulation.
diff --git a/experiments/rule-engine-poc/scripts/run-all-fixtures.mjs b/experiments/rule-engine-poc/scripts/run-all-fixtures.mjs
index 50412f68d..11e09fe0c 100644
--- a/experiments/rule-engine-poc/scripts/run-all-fixtures.mjs
+++ b/experiments/rule-engine-poc/scripts/run-all-fixtures.mjs
@@ -1,8 +1,10 @@
import { readdirSync } from "node:fs";
import { join } from "node:path";
+import { fileURLToPath } from "node:url";
import { spawnSync } from "node:child_process";
-const fixturesDir = new URL("../fixtures/", import.meta.url).pathname;
+// fileURLToPath handles percent-decoding and Windows drive paths.
+const fixturesDir = fileURLToPath(new URL("../fixtures/", import.meta.url));
const rules = "rules/quality-gates.yaml";
const files = readdirSync(fixturesDir)
diff --git a/experiments/rule-engine-poc/scripts/run-all-html.mjs b/experiments/rule-engine-poc/scripts/run-all-html.mjs
index 551bd3918..be2dd412f 100644
--- a/experiments/rule-engine-poc/scripts/run-all-html.mjs
+++ b/experiments/rule-engine-poc/scripts/run-all-html.mjs
@@ -1,8 +1,11 @@
import { readdirSync, mkdirSync } from "node:fs";
import { join, basename } from "node:path";
+import { fileURLToPath } from "node:url";
import { spawnSync } from "node:child_process";
-const fixturesDir = new URL("../fixtures/", import.meta.url).pathname;
+// fileURLToPath handles percent-decoding and Windows drive paths;
+// `.pathname` alone breaks for both (Codex round 15 P2).
+const fixturesDir = fileURLToPath(new URL("../fixtures/", import.meta.url));
const rules = "rules/quality-gates.yaml";
const reportsDir = "reports";
diff --git a/experiments/rule-engine-poc/src/html-report.ts b/experiments/rule-engine-poc/src/html-report.ts
index 73346291b..8a7ad95d1 100644
--- a/experiments/rule-engine-poc/src/html-report.ts
+++ b/experiments/rule-engine-poc/src/html-report.ts
@@ -268,13 +268,24 @@ export function renderHtmlReport(
? `WARNING: validation gate was skipped (--skip-validate). Verdict and provenance are NOT verified against the flag schema or forbidden-fields policy.
`
: "";
- // Reproduce command: assembled from the same fields plan/report use.
- // Codex round 12 P2: quote paths so paths with spaces or shell
- // metacharacters (e.g., "My Projects/rules.yaml") don't break the
- // command. Single-quote shell-escape: replace any ' inside the
- // path with the four-char sequence '\'' .
- const shellQuote = (s: string): string => `'${s.replace(/'/g, "'\\''")}'`;
- const reproCmd = `npx tsx src/cli.ts ${shellQuote(ctx.rulesPath)} ${shellQuote(ctx.flagsPath)} --html --quiet`;
+ // Reproduce command: render three flavours because the supported
+ // shells disagree on quoting AND on variable expansion:
+ // - POSIX (bash / zsh / sh): single-quote escape (' becomes '\'').
+ // Single quotes suppress $var expansion.
+ // - cmd.exe: double-quote escape (" becomes ""). cmd doesn't
+ // expand $var; it expands %VAR%, but our paths don't carry %.
+ // - PowerShell: single-quote escape (' becomes ''). Double quotes
+ // in PowerShell EXPAND $var and $(), which can mutate the path
+ // (Codex round 19 P2).
+ const posixQuote = (s: string): string => `'${s.replace(/'/g, "'\\''")}'`;
+ const cmdQuote = (s: string): string => `"${s.replace(/"/g, '""')}"`;
+ const psQuote = (s: string): string => `'${s.replace(/'/g, "''")}'`;
+ // Use a literal filename, NOT ``: angle brackets are shell
+ // I/O redirection on all three, so a copy-paste would silently send
+ // --html no value (Codex round 18 P2).
+ const reproCmdPosix = `npx tsx src/cli.ts ${posixQuote(ctx.rulesPath)} ${posixQuote(ctx.flagsPath)} --html out.html --quiet`;
+ const reproCmdCmd = `npx tsx src/cli.ts ${cmdQuote(ctx.rulesPath)} ${cmdQuote(ctx.flagsPath)} --html out.html --quiet`;
+ const reproCmdPwsh = `npx tsx src/cli.ts ${psQuote(ctx.rulesPath)} ${psQuote(ctx.flagsPath)} --html out.html --quiet`;
return `
@@ -491,7 +502,12 @@ export function renderHtmlReport(
How to reproduce — run from experiments/rule-engine-poc/:
-
${esc(reproCmd)}
+
POSIX (macOS, Linux, WSL, Git Bash):
+
${esc(reproCmdPosix)}
+
Windows cmd.exe:
+
${esc(reproCmdCmd)}
+
PowerShell (Windows / cross-platform):
+
${esc(reproCmdPwsh)}
Then verify the three hashes above match the values in the regenerated report.
diff --git a/experiments/rule-engine-poc/src/loader.ts b/experiments/rule-engine-poc/src/loader.ts
index 7de40ced2..a2454469f 100644
--- a/experiments/rule-engine-poc/src/loader.ts
+++ b/experiments/rule-engine-poc/src/loader.ts
@@ -137,6 +137,15 @@ function validate(
if (typeof rule.priority !== "number") {
throw new Error(`Rule '${rule.id}' missing numeric 'priority'`);
}
+ // Codex round 15 P2: NaN/Infinity priorities silently break the
+ // documented sort order (b.priority - a.priority returns NaN, treated
+ // as 0), reordering the audit trail unpredictably. Same fail-fast
+ // discipline as weight + gt + lt.
+ if (!Number.isFinite(rule.priority)) {
+ throw new Error(
+ `Rule '${rule.id}' has non-finite 'priority' (got ${String(rule.priority)})`,
+ );
+ }
}
const CONDITION_OPS = [
diff --git a/experiments/rule-engine-poc/src/validate.ts b/experiments/rule-engine-poc/src/validate.ts
index 291fee8fb..85aee8fbd 100644
--- a/experiments/rule-engine-poc/src/validate.ts
+++ b/experiments/rule-engine-poc/src/validate.ts
@@ -104,13 +104,20 @@ export function validateExtraction(
continue;
}
if (value === null) {
- warnings.push({
- severity: "warning",
- code: "null-value-omit-instead",
+ // Codex round 16 P1: previously a warning, but the engine's
+ // `hasOwnProperty` presence check treats null as PRESENT — so
+ // `exists`/`ne` rules behave differently against {flag: null}
+ // than against {} despite the validator's old "null ≈ missing"
+ // claim. The right fix is to refuse null at the gate so the
+ // engine never sees it; LLMs are instructed to omit unknowns.
+ errors.push({
+ severity: "error",
+ code: "null-value-not-allowed",
path: key,
message:
- `Flag '${key}' is null; prefer omitting unknowns over emitting null. ` +
- `The engine will treat null and missing identically.`,
+ `Flag '${key}' is null; omit the field instead. ` +
+ `The engine's presence check treats null as PRESENT, which can ` +
+ `silently change verdicts for rules using 'exists' or 'ne'.`,
});
continue;
}
diff --git a/experiments/rule-engine-poc/test/html-report.test.ts b/experiments/rule-engine-poc/test/html-report.test.ts
index 66cad50ac..22c33b2c3 100644
--- a/experiments/rule-engine-poc/test/html-report.test.ts
+++ b/experiments/rule-engine-poc/test/html-report.test.ts
@@ -287,8 +287,8 @@ describe("renderHtmlReport: provenance reframing", () => {
});
it("shell-quotes paths in the reproduce command so spaces don't break it", () => {
- // Codex round 12 P2: unquoted paths break copy-pasted reproduce
- // commands on user machines (e.g., "My Projects/...").
+ // Codex rounds 12/17/19: render three forms — POSIX, cmd.exe,
+ // PowerShell — because each uses different quoting + expansion.
const flags: ExtractionFlags = { ci_failing: true };
const result = evaluate(rules, flags);
const html = renderHtmlReport(
@@ -299,10 +299,38 @@ describe("renderHtmlReport: provenance reframing", () => {
flagsPath: "extractions with spaces/x.json",
}),
);
- // Single quotes are HTML-escaped (') in the rendered output,
- // but they decode back to ' when the user pastes the command.
+ // POSIX + PowerShell: single quotes (HTML-escaped to ').
expect(html).toContain("'My Projects/rules.yaml'");
expect(html).toContain("'extractions with spaces/x.json'");
+ // cmd.exe: double quotes (HTML-escaped to ").
+ expect(html).toContain(""My Projects/rules.yaml"");
+ expect(html).toContain(""extractions with spaces/x.json"");
+ // All three labels appear.
+ expect(html).toContain("POSIX");
+ expect(html).toContain("cmd.exe");
+ expect(html).toContain("PowerShell");
+ });
+
+ it("PowerShell repro form uses single quotes so $var stays literal", () => {
+ // Codex round 19 P2: PowerShell double-quoted strings expand
+ // $var / $(...) — single quotes suppress that. The PowerShell
+ // labelled block must use single quotes so a path like
+ // 'src/$something/x.json' is reproduced literally.
+ const flags: ExtractionFlags = { ci_failing: true };
+ const result = evaluate(rules, flags);
+ const html = renderHtmlReport(
+ result,
+ baseCtx({
+ flags,
+ rulesPath: "My$Projects/rules.yaml",
+ flagsPath: "src/$something/x.json",
+ }),
+ );
+ const psStart = html.indexOf("PowerShell");
+ expect(psStart).toBeGreaterThan(-1);
+ const psBlock = html.slice(psStart, psStart + 800);
+ expect(psBlock).toContain("'My$Projects/rules.yaml'");
+ expect(psBlock).toContain("'src/$something/x.json'");
});
it("truncates ruleset and flags hashes to 12-char prefixes", () => {
diff --git a/experiments/rule-engine-poc/test/loader.test.ts b/experiments/rule-engine-poc/test/loader.test.ts
index 54bcc78c2..445392e94 100644
--- a/experiments/rule-engine-poc/test/loader.test.ts
+++ b/experiments/rule-engine-poc/test/loader.test.ts
@@ -463,6 +463,48 @@ describe("loader", () => {
).toThrow(/non-finite 'lt'/);
});
+ it("rejects rules with non-finite priority", () => {
+ expect(() =>
+ loadRulesFromString(
+ `
+- id: r1
+ priority: .nan
+ description: x
+ when:
+ all:
+ - flag: a
+ eq: true
+ then:
+ verdict: blocked
+ weight: 1
+ actions: [a]
+`,
+ "priority-nan",
+ ),
+ ).toThrow(/non-finite 'priority'/);
+ });
+
+ it("rejects rules with infinite priority", () => {
+ expect(() =>
+ loadRulesFromString(
+ `
+- id: r1
+ priority: .inf
+ description: x
+ when:
+ all:
+ - flag: a
+ eq: true
+ then:
+ verdict: blocked
+ weight: 1
+ actions: [a]
+`,
+ "priority-inf",
+ ),
+ ).toThrow(/non-finite 'priority'/);
+ });
+
it("assigns a stable content hash to each rule", () => {
const ruleA = loadRulesFromString(
`
diff --git a/experiments/rule-engine-poc/test/validate.test.ts b/experiments/rule-engine-poc/test/validate.test.ts
index 824f3593a..f4a7445fa 100644
--- a/experiments/rule-engine-poc/test/validate.test.ts
+++ b/experiments/rule-engine-poc/test/validate.test.ts
@@ -105,12 +105,11 @@ describe("validateExtraction", () => {
expect(r.errors[0]!.code).toBe("disallowed-value");
});
- it("warns (but does not error) when a flag value is null", () => {
+ it("errors when a flag value is null (engine treats null as present)", () => {
const r = validateExtraction({ ci_passing: null }, schema);
- expect(r.ok).toBe(true);
- expect(r.warnings).toHaveLength(1);
- expect(r.warnings[0]!.code).toBe("null-value-omit-instead");
- expect(r.warnings[0]!.path).toBe("ci_passing");
+ expect(r.ok).toBe(false);
+ expect(r.errors[0]!.code).toBe("null-value-not-allowed");
+ expect(r.errors[0]!.path).toBe("ci_passing");
});
describe("with expectedPromptHash", () => {