48 commits
32d02f4
feat(experiments): rule engine POC for OODA Decide quadrant
claude May 17, 2026
6625ee1
fix(rule-engine-poc): tighten loader validation (Codex P1 + P2)
claude May 17, 2026
00a43e7
fix(rule-engine-poc): exists-AND and empty-group hardening (Codex P1+P2)
claude May 17, 2026
179b883
fix(rule-engine-poc): condition-level validation (Codex P1+P2 round 3)
claude May 17, 2026
45577c4
fix(rule-engine-poc): loader hardening (Codex round 4)
claude May 17, 2026
271702e
feat(rule-engine-poc): plan/report workflow with config-driven targets
claude May 17, 2026
4e0d0ce
fix(rule-engine-poc): honest browser-open + stricter operator types (…
claude May 17, 2026
0aaa114
fix(rule-engine-poc): three platform/format hardening fixes (Codex ro…
claude May 17, 2026
3dc1034
feat(rule-engine-poc): validate gate between plan and report
claude May 17, 2026
a9b1db9
feat(rule-engine-poc): prompt-extraction binding (stale-extraction gu…
claude May 17, 2026
cc00478
fix(rule-engine-poc): polish bundle (Codex round 7 + reviewer S2/S3)
claude May 17, 2026
3140320
docs(rule-engine-poc): research wave 3 strategist re-evaluation (rese…
claude May 17, 2026
d191ff8
fix(rule-engine-poc): recompute prompt hash on report (research/14)
claude May 17, 2026
927dabc
docs(rule-engine-poc): research wave 3 — critic, sre, user-researcher
claude May 17, 2026
0509c11
docs(rule-engine-poc): research wave 3 reviewer re-review + typo fixes
claude May 17, 2026
827e3b3
docs(rule-engine-poc): consolidated architecture doc with mermaid dia…
claude May 17, 2026
fde9d10
fix(rule-engine-poc): slug + UTF-8 truncation hardening (Codex round 8)
claude May 17, 2026
4e54c0e
feat(rule-engine-poc): RALPH-loop polish (agents A+B+C)
claude May 17, 2026
ce801ca
docs(rule-engine-poc): RALPH-loop polish (agents A test + D docs)
claude May 17, 2026
2cf48e4
fix(rule-engine-poc): fail closed when prompt hash unrecomputable (Co…
claude May 17, 2026
bc7ef0a
fix(rule-engine-poc): case-insensitive target ids + prompt-hash-first…
claude May 17, 2026
421d821
refactor(rule-engine-poc): extract cli-shared seam (architecture pass 2)
claude May 17, 2026
1d106c2
docs(rule-engine-poc): commit sample HTML reports for research wave 4
claude May 17, 2026
837380d
chore(typos): exclude generated rule-engine sample HTML reports
claude May 17, 2026
bc11f57
docs(rule-engine-poc): research wave 4 stakeholder pass (research/18)
claude May 17, 2026
9c49473
docs(rule-engine-poc): research wave 4 brand review (research/19)
claude May 17, 2026
3adb87c
docs(rule-engine-poc): research wave 4 misread-risk critique (researc…
claude May 17, 2026
05e430a
docs(rule-engine-poc): research wave 4 auditor-readability (research/20)
claude May 17, 2026
826e749
feat(rule-engine-poc): action glossary scaffold (agent A partial)
claude May 17, 2026
28c84e9
feat(rule-engine-poc): wire action glossary into config + docs (agent…
claude May 17, 2026
f054c18
docs(rule-engine-poc): agent C docs sync — audit trail + workflow + e…
claude May 17, 2026
def9b70
chore(rule-engine-poc): snapshot in-flight agent B + C work
claude May 17, 2026
a479ada
docs(rule-engine-poc): agent C audit-trail.md second-pass refinement
claude May 17, 2026
ded7400
feat(rule-engine-poc): HTML report v3 — wave 4 implementer pass (agen…
claude May 17, 2026
90f3fe1
fix(rule-engine-poc): browser exit-code + takeOpt missing-value (Code…
claude May 17, 2026
eb01077
fix(rule-engine-poc): missing-flag banner + shell-quoted reproduce (C…
claude May 17, 2026
003a05e
fix(rule-engine-poc): single-shot CLI rejects missing --html value (C…
claude May 17, 2026
c00fc4d
fix(rule-engine-poc): string-only actions + finite gt/lt (Codex round…
claude May 17, 2026
b1bf4b1
Merge branch 'develop' into claude/rule-engine-poc-gO5yq
Luis85 May 17, 2026
1f74fb0
docs(rule-engine-poc): compliance map for adopters
claude May 17, 2026
cb23228
Merge branch 'claude/rule-engine-poc-gO5yq' of http://127.0.0.1:43891…
claude May 17, 2026
d203b15
docs(rule-engine-poc): report-reference.md — single-page overview of …
claude May 17, 2026
a09bee9
fix(rule-engine-poc): fileURLToPath + finite priority (Codex round 15)
claude May 17, 2026
90f5871
fix(rule-engine-poc): null extraction values are errors, not warnings…
claude May 17, 2026
4929c08
fix(rule-engine-poc): Windows-safe reproduce command (Codex round 17)
claude May 17, 2026
c6585cd
fix(rule-engine-poc): reproduce command uses literal out.html (Codex …
claude May 17, 2026
51a3356
Merge remote-tracking branch 'origin/develop' into claude/rule-engine…
claude May 17, 2026
f3a8327
fix(rule-engine-poc): PowerShell-specific repro form (Codex round 19)
claude May 17, 2026
1 change: 1 addition & 0 deletions experiments/rule-engine-poc/docs/README.md
@@ -13,6 +13,7 @@ Detailed documentation for the POC. Start with the project [README](../README.md
|---|---|
| [`architecture.md`](architecture.md) | You want the full architecture picture — system map, user flow, data flow, engine internals, module graph, with Mermaid diagrams. **Start here.** |
| [`workflow.md`](workflow.md) | You want to run the `plan` → paste → `validate` → `report` loop end-to-end. |
| [`report-reference.md`](report-reference.md) | You want a single-page overview of the HTML report — section-by-section walkthrough, the five perspectives that shaped it, what's still open. |
| [`dsl-reference.md`](dsl-reference.md) | You're writing or reading a rule file and need the full YAML grammar — every operator, every grouping construct. |
| [`audit-trail.md`](audit-trail.md) | You need to replay a verdict, diff two verdicts, or map the audit trail to EU AI Act / ISO 42001 requirements. |
| [`compliance.md`](compliance.md) | You're scoping the pattern against a regulation or standard — what the POC ticks natively, what the adopter must still provide, what's out of scope. |
170 changes: 170 additions & 0 deletions experiments/rule-engine-poc/docs/report-reference.md
@@ -0,0 +1,170 @@
---
title: Report reference
folder: experiments/rule-engine-poc/docs
description: A single-page overview of the HTML report — what each section is for, what the five product perspectives that shaped it found, what's implemented, and what's still open.
entry_point: false
---

# Report reference

This is the meta-doc for the HTML report itself — the user-facing artifact of the POC. It consolidates what `architecture.md`, `workflow.md`, `audit-trail.md`, and the five wave-4 research artifacts (`research/17`–`21`) say about the report into one place.

## Contents

1. [What the report is](#1-what-the-report-is)
2. [Section-by-section walkthrough](#2-section-by-section-walkthrough)
3. [The five product perspectives that shaped it](#3-the-five-product-perspectives-that-shaped-it)
4. [What got implemented (wave-4 delta)](#4-what-got-implemented-wave-4-delta)
5. [What's still open](#5-whats-still-open)
6. [Generating one](#6-generating-one)
7. [Reading one](#7-reading-one)

---

## 1. What the report is

A self-contained HTML file rendered by `src/html-report.ts` from a `VerdictResult`. One report per target; one file per `npm run report` invocation. Inline CSS, no JavaScript, no external assets — it survives email forwarding, Slack attachment, S3 retention, and offline viewing.

The report is the *only* document most readers will see. The terminal output and the JSON `--json` mode are for CI and operators; the HTML is for everyone else (PR reviewer, PM, EM, QA, compliance officer, auditor, the author a week later).

Three committed samples under [`research/sample-reports/`](../research/sample-reports/) show the three primary verdict shapes — `blocked`, `needs-attention`, `ready-to-progress`.

## 2. Section-by-section walkthrough

The report renders top-to-bottom in this order. The order matters: it follows the reader's actual scan path established by `research/17` (UX audit) and `research/20` (auditor reading path).

| Section | Job | Section header at render | Source |
|---|---|---|---|
| **System-identity header** | Tell a cold reader *what this is* — engine version + prominent timestamp | (no header — runs above the verdict card) | `research/20` Art. 13 "provider identity" gap |
| **Verdict tile** | Categorical tier in 2-second-scan colour: blocked / needs-attention / ready-to-progress / unknown | (the headline) | engine `verdict` |
| **Stats line** | "N rule(s) fired · M action(s) to take" — quantifies how contested the decision is | (under the tile) | `result.evaluations`, `result.actions` |
| **Blocker-by-absence banner** | "X rules could not be evaluated because the LLM did not supply Y, Z, W" — yellow, adjacent to verdict | (conditional banner) | `research/21` skim-trap finding |
| **Skip-validate banner** | "WARNING: validation gate was skipped" — when `--skip-validate` was set | (conditional banner) | `research/18` + `research/21` trust calibration |
| **Verdict-tier + glyph legend** | Collapsed `<details>` explaining blocked / needs-attention / ready-to-progress / unknown + `[+] / [-] / [?]` glyph meanings | "Glossary." | `research/17` + `research/20` |
| **Weighted tally** | Per-tier weight totals, side-by-side with actions | "Weighted tally." | engine `weightedTally` |
| **Suggested actions** | Imperative-voice action sentences in priority-of-cause order (not alphabetical) | "Take these actions." | engine `evaluations` walked in priority desc; `rules/action-glossary.yaml` if present |
| **Extraction flags** | The LLM's structured output as a table | "Extraction flags." | `ctx.flags` |
| **What fired** | Matched rules only, in priority order, each with rule id + description + flags it matched on + actions it contributed | "What fired." | `result.evaluations` filtered to `matched === true` |
| **Audit trail** | Every rule evaluation, matched + skipped. Skipped rules collapse to `<details>` summary by default | "Audit trail." | `result.evaluations` |
| **Reproduce block** | Shell-quoted command + the three replay anchors (engine version + ruleset hash + flags hash) | "How to reproduce." | `research/20` Art. 12 replay manifest |
| **Provenance** | Hash preamble + 12-char truncated hashes + file paths | "Provenance." | `result.rulesetHash`, `result.flagsHash`, `result.engineVersion` |
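
The "What fired" row above describes a filter plus an ordering. A minimal sketch of that selection follows, assuming a simplified evaluation shape; the `Evaluation` interface and the `whatFired` name are illustrative and are not taken from `src/html-report.ts`. The tie-break on rule id is also an assumption.

```typescript
// Hypothetical, simplified shape; the real VerdictResult type is not shown here.
interface Evaluation {
  ruleId: string;
  priority: number;
  matched: boolean;
}

// Keep matched rules only, highest priority first, ties broken by rule id.
function whatFired(evaluations: readonly Evaluation[]): Evaluation[] {
  return evaluations
    .filter((e) => e.matched === true) // filter() copies, so the input is not mutated
    .sort((a, b) => b.priority - a.priority || a.ruleId.localeCompare(b.ruleId));
}

const sampleEvaluations: Evaluation[] = [
  { ruleId: "tests-missing", priority: 50, matched: true },
  { ruleId: "ears-missing", priority: 90, matched: true },
  { ruleId: "docs-stale", priority: 50, matched: false },
  { ruleId: "ci-red", priority: 50, matched: true },
];

console.log(whatFired(sampleEvaluations).map((e) => e.ruleId));
// [ 'ears-missing', 'ci-red', 'tests-missing' ]
```

The full audit trail would render the same array without the filter, which is why the two sections can share one source.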

The CSS is inline and uses a 3-tier severity palette (red / amber / green) with non-colour-only signals (glyphs, row washes, section headings). A `@media (max-width: 540px)` rule collapses the summary grid to a single column on phones.

## 3. The five product perspectives that shaped it

Wave 4 dispatched five subagents in parallel against three committed sample renders. Each reviewed the report through a different lens. The convergent findings drove the v3 rebuild in the wave-4 implementer pass.

### UX (`research/17-report-ux-audit.md`)

> "The audit trail buries what matters. With ~21 rules and 1–2 matches per sample, the page is ~95% 'did not match' content."

Top recommendations: a **What fired** section above the full audit trail; collapse-by-default for skipped rules; replace "X of 21 rules matched" coverage text with per-tier-appropriate phrasing; sort actions by priority-of-cause not alphabetical; a `cond--miss` row-wash to match `cond--missing`'s amber; a `@media (max-width: 540px)` single-column fallback.

### Stakeholder strategy (`research/18-report-stakeholders.md`)

> "The report is one artifact serving six different first-fields — PR-reviewers want the verdict pill, compliance wants the provenance block, PMs want suggested actions, auditors want the hashes, authors want the flags."

Top recommendations: expand action slugs (`kick-ci`) to human sentences (`"Re-run the failing CI job."`) via an `actions[].human` field — implemented as a sidecar glossary in `rules/action-glossary.yaml`; introduce a `label_set` config (default `dev`; `pm`, `qa`, `compliance` as presentational overrides) so headline labels match the reader; flag a portfolio dashboard as the first hosted-SaaS gravity seam — *do not build it yet*.

### Brand (`research/19-report-brand-review.md`)

Verdict: **pass-with-findings** (not S1-blocking under the sandbox scope). On-temperament (no emoji, no gradients, no icons, restrained density, ASCII `[+]/[-]/[?]` glyphs correctly used as monospace-iconography). Off-token: 18 distinct literal hex values, hand-picked font stacks, near-white page background where Specorator calls for cream. Section headers should be sentence-case with periods — implemented. Open ADR-shaped decision: Specorator has no red token; the `blocked` tier currently uses literal `#fdecea / #d8281b / #7a160d` and stays that way until graduation.

### Auditor readability (`research/20-report-auditor-readability.md`)

> "30-second test passes... what the 30-second test fails on: the report does not name what kind of system this is, who built it, what version of the workflow it governs, or what the verdict is binding against."

Implemented: system-identity header (engine version + prominent timestamp), reproduce-block with the three replay anchors, verdict-tier glossary, glyph legend. Still open: provider identity / contact, model-card link, capabilities-and-limitations sentence, expected-lifetime statement, reviewer-of-record field. The implemented items close `research/02`'s "human-readable rationale presentation" open item; the remainder is governance, not engineering.

### Misread risks (`research/21-report-misread-risks.md`)

Three flagged misread paths:

1. **The skim trap** — a busy reader looks only at the verdict tile and action list, missing context. Implemented mitigation: blocker-by-absence banner is rendered at the same visual weight as the verdict card.
2. **`verified` badge as trust trap** — a green pill saying `verified` will read as "extraction verified" when it only means "extraction is bound to current inputs". Implemented mitigation: a tooltip on the badge explicitly says *bound to current inputs, not flag-correctness*. In addition, the `--skip-validate` banner is now rendered at the same visual weight as the verdict, so a skipped-validation run cannot pass unnoticed.
3. **Blocker-by-absence is the most dangerous skim path** — a high-priority blocker whose input flag was never extracted simply doesn't fire. Implemented mitigation: dedicated banner naming the missing flags AND the count of un-evaluable rules (counted using `matched === false`, fixed in Codex round 12 to exclude rules that fired via `when.any` despite one missing branch).
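
The counting rule in item 3 can be sketched as follows. This is an illustration under assumptions, not the project's code: the `RuleEvaluation` shape, the `missingFlags` field, and the `summariseAbsence` name are all hypothetical.

```typescript
// Hypothetical evaluation shape; `missingFlags` is an assumed field name.
interface RuleEvaluation {
  ruleId: string;
  matched: boolean;
  missingFlags: string[];
}

interface AbsenceSummary {
  unevaluableRules: number;
  missingFlagNames: string[];
}

function summariseAbsence(evaluations: readonly RuleEvaluation[]): AbsenceSummary {
  // Only rules that did not match AND lacked input count as un-evaluable.
  const blockedByAbsence = evaluations.filter(
    (e) => e.matched === false && e.missingFlags.length > 0,
  );
  const names = new Set<string>();
  for (const e of blockedByAbsence) {
    for (const flag of e.missingFlags) names.add(flag);
  }
  return {
    unevaluableRules: blockedByAbsence.length,
    missingFlagNames: Array.from(names).sort(),
  };
}

const absence = summariseAbsence([
  { ruleId: "needs-ears", matched: false, missingFlags: ["has_ears"] },
  // Matched through another `when.any` branch, so it is excluded from the count.
  { ruleId: "any-branch", matched: true, missingFlags: ["has_tests"] },
  { ruleId: "plain-miss", matched: false, missingFlags: [] },
]);

console.log(absence); // { unevaluableRules: 1, missingFlagNames: [ 'has_ears' ] }
```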

## 4. What got implemented (wave-4 delta)

The 12 changes that landed across the wave-4 implementer pass (Agent B's RALPH loop) and the subsequent Codex round 11–14 hardening:

| # | Change | Research source |
|---|---|---|
| 1 | "What fired" section above audit trail | `17`, `20`, `21` |
| 2 | Non-matched rules collapsed via `<details>` | `17` |
| 3 | Blocker-by-absence banner naming missing flags | `21`, `17` |
| 4 | Suggested actions in priority-of-cause order | `17` |
| 5 | Action human-sentences from glossary | `18` |
| 6 | Provenance preamble + reproduce block + 12-char hash truncation | `17`, `20`, `18` |
| 7 | System-identity header + prominent timestamp | `20` |
| 8 | Verdict-tier + glyph legend in collapsed `<details>` | `20`, `17` |
| 9 | `cond--miss` row wash matching `cond--missing` amber | `17` |
| 10 | `@media (max-width: 540px)` single-column fallback | `17` |
| 11 | `--skip-validate` banner + `verified` badge tooltip | `18`, `21` |
| 12 | Sentence-case headers + imperative voice ("Take these actions.") | `19` |

Three follow-up Codex hardenings landed after the wave-4 push:

- **Round 11** (`90f3fe1`) — `openInBrowser` waits for the spawned process to exit cleanly (not just `spawn`); `takeOpt` rejects missing values for `--config` / `--target`.
- **Round 12** (`eb01077`) — `missingFlagNames` only counts rules whose final outcome was determined by absence (excludes `when.any` rules that matched another branch); reproduce-command paths are shell-quoted.
- **Round 13** (`003a05e`) — single-shot `cli.ts::takeOption` rejects missing `--html` values.
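
The "rejects missing values" behaviour from rounds 11 and 13 can be pictured with a small option parser. The signature and error message below are assumptions for illustration; the actual `takeOpt` and `takeOption` helpers may differ.

```typescript
// Hypothetical sketch of a value-taking option parser; not the project's implementation.
function takeOpt(argv: string[], name: string): string | undefined {
  const index = argv.indexOf(name);
  if (index === -1) return undefined; // flag not supplied at all

  const value = argv[index + 1];
  // Fail closed when the flag is last, or when the next token is another flag.
  if (value === undefined || value.startsWith("--")) {
    throw new Error(`Option ${name} requires a value`);
  }

  argv.splice(index, 2); // consume the flag and its value
  return value;
}

const args = ["--target", "checkout-flow", "--no-open"];
console.log(takeOpt(args, "--target")); // checkout-flow
console.log(args); // [ '--no-open' ]
```

Treating a following `--flag` token as a missing value is a design choice in this sketch: it refuses `--html --quiet` instead of writing a report to a file named `--quiet`.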

Test surface for the report: 28 tests in `test/html-report.test.ts` plus the report-flow integration tests in `test/report-flow.test.ts`.

## 5. What's still open

Deferred, with the bucket each lives in:

| Item | Source | Bucket |
|---|---|---|
| `label_set` config (dev / pm / qa / compliance presets) | `research/18` | Strategy slice 14–18 |
| Reader-specific export modes (PDF, markdown, Slack-friendly text) | `research/18` | Strategy slice |
| Portfolio dashboard (one HTML across N targets) | `research/18` | Hosted-SaaS gravity seam — explicitly *do not build* per strategist |
| Provider identity / model card / capabilities-and-limitations sentence | `research/20` | Governance (compliance.md "what's not in this POC") |
| Reviewer-of-record field, override workflow | `research/20` | Governance |
| Brand-token migration (replace 18 literal hex values with vars) | `research/19` | ADR at graduation |
| Diff-against-previous-run | `research/21` | Production prep |
| Confidence / uncertainty surface | `research/20` + `research/21` | Calibration study first |
| RAT-A / RAT-B / RAT-C / RAT-D / RAT-E / RAT-F | `research/07`, `research/14` | Discovery activity — needs users, not more engineering |

## 6. Generating one

```bash
# Plan (writes the prompt + sidecar)
npm run plan -- --target <id>

# User pastes the prompt into Claude / ChatGPT / Gemini, saves JSON to
# extractions/<id>.json.

# Validate (optional sanity check)
npm run validate -- --target <id>

# Report (renders HTML, opens browser best-effort)
npm run report -- --target <id>
# Or without opening a browser:
npm run report -- --target <id> --no-open
```

Exit codes: `0` no blockers, `1` at least one `blocked` verdict, `2` missing / malformed extraction.
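
As a rough sketch, the exit-code contract above could be expressed like this. The function name, the `extractionOk` parameter, and the choice to let code `2` take precedence over code `1` are assumptions, not documented behaviour.

```typescript
type Verdict = "blocked" | "needs-attention" | "ready-to-progress" | "unknown";

// Hypothetical mapping from run outcome to process exit code.
function exitCodeFor(verdicts: readonly Verdict[], extractionOk: boolean): 0 | 1 | 2 {
  if (!extractionOk) return 2; // missing or malformed extraction
  return verdicts.some((v) => v === "blocked") ? 1 : 0;
}

console.log(exitCodeFor(["ready-to-progress", "needs-attention"], true)); // 0
console.log(exitCodeFor(["needs-attention", "blocked"], true)); // 1
console.log(exitCodeFor([], false)); // 2
```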

For testing without the AI loop (single-shot fixture flow):

```bash
npx tsx src/cli.ts rules/quality-gates.yaml fixtures/blocked-missing-ears.json --html /tmp/preview.html
```

## 7. Reading one

For the **report consumer** (not the POC operator):

1. **Verdict tile** — colour and label. That's the answer.
2. **Stats line** — how many rules fired? How many actions to take?
3. **Banners** (if present) — the blocker-by-absence banner flags missing inputs; the skip-validate banner means the validation gate was bypassed (treat the verdict as advisory).
4. **Take these actions** — the imperative-voice human sentences; if a `[code](#)` slug is shown, hover over it for the technical name.
5. **What fired** — the rules that drove the verdict, in priority-of-cause order. Each is a 1-paragraph card; expand the audit trail for everything that *didn't* fire.
6. **Provenance** — the three replay anchors (engine version, ruleset hash, flags hash) let you verify the report came from a specific tuple. The reproduce-command block lets you re-run it locally.

A reader who reads only steps 1–4 should still get the answer correct. The rest is depth on demand.

See [`docs/audit-trail.md`](audit-trail.md) for replay mechanics and [`docs/compliance.md`](compliance.md) for which sections speak to which regulation.
4 changes: 3 additions & 1 deletion experiments/rule-engine-poc/scripts/run-all-fixtures.mjs
@@ -1,8 +1,10 @@
import { readdirSync } from "node:fs";
import { join } from "node:path";
+ import { fileURLToPath } from "node:url";
import { spawnSync } from "node:child_process";

- const fixturesDir = new URL("../fixtures/", import.meta.url).pathname;
+ // fileURLToPath handles percent-decoding and Windows drive paths.
+ const fixturesDir = fileURLToPath(new URL("../fixtures/", import.meta.url));
const rules = "rules/quality-gates.yaml";

const files = readdirSync(fixturesDir)
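
The difference between `.pathname` and `fileURLToPath` that motivates the hunk above can be seen with a round trip through a `file:` URL. The directory name below is hypothetical and only chosen because it contains a space.

```typescript
import { resolve } from "node:path";
import { fileURLToPath, pathToFileURL } from "node:url";

// Hypothetical directory name containing a space, to trigger percent-encoding.
const originalPath = resolve("my fixtures", "blocked");
const fileUrl = pathToFileURL(originalPath);

// `.pathname` is the URL-encoded form: the space appears as %20.
const viaPathname = fileUrl.pathname;

// fileURLToPath decodes the URL back into a native filesystem path.
const viaHelper = fileURLToPath(fileUrl);

console.log(viaPathname.includes("%20")); // true
console.log(viaHelper === originalPath); // true
```

Passing the `.pathname` form to `readdirSync` would look for a directory literally named with `%20`, which is the failure the change avoids.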
5 changes: 4 additions & 1 deletion experiments/rule-engine-poc/scripts/run-all-html.mjs
@@ -1,8 +1,11 @@
import { readdirSync, mkdirSync } from "node:fs";
import { join, basename } from "node:path";
+ import { fileURLToPath } from "node:url";
import { spawnSync } from "node:child_process";

- const fixturesDir = new URL("../fixtures/", import.meta.url).pathname;
+ // fileURLToPath handles percent-decoding and Windows drive paths;
+ // `.pathname` alone breaks for both (Codex round 15 P2).
+ const fixturesDir = fileURLToPath(new URL("../fixtures/", import.meta.url));
const rules = "rules/quality-gates.yaml";
const reportsDir = "reports";

32 changes: 24 additions & 8 deletions experiments/rule-engine-poc/src/html-report.ts
@@ -268,13 +268,24 @@ export function renderHtmlReport(
? `<div class="banner banner--skip" role="alert"><strong>WARNING:</strong> validation gate was skipped (<code>--skip-validate</code>). Verdict and provenance are NOT verified against the flag schema or forbidden-fields policy.</div>`
: "";

- // Reproduce command: assembled from the same fields plan/report use.
- // Codex round 12 P2: quote paths so paths with spaces or shell
- // metacharacters (e.g., "My Projects/rules.yaml") don't break the
- // command. Single-quote shell-escape: replace any ' inside the
- // path with the four-char sequence '\'' .
- const shellQuote = (s: string): string => `'${s.replace(/'/g, "'\\''")}'`;
- const reproCmd = `npx tsx src/cli.ts ${shellQuote(ctx.rulesPath)} ${shellQuote(ctx.flagsPath)} --html <out.html> --quiet`;
+ // Reproduce command: render three flavours because the supported
+ // shells disagree on quoting AND on variable expansion:
+ // - POSIX (bash / zsh / sh): single-quote escape (' becomes '\'').
+ // Single quotes suppress $var expansion.
+ // - cmd.exe: double-quote escape (" becomes ""). cmd doesn't
+ // expand $var; it expands %VAR%, but our paths don't carry %.
+ // - PowerShell: single-quote escape (' becomes ''). Double quotes
+ // in PowerShell EXPAND $var and $(), which can mutate the path
+ // (Codex round 19 P2).
+ const posixQuote = (s: string): string => `'${s.replace(/'/g, "'\\''")}'`;
+ const cmdQuote = (s: string): string => `"${s.replace(/"/g, '""')}"`;
+ const psQuote = (s: string): string => `'${s.replace(/'/g, "''")}'`;
+ // Use a literal filename, NOT `<out.html>`: angle brackets are shell
+ // I/O redirection on all three, so a copy-paste would silently send
+ // --html no value (Codex round 18 P2).
+ const reproCmdPosix = `npx tsx src/cli.ts ${posixQuote(ctx.rulesPath)} ${posixQuote(ctx.flagsPath)} --html out.html --quiet`;
+ const reproCmdCmd = `npx tsx src/cli.ts ${cmdQuote(ctx.rulesPath)} ${cmdQuote(ctx.flagsPath)} --html out.html --quiet`;
+ const reproCmdPwsh = `npx tsx src/cli.ts ${psQuote(ctx.rulesPath)} ${psQuote(ctx.flagsPath)} --html out.html --quiet`;

return `<!doctype html>
<html lang="en">
@@ -491,7 +502,12 @@ export function renderHtmlReport(
</p>
<div class="reproduce">
<p>How to reproduce &mdash; run from <code>experiments/rule-engine-poc/</code>:</p>
- <pre><code>${esc(reproCmd)}</code></pre>
+ <p class="repro-label">POSIX (macOS, Linux, WSL, Git Bash):</p>
+ <pre><code>${esc(reproCmdPosix)}</code></pre>
+ <p class="repro-label">Windows cmd.exe:</p>
+ <pre><code>${esc(reproCmdCmd)}</code></pre>
+ <p class="repro-label">PowerShell (Windows / cross-platform):</p>
+ <pre><code>${esc(reproCmdPwsh)}</code></pre>
<p>Then verify the three hashes above match the values in the regenerated report.</p>
</div>
</section>
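
To see how the three quoting helpers in this file's first hunk differ, they can be run against one awkward path. The helpers are copied from the diff; the sample path is hypothetical.

```typescript
// The three helpers, copied from the hunk above.
const posixQuote = (s: string): string => `'${s.replace(/'/g, "'\\''")}'`;
const cmdQuote = (s: string): string => `"${s.replace(/"/g, '""')}"`;
const psQuote = (s: string): string => `'${s.replace(/'/g, "''")}'`;

// Hypothetical path with a space, an apostrophe, and a dollar sign.
const samplePath = "My Projects/o'brien/$HOME rules.yaml";

console.log(posixQuote(samplePath)); // 'My Projects/o'\''brien/$HOME rules.yaml'
console.log(cmdQuote(samplePath));   // "My Projects/o'brien/$HOME rules.yaml"
console.log(psQuote(samplePath));    // 'My Projects/o''brien/$HOME rules.yaml'
```

Only the apostrophe handling differs between the POSIX and PowerShell forms, which is why a single shared single-quote helper would not be safe for both shells.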
9 changes: 9 additions & 0 deletions experiments/rule-engine-poc/src/loader.ts
@@ -137,6 +137,15 @@ function validate(
if (typeof rule.priority !== "number") {
throw new Error(`Rule '${rule.id}' missing numeric 'priority'`);
}
Comment on lines +137 to +139

P2: Reject non-finite rule priorities at load time

priority is only checked with typeof === "number", so YAML values like .nan are accepted. In evaluate, sorting then does b.priority - a.priority; with NaN that comparator returns NaN (treated like 0), which silently breaks the documented priority desc, id asc ordering and can reorder audit/action output unpredictably. This loader should reject non-finite priorities the same way it already rejects non-finite numeric operators and weights.
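
The gap described in this comment is easy to reproduce in isolation. The helper below is a hypothetical stand-in that mirrors the two checks (type, then finiteness); it is not the loader's actual function.

```typescript
// Hypothetical helper mirroring the loader's two priority checks.
function checkPriority(ruleId: string, priority: unknown): number {
  if (typeof priority !== "number") {
    throw new Error(`Rule '${ruleId}' missing numeric 'priority'`);
  }
  if (!Number.isFinite(priority)) {
    throw new Error(`Rule '${ruleId}' has non-finite 'priority' (got ${String(priority)})`);
  }
  return priority;
}

// NaN passes a typeof check on its own, which is the gap the comment describes.
console.log(typeof NaN === "number"); // true
// A comparator that subtracts a NaN priority yields NaN, not an ordering.
console.log(Number.isNaN(10 - NaN)); // true

function rejects(priority: unknown): boolean {
  try {
    checkPriority("demo-rule", priority);
    return false;
  } catch {
    return true;
  }
}

console.log(rejects(NaN), rejects(Infinity), rejects(10)); // true true false
```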


+ // Codex round 15 P2: NaN/Infinity priorities silently break the
+ // documented sort order (b.priority - a.priority returns NaN, treated
+ // as 0), reordering the audit trail unpredictably. Same fail-fast
+ // discipline as weight + gt + lt.
+ if (!Number.isFinite(rule.priority)) {
+ throw new Error(
+ `Rule '${rule.id}' has non-finite 'priority' (got ${String(rule.priority)})`,
+ );
+ }
}

const CONDITION_OPS = [
17 changes: 12 additions & 5 deletions experiments/rule-engine-poc/src/validate.ts
@@ -104,13 +104,20 @@ export function validateExtraction(
continue;
}
if (value === null) {
- warnings.push({
- severity: "warning",
- code: "null-value-omit-instead",
+ // Codex round 16 P1: previously a warning, but the engine's
+ // `hasOwnProperty` presence check treats null as PRESENT — so
+ // `exists`/`ne` rules behave differently against {flag: null}
+ // than against {} despite the validator's old "null ≈ missing"
+ // claim. The right fix is to refuse null at the gate so the
+ // engine never sees it; LLMs are instructed to omit unknowns.
+ errors.push({
+ severity: "error",
+ code: "null-value-not-allowed",
path: key,
message:
- `Flag '${key}' is null; prefer omitting unknowns over emitting null. ` +
- `The engine will treat null and missing identically.`,
+ `Flag '${key}' is null; omit the field instead. ` +
+ `The engine's presence check treats null as PRESENT, which can ` +
+ `silently change verdicts for rules using 'exists' or 'ne'.`,
});
continue;
Comment on lines +106 to +122

P1: Treat null extraction values as validation errors

The null special-case downgrades schema violations to a warning and skips type enforcement, so report/validate can pass even when the extraction is not type-correct. This can change verdicts silently: the engine treats null as a present flag (hasOwnProperty), so rules using presence-sensitive logic (for example exists or ne) evaluate differently than if the flag were omitted, despite the validator message claiming null is equivalent to missing. In practice, a model emitting {"some_flag": null} can produce an accepted but semantically different decision path.
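
The null-versus-missing distinction in this comment comes down to how an own-property check behaves. The sketch below uses hypothetical helper names (`isPresent`, `nullFlags`) and is not taken from the engine or validator source.

```typescript
type Flags = Record<string, unknown>;

// Presence as the comment describes it: own-property check, value ignored.
const isPresent = (flags: Flags, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(flags, key);

// Hypothetical gate in the spirit of the new rule: list keys whose value is null.
const nullFlags = (flags: Flags): string[] =>
  Object.keys(flags).filter((key) => flags[key] === null);

console.log(isPresent({ some_flag: null }, "some_flag")); // true
console.log(isPresent({}, "some_flag")); // false
console.log(nullFlags({ some_flag: null, other_flag: true })); // [ 'some_flag' ]
```

Because the first two calls disagree, a rule that tests for presence would take different paths for `{"some_flag": null}` and `{}`, so rejecting null before evaluation removes the ambiguity.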


}