docs(rule-engine-poc): single-page HTML report reference #526
base: develop
---
title: Report reference
folder: experiments/rule-engine-poc/docs
description: A single-page overview of the HTML report — what each section is for, what the five product perspectives that shaped it found, what's implemented, and what's still open.
entry_point: false
---

# Report reference

This is the meta-doc for the HTML report itself — the user-facing artifact of the POC. It consolidates what `architecture.md`, `workflow.md`, `audit-trail.md`, and the five wave-4 research artifacts (`research/17`–`21`) say about the report into one place.

## Contents

1. [What the report is](#1-what-the-report-is)
2. [Section-by-section walkthrough](#2-section-by-section-walkthrough)
3. [The five product perspectives that shaped it](#3-the-five-product-perspectives-that-shaped-it)
4. [What got implemented (wave-4 delta)](#4-what-got-implemented-wave-4-delta)
5. [What's still open](#5-whats-still-open)
6. [Generating one](#6-generating-one)
7. [Reading one](#7-reading-one)

---

## 1. What the report is

A self-contained HTML file rendered by `src/html-report.ts` from a `VerdictResult`. One report per target; one file per `npm run report` invocation. Inline CSS, no JavaScript, no external assets — it survives email forwarding, Slack attachment, S3 retention, and offline viewing.
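
As a rough illustration of the self-contained shape, here is a minimal TypeScript sketch. The `MiniResult` type and the `renderShell` and `escapeHtml` helpers are hypothetical stand-ins, not the actual exports of `src/html-report.ts`, and the version string is made up. All styling sits in one inline `<style>` element, no `<script>` is emitted, and every interpolated value is HTML-escaped.

```typescript
// Hypothetical, pared-down stand-in for the engine's VerdictResult.
type Verdict = "blocked" | "needs-attention" | "ready-to-progress" | "unknown";

interface MiniResult {
  verdict: Verdict;
  engineVersion: string;
  generatedAt: string; // ISO-8601 timestamp
}

// Escape the five HTML-significant characters before interpolating any text.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// One string, no external assets: inline CSS only, deliberately no <script>.
// The css argument is treated as trusted, author-controlled text.
function renderShell(result: MiniResult, css: string): string {
  return [
    "<!doctype html>",
    '<html lang="en"><head><meta charset="utf-8">',
    '<meta name="viewport" content="width=device-width, initial-scale=1">',
    `<title>Verdict: ${escapeHtml(result.verdict)}</title>`,
    `<style>${css}</style>`,
    "</head><body>",
    `<p class="identity">Engine ${escapeHtml(result.engineVersion)} · ${escapeHtml(result.generatedAt)}</p>`,
    `<div class="verdict verdict--${escapeHtml(result.verdict)}">${escapeHtml(result.verdict)}</div>`,
    "</body></html>",
  ].join("\n");
}

const html = renderShell(
  { verdict: "blocked", engineVersion: "0.1.0", generatedAt: "2024-01-01T00:00:00Z" },
  ".verdict--blocked { background: #fdecea; }",
);
console.log(html.includes("<script")); // false
```

Returning a single string keeps the artifact trivially writable to disk, attachable, or archivable without a bundler or asset folder.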

The report is the *only* document most readers will see. The terminal output and the JSON `--json` mode are for CI and operators; the HTML is for everyone else (PR reviewer, PM, EM, QA, compliance officer, auditor, the author a week later).

Three committed samples under [`research/sample-reports/`](../research/sample-reports/) show the three primary verdict shapes — `blocked`, `needs-attention`, `ready-to-progress`.

## 2. Section-by-section walkthrough

The report renders top-to-bottom in this order. The order matters: it follows the reader's actual scan path established by `research/17` (UX audit) and `research/20` (auditor reading path).

| Section | Job | Section header at render | Source |
|---|---|---|---|
| **System-identity header** | Tell a cold reader *what this is* — engine version + prominent timestamp | (no header — runs above the verdict card) | `research/20` Art. 13 "provider identity" gap |
| **Verdict tile** | Categorical tier in 2-second-scan colour: blocked / needs-attention / ready-to-progress / unknown | (the headline) | engine `verdict` |
| **Stats line** | "N rule(s) fired · M action(s) to take" — quantifies how contested the decision is | (under the tile) | `result.evaluations`, `result.actions` |
| **Blocker-by-absence banner** | "X rules could not be evaluated because the LLM did not supply Y, Z, W" — yellow, adjacent to verdict | (conditional banner) | `research/21` skim-trap finding |
| **Skip-validate banner** | "WARNING: validation gate was skipped" — when `--skip-validate` was set | (conditional banner) | `research/18` + `research/21` trust calibration |
| **Verdict-tier + glyph legend** | Collapsed `<details>` explaining blocked / needs-attention / ready-to-progress / unknown + `[+] / [-] / [?]` glyph meanings | "Glossary." | `research/17` + `research/20` |
| **Weighted tally** | Per-tier weight totals, side-by-side with actions | "Weighted tally." | engine `weightedTally` |
| **Suggested actions** | Imperative-voice action sentences in priority-of-cause order (not alphabetical) | "Take these actions." | engine `evaluations` walked in priority desc; `rules/action-glossary.yaml` if present |
| **Extraction flags** | The LLM's structured output as a table | "Extraction flags." | `ctx.flags` |
| **What fired** | Matched rules only, in priority order, each with rule id + description + flags it matched on + actions it contributed | "What fired." | `result.evaluations` filtered to `matched === true` |
| **Audit trail** | Every rule evaluation, matched + skipped. Skipped rules collapse to `<details>` summary by default | "Audit trail." | `result.evaluations` |
| **Reproduce block** | Shell-quoted command + the three replay anchors (engine version + ruleset hash + flags hash) | "How to reproduce." | `research/20` Art. 12 replay manifest |
| **Provenance** | Hash preamble + 12-char truncated hashes + file paths | "Provenance." | `result.rulesetHash`, `result.flagsHash`, `result.engineVersion` |
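
The *What fired* row and the stats line can be sketched as below, assuming a simplified `Evaluation` shape. The field set and the helper names `whatFired`, `statsLine`, and `compareIds` are illustrative, not the engine's types. The sort uses priority descending with rule id ascending as the tie-breaker, which is the ordering the engine documents for evaluation output.

```typescript
// Hypothetical, simplified evaluation record; the engine's real type has more fields.
interface Evaluation {
  id: string;
  priority: number;
  matched: boolean;
}

// Deterministic, locale-independent comparison for the id tie-break.
function compareIds(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// "What fired": matched rules only, priority descending, then id ascending.
// filter() returns a new array, so sort() does not mutate the caller's input.
function whatFired(evaluations: readonly Evaluation[]): Evaluation[] {
  return evaluations
    .filter((e) => e.matched === true)
    .sort((a, b) => b.priority - a.priority || compareIds(a.id, b.id));
}

// Stats line rendered under the verdict tile.
function statsLine(evaluations: readonly Evaluation[], actionCount: number): string {
  return `${whatFired(evaluations).length} rule(s) fired · ${actionCount} action(s) to take`;
}

const sample: Evaluation[] = [
  { id: "b", priority: 10, matched: true },
  { id: "a", priority: 10, matched: true },
  { id: "c", priority: 50, matched: false },
  { id: "d", priority: 1, matched: true },
];
console.log(whatFired(sample).map((e) => e.id)); // [ 'a', 'b', 'd' ]
console.log(statsLine(sample, 2)); // 3 rule(s) fired · 2 action(s) to take
```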

The CSS is inline and uses a 3-tier severity palette (red / amber / green) with non-colour-only signals (glyphs, row washes, section headings). A `@media (max-width: 540px)` rule collapses the summary grid to a single column on phones.
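
A sketch of how such a stylesheet might be assembled as a TypeScript string. Only the `blocked` hex values come from the brand review; mapping them to background, border, and text is an assumption, and the amber and green values are placeholders, not the report's actual colours. The `.summary` and `verdict--*` class names are hypothetical; `cond--miss` and `cond--missing` are the class names cited in the UX audit.

```typescript
interface TierColours {
  bg: string;
  border: string;
  text: string;
}

// Only the blocked values are from the brand review; their roles are assumed.
// The attention and ready values are illustrative placeholders.
const PALETTE = {
  blocked: { bg: "#fdecea", border: "#d8281b", text: "#7a160d" },
  attention: { bg: "#fff4e0", border: "#b26a00", text: "#5c3700" },
  ready: { bg: "#e8f5e9", border: "#2e7d32", text: "#1b4d1f" },
} as const;

// One rule per verdict tier; the left border is a second, non-hue cue.
function tierRule(cls: string, p: TierColours): string {
  return `.${cls} { background: ${p.bg}; border-left: 4px solid ${p.border}; color: ${p.text}; }`;
}

const REPORT_CSS = [
  tierRule("verdict--blocked", PALETTE.blocked),
  tierRule("verdict--needs-attention", PALETTE.attention),
  tierRule("verdict--ready-to-progress", PALETTE.ready),
  // cond--miss rows reuse the cond--missing amber wash.
  `.cond--miss, .cond--missing { background: ${PALETTE.attention.bg}; }`,
  ".summary { display: grid; grid-template-columns: 1fr 1fr; gap: 1rem; }",
  // Phones: collapse the summary grid to a single column.
  "@media (max-width: 540px) { .summary { grid-template-columns: 1fr; } }",
].join("\n");

console.log(REPORT_CSS);
```

Keeping the palette in one object is also what a later brand-token migration would replace with CSS custom properties.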

## 3. The five product perspectives that shaped it

Wave 4 dispatched five subagents in parallel against three committed sample renders. Each reviewed the report through a different lens. The convergent findings drove the v3 rebuild in the wave-4 implementer pass.

### UX (`research/17-report-ux-audit.md`)

> "The audit trail buries what matters. With ~21 rules and 1–2 matches per sample, the page is ~95% 'did not match' content."

Top recommendations: a **What fired** section above the full audit trail; collapse-by-default for skipped rules; replace "X of 21 rules matched" coverage text with per-tier-appropriate phrasing; sort actions by priority-of-cause not alphabetical; a `cond--miss` row-wash to match `cond--missing`'s amber; a `@media (max-width: 540px)` single-column fallback.
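
Priority-of-cause ordering, sketched under the assumption that each evaluation carries the action slugs it contributed: walk matched rules in priority-desc, id-asc order and keep each action at its first (highest-priority) appearance. `orderActions` and every slug except `kick-ci` are hypothetical.

```typescript
// Hypothetical evaluation shape carrying the action slugs a rule contributed.
interface Evaluation {
  id: string;
  priority: number;
  matched: boolean;
  actions: string[];
}

function compareIds(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Each action inherits the priority of its highest-priority cause;
// later duplicates are dropped, so the list is never alphabetical by accident.
function orderActions(evaluations: readonly Evaluation[]): string[] {
  const seen = new Set<string>();
  const ordered: string[] = [];
  const fired = evaluations
    .filter((e) => e.matched)
    .sort((a, b) => b.priority - a.priority || compareIds(a.id, b.id));
  for (const e of fired) {
    for (const action of e.actions) {
      if (!seen.has(action)) {
        seen.add(action);
        ordered.push(action);
      }
    }
  }
  return ordered;
}

const evals: Evaluation[] = [
  { id: "no-owner", priority: 10, matched: true, actions: ["assign-owner", "kick-ci"] },
  { id: "ci-red", priority: 90, matched: true, actions: ["kick-ci"] },
  { id: "ears-missing", priority: 100, matched: true, actions: ["rewrite-requirements"] },
  { id: "stale", priority: 5, matched: false, actions: ["refresh"] },
];
console.log(orderActions(evals)); // [ 'rewrite-requirements', 'kick-ci', 'assign-owner' ]
```

Sorted alphabetically, the same slugs would put `assign-owner`, the lowest-priority cause, first.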

### Stakeholder strategy (`research/18-report-stakeholders.md`)

> "The report is one artifact serving six different first-fields — PR-reviewers want the verdict pill, compliance wants the provenance block, PMs want suggested actions, auditors want the hashes, authors want the flags."

Top recommendations: expand action slugs (`kick-ci`) to human sentences (`"Re-run the failing CI job."`) via an `actions[].human` field — implemented as a sidecar glossary in `rules/action-glossary.yaml`; introduce a `label_set` config (default `dev`; `pm`, `qa`, `compliance` as presentational overrides) so headline labels match the reader; flag a portfolio dashboard as the first hosted-SaaS gravity seam — *do not build it yet*.
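
A minimal sketch of the glossary lookup, assuming `rules/action-glossary.yaml` has already been parsed into a flat slug-to-sentence map (the real file's schema may differ). Because the glossary is optional, a slug without an entry falls back to itself rather than disappearing. `humanizeActions` and `RenderedAction` are hypothetical names; `assign-owner` is a made-up slug.

```typescript
// Assumed shape of the parsed glossary: slug -> human sentence.
type ActionGlossary = Readonly<Record<string, string>>;

interface RenderedAction {
  slug: string; // technical name, e.g. for a hover tooltip
  human: string; // sentence shown in "Take these actions."
}

// Fall back to the raw slug when the glossary is absent or lacks an entry.
// hasOwnProperty avoids picking up inherited keys such as "toString".
function humanizeActions(slugs: readonly string[], glossary: ActionGlossary = {}): RenderedAction[] {
  return slugs.map((slug) => ({
    slug,
    human: Object.prototype.hasOwnProperty.call(glossary, slug) ? glossary[slug] : slug,
  }));
}

const glossary: ActionGlossary = { "kick-ci": "Re-run the failing CI job." };
const rendered = humanizeActions(["kick-ci", "assign-owner"], glossary);
console.log(rendered.map((a) => a.human)); // [ 'Re-run the failing CI job.', 'assign-owner' ]
```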

### Brand (`research/19-report-brand-review.md`)

Verdict: **pass-with-findings** (not S1-blocking under the sandbox scope). On-temperament (no emoji, no gradients, no icons, restrained density, ASCII `[+]/[-]/[?]` glyphs correctly used as monospace-iconography). Off-token: 18 distinct literal hex values, hand-picked font stacks, near-white page background where Specorator calls for cream. Section headers should be sentence-case with periods — implemented. Open ADR-shaped decision: Specorator has no red token; the `blocked` tier currently uses literal `#fdecea / #d8281b / #7a160d` and stays that way until graduation.

### Auditor readability (`research/20-report-auditor-readability.md`)

> "30-second test passes... what the 30-second test fails on: the report does not name what kind of system this is, who built it, what version of the workflow it governs, or what the verdict is binding against."

Implemented: system-identity header (engine version + prominent timestamp), reproduce-block with the three replay anchors, verdict-tier glossary, glyph legend. Still open: provider identity / contact, model-card link, capabilities-and-limitations sentence, expected-lifetime statement, reviewer-of-record field. Closes `research/02`'s "human-readable rationale presentation" open item; the remainder is governance, not engineering.

### Misread risks (`research/21-report-misread-risks.md`)

Three flagged misread paths:

1. **The skim trap** — a busy reader looks only at the verdict tile and action list, missing context. Implemented mitigation: blocker-by-absence banner is rendered at the same visual weight as the verdict card.
2. **`verified` badge as trust trap** — a green pill saying `verified` will read as "extraction verified" when it only means "extraction is bound to current inputs". Implemented mitigation: tooltip on the badge explicitly says *bound to current inputs, not flag-correctness*. Plus the `--skip-validate` banner is now visually equal-weight to the verdict so a skipped-validation run cannot pass undetected.
3. **Blocker-by-absence is the most dangerous skim path** — a high-priority blocker whose input flag was never extracted simply doesn't fire. Implemented mitigation: dedicated banner naming the missing flags AND the count of un-evaluable rules (counted using `matched === false`, fixed in Codex round 12 to exclude rules that fired via `when.any` despite one missing branch).
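
The counting rule in item 3 can be sketched as follows. This assumes each evaluation records which referenced flags were absent (the `missingFlags` field is hypothetical) and treats a rule as un-evaluable only when it did not match *and* had a missing input. It illustrates the round-12 logic; it is not the engine's `missingFlagNames`, and the flag names are made up.

```typescript
interface Evaluation {
  id: string;
  matched: boolean;
  // Hypothetical field: flags the rule referenced that the extraction omitted.
  missingFlags: string[];
}

interface AbsenceSummary {
  unevaluableRules: number;
  missingFlags: string[];
}

// A `when.any` rule that matched on another branch is excluded even though
// one of its flags was missing: its outcome was not decided by absence.
function absenceSummary(evaluations: readonly Evaluation[]): AbsenceSummary {
  const flags = new Set<string>();
  let unevaluableRules = 0;
  for (const e of evaluations) {
    if (e.matched === false && e.missingFlags.length > 0) {
      unevaluableRules += 1;
      e.missingFlags.forEach((f) => flags.add(f));
    }
  }
  return { unevaluableRules, missingFlags: Array.from(flags).sort() };
}

// Plain-text banner; render nothing when no rule was blocked by absence.
function absenceBanner(s: AbsenceSummary): string {
  if (s.unevaluableRules === 0) return "";
  return `${s.unevaluableRules} rule(s) could not be evaluated because the LLM did not supply ${s.missingFlags.join(", ")}`;
}

const evals: Evaluation[] = [
  { id: "r1", matched: false, missingFlags: ["ears_present"] },
  { id: "r2", matched: true, missingFlags: ["ci_status"] }, // fired via another branch
  { id: "r3", matched: false, missingFlags: [] },
  { id: "r4", matched: false, missingFlags: ["ears_present", "owner"] },
];
console.log(absenceBanner(absenceSummary(evals)));
// 2 rule(s) could not be evaluated because the LLM did not supply ears_present, owner
```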

## 4. What got implemented (wave-4 delta)

The 12 changes that landed across the wave-4 implementer pass (Agent B's RALPH loop) and the subsequent Codex round 11–14 hardening:

| # | Change | Research source |
|---|---|---|
| 1 | "What fired" section above audit trail | `17`, `20`, `21` |
| 2 | Non-matched rules collapsed via `<details>` | `17` |
| 3 | Blocker-by-absence banner naming missing flags | `21`, `17` |
| 4 | Suggested actions in priority-of-cause order | `17` |
| 5 | Action human-sentences from glossary | `18` |
| 6 | Provenance preamble + reproduce block + 12-char hash truncation | `17`, `20`, `18` |
| 7 | System-identity header + prominent timestamp | `20` |
| 8 | Verdict-tier + glyph legend in collapsed `<details>` | `20`, `17` |
| 9 | `cond--miss` row wash matching `cond--missing` amber | `17` |
| 10 | `@media (max-width: 540px)` single-column fallback | `17` |
| 11 | `--skip-validate` banner + `verified` badge tooltip | `18`, `21` |
| 12 | Sentence-case headers + imperative voice ("Take these actions.") | `19` |

Plus three follow-up Codex hardenings caught after the wave-4 push:

- **Round 11** (`90f3fe1`) — `openInBrowser` waits for the spawned process to exit cleanly (not just `spawn`); `takeOpt` rejects missing values for `--config` / `--target`.
- **Round 12** (`eb01077`) — `missingFlagNames` only counts rules whose final outcome was determined by absence (excludes `when.any` rules that matched another branch); reproduce-command paths are shell-quoted.
- **Round 13** (`003a05e`) — single-shot `cli.ts::takeOption` rejects missing `--html` values.
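
The round-11 and round-12 behaviours, sketched with hypothetical helpers. `shellQuote` applies standard POSIX single-quote quoting (close the quote, emit an escaped `'`, reopen). This `takeOpt` signature is an assumption that only mirrors the described behaviour of rejecting a flag that has no value; it is not the CLI's actual code.

```typescript
// POSIX single-quote quoting: inside '...' nothing is special except ' itself,
// which is written as '\'' (close quote, escaped quote, reopen quote).
function shellQuote(arg: string): string {
  if (/^[A-Za-z0-9_\/.-]+$/.test(arg)) return arg; // conservative safe set, left bare
  return `'${arg.replace(/'/g, "'\\''")}'`;
}

// Hypothetical option reader: returns the value after `name`, undefined if the
// flag is absent, and throws if the flag is last or followed by another --flag.
function takeOpt(argv: readonly string[], name: string): string | undefined {
  const i = argv.indexOf(name);
  if (i === -1) return undefined;
  const value = argv[i + 1];
  if (value === undefined || value.startsWith("--")) {
    throw new Error(`${name} requires a value`);
  }
  return value;
}

const reproduce = [
  "npx", "tsx", "src/cli.ts",
  "rules/quality-gates.yaml", "fixtures/blocked-missing-ears.json",
  "--html", "/tmp/my preview.html",
].map(shellQuote).join(" ");
console.log(reproduce);
// npx tsx src/cli.ts rules/quality-gates.yaml fixtures/blocked-missing-ears.json --html '/tmp/my preview.html'
```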

Test surface for the report: 28 tests in `test/html-report.test.ts` plus the report-flow integration tests in `test/report-flow.test.ts`.

## 5. What's still open

Deferred, with the bucket each lives in:

| Item | Source | Bucket |
|---|---|---|
| `label_set` config (dev / pm / qa / compliance presets) | `research/18` | Strategy slice 14–18 |
| Reader-specific export modes (PDF, markdown, Slack-friendly text) | `research/18` | Strategy slice |
| Portfolio dashboard (one HTML across N targets) | `research/18` | Hosted-SaaS gravity seam — explicitly *do not build* per strategist |
| Provider identity / model card / capabilities-and-limitations sentence | `research/20` | Governance (compliance.md "what's not in this POC") |
| Reviewer-of-record field, override workflow | `research/20` | Governance |
| Brand-token migration (replace 18 literal hex values with vars) | `research/19` | ADR at graduation |
| Diff-against-previous-run | `research/21` | Production prep |
| Confidence / uncertainty surface | `research/20` + `research/21` | Calibration study first |
| RAT-A / RAT-B / RAT-C / RAT-D / RAT-E / RAT-F | `research/07`, `research/14` | Discovery activity — needs users, not more engineering |

## 6. Generating one

```bash
# Plan (writes the prompt + sidecar)
npm run plan -- --target <id>

# User pastes the prompt into Claude / ChatGPT / Gemini, saves JSON to
# extractions/<id>.json.

# Validate (optional sanity check)
npm run validate -- --target <id>

# Report (renders HTML, opens browser best-effort)
npm run report -- --target <id>
# Or without opening a browser:
npm run report -- --target <id> --no-open
```

Exit codes: `0` no blockers, `1` at least one `blocked` verdict, `2` missing / malformed extraction.
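
For CI wrappers, the exit-code contract reduces to a small mapping. A sketch, assuming a hypothetical `ReportOutcome` type; it also assumes an `unknown` verdict exits `0`, since the contract above only singles out `blocked`.

```typescript
type Verdict = "blocked" | "needs-attention" | "ready-to-progress" | "unknown";

// Hypothetical outcome of a report run: an extraction problem or a verdict.
type ReportOutcome =
  | { kind: "extraction-error"; reason: "missing" | "malformed" }
  | { kind: "verdict"; verdict: Verdict };

// 2 = missing/malformed extraction, 1 = blocked, 0 = everything else (assumed
// to include "unknown").
function exitCodeFor(outcome: ReportOutcome): 0 | 1 | 2 {
  if (outcome.kind === "extraction-error") return 2;
  return outcome.verdict === "blocked" ? 1 : 0;
}

console.log(exitCodeFor({ kind: "verdict", verdict: "blocked" })); // 1
```

In a Node entry point this would typically be assigned to `process.exitCode` so pending output flushes before the process ends.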

For testing without the AI loop (single-shot fixture flow):

```bash
npx tsx src/cli.ts rules/quality-gates.yaml fixtures/blocked-missing-ears.json --html /tmp/preview.html
```

## 7. Reading one

For the **report consumer** (not the POC operator):

1. **Verdict tile** — colour and label. That's the answer.
2. **Stats line** — how many rules fired? How many actions to take?
3. **Banners** (if present) — blocker-by-absence flags missing inputs; skip-validate flag means the validation gate was bypassed (treat the verdict as advisory).
4. **Take these actions** — the imperative-voice human sentences; where an action's slug is shown as `code`, hover over it for the technical name.
5. **What fired** — the rules that drove the verdict, in priority-of-cause order. Each is a 1-paragraph card; expand the audit trail for everything that *didn't* fire.
6. **Provenance** — the three hashes (engine version, ruleset, flags) let you verify the report came from a specific tuple. The reproduce-command block lets you re-run it locally.

A reader who reads only steps 1–4 should still get the answer correct. The rest is depth on demand.
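
Checking a displayed hash against a local file can be sketched as below. This assumes SHA-256 over the file's raw bytes; the engine may hash a canonicalised form instead, in which case the prefixes will not line up. `sha256Hex` and `matchesDisplayed` are hypothetical helpers built on Node's `node:crypto`.

```typescript
import { createHash } from "node:crypto";

// Assumption: SHA-256 over raw bytes, hex-encoded.
function sha256Hex(data: string | Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// The report shows 12-character prefixes; compare a full local hash against one.
function matchesDisplayed(fullHex: string, displayed: string): boolean {
  return displayed.length === 12 && fullHex.startsWith(displayed.toLowerCase());
}

console.log(matchesDisplayed(sha256Hex("abc"), "ba7816bf8f01")); // true
```

To check a real ruleset, pass the `Buffer` returned by `readFileSync(path)` from `node:fs` to `sha256Hex` and compare against the 12-character value in the Provenance section. Twelve hex characters are 48 bits, enough to tell runs apart by eye but not a substitute for the full hash in a formal audit.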

See [`docs/audit-trail.md`](audit-trail.md) for replay mechanics and [`docs/compliance.md`](compliance.md) for which sections speak to which regulation.
```diff
@@ -104,13 +104,20 @@ export function validateExtraction(
       continue;
     }
     if (value === null) {
-      warnings.push({
-        severity: "warning",
-        code: "null-value-omit-instead",
+      // Codex round 16 P1: previously a warning, but the engine's
+      // `hasOwnProperty` presence check treats null as PRESENT — so
+      // `exists`/`ne` rules behave differently against {flag: null}
+      // than against {} despite the validator's old "null ≈ missing"
+      // claim. The right fix is to refuse null at the gate so the
+      // engine never sees it; LLMs are instructed to omit unknowns.
+      errors.push({
+        severity: "error",
+        code: "null-value-not-allowed",
         path: key,
         message:
-          `Flag '${key}' is null; prefer omitting unknowns over emitting null. ` +
-          `The engine will treat null and missing identically.`,
+          `Flag '${key}' is null; omit the field instead. ` +
+          `The engine's presence check treats null as PRESENT, which can ` +
+          `silently change verdicts for rules using 'exists' or 'ne'.`,
       });
       continue;
     }
```

Comment on lines +106 to +122

`priority` is only checked with `typeof === "number"`, so YAML values like `.nan` are accepted. In `evaluate`, sorting then does `b.priority - a.priority`; with `NaN` that comparator returns `NaN` (treated like 0), which silently breaks the documented `priority desc, id asc` ordering and can reorder audit/action output unpredictably. This loader should reject non-finite priorities the same way it already rejects non-finite numeric operators and weights.
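
A sketch of the check the comment asks for. `typeof` alone lets `NaN` and `±Infinity` through, and `Number.isFinite` rejects both (it also rejects non-numbers without coercion). `assertFinitePriority` and `RuleLike` are hypothetical names, not the loader's existing code.

```typescript
interface RuleLike {
  id: string;
  priority: number;
}

// Reject anything that is not a finite number before it reaches the comparator.
function assertFinitePriority(rule: { id: string; priority: unknown }): asserts rule is RuleLike {
  if (typeof rule.priority !== "number" || !Number.isFinite(rule.priority)) {
    throw new Error(`rule '${rule.id}': priority must be a finite number`);
  }
}

console.log(typeof Number.NaN === "number"); // true: a typeof-only check accepts NaN
console.log(10 - Number.NaN); // NaN: the comparator yields no usable ordering
```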