
feat(gate): insight quality gate — audit + forward enforcement#2

Merged
gorajing merged 3 commits into main from feat/insight-gate-audit
May 25, 2026

gorajing (Owner) commented May 25, 2026

A deterministic, recomputable semantic-quality gate for the insight corpus, delivered in two phases.

Phase 1 — read-only audit (093f344)

  • scripts/lib/insight-gate.ts + scripts/insight-gate.ts: the semantic-quality layer that health.ts (structural) and verify-contracts.ts (doc/code surface) don't cover.
  • Checks (no LLM in the gate): stance present / directional (heuristic), attribution resolves, topic matches path, and nearest-INSIGHT-neighbor cosine novelty. Novelty uses exact cosine on the stored float32 vectors (vec0 returns L2 distance, not cosine), considers INS- neighbors only via a k=64 overfetch, and includes a block-threshold simulation plus deduped triage pairs.
  • --audit never exits 1 on quality findings (only on operational failure); opens the DB read-only; writes outputs to the gitignored meta/.
  • First X-ray (11,520 insights): stance 100% / directional 99.7% / attribution 99.0% / topic 100%; near-duplicates at cosine ≥0.95: 24, ≥0.90: 186, ≥0.85: 541 (mean 0.747).
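The novelty check's core arithmetic can be sketched as below. This is a minimal illustration, not the gate's code: it assumes each vector is stored as raw float32 bytes (the compact BLOB form sqlite-vec accepts, e.g. a `Float32Array`'s buffer, read here in host byte order, which is little-endian on common hosts), and the names `blobToFloat32`, `cosine`, and `cosineFromL2Unit` are hypothetical. It also shows why vec0's L2 distance cannot simply stand in for cosine: the two convert cleanly (cos = 1 - d²/2) only when both vectors have unit length.

```typescript
// Hypothetical helpers illustrating exact cosine on stored float32 blobs.

/** Decode a BLOB (Node Buffer or Uint8Array) into a Float32Array. */
function blobToFloat32(blob: Uint8Array): Float32Array {
  if (blob.byteLength % 4 !== 0) {
    throw new Error(`blob length ${blob.byteLength} is not a multiple of 4`);
  }
  // Copy into a fresh ArrayBuffer: a Buffer's byteOffset may not be 4-byte
  // aligned, and constructing a Float32Array view on it directly would throw.
  const copy = new Uint8Array(blob.byteLength);
  copy.set(blob);
  return new Float32Array(copy.buffer);
}

/** Exact cosine similarity; does not assume the vectors are normalized. */
function cosine(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  if (na === 0 || nb === 0) throw new Error("cosine undefined for a zero vector");
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

/** Valid ONLY for unit-length vectors: ||a - b||^2 = 2 - 2cos, so cos = 1 - d^2/2. */
function cosineFromL2Unit(d: number): number {
  return 1 - (d * d) / 2;
}
```

Computing cosine directly from the decoded vectors sidesteps the question of whether every stored embedding is normalized, which the L2 shortcut silently depends on.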

Phase 2 — forward enforcement (67ffd2d)

  • enforceGate(): blocking failures vs non-blocking warnings. Conservative default: blocks only on missing stance + near-duplicate (≥0.95). stance_directional (heuristic) and attribution (would reject synthetic insights) are warnings, promotable via blockingChecks. Unembedded new insight → warning, not silent skip.
  • --enforce --changed: scopes to uncommitted insight files, fails closed on git error and on changed files that don't load.
  • Fatal gate step in post-ingest, placed after embed and before learn/auto-git (this placement preserves forward-only scoping). Mirrors the existing fatal reindex step.
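A minimal sketch of `--changed` scoping, assuming it is built on `git status` (the PR does not say which git command the gate runs). `parsePorcelainZ` and `changedInsightFiles` are hypothetical names, and `isInsightPath` stands in for whatever path filter the gate actually uses. The property worth copying is the one the PR calls out: a git failure throws rather than returning an empty list, so the gate fails closed. Loading the returned files, and failing when one does not load, is left to the caller.

```typescript
import { execFileSync } from "node:child_process";

interface ChangedFile {
  status: string; // two-character XY status, e.g. " M", "??", "R "
  path: string;
}

/**
 * Parse `git status --porcelain=v1 -z` output. In -z mode each entry is
 * NUL-terminated, paths are unquoted, and a rename/copy entry is followed
 * by its ORIGINAL path as a separate NUL-terminated token.
 */
function parsePorcelainZ(out: string): ChangedFile[] {
  const tokens = out.split("\0");
  const files: ChangedFile[] = [];
  for (let i = 0; i < tokens.length; i++) {
    const entry = tokens[i];
    if (entry === "") continue; // trailing NUL leaves an empty last token
    const status = entry.slice(0, 2);
    files.push({ status, path: entry.slice(3) });
    if (status.includes("R") || status.includes("C")) i++; // skip original path
  }
  return files;
}

/** Uncommitted (including untracked) files that pass the insight-path filter. */
function changedInsightFiles(repoDir: string, isInsightPath: (p: string) => boolean): ChangedFile[] {
  let out: string;
  try {
    out = execFileSync("git", ["status", "--porcelain=v1", "-z", "--untracked-files=all"], {
      cwd: repoDir,
      encoding: "utf8",
    });
  } catch (err) {
    // Fail CLOSED: if we cannot tell what changed, never report "nothing changed".
    throw new Error(`gate --changed: git status failed: ${(err as Error).message}`);
  }
  return parsePorcelainZ(out).filter((f) => isInsightPath(f.path));
}
```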

Review

  • Local codex: Phase 1 converged in 3 rounds (caught L2-vs-cosine, read-only DB, k-overfetch, precision, availability-state); Phase 2 converged in 3 rounds (caught two fail-open paths + an unaccounted-insight bypass).

Test plan

  • 55 tests (unit + in-memory sqlite-vec integration + enforceGate)
  • full suite green (548 passed / 3 skipped)
  • typecheck clean
  • e2e: audit over 11,520; enforce --changed exit 0; blocking scope exit 1
  • CI

🤖 Generated with Claude Code

gorajing and others added 2 commits May 25, 2026 14:55
Adds scripts/insight-gate.ts + scripts/lib/insight-gate.ts: a deterministic,
recomputable semantic-quality audit over the insight corpus. Sibling to
health.ts (structural validation) and verify-contracts.ts (doc/code surface) —
this layer asks the quality questions those don't: is the stance directional?
is it attributed to a real source? does its topic match its path? is it a
near-duplicate?

Checks (all deterministic, no LLM in the gate):
- stance present / stance directional (heuristic floor)
- attribution resolves to a known source (by normalized title or url)
- topic matches file path
- nearest-INSIGHT-neighbor cosine novelty: exact cosine on the stored float32
  vectors (the vec0 table returns L2 distance, not cosine), INS-only via k=64
  overfetch, with a block-threshold simulation and deduped triage pairs
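The INS-only overfetch can be sketched as follows: the vector index returns the k nearest rows of any kind (k=64 here), and the caller drops the insight itself and non-insight rows such as PRI- and MM- before taking the best exact-cosine match. Assumptions: insight ids carry an `INS-` prefix, and `Neighbor` and `nearestInsight` are hypothetical names. Returning `null` when nothing survives the filter keeps "no measurable INS neighbor within k" distinct from "novel".

```typescript
interface Neighbor {
  id: string;
  embedding: Float32Array;
}

function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

/**
 * From the k overfetched candidates for one insight, keep only OTHER insights
 * and return the one with the highest exact cosine, or null if none remain.
 */
function nearestInsight(
  selfId: string,
  self: Float32Array,
  candidates: Neighbor[],
): { id: string; cosine: number } | null {
  let best: { id: string; cosine: number } | null = null;
  for (const c of candidates) {
    if (c.id === selfId || !c.id.startsWith("INS-")) continue; // drop self, PRI-, MM-
    const s = cosine(self, c.embedding);
    if (best === null || s > best.cosine) best = { id: c.id, cosine: s };
  }
  return best;
}
```

Re-ranking by exact cosine after the filter also means the final choice does not depend on the index's L2 ordering.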

Phase 1 is audit-only: it reports, never blocks (exit 1 only on operational
failure). brain.db is opened read-only. Outputs land in the gitignored
knowledge-base/meta/. Forward enforcement (Phase 2) will reuse these checks.
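Because Phase 1 only reports, a block-threshold simulation can answer "what would threshold t have blocked?" without blocking anything. A minimal sketch, assuming one nearest-insight record per insight; `NearestPair`, `simulateThresholds`, and `dedupePairs` are hypothetical names. Deduping treats A->B and B->A as a single triage item, since cosine is symmetric.

```typescript
interface NearestPair {
  a: string; // the insight being checked
  b: string; // its nearest other insight
  cosine: number;
}

/** For each threshold, count the insights whose nearest-insight cosine would block. */
function simulateThresholds(nearest: NearestPair[], thresholds: number[]): Map<number, number> {
  const counts = new Map<number, number>();
  for (const t of thresholds) {
    counts.set(t, nearest.filter((p) => p.cosine >= t).length);
  }
  return counts;
}

/** Collapse unordered duplicates (A,B)/(B,A) and sort most-similar first for triage. */
function dedupePairs(pairs: NearestPair[]): NearestPair[] {
  const seen = new Map<string, NearestPair>();
  for (const p of pairs) {
    const key = p.a < p.b ? `${p.a}|${p.b}` : `${p.b}|${p.a}`;
    const prev = seen.get(key);
    if (prev === undefined || p.cosine > prev.cosine) seen.set(key, p);
  }
  return Array.from(seen.values()).sort((x, y) => y.cosine - x.cosine);
}
```

Note the two views differ on purpose: if A and B are each other's nearest neighbor, the simulation counts two insights that would be blocked, while triage shows one pair to review.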

Tests: 47 (unit + in-memory sqlite-vec integration covering blob round-trip,
PRI-/MM- filtering, and exact cosine). Run via: npm run gate -- --audit --all

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Wires the audit checks into a forward-only enforcement gate.

- enforceGate() (lib): partitions blocking failures from non-blocking
  warnings. Near-duplicate (>= cosine threshold, default 0.95) blocks when
  measurable; missing stance blocks by default. stance_directional (a
  heuristic) and attribution_resolves (would reject legitimate synthetic
  insights) are warnings by default, promotable via blockingChecks. An
  unembedded new insight WARNS (dup check unmeasurable) rather than silently
  skipping — fail-closing there would halt ingestion whenever Ollama is down.

- insight-gate.ts --enforce: scopes via --changed (uncommitted insight files;
  fails CLOSED on git error and on changed files that don't load), --since, or
  all. Exits 1 on blocking failures.

- post-ingest gate step: fatal, after embed (needs vectors) and before
  learn/auto-git (a blocked batch must not commit; placement preserves
  forward-only scoping since learn mutates existing insights). Mirrors the
  existing fatal reindex step.
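The ordering constraint in the post-ingest bullet (gate after embed, before learn/auto-git) comes down to a fatal step that short-circuits everything after it. A hypothetical sketch; `Step`, `runPipeline`, and the step names used with it are illustrative, not the script's actual structure:

```typescript
interface Step {
  name: string;
  fatal: boolean; // a failing fatal step stops the pipeline
  run: () => boolean; // true on success
}

/** Run steps in order; returns false if a fatal step failed or threw. */
function runPipeline(steps: Step[], log: (msg: string) => void = console.log): boolean {
  for (const step of steps) {
    let ok: boolean;
    try {
      ok = step.run();
    } catch (err) {
      ok = false;
      log(`${step.name} threw: ${(err as Error).message}`);
    }
    if (ok) continue;
    if (step.fatal) {
      log(`${step.name} failed (fatal); skipping all later steps`);
      return false;
    }
    log(`${step.name} failed (non-fatal); continuing`);
  }
  return true;
}
```

With steps ordered embed, gate (fatal), learn, auto-git, a blocked batch never reaches learn (which would mutate existing insights) or auto-git (which would commit the batch).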

Conservative by design: blocks only on missing stance + near-identical
duplicate, so it won't false-block legitimate or synthetic insights. Ratchet
--max-similarity or promote warning-checks once trusted.
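The blocking/warning split, and promoting a warning check once it is trusted, can be sketched as a partition over check results. This is an illustration in the spirit of `enforceGate()`, not its code: apart from `stance_directional` and `attribution_resolves`, the check names are hypothetical, as are `CheckResult`, `partitionResults`, and `DEFAULT_BLOCKING`. A caller would exit 1 when `blocking` is non-empty.

```typescript
// Hypothetical check names; only stance_directional and attribution_resolves
// appear in the commit message.
type CheckName =
  | "stance_present"
  | "stance_directional"
  | "attribution_resolves"
  | "topic_matches_path"
  | "near_duplicate";

interface CheckResult {
  insightId: string;
  check: CheckName;
  passed: boolean;
  measurable: boolean; // false when the check could not run (e.g. no embedding yet)
  detail?: string;
}

interface GateOutcome {
  blocking: CheckResult[];
  warnings: CheckResult[];
}

// Conservative default: block only on missing stance and near-duplicates.
const DEFAULT_BLOCKING: ReadonlySet<CheckName> = new Set<CheckName>(["stance_present", "near_duplicate"]);

function partitionResults(
  results: CheckResult[],
  blockingChecks: ReadonlySet<CheckName> = DEFAULT_BLOCKING,
): GateOutcome {
  const blocking: CheckResult[] = [];
  const warnings: CheckResult[] = [];
  for (const r of results) {
    if (!r.measurable) {
      // Unmeasurable is surfaced as a warning, never silently skipped.
      warnings.push({ ...r, detail: r.detail ?? "unmeasurable" });
      continue;
    }
    if (r.passed) continue;
    (blockingChecks.has(r.check) ? blocking : warnings).push(r);
  }
  return { blocking, warnings };
}
```

Promotion is then just passing a larger set, e.g. adding `"stance_directional"` once the heuristic has earned trust.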

Tests: 8 enforceGate cases. Verified e2e: --changed exit 0; blocking scope exit 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@gorajing gorajing changed the title feat(gate): read-only insight quality audit (Phase 1) feat(gate): insight quality gate — audit + forward enforcement May 25, 2026
The Phase 1/2 gate imports KB_ROOT from ./lib/kb-root, but the module was
untracked (created by an in-progress refactor that was never committed), so the
gate would not build on a fresh checkout. Track it here as the first committed
code to depend on it. Self-contained (only imports node:path); resolves
ZUHN_KB_ROOT or defaults to <repo>/knowledge-base.
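The resolution rule described above, as a minimal sketch. `resolveKbRoot` is a hypothetical name and takes the repo root as a parameter, whereas the real module derives `<repo>` itself; treating a blank `ZUHN_KB_ROOT` as unset and resolving a relative value against the working directory are assumptions, not documented behavior.

```typescript
import * as path from "node:path";

/**
 * ZUHN_KB_ROOT wins when set to a non-blank value; otherwise fall back to
 * <repoRoot>/knowledge-base. (Blank-as-unset is an assumption.)
 */
function resolveKbRoot(
  repoRoot: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const fromEnv = env.ZUHN_KB_ROOT?.trim();
  if (fromEnv) return path.resolve(fromEnv); // relative values resolve against cwd
  return path.join(repoRoot, "knowledge-base");
}
```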

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@gorajing gorajing closed this May 25, 2026
@gorajing gorajing reopened this May 25, 2026
@gorajing gorajing merged commit 75f9154 into main May 25, 2026
2 checks passed
@gorajing gorajing deleted the feat/insight-gate-audit branch May 26, 2026 03:22