test: add property-based tests for export and dedup invariants (issue #44 phase 2) #61
Review: issue #44 phase 2 (export + dedup property suites)
Reviewed in a detached worktree at head
Scope, CI, acceptance criteria
1) Oracle purity (the PR #57 lesson): upheld
Scanned every assertion in both files; nothing is read back from the implementation under test.
2) Mutation kills: spot-checked 4 of the 5 claims, all exact
All mutations reverted; tree verified clean before posting.
3) Judgment call: the cross-run one-hop scope note
Ruling: the scope note is accurate and the right call for this PR, but the cross-run behavior it describes is a real gap; filed #63.
4)
Closes #44 — this completes the issue: phase 1 (PR #57) covered the import + chunking property groups; this PR delivers the two groups deferred there, now that #43 export (PR #59) and #45 dedup (PR #60) are merged.
Summary
Adds two property-based test suites (fast-check under bun:test) over the pure export renderers (src/lib/export.ts) and dedup decision logic (src/lib/dedup.ts), plus a small shared in-memory-DB helper (tests/helpers/memdb.ts). Tests only: no src/ changes.
Per the PR #57 review lesson, every oracle is computed from generator inputs, never from the implementation under test: duplicate groups are built structurally (canonical text + case/whitespace/quote decorations the documented normalization erases), the survivor comparator and provenance priority are restated from the issue #45 spec, and semantic similarities are derived as cos(Δangle) over generated 2-D unit vectors (float32-storage tolerance 1e-4).
Export properties: tests/lib/export.property.test.ts (5 tests)
- (text, provenance) tuple multisets, NULL provenance rendered as the explicit literal 'unknown' (never omitted), no provenance field invented on sessions, per-table provenance histograms with the always-present unknown key
- ## table (N rows) heading with the generated count per table, one ### table #id heading per generated row (sound even for adversarial generated text, since value lines can never start at column 0 with #)
- (restore ∘ dump = identity on every exported table), counts match the generator, and, unlike JSON, NULL provenance restores as NULL (raw-storage fidelity). Apostrophes and newlines are injected with probability 1/2 each so quoting is exercised every run
Dedup properties: tests/lib/dedup.property.test.ts (7 tests)
- (survivor, duplicate, reason, similarity) tuple multiset: pins survivor order under arbitrary provenance mixes (provenance > richness > importance > recency > lowest id) and non-duplicate preservation (singletons and below-MIN_DEDUP_TEXT_LENGTH records appear in no tuple); scanned/tooShort accounting from generator counts
- planDedup alreadyMarked accounting
Mutation checks (all killed, then reverted)
- toExportRow: drop ?? 'unknown' (NULL passes through)
- sqlQuote: drop single-quote escaping
- planDedup: remove the plannedSurvivorKeys one-hop guard (the PR #60 blocker fix)
- provenanceRank: reverse priority order
- findSemanticPairs: >= threshold → >= threshold - 0.1
Testing
bun run lint clean (tsc --noEmit)
Post-Deploy Monitoring & Validation
No operational monitoring required — test-only change, no runtime code touched. CI green on both platforms is the validation signal.
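Illustration of the oracle-purity rule from the Summary: the cos(Δangle) similarity oracle might look roughly like the sketch below. This is not the PR's test code. The names unitVector, storeAsFloat32, cosineSimilarity, oracleSimilarity, and agreesWithOracle are hypothetical, and cosineSimilarity only stands in for whatever the implementation under test computes. The point it shows is that the expected value comes from the generated angles alone.

```typescript
// Hypothetical names throughout; none are taken from src/lib/dedup.ts.
type Vec2 = [number, number];

// Generator side: a 2-D unit vector at the given angle (radians).
function unitVector(angle: number): Vec2 {
  return [Math.cos(angle), Math.sin(angle)];
}

// Simulate float32 storage, which rounds each component.
function storeAsFloat32(v: Vec2): Float32Array {
  return Float32Array.from(v);
}

// Stand-in for the similarity the code under test would compute.
function cosineSimilarity(a: ArrayLike<number>, b: ArrayLike<number>): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return dot / Math.sqrt(normA * normB);
}

// Oracle: derived from the generated angles only, never from the vectors
// after they have passed through the implementation.
function oracleSimilarity(angleA: number, angleB: number): number {
  return Math.cos(angleA - angleB);
}

// Tolerance for float32 rounding, as quoted in the PR description.
const TOLERANCE = 1e-4;

// Predicate a property test could assert for arbitrary angle pairs.
function agreesWithOracle(angleA: number, angleB: number): boolean {
  const stored = cosineSimilarity(
    storeAsFloat32(unitVector(angleA)),
    storeAsFloat32(unitVector(angleB)),
  );
  return Math.abs(stored - oracleSimilarity(angleA, angleB)) <= TOLERANCE;
}
```

In a property test, the two angles would come from generators and agreesWithOracle would be the asserted predicate; a threshold mutation such as the findSemanticPairs one listed above would then show up as pairs classified differently from what the oracle predicts.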