test: add property-based tests for export and dedup invariants (issue #44 phase 2) by edheltzel · Pull Request #61 · edheltzel/Recall

edheltzel · 2026-06-11T08:29:02Z

Closes #44 — this completes the issue: phase 1 (PR #57) covered the import + chunking property groups; this PR delivers the two groups deferred there, now that #43 export (PR #59) and #45 dedup (PR #60) are merged.

Summary

Adds two property-based test suites (fast-check under bun:test) over the pure export renderers (src/lib/export.ts) and dedup decision logic (src/lib/dedup.ts), plus a small shared in-memory-DB helper (tests/helpers/memdb.ts). Tests only — no src/ changes.

Per the PR #57 review lesson, every oracle is computed from generator inputs, never from the implementation under test: duplicate groups are built structurally (canonical text + case/whitespace/quote decorations the documented normalization erases), the survivor comparator and provenance priority are restated from the issue #45 spec, and semantic similarities are derived as cos(Δangle) over generated 2-D unit vectors (float32-storage tolerance 1e-4).
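To illustrate the similarity oracle described above, here is a minimal sketch. The helper names (`unitVectorF32`, `cosineSimilarity`, `expectedSimilarity`) are hypothetical and are not taken from the test files; only the cos(Δangle) idea and the 1e-4 tolerance come from this PR. `Math.fround` stands in for the float32 rounding that blob storage would introduce.

```typescript
// Sketch only: names are illustrative, not the helpers used in the PR.
type Vec2 = readonly [number, number];

const F32_TOLERANCE = 1e-4;

/** Unit vector at `angle` radians, rounded to float32 as stored embeddings would be. */
function unitVectorF32(angle: number): Vec2 {
  return [Math.fround(Math.cos(angle)), Math.fround(Math.sin(angle))];
}

/** Cosine similarity computed from the stored (rounded) components. */
function cosineSimilarity(a: Vec2, b: Vec2): number {
  const dot = a[0] * b[0] + a[1] * b[1];
  const normA = Math.hypot(a[0], a[1]);
  const normB = Math.hypot(b[0], b[1]);
  return dot / (normA * normB);
}

/** Oracle: depends only on the generated angles, never on the code under test. */
function expectedSimilarity(angleA: number, angleB: number): number {
  return Math.cos(angleA - angleB);
}
```

A property would then assert `Math.abs(actual - expectedSimilarity(a, b)) <= F32_TOLERANCE` for every generated pair of angles, where `actual` is whatever similarity the implementation reports.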

Export properties — `tests/lib/export.property.test.ts` (5 tests)

JSON export: manifest counts, per-table (text, provenance) tuple multisets, NULL provenance rendered as the explicit literal 'unknown' (never omitted), no provenance field invented on sessions, per-table provenance histograms with the always-present unknown key
Markdown export: structural stability — one ## table (N rows) heading with the generated count per table, one ### table #id heading per generated row (sound even for adversarial generated text, since value lines can never start at column 0 with #)
SQL dump round-trip: dump restores into an empty database with full row fidelity (restore ∘ dump = identity on every exported table), counts match the generator, and — unlike JSON — NULL provenance restores as NULL (raw-storage fidelity). Apostrophes and newlines are injected with probability 1/2 each so quoting is exercised every run
Manifest subsets: counts accurate for arbitrary non-empty table subsets; provenance histograms always cover all five provenance-bearing tables
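The quoting half of the SQL round-trip property can be sketched without a database. This is an assumption-laden stand-in: `quoteSqlString` and `parseSqlLiteral` are hypothetical names, the quoting rule shown (wrap in single quotes, double embedded ones) is the standard SQL convention rather than a copy of `sqlQuote`, and the real property restores the dump into an empty database instead of using a hand-written parser.

```typescript
// Sketch only: a hypothetical quoter and its inverse, not the code in src/lib/export.ts.

/** Render a value as a SQL literal; NULL stays a bare keyword. */
function quoteSqlString(value: string | null): string {
  if (value === null) return "NULL";
  return "'" + value.replace(/'/g, "''") + "'";
}

/** Parse a literal produced by quoteSqlString; throws on malformed input. */
function parseSqlLiteral(literal: string): string | null {
  if (literal === "NULL") return null;
  if (literal.length < 2 || !literal.startsWith("'") || !literal.endsWith("'")) {
    throw new Error("not a quoted string literal");
  }
  const body = literal.slice(1, -1);
  // Any quote left after removing doubled pairs was never escaped.
  if (body.replace(/''/g, "").includes("'")) {
    throw new Error("unescaped single quote in literal");
  }
  return body.replace(/''/g, "'");
}
```

The round-trip claim is `parseSqlLiteral(quoteSqlString(s)) === s` for every generated `s`, including strings with apostrophes and newlines, and `null` must come back as `null` rather than as the string `'NULL'`.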

Dedup properties — `tests/lib/dedup.property.test.ts` (7 tests)

Exact pass: planned lineage equals the generator-expected (survivor, duplicate, reason, similarity) tuple multiset — pins survivor order under arbitrary provenance mixes (provenance > richness > importance > recency > lowest id) and non-duplicate preservation (singletons and below-MIN_DEDUP_TEXT_LENGTH records appear in no tuple); scanned/tooShort accounting from generator counts
Threshold respect: a threshold above every pairwise generated similarity plans nothing; every planned semantic pair is at/above the threshold, records its true similarity, and keeps the spec-ordered survivor (richness tie-breaker live here via varied generated lengths)
One-hop lineage (within-plan): survivor and duplicate key sets are disjoint and every duplicate is marked at most once, which is the PR #60 blocker guarantee. Scope note in the file: cross-run one-hop is intentionally not asserted (a prior survivor may legitimately be re-marked by a later semantic run)
Dry-run write-freeness: the database is byte-identical (full-table snapshot) before and after planDedup
Apply / idempotence (exact): apply marks exactly the planned tuples, preserves every record (non-destructive default), and a re-plan finds nothing with correct alreadyMarked accounting
Repeated semantic runs: active marks stay unique, already-marked records are never re-planned (as survivor or duplicate), and all records survive both cycles
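As a rough illustration of a spec-restated survivor oracle, the ordering above (provenance > richness > importance > recency > lowest id) might look like the sketch below. Several details are assumptions made for the example and are not confirmed by this PR: the record shape, richness measured as text length, higher importance and newer `createdAt` winning, and NULL provenance ranking as `unknown`. The names echo the ones mentioned in review, but the bodies are a reconstruction.

```typescript
// Sketch only: field names and tie-break directions are assumptions.
type Provenance = "user_authored" | "verbatim" | "extracted" | "derived" | "unknown";

interface DedupRecord {
  id: number;
  provenance: Provenance | null;
  text: string;
  importance: number;
  createdAt: number; // epoch milliseconds
}

const SURVIVOR_PRIORITY: readonly Provenance[] = [
  "user_authored", "verbatim", "extracted", "derived", "unknown",
];

function provenanceRankOf(p: Provenance | null): number {
  return SURVIVOR_PRIORITY.indexOf(p ?? "unknown");
}

/** Negative when `a` should survive over `b`. */
function specCompare(a: DedupRecord, b: DedupRecord): number {
  return (
    provenanceRankOf(a.provenance) - provenanceRankOf(b.provenance) ||
    b.text.length - a.text.length || // richness (assumed: longer text)
    b.importance - a.importance ||   // assumed: higher importance wins
    b.createdAt - a.createdAt ||     // assumed: newer record wins
    a.id - b.id                      // deterministic final tie-break
  );
}

/** Expected survivor of a non-empty duplicate group. */
function expectedSurvivor(group: readonly DedupRecord[]): DedupRecord {
  return group.reduce((best, r) => (specCompare(r, best) < 0 ? r : best));
}
```

Because the comparator is written from the spec rather than imported from `src/lib/dedup.ts`, a mutation of the implementation's ranking makes the two disagree instead of moving together.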

Mutation checks (all killed, then reverted)

| Mutation | Result |
| --- | --- |
| `toExportRow`: drop `?? 'unknown'` (NULL passes through) | JSON property fails (1/5) |
| `sqlQuote`: drop single-quote escaping | SQL round-trip fails (1/5) |
| `planDedup`: remove the `plannedSurvivorKeys` one-hop guard (the PR #60 blocker fix) | exactly the one-hop property fails (1/7) |
| `provenanceRank`: reverse priority order | 3/7 fail (exact tuples, semantic survivor, apply/idempotence) |
| `findSemanticPairs`: `>= threshold` → `>= threshold - 0.1` | both threshold properties fail (2/7) |
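For readers unfamiliar with the one-hop guard that the third mutation removes, the within-plan check can be sketched as a pure function over planned tuples. The `LineageTuple` shape and the `checkOneHop` name are hypothetical; the two conditions (survivor and duplicate key sets disjoint, each duplicate marked at most once) are the ones stated in this PR.

```typescript
// Sketch only: tuple shape and function name are illustrative.
interface LineageTuple {
  survivorKey: string;
  duplicateKey: string;
}

/** Returns a description of every within-plan one-hop violation (empty when the plan is sound). */
function checkOneHop(plan: readonly LineageTuple[]): string[] {
  const violations: string[] = [];
  const survivors = new Set(plan.map((t) => t.survivorKey));
  const seenDuplicates = new Set<string>();
  for (const t of plan) {
    if (survivors.has(t.duplicateKey)) {
      violations.push(`${t.duplicateKey} is both a survivor and a duplicate`);
    }
    if (seenDuplicates.has(t.duplicateKey)) {
      violations.push(`${t.duplicateKey} is marked more than once`);
    }
    seenDuplicates.add(t.duplicateKey);
  }
  return violations;
}
```

A plan such as `B→A, C→B` (C hidden behind B, which is itself hidden behind A) is the transitive chain this check rejects.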

Testing

bun run lint clean (tsc --noEmit)
Full suite: 587 pass / 0 fail across 53 files (~16s); the two property files add 12 tests / ~11k assertions in ~1.8s
Generator sizes bounded (≤6 rows per table for export; ≤5 groups × ≤4 members for dedup) per the issue's practicality requirement

Post-Deploy Monitoring & Validation

No operational monitoring required — test-only change, no runtime code touched. CI green on both platforms is the validation signal.

Phase 2 of issue #44 — the export (#43) and dedup (#45) property groups deferred from PR #57. Oracles are computed from generator inputs only (tuple-multiset pattern); duplicate groups are built structurally so expected grouping, survivors, and similarities never read the implementation under test.

edheltzel · 2026-06-11T08:57:22Z

Review — issue #44 phase 2 (export + dedup property suites)

Reviewed in a detached worktree at head 5189179. Every claim below was executed locally, not taken from the PR body.

Scope, CI, acceptance criteria

Diff vs merge base is exactly tests/helpers/memdb.ts, tests/lib/dedup.property.test.ts, tests/lib/export.property.test.ts — src/ untouched ✓
CI green on the head SHA itself (commit check-runs API: macOS primary ✓, Ubuntu compatibility smoke ✓)
Local at head: bun run lint clean · full suite 587 pass / 0 fail across 53 files · the 12 property tests run ~10.8k assertions in ~1.8s — all matching the PR body
Issue #44's export bullets (JSON/Markdown structural stability, provenance presence, manifest counts) and dedup bullets (idempotence, survivor priority by provenance, non-duplicate preservation, lineage auditability after repeated runs) each map onto a concrete property; generator sizes bounded as required. This closes #44 legitimately.
Stability: 30 consecutive unseeded runs of both property files — 0 failures (~36k generated cases), plus a 2,000-run pinned-seed replay of the one-hop property. Unseeded fc.assert matches the phase-1 convention; fast-check prints seed/path/shrunk counterexample on failure, so any CI failure is reproducible. Do not pin seeds.

1) Oracle purity (the PR #57 lesson) — upheld

Scanned every assertion in both files; nothing is read back from the implementation under test.

SURVIVOR_PRIORITY/specCompare (tests/lib/dedup.property.test.ts:41-66) restate the issue #45 spec verbatim: user_authored > verbatim > extracted > derived > unknown, then richness → importance → recency; the final lowest-id tie-break matches the determinism rule the implementation documents. Verified the restatement is genuinely independent: reversing provenanceRank in src/lib/dedup.ts makes the spec oracle disagree on the first generated case, so the oracle does not follow the implementation.
Duplicate groups are built structurally (canonical text + case/whitespace/quote decorations the documented normalization erases), so grouping is generator-known without calling normalizeText. Similarities are cos(Δangle) over generated 2-D unit vectors; F32_TOLERANCE = 1e-4 correctly absorbs float32 blob storage.
Two patterns that look like purity leaks and are not (recording so future reviews don't re-litigate): the SQL round-trip compares restored rows to source-DB rows — a legitimate inverse-pair property (restore ∘ dump = id) independently anchored to generator tuples and counts; and markedAfterFirst in the repeated-runs test reads run-1 output — acceptable because that property is deliberately relational and run 1 is pinned independently by the exact-multiset property.

2) Mutation kills — spot-checked 4 of the 5 claims, all exact

| Mutation | Claimed | Observed |
| --- | --- | --- |
| remove `plannedSurvivorKeys` guard (`dedup.ts:526`) | 1/7: one-hop | ✓ exactly the one-hop disjointness property (counterexample in 19 runs) |
| reverse `provenanceRank` | 3/7 | ✓ exactly: exact tuples, semantic survivor, apply/idempotence |
| `toExportRow` drop `?? 'unknown'` | 1/5: JSON | ✓ exactly (fails on the first generated case) |
| `sqlQuote` drop quote escaping | 1/5: SQL round-trip | ✓ exactly (the 1/2-probability apostrophe injection makes this near-deterministic) |

All mutations reverted; tree verified clean before posting.

3) Judgment call — the cross-run one-hop scope note

Ruling: the scope note is accurate and the right call for this PR, but the cross-run behavior it describes is a real gap — filed #63.

The within-plan one-hop property correctly pins the PR #60 blocker fix, and asserting cross-run one-hop would (correctly) fail today, so the suite is right not to paper over that with a weaker assertion.
However, "a prior survivor may legitimately be re-marked by a later run" should not be read as principle-consistent. PR #60's Blocking 1 grounded "no transitive chaining" in a user-facing property (every hidden record has a visible survivor at ≥ threshold), which is a whole-database, cross-run property. It decays across runs: run 1 marks B→A; run 2 may mark A→C; B's visible representative becomes C with sim(B,C) possibly below threshold (semantic→semantic chains; exact chains are benign). The per-row invariant in dedup.ts's header survives; the visible-survivor property does not. That is an implementation gap from #45/#60, not a defect in these tests, so it is tracked in #63 rather than blocking a tests-only PR.
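A small worked example makes the decay concrete. The angles and threshold below are chosen for illustration and are not taken from the test suite or from #63: records A, B, C sit at 0°, −20°, and +20° on the unit circle, with a threshold of cos 25°. The `visibleRepresentative` helper is hypothetical.

```typescript
// Sketch only: illustrative numbers, not data from the repository.
const toRadians = (degrees: number): number => (degrees * Math.PI) / 180;
const angleSimilarity = (a: number, b: number): number => Math.cos(a - b);

/** Follow duplicate→survivor marks until reaching a record that is still visible. */
function visibleRepresentative(key: string, markedInto: ReadonlyMap<string, string>): string {
  let current = key;
  const seen = new Set<string>();
  for (;;) {
    const next = markedInto.get(current);
    if (next === undefined || seen.has(current)) return current;
    seen.add(current);
    current = next;
  }
}

function chainDecayExample() {
  const threshold = Math.cos(toRadians(25)); // ≈ 0.906
  const a = toRadians(0), b = toRadians(-20), c = toRadians(20);
  // Run 1 marks B→A, run 2 marks A→C; each pair is individually above threshold.
  const marks = new Map<string, string>([["B", "A"], ["A", "C"]]);
  return {
    threshold,
    simAB: angleSimilarity(a, b), // ≈ 0.940, at or above threshold
    simAC: angleSimilarity(a, c), // ≈ 0.940, at or above threshold
    simBC: angleSimilarity(b, c), // ≈ 0.766, below threshold
    representativeOfB: visibleRepresentative("B", marks), // "C"
  };
}
```

Each individual mark is justified at the moment it is made, yet B ends up hidden behind C at a similarity below the threshold, which is exactly the visible-survivor violation described above.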

4) `tests/helpers/memdb.ts` — DRY clean

No duplication of tests/helpers/setup.ts: that helper is a file-backed initDb() + env-var harness for code using the getDb() singleton; memdb.ts is an in-memory schema-only build for per-case property loops — complementary strategies, and the DDL comes from the canonical src/db/schema.ts constants (the same source connection.ts uses). The FTS-omission claim was verified against the code: nothing under property test touches FTS, and only the non-destructive applyDedupPlan path (pure dedup_lineage INSERTs) is exercised. PRAGMA foreign_keys = ON matches production.

5) Generators & shrinking

Only record/array/integer/constantFrom/tuple().map() combinators — cleanly shrinkable, no side effects in generators, no Date.now()/env mutation, per-case DBs closed in finally. Adversarial inputs are genuinely attempted and defended: # heading injection (the Markdown column-0 argument checks out against markdownValue), quotes, newlines, apostrophes, null provenance/project.

Non-blocking suggestions (follow-up material; none block merge)

Semantic completeness is soundness-only. Every semantic property quantifies over planned entries, so a regression that silently drops eligible pairs would pass the property suite (only tests/commands/dedup.test.ts:186-203 pins one positive case). A two-record exactly-determined property — sim ≥ t+tol ⇒ exactly one spec tuple, sim < t−tol ⇒ empty plan — closes this and stays oracle-pure.
planDedup({ project }) is untested anywhere despite CLI exposure; cheap property against expectedExactTuples(records.filter(...)) on a scoped run.
Multi-text-column scanning (learnings problem+solution, loa_entries title+fabric_extract in TABLE_SCAN_CONFIG) has no coverage in any suite (predates this PR).
plan.crossTable.semanticPairs is incremented but never asserted.
textArb is printable-ASCII; a fc.string({ unit: 'grapheme' }) branch would widen unicode coverage for free.
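The first suggestion (a two-record, exactly-determined completeness property) could be sketched as follows. Everything here is hypothetical: the `Planner` type reduces a two-record planning run to "how many tuples were planned", and the two sample planners are toys rather than `planDedup`. A real version would also check that the single planned tuple names the spec survivor; this sketch only checks the count.

```typescript
// Sketch only: a toy model of the suggested property, not code from the repository.

/** Number of semantic tuples planned for a two-record database with the given similarity. */
type Planner = (similarity: number, threshold: number) => number;

/** Completeness check: outside the tolerance band the answer is fully determined. */
function completenessHolds(planner: Planner, similarity: number, threshold: number, tol = 1e-4): boolean {
  const planned = planner(similarity, threshold);
  if (similarity >= threshold + tol) return planned === 1; // eligible pair must be planned
  if (similarity < threshold - tol) return planned === 0;  // ineligible pair must not be
  return true; // inside the band either outcome is acceptable
}

/** Behaves as specified. */
const honestPlanner: Planner = (s, t) => (s >= t ? 1 : 0);

/** Regression that silently drops every eligible pair. */
const droppingPlanner: Planner = () => 0;
```

A soundness-only property ("every planned pair is at or above threshold") accepts `droppingPlanner`, because it quantifies over an empty plan; `completenessHolds` rejects it as soon as a generated similarity clears `threshold + tol`.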

Verdict

APPROVE

edheltzel mentioned this pull request Jun 11, 2026

dedup: cross-run re-marking of a prior survivor breaks the visible-survivor safety property (transitive chains across runs) #63

Closed

edheltzel merged commit 841d5bb into main Jun 11, 2026
2 checks passed

edheltzel deleted the test/44-export-dedup-properties branch June 11, 2026 20:06

edheltzel mentioned this pull request Jun 11, 2026

tests: optional follow-ups — single-file slack import test; semantic-completeness property #77

Open
