
test: add property-based tests for export and dedup invariants (issue #44 phase 2)#61

Merged: edheltzel merged 1 commit into main from test/44-export-dedup-properties on Jun 11, 2026

@edheltzel

Owner

Closes #44 — this completes the issue: phase 1 (PR #57) covered the import + chunking property groups; this PR delivers the two groups deferred there, now that #43 export (PR #59) and #45 dedup (PR #60) are merged.

Summary

Adds two property-based test suites (fast-check under bun:test) over the pure export renderers (src/lib/export.ts) and dedup decision logic (src/lib/dedup.ts), plus a small shared in-memory-DB helper (tests/helpers/memdb.ts). Tests only — no src/ changes.

Per the PR #57 review lesson, every oracle is computed from generator inputs, never from the implementation under test: duplicate groups are built structurally (canonical text + case/whitespace/quote decorations the documented normalization erases), the survivor comparator and provenance priority are restated from the issue #45 spec, and semantic similarities are derived as cos(Δangle) over generated 2-D unit vectors (float32-storage tolerance 1e-4).
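
The similarity oracle described above can be sketched as follows. This is an illustration under assumptions, not the code in tests/lib/dedup.property.test.ts: the helper names (unitVector, toFloat32, cosineSimilarity, expectedSimilarity) are hypothetical, and only the 1e-4 tolerance comes from this PR.

```typescript
// Sketch only: helper names are hypothetical; the 1e-4 tolerance is the value quoted in the PR.
type Vec2 = [number, number];

const F32_TOLERANCE = 1e-4;

// Build a 2-D unit vector from an angle in radians.
function unitVector(angle: number): Vec2 {
  return [Math.cos(angle), Math.sin(angle)];
}

// Simulate float32 storage by rounding each component to single precision.
function toFloat32(v: Vec2): Vec2 {
  return [Math.fround(v[0]), Math.fround(v[1])];
}

// Plain cosine similarity, standing in for whatever the implementation computes.
function cosineSimilarity(a: Vec2, b: Vec2): number {
  const dot = a[0] * b[0] + a[1] * b[1];
  const norms = Math.hypot(a[0], a[1]) * Math.hypot(b[0], b[1]);
  return dot / norms;
}

// Oracle: for unit vectors at angles a and b, the similarity is cos(a - b).
// It depends only on the generated angles, never on the stored vectors.
function expectedSimilarity(angleA: number, angleB: number): number {
  return Math.cos(angleA - angleB);
}

const angleA = 0.3;
const angleB = 1.1;
const measured = cosineSimilarity(
  toFloat32(unitVector(angleA)),
  toFloat32(unitVector(angleB)),
);
const withinTolerance =
  Math.abs(measured - expectedSimilarity(angleA, angleB)) <= F32_TOLERANCE;
console.log(withinTolerance);
```

Because the expected value is a closed form over the generated angles, a bug in the similarity code cannot leak into the expectation.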

Export properties — tests/lib/export.property.test.ts (5 tests)

  • JSON export: manifest counts, per-table (text, provenance) tuple multisets, NULL provenance rendered as the explicit literal 'unknown' (never omitted), no provenance field invented on sessions, per-table provenance histograms with the always-present unknown key
  • Markdown export: structural stability — one ## table (N rows) heading with the generated count per table, one ### table #id heading per generated row (sound even for adversarial generated text, since value lines can never start at column 0 with #)
  • SQL dump round-trip: dump restores into an empty database with full row fidelity (restore ∘ dump = identity on every exported table), counts match the generator, and — unlike JSON — NULL provenance restores as NULL (raw-storage fidelity). Apostrophes and newlines are injected with probability 1/2 each so quoting is exercised every run
  • Manifest subsets: counts accurate for arbitrary non-empty table subsets; provenance histograms always cover all five provenance-bearing tables
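
A minimal sketch of the tuple-multiset comparison behind the JSON property, assuming a simplified row shape. GeneratedRow, ExportedRow, and the three helpers are hypothetical names for illustration; only the NULL-to-'unknown' rule is taken from the description above.

```typescript
// Sketch only: row shapes and helper names are assumptions, not the project's types.
interface GeneratedRow { text: string; provenance: string | null }
interface ExportedRow { text: string; provenance?: string | null }

// Oracle side: computed from generator inputs. NULL provenance is expected
// to surface as the explicit literal 'unknown'.
function expectedTuples(rows: GeneratedRow[]): string[] {
  return rows
    .map((r) => JSON.stringify([r.text, r.provenance ?? "unknown"]))
    .sort();
}

// Observed side: whatever the export actually produced, serialized the same way.
function observedTuples(rows: ExportedRow[]): string[] {
  return rows.map((r) => JSON.stringify([r.text, r.provenance])).sort();
}

// Two sorted key lists are equal exactly when the underlying multisets are equal.
function sameMultiset(a: string[], b: string[]): boolean {
  return a.length === b.length && a.every((key, i) => key === b[i]);
}

const generated: GeneratedRow[] = [
  { text: "a", provenance: null },
  { text: "b", provenance: "verbatim" },
  { text: "a", provenance: null },
];

// Row order differs from the generator; a multiset comparison does not care.
const goodExport: ExportedRow[] = [
  { text: "b", provenance: "verbatim" },
  { text: "a", provenance: "unknown" },
  { text: "a", provenance: "unknown" },
];

// Simulates dropping the 'unknown' fallback so NULL passes through.
const mutantExport: ExportedRow[] = [
  { text: "b", provenance: "verbatim" },
  { text: "a", provenance: null },
  { text: "a", provenance: null },
];

console.log(sameMultiset(expectedTuples(generated), observedTuples(goodExport)));
console.log(sameMultiset(expectedTuples(generated), observedTuples(mutantExport)));
```

Comparing multisets rather than ordered lists keeps the property independent of row order while still catching dropped or duplicated rows.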

Dedup properties — tests/lib/dedup.property.test.ts (7 tests)

  • Exact pass: planned lineage equals the generator-expected (survivor, duplicate, reason, similarity) tuple multiset — pins survivor order under arbitrary provenance mixes (provenance > richness > importance > recency > lowest id) and non-duplicate preservation (singletons and below-MIN_DEDUP_TEXT_LENGTH records appear in no tuple); scanned/tooShort accounting from generator counts
  • Threshold respect: a threshold above every pairwise generated similarity plans nothing; every planned semantic pair is at/above the threshold, records its true similarity, and keeps the spec-ordered survivor (richness tie-breaker live here via varied generated lengths)
  • One-hop lineage (within-plan): survivor and duplicate key sets are disjoint and every duplicate is marked at most once — the PR #60 blocker guarantee. Scope note in the file: cross-run one-hop is intentionally not asserted (a prior survivor may legitimately be re-marked by a later semantic run)
  • Dry-run write-freeness: the database is byte-identical (full-table snapshot) before and after planDedup
  • Apply / idempotence (exact): apply marks exactly the planned tuples, preserves every record (non-destructive default), and a re-plan finds nothing with correct alreadyMarked accounting
  • Repeated semantic runs: active marks stay unique, already-marked records are never re-planned (as survivor or duplicate), and all records survive both cycles
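
The within-plan one-hop check can be expressed as a small predicate over the planned pairs. This is a sketch under assumptions: PlannedPair, its field names, and the "table:id" key format are hypothetical and may not match the plan structure planDedup returns.

```typescript
// Sketch only: PlannedPair and the "table:id" key format are assumed for illustration.
interface PlannedPair { survivorKey: string; duplicateKey: string }

interface OneHopReport { disjoint: boolean; duplicatesUnique: boolean }

function checkOneHop(plan: PlannedPair[]): OneHopReport {
  const survivors = new Set(plan.map((p) => p.survivorKey));
  const seenDuplicates = new Set<string>();
  let disjoint = true;
  let duplicatesUnique = true;

  for (const pair of plan) {
    // A record marked as a duplicate must not also act as a survivor in the same plan.
    if (survivors.has(pair.duplicateKey)) disjoint = false;
    // A record must be marked as a duplicate at most once.
    if (seenDuplicates.has(pair.duplicateKey)) duplicatesUnique = false;
    seenDuplicates.add(pair.duplicateKey);
  }
  return { disjoint, duplicatesUnique };
}

// One survivor absorbing two duplicates: a valid one-hop plan.
const validPlan: PlannedPair[] = [
  { survivorKey: "learnings:1", duplicateKey: "learnings:2" },
  { survivorKey: "learnings:1", duplicateKey: "learnings:3" },
];

// learnings:2 is both a duplicate and a survivor: a two-hop chain.
const chainedPlan: PlannedPair[] = [
  { survivorKey: "learnings:1", duplicateKey: "learnings:2" },
  { survivorKey: "learnings:2", duplicateKey: "learnings:3" },
];

console.log(checkOneHop(validPlan));
console.log(checkOneHop(chainedPlan));
```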

Mutation checks (all killed, then reverted)

| Mutation | Result |
| --- | --- |
| toExportRow: drop ?? 'unknown' (NULL passes through) | JSON property fails (1/5) |
| sqlQuote: drop single-quote escaping | SQL round-trip fails (1/5) |
| planDedup: remove the plannedSurvivorKeys one-hop guard (the PR #60 blocker fix) | exactly the one-hop property fails (1/7) |
| provenanceRank: reverse priority order | 3/7 fail (exact tuples, semantic survivor, apply/idempotence) |
| findSemanticPairs: >= threshold → >= threshold - 0.1 | both threshold properties fail (2/7) |
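
To illustrate why the sqlQuote mutation is caught by a round-trip property, here is a minimal sketch with stand-in functions. These are not the functions in src/lib/export.ts: quoteLiteral, quoteLiteralMutant, isWellFormed, unquoteLiteral, and roundTrips are hypothetical, and the sketch assumes standard SQL string-literal escaping (a single quote is written as two).

```typescript
// Sketch only: stand-ins for illustration, assuming standard SQL literal escaping.

// Correct quoting: double every single quote, then wrap in quotes.
function quoteLiteral(value: string): string {
  return "'" + value.replace(/'/g, "''") + "'";
}

// The mutation: wrap without escaping.
function quoteLiteralMutant(value: string): string {
  return "'" + value + "'";
}

// A literal is well formed if, once doubled quotes are removed,
// no lone single quote remains between the delimiters.
function isWellFormed(literal: string): boolean {
  if (literal.length < 2 || !literal.startsWith("'") || !literal.endsWith("'")) {
    return false;
  }
  return !literal.slice(1, -1).replace(/''/g, "").includes("'");
}

// Inverse of quoteLiteral for well-formed literals.
function unquoteLiteral(literal: string): string {
  return literal.slice(1, -1).replace(/''/g, "'");
}

// Round-trip property for a single value and a given quoting function.
function roundTrips(quote: (v: string) => string, value: string): boolean {
  const literal = quote(value);
  return isWellFormed(literal) && unquoteLiteral(literal) === value;
}

const sample = "it's a\nmultiline value";
console.log(roundTrips(quoteLiteral, sample));
console.log(roundTrips(quoteLiteralMutant, sample));
```

Inputs without apostrophes round-trip under both versions, which is why injecting apostrophes with probability 1/2 matters for killing this mutant reliably.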

Testing

  • bun run lint clean (tsc --noEmit)
  • Full suite: 587 pass / 0 fail across 53 files (~16s); the two property files add 12 tests / ~11k assertions in ~1.8s
  • Generator sizes bounded (≤6 rows per table for export; ≤5 groups × ≤4 members for dedup) per the issue's practicality requirement

Post-Deploy Monitoring & Validation

No operational monitoring required — test-only change, no runtime code touched. CI green on both platforms is the validation signal.

Phase 2 of issue #44 — the export (#43) and dedup (#45) property groups
deferred from PR #57. Oracles are computed from generator inputs only
(tuple-multiset pattern); duplicate groups are built structurally so
expected grouping, survivors, and similarities never read the
implementation under test.
@edheltzel

Owner Author

Review — issue #44 phase 2 (export + dedup property suites)

Reviewed in a detached worktree at head 5189179. Every claim below was executed locally, not taken from the PR body.

Scope, CI, acceptance criteria

  • Diff vs merge base is exactly tests/helpers/memdb.ts, tests/lib/dedup.property.test.ts, tests/lib/export.property.test.ts; src/ untouched
  • CI green on the head SHA itself (commit check-runs API: macOS primary ✓, Ubuntu compatibility smoke ✓)
  • Local at head: bun run lint clean · full suite 587 pass / 0 fail across 53 files · the 12 property tests run ~10.8k assertions in ~1.8s — all matching the PR body
  • Issue #44's export bullets (JSON/Markdown structural stability, provenance presence, manifest counts) and dedup bullets (idempotence, survivor priority by provenance, non-duplicate preservation, lineage auditability after repeated runs) each map onto a concrete property; generator sizes bounded as required. This closes #44 legitimately.
  • Stability: 30 consecutive unseeded runs of both property files — 0 failures (~36k generated cases), plus a 2,000-run pinned-seed replay of the one-hop property. Unseeded fc.assert matches the phase-1 convention; fast-check prints seed/path/shrunk counterexample on failure, so any CI failure is reproducible. Do not pin seeds.

1) Oracle purity (the PR #57 lesson) — upheld

Scanned every assertion in both files; nothing is read back from the implementation under test.

  • SURVIVOR_PRIORITY/specCompare (tests/lib/dedup.property.test.ts:41-66) restate the issue #45 spec verbatim — user_authored > verbatim > extracted > derived > unknown, then richness → importance → recency; the final lowest-id tie-break matches the determinism rule the implementation documents. Verified the restatement is genuinely independent: reversing provenanceRank in src/lib/dedup.ts makes the spec oracle disagree on the first generated case — the oracle does not follow the implementation.
  • Duplicate groups are built structurally (canonical text + case/whitespace/quote decorations the documented normalization erases), so grouping is generator-known without calling normalizeText. Similarities are cos(Δangle) over generated 2-D unit vectors; F32_TOLERANCE = 1e-4 correctly absorbs float32 blob storage.
  • Two patterns that look like purity leaks and are not (recording so future reviews don't re-litigate): the SQL round-trip compares restored rows to source-DB rows — a legitimate inverse-pair property (restore ∘ dump = id) independently anchored to generator tuples and counts; and markedAfterFirst in the repeated-runs test reads run-1 output — acceptable because that property is deliberately relational and run 1 is pinned independently by the exact-multiset property.
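
For readers without the test file open, a spec-restating comparator of the kind described above might look like the sketch below. It is not the contents of tests/lib/dedup.property.test.ts:41-66. The Candidate shape is hypothetical, "higher wins" for richness, importance, and recency is an assumption, and so is treating NULL provenance as unknown; only the priority order and the lowest-id tie-break come from this review.

```typescript
// Sketch only: the Candidate shape and the "higher wins" direction are assumptions.
const PRIORITY = ["user_authored", "verbatim", "extracted", "derived", "unknown"] as const;
type ProvenanceLabel = (typeof PRIORITY)[number];

interface Candidate {
  id: number;
  provenance: ProvenanceLabel | null;
  richness: number;   // assumed: larger means richer
  importance: number; // assumed: larger means more important
  recency: number;    // assumed: larger means more recent
}

// Lower rank means higher priority; NULL is assumed to rank as 'unknown'.
function provenanceRankOf(p: ProvenanceLabel | null): number {
  return PRIORITY.indexOf(p ?? "unknown");
}

// Negative result: `a` should survive over `b`.
function compareBySpec(a: Candidate, b: Candidate): number {
  return (
    provenanceRankOf(a.provenance) - provenanceRankOf(b.provenance) ||
    b.richness - a.richness ||
    b.importance - a.importance ||
    b.recency - a.recency ||
    a.id - b.id // final deterministic tie-break: lowest id wins
  );
}

// Expected survivor of a non-empty duplicate group (reduce throws on an empty array).
function expectedSurvivor(group: Candidate[]): Candidate {
  return group.reduce((best, c) => (compareBySpec(c, best) < 0 ? c : best));
}

const group: Candidate[] = [
  { id: 1, provenance: "verbatim", richness: 90, importance: 5, recency: 300 },
  { id: 5, provenance: "user_authored", richness: 10, importance: 1, recency: 100 },
  { id: 7, provenance: null, richness: 99, importance: 9, recency: 999 },
];

// Provenance dominates every later criterion, so id 5 is expected to survive.
console.log(expectedSurvivor(group).id);
```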

2) Mutation kills — spot-checked 4 of the 5 claims, all exact

| Mutation | Claimed | Observed |
| --- | --- | --- |
| remove plannedSurvivorKeys guard (dedup.ts:526) | 1/7: one-hop | ✓ exactly the one-hop disjointness property (counterexample in 19 runs) |
| reverse provenanceRank | 3/7 | ✓ exactly: exact tuples, semantic survivor, apply/idempotence |
| toExportRow drop ?? 'unknown' | 1/5: JSON | ✓ exactly (fails on the first generated case) |
| sqlQuote drop quote escaping | 1/5: SQL round-trip | ✓ exactly (the 1/2-probability apostrophe injection makes this near-deterministic) |

All mutations reverted; tree verified clean before posting.

3) Judgment call — the cross-run one-hop scope note

Ruling: the scope note is accurate and the right call for this PR, but the cross-run behavior it describes is a real gap — filed #63.

4) tests/helpers/memdb.ts — DRY clean

No duplication of tests/helpers/setup.ts: that helper is a file-backed initDb() + env-var harness for code using the getDb() singleton; memdb.ts is an in-memory schema-only build for per-case property loops — complementary strategies, and the DDL comes from the canonical src/db/schema.ts constants (the same source connection.ts uses). The FTS-omission claim was verified against the code: nothing under property test touches FTS, and only the non-destructive applyDedupPlan path (pure dedup_lineage INSERTs) is exercised. PRAGMA foreign_keys = ON matches production.

5) Generators & shrinking

Only record/array/integer/constantFrom/tuple().map() combinators — cleanly shrinkable, no side effects in generators, no Date.now()/env mutation, per-case DBs closed in finally. Adversarial inputs are genuinely attempted and defended: # heading injection (the Markdown column-0 argument checks out against markdownValue), quotes, newlines, apostrophes, null provenance/project.

Non-blocking suggestions (follow-up material; none block merge)

  1. Semantic completeness is soundness-only. Every semantic property quantifies over planned entries, so a regression that silently drops eligible pairs would pass the property suite (only tests/commands/dedup.test.ts:186-203 pins one positive case). A two-record exactly-determined property — sim ≥ t+tol ⇒ exactly one spec tuple, sim < t−tol ⇒ empty plan — closes this and stays oracle-pure.
  2. planDedup({ project }) is untested anywhere despite CLI exposure; cheap property against expectedExactTuples(records.filter(...)) on a scoped run.
  3. Multi-text-column scanning (learnings problem+solution, loa_entries title+fabric_extract in TABLE_SCAN_CONFIG) has no coverage in any suite (predates this PR).
  4. plan.crossTable.semanticPairs is incremented but never asserted.
  5. textArb is printable-ASCII; a fc.string({ unit: 'grapheme' }) branch would widen unicode coverage for free.
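
Suggestion 1 could start from an oracle like the sketch below, which classifies a two-record case purely from the generated angle difference. The function and type names are hypothetical, and the wiring to planDedup is deliberately left out because its call shape is not shown in this PR.

```typescript
// Sketch only: names are hypothetical; the planDedup wiring is intentionally omitted.
type Expectation = "exactly-one-pair" | "empty-plan" | "undetermined";

// Classify a two-record case from its true similarity.
// Cases inside the tolerance band around the threshold are skipped, not asserted.
function expectedOutcome(
  similarity: number,
  threshold: number,
  tolerance: number,
): Expectation {
  if (similarity >= threshold + tolerance) return "exactly-one-pair";
  if (similarity < threshold - tolerance) return "empty-plan";
  return "undetermined";
}

// For two unit vectors separated by deltaAngle, the similarity is cos(deltaAngle).
function outcomeForAngle(
  deltaAngle: number,
  threshold: number,
  tolerance = 1e-4,
): Expectation {
  return expectedOutcome(Math.cos(deltaAngle), threshold, tolerance);
}

console.log(outcomeForAngle(0.1, 0.9));            // clearly above the threshold
console.log(outcomeForAngle(1.0, 0.9));            // clearly below the threshold
console.log(outcomeForAngle(Math.acos(0.9), 0.9)); // on the boundary
```

Leaving the boundary band undetermined keeps the property from flaking on float32 rounding while still asserting completeness everywhere else.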

Verdict

APPROVE

@edheltzel edheltzel merged commit 841d5bb into main Jun 11, 2026
2 checks passed
@edheltzel edheltzel deleted the test/44-export-dedup-properties branch June 11, 2026 20:06