fix(dedup): make recorded survivors sticky across runs#79
Conversation
Cross-run re-marking of a prior survivor could break the visible-survivor safety property: dedup_lineage accumulated marked->marked chains, leaving hidden (or, with --delete, permanently removed) records whose only visible neighbor was never measured against them. - Sticky survivors (option a): planDedup loads survivor keys from dedup_lineage (status 'marked'/'deleted', mirroring loadMarkedDuplicates) and seeds plannedSurvivorKeys with them, extending the within-plan one-hop guard across runs; the exact pass skips them as duplicates too. Skips are reported per table (stickySkipped) and in the CLI summary. - Destructive-mode guard (option c, defense-in-depth): applyDedupPlan refuses — throws before writing — when a planned duplicate is itself a recorded survivor; runDedup surfaces the refusal as a clean error with exit code 1. - Property suite: cross-run sticky-survivor property (two waves plus angle-targeted challengers) pins 'a recorded survivor is never re-marked across runs' and final-lineage survivor/duplicate disjointness; the scope note now states the cross-run invariant. Mutation-checked: removing the exclusion fails exactly the new property. Chain compression (option b) and mark->delete escalation (#73) are deliberately untouched. Closes #63
edheltzel
left a comment
There was a problem hiding this comment.
Review — PR #79 (sticky survivors, issue #63)
Reviewed at head 6047ece via the pr-review-toolkit multi-agent workflow (code review, test coverage, comment accuracy, silent-failure audit) plus independent hands-on verification: scratch-DB reproduction of issue #63's exact scenario against both main and this branch, and three mutation runs. CI is green on the head commit (macOS primary + Ubuntu smoke, run on 6047ece).
Independent verification results
- Issue #63 reproduced and fixed. Scratch in-memory DB, the issue's exact two-run A-B-C composition (angles 0°/18°/342° → sim(A,B)=sim(A,C)=0.9511 ≥ 0.95, sim(B,C)=0.809 < 0.95; C out-ranks A, A out-ranks B). On
main: run 2 plansA → C, leaving B's only visible representative below threshold — and in destructive mode B then A are hard-deleted, leaving only C. On this branch: run 2 plans nothing, reportsstickySkipped: 1, and A stays visible — in both marking and destructive modes (the'deleted'status path works correctly). loadRecordedSurvivorsgenuinely mirrorsloadMarkedDuplicates(same prepare/all/table:id-Set shape, src/lib/dedup.ts:350-355 vs 364-370); the two deltas (DISTINCT,status IN ('marked','deleted')) are correct and explained in the docstring. Codegraph confirms blast radius:planDedup/applyDedupPlanhave exactly one production caller (runDedup), so the new throw reaches only the new CLI catch; the MCP server is untouched.- Destructive guard: verified the throw precedes the INSERT prepare and the transaction (src/lib/dedup.ts:613-624 vs 626/644) — genuinely write-free; the handcrafted-plan test pins both the throw and zero-writes via full-DB snapshot. The unreachability-through-
planDedupclaim holds by construction and in my repro. Library-level coverage of the guard itself is sufficient. - Mutation checks (run locally):
- Call-site stub (
stickySurvivors = new Set()inplanDedup): 653 pass / 1 fail — exactly the new cross-run property, first generated case. The PR's mutation claim verifies verbatim. - Whole-function stub: 652 pass / 2 fail — cross-run property + destructive-guard test. Both defense layers are load-bearing.
WHERE status IN ('marked','deleted')→WHERE status = 'marked': 654 pass / 0 fail. See Required below.
- Call-site stub (
- Nothing precludes option (b) or #73: zero
UPDATE dedup_lineagestatements insrc/;planDedupis read-only andapplyDedupPlanonly INSERTs lineage rows. Chain compression and mark→delete escalation remain open. - Docs: the dedup.ts header bullet and the cli-reference.md safety-model bullet match verified behavior (refusal wording, order-dependence tradeoff, no-writes claim). Oracle discipline in the new property block is upheld (relational invariants over generator-driven inputs;
survivorsAfterRun1is used as a premise, never compared back to itself), and the scope-note update accurately retires the now-false cross-run sentence.
Required (blocking)
R1. Pin 'deleted'-status stickiness with a deterministic test. Mutating loadRecordedSurvivors from status IN ('marked','deleted') to status = 'marked' passes the entire suite (654/0). That mutation re-exposes precisely the data-loss half of issue #63 — the half the issue calls "worse" — and it defeats both defense layers at once, since planning stickiness and the destructive guard share this one function. Cause: the cross-run property only applies plans non-destructively, and the guard test's run-1 lineage is 'marked', so no test ever creates a 'deleted' lineage row and asserts stickiness on its survivor. The triage asked for a test pinning the chosen invariant, and the PR's own verification section sets mutation-resistance as the proof standard — this one-token mutation ships invisibly today. Fix is ~10-15 lines in the new describe block: run 1 with { destructive: true } on an exact pair, then assert (a) run 2 never plans the recorded survivor as a duplicate, and (b) a handcrafted destructive plan against that survivor still refuses. (My scratch repro confirms the implementation already behaves correctly here — this is purely a regression-protection gap, which is why it's the only blocker.)
Strongly recommended (non-blocking, but cheap while you're in here)
S1. Document the semantic-pass consolidation blind spot. Because the sticky seed lives in plannedSurvivorKeys and the semantic guard skips the entire pair when either side is a survivor (src/lib/dedup.ts:563-568), a recorded survivor can never acquire new semantic duplicates in any later run — fresh near-duplicates of an old survivor stay separate visible records forever (they're re-skipped and re-counted as stickySkipped on every run). Exact-text duplicates still consolidate under a sticky survivor that wins its group (src/lib/dedup.ts:507-516), so the two passes behave asymmetrically for the same record. This is consistent with PR #60's established greedy semantics and conservative in the safe direction — not a bug — but the header, the cli-reference bullet, and the seeding comment all document only the "survivor never re-marked" direction. One sentence in the dedup.ts header + the docs bullet (and ideally a one-line comment at the semantic guard distinguishing it from the exact-pass treatment) prevents this being rediscovered as a bug. Two of four review agents flagged it independently.
S2. Scope the invariant prose to dedup runs, and file the prune interaction as a follow-up. "Every hidden or deleted record keeps a visible survivor" is enforced only against dedup runs: recall prune hard-deletes expired breadcrumbs and aged superseded/reverted decisions with no dedup_lineage awareness (src/commands/prune.ts:142-154), so a pruned survivor orphans its hidden duplicates. Suggest "no later dedup run ever hides or deletes a recorded survivor" phrasing, plus a follow-up issue for prune × dedup_lineage (searched open+closed issues — only #53 is adjacent, and it doesn't cover this).
Suggestions (non-blocking)
- CLI catch is broader than its comment (src/commands/dedup.ts:137-145): it swallows every
applyDedupPlanerror — SQLite busy/corrupt, constraint failures, future bugs — printing message-only stderr with no stack, and only the guard's message carries the "No changes were made" reassurance. Day-to-day the guard is unreachable via the CLI (plan+apply run back-to-back on one handle), so this catch will mostly fire on exactly the errors it handles worst. Consider a typedDedupRefusedErrorthrown by the guard, with rethrow for everything else (precedent: src/commands/export.ts:138-144 handles-then-rethrows) — or at minimum widen the comment. stickySkippedaccounting nits: the CLI label "N prior survivor(s) kept visible" counts skipped marks/pairs, not survivors (one sticky survivor in three semantic pairs prints 3); the semantic-pass counter also increments when stickiness wasn't the deciding factor, and a record can be counted in both passes in one run. Cosmetic; consider "N mark(s) skipped (prior survivor kept visible)". Relatedly, a dry run whose only findings are sticky-skipped prints "No duplicates found." (src/commands/dedup.ts:126-132) one line after reporting the skips.- Untested surfaces (follow-up material): the CLI catch path (stderr message +
process.exitCode = 1— the existing harness in tests/commands/dedup.test.ts already captures both),stickySkippedassertions (one exact, one semantic), and the'reverted'exclusion (unreachable today — schema.ts:188 reserves it for a future unmark path — but a 10-line test inserting a reverted row directly would protect that feature from day one).
Strengths
- The core mechanism is exactly the triaged remediation and it verifiably works: my pre/post reproduction shows the precise issue-#63 failure on
mainand the refusal on this branch, both modes. - The destructive guard is defense-in-depth done right: write-free by construction, pinned by a full-DB snapshot test, with an exemplary error message (count, sample id, rationale, "No changes were made").
- The property test attacks the hazard surgically (angle-targeted, top-ranked challengers shadowing wave-1 records) rather than hoping random inputs collide, and it pins the invariant without freezing the conservative implementation — a future precision improvement to the semantic pass would still pass.
- Surgical diff, conventions held, comments updated in lockstep (the stale "may legitimately be re-marked" scope-note sentence was correctly retired);
stickySkippedis surfaced so no exclusion is silent. - DRY: one loader serves both the planner seed and the guard.
Verdict
REQUEST CHANGES — one blocking item (R1, ~10-15 line test pinning 'deleted'-status stickiness); everything else is correct and verified, so this should turn around fast.
Review R1 on PR #79: mutating loadRecordedSurvivors' filter from status IN ('marked','deleted') to status = 'marked' passed the whole suite — no test created a 'deleted' lineage row and asserted stickiness on its survivor, so a one-token regression would silently re-expose the destructive data-loss half of issue #63 and defeat both defense layers at once (planner seed and guard share the loader). New deterministic test: run 1 applies an exact pair destructively (lineage status 'deleted'), then asserts run 2 never re-plans the recorded survivor (stickySkipped reported instead) and that a handcrafted destructive plan against that survivor still refuses write-free. Mutation re-run: dropping 'deleted' from the filter now fails exactly this test (654 pass / 1 fail); restored, full suite 655 pass / 0 fail, lint clean. No source change — the filter was already correct.
|
R1 addressed in fb60cf5 — deterministic test pinning 'deleted'-status survivor stickiness, per the shape specified in the review:
Mutation re-run (the reviewer's exact mutation): No source change — as the review's scratch repro anticipated, the filter was already correct; this was purely the regression-protection gap. S1/S2 doc items and the suggestions are deferred per scope. |
edheltzel
left a comment
There was a problem hiding this comment.
Re-review — delta since 6047ece (blocker R1)
Re-verified at head fb60cf5 (single test-only commit, +58 lines in tests/lib/dedup.property.test.ts; no source or docs changes, so the deferred non-blocking items S1/S2 and suggestions are untouched as expected).
- Mutation re-run (independent): applied my exact R1 mutation —
WHERE status IN ('marked','deleted')→WHERE status = 'marked'inloadRecordedSurvivors— in a fresh worktree atfb60cf5. Result: 654 pass / 1 fail, the sole failure being the new test. Clean checkout: 655 pass / 0 fail. Both numbers match the worker's report verbatim. The previously surviving mutation is now caught. - Test quality: fully deterministic (fixed inputs, no fast-check,
semantic: false), correctly scoped to the blocker, and it pins both defense layers through the'deleted'path — the planner seed (run-2 exact group where a newer equal-rank record wins on recency → empty plan +stickySkipped === 1) and the destructive guard (handcrafted plan →toThrow+ full-DB snapshot equality, proving write-freeness). It also pins the run-1 lineage status as'deleted'explicitly, so the test can't silently degrade into a second'marked'-path test. - CI: green on head
fb60cf5(macOS primary + Ubuntu smoke, run 27378002026, completed/success).
The non-blocking items from the original review (semantic-pass consolidation blind-spot docs, prune × dedup_lineage follow-up issue, CLI catch narrowing, stickySkipped labeling) remain valid follow-up material and were explicitly not gating.
Verdict
APPROVE
Closes #63
What
Implements the triage decision on #63: option (a) sticky survivors plus option (c)'s destructive-mode guard as defense-in-depth.
loadRecordedSurvivorsloads survivor keys fromdedup_lineage(statusmarked/deleted, mirroringloadMarkedDuplicates).planDedupseedsplannedSurvivorKeyswith them, extending the PR feat: add recall dedup — non-destructive dedup with provenance-aware survivor selection #60 within-plan one-hop guard across runs; the exact pass additionally skips sticky survivors as duplicates (a newer record can still out-rank one within its group — the sticky record just stays visible). Skips are counted per table (stickySkipped) and surfaced in the CLI summary, so the exclusion is never silent.applyDedupPlanwithdestructiverefuses (throws before any write) when a planned duplicate is itself a recorded survivor;runDedupsurfaces the refusal as a clean error message with exit code 1.planDedupcan no longer produce such a plan, so this path is unreachable end-to-end by design — it exists as belt-and-braces should (1) ever regress.planDedup, so the mutation check below isolates the property). Oracle discipline preserved: assertions are relational invariants over generator-driven inputs.Also documents the new safety-model bullet in
docs/cli-reference.mdand thededup.tsheader.What this deliberately does NOT do
Verification
bun run lintclean; fullbun test: 654 pass / 0 fail.loadRecordedSurvivors(db)with an empty set fails exactly one test — the new cross-run property — on the first generated case (653 pass / 1 fail). Restored and re-verified green.Tradeoff (per triage)
Survivorship is order-dependent across runs: an early survivor is never consolidated under a genuinely better later record (it stays visible instead). Conservative in the safe direction; cosmetic next to the data-loss risk.