Skip to content

fix(dedup): make recorded survivors sticky across runs#79

Merged
edheltzel merged 2 commits into
mainfrom
fix/issue-63-sticky-survivors
Jun 11, 2026
Merged

fix(dedup): make recorded survivors sticky across runs#79
edheltzel merged 2 commits into
mainfrom
fix/issue-63-sticky-survivors

Conversation

@edheltzel

Copy link
Copy Markdown
Owner

Closes #63

What

Implements the triage decision on #63: option (a) sticky survivors plus option (c)'s destructive-mode guard as defense-in-depth.

  1. Sticky survivorsloadRecordedSurvivors loads survivor keys from dedup_lineage (status marked/deleted, mirroring loadMarkedDuplicates). planDedup seeds plannedSurvivorKeys with them, extending the PR feat: add recall dedup — non-destructive dedup with provenance-aware survivor selection #60 within-plan one-hop guard across runs; the exact pass additionally skips sticky survivors as duplicates (a newer record can still out-rank one within its group — the sticky record just stays visible). Skips are counted per table (stickySkipped) and surfaced in the CLI summary, so the exclusion is never silent.
  2. Destructive-mode guardapplyDedupPlan with destructive refuses (throws before any write) when a planned duplicate is itself a recorded survivor; runDedup surfaces the refusal as a clean error message with exit code 1. planDedup can no longer produce such a plan, so this path is unreachable end-to-end by design — it exists as belt-and-braces should (1) ever regress.
  3. Cross-run property test — two-wave generation (second corpus joins existing exact groups; angle-targeted, top-priority semantic challengers shadow wave-1 records) pins a recorded survivor is never re-marked across runs, both in the run-2 plan and as survivor/duplicate disjointness over final lineage. The scope note at the top of the suite now states the cross-run invariant. A deterministic test pins the destructive guard via a handcrafted plan (independent of planDedup, so the mutation check below isolates the property). Oracle discipline preserved: assertions are relational invariants over generator-driven inputs.

Also documents the new safety-model bullet in docs/cli-reference.md and the dedup.ts header.

What this deliberately does NOT do

Verification

  • bun run lint clean; full bun test: 654 pass / 0 fail.
  • Mutation check: replacing loadRecordedSurvivors(db) with an empty set fails exactly one test — the new cross-run property — on the first generated case (653 pass / 1 fail). Restored and re-verified green.

Tradeoff (per triage)

Survivorship is order-dependent across runs: an early survivor is never consolidated under a genuinely better later record (it stays visible instead). Conservative in the safe direction; cosmetic next to the data-loss risk.

Cross-run re-marking of a prior survivor could break the visible-survivor
safety property: dedup_lineage accumulated marked->marked chains, leaving
hidden (or, with --delete, permanently removed) records whose only visible
neighbor was never measured against them.

- Sticky survivors (option a): planDedup loads survivor keys from
  dedup_lineage (status 'marked'/'deleted', mirroring loadMarkedDuplicates)
  and seeds plannedSurvivorKeys with them, extending the within-plan one-hop
  guard across runs; the exact pass skips them as duplicates too. Skips are
  reported per table (stickySkipped) and in the CLI summary.
- Destructive-mode guard (option c, defense-in-depth): applyDedupPlan
  refuses — throws before writing — when a planned duplicate is itself a
  recorded survivor; runDedup surfaces the refusal as a clean error with
  exit code 1.
- Property suite: cross-run sticky-survivor property (two waves plus
  angle-targeted challengers) pins 'a recorded survivor is never re-marked
  across runs' and final-lineage survivor/duplicate disjointness; the
  scope note now states the cross-run invariant. Mutation-checked: removing
  the exclusion fails exactly the new property.

Chain compression (option b) and mark->delete escalation (#73) are
deliberately untouched.

Closes #63

@edheltzel edheltzel left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — PR #79 (sticky survivors, issue #63)

Reviewed at head 6047ece via the pr-review-toolkit multi-agent workflow (code review, test coverage, comment accuracy, silent-failure audit) plus independent hands-on verification: scratch-DB reproduction of issue #63's exact scenario against both main and this branch, and three mutation runs. CI is green on the head commit (macOS primary + Ubuntu smoke, run on 6047ece).

Independent verification results

  1. Issue #63 reproduced and fixed. Scratch in-memory DB, the issue's exact two-run A-B-C composition (angles 0°/18°/342° → sim(A,B)=sim(A,C)=0.9511 ≥ 0.95, sim(B,C)=0.809 < 0.95; C out-ranks A, A out-ranks B). On main: run 2 plans A → C, leaving B's only visible representative below threshold — and in destructive mode B then A are hard-deleted, leaving only C. On this branch: run 2 plans nothing, reports stickySkipped: 1, and A stays visible — in both marking and destructive modes (the 'deleted' status path works correctly).
  2. loadRecordedSurvivors genuinely mirrors loadMarkedDuplicates (same prepare/all/table:id-Set shape, src/lib/dedup.ts:350-355 vs 364-370); the two deltas (DISTINCT, status IN ('marked','deleted')) are correct and explained in the docstring. Codegraph confirms blast radius: planDedup/applyDedupPlan have exactly one production caller (runDedup), so the new throw reaches only the new CLI catch; the MCP server is untouched.
  3. Destructive guard: verified the throw precedes the INSERT prepare and the transaction (src/lib/dedup.ts:613-624 vs 626/644) — genuinely write-free; the handcrafted-plan test pins both the throw and zero-writes via full-DB snapshot. The unreachability-through-planDedup claim holds by construction and in my repro. Library-level coverage of the guard itself is sufficient.
  4. Mutation checks (run locally):
    • Call-site stub (stickySurvivors = new Set() in planDedup): 653 pass / 1 fail — exactly the new cross-run property, first generated case. The PR's mutation claim verifies verbatim.
    • Whole-function stub: 652 pass / 2 fail — cross-run property + destructive-guard test. Both defense layers are load-bearing.
    • WHERE status IN ('marked','deleted')WHERE status = 'marked': 654 pass / 0 fail. See Required below.
  5. Nothing precludes option (b) or #73: zero UPDATE dedup_lineage statements in src/; planDedup is read-only and applyDedupPlan only INSERTs lineage rows. Chain compression and mark→delete escalation remain open.
  6. Docs: the dedup.ts header bullet and the cli-reference.md safety-model bullet match verified behavior (refusal wording, order-dependence tradeoff, no-writes claim). Oracle discipline in the new property block is upheld (relational invariants over generator-driven inputs; survivorsAfterRun1 is used as a premise, never compared back to itself), and the scope-note update accurately retires the now-false cross-run sentence.

Required (blocking)

R1. Pin 'deleted'-status stickiness with a deterministic test. Mutating loadRecordedSurvivors from status IN ('marked','deleted') to status = 'marked' passes the entire suite (654/0). That mutation re-exposes precisely the data-loss half of issue #63 — the half the issue calls "worse" — and it defeats both defense layers at once, since planning stickiness and the destructive guard share this one function. Cause: the cross-run property only applies plans non-destructively, and the guard test's run-1 lineage is 'marked', so no test ever creates a 'deleted' lineage row and asserts stickiness on its survivor. The triage asked for a test pinning the chosen invariant, and the PR's own verification section sets mutation-resistance as the proof standard — this one-token mutation ships invisibly today. Fix is ~10-15 lines in the new describe block: run 1 with { destructive: true } on an exact pair, then assert (a) run 2 never plans the recorded survivor as a duplicate, and (b) a handcrafted destructive plan against that survivor still refuses. (My scratch repro confirms the implementation already behaves correctly here — this is purely a regression-protection gap, which is why it's the only blocker.)

Strongly recommended (non-blocking, but cheap while you're in here)

S1. Document the semantic-pass consolidation blind spot. Because the sticky seed lives in plannedSurvivorKeys and the semantic guard skips the entire pair when either side is a survivor (src/lib/dedup.ts:563-568), a recorded survivor can never acquire new semantic duplicates in any later run — fresh near-duplicates of an old survivor stay separate visible records forever (they're re-skipped and re-counted as stickySkipped on every run). Exact-text duplicates still consolidate under a sticky survivor that wins its group (src/lib/dedup.ts:507-516), so the two passes behave asymmetrically for the same record. This is consistent with PR #60's established greedy semantics and conservative in the safe direction — not a bug — but the header, the cli-reference bullet, and the seeding comment all document only the "survivor never re-marked" direction. One sentence in the dedup.ts header + the docs bullet (and ideally a one-line comment at the semantic guard distinguishing it from the exact-pass treatment) prevents this being rediscovered as a bug. Two of four review agents flagged it independently.

S2. Scope the invariant prose to dedup runs, and file the prune interaction as a follow-up. "Every hidden or deleted record keeps a visible survivor" is enforced only against dedup runs: recall prune hard-deletes expired breadcrumbs and aged superseded/reverted decisions with no dedup_lineage awareness (src/commands/prune.ts:142-154), so a pruned survivor orphans its hidden duplicates. Suggest "no later dedup run ever hides or deletes a recorded survivor" phrasing, plus a follow-up issue for prune × dedup_lineage (searched open+closed issues — only #53 is adjacent, and it doesn't cover this).

Suggestions (non-blocking)

  • CLI catch is broader than its comment (src/commands/dedup.ts:137-145): it swallows every applyDedupPlan error — SQLite busy/corrupt, constraint failures, future bugs — printing message-only stderr with no stack, and only the guard's message carries the "No changes were made" reassurance. Day-to-day the guard is unreachable via the CLI (plan+apply run back-to-back on one handle), so this catch will mostly fire on exactly the errors it handles worst. Consider a typed DedupRefusedError thrown by the guard, with rethrow for everything else (precedent: src/commands/export.ts:138-144 handles-then-rethrows) — or at minimum widen the comment.
  • stickySkipped accounting nits: the CLI label "N prior survivor(s) kept visible" counts skipped marks/pairs, not survivors (one sticky survivor in three semantic pairs prints 3); the semantic-pass counter also increments when stickiness wasn't the deciding factor, and a record can be counted in both passes in one run. Cosmetic; consider "N mark(s) skipped (prior survivor kept visible)". Relatedly, a dry run whose only findings are sticky-skipped prints "No duplicates found." (src/commands/dedup.ts:126-132) one line after reporting the skips.
  • Untested surfaces (follow-up material): the CLI catch path (stderr message + process.exitCode = 1 — the existing harness in tests/commands/dedup.test.ts already captures both), stickySkipped assertions (one exact, one semantic), and the 'reverted' exclusion (unreachable today — schema.ts:188 reserves it for a future unmark path — but a 10-line test inserting a reverted row directly would protect that feature from day one).

Strengths

  • The core mechanism is exactly the triaged remediation and it verifiably works: my pre/post reproduction shows the precise issue-#63 failure on main and the refusal on this branch, both modes.
  • The destructive guard is defense-in-depth done right: write-free by construction, pinned by a full-DB snapshot test, with an exemplary error message (count, sample id, rationale, "No changes were made").
  • The property test attacks the hazard surgically (angle-targeted, top-ranked challengers shadowing wave-1 records) rather than hoping random inputs collide, and it pins the invariant without freezing the conservative implementation — a future precision improvement to the semantic pass would still pass.
  • Surgical diff, conventions held, comments updated in lockstep (the stale "may legitimately be re-marked" scope-note sentence was correctly retired); stickySkipped is surfaced so no exclusion is silent.
  • DRY: one loader serves both the planner seed and the guard.

Verdict

REQUEST CHANGES — one blocking item (R1, ~10-15 line test pinning 'deleted'-status stickiness); everything else is correct and verified, so this should turn around fast.

Review R1 on PR #79: mutating loadRecordedSurvivors' filter from
status IN ('marked','deleted') to status = 'marked' passed the whole
suite — no test created a 'deleted' lineage row and asserted stickiness
on its survivor, so a one-token regression would silently re-expose the
destructive data-loss half of issue #63 and defeat both defense layers
at once (planner seed and guard share the loader).

New deterministic test: run 1 applies an exact pair destructively
(lineage status 'deleted'), then asserts run 2 never re-plans the
recorded survivor (stickySkipped reported instead) and that a
handcrafted destructive plan against that survivor still refuses
write-free. Mutation re-run: dropping 'deleted' from the filter now
fails exactly this test (654 pass / 1 fail); restored, full suite
655 pass / 0 fail, lint clean. No source change — the filter was
already correct.
@edheltzel

Copy link
Copy Markdown
Owner Author

R1 addressed in fb60cf5 — deterministic test pinning 'deleted'-status survivor stickiness, per the shape specified in the review:

  • Run 1 destructive on an exact pair at library level → asserts the lineage row carries status 'deleted' and the survivor id.
  • (a) Run 2 stickiness: a newer, equally-ranked exact duplicate arrives and wins the group on recency; asserts the plan is empty and stickySkipped is 1 — the 'deleted'-row survivor is never re-planned.
  • (b) Guard: a handcrafted destructive plan claiming that survivor as a duplicate still refuses, write-free (full-DB snapshot equality).

Mutation re-run (the reviewer's exact mutation): status IN ('marked','deleted')status = 'marked' now fails exactly this one test (654 pass / 1 fail). Restored: 655 pass / 0 fail, bun run lint clean.

No source change — as the review's scratch repro anticipated, the filter was already correct; this was purely the regression-protection gap. S1/S2 doc items and the suggestions are deferred per scope.

@edheltzel edheltzel left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review — delta since 6047ece (blocker R1)

Re-verified at head fb60cf5 (single test-only commit, +58 lines in tests/lib/dedup.property.test.ts; no source or docs changes, so the deferred non-blocking items S1/S2 and suggestions are untouched as expected).

  • Mutation re-run (independent): applied my exact R1 mutation — WHERE status IN ('marked','deleted')WHERE status = 'marked' in loadRecordedSurvivors — in a fresh worktree at fb60cf5. Result: 654 pass / 1 fail, the sole failure being the new test. Clean checkout: 655 pass / 0 fail. Both numbers match the worker's report verbatim. The previously surviving mutation is now caught.
  • Test quality: fully deterministic (fixed inputs, no fast-check, semantic: false), correctly scoped to the blocker, and it pins both defense layers through the 'deleted' path — the planner seed (run-2 exact group where a newer equal-rank record wins on recency → empty plan + stickySkipped === 1) and the destructive guard (handcrafted plan → toThrow + full-DB snapshot equality, proving write-freeness). It also pins the run-1 lineage status as 'deleted' explicitly, so the test can't silently degrade into a second 'marked'-path test.
  • CI: green on head fb60cf5 (macOS primary + Ubuntu smoke, run 27378002026, completed/success).

The non-blocking items from the original review (semantic-pass consolidation blind-spot docs, prune × dedup_lineage follow-up issue, CLI catch narrowing, stickySkipped labeling) remain valid follow-up material and were explicitly not gating.

Verdict

APPROVE

@edheltzel edheltzel merged commit 1c111cb into main Jun 11, 2026
2 checks passed
@edheltzel edheltzel deleted the fix/issue-63-sticky-survivors branch June 11, 2026 21:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

dedup: cross-run re-marking of a prior survivor breaks the visible-survivor safety property (transitive chains across runs)

1 participant