fix(dedup): make recorded survivors sticky across runs by edheltzel · Pull Request #79 · edheltzel/Recall

edheltzel · 2026-06-11T20:50:14Z

Closes #63

What

Implements the triage decision on #63: option (a) sticky survivors plus option (c)'s destructive-mode guard as defense-in-depth.

Sticky survivors — loadRecordedSurvivors loads survivor keys from dedup_lineage (status marked/deleted, mirroring loadMarkedDuplicates). planDedup seeds plannedSurvivorKeys with them, extending the PR feat: add recall dedup — non-destructive dedup with provenance-aware survivor selection #60 within-plan one-hop guard across runs; the exact pass additionally skips sticky survivors as duplicates (a newer record can still out-rank one within its group — the sticky record just stays visible). Skips are counted per table (stickySkipped) and surfaced in the CLI summary, so the exclusion is never silent.
Destructive-mode guard — applyDedupPlan with destructive refuses (throws before any write) when a planned duplicate is itself a recorded survivor; runDedup surfaces the refusal as a clean error message with exit code 1. planDedup can no longer produce such a plan, so this path is unreachable end-to-end by design — it exists as belt-and-braces should (1) ever regress.
Cross-run property test — two-wave generation (second corpus joins existing exact groups; angle-targeted, top-priority semantic challengers shadow wave-1 records) pins a recorded survivor is never re-marked across runs, both in the run-2 plan and as survivor/duplicate disjointness over final lineage. The scope note at the top of the suite now states the cross-run invariant. A deterministic test pins the destructive guard via a handcrafted plan (independent of planDedup, so the mutation check below isolates the property). Oracle discipline preserved: assertions are relational invariants over generator-driven inputs.

Also documents the new safety-model bullet in docs/cli-reference.md and the dedup.ts header.

What this deliberately does NOT do

No option (b) chain compression (future enhancement; nothing here precludes it — lineage rows are untouched at plan time).
No change to mark→delete escalation semantics (dedup: mark→delete escalation is a no-op — decide semantics and document #73, sequenced after this).

Verification

bun run lint clean; full bun test: 654 pass / 0 fail.
Mutation check: replacing loadRecordedSurvivors(db) with an empty set fails exactly one test — the new cross-run property — on the first generated case (653 pass / 1 fail). Restored and re-verified green.

Tradeoff (per triage)

Survivorship is order-dependent across runs: an early survivor is never consolidated under a genuinely better later record (it stays visible instead). Conservative in the safe direction; cosmetic next to the data-loss risk.

Cross-run re-marking of a prior survivor could break the visible-survivor safety property: dedup_lineage accumulated marked->marked chains, leaving hidden (or, with --delete, permanently removed) records whose only visible neighbor was never measured against them. - Sticky survivors (option a): planDedup loads survivor keys from dedup_lineage (status 'marked'/'deleted', mirroring loadMarkedDuplicates) and seeds plannedSurvivorKeys with them, extending the within-plan one-hop guard across runs; the exact pass skips them as duplicates too. Skips are reported per table (stickySkipped) and in the CLI summary. - Destructive-mode guard (option c, defense-in-depth): applyDedupPlan refuses — throws before writing — when a planned duplicate is itself a recorded survivor; runDedup surfaces the refusal as a clean error with exit code 1. - Property suite: cross-run sticky-survivor property (two waves plus angle-targeted challengers) pins 'a recorded survivor is never re-marked across runs' and final-lineage survivor/duplicate disjointness; the scope note now states the cross-run invariant. Mutation-checked: removing the exclusion fails exactly the new property. Chain compression (option b) and mark->delete escalation (#73) are deliberately untouched. Closes #63

edheltzel

Review — PR #79 (sticky survivors, issue #63)

Reviewed at head 6047ece via the pr-review-toolkit multi-agent workflow (code review, test coverage, comment accuracy, silent-failure audit) plus independent hands-on verification: scratch-DB reproduction of issue #63's exact scenario against both main and this branch, and three mutation runs. CI is green on the head commit (macOS primary + Ubuntu smoke, run on 6047ece).

Independent verification results

Issue #63 reproduced and fixed. Scratch in-memory DB, the issue's exact two-run A-B-C composition (angles 0°/18°/342° → sim(A,B)=sim(A,C)=0.9511 ≥ 0.95, sim(B,C)=0.809 < 0.95; C out-ranks A, A out-ranks B). On main: run 2 plans A → C, leaving B's only visible representative below threshold — and in destructive mode B then A are hard-deleted, leaving only C. On this branch: run 2 plans nothing, reports stickySkipped: 1, and A stays visible — in both marking and destructive modes (the 'deleted' status path works correctly).
loadRecordedSurvivors genuinely mirrors loadMarkedDuplicates (same prepare/all/table:id-Set shape, src/lib/dedup.ts:350-355 vs 364-370); the two deltas (DISTINCT, status IN ('marked','deleted')) are correct and explained in the docstring. Codegraph confirms blast radius: planDedup/applyDedupPlan have exactly one production caller (runDedup), so the new throw reaches only the new CLI catch; the MCP server is untouched.
Destructive guard: verified the throw precedes the INSERT prepare and the transaction (src/lib/dedup.ts:613-624 vs 626/644) — genuinely write-free; the handcrafted-plan test pins both the throw and zero-writes via full-DB snapshot. The unreachability-through-planDedup claim holds by construction and in my repro. Library-level coverage of the guard itself is sufficient.
Mutation checks (run locally):
- Call-site stub (stickySurvivors = new Set() in planDedup): 653 pass / 1 fail — exactly the new cross-run property, first generated case. The PR's mutation claim verifies verbatim.
- Whole-function stub: 652 pass / 2 fail — cross-run property + destructive-guard test. Both defense layers are load-bearing.
- WHERE status IN ('marked','deleted') → WHERE status = 'marked': 654 pass / 0 fail. See Required below.
Nothing precludes option (b) or #73: zero UPDATE dedup_lineage statements in src/; planDedup is read-only and applyDedupPlan only INSERTs lineage rows. Chain compression and mark→delete escalation remain open.
Docs: the dedup.ts header bullet and the cli-reference.md safety-model bullet match verified behavior (refusal wording, order-dependence tradeoff, no-writes claim). Oracle discipline in the new property block is upheld (relational invariants over generator-driven inputs; survivorsAfterRun1 is used as a premise, never compared back to itself), and the scope-note update accurately retires the now-false cross-run sentence.

Required (blocking)

R1. Pin 'deleted'-status stickiness with a deterministic test. Mutating loadRecordedSurvivors from status IN ('marked','deleted') to status = 'marked' passes the entire suite (654/0). That mutation re-exposes precisely the data-loss half of issue #63 — the half the issue calls "worse" — and it defeats both defense layers at once, since planning stickiness and the destructive guard share this one function. Cause: the cross-run property only applies plans non-destructively, and the guard test's run-1 lineage is 'marked', so no test ever creates a 'deleted' lineage row and asserts stickiness on its survivor. The triage asked for a test pinning the chosen invariant, and the PR's own verification section sets mutation-resistance as the proof standard — this one-token mutation ships invisibly today. Fix is ~10-15 lines in the new describe block: run 1 with { destructive: true } on an exact pair, then assert (a) run 2 never plans the recorded survivor as a duplicate, and (b) a handcrafted destructive plan against that survivor still refuses. (My scratch repro confirms the implementation already behaves correctly here — this is purely a regression-protection gap, which is why it's the only blocker.)

Strongly recommended (non-blocking, but cheap while you're in here)

S1. Document the semantic-pass consolidation blind spot. Because the sticky seed lives in plannedSurvivorKeys and the semantic guard skips the entire pair when either side is a survivor (src/lib/dedup.ts:563-568), a recorded survivor can never acquire new semantic duplicates in any later run — fresh near-duplicates of an old survivor stay separate visible records forever (they're re-skipped and re-counted as stickySkipped on every run). Exact-text duplicates still consolidate under a sticky survivor that wins its group (src/lib/dedup.ts:507-516), so the two passes behave asymmetrically for the same record. This is consistent with PR #60's established greedy semantics and conservative in the safe direction — not a bug — but the header, the cli-reference bullet, and the seeding comment all document only the "survivor never re-marked" direction. One sentence in the dedup.ts header + the docs bullet (and ideally a one-line comment at the semantic guard distinguishing it from the exact-pass treatment) prevents this being rediscovered as a bug. Two of four review agents flagged it independently.

S2. Scope the invariant prose to dedup runs, and file the prune interaction as a follow-up. "Every hidden or deleted record keeps a visible survivor" is enforced only against dedup runs: recall prune hard-deletes expired breadcrumbs and aged superseded/reverted decisions with no dedup_lineage awareness (src/commands/prune.ts:142-154), so a pruned survivor orphans its hidden duplicates. Suggest "no later dedup run ever hides or deletes a recorded survivor" phrasing, plus a follow-up issue for prune × dedup_lineage (searched open+closed issues — only #53 is adjacent, and it doesn't cover this).

Suggestions (non-blocking)

CLI catch is broader than its comment (src/commands/dedup.ts:137-145): it swallows every applyDedupPlan error — SQLite busy/corrupt, constraint failures, future bugs — printing message-only stderr with no stack, and only the guard's message carries the "No changes were made" reassurance. Day-to-day the guard is unreachable via the CLI (plan+apply run back-to-back on one handle), so this catch will mostly fire on exactly the errors it handles worst. Consider a typed DedupRefusedError thrown by the guard, with rethrow for everything else (precedent: src/commands/export.ts:138-144 handles-then-rethrows) — or at minimum widen the comment.
stickySkipped accounting nits: the CLI label "N prior survivor(s) kept visible" counts skipped marks/pairs, not survivors (one sticky survivor in three semantic pairs prints 3); the semantic-pass counter also increments when stickiness wasn't the deciding factor, and a record can be counted in both passes in one run. Cosmetic; consider "N mark(s) skipped (prior survivor kept visible)". Relatedly, a dry run whose only findings are sticky-skipped prints "No duplicates found." (src/commands/dedup.ts:126-132) one line after reporting the skips.
Untested surfaces (follow-up material): the CLI catch path (stderr message + process.exitCode = 1 — the existing harness in tests/commands/dedup.test.ts already captures both), stickySkipped assertions (one exact, one semantic), and the 'reverted' exclusion (unreachable today — schema.ts:188 reserves it for a future unmark path — but a 10-line test inserting a reverted row directly would protect that feature from day one).

Strengths

The core mechanism is exactly the triaged remediation and it verifiably works: my pre/post reproduction shows the precise issue-#63 failure on main and the refusal on this branch, both modes.
The destructive guard is defense-in-depth done right: write-free by construction, pinned by a full-DB snapshot test, with an exemplary error message (count, sample id, rationale, "No changes were made").
The property test attacks the hazard surgically (angle-targeted, top-ranked challengers shadowing wave-1 records) rather than hoping random inputs collide, and it pins the invariant without freezing the conservative implementation — a future precision improvement to the semantic pass would still pass.
Surgical diff, conventions held, comments updated in lockstep (the stale "may legitimately be re-marked" scope-note sentence was correctly retired); stickySkipped is surfaced so no exclusion is silent.
DRY: one loader serves both the planner seed and the guard.

Verdict

REQUEST CHANGES — one blocking item (R1, ~10-15 line test pinning 'deleted'-status stickiness); everything else is correct and verified, so this should turn around fast.

Review R1 on PR #79: mutating loadRecordedSurvivors' filter from status IN ('marked','deleted') to status = 'marked' passed the whole suite — no test created a 'deleted' lineage row and asserted stickiness on its survivor, so a one-token regression would silently re-expose the destructive data-loss half of issue #63 and defeat both defense layers at once (planner seed and guard share the loader). New deterministic test: run 1 applies an exact pair destructively (lineage status 'deleted'), then asserts run 2 never re-plans the recorded survivor (stickySkipped reported instead) and that a handcrafted destructive plan against that survivor still refuses write-free. Mutation re-run: dropping 'deleted' from the filter now fails exactly this test (654 pass / 1 fail); restored, full suite 655 pass / 0 fail, lint clean. No source change — the filter was already correct.

edheltzel · 2026-06-11T21:17:19Z

R1 addressed in fb60cf5 — deterministic test pinning 'deleted'-status survivor stickiness, per the shape specified in the review:

Run 1 destructive on an exact pair at library level → asserts the lineage row carries status 'deleted' and the survivor id.
(a) Run 2 stickiness: a newer, equally-ranked exact duplicate arrives and wins the group on recency; asserts the plan is empty and stickySkipped is 1 — the 'deleted'-row survivor is never re-planned.
(b) Guard: a handcrafted destructive plan claiming that survivor as a duplicate still refuses, write-free (full-DB snapshot equality).

Mutation re-run (the reviewer's exact mutation): status IN ('marked','deleted') → status = 'marked' now fails exactly this one test (654 pass / 1 fail). Restored: 655 pass / 0 fail, bun run lint clean.

No source change — as the review's scratch repro anticipated, the filter was already correct; this was purely the regression-protection gap. S1/S2 doc items and the suggestions are deferred per scope.

edheltzel

Re-review — delta since `6047ece` (blocker R1)

Re-verified at head fb60cf5 (single test-only commit, +58 lines in tests/lib/dedup.property.test.ts; no source or docs changes, so the deferred non-blocking items S1/S2 and suggestions are untouched as expected).

Mutation re-run (independent): applied my exact R1 mutation — WHERE status IN ('marked','deleted') → WHERE status = 'marked' in loadRecordedSurvivors — in a fresh worktree at fb60cf5. Result: 654 pass / 1 fail, the sole failure being the new test. Clean checkout: 655 pass / 0 fail. Both numbers match the worker's report verbatim. The previously surviving mutation is now caught.
Test quality: fully deterministic (fixed inputs, no fast-check, semantic: false), correctly scoped to the blocker, and it pins both defense layers through the 'deleted' path — the planner seed (run-2 exact group where a newer equal-rank record wins on recency → empty plan + stickySkipped === 1) and the destructive guard (handcrafted plan → toThrow + full-DB snapshot equality, proving write-freeness). It also pins the run-1 lineage status as 'deleted' explicitly, so the test can't silently degrade into a second 'marked'-path test.
CI: green on head fb60cf5 (macOS primary + Ubuntu smoke, run 27378002026, completed/success).

The non-blocking items from the original review (semantic-pass consolidation blind-spot docs, prune × dedup_lineage follow-up issue, CLI catch narrowing, stickySkipped labeling) remain valid follow-up material and were explicitly not gating.

Verdict

APPROVE

edheltzel commented Jun 11, 2026

View reviewed changes

edheltzel mentioned this pull request Jun 11, 2026

prune: deletes survivors with no dedup_lineage awareness — orphans marked duplicates #80

Open

edheltzel commented Jun 11, 2026

View reviewed changes

edheltzel merged commit 1c111cb into main Jun 11, 2026
2 checks passed

edheltzel deleted the fix/issue-63-sticky-survivors branch June 11, 2026 21:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(dedup): make recorded survivors sticky across runs#79

fix(dedup): make recorded survivors sticky across runs#79
edheltzel merged 2 commits into
mainfrom
fix/issue-63-sticky-survivors

edheltzel commented Jun 11, 2026

Uh oh!

edheltzel left a comment

Uh oh!

edheltzel commented Jun 11, 2026

Uh oh!

edheltzel left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

edheltzel commented Jun 11, 2026

What

What this deliberately does NOT do

Verification

Tradeoff (per triage)

Uh oh!

edheltzel left a comment

Choose a reason for hiding this comment

Review — PR #79 (sticky survivors, issue #63)

Independent verification results

Required (blocking)

Strongly recommended (non-blocking, but cheap while you're in here)

Suggestions (non-blocking)

Strengths

Verdict

Uh oh!

edheltzel commented Jun 11, 2026

Uh oh!

edheltzel left a comment

Choose a reason for hiding this comment

Re-review — delta since 6047ece (blocker R1)

Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Re-review — delta since `6047ece` (blocker R1)