fix(replica): don't re-restore an already-verified or in-progress snapshot #91

Merged: passcod merged 1 commit into main from fix/duplicate-restore-verification on Jul 3, 2026

@passcod passcod commented Jul 3, 2026

🤖 The snapshot-list result handler only compared the picked snapshot against the active restore. For an ephemeral verify replica the active restore is torn down after verification, so a later snapshot-list job resolving to the same snapshot passed the guard, created a second restore, and reported a duplicate restore-verification to canopy (observed as two verify/healthy reports for the same snapshot ~76s apart).

The switchover path itself cannot double-fire for a single restore CR: the phase is flipped to Active before the report is sent, so any reconcile that reaches the report has already left the Switching state. The duplicate therefore came from a second restore CR for the same snapshot.
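The ordering argument above can be sketched as follows. This is a minimal illustration, not the operator's actual code: `Restore`, `reconcile_switchover`, and the in-memory `reports` vector are hypothetical stand-ins, and in a real controller the phase flip would be a status update rather than a field assignment. Only the phase names and the `verify/healthy` report come from the description.

```rust
/// Restore phases as named in the PR description.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    Pending,
    Restoring,
    Ready,
    Switching,
    Active,
    Failed,
}

/// Hypothetical, simplified stand-in for a single restore CR.
struct Restore {
    phase: Phase,
}

/// Hypothetical reconcile step for the switchover path.
/// Returns true if this pass sent a report.
fn reconcile_switchover(restore: &mut Restore, reports: &mut Vec<&'static str>) -> bool {
    // Only a restore that is still Switching may report.
    if restore.phase != Phase::Switching {
        return false;
    }
    // Flip the phase first, so a later reconcile of the same CR
    // no longer sees Switching and cannot reach the report again.
    restore.phase = Phase::Active;
    reports.push("verify/healthy");
    true
}
```

Under these assumptions, reconciling the same restore twice sends one report, which is why the duplicate had to come from a second restore CR.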

This tightens the create guard: skip creation when the snapshot is already recorded in status.verifiedSnapshotId (the ephemeral marker that outlives the torn-down restore) or when any non-failed restore is already working on it (Pending/Restoring/Ready/Switching/Active). Failed restores still allow a retry via the failure backoff path.
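A minimal sketch of the tightened guard, assuming simplified types. `Restore`, `ReplicaStatus`, `Decision`, and `decide_restore` are hypothetical names for illustration and are not taken from the actual change; only the phase names and the `status.verifiedSnapshotId` field (modelled here as `verified_snapshot_id`) come from the description.

```rust
/// Restore phases as named in the PR description.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    Pending,
    Restoring,
    Ready,
    Switching,
    Active,
    Failed,
}

/// Hypothetical, simplified view of an existing restore.
struct Restore {
    snapshot_id: String,
    phase: Phase,
}

/// Hypothetical, simplified view of the replica status.
/// `verified_snapshot_id` models status.verifiedSnapshotId, the marker
/// that outlives the torn-down restore on an ephemeral verify replica.
struct ReplicaStatus {
    verified_snapshot_id: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
enum Decision {
    Create,
    SkipAlreadyVerified,
    SkipInProgress,
}

/// Decide whether to create a restore for the picked snapshot.
fn decide_restore(picked: &str, status: &ReplicaStatus, restores: &[Restore]) -> Decision {
    // Already verified: the marker survives even though the restore is gone.
    if status.verified_snapshot_id.as_deref() == Some(picked) {
        return Decision::SkipAlreadyVerified;
    }
    // Any non-failed restore for this snapshot counts as "working on it".
    let in_progress = restores
        .iter()
        .any(|r| r.snapshot_id == picked && r.phase != Phase::Failed);
    if in_progress {
        return Decision::SkipInProgress;
    }
    // No marker and no live restore; a Failed restore does not block a retry.
    Decision::Create
}
```

In this sketch a `Failed` restore never blocks creation; the failure backoff mentioned above would be applied separately and is not modelled here.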

This is defense-in-depth alongside the canopy-side change that drops verify entries from the worklist once their report is received; it closes the propagation-window race and covers any non-canopy trigger.

…pshot

The snapshot-list result handler only compared the picked snapshot
against the *active* restore. For an ephemeral verify replica the active
restore is torn down after verification, so a later snapshot-list job
resolving to the same snapshot passed the guard, created a second
restore, and reported a duplicate restore-verification to canopy.

Skip creation when the snapshot is already recorded in
status.verifiedSnapshotId or when any non-failed restore is already
working on it (Pending/Restoring/Ready/Switching/Active). Failed
restores still allow a retry via the failure backoff path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@passcod passcod enabled auto-merge July 3, 2026 02:27
@passcod passcod merged commit 3a8a640 into main Jul 3, 2026
19 checks passed
@passcod passcod deleted the fix/duplicate-restore-verification branch July 3, 2026 02:31