Skip to content

PCSM-346: Make create and drop replay UUID-idempotent#243

Closed
chupe wants to merge 4 commits into
PCSM-342from
PCSM-346
Closed

PCSM-346: Make create and drop replay UUID-idempotent#243
chupe wants to merge 4 commits into
PCSM-342from
PCSM-346

Conversation

@chupe

@chupe chupe commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

Problem

PCSM treated every change-stream create and drop as real DDL. That is unsafe during a sharded movePrimary, because MongoDB can emit phantom DDL events for the moved collection without tagging them fromMigrate (SERVER-120349).

The visible failure is data loss on the target: PCSM drops the collection and recreates it empty. The reported symptom was a 10-document source collection landing as 0 documents on the target.

Solution

Compare the change event UUID against the catalog UUID before replaying a create or drop.

For create:

  • same UUID: the collection is already correct, so the create is a no-op
  • different UUID with an event UUID: phantom movePrimary create, so update the catalog UUID and keep the target data
  • no catalog entry, or no event UUID: real create, so run the existing drop-before-create path

For drop:

  • event UUID and catalog UUID both exist and differ: suppress the stale phantom drop
  • matching UUID, missing catalog entry, or nil UUID: run the normal drop path

Time-series create handling stays unchanged. This is the core data-preservation part of the PCSM-249 fix.

Testing

  • TestApplyCreateDDLChange covers same-UUID no-op, different-UUID phantom create (catalog UUID update only), nil event UUID real create, missing catalog entry real create, and the timeseries skip
  • TestApplyDropDDLChange covers UUID-mismatch suppression, UUID-match real drop, missing catalog entry real drop, and nil-UUID real drop

Notes

Part of PCSM-249. Depends on #242 (catalog UUID reader and movePrimary marker scaffolding). This PR is stacked on that branch and will retarget to main once #242 merges.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The UUID-idempotent dispatch in applyDDLChange is the right shape for the SERVER-120349 problem and the table-driven tests cover the relevant branches. Two things worth a second look:

  • The mock's SetCollectionUUID discards the UUID argument, so the phantom-create test asserts only that the call happened, not that the catalog was updated with the new event UUID. The whole point of the phantom-create branch is that the stored UUID changes; capturing it in the mock would lock that contract in.
  • The second Create switch case is logically redundant with the first (!uuidEqual(...) is already implied by falling out of the equality case) and quietly classifies catalogUUID == nil && eventUUID != nil && exists as a phantom create. That state is only reachable after a prior nil-event-UUID create stored nil in the catalog, but it's worth confirming the intent — "real create" might be the safer fallback for an entry with no recorded UUID.

Comment thread pcsm/repl/repl_test.go
Comment thread pcsm/repl/repl.go Outdated
@chupe

chupe commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator Author

Closing in favor of the consolidated PR: #222

@chupe chupe closed this Jun 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant