RTC: fix data corruption on concurrent list move#78274
Conversation
|
Warning: Type of PR label mismatch To merge this PR, it requires exactly 1 label indicating the type of PR. Other labels are optional and not being checked here.
Read more about Type labels in Gutenberg. Don't worry if you don't have the required permissions to add labels; the PR reviewer should be able to help with the task. |
|
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message. To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
This is part of an AI fuzzing project, where an AI wrote a fuzzer and then triages bugs from the fuzzer and creates fixes. See #77716 for the tracking issue. As of this writing, there have been no known false positives from this project, but there have been some issues, which are documented in #77716. I expect we’ll see false positives at some point (and may even have one that’s been filed in a PR that hasn’t been inspected by a code owner yet).
What?
Per @alecgeatches's request, this is part 1 of N of splitting #77924 into multiple PRs. This fixes the issue shown in the 3rd video there. I'm opening this single PR rather than doing all N parts to get feedback one to see if other PRs from this series should be modified before submitting them.
AI TEXT
The failure is not from one isolated bad commit. It comes from a few RTC design
choices interacting:
reliable sync protocol foundation. The block merge strategy remained largely
snapshot-oriented: local block trees are converted into Yjs structures, and
later snapshots are merged into the current Yjs document.
checks around collaborative editing and made this path more broadly active.
SyncManager.update()throughyieldToEventLoop(). That improved scheduling,but it also opened a race: a local snapshot can be captured before a remote
update arrives, then applied after that remote update has already modified the
Yjs document.
manifestation of this by waiting for deferred Y.Doc writes before serializing
_crdt_document, confirming that deferred local writes are observable at saveboundaries.
observers after persisted CRDT hydration, fixing a redundant hydration edit.
That did not address stale local snapshots that are already queued before
remote reconciliation finishes.
schema-aware table/query merging. That is relevant to the full PR 77924
branch, but is not part of this first split.
Bug Shape
The list-item stress failure is:
npm run test:e2e:rtc-websocket -- \ --grep "two users concurrently move list items"Two users move different list items at about the same time. The old merge path
could only see the incoming full block snapshot, not the record the local user
edited from. When a queued local snapshot was stale, the merge could not tell:
update", from
For structural block changes, the positional merge also treated moves as
delete/insert-like rewrites. That can replace Yjs block objects instead of moving
the existing objects. Once that happens, rich text and nested block state may
attach to the wrong list item, duplicate an item, or smear text across siblings.
The visible e2e symptom was a persisted/reloaded list such as:
The live editors could converge before save, while a reload immediately after
save exposed stale provider/store reconciliation. The test now waits for the
semantic post-save state on both editors before reloading; without that wait,
the test can race its own save boundary instead of testing the merge semantics.
Fix Plan Used In The Split
This split keeps the fix to the smallest code surface that made the list-move
case reliable.
Pass a pre-edit
baseRecordfromeditEntityRecord()intoSyncManager.update().This gives CRDT block merging a three-way view: base record, current Yjs
document, and incoming local edits.
Thread
baseRecord.blocksintomergeCrdtBlocks().The block merger can now skip fields that are unchanged from the local base
instead of overwriting remote changes with stale local values.
Rebase block order by identity before falling back to positional merging.
For same-size block arrays with stable
clientIds, the merger moves thecurrent Yjs block objects into the incoming order. This preserves nested
Yjs state and avoids rewriting adjacent blocks. Where
clientIds are notenough, the code can use unique semantic block keys.
Track remote key versions in
SyncManager.Remote Yjs updates mark top-level record keys as reconciling. A deferred
local update that was scheduled before a remote change can then filter stale
non-block keys instead of replaying them. Block updates are still passed
through to the base-aware block merger.
Strengthen the list-move e2e assertion.
The old test only checked relative ordering:
The split asserts the exact semantic order:
It also checks that exact order after save and before reload, so the reload
phase does not race a pending sync/save boundary.
What Stayed Out
The first split does not attempt to fix every RTC stress failure that remains
when PR 77924's behavioral changes are layered onto current trunk. In local
validation, the other existing failure,
three users concurrently edit a large post with diverse blocks, still failedafter this reduced branch. Its failures were broader: same-paragraph convergence,
block toolbar movement, and final paragraph propagation. Those require more of
the full PR 77924 surface and should be reviewed in later splits.
The table/query-array merge work and WebSocket/provider reload behavior fixes
should also remain separate review slices. The WebSocket test harness itself is
not a later slice; it comes from PR 78179 and must be preserved by every split
unless a future split explicitly replaces that harness.
END AI TEXT