RTC: fix post draft opening as blank page despite content having been written to the draft#77874
RTC: fix post draft opening as blank page despite content having been written to the draft#77874danluu wants to merge 6 commits into
Conversation
|
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message. To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
ab349d1 to
a3e4dbc
Compare
a3e4dbc to
22f5a55
Compare
|
@danluu I've spent a while working through this one, and I'm not sure if the situation this protects against is possible without executing code commands and a highly unusual workflow, and even then I can't manually reproduce it. For now, I'll respond to this PR in the same way that I would a human. Reproduction WorkflowLet's talk about the reproduction steps in
In my local testing, even with the ID extraction and URL construction, I was unable to reproduce the behavior on draft-post-id-sync.movI suspect this is due to human-speed timings, but I'm not sure. Testing flakinessTo verify, I tried running this test on trunk: git fetch https://github.com/danluu/gutenberg.git try/draft-reopens-blank-pr:pr-77874
git checkout pr-77874 -- test/e2e/specs/editor/collaboration/collaboration-draft-reopens-blank.spec.ts test/e2e/specs/editor/collaboration/fixtures/collaboration-utils.ts
npm run test:e2e -- test/e2e/specs/editor/collaboration/collaboration-draft-reopens-blank.spec.tsOn my first attempt, the test passed in The test as-is can pass on SummaryI think this may be a valid bug, but as-is the reproduction steps and code here don't do a good job of showing it. I'd like to address these issues:
Thank you! |
| if ( POLLING_MANAGER_ORIGIN === origin ) { | ||
| if ( | ||
| POLLING_MANAGER_ORIGIN === origin || | ||
| LOCAL_SYNC_MANAGER_ORIGIN === origin |
There was a problem hiding this comment.
If this PR is merged, can we also add a note around the LOCAL_SYNC_MANAGER_ORIGIN constant? Currently there's an implicit need to pair any LOCAL_SYNC_MANAGER_ORIGIN transaction with a persistCRDTDoc(), or else the changes made aren't persisted to the room or entity. Both current places using the origin already persist the doc (like below), but it would be good to document that these two functions need to be bundled together.
gutenberg/packages/sync/src/manager.ts
Lines 544 to 547 in 7366274
|
Sorry I didn't respond to this earlier. I must've marked the email for this as read while scrolling and not noticed it! Let me see if I can get a more natural repro for this. Given the usual sequence of steps here, it may not be possible. |
|
Alright, here's a video of a repro that supposedly only does "normal" user actions. As a result of trying to do natural actions, there's a fairly long (roughly 1 minute) near the start of the video as one user waits to see the post show up so they can click on it to open it. pr77874-natural-exact-repro-current-trunk-annotated.mp4AI explanation of similar correct (issue doesn't reproduce) vs incorrect (issue does reproduce) ordering, below: Correct Ordering
Incorrect Ordering
Why It VariesThe bug depends on which async path wins:
The race is among RTC delivery, save persistence, CRDT serialization, fresh editor hydration, and tab unload/disconnect. The user-visible actions can be identical while [END AI TEXT] |
|
Thank you for the reproduction, and interesting! The reproduction seemed a bit odd, as in my knowledge RTC autosaves did not automatically create draft posts on purpose, but it appears #77865 introduced this change which I missed. It seems like that change from |
|
I still haven't had luck with a local reproduction, I believe because there's a tricky step where one user needs to delete the post's content, while the other user adds content and saves, and then that first user needs to exit the post before the save is merged which I haven't been able to manage manually. I think this problem is likely real, but I don't trust the e2e tests. What do you think about removing them? I have a couple of reasons:
I think it would be possible to improve the test result timing by adjusting |
This is part of an AI fuzzing project, where an AI wrote a fuzzer and then triages bugs from the fuzzer and creates fixes. See #77716 for the tracking issue. As of this writing, there have been no known false positives from this project, but there have been some issues, which are documented in #77716. I expect we’ll see false positives at some point (and may even have one that’s been filed in a PR that hasn’t been inspected by a code owner yet).
What?
Here's a video demonstrating the bug described in the title (sorry, there's enough going on that it's not super clear; there are more details below that make it clearer what's going on in the video):
draft-reopens-blank-repro.mp4
Note that the blank draft, if saved, can overwrite existing content.
BEGIN AI GENERATED TEXT
An RTC-enabled editor session can save a real draft whose parent post and revisions contain the expected title and marker content, but a later editor reopen shows a blank "No title" draft. If the user trusts that blank editor state and saves again, the blank state can become an actual overwrite.
The concrete failing shape is:
post_status=draft;post.php?post=<id>&action=editURL shows "No title", "Add title", and an empty canvas.This is an editor rehydration/display path bug, not a save rejection.
Repros Built
Browser Repro
Branch
try/draft-reopens-blank-pradds:The browser repro uses normal editor actions:
admin.createNewPost;The browser repro does not create blocks artificially, mutate
wp.datacontent, stub requests, inject network faults, alter clocks, or directly change the database. The onlywp.datareads/writes in setup are reading the current post ID and disabling editor welcome/fullscreen preferences, matching existing collaboration test conventions.Lower-Level Repro
Branch
try/draft-reopens-blank-pralso adds a polling-manager regression:The focused test reproduces the bad lower-level behavior: a local Yjs update with origin
syncManageris queued while outbound sync is paused, then published after the polling manager detects another collaborator and resumes queues. That is the stale blank bootstrap update that later overwrites the visible editor state.Database/Storage Evidence
On an unfixed trunk-derived run, the browser test failed while SQL showed the saved content was present:
Result shape:
ID=19,post_status=draft, titleSame user saved draft ..., content marker present;ID=21, marker present;ID=22, marker present.The sync room for
postType/post:19contained only blank document bootstrap updates. Decoding those Yjs updates produced:{ "title": "", "status": "auto-draft", "content": "", "blocks": [] }That explains why the DB and revisions were correct while the editor reopened blank.
PHP-Only Repro Status
I did not build a PHP-only repro because the failure requires browser sync-provider origin handling and Yjs room replay. The PHP server is acting as durable storage for sync-room updates; the incorrect write originates in the browser polling provider, then PHP stores and replays it.
Known-Fixes Base Status
Known-fixes base checked:
/Users/danluu/dev/fuzz/gutenberg-fuzz-all-local-known-fixes-clone;try/fuzz-all-local-known-fixes-clone;6a1a8d30794;/Users/danluu/dev/fuzz/gutenberg-draft-reopens-blank-known.I applied the known-fixes clone's tracked dirty patch into the detached verification worktree so the autosaves-controller known fix was included without mutating the caller's dirty worktree.
Known-fixes command:
Result: the control passed, but the same-account saved-draft reopen test failed. The failure screenshot showed a blank "No title" editor with
Draftstatus andRevisions 2. Therefore the known-fixes base still reproduces this bug; none of the known local fixes eliminate it.Browser Video
Local video:
The video is a stitched Playwright trace contact sheet from the failing known-fixes run. It keeps these screens visible at once: saved editor content, same-account stale window, draft list with the visible title, blank reopen, and a running annotation log.
Failure Mechanism
The important sequence is:
auto-draft.LOCAL_SYNC_MANAGER_ORIGIN/syncManagerorigin.POLLING_MANAGER_ORIGIN.wp_sync_storage._crdt_documentalso contains title/content, although it can retain stalestatus: auto-draft.The fix is to prevent sync-manager bootstrap transactions from being published to the collaborative room. Bootstrapping local state from REST/persisted data is not a user edit and should not become a room update. Real editor edits and save markers still use other origins and continue to sync.
Introduction History
This appears to be a composition bug rather than one isolated bad line.
The bad invariant is that
syncManagerbootstrap updates were classified as publishable local collaboration updates. That violates the boundary between "initialize this local document from durable state" and "broadcast a user edit."Distinct From #77865
This is not WordPress/gutenberg#77865.
In #77865, the important failure shape is an auto-draft promotion/discoverability problem: content may only live on an auto-draft/autosave path, so the draft is hard to find or not promoted as expected.
Here:
draft;It is also not WordPress/gutenberg#77669: there is no large update and no "Connection lost" modal in this repro.
Initial Fix Plan
Initial ideas considered:
status: auto-draft;The first three plans either introduce ordering dependencies, special-case post fields too deeply, or treat the symptom after the stale room write already exists. The fourth plan removes the bad state at the source.
Audit: Linus Torvalds
The field-specific plans are too clever. If a local bootstrap transaction is not a user edit, it should not be broadcast. Do not add a pile of special cases for title, blocks, draft status, or autosaves when the origin boundary already exists.
The fix should be small, obvious, and located at the provider boundary where outbound updates are classified.
Audit: Kyle Kingsbury / Jepsen
The failure is an ordering bug across two state stores: durable WordPress posts/revisions and the polling room log. The system let an initialization snapshot enter the operation log, where it could later race against saved durable state.
The fix must preserve the invariant that room updates are user/editor operations, not arbitrary local rehydration snapshots. A test should exercise the queue-paused/queue-resumed interleaving because that is what made a stale solo bootstrap become a later room write.
Audit: Dan Luu
The browser repro must stay realistic. A test that directly mutates
wp.data, creates blocks programmatically, or stubs network responses would be too easy to dismiss as a test artifact. The real report is "I saved a draft, found it, reopened it, and it was blank", so the test has to use the editor the way a user does.The final assertion should be on the visible editor content after reopen, with DB checks proving that a failure is not just a missing save.
Revised Fix Plan
The implemented PR branch follows the audited plan:
syncManagerbootstrap updates must not be published after queue resume.packages/sync/src/providers/http-polling/polling-manager.tssoonDocUpdateignores bothPOLLING_MANAGER_ORIGINandLOCAL_SYNC_MANAGER_ORIGIN.This keeps the provider from writing stale local bootstrap snapshots into
wp_sync_storage, while preserving outbound sync for real editor edits and save-origin updates.False-Positive Analysis
This is not a false save failure: REST and SQL both show the parent draft and revisions contain the marker.
This is not a draft-list discoverability failure: the browser repro finds the draft by title in
edit.php?post_status=draft.This is not a direct-store test artifact: the browser repro inserts title and paragraph through normal UI controls and keyboard input.
This is not a network-fault artifact: no requests are stubbed, aborted, or delayed by the test.
This is not a clock/race injection artifact: no clock manipulation is used. The race exists in ordinary polling queue behavior.
This is not the large-update connection-loss issue: the marker content is tiny and no connection modal appears.
This is not merely #77865: the parent is already a visible draft with saved content, and the failure is replay of stale RTC room state over that saved content during editor reopen.
The residual risk is that existing production rooms can already contain stale bootstrap updates. This fix prevents new stale bootstrap writes. For already-contaminated rooms, a later saved-state compaction or room cleanup strategy may still be needed if the room log is retained indefinitely.
Verification
Pre-fix trunk/known-fixes behavior:
title,content, andblocks.Post-fix branch
try/draft-reopens-blank-pr:Result: 29 passed.
Result: 2 passed.
Additional post-fix SQL check on the successful browser run:
post_status=draft;{"version":1}), not a blankdocumentpayload.END AI GENERATED TEXT