
RTC: fix post draft opening as blank page despite content having been written to the draft#77874

Open
danluu wants to merge 6 commits into
WordPress:trunkfrom
danluu:try/draft-reopens-blank-pr

Conversation

@danluu
Contributor

@danluu danluu commented May 1, 2026

This is part of an AI fuzzing project, where an AI wrote a fuzzer and then triages bugs from the fuzzer and creates fixes. See #77716 for the tracking issue. As of this writing, there have been no known false positives from this project, but there have been some issues, which are documented in #77716. I expect we’ll see false positives at some point (and may even have one that’s been filed in a PR that hasn’t been inspected by a code owner yet).

What?

Here's a video demonstrating the bug described in the title (sorry, there's enough going on that it's not super clear; there are more details below that make it clearer what's going on in the video):

draft-reopens-blank-repro.mp4

Note that the blank draft, if saved, can overwrite existing content.

BEGIN AI GENERATED TEXT

An RTC-enabled editor session can save a real draft whose parent post and revisions contain the expected title and marker content, but a later editor reopen shows a blank "No title" draft. If the user trusts that blank editor state and saves again, the blank state can become an actual overwrite.

The concrete failing shape is:

  • the parent post is post_status=draft;
  • the parent post content contains the expected marker;
  • revisions for the parent also contain the marker;
  • the draft is visible in the Posts list by its real title;
  • reopening the exact post.php?post=<id>&action=edit URL shows "No title", "Add title", and an empty canvas.

This is an editor rehydration/display path bug, not a save rejection.

Repros Built

Browser Repro

Branch try/draft-reopens-blank-pr adds:

WP_ENV_PORT=8911 WP_BASE_URL=http://localhost:8911 npm run test:e2e -- test/e2e/specs/editor/collaboration/collaboration-draft-reopens-blank.spec.ts --project=chromium

The browser repro uses normal editor actions:

  • enable collaboration through the test fixture;
  • create a new post through admin.createNewPost;
  • open the same post in a second browser context using the same authenticated user session;
  • fill the title field;
  • click "Add default block";
  • type the marker paragraph through the keyboard;
  • click Save draft through the editor helper;
  • navigate through the Posts list and reopen the post by URL.

The browser repro does not create blocks artificially, mutate wp.data content, stub requests, inject network faults, alter clocks, or directly change the database. The only wp.data reads/writes in setup are reading the current post ID and disabling editor welcome/fullscreen preferences, matching existing collaboration test conventions.

Lower-Level Repro

Branch try/draft-reopens-blank-pr also adds a polling-manager regression:

npm run test:unit -- packages/sync/src/providers/http-polling/test/polling-manager.test.ts --runInBand

The focused test reproduces the bad lower-level behavior: a local Yjs update with origin syncManager is queued while outbound sync is paused, then published after the polling manager detects another collaborator and resumes queues. That is the stale blank bootstrap update that later overwrites the visible editor state.

Database/Storage Evidence

On an unfixed trunk-derived run, the browser test failed while SQL showed the saved content was present:

SELECT ID, post_type, post_status, post_parent, post_title,
       LENGTH(post_content) AS len,
       post_content LIKE '%same-user-saved-draft-marker-%' AS marker
FROM wp_posts
WHERE ID = 19 OR post_parent = 19
ORDER BY ID;

Result shape:

  • parent ID=19, post_status=draft, title Same user saved draft ..., content marker present;
  • revision ID=21, marker present;
  • revision ID=22, marker present.

The sync room for postType/post:19 contained only blank document bootstrap updates. Decoding those Yjs updates produced:

{
  "title": "",
  "status": "auto-draft",
  "content": "",
  "blocks": []
}

That explains why the DB and revisions were correct while the editor reopened blank.
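A minimal sketch of how a decoded room snapshot of that shape could be flagged as a blank bootstrap. The `DecodedSnapshot` interface and `isBlankBootstrapSnapshot` helper are hypothetical names for illustration, not part of the sync package, and the sample values for the saved draft are placeholders.

```typescript
// Hypothetical shape of a decoded room snapshot, mirroring the JSON above.
interface DecodedSnapshot {
	title: string;
	status: string;
	content: string;
	blocks: unknown[];
}

// Hypothetical helper: true when the snapshot carries no user content and
// still has the auto-draft status of a never-saved post.
function isBlankBootstrapSnapshot( snapshot: DecodedSnapshot ): boolean {
	return (
		snapshot.title === '' &&
		snapshot.content === '' &&
		snapshot.blocks.length === 0 &&
		snapshot.status === 'auto-draft'
	);
}

// The payload decoded from the sync room in this report.
const decodedFromRoom: DecodedSnapshot = {
	title: '',
	status: 'auto-draft',
	content: '',
	blocks: [],
};

// Placeholder values standing in for the saved parent draft.
const savedDraft: DecodedSnapshot = {
	title: 'placeholder title',
	status: 'draft',
	content: 'placeholder marker paragraph',
	blocks: [ { name: 'core/paragraph' } ],
};

console.log( isBlankBootstrapSnapshot( decodedFromRoom ) );
console.log( isBlankBootstrapSnapshot( savedDraft ) );
```

A check like this only describes the evidence; it does not decode Yjs updates itself.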

PHP-Only Repro Status

I did not build a PHP-only repro because the failure requires browser sync-provider origin handling and Yjs room replay. The PHP server is acting as durable storage for sync-room updates; the incorrect write originates in the browser polling provider, then PHP stores and replays it.

Known-Fixes Base Status

Known-fixes base checked:

  • worktree: /Users/danluu/dev/fuzz/gutenberg-fuzz-all-local-known-fixes-clone;
  • branch: try/fuzz-all-local-known-fixes-clone;
  • caller-observed HEAD: 6a1a8d30794;
  • detached verification worktree: /Users/danluu/dev/fuzz/gutenberg-draft-reopens-blank-known.

I applied the known-fixes clone's tracked dirty patch into the detached verification worktree so the autosaves-controller known fix was included without mutating the caller's dirty worktree.

Known-fixes command:

npm run test:e2e -- test/e2e/specs/editor/collaboration/collaboration-same-user-saved-draft-reopen-loss.spec.ts --project=chromium

Result: the control passed, but the same-account saved-draft reopen test failed. The failure screenshot showed a blank "No title" editor with Draft status and Revisions 2. Therefore the known-fixes base still reproduces this bug; none of the known local fixes eliminate it.

Browser Video

Local video:

/Users/danluu/dev/fuzz/gutenberg-draft-reopens-blank-pr/test/e2e/artifacts/draft-reopens-blank-video/draft-reopens-blank-repro.mp4

The video is a stitched Playwright trace contact sheet from the failing known-fixes run. It keeps these screens visible at once: saved editor content, same-account stale window, draft list with the visible title, blank reopen, and a running annotation log.

Failure Mechanism

The important sequence is:

  1. A new post is opened with RTC enabled. There is no persisted CRDT document yet.
  2. The sync manager bootstraps a local Yjs document from the current REST record, which is still a blank auto-draft.
  3. That bootstrap transaction uses the internal LOCAL_SYNC_MANAGER_ORIGIN/syncManager origin.
  4. The HTTP polling provider registered before the bootstrap and queued every local update except updates from its own POLLING_MANAGER_ORIGIN.
  5. While the room queue is paused for a solo editor, the blank bootstrap update sits locally. When the same account opens a second editor window, collaborator detection resumes queues.
  6. The stale blank bootstrap update is published to wp_sync_storage.
  7. The primary editor saves. The parent post and revisions are correct in the database; the persisted _crdt_document also contains title/content, although it can retain stale status: auto-draft.
  8. On reopen, the editor applies the saved persisted CRDT document, then connects to the room and receives the stale blank bootstrap update.
  9. Because the stale update contains Y.Map assignments to blank Y.Text/block structures from a different Yjs client, it can hide the saved content in the merged document.
  10. The editor renders the merged blank state even though the durable post/revisions still contain the user's content.

The fix is to prevent sync-manager bootstrap transactions from being published to the collaborative room. Bootstrapping local state from REST/persisted data is not a user edit and should not become a room update. Real editor edits and save markers still use other origins and continue to sync.
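The sequence above can be sketched with a simplified model of the outbound queue. This is not the real polling manager: `OutboundQueue`, `simulate`, the `'pollingManager'` string, and the `'editor'` origin are assumptions made for illustration. Only the `syncManager` origin name and the pause/resume behavior come from the description above.

```typescript
// Origin markers. 'syncManager' follows the description above; the polling
// value and the 'editor' origin used below are assumptions for this sketch.
const POLLING_MANAGER_ORIGIN = 'pollingManager';
const LOCAL_SYNC_MANAGER_ORIGIN = 'syncManager';

interface LocalUpdate {
	payload: string;
	origin: string;
}

// Simplified stand-in for the provider's outbound queue (not the real class).
class OutboundQueue {
	private readonly ignoredOrigins: ReadonlySet< string >;
	private queued: LocalUpdate[] = [];
	private paused = true; // A solo editor starts with the queue paused.
	readonly published: LocalUpdate[] = [];

	constructor( ignoredOrigins: string[] ) {
		this.ignoredOrigins = new Set( ignoredOrigins );
	}

	// Classify each local update: ignored origins never reach the room.
	onDocUpdate( update: LocalUpdate ): void {
		if ( this.ignoredOrigins.has( update.origin ) ) {
			return;
		}
		this.queued.push( update );
		this.flush();
	}

	// Called when another collaborator is detected.
	resume(): void {
		this.paused = false;
		this.flush();
	}

	private flush(): void {
		if ( this.paused ) {
			return;
		}
		this.published.push( ...this.queued );
		this.queued = [];
	}
}

// Replays steps 2 to 6: bootstrap while paused, then a second window appears.
function simulate( ignoredOrigins: string[] ): string[] {
	const queue = new OutboundQueue( ignoredOrigins );
	queue.onDocUpdate( {
		payload: 'blank-bootstrap',
		origin: LOCAL_SYNC_MANAGER_ORIGIN,
	} );
	queue.resume();
	queue.onDocUpdate( { payload: 'user-edit', origin: 'editor' } );
	return queue.published.map( ( update ) => update.payload );
}

const beforeFix = simulate( [ POLLING_MANAGER_ORIGIN ] );
const afterFix = simulate( [
	POLLING_MANAGER_ORIGIN,
	LOCAL_SYNC_MANAGER_ORIGIN,
] );

console.log( beforeFix ); // [ 'blank-bootstrap', 'user-edit' ]
console.log( afterFix ); // [ 'user-edit' ]
```

In this model the only difference between the two runs is the set of ignored origins, which is the boundary the fix moves.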

Introduction History

This appears to be a composition bug rather than one isolated bad line.

  • CRDT persistence for collaborative editing was introduced in WordPress/gutenberg#72373, including the local persisted/REST bootstrap path.
  • The default HTTP polling provider was introduced in WordPress/gutenberg#74564, including room update queues and filtering only the polling provider's own origin.
  • Later polling changes such as WordPress/gutenberg#76704 made queue pause/resume behavior more explicit. That makes the bug easier to hit: a blank solo bootstrap can be queued and then sent once another editor appears.

The bad invariant is that syncManager bootstrap updates were classified as publishable local collaboration updates. That violates the boundary between "initialize this local document from durable state" and "broadcast a user edit."

Distinct From #77865

This is not WordPress/gutenberg#77865.

In #77865, the important failure shape is an auto-draft promotion/discoverability problem: content may only live on an auto-draft/autosave path, so the draft is hard to find or not promoted as expected.

Here:

  • the parent is already a visible draft;
  • the parent has the expected title/content;
  • revisions also have the expected title/content;
  • the Posts list can find the draft by title;
  • the reopen path renders a blank editor because stale RTC room state is replayed over saved durable content.

It is also not WordPress/gutenberg#77669: there is no large update and no "Connection lost" modal in this repro.

Initial Fix Plan

Initial ideas considered:

  • force the save path to flush the saved CRDT document to the sync room before returning from Save draft;
  • make rehydration prefer the database record whenever the persisted/remote CRDT document contains status: auto-draft;
  • special-case title/content/blocks on reopen so a visible draft cannot be replaced by an empty remote room document;
  • filter bootstrap updates so the room never receives local initialization snapshots.

The first three plans either introduce ordering dependencies, special-case post fields too deeply, or treat the symptom after the stale room write already exists. The fourth plan removes the bad state at the source.

Audit: Linus Torvalds

The field-specific plans are too clever. If a local bootstrap transaction is not a user edit, it should not be broadcast. Do not add a pile of special cases for title, blocks, draft status, or autosaves when the origin boundary already exists.

The fix should be small, obvious, and located at the provider boundary where outbound updates are classified.

Audit: Kyle Kingsbury / Jepsen

The failure is an ordering bug across two state stores: durable WordPress posts/revisions and the polling room log. The system let an initialization snapshot enter the operation log, where it could later race against saved durable state.

The fix must preserve the invariant that room updates are user/editor operations, not arbitrary local rehydration snapshots. A test should exercise the queue-paused/queue-resumed interleaving because that is what made a stale solo bootstrap become a later room write.

Audit: Dan Luu

The browser repro must stay realistic. A test that directly mutates wp.data, creates blocks programmatically, or stubs network responses would be too easy to dismiss as a test artifact. The real report is "I saved a draft, found it, reopened it, and it was blank", so the test has to use the editor the way a user does.

The final assertion should be on the visible editor content after reopen, with DB checks proving that a failure is not just a missing save.

Revised Fix Plan

The implemented PR branch follows the audited plan:

  1. Add a lower-level polling-manager regression showing that syncManager bootstrap updates must not be published after queue resume.
  2. Add a browser Playwright repro using normal editor actions and a second same-account editor window.
  3. Change packages/sync/src/providers/http-polling/polling-manager.ts so onDocUpdate ignores both POLLING_MANAGER_ORIGIN and LOCAL_SYNC_MANAGER_ORIGIN.
  4. Verify unit and browser coverage.

This keeps the provider from writing stale local bootstrap snapshots into wp_sync_storage, while preserving outbound sync for real editor edits and save-origin updates.

False-Positive Analysis

This is not a false save failure: REST and SQL both show the parent draft and revisions contain the marker.

This is not a draft-list discoverability failure: the browser repro finds the draft by title in edit.php?post_status=draft.

This is not a direct-store test artifact: the browser repro inserts title and paragraph through normal UI controls and keyboard input.

This is not a network-fault artifact: no requests are stubbed, aborted, or delayed by the test.

This is not a clock/race injection artifact: no clock manipulation is used. The race exists in ordinary polling queue behavior.

This is not the large-update connection-loss issue: the marker content is tiny and no connection modal appears.

This is not merely #77865: the parent is already a visible draft with saved content, and the failure is replay of stale RTC room state over that saved content during editor reopen.

The residual risk is that existing production rooms can already contain stale bootstrap updates. This fix prevents new stale bootstrap writes. For already-contaminated rooms, a later saved-state compaction or room cleanup strategy may still be needed if the room log is retained indefinitely.

Verification

Pre-fix trunk/known-fixes behavior:

  • known-fixes browser repro: control passed, same-account reopen failed blank;
  • trunk-derived browser repro: control passed, same-account reopen failed blank;
  • SQL showed parent/revisions contained marker while the editor reopened blank;
  • decoded sync room showed blank title, content, and blocks.

Post-fix branch try/draft-reopens-blank-pr:

npm run test:unit -- packages/sync/src/providers/http-polling/test/polling-manager.test.ts --runInBand

Result: 29 passed.

WP_ENV_PORT=8911 WP_BASE_URL=http://localhost:8911 npm run test:e2e -- test/e2e/specs/editor/collaboration/collaboration-draft-reopens-blank.spec.ts --project=chromium

Result: 2 passed.

Additional post-fix SQL check on the successful browser run:

  • latest same-account draft was post_status=draft;
  • title/content marker remained present;
  • the sync storage room for that post contained only state initialization ({"version":1}), not a blank document payload.

END AI GENERATED TEXT

@github-actions

github-actions Bot commented May 1, 2026

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Co-authored-by: danluu <danluu@git.wordpress.org>
Co-authored-by: alecgeatches <alecgeatches@git.wordpress.org>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@danluu danluu force-pushed the try/draft-reopens-blank-pr branch from ab349d1 to a3e4dbc Compare May 1, 2026 06:17
@danluu danluu force-pushed the try/draft-reopens-blank-pr branch from a3e4dbc to 22f5a55 Compare May 1, 2026 15:29
@t-hamano t-hamano added the [Type] Bug and [Feature] Real-time Collaboration labels May 4, 2026
@alecgeatches
Contributor

alecgeatches commented May 8, 2026

@danluu I've spent a while working through this one, and I'm not sure if the situation this protects against is possible without executing code commands and a highly unusual workflow, and even then I can't manually reproduce it. For now, I'll respond to this PR in the same way that I would a human.

Reproduction Workflow

Let's talk about the reproduction steps in collaboration-draft-reopens-blank.spec.ts:

  1. First, a user opens a new post.

  2. Next they need to run on-page code to extract the page ID from the unsaved post:

    ( window as any ).wp.data.select( 'core/editor' ).getCurrentPostId()

    This is necessary because the post hasn't been explicitly saved yet. Contrary to the implication in the reproduction video, the post does not show up in the post list until it has been manually saved, and the post ID is not shown in the URL until then either. The post is essentially invisible to all users, and this workaround extracts the ID of the backing auto-draft before the WordPress UI exposes it.

  3. Next, a second browser uses the temporary post ID to construct a URL and load the same page. At this point the page is still inaccessible from any UI, so this feels contrived but technically possible.

  4. After that, the first user adds content to the page and saves, finally making the page accessible through regular means.

  5. Lastly, the user refreshes the first page, losing all content.

In my local testing, even with the ID extraction and URL construction, I was unable to reproduce the behavior on trunk after a few attempts:

draft-post-id-sync.mov

I suspect this is due to human-speed timings, but I'm not sure.

Testing flakiness

To verify, I tried running this test on trunk:

git fetch https://github.com/danluu/gutenberg.git try/draft-reopens-blank-pr:pr-77874
git checkout pr-77874 -- test/e2e/specs/editor/collaboration/collaboration-draft-reopens-blank.spec.ts test/e2e/specs/editor/collaboration/fixtures/collaboration-utils.ts
npm run test:e2e -- test/e2e/specs/editor/collaboration/collaboration-draft-reopens-blank.spec.ts

On my first attempt, the test passed in trunk, and on the second it failed. I tried a run of 10 tests and got the results:

Run 1:  FAIL
Run 2:  FAIL
Run 3:  PASS
Run 4:  PASS
Run 5:  FAIL
Run 6:  FAIL
Run 7:  PASS
Run 8:  PASS
Run 9:  PASS
Run 10: FAIL

Summary: 5 passed, 5 failed (out of 10)

The test as-is can pass on trunk, so its outcome appears to depend on timing. To be useful for regression testing, it should fail reliably on trunk.
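A small sketch of why a test that passes half the time on unfixed code is a weak regression signal. It tallies the run log above and estimates how often a short series of runs would pass anyway. `summarizeRuns` and `chanceAllRunsPass` are hypothetical helpers, and the estimate assumes runs are independent, which may not hold for timing-dependent tests.

```typescript
// The run log reported above, copied verbatim.
const runLog = `Run 1:  FAIL
Run 2:  FAIL
Run 3:  PASS
Run 4:  PASS
Run 5:  FAIL
Run 6:  FAIL
Run 7:  PASS
Run 8:  PASS
Run 9:  PASS
Run 10: FAIL`;

// Hypothetical helper: count PASS and FAIL lines in a log of this shape.
function summarizeRuns( log: string ): { passed: number; failed: number } {
	let passed = 0;
	let failed = 0;
	for ( const line of log.split( '\n' ) ) {
		const match = /^Run \d+:\s+(PASS|FAIL)$/.exec( line.trim() );
		if ( ! match ) {
			continue;
		}
		if ( match[ 1 ] === 'PASS' ) {
			passed++;
		} else {
			failed++;
		}
	}
	return { passed, failed };
}

// Probability that `runs` independent runs all pass on unfixed code,
// given an observed per-run pass rate (independence is an assumption).
function chanceAllRunsPass( passRate: number, runs: number ): number {
	return Math.pow( passRate, runs );
}

const summary = summarizeRuns( runLog );
const passRate = summary.passed / ( summary.passed + summary.failed );

console.log( summary ); // { passed: 5, failed: 5 }
console.log( chanceAllRunsPass( passRate, 3 ) ); // 0.125
```

Under that assumption, roughly one in eight three-run CI attempts would stay green even with the bug present.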

Summary

I think this may be a valid bug, but as-is the reproduction steps and code here don't do a good job of showing it. I'd like to address these issues:

  1. Is it possible to get into this state without the manual construction of a draft URL using wp.data.select( 'core/editor' ).getCurrentPostId()? The fix also affects a second LOCAL_SYNC_MANAGER_ORIGIN transaction for invalidated keys. Maybe that path can result in a less unusual and more reliable setup?

  2. Is this manually reproducible? It doesn't need to be, but it's helpful.

  3. Can we ensure the test reliably tests the fix?

Thank you!

- if ( POLLING_MANAGER_ORIGIN === origin ) {
+ if (
+ 	POLLING_MANAGER_ORIGIN === origin ||
+ 	LOCAL_SYNC_MANAGER_ORIGIN === origin

If this PR is merged, can we also add a note around the LOCAL_SYNC_MANAGER_ORIGIN constant? Currently there's an implicit need to pair any LOCAL_SYNC_MANAGER_ORIGIN transaction with a persistCRDTDoc(), or else the changes made aren't persisted to the room or entity. Both current places using the origin already persist the doc (like below), but it would be good to document that these two functions need to be bundled together.

targetDoc.transact( () => {
	applyChangesToCRDTDoc( targetDoc, changes );
	handlers.persistCRDTDoc();
}, LOCAL_SYNC_MANAGER_ORIGIN );
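One way to make that pairing explicit, as a sketch only: a small wrapper that always runs the change and the persist call inside the same transaction. `transactLocallyAndPersist`, `TransactableDoc`, and `PersistHandlers` are hypothetical names, and the interfaces are minimal stand-ins rather than the real Yjs or sync-manager types.

```typescript
// Value taken from the PR description; treat it as an assumption here.
const LOCAL_SYNC_MANAGER_ORIGIN = 'syncManager';

// Minimal stand-ins (assumptions) for the document and handler types.
interface TransactableDoc {
	transact( fn: () => void, origin?: unknown ): void;
}
interface PersistHandlers {
	persistCRDTDoc(): void;
}

// Hypothetical wrapper: a LOCAL_SYNC_MANAGER_ORIGIN transaction cannot be
// issued through it without also persisting the document.
function transactLocallyAndPersist(
	doc: TransactableDoc,
	handlers: PersistHandlers,
	applyChanges: () => void
): void {
	doc.transact( () => {
		applyChanges();
		handlers.persistCRDTDoc();
	}, LOCAL_SYNC_MANAGER_ORIGIN );
}

// Fake doc and handlers that record what happened, for demonstration.
const calls: string[] = [];
let seenOrigin: unknown;

const fakeDoc: TransactableDoc = {
	transact( fn, origin ) {
		seenOrigin = origin;
		fn();
	},
};
const fakeHandlers: PersistHandlers = {
	persistCRDTDoc() {
		calls.push( 'persist' );
	},
};

transactLocallyAndPersist( fakeDoc, fakeHandlers, () => {
	calls.push( 'apply' );
} );

console.log( calls ); // [ 'apply', 'persist' ]
console.log( seenOrigin ); // syncManager
```

A doc comment on the constant would achieve the same goal with less code; the wrapper just turns the convention into something a reviewer can grep for.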

@danluu
Contributor Author

danluu commented May 15, 2026

Sorry I didn't respond to this earlier. I must've marked the email for this as read while scrolling and not noticed it! Let me see if I can get a more natural repro for this. Given the usual sequence of steps here, it may not be possible.

@danluu
Contributor Author

danluu commented May 15, 2026

Alright, here's a video of a repro that supposedly only does "normal" user actions. As a result of trying to do natural actions, there's a fairly long pause (roughly 1 minute) near the start of the video while one user waits for the post to show up so they can click on it to open it.

pr77874-natural-exact-repro-current-trunk-annotated.mp4

AI explanation of similar correct (issue doesn't reproduce) vs incorrect (issue does reproduce) ordering, below:

Correct Ordering

  1. User B opens the title-only draft and clears the title, creating stale blank local RTC state.
  2. User A adds body content and saves.
  3. User A’s saved title/body state is persisted to REST and _crdt_document.
  4. Before User B closes, User B receives/reconciles User A’s saved RTC state.
  5. User B’s local doc is no longer blank/stale when it disconnects.
  6. Fresh reopen hydrates from saved REST/persisted CRDT, and any RTC state agrees with it.
  7. Result: editor shows saved title and body.

Incorrect Ordering

  1. User B opens the title-only draft and clears the title.
  2. Save is disabled in User B’s UI, but the blank edit still enters User B’s local RTC/Yjs document.
  3. User A adds body content and saves.
  4. REST, revisions, and _crdt_document get the correct saved marker.
  5. User B closes before receiving or reconciling User A’s good saved state.
  6. The stale blank RTC state remains deliverable through provider/session cleanup timing.
  7. A fresh editor opens and connects to RTC while hydration/reconciliation is still not a hard authority barrier.
  8. The fresh editor applies or observes the stale blank RTC state before the saved persisted state fully wins.
  9. Result: editor renders No title and empty body even though REST/revisions/CRDT contain the saved marker.

Why It Varies

The bug depends on which async path wins:

  • If saved state reaches the stale tab before close, the bug self-heals.
  • If the stale tab closes while still blank, the next fresh load can be poisoned.
  • If fresh load hydrates persisted CRDT/REST and treats that as authoritative before stale RTC arrives, it opens correctly.
  • If stale RTC arrives during the hydration window and is treated as current collaboration state, it opens blank.

The race is among RTC delivery, save persistence, CRDT serialization, fresh editor hydration, and tab unload/disconnect. The user-visible actions can be identical while those internal orderings differ.
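The four cases above can be condensed into a toy model. It is a sketch of the explanation, not of the editor code: the `RaceEvent` names and the `reopensBlank` function are hypothetical, and the model assumes the blank result needs both the stale tab closing before reconciliation and the stale RTC state reaching the fresh load before the saved state is treated as authoritative.

```typescript
// Hypothetical event names for the internal steps described above.
type RaceEvent =
	| 'staleTabReceivesSavedState'
	| 'staleTabCloses'
	| 'freshLoadTreatsSavedStateAsAuthoritative'
	| 'staleRtcReachesFreshLoad';

// Toy model: returns true when the given event order ends in a blank reopen.
function reopensBlank( order: RaceEvent[] ): boolean {
	// Position of an event in the order; events that never happen sort last.
	const at = ( event: RaceEvent ): number => {
		const index = order.indexOf( event );
		return index === -1 ? Infinity : index;
	};

	const closedWhileStale =
		at( 'staleTabCloses' ) < at( 'staleTabReceivesSavedState' );
	const staleStateWinsHydration =
		at( 'staleRtcReachesFreshLoad' ) <
		at( 'freshLoadTreatsSavedStateAsAuthoritative' );

	return closedWhileStale && staleStateWinsHydration;
}

// Correct ordering: the stale tab reconciles before it closes.
const correctOrdering: RaceEvent[] = [
	'staleTabReceivesSavedState',
	'staleTabCloses',
	'freshLoadTreatsSavedStateAsAuthoritative',
];

// Incorrect ordering: the stale tab closes blank and its state arrives
// inside the hydration window of the fresh load.
const incorrectOrdering: RaceEvent[] = [
	'staleTabCloses',
	'staleRtcReachesFreshLoad',
	'freshLoadTreatsSavedStateAsAuthoritative',
];

console.log( reopensBlank( correctOrdering ) ); // false
console.log( reopensBlank( incorrectOrdering ) ); // true
```

The same user actions map to either list, which is consistent with the intermittent results reported for the e2e test.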

[END AI TEXT]

@alecgeatches
Contributor

alecgeatches commented May 18, 2026

Thank you for the reproduction, and interesting! The reproduction seemed a bit odd because, as far as I knew, RTC autosaves intentionally did not create draft posts automatically, but it appears #77865 introduced this change, which I missed. It seems like that change from trunk hasn't been merged into this branch yet. I'll try that and see if tests pass.

@alecgeatches
Contributor

alecgeatches commented May 18, 2026

I still haven't had luck with a local reproduction. I believe that's because of a tricky step: one user needs to delete the post's content while the other user adds content and saves, and then the first user needs to exit the post before the save is merged. I haven't been able to manage that manually.

I think this problem is likely real, but I don't trust the e2e tests. What do you think about removing them? I have a couple of reasons:

  1. I tried recreating the "Testing flakiness" steps from above against trunk again, and in my testing these tests do not reliably fail on trunk:

    # Get changed files
    $ git checkout pr-77874 -- test/e2e/specs/editor/collaboration/collaboration-autosave-stale-blank-reopen.spec.ts test/e2e/specs/editor/collaboration/collaboration-draft-reopens-blank.spec.ts test/e2e/specs/editor/collaboration/fixtures/collaboration-utils.ts
    
    # Run collaboration-draft-reopens-blank.spec.ts 5 times
    for i in $(seq 1 5); do npm run test:e2e -- test/e2e/specs/editor/collaboration/collaboration-draft-reopens-blank.spec.ts > /dev/null 2>&1 && echo "Run $i: PASS" || echo "Run $i: FAIL"; done
    Run 1: FAIL
    Run 2: PASS
    Run 3: PASS
    Run 4: PASS
    Run 5: PASS
    
    # Run collaboration-autosave-stale-blank-reopen.spec.ts 5 times
    for i in $(seq 1 5); do npm run test:e2e -- test/e2e/specs/editor/collaboration/collaboration-autosave-stale-blank-reopen.spec.ts > /dev/null 2>&1 && echo "Run $i: PASS" || echo "Run $i: FAIL"; done
    Run 1: PASS
    Run 2: PASS
    Run 3: PASS
    Run 4: PASS
    Run 5: FAIL

    Both pass 80% of the time in my small sample size on trunk.

  2. Both of these tests have pretty large timeouts (4 minutes and 2 minutes), which is a large overhead for CI. For collaboration-autosave-stale-blank-reopen.spec.ts a pass still takes a couple of minutes. This could be acceptable, but given the regression testing results I don't think it's worth it. I'm sure the long runtime is also related to my specific ask for a reproduction without skipping steps, so that's probably my fault. I think it's just too long and not reproducible enough in this case.

I think it would be possible to improve the test result timing by adjusting AUTOSAVE_INTERVAL lower for those particular tests, but given that the tests don't appear to be good regression indicators on trunk it's probably better to leave them out. I say we just merge the code change and polling-manager unit test, unless you think we should try to improve the e2e test results. What do you think?


Labels

[Feature] Real-time Collaboration, [Package] Sync, [Type] Bug


3 participants