fix: exclude MoveFrom ghost text from paragraph.raw_text() #879
Merged
Add first-class MoveFrom/MoveTo tracked-change support following the existing Insert/Delete pattern. When text is cut-and-pasted in Word with Track Changes, both moveFrom (ghost) and moveTo (live) contain the same text. raw_text() now skips MoveFrom and includes MoveTo, preventing moved text from appearing twice in output. Also prevent nested MoveFrom subtrees from flattening into parent containers (Insert, Delete, Hyperlink, MoveTo) via ignore_element(). Fixes bokuweb#794
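The nested-subtree handling can be pictured as a depth counter over a stream of pull-parser events. The sketch below is a hypothetical simplification, not the docx-core reader code. `Event`, `read_container_text`, and this `ignore_element` are illustrative stand-ins of our own, and it assumes the container's own start tag (for example `<w:ins>`) has already been consumed by the caller. When the container reader meets a nested `w:moveFrom` start tag, it hands off to `ignore_element`, which swallows events until that subtree closes, so the ghost runs never reach the parent's text.

```rust
// Hypothetical, simplified stand-in for the events an XML pull parser yields.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Event<'a> {
    Start(&'a str),
    End(&'a str),
    Text(&'a str),
}

/// Consume events until the element whose start tag was just read is closed.
/// Depth tracking makes sure nested children are skipped as well.
fn ignore_element<'a, I>(events: &mut I)
where
    I: Iterator<Item = Event<'a>>,
{
    let mut depth = 1usize;
    for ev in events {
        match ev {
            Event::Start(_) => depth += 1,
            Event::End(_) => {
                depth -= 1;
                if depth == 0 {
                    return;
                }
            }
            Event::Text(_) => {}
        }
    }
}

/// Collect the text of one container element (e.g. `w:ins`), skipping any
/// nested `w:moveFrom` subtree instead of flattening its runs into the parent.
fn read_container_text<'a, I>(events: &mut I) -> String
where
    I: Iterator<Item = Event<'a>>,
{
    let mut text = String::new();
    let mut depth = 1usize;
    while let Some(ev) = events.next() {
        match ev {
            // Ghost text: discard the whole subtree.
            Event::Start("w:moveFrom") => ignore_element(events),
            Event::Start(_) => depth += 1,
            Event::Text(t) => text.push_str(t),
            Event::End(_) => {
                depth -= 1;
                if depth == 0 {
                    break; // the container itself has closed
                }
            }
        }
    }
    text
}
```

Fed the events that follow `<w:ins>` for runs "Hello ", a nested `<w:moveFrom>` holding "world", and "!", `read_container_text` returns "Hello !" and stops right after `</w:ins>`.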
Thanks for continuing the work on this issue, Steven, awesome 👍
bokuweb approved these changes on Apr 22, 2026
bokuweb (Owner) left a comment:
LGTM
thanks for your great work
Thanks for the review and merge, @bokuweb. Happy to help with future follow-ups here if useful.
Summary
- Add first-class `MoveFrom`/`MoveTo` tracked-change support, following the existing `Insert`/`Delete` pattern
- `Paragraph::raw_text()` now skips `MoveFrom` (ghost text from the original location) and includes `MoveTo` (live text at the destination), preventing moved text from appearing twice in output
- Prevent nested `MoveFrom` subtrees from flattening into parent containers (`Insert`, `Delete`, `Hyperlink`, `MoveTo`) via `ignore_element()` calls
- Fixes #794. Builds on the approach started in #795 by @fonskip, adding nested-flattening prevention, WASM support, and regression tests.
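As a rough illustration of the deduplication rule, here is a minimal Rust sketch. `ParagraphChild` and `Paragraph` are hypothetical stand-ins that hold plain strings; the real docx-core types carry runs, properties, and history IDs, and the actual `raw_text()` covers more variants than shown.

```rust
// Hypothetical, simplified model of a paragraph's children; the real
// docx-core types (Run, Insert, MoveFrom, MoveTo, ...) carry far more state.
#[derive(Debug, Clone)]
enum ParagraphChild {
    Run(String),
    Insert(Vec<String>),
    MoveFrom(Vec<String>),
    MoveTo(Vec<String>),
}

struct Paragraph {
    children: Vec<ParagraphChild>,
}

impl Paragraph {
    /// Concatenate only the text that is live in the document.
    fn raw_text(&self) -> String {
        self.children
            .iter()
            .map(|child| match child {
                ParagraphChild::Run(text) => text.clone(),
                // Tracked insertions and move destinations are live text.
                ParagraphChild::Insert(runs) | ParagraphChild::MoveTo(runs) => runs.concat(),
                // Move sources are ghost text kept for review, so emit nothing.
                ParagraphChild::MoveFrom(_) => String::new(),
            })
            .collect()
    }
}
```

For children `MoveFrom(["world"])`, `Run("Hello ")`, `MoveTo(["world"])`, this returns "Hello world". If the `MoveFrom` arm emitted its runs instead, the same paragraph would yield "worldHello world".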
Context
When text is cut-and-pasted in Microsoft Word with Track Changes enabled, Word records the operation using `<w:moveFrom>` (original location) and `<w:moveTo>` (destination). Both contain the same text. Previously, `raw_text()` had no awareness of these elements: their inner runs would either fall through to a generic handler or be parsed as regular content, causing moved text to appear twice.

Changes
New files (6)
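| File | Contents |
|---|---|
| `docx-core/src/documents/elements/move_from.rs` | `MoveFrom` struct, `MoveFromChild` enum, `BuildXML`, `Serialize`, `HistoryId` |
| `docx-core/src/documents/elements/move_to.rs` | `MoveTo` struct, `MoveToChild` enum, `BuildXML`, `Serialize`, `HistoryId` |
| `docx-core/src/reader/move_from.rs` | Reader for `<w:moveFrom>` |
| `docx-core/src/reader/move_to.rs` | Reader for `<w:moveTo>` |
| `docx-wasm/src/move_from.rs` | WASM wrapper for `MoveFrom` |
| `docx-wasm/src/move_to.rs` | WASM wrapper for `MoveTo` |

Modified files (11)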
- `xml_element.rs`: `MoveFrom`/`MoveTo` enum variants + `FromStr` mapping
- `elements.rs` (xml_builder): `open_move_from`/`open_move_to` macros
- `elements/mod.rs`: module registration
- `elements/paragraph.rs`: `ParagraphChild` variants, `raw_text()` fix, `Serialize`, `BuildXML`, builder methods
- `reader/mod.rs`: module registration
- `reader/paragraph.rs`: dispatch `MoveFrom`/`MoveTo` to their readers
- `reader/insert.rs`: `ignore_element()` to skip nested `MoveFrom` subtrees
- `reader/delete.rs`: `ignore_element()` to skip nested `MoveFrom` subtrees
- `reader/hyperlink.rs`: `ignore_element()` to skip nested `MoveFrom` subtrees
- `docx-wasm/src/paragraph.rs`: `add_move_from()`/`add_move_to()` methods
- `docx-wasm/src/lib.rs`: module registration

Test plan
- `test_move_from_default`: `MoveFrom` XML round-trip
- `test_move_to_default`: `MoveTo` XML round-trip
- `test_raw_text_move_from_to_dedup`: verifies the output is "Hello world", not "Hello worldworld"
- `test_raw_text_mixed_content_with_move`: mixed normal and moved text
- `test_read_move_from_to`: XML reader parses `moveFrom`/`moveTo` correctly
- `test_read_move_from_nested_in_insert`: ghost text inside `<w:ins>` is skipped
- `test_read_move_from_nested_in_move_to`: ghost text inside `<w:moveTo>` is skipped

PR checklist (per CONTRIBUTING.md)
- `make test`: all 269 unit tests + 20 integration tests pass (`cargo test -- --test-threads=1`)
- `make lint`: clippy already fails on `main` (7 errors); this PR adds 2 more, which follow the same `needless_borrow` pattern already present in `delete.rs` and `insert.rs` (`&self.generate()` is required by the `HistoryId` trait)
- `cargo fmt --check`: clean
- `cargo-insta review`: not needed (no `.snap.new` files generated)
- `docx-wasm pnpm test`: all Rust-side tests pass; the `wasm-pack build` step could not complete (`wasm-pack` requires Rust >= 1.85; the local toolchain is 1.81). The WASM wrappers follow the exact `Insert`/`Delete` pattern

Prior work
This PR supersedes #795 by @fonskip, which introduced the same core approach (`MoveFrom`/`MoveTo` structs, readers, `raw_text()` fix). That WIP PR was a great starting point; this PR builds on it by adding:

- `ignore_element()` calls in the `Insert`, `Delete`, `Hyperlink`, and `MoveTo` readers to prevent `MoveFrom` ghost text from leaking into parent containers

Intentionally deferred
- `moveFromRangeStart`/`End` and `moveToRangeStart`/`End`: bookmark-style markers that safely map to `Unsupported` via the `FromStr` catch-all
- `w:id` pairing preservation: IDs are discarded on read and regenerated on write, matching existing `Insert`/`Delete` behavior
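Deferring the range markers is safe because the element-name parser falls back to `Unsupported` for anything it does not recognize. The sketch below is a hypothetical, trimmed-down `XmlElement` enum, not the crate's actual `XMLElement`. It assumes the parser matches local tag names without the `w:` prefix, and it shows how a catch-all arm lets tags such as `moveFromRangeStart` parse without error.

```rust
use std::convert::Infallible;
use std::str::FromStr;

// Hypothetical, trimmed-down element-name enum; the real docx-core
// XMLElement has many more variants.
#[derive(Debug, PartialEq)]
enum XmlElement {
    Insert,
    Delete,
    MoveFrom,
    MoveTo,
    Unsupported,
}

impl FromStr for XmlElement {
    // Parsing never fails: unknown names become `Unsupported`.
    type Err = Infallible;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(match s {
            "ins" => XmlElement::Insert,
            "del" => XmlElement::Delete,
            "moveFrom" => XmlElement::MoveFrom,
            "moveTo" => XmlElement::MoveTo,
            // Catch-all: includes moveFromRangeStart/End and moveToRangeStart/End.
            _ => XmlElement::Unsupported,
        })
    }
}
```

A reader that meets an `Unsupported` element can then skip it, so documents containing the deferred markers still load.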