fix: exclude MoveFrom ghost text from paragraph.raw_text() #879

Merged
bokuweb merged 1 commit into bokuweb:main from stevenobiajulu:fix/794-move-from-to-dedup
Apr 22, 2026

stevenobiajulu (Contributor) commented Mar 3, 2026

Summary

  • Add first-class MoveFrom/MoveTo tracked-change support, following the existing Insert/Delete pattern
  • Core fix: Paragraph::raw_text() now skips MoveFrom (ghost text from original location) and includes MoveTo (live text at destination), preventing moved text from appearing twice in output
  • Prevent nested MoveFrom subtrees from flattening into parent containers (Insert, Delete, Hyperlink, MoveTo) via ignore_element() calls

Fixes #794 — builds on the approach started in #795 by @fonskip, adding nested-flattening prevention, WASM support, and regression tests.

Context

When text is cut-and-pasted in Microsoft Word with Track Changes enabled, Word records the operation using <w:moveFrom> (original location) and <w:moveTo> (destination). Both contain the same text. Previously, raw_text() had no awareness of these elements — their inner runs would either fall through to a generic handler or be parsed as regular content, causing moved text to appear twice.
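
The deduplication rule described above can be sketched in a few lines. This is a minimal illustration only: the `ParagraphChild` enum below is a simplified stand-in invented for this example (runs are reduced to plain strings), not the actual docx-core type, and the real `raw_text()` walks richer run and text structures.

```rust
// Simplified stand-in types for illustration; NOT the docx-core definitions.
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum ParagraphChild {
    /// Ordinary run text.
    Run(String),
    /// Ghost text left at the original location of a tracked move.
    MoveFrom(Vec<String>),
    /// Live text at the destination of a tracked move.
    MoveTo(Vec<String>),
}

/// Concatenate visible text, skipping MoveFrom and including MoveTo.
fn raw_text(children: &[ParagraphChild]) -> String {
    let mut out = String::new();
    for child in children {
        match child {
            ParagraphChild::Run(text) => out.push_str(text),
            ParagraphChild::MoveTo(runs) => {
                for text in runs {
                    out.push_str(text);
                }
            }
            // Ghost text: contributes nothing to the output.
            ParagraphChild::MoveFrom(_) => {}
        }
    }
    out
}
```

With a paragraph holding a normal run `"Hello "`, a MoveFrom containing `"world"`, and a MoveTo containing `"world"`, this sketch yields `"Hello world"` instead of `"Hello worldworld"`, which mirrors the scenario named in the test plan.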

Changes

New files (6)

| File | Purpose |
| --- | --- |
| docx-core/src/documents/elements/move_from.rs | MoveFrom struct, MoveFromChild enum, BuildXML, Serialize, HistoryId |
| docx-core/src/documents/elements/move_to.rs | MoveTo struct, MoveToChild enum, BuildXML, Serialize, HistoryId |
| docx-core/src/reader/move_from.rs | XML reader for <w:moveFrom> |
| docx-core/src/reader/move_to.rs | XML reader for <w:moveTo> |
| docx-wasm/src/move_from.rs | WASM wrapper |
| docx-wasm/src/move_to.rs | WASM wrapper |

Modified files (11)

  • xml_element.rs: MoveFrom/MoveTo enum variants + FromStr mappings
  • elements.rs (xml_builder): open_move_from/open_move_to macros
  • elements/mod.rs: module registration
  • elements/paragraph.rs: ParagraphChild variants, raw_text() fix, Serialize, BuildXML, builder methods
  • reader/mod.rs: module registration
  • reader/paragraph.rs: dispatch MoveFrom/MoveTo to readers
  • reader/insert.rs: ignore_element() to skip nested MoveFrom subtrees
  • reader/delete.rs: ignore_element() to skip nested MoveFrom subtrees
  • reader/hyperlink.rs: ignore_element() to skip nested MoveFrom subtrees
  • docx-wasm/src/paragraph.rs: add_move_from()/add_move_to() methods
  • docx-wasm/src/lib.rs: module registration

Test plan

  • test_move_from_default — MoveFrom XML round-trip
  • test_move_to_default — MoveTo XML round-trip
  • test_raw_text_move_from_to_dedup — verifies "Hello world" not "Hello worldworld"
  • test_raw_text_mixed_content_with_move — mixed normal + moved text
  • test_read_move_from_to — XML reader parses moveFrom/moveTo correctly
  • test_read_move_from_nested_in_insert — ghost text inside <w:ins> is skipped
  • test_read_move_from_nested_in_move_to — ghost text inside <w:moveTo> is skipped

PR checklist (per CONTRIBUTING.md)

  • make test — all 269 unit tests + 20 integration tests pass (cargo test -- --test-threads=1)
  • make lint — pre-existing clippy failures on main (7 errors); this PR adds 2 that follow the identical needless_borrow pattern in delete.rs and insert.rs (&self.generate() required by HistoryId trait)
  • cargo fmt --check — clean
  • cargo-insta review — not needed (no .snap.new files generated)
  • docx-wasm pnpm test — all Rust-side tests pass; wasm-pack build step could not complete (wasm-pack requires Rust >= 1.85; local toolchain is 1.81). WASM wrappers follow the exact Insert/Delete pattern

Prior work

This PR supersedes #795 by @fonskip, which introduced the same core approach (MoveFrom/MoveTo structs, readers, raw_text() fix). That WIP PR was a great starting point — this PR builds on it by adding:

  • Nested flattening prevention: ignore_element() calls in Insert, Delete, Hyperlink, and MoveTo readers to prevent MoveFrom ghost text from leaking into parent containers
  • WASM wrappers for JavaScript interop
  • Regression tests for nested MoveFrom scenarios
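
The nested-flattening prevention can be pictured as depth-counted subtree skipping. The sketch below is an assumption-laden illustration: the `Event` enum, `skip_subtree`, and `collect_text` are hypothetical names made up for this example, and the signature of the real `ignore_element()` helper is not shown in this PR description, so nothing here should be read as its actual API.

```rust
// Hypothetical, simplified XML event stream; not the reader types used by docx-core.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Start(&'static str),
    Text(&'static str),
    End(&'static str),
}

/// Consume events until the element that was just opened is closed again.
/// `depth` starts at 1 because the opening tag has already been read.
fn skip_subtree<I: Iterator<Item = Event>>(events: &mut I) {
    let mut depth = 1usize;
    for event in events {
        match event {
            Event::Start(_) => depth += 1,
            Event::End(_) => {
                depth -= 1;
                if depth == 0 {
                    return;
                }
            }
            Event::Text(_) => {}
        }
    }
}

/// Collect the text of a container while dropping any nested moveFrom subtree,
/// so ghost text cannot leak into the parent.
fn collect_text(events: Vec<Event>) -> String {
    let mut iter = events.into_iter();
    let mut out = String::new();
    while let Some(event) = iter.next() {
        match event {
            Event::Start("w:moveFrom") => skip_subtree(&mut iter),
            Event::Text(text) => out.push_str(text),
            _ => {}
        }
    }
    out
}
```

Without the skip, a parent reader that simply gathers every run it encounters would pick up the runs inside the nested moveFrom as its own children, which is the leak the regression tests guard against.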

Intentionally deferred

  • Move range markers (moveFromRangeStart/End, moveToRangeStart/End): bookmark-style markers that map to Unsupported safely via the FromStr catch-all
  • w:id pairing preservation: IDs are discarded on read and regenerated on write, matching existing Insert/Delete behavior
  • Table-cell-level moves: paragraph-level handling covers the primary use case
  • MoveFrom/MoveTo inside StructuredDataTag: same flattening risk but very rare
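
The first deferred item relies on a catch-all arm in the element-name parser. A minimal sketch of that idea follows, assuming a simplified `XmlElement` enum and unprefixed tag names chosen for this example; the real enum in xml_element.rs has many more variants and may match names differently.

```rust
use std::str::FromStr;

// Simplified stand-in for the element enum; names and variants are assumptions.
#[derive(Debug, PartialEq)]
enum XmlElement {
    MoveFrom,
    MoveTo,
    Unsupported,
}

impl FromStr for XmlElement {
    type Err = std::convert::Infallible;

    fn from_str(name: &str) -> Result<Self, Self::Err> {
        Ok(match name {
            "moveFrom" => XmlElement::MoveFrom,
            "moveTo" => XmlElement::MoveTo,
            // Catch-all: range markers such as moveFromRangeStart land here
            // and are treated as unsupported rather than causing an error.
            _ => XmlElement::Unsupported,
        })
    }
}
```

Because the match is on the exact name, `moveFromRangeStart` does not accidentally parse as `MoveFrom`; it falls through to `Unsupported`, which is why deferring the range markers is safe in this model.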

Add first-class MoveFrom/MoveTo tracked-change support following the
existing Insert/Delete pattern. When text is cut-and-pasted in Word
with Track Changes, both moveFrom (ghost) and moveTo (live) contain
the same text. raw_text() now skips MoveFrom and includes MoveTo,
preventing moved text from appearing twice in output.

Also prevent nested MoveFrom subtrees from flattening into parent
containers (Insert, Delete, Hyperlink, MoveTo) via ignore_element().

Fixes bokuweb#794

fonskip commented Mar 3, 2026

Thanks for continuing the work on this issue Steven, awesome 👍

stevenobiajulu (Contributor, Author) commented

Thanks @fonskip! Your PR was a huge head start — the core MoveFrom/MoveTo structs and raw_text() fix came straight from #795.

bokuweb (Owner) left a comment

LGTM

thanks for your great work

bokuweb merged commit 2d55d69 into bokuweb:main on Apr 22, 2026
4 checks passed

stevenobiajulu (Contributor, Author) commented

Thanks for the review and merge, @bokuweb. Happy to help with future follow-ups here if useful.

stevenobiajulu deleted the fix/794-move-from-to-dedup branch on April 22, 2026 02:40

Development

Successfully merging this pull request may close these issues.

Moved text in tracked changes is duplicated when using paragraph.raw_text()

3 participants