
File upstream LibreOffice bug: nested <w:ins><w:del> tracked changes lost on DOCX import (found by the accept/reject oracle, #345) #346

@stevenobiajulu

Recommendation: file upstream

Our LibreOffice accept/reject oracle (added in #345, [LEAN-HELP-09]) surfaced what looks like a genuine LibreOffice bug: a <w:del> nested inside a <w:ins> (a deletion of inserted text) is silently flattened on DOCX import — both tracked changes are discarded and the inserted-then-deleted text is incorrectly retained as plain content.

This is data loss on import, not an accept-time issue: it happens the moment LibreOffice opens the file, before any accept/reject. A reviewer's "inserted then deleted" history vanishes, and the resulting text differs from what Microsoft Word produces.

LibreOffice uses Bugzilla (bugs.documentfoundation.org), not GitHub, so this issue tracks the action of filing upstream. A ready-to-submit draft is below.

Environment

  • LibreOffice 25.8.7.3 (30742500f2d3eb4366ac312fa33d3dcabdb3eba5), macOS, headless.

Minimal reproduction

Input word/document.xml — paragraph 1 is an insertion (w:ins) whose content is a deletion (w:del) of the text "x"; its paragraph mark is untracked. Paragraph 2 is a plain "keep".

<?xml version="1.0"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:body>
  <w:p><w:pPr/>
    <w:ins w:id="1" w:author="reviewerA" w:date="2024-01-01T00:00:00Z">
      <w:del w:id="2" w:author="reviewerB" w:date="2024-01-02T00:00:00Z">
        <w:r><w:delText>x</w:delText></w:r>
      </w:del>
    </w:ins>
  </w:p>
  <w:p><w:r><w:t>keep</w:t></w:r></w:p>
</w:body></w:document>

Steps (pure round-trip — no accept/reject):

# package the document.xml above into minimal.docx ([Content_Types].xml, _rels/.rels,
# word/_rels/document.xml.rels, word/document.xml), then:
soffice --headless --convert-to "docx:MS Word 2007 XML" --outdir /tmp/out minimal.docx
unzip -p /tmp/out/minimal.docx word/document.xml
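The packaging step in the comment above can be scripted. The following is a minimal sketch, assuming Python 3 with only the standard library; `build_minimal_docx` is a hypothetical helper name, not part of safe-docx, and the three supporting parts are the smallest OPC boilerplate we would expect a reader to accept, not something verified against every LibreOffice version.

```python
import zipfile

# word/document.xml exactly as in the minimal reproduction above.
DOCUMENT_XML = """<?xml version="1.0"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:body>
  <w:p><w:pPr/>
    <w:ins w:id="1" w:author="reviewerA" w:date="2024-01-01T00:00:00Z">
      <w:del w:id="2" w:author="reviewerB" w:date="2024-01-02T00:00:00Z">
        <w:r><w:delText>x</w:delText></w:r>
      </w:del>
    </w:ins>
  </w:p>
  <w:p><w:r><w:t>keep</w:t></w:r></w:p>
</w:body></w:document>
"""

# Assumed minimal package parts (standard OPC content types and relationships).
CONTENT_TYPES = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>
"""

ROOT_RELS = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships>
"""

DOCUMENT_RELS = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships"/>
"""


def build_minimal_docx(path: str) -> None:
    """Write the four parts listed in the steps above into a .docx zip."""
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as z:
        # [Content_Types].xml is written first, as is conventional for OPC packages.
        z.writestr("[Content_Types].xml", CONTENT_TYPES)
        z.writestr("_rels/.rels", ROOT_RELS)
        z.writestr("word/_rels/document.xml.rels", DOCUMENT_RELS)
        z.writestr("word/document.xml", DOCUMENT_XML)
```

Calling `build_minimal_docx("minimal.docx")` should produce a file that can be fed to the `soffice --headless --convert-to` command above.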

Observed vs. expected

Observed (LibreOffice round-trip output, redlines stripped, "x" kept as plain text):

<w:p>…<w:r><w:t>x</w:t></w:r></w:p>
<w:p>…<w:r><w:t>keep</w:t></w:r></w:p>

Expected (Microsoft Word semantics): the <w:ins> / <w:del> redlines are preserved on round-trip. Because "x" was inserted and then deleted, it is not part of the substantive content: Accept-All and Reject-All both remove it, leaving an empty first paragraph. Our TS engine and the Lean model both produce the empty-first-paragraph result.
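To make the observed/expected comparison mechanical, a small check over word/document.xml can count the surviving redline elements and list the visible text per paragraph. This is a sketch only, assuming Python 3 and the standard library; `summarize_redlines` is a hypothetical helper, and `OBSERVED_STAND_IN` is a simplified stand-in for the LibreOffice output (the elided `…` parts are simply omitted), not the literal bytes LibreOffice wrote.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
NS = 'xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"'

# The nested ins[del] input from the minimal reproduction (whitespace collapsed).
NESTED_INPUT = (
    f"<w:document {NS}><w:body>"
    '<w:p><w:pPr/><w:ins w:id="1" w:author="reviewerA" w:date="2024-01-01T00:00:00Z">'
    '<w:del w:id="2" w:author="reviewerB" w:date="2024-01-02T00:00:00Z">'
    "<w:r><w:delText>x</w:delText></w:r></w:del></w:ins></w:p>"
    "<w:p><w:r><w:t>keep</w:t></w:r></w:p>"
    "</w:body></w:document>"
)

# Simplified stand-in for the observed round-trip output (an assumption).
OBSERVED_STAND_IN = (
    f"<w:document {NS}><w:body>"
    "<w:p><w:r><w:t>x</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>keep</w:t></w:r></w:p>"
    "</w:body></w:document>"
)


def summarize_redlines(document_xml):
    """Count w:ins / w:del elements and collect text per paragraph."""
    root = ET.fromstring(document_xml)
    return {
        "ins": len(list(root.iter(W + "ins"))),
        "del": len(list(root.iter(W + "del"))),
        # Visible text only: w:t runs, excluding w:delText.
        "paragraphs": [
            "".join(t.text or "" for t in p.iter(W + "t"))
            for p in root.iter(W + "p")
        ],
        # Text that is still marked as deleted.
        "deleted_text": "".join(t.text or "" for t in root.iter(W + "delText")),
    }


if __name__ == "__main__":
    print(summarize_redlines(NESTED_INPUT))
    print(summarize_redlines(OBSERVED_STAND_IN))
```

On the input, the summary reports one `ins`, one `del`, and an empty first paragraph; on the stand-in for the observed output, both counts drop to zero and "x" shows up as visible text in the first paragraph, which is the loss described here.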

Where we hit it

  • safe-docx differential harness, [LEAN-HELP-09] in packages/docx-core/src/integration/lean-differential-helpers.test.ts (PR #345, "test(docx-core): wire a libreoffice accept/reject oracle voter into the differential harness"). We pin this as a characterized divergence: for the nested-G3 case against LibreOffice we assert only the paragraph count (kept-not-dropped), not the content, precisely because of this bug. The clean single-level cases (a plain w:del accept, a plain w:ins reject) round-trip correctly, so the issue is specific to the nested ins[del] construct.

Action items

  • Search bugs.documentfoundation.org for an existing report. A quick search found related-but-not-duplicate bugs (see Sources) and did not turn up the nested-ins/del-flatten case, but search more thoroughly before filing.
  • File at LibreOffice Bugzilla using the draft below (component: Writer; keywords: filter:docx, dataLoss).
  • Link the upstream bug number back here and into the [LEAN-HELP-09] comment.

Draft upstream report (ready to paste into Bugzilla)

Summary: FILEOPEN DOCX: a <w:del> nested inside a <w:ins> (deletion of inserted text) is dropped on import — both tracked changes lost, inserted-then-deleted text retained as plain content

Steps to Reproduce: Open a .docx whose document.xml body contains <w:ins><w:del><w:r><w:delText>x</w:delText></w:r></w:del></w:ins> followed by a normal paragraph (minimal sample XML above). Then File ▸ Save As ▸ Word 2007 (.docx), reopen, and inspect word/document.xml (or use the headless --convert-to round-trip above).

Current behavior: Both the <w:ins> and <w:del> are gone; "x" appears as ordinary, non-tracked text. No track-changes redline is shown when the file is opened.

Expected behavior: The nested insertion/deletion is preserved as tracked changes. Because "x" was inserted and then deleted, it should not be part of the accepted content — Accept All / Reject All should both remove it (matching Microsoft Word). At minimum, the redline information must not be silently discarded on import.

Version: LibreOffice 25.8.7.3 (macOS). Reproduces headless and in the UI.

Note: Single-level redlines (a plain <w:del> or a plain <w:ins>) import and resolve correctly; only the nested ins-wrapping-del construct triggers the loss.

Sources (related, non-duplicate)
