Add a capability to convert Microsoft Word .docx documents into OpenDocument Text .odt, building on the now-merged @usejunior/docx-core (DOCX model) and @usejunior/odf-core (ODT packaging + document view, landed in #328). This is the natural bridge between the two backends and is directly motivated by Germany's ODF mandate: organizations holding .docx will need faithful .odt equivalents.
Motivation
Germany's IT-Planungsrat ODF mandate creates demand for converting existing .docx corpora to .odt.
We already parse .docx into a structured model (docx-core) and write valid .odt packages (odf-core). A conversion path closes the loop and is a strong SEO/positioning surface ("convert DOCX to ODF", "edit ODT with AI agents").
During the post-merge smoke test for #328 (feat(odf-core): add ODF .odt core library (archive, view, replace)), we converted a real NVCA .docx to .odt via LibreOffice and round-tripped it through odf-core. That proved the shape works, but it relied on an external binary (see approach trade-offs below).
Scope (proposed)
In: .docx (WordprocessingML text documents) → .odt (OpenDocument Text). Body paragraphs, headings, basic run formatting (bold/italic/underline), lists, and tables.
Out (defer): .ods/.odp; tracked changes / comments fidelity; headers/footers/footnotes fidelity; pixel-faithful styling. Conversion is explicitly semantic, not byte- or layout-perfect (mirrors the existing export tooling's "intentionally lossy" stance).
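To make the in-scope subset concrete, here is a minimal sketch of what mapping paragraphs, headings, and basic run formatting to ODF body XML could look like. The `Run` and `Block` shapes, the `blocksToOdfXml` function, and the `T_b`-style names are hypothetical and only for illustration; they are not the docx-core or odf-core APIs. The element names (`text:p`, `text:h`, `text:span`, `text:outline-level`) are standard ODF.

```typescript
// Hypothetical, simplified input shapes; the real docx-core model is not shown in this issue.
interface Run {
  text: string;
  bold?: boolean;
  italic?: boolean;
  underline?: boolean;
}

type Block =
  | { kind: "paragraph"; runs: Run[] }
  | { kind: "heading"; level: number; runs: Run[] };

// Escape the characters that are unsafe in XML text and attribute values.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Derive a deterministic, made-up automatic style name such as "T_bi" from a run's flags.
// Unformatted runs get no style and are emitted as plain text.
function runStyleName(run: Run): string | null {
  const key = (run.bold ? "b" : "") + (run.italic ? "i" : "") + (run.underline ? "u" : "");
  return key === "" ? null : `T_${key}`;
}

function runToXml(run: Run): string {
  const text = escapeXml(run.text);
  const style = runStyleName(run);
  return style === null ? text : `<text:span text:style-name="${style}">${text}</text:span>`;
}

// Map blocks to the children of <office:text>. Repeated spaces, tabs and line breaks
// (which ODF encodes as dedicated elements) are not handled in this sketch.
function blocksToOdfXml(blocks: Block[]): string {
  return blocks
    .map((block) => {
      const inner = block.runs.map(runToXml).join("");
      if (block.kind === "heading") {
        return `<text:h text:outline-level="${block.level}">${inner}</text:h>`;
      }
      return `<text:p>${inner}</text:p>`;
    })
    .join("\n");
}
```

A real implementation would also have to emit a matching `style:style` entry with `style:family="text"` for every span style name it references, and lists and tables would need their own mapping. Deriving style names from the formatting flags keeps the output deterministic, which matters for the snapshot-style tests the native approach is meant to enable.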
Approaches & trade-offs (to settle in design)
Native model-to-model mapping (docx-core DOM → odf-core document model → content.xml + styles.xml).
✅ No external runtime dependency — consistent with the repo convention of a Node/TypeScript-only runtime (no LibreOffice/Aspose/Python at runtime).
✅ Deterministic, testable, embeddable in the MCP server.
❌ Larger effort; formatting/style mapping is broad; initial fidelity will be partial.
Shell out to LibreOffice headless (soffice --convert-to odt).
✅ High fidelity immediately; trivial to implement.
❌ Heavy runtime dependency that violates the Node/TS-only runtime convention; not viable for the local-first MCP distribution. Acceptable only as a dev/CI reference oracle for differential testing, not as the shipped path.
Recommendation to evaluate in design: native mapping for the shipped capability, with LibreOffice used purely as a test oracle (convert the same .docx both ways and diff visible text / structure).
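One possible shape for the "diff visible text" half of the oracle test is sketched below. It assumes both converters' `content.xml` are already available as strings (unpacking the archives is out of scope here); `visibleParagraphs`, `diffVisibleText`, and `Mismatch` are hypothetical names, not existing odf-core functions.

```typescript
// Extract the visible text of every <text:p> and <text:h> in a content.xml string.
// Regex-based on purpose to stay dependency-free; it does not handle paragraphs
// nested inside other paragraphs (for example inside frames or annotations).
function visibleParagraphs(contentXml: string): string[] {
  const prepared = contentXml
    // Self-closing (empty) paragraphs carry no visible text.
    .replace(/<text:(?:p|h)\b[^>]*\/>/g, "")
    // ODF encodes repeated spaces, tabs and line breaks as elements.
    .replace(/<text:(?:s|tab|line-break)\b[^>]*\/>/g, " ");
  const pattern = /<text:(p|h)\b[^>]*>([\s\S]*?)<\/text:\1>/g;
  const result: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(prepared)) !== null) {
    const text = (match[2] ?? "")
      .replace(/<[^>]+>/g, "") // drop inline markup such as <text:span>
      .replace(/&lt;/g, "<")
      .replace(/&gt;/g, ">")
      .replace(/&quot;/g, '"')
      .replace(/&apos;/g, "'")
      .replace(/&amp;/g, "&") // unescape & last to avoid double-unescaping
      .replace(/\s+/g, " ")
      .trim();
    if (text !== "") result.push(text);
  }
  return result;
}

interface Mismatch {
  index: number;
  native: string | null; // null when the native output has no paragraph at this index
  oracle: string | null; // null when the oracle output has no paragraph at this index
}

// Compare the two paragraph lists position by position.
function diffVisibleText(nativeXml: string, oracleXml: string): Mismatch[] {
  const a = visibleParagraphs(nativeXml);
  const b = visibleParagraphs(oracleXml);
  const mismatches: Mismatch[] = [];
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const native = a[i] ?? null;
    const oracle = b[i] ?? null;
    if (native !== oracle) mismatches.push({ index: i, native, oracle });
  }
  return mismatches;
}
```

The positional comparison is deliberately naive: a single paragraph missing on one side shifts every later index, so a real test would probably want a sequence diff and a structure comparison (heading levels, list nesting, table shape) on top of this.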
Suggested process
New capability → start with an OpenSpec change (add-docx-to-odf-conversion) before implementation, per repo convention.
Phase it: (1) text + headings + basic runs; (2) lists + tables; (3) richer styles. Gate each phase on a real-document corpus (e.g. the bundled NVCA / ILPA fixtures) with a LibreOffice-oracle differential test.
[...] private: true until a real publish-readiness gate (release-isolation guard from #326, ci(release): add ODF release-isolation guard and OpenSpec proposal).
Acceptance criteria (Phase 1)
[...] .docx to a valid .odt (opens cleanly in LibreOffice).
[...] .odt satisfies odf-core's packaging rules (mimetype first + uncompressed; validateOdfArchiveSafety passes).
References
@usejunior/docx-core (DOCX model), @usejunior/odf-core (ODT packaging/view, #328, feat(odf-core): add ODF .odt core library (archive, view, replace))
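For the "mimetype first + uncompressed" packaging rule in the acceptance criteria, a standalone check on the raw archive bytes might look like the sketch below. It reads only the first ZIP local file header and is independent of odf-core; `checkMimetypeEntry` is a hypothetical helper, and this is not a description of what `validateOdfArchiveSafety` does internally. The header offsets follow the ZIP local file header layout, and the "no extra field" check reflects the ODF packaging requirement for the mimetype entry.

```typescript
const ODT_MIMETYPE = "application/vnd.oasis.opendocument.text";

// Decode a byte range as ASCII without relying on TextDecoder.
function ascii(bytes: Uint8Array, start: number, length: number): string {
  let out = "";
  const end = Math.min(start + length, bytes.length);
  for (let i = start; i < end; i++) out += String.fromCharCode(bytes[i] ?? 0);
  return out;
}

// Inspect the first local file header of a ZIP archive and report every way
// it deviates from the ODF rule for the mimetype entry. An empty result means OK.
function checkMimetypeEntry(zip: Uint8Array): string[] {
  if (zip.length < 30) return ["archive is too short to contain a local file header"];
  const view = new DataView(zip.buffer, zip.byteOffset, zip.byteLength);
  if (view.getUint32(0, true) !== 0x04034b50) {
    return ["archive does not start with a ZIP local file header"];
  }
  const method = view.getUint16(8, true); // 0 = stored, 8 = deflate
  const size = view.getUint32(22, true); // uncompressed size from the local header
  const nameLength = view.getUint16(26, true);
  const extraLength = view.getUint16(28, true);
  const name = ascii(zip, 30, nameLength);

  const problems: string[] = [];
  if (name !== "mimetype") problems.push(`first entry is "${name}", expected "mimetype"`);
  if (method !== 0) problems.push(`first entry is compressed (method ${method})`);
  if (extraLength !== 0) problems.push("first entry uses an extra field");
  if (name === "mimetype" && method === 0) {
    // Assumes the size is recorded in the local header (no data descriptor).
    const content = ascii(zip, 30 + nameLength + extraLength, size);
    if (content !== ODT_MIMETYPE) problems.push(`unexpected mimetype content "${content}"`);
  }
  return problems;
}
```

In a Phase 1 test this could run against the bytes of the generated .odt before handing the file to odf-core, so that a packaging regression fails with a specific message rather than a generic validation error.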