
Epic: DOCX export/serialization (Markdown, HTML, plain text, PDF) #307

@stevenobiajulu


Summary

Add document export / serialization to the safe-docx suite: render an open .docx into Markdown, HTML, plain text, and (separately) PDF. This completes the read → edit → compare → export loop.

Why

Agents routinely need a document's content in a portable format — to summarize, diff, feed to another model, or hand off to a human. Today the suite can read, grep, edit, compare, and save .docx, but it cannot emit a rendering in another format.

Architecture: one serializer core, several emitters

The OOXML parse layer already exists (packages/docx-core/src/primitives/), and document_view.ts already produces a structured, semantically tagged model: headings with levels, list metadata (list_level, label_type), and inline tags (<b>, <i>, <u>, <a href>, <font>, <highlight>).
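Because the inline layer is already HTML-shaped, translating it to Markdown can be largely mechanical. The sketch below is hypothetical: inlineToMarkdown is not an existing function in the suite. It assumes links arrive as <a href="..."> with a double-quoted URL and that tags are well nested. Underline, font and highlight have no core Markdown syntax, so the sketch keeps their text and drops the tag.

```typescript
// Hypothetical helper, not part of docx-core: maps the HTML-shaped inline tags
// listed in the issue onto Markdown. Assumes double-quoted href attributes and
// well-nested tags; it does not escape Markdown metacharacters in the text.
function inlineToMarkdown(inline: string): string {
  return (
    inline
      // Links first, so bold/italic tags inside the link text are converted afterwards.
      .replace(/<a href="([^"]*)">([\s\S]*?)<\/a>/g, "[$2]($1)")
      // Bold and italic map directly onto Markdown emphasis markers.
      .replace(/<\/?b>/g, "**")
      .replace(/<\/?i>/g, "*")
      // <u>, <font ...> and <highlight ...> have no core Markdown syntax: keep text, drop tag.
      .replace(/<\/?(?:u|font|highlight)\b[^>]*>/g, "")
  );
}

console.log(inlineToMarkdown('See <a href="https://example.com">the <b>docs</b></a> and <u>this</u>.'));
// prints: See [the **docs**](https://example.com) and this.
```

One design consequence: emphasis markers must sit flush against the text. A run like <b>bold </b> would become **bold **, which CommonMark does not treat as strong emphasis, so a real emitter would probably need to move leading and trailing whitespace outside the markers.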

Given that model, Markdown, HTML, and plain text are thin emitters over a shared "structured-export" serializer core and should be built together. Because the inline layer is already HTML-shaped, HTML and Markdown are comparably easy first targets. PDF is a different problem: it needs a layout/render engine, so it is tracked as a separate initiative.
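The "one serializer core, several emitters" split could look roughly like the sketch below. Everything in it is an assumption for illustration: the Block input shape only borrows list_level and label_type from the model described above, and buildTree, serialize, Emitter and the three emitter objects are hypothetical names, not existing docx-core APIs. The core does the shared work once (grouping flat list items into nested lists by list_level, then walking the tree); each emitter only decides how a heading, paragraph, list item or list is spelled in its format.

```typescript
// Hypothetical input shape standing in for the document_view.ts model; only
// list_level and label_type are taken from the issue, the rest is assumed.
type Block =
  | { kind: "heading"; level: number; text: string }
  | { kind: "paragraph"; text: string }
  | { kind: "list_item"; list_level: number; label_type: "bullet" | "number"; text: string };

// Tree built by the shared core: consecutive list items become nested lists.
interface ListItem { text: string; children: List[] }
interface List { ordered: boolean; items: ListItem[] }
type TreeNode = Exclude<Block, { kind: "list_item" }> | { kind: "list"; list: List };

// Per-format hooks. Inline tags (<b>, <i>, ...) arrive untouched in `text`.
interface Emitter {
  heading(level: number, text: string): string;
  paragraph(text: string): string;
  item(text: string, children: string[], ordered: boolean, depth: number, index: number): string;
  list(items: string[], ordered: boolean): string;
  join(blocks: string[]): string;
}

// Core step 1: group flat list items into nested lists by list_level.
function buildTree(blocks: Block[]): TreeNode[] {
  const out: TreeNode[] = [];
  const stack: List[] = []; // stack[d] is the open list at depth d
  for (const b of blocks) {
    if (b.kind !== "list_item") {
      stack.length = 0; // a non-list block closes every open list
      out.push(b);
      continue;
    }
    const ordered = b.label_type === "number";
    const d = Math.min(b.list_level, stack.length); // clamp skipped levels
    if (d < stack.length && stack[d].ordered !== ordered) stack.length = d; // label type changed
    if (d < stack.length) {
      stack.length = d + 1; // close deeper lists, keep appending at depth d
    } else {
      const list: List = { ordered, items: [] };
      if (d === 0) out.push({ kind: "list", list });
      else stack[d - 1].items[stack[d - 1].items.length - 1].children.push(list);
      stack.push(list);
    }
    stack[d].items.push({ text: b.text, children: [] });
  }
  return out;
}

// Core step 2: one tree walk, delegating every node to the chosen emitter.
function renderList(list: List, e: Emitter, depth: number): string {
  const items = list.items.map((it, i) =>
    e.item(it.text, it.children.map((c) => renderList(c, e, depth + 1)), list.ordered, depth, i),
  );
  return e.list(items, list.ordered);
}

function serialize(blocks: Block[], e: Emitter): string {
  return e.join(
    buildTree(blocks).map((n) =>
      n.kind === "heading" ? e.heading(n.level, n.text)
        : n.kind === "paragraph" ? e.paragraph(n.text)
        : renderList(n.list, e, 0),
    ),
  );
}

const marker = (ordered: boolean, i: number) => (ordered ? `${i + 1}.` : "-");

const markdownEmitter: Emitter = {
  heading: (level, text) => `${"#".repeat(level)} ${text}`,
  paragraph: (text) => text,
  // Four spaces per level keeps nested items inside their parent item in CommonMark.
  item: (text, children, ordered, depth, i) =>
    [`${" ".repeat(4 * depth)}${marker(ordered, i)} ${text}`, ...children].join("\n"),
  list: (items) => items.join("\n"),
  join: (blocks) => blocks.join("\n\n"),
};

const htmlEmitter: Emitter = {
  heading: (level, text) => `<h${level}>${text}</h${level}>`,
  paragraph: (text) => `<p>${text}</p>`,
  item: (text, children) => `<li>${text}${children.join("")}</li>`,
  list: (items, ordered) => (ordered ? `<ol>${items.join("")}</ol>` : `<ul>${items.join("")}</ul>`),
  join: (blocks) => blocks.join("\n"),
};

const stripTags = (s: string) => s.replace(/<[^>]*>/g, "");

const textEmitter: Emitter = {
  heading: (_level, text) => stripTags(text),
  paragraph: (text) => stripTags(text),
  item: (text, children, ordered, depth, i) =>
    [`${"  ".repeat(depth)}${marker(ordered, i)} ${stripTags(text)}`, ...children].join("\n"),
  list: (items) => items.join("\n"),
  join: (blocks) => blocks.join("\n\n"),
};
```

Because list nesting lives in the core, the plain-text emitter costs only a few lines, which fits the sequencing below. The sketch passes inline tags through unchanged: that is valid for the HTML emitter (assuming document_view.ts already escapes literal < and & in run text) and tolerated by CommonMark as raw inline HTML, but a production Markdown emitter would likely translate the tags into Markdown syntax instead.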

Sequencing

  1. Structured-export serializer core + Markdown (feat: DOCX → Markdown export #303) and HTML (feat: DOCX → HTML export #304) emitters — first wave; they share the tree walk.
  2. Plain text (feat: DOCX → plain text export #305) — trivial once the core exists.
  3. PDF (feat: DOCX → PDF rendering #306) — separate, larger initiative; design spike first.

