feat(pipeline): add PDF fallback parser and layout reconstruction (#63) #90

Merged
PAMulligan merged 1 commit into main from 63-indesign-pipeline-pdf-fallback-parser-and-layout-reconstruction
May 29, 2026

@PAMulligan
Collaborator

Reconstruct an approximation of the IDML parser's intermediate
representation (IR) from a PDF exported from InDesign, for use when IDML
is unavailable or as a verification source.

- parse-pdf.js orchestrator: parsePdf/parsePdfBuffer -> validated Document IR
  (same shape as the IDML parser; Spread-per-page, Frame/TextFrame/ImageFrame/
  Story, warnings)
- pdf/ modules: lazy pdfjs-dist loader, per-page extraction (text runs, font
  metrics, fill colors, image XObjects, vector flag), positional clustering with
  gutter-aware column detection, font-size -> heading/body/caption style
  synthesis, RGB/gray/CMYK -> hex with nearest-swatch matching against an IDML
  palette, and a zero-dependency PNG encoder for the asset cache
- fidelity warnings for every approximation (10 codes), surfaced in CLI output
  and documented in docs/pipeline/indesign-pdf-fidelity.md
- flavian-parse-pdf CLI (--dpi, --asset-dir, --quiet)
- programmatic PDF fixtures (build-pdf.js); 57 tests incl. text-heavy,
  image-heavy, multi-column, brochure, asset extraction, and an IDML<->PDF
  round-trip within documented tolerances
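
To make the color bullet above concrete, here is a minimal sketch of RGB/gray/CMYK -> hex conversion followed by nearest-swatch matching. Everything in it is an assumption for illustration: the function names, the `{ name, hex }` swatch shape, the `maxDistance` threshold, and the naive profile-free CMYK formula are hypothetical and may differ from what the pdf/ modules actually do.

```javascript
// Hypothetical helpers (illustrative only, not the PR's actual API).

// Clamp a 0..1 channel value and render it as a two-digit hex byte.
function toHexByte(v) {
  const n = Math.round(Math.min(1, Math.max(0, v)) * 255);
  return n.toString(16).padStart(2, "0");
}

// Channels are 0..1 floats, as PDF color operators supply them.
function rgbToHex(r, g, b) {
  return `#${toHexByte(r)}${toHexByte(g)}${toHexByte(b)}`;
}

function grayToHex(g) {
  return rgbToHex(g, g, g);
}

// Naive CMYK -> RGB without an ICC profile (an assumption; a real
// converter may use a color-managed transform instead).
function cmykToHex(c, m, y, k) {
  return rgbToHex((1 - c) * (1 - k), (1 - m) * (1 - k), (1 - y) * (1 - k));
}

// Parse "#rrggbb" into [r, g, b] with 0..255 integer channels.
function hexToRgb(hex) {
  const n = parseInt(hex.slice(1), 16);
  return [(n >> 16) & 255, (n >> 8) & 255, n & 255];
}

// Return the palette swatch closest to `hex` by Euclidean distance in RGB,
// or null when the palette is empty or the best match is farther than
// maxDistance (the threshold value is an assumed default).
function nearestSwatch(hex, palette, maxDistance = 48) {
  const [r, g, b] = hexToRgb(hex);
  let best = null;
  let bestSquared = Infinity;
  for (const swatch of palette) {
    const [sr, sg, sb] = hexToRgb(swatch.hex);
    const d = (r - sr) ** 2 + (g - sg) ** 2 + (b - sb) ** 2;
    if (d < bestSquared) {
      bestSquared = d;
      best = swatch;
    }
  }
  return best !== null && Math.sqrt(bestSquared) <= maxDistance ? best : null;
}
```

Returning null for a distant color, rather than forcing the closest swatch, leaves room for the caller to keep the raw hex value and emit a fidelity warning; whether the parser does this is not stated in the description.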

pdfjs-dist pinned to 4.x with a guarded Promise.withResolvers polyfill so the
package keeps its Node >=20 support (5.x requires Node >=22).
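
A guarded polyfill of this kind could look roughly like the sketch below. `Promise.withResolvers` is the standard ES2024 method; the wrapper name `ensurePromiseWithResolvers` and its boolean return value are hypothetical, and the code in this PR may be structured differently.

```javascript
// Hypothetical guard (illustrative only): install Promise.withResolvers
// when the runtime lacks it, and never override a native implementation.
// Returns true if the polyfill was installed, false if nothing was done.
function ensurePromiseWithResolvers() {
  if (typeof Promise.withResolvers === "function") {
    return false;
  }
  Object.defineProperty(Promise, "withResolvers", {
    value: function withResolvers() {
      let resolve;
      let reject;
      // The executor runs synchronously, so both callbacks are captured
      // before the constructor returns.
      const promise = new Promise((res, rej) => {
        resolve = res;
        reject = rej;
      });
      return { promise, resolve, reject };
    },
    writable: true,
    configurable: true,
    enumerable: false,
  });
  return true;
}

ensurePromiseWithResolvers();
```

The guard has to run before the library is first imported, which fits naturally with a lazy loader: call it at the top of the load function, then perform the dynamic import.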

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PAMulligan linked an issue May 29, 2026 that may be closed by this pull request
PAMulligan merged commit 38f4b64 into main May 29, 2026
4 checks passed
PAMulligan deleted the 63-indesign-pipeline-pdf-fallback-parser-and-layout-reconstruction branch May 29, 2026 01:10

Development

Successfully merging this pull request may close these issues.

[InDesign pipeline] PDF fallback parser and layout reconstruction
