This guide distils the live code path that now powers the OGOS FastAPI module. It expands on the original research notes with the exact responsibilities of each component so new contributors can reason about the pipeline quickly.
`PDFAnalyzer.analyze_pdf` inspects the first page using both PyMuPDF (`fitz`) and pypdf:
- Geometry – Convert media/trim boxes into millimetres for consistent reporting.
- Dieline Detection – `page.get_drawings()` still drives `detected_dielines`, but we additionally call `page.get_cdrawings()` to build the richer `dieline_layers` diagnostics.
- Layer Reporting – Each stroke-only drawing contributes a segment:
  - `layer` – the optional-content group name (falls back to `unnamed`).
  - `stroke_color` – device-space colour tuple rounded to four decimals.
  - `line_width` – millimetres when PyMuPDF reports a width.
  - `bounding_box` – axis-aligned rectangle in millimetres.
- Mismatch Flag – We lowercase/strip layer names and compare them against the canonical alias set (`stans`, `cutcontour`, `kisscut`, `diecut`). A mismatch is raised when more than one canonical alias appears or when aliases mix with unnamed layers. This is the signal consumed by the API headers.
Tip: You can inspect the same structure via `python -m tools.dump_dieline path/to.pdf --json`.
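The layer-reporting and mismatch rules above can be sketched in plain Python. This is a minimal illustration, not the module's actual code: `to_segment` and `layer_mismatch` are hypothetical names, and the input dicts only mimic the keys (`type`, `rect`, `width`, `color`, `layer`) that `page.get_drawings()` is understood to return in recent PyMuPDF releases.

```python
PT_TO_MM = 25.4 / 72.0  # one PDF point is 1/72 inch

CANONICAL_ALIASES = {"stans", "cutcontour", "kisscut", "diecut"}


def to_segment(drawing):
    """Turn one get_drawings()-style dict into a segment (hypothetical helper).

    Returns None for drawings that are not stroke-only.
    """
    if drawing.get("type") != "s":  # "s" = stroke only, "f" = fill, "fs" = both
        return None
    width = drawing.get("width")
    color = drawing.get("color") or ()
    return {
        "layer": drawing.get("layer") or "unnamed",
        "stroke_color": tuple(round(c, 4) for c in color),
        "line_width": round(width * PT_TO_MM, 4) if width is not None else None,
        "bounding_box": tuple(round(v * PT_TO_MM, 4) for v in drawing["rect"]),
    }


def layer_mismatch(segments):
    """Apply the two stated rules: several aliases, or aliases mixed with unnamed."""
    names = {seg["layer"].strip().lower() for seg in segments}
    aliases = names & CANONICAL_ALIASES
    return len(aliases) > 1 or (bool(aliases) and "unnamed" in names)
```

How layers that are named but not in the alias set should be treated is not covered by the rules above, so this sketch simply ignores them.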
Before we touch geometry, the existing `SpotColorRenamer` and `SpotColorHandler` ensure that every Separation/DeviceN spot colour token matches `job_config.spot_color_name` (default `/stans`). They recurse into Form XObjects, `/OCProperties`, and content streams so that the later PyMuPDF pass only has to deal with geometry.
`PyMuPDFCompoundPathTool.process` performs three major steps:
- Collect Streams – Use pypdf to locate page and Form XObject content streams that reference our target spot-colour names. Each candidate stream is opened via `doc.xref_stream(xref)`.
- Extract Vector Sequences – Delegate to `StansCompoundPathConverter` to strip unrelated content and retrieve raw operator sequences belonging to `/stans` strokes.
- Rebuild Primary Stream – Insert the combined compound-path commands into the first stream, update any siblings to remove the redundant stroke blocks, and save with `doc.save(..., deflate=True)`.
The merged path inherits stroke width, graphics-state, and colour commands from the first matching sequence to keep overprint and optional content intact.
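To make the merge step concrete, here is a minimal sketch of how several stroked paths can be folded into one compound path at the operator level. It assumes each sequence is already a list of one-operator-per-line strings; `merge_stroked_paths` is a hypothetical name and does not reflect how `StansCompoundPathConverter` is actually written.

```python
# PDF path-construction operators (moveto, lineto, curves, closepath, rectangle).
PATH_OPS = {"m", "l", "c", "v", "y", "h", "re"}


def merge_stroked_paths(sequences):
    """Combine stroked paths into one compound path (illustrative sketch).

    State setup (colour space, colour, width) is taken from the first
    sequence only, then every subpath is appended and stroked once.
    """
    if not sequences:
        return []

    merged = []
    # Inherit everything that precedes the first path operator of sequence 0.
    for line in sequences[0]:
        if line.strip() and line.split()[-1] in PATH_OPS:
            break
        if line.strip():
            merged.append(line)

    for seq in sequences:
        for line in seq:
            if not line.strip():
                continue
            op = line.split()[-1]
            if op in PATH_OPS:
                merged.append(line)
            elif op == "s":
                # "s" means close-and-stroke; keep the close, drop the stroke.
                merged.append("h")

    merged.append("S")  # a single stroke paints all subpaths together
    return merged
```

Dropping the per-sequence state commands is what makes the result a single path; it also means any width or colour differences in later sequences are discarded, which matches the inheritance rule described above.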
Once the compound path is committed, the tool runs one more pass to ensure the `/stans` spot colour carries the expected tint transform (100 % magenta) and 0.5 pt width. At this point the caller (either the API or CLI) can distribute the PDF.
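As an illustration of what "100 % magenta" means for a tint transform, the sketch below models a PDF Type 2 (exponential interpolation) function, which maps a tint `t` to `C0 + t^N * (C1 - C0)` in the alternate colour space. Whether the tool writes a Type 2 function is an assumption; `make_tint_function` and `matches_expected_stans` are hypothetical names used only for this example.

```python
def make_tint_function(c0, c1, n=1.0):
    """Return a callable modelling a PDF Type 2 function over the domain [0, 1]."""
    def tint(t):
        t = min(max(t, 0.0), 1.0)  # clip the input to the function domain
        return tuple(a + (t ** n) * (b - a) for a, b in zip(c0, c1))
    return tint


# Assumed encoding: tint 0 is no ink, tint 1 is pure magenta in DeviceCMYK.
stans_tint = make_tint_function((0, 0, 0, 0), (0, 1, 0, 0))


def matches_expected_stans(tint, line_width_pt, tolerance=1e-6):
    """Check full tint yields 100 % magenta and the stroke width is 0.5 pt."""
    full = tint(1.0)
    colour_ok = all(abs(a - b) <= tolerance for a, b in zip(full, (0, 1, 0, 0)))
    return colour_ok and abs(line_width_pt - 0.5) <= tolerance
```

A check like `matches_expected_stans(stans_tint, 0.5)` passes, while a 1 pt stroke or a cyan tint would fail.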
The updated API surfaces the new information in three places:
| Location | What you get |
|---|---|
| `analysis.dieline_layers` | Full segment list and `layer_mismatch` boolean. |
| HTTP Headers | `X-Dieline-Layer-Mismatch`, `X-Dieline-Segment-Count`, and the existing winding metadata. |
| CLI | `python -m tools.dump_dieline` for humans, `/process?return_json=true` for automation. |
This ensures both manual QA and workflow automation can flag files where designers split dielines across multiple layers.
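A minimal sketch of how the two dieline headers could be derived from the analysis payload follows. The header names come from the table above; the `segments` key, the `"true"`/`"false"` encoding, and the `dieline_headers` helper are assumptions made for illustration.

```python
def dieline_headers(analysis):
    """Build the dieline response headers from an analysis dict (hypothetical helper)."""
    layers = analysis.get("dieline_layers") or {}
    segments = layers.get("segments") or []  # key name is an assumption
    return {
        # HTTP header values are strings, so the boolean is spelled out.
        "X-Dieline-Layer-Mismatch": "true" if layers.get("layer_mismatch") else "false",
        "X-Dieline-Segment-Count": str(len(segments)),
    }
```

A workflow client can then branch on `X-Dieline-Layer-Mismatch` without parsing the JSON body.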
When adding new capabilities, keep the following guidelines in mind:
- Respect Geometry – All PyMuPDF operations should use the original coordinates. Avoid snapping unless you also report the adjustment in diagnostics.
- Layer Integrity – If you introduce new form XObjects or redraw shapes, copy over the `/OCG` references from the original stream to avoid breaking layer toggles.
- Testing – Extend `tests/test_api_dieline_layers.py` (for API commitments) or build fixture-based tests when touching the compound-path tool.
- CLI Affordances – Prefer adding small flags to `tools.dump_dieline` over building ad-hoc scripts; this keeps QA workflows consistent.
- `app/utils/pymupdf_compound_path_tool.py`
- `app/core/pdf_analyzer.py`
- `app/utils/stans_compound_path_converter.py`
- PyMuPDF documentation: https://pymupdf.readthedocs.io