This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
The package is installed in editable mode into the system Python. Invoke it via:

    python -m blueview.cli <subcommand> [options]

Subcommands: info, markups, spaces, text, search, render
To install the package in editable mode:

    python -m pip install -e .

No venv is used: the package is installed directly into the system Python (Python313).
One package (blueview/), no frameworks. Each module is a thin layer over the two PDF libraries:
- PyMuPDF (fitz): standard annotation fields, text extraction, page rendering. Used in markups.py, text.py, render.py, document.py.
- pikepdf: low-level PDF object access for Bluebeam's undocumented private data. Used in markups.py (_load_bb_annotation_data) and document.py (_extract_scales).
Bluebeam stores private data in two places within the PDF:
- Per-annotation objects — extra keys on each annotation dict (measurement values, lock state).
- /Root/BB document dictionary: holds /CustomColumns definitions and /MarkupList (an array of per-markup dicts keyed by /NM UUID) carrying status, custom column values, and space assignment.
markups.extract() reads standard fields via PyMuPDF, then calls _load_bb_annotation_data() (pikepdf) to merge in the Bluebeam-private fields, keyed by annotation UUID (/NM).
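
The merge step can be pictured with plain dictionaries. The sketch below is not the code in markups.py: merge_private_fields and the row keys (uuid, subject, status) are hypothetical names chosen for illustration, and letting standard fields win on a key collision is an assumption of the sketch rather than documented behavior.

```python
from typing import Any


def merge_private_fields(
    standard: list[dict[str, Any]],
    private_by_uuid: dict[str, dict[str, Any]],
) -> list[dict[str, Any]]:
    """Attach private fields to standard annotation rows, matched by UUID."""
    merged = []
    for row in standard:
        combined = dict(row)  # copy so the caller's rows are not mutated
        extra = private_by_uuid.get(row.get("uuid"), {})
        for key, value in extra.items():
            # Assumption: a standard field is kept if both sources define it.
            combined.setdefault(key, value)
        merged.append(combined)
    return merged
```

Rows whose UUID has no private entry pass through unchanged, so annotations without Bluebeam data still appear in the output.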
markups.extract() always excludes annotations with author == "AutoCAD SHX Text"; these are geometry-proxy annotations from CAD exports, not real markups. All other filtering (--status, --author, --subject, --page) is opt-in via CLI flags.
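
The split between the fixed exclusion and the opt-in filters can be sketched as follows. This is an illustration only: filter_markups, the row keys, and exact-equality matching are assumptions, and the real CLI flags may match differently (for example case-insensitively or on substrings).

```python
from typing import Any, Optional

EXCLUDED_AUTHOR = "AutoCAD SHX Text"


def filter_markups(
    rows: list[dict[str, Any]],
    *,
    status: Optional[str] = None,
    author: Optional[str] = None,
    subject: Optional[str] = None,
    page: Optional[int] = None,
) -> list[dict[str, Any]]:
    """Drop CAD proxy annotations, then apply whichever filters were given."""
    wanted = {"status": status, "author": author, "subject": subject, "page": page}
    kept = []
    for row in rows:
        # Unconditional: geometry-proxy annotations are never returned.
        if row.get("author") == EXCLUDED_AUTHOR:
            continue
        # Opt-in: a filter left as None does not restrict anything.
        if all(value is None or row.get(key) == value for key, value in wanted.items()):
            kept.append(row)
    return kept
```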
PyMuPDF's page.get_label() returns UTF-16 labels as raw PDF hex strings (<FEFF...>). _util.decode_pdf_label() decodes these transparently and is used in every module that emits a page_label field.
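
A minimal sketch of this kind of decoding, using only the standard library. It is a stand-in for illustration, not the contents of _util.decode_pdf_label(): the function name decode_label_sketch is hypothetical, and returning the input unchanged when it is not a BOM-prefixed hex string is an assumption.

```python
import codecs


def decode_label_sketch(label: str) -> str:
    """Decode a '<FEFF...>' PDF hex string to text; pass other labels through."""
    text = label.strip()
    if not (text.startswith("<") and text.endswith(">")):
        return label  # already a plain label such as "12" or "iv"
    hex_digits = "".join(text[1:-1].split())  # PDF hex strings may contain whitespace
    try:
        raw = bytes.fromhex(hex_digits)
    except ValueError:
        return label  # not valid hex, so leave it untouched
    if raw.startswith(codecs.BOM_UTF16_BE):
        # Skip the two-byte FE FF marker and decode the rest as big-endian UTF-16.
        return raw[2:].decode("utf-16-be", errors="replace")
    return label
```

For example, `decode_label_sketch("<FEFF0041002D0031>")` yields `A-1`, while a plain label such as `12` is returned as is.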
export.py has four writers (to_json, to_csv, to_xlsx, to_table). XLSX output includes styled headers and a frozen first row. All writers accept rows: list[dict] with heterogeneous keys; _unified_fieldnames() collects the union of all keys in insertion order.
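
The key-union idea can be sketched with a dict used as an ordered set. The functions below are hypothetical illustrations rather than the code in export.py, and filling missing cells with an empty string is an assumption.

```python
import csv
import io
from typing import Any


def unified_fieldnames(rows: list[dict[str, Any]]) -> list[str]:
    """Union of all row keys, ordered by first appearance."""
    seen: dict[str, None] = {}
    for row in rows:
        for key in row:
            seen.setdefault(key, None)  # dicts preserve insertion order
    return list(seen)


def rows_to_csv(rows: list[dict[str, Any]]) -> str:
    """Write heterogeneous rows to CSV text, leaving missing cells empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=unified_fieldnames(rows),
        restval="",  # value used when a row lacks a column
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Computing the header once from all rows means a column that first appears in a late row still gets a header, and earlier rows simply leave that cell blank.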