PSPDFKit · jdrhyne · Apr 24, 2026 · Apr 23, 2026 · Apr 23, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,15 @@
 # Changelog
 
+## 0.2.2 — 2026-04-23
+
+- Re-run benchmarks with all competitor libraries updated to latest versions
+- Pin exact library versions in benchmark tables for transparency
+- Fix opendataloader-hybrid scoring (it was running without the docling backend server, which produced degraded results)
+- pymupdf4llm updated 0.3.4 → 1.27.2 (major version bump, significant quality improvement)
+- markitdown updated 0.1.4 → 0.1.5 (table extraction restored)
+- opendataloader-pdf updated 1.9.1 → 2.3.0
+- docling updated 2.71.0 → 2.91.0
+
 ## 0.2.1 — 2026-04-22
 
 - Add `--enable-image-export` flag: extracts images to `{output}_resources/` and references them as Markdown image links (off by default)

diff --git a/README.md b/README.md
@@ -14,7 +14,7 @@
 
 Fast, accurate Markdown from PDFs — locally, with no cleanup required. Built for Claude, Codex, RAG pipelines, and document-heavy automation where noisy extraction burns tokens and makes downstream results less reliable.
 
-- **How fast is it?** — 0.007s per page. 87x faster than docling, 10x faster than pymupdf4llm. ([benchmarks](#benchmarks))
+- **How fast is it?** — 0.011s per page. 48x faster than docling, 29x faster than pymupdf4llm. ([benchmarks](#benchmarks))
 - **How accurate is it?** — 0.93 reading order (best in class), 0.89 overall extraction accuracy, 0.82 heading detection. ([benchmarks](#benchmarks))
 - **NEW: Image export** — `--enable-image-export` extracts images alongside Markdown for vision-capable LLMs. ([usage](#image-export))
 - **Where do my PDFs go?** — Nowhere. The CLI runs locally. Your documents are not uploaded to Nutrient. ([trust & licensing](#trust-and-licensing))
@@ -124,56 +124,56 @@ Extracts images from the PDF and saves them to `output_resources/`, referenced a
 
 ## Benchmarks
 
-Benchmark results from 200 PDF documents with hand-annotated Markdown ground truth, evaluated using NID (reading order), TEDS (table structure), and MHS (heading hierarchy) metrics. Benchmarked on `2026-04-22`.
+Benchmark results from 200 PDF documents with hand-annotated Markdown ground truth, evaluated using NID (reading order), TEDS (table structure), and MHS (heading hierarchy) metrics. All competitor libraries are pinned to their latest versions as of `2026-04-23`.
 
 ### Visual Snapshot
 
-![Extraction accuracy](https://raw.githubusercontent.com/PSPDFKit/pdf-to-markdown/main/docs/assets/extraction-accuracy.png?v=20260422)
+![Extraction accuracy](https://raw.githubusercontent.com/PSPDFKit/pdf-to-markdown/main/docs/assets/extraction-accuracy.png?v=20260423)
 
-![Reading order](https://raw.githubusercontent.com/PSPDFKit/pdf-to-markdown/main/docs/assets/reading-order.png?v=20260422)
+![Reading order](https://raw.githubusercontent.com/PSPDFKit/pdf-to-markdown/main/docs/assets/reading-order.png?v=20260423)
 
-![Table structure](https://raw.githubusercontent.com/PSPDFKit/pdf-to-markdown/main/docs/assets/table-structure.png?v=20260422)
+![Table structure](https://raw.githubusercontent.com/PSPDFKit/pdf-to-markdown/main/docs/assets/table-structure.png?v=20260423)
 
-![Heading level](https://raw.githubusercontent.com/PSPDFKit/pdf-to-markdown/main/docs/assets/heading-level.png?v=20260422)
+![Heading level](https://raw.githubusercontent.com/PSPDFKit/pdf-to-markdown/main/docs/assets/heading-level.png?v=20260423)
 
-![Extraction speed](https://raw.githubusercontent.com/PSPDFKit/pdf-to-markdown/main/docs/assets/extraction-speed.png?v=20260422)
+![Extraction speed](https://raw.githubusercontent.com/PSPDFKit/pdf-to-markdown/main/docs/assets/extraction-speed.png?v=20260423)
 
-![Faster with Nutrient](https://raw.githubusercontent.com/PSPDFKit/pdf-to-markdown/main/docs/assets/faster-with-nutrient.png?v=20260422)
+![Faster with Nutrient](https://raw.githubusercontent.com/PSPDFKit/pdf-to-markdown/main/docs/assets/faster-with-nutrient.png?v=20260423)
 
 ### Accuracy
 
-| Solution | Overall | Reading Order (NID) | Table Structure (TEDS) | Heading Level (MHS) |
-| --- | ---: | ---: | ---: | ---: |
-| **Nutrient** | **0.89** | **0.93** | 0.71 | 0.82 |
-| docling | 0.88 | 0.90 | **0.89** | **0.82** |
-| opendataloader | 0.84 | 0.91 | 0.49 | 0.74 |
-| opendataloader-hybrid | 0.83 | 0.92 | 0.43 | 0.73 |
-| pymupdf4llm | 0.74 | 0.89 | 0.40 | 0.43 |
-| markitdown | 0.58 | 0.88 | 0.00 | 0.00 |
-| pypdf | 0.58 | 0.87 | 0.00 | 0.00 |
-| liteparse | 0.57 | 0.86 | 0.00 | 0.00 |
+| Solution | Version | Overall | Reading Order (NID) | Table Structure (TEDS) | Heading Level (MHS) |
+| --- | --- | ---: | ---: | ---: | ---: |
+| **Nutrient** | 1.0.1 | **0.89** | **0.93** | 0.71 | 0.82 |
+| docling | 2.91.0 | 0.88 | 0.90 | **0.89** | **0.82** |
+| opendataloader-hybrid | 2.3.0 | 0.87 | 0.91 | 0.68 | 0.81 |
+| pymupdf4llm | 1.27.2 | 0.83 | 0.89 | 0.54 | 0.77 |
+| opendataloader | 2.3.0 | 0.83 | 0.90 | 0.48 | 0.74 |
+| markitdown | 0.1.5 | 0.59 | 0.84 | 0.27 | 0.00 |
+| pypdf | 6.10.2 | 0.58 | 0.87 | 0.00 | 0.00 |
+| liteparse | 1.2.1 | 0.57 | 0.86 | 0.00 | 0.00 |
 
 ### Speed
 
 | Solution | Seconds per page |
 | --- | ---: |
-| **Nutrient** | **0.007** |
-| pypdf | 0.017 |
-| markitdown | 0.038 |
-| opendataloader-hybrid | 0.048 |
-| pymupdf4llm | 0.071 |
-| opendataloader | 0.079 |
-| docling | 0.610 |
-| liteparse | 1.033 |
+| **Nutrient** | **0.011** |
+| pypdf | 0.019 |
+| opendataloader | 0.023 |
+| markitdown | 0.097 |
+| pymupdf4llm | 0.319 |
+| opendataloader-hybrid | 0.444 |
+| docling | 0.527 |
+| liteparse | 1.081 |
 
 ### Faster with Nutrient
 
-- `147x` faster than `liteparse`
-- `87x` faster than `docling`
-- `11x` faster than `opendataloader`
-- `10x` faster than `pymupdf4llm`
-- `7x` faster than `opendataloader-hybrid`
-- `5x` faster than `markitdown`
+- `98x` faster than `liteparse`
+- `48x` faster than `docling`
+- `40x` faster than `opendataloader-hybrid`
+- `29x` faster than `pymupdf4llm`
+- `9x` faster than `markitdown`
+- `2x` faster than `opendataloader`
 
 For the full comparison table, see [docs/benchmarks.md](docs/benchmarks.md).
 
@@ -191,7 +191,7 @@ See [LICENSE.md](LICENSE.md) for the full terms and [docs/distribution-model.md]
 
 ### What makes this different from other PDF extractors?
 
-Speed and accuracy should not be a tradeoff. Most extractors are either fast but lose structure (markitdown, pymupdf4llm) or accurate but slow (docling). Nutrient extracts at 0.007s per page with the best reading order score (0.93), strong heading and table preservation — less cleanup, fewer wasted tokens, and more reliable downstream results.
+Speed and accuracy should not be a tradeoff. Most extractors are either fast but lose structure (markitdown, pymupdf4llm) or accurate but slow (docling). Nutrient extracts at 0.011s per page with the best reading order score (0.93), strong heading and table preservation — less cleanup, fewer wasted tokens, and more reliable downstream results.
 
 ### Do my documents leave my machine?
 

diff --git a/docs/assets/extraction-accuracy.png b/docs/assets/extraction-accuracy.png
diff --git a/docs/assets/extraction-speed.png b/docs/assets/extraction-speed.png
diff --git a/docs/assets/faster-with-nutrient.png b/docs/assets/faster-with-nutrient.png
diff --git a/docs/assets/heading-level.png b/docs/assets/heading-level.png
diff --git a/docs/assets/reading-order.png b/docs/assets/reading-order.png
diff --git a/docs/assets/table-structure.png b/docs/assets/table-structure.png
diff --git a/docs/benchmarks.md b/docs/benchmarks.md
@@ -2,43 +2,44 @@
 
 Evaluated on 200 PDF documents with hand-annotated Markdown ground truth from the DP-Bench corpus.
 
-- Benchmark date: `2026-04-22`
+- Benchmark date: `2026-04-23`
 - Corpus: 200 documents with ground-truth Markdown annotations (42 with tables, 107 with headings)
+- Hardware: Apple M4 Max
 - Metrics: NID (reading order), TEDS (table structure), MHS (heading hierarchy)
 - All scores normalized to [0, 1] — higher is better
+- All competitor libraries pinned to their latest versions
 
 ## Accuracy Metrics
 
-| Solution | Extraction accuracy | Reading order (NID) | Table structure (TEDS) | Heading level (MHS) |
-| --- | ---: | ---: | ---: | ---: |
-| **Nutrient** | **0.89** | **0.93** | 0.71 | 0.82 |
-| docling | 0.88 | 0.90 | **0.89** | **0.82** |
-| opendataloader | 0.84 | 0.91 | 0.49 | 0.74 |
-| opendataloader-hybrid | 0.83 | 0.92 | 0.43 | 0.73 |
-| pymupdf4llm | 0.74 | 0.89 | 0.40 | 0.43 |
-| markitdown | 0.58 | 0.88 | 0.00 | 0.00 |
-| pypdf | 0.58 | 0.87 | 0.00 | 0.00 |
-| liteparse | 0.57 | 0.86 | 0.00 | 0.00 |
+| Solution | Version | Extraction accuracy | Reading order (NID) | Table structure (TEDS) | Heading level (MHS) |
+| --- | --- | ---: | ---: | ---: | ---: |
+| **Nutrient** | 1.0.1 | **0.89** | **0.93** | 0.71 | 0.82 |
+| docling | 2.91.0 | 0.88 | 0.90 | **0.89** | **0.82** |
+| opendataloader-hybrid | 2.3.0 | 0.87 | 0.91 | 0.68 | 0.81 |
+| pymupdf4llm | 1.27.2 | 0.83 | 0.89 | 0.54 | 0.77 |
+| opendataloader | 2.3.0 | 0.83 | 0.90 | 0.48 | 0.74 |
+| markitdown | 0.1.5 | 0.59 | 0.84 | 0.27 | 0.00 |
+| pypdf | 6.10.2 | 0.58 | 0.87 | 0.00 | 0.00 |
+| liteparse | 1.2.1 | 0.57 | 0.86 | 0.00 | 0.00 |
 
 ## Speed
 
 | Solution | Seconds per page |
 | --- | ---: |
-| **Nutrient** | **0.007** |
-| pypdf | 0.017 |
-| markitdown | 0.038 |
-| opendataloader-hybrid | 0.048 |
-| pymupdf4llm | 0.071 |
-| opendataloader | 0.079 |
-| docling | 0.610 |
-| liteparse | 1.033 |
+| **Nutrient** | **0.011** |
+| pypdf | 0.019 |
+| opendataloader | 0.023 |
+| markitdown | 0.097 |
+| pymupdf4llm | 0.319 |
+| opendataloader-hybrid | 0.444 |
+| docling | 0.527 |
+| liteparse | 1.081 |
 
 ## Relative Speed Callouts
 
-- Nutrient is `147x` faster than `liteparse`
-- Nutrient is `87x` faster than `docling`
-- Nutrient is `11x` faster than `opendataloader`
-- Nutrient is `10x` faster than `pymupdf4llm`
-- Nutrient is `7x` faster than `opendataloader-hybrid`
-- Nutrient is `5x` faster than `markitdown`
-- Nutrient is `2x` faster than `pypdf`
+- Nutrient is `98x` faster than `liteparse`
+- Nutrient is `48x` faster than `docling`
+- Nutrient is `40x` faster than `opendataloader-hybrid`
+- Nutrient is `29x` faster than `pymupdf4llm`
+- Nutrient is `9x` faster than `markitdown`
+- Nutrient is `2x` faster than `opendataloader`
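
The updated relative speed callouts appear to follow from the Speed table: divide each library's seconds per page by Nutrient's and round to the nearest integer. Below is a minimal Python sketch of that arithmetic. It assumes the callouts were derived from the rounded table values (they may instead come from unrounded timings), and the `speedups` helper and variable names are illustrative, not part of the repository.

```python
# Seconds-per-page figures copied from the updated Speed table in this diff.
seconds_per_page = {
    "Nutrient": 0.011,
    "pypdf": 0.019,
    "opendataloader": 0.023,
    "markitdown": 0.097,
    "pymupdf4llm": 0.319,
    "opendataloader-hybrid": 0.444,
    "docling": 0.527,
    "liteparse": 1.081,
}


def speedups(table, baseline="Nutrient"):
    """Return {library: time relative to the baseline, rounded to an integer}."""
    base = table[baseline]
    return {
        name: round(seconds / base)
        for name, seconds in table.items()
        if name != baseline
    }


ratios = speedups(seconds_per_page)

# Print callouts from the largest ratio to the smallest.
for name, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(f"{ratio}x faster than {name}")
```

Under this rounding the six callouts in the diff (98x, 48x, 40x, 29x, 9x, 2x) are reproduced. pypdf also rounds to 2x (0.019 / 0.011 ≈ 1.7), although the updated callout list no longer includes it.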
diff --git a/package.json b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "@pspdfkit/pdf-to-markdown",
-  "version": "0.2.1",
+  "version": "0.2.2",
   "description": "Standalone CLI wrapper for Nutrient's PDF-to-Markdown extractor",
   "bin": {
     "pdf-to-markdown": "bin/pdf-to-markdown"