Releases: QuartzUnit/docpick

v0.1.2

17 Mar 01:42

Changes

  • Add MCP server module (docpick-mcp) with 3 tools: extract_document, ocr_document, list_schemas
  • Add [mcp] optional dependency (pip install "docpick[mcp]")
  • Register on the official MCP registry (io.github.ArkNill/docpick)
  • 217 tests, all passing

v0.1.1

16 Mar 23:57

Initial Public Release

Schema-driven document extraction with local OCR + LLM.

Highlights

  • 4 OCR engines: PaddleOCR (default), EasyOCR, GOT-OCR2.0, VLM, with two-tier auto-selection and fallback
  • 8 built-in schemas: Invoice, Receipt, Bill of Lading, Purchase Order, Korean Tax Invoice, Bank Statement, ID Document, Certificate of Origin
  • 3 LLM providers: vLLM, Ollama, OpenAI-compatible endpoints
  • Validation: check-digit algorithms (Luhn, Verhoeff, ISBN-13, IBAN mod97, ISO 6346, Korean BRN), cross-field rules, cross-document consistency
  • Batch processing: Async with configurable concurrency and progress bar
  • CLI: docpick extract, ocr, validate, batch, schemas
  • 217 unit tests, all passing
  • Zero cloud dependency — runs entirely on your machine
  • Apache 2.0 — no GPL/AGPL dependencies
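
To illustrate the kind of check-digit validation listed above, here is a minimal sketch of the Luhn algorithm in plain Python. It is not docpick's own code: the function name `luhn_is_valid` is hypothetical, and the library's actual API may look different.

```python
def luhn_is_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn check.

    Hypothetical helper for illustration only; not part of docpick's API.
    Non-digit characters (spaces, dashes) are ignored.
    """
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False

    total = 0
    # Walk from the rightmost digit (the check digit) to the left,
    # doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit

    return total % 10 == 0


if __name__ == "__main__":
    print(luhn_is_valid("79927398713"))
    print(luhn_is_valid("79927398710"))
```

A validator like this catches most single-digit OCR misreads in card-style numbers, which is presumably why check-digit rules sit alongside the cross-field rules in the validation step.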

Install

pip install docpick

Python 3.11+ required. See README for details.