Releases: QuartzUnit/docpick
v0.1.2
v0.1.1
Initial Public Release
Schema-driven document extraction with local OCR + LLM.
Highlights
- 4 OCR engines: PaddleOCR (default), EasyOCR, GOT-OCR2.0, VLM — with 2-Tier auto-selection and fallback
- 8 built-in schemas: Invoice, Receipt, Bill of Lading, Purchase Order, Korean Tax Invoice, Bank Statement, ID Document, Certificate of Origin
- 3 LLM providers: vLLM, Ollama, OpenAI-compatible endpoints
- Validation: Checkdigit algorithms (Luhn, Verhoeff, ISBN-13, IBAN mod97, ISO 6346, Korean BRN), cross-field rules, cross-document consistency
- Batch processing: Async with configurable concurrency and progress bar
- CLI: docpick extract, ocr, validate, batch, schemas
- 217 unit tests, all passing
- Zero cloud dependency — runs entirely on your machine
- Apache 2.0 — no GPL/AGPL dependencies
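To illustrate the kind of check-digit validation listed above, here is a minimal sketch of two of the named algorithms, Luhn and IBAN mod 97. The function names `luhn_valid` and `iban_valid` are hypothetical and are not taken from docpick's API; the package's own implementation may differ.

```python
def luhn_valid(number: str) -> bool:
    """Luhn check: hypothetical helper, not docpick's API."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """IBAN mod 97 check: hypothetical helper, not docpick's API.

    Only the checksum is verified; country-specific lengths are not.
    """
    s = iban.replace(" ", "").upper()
    if len(s) < 5 or not (s.isascii() and s.isalnum()):
        return False
    # Move the country code and check digits to the end.
    rearranged = s[4:] + s[:4]
    # Map letters to numbers (A=10 ... Z=35); digits map to themselves.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


if __name__ == "__main__":
    print(luhn_valid("79927398713"))
    print(iban_valid("GB82 WEST 1234 5698 7654 32"))
```

Both functions return a plain boolean, so a field-level validator could call them on extracted values and flag any field that fails its checksum.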
Install
pip install docpick
Python 3.11+ required. See README for details.