docs: add Ollama local VLM guide #16
# Ollama local VLM guide

Use Ollama when you want free, local image description for extracted document images.
## Install Ollama

1. Install Ollama from <https://ollama.com/download>.
2. Start the Ollama service on your machine.
3. Pull the default local vision model used by parsemux:

```bash
ollama pull qwen2.5vl:7b
```

Parsemux defaults to `qwen2.5vl:7b` for local image description.
## Run parsemux with local image description

When you do not provide `--vlm-key` or `--llm-key`, parsemux auto-detects the VLM provider as Ollama.

```bash
parsemux parse doc.pdf --extract-images --describe-images
```
This flow:

- extracts images from the document
- sends them to the local Ollama server at `http://localhost:11434`
- writes image descriptions back into the parse result
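The middle step can be sketched directly against Ollama's HTTP API. This is a hedged illustration of the kind of request involved, not parsemux's actual code; the `build_request` and `describe_image` names are hypothetical helpers:

```python
import base64
import json
import urllib.request

# Default local Ollama endpoint, as referenced in the flow above.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(image_bytes: bytes, model: str = "qwen2.5vl:7b") -> dict:
    """Build the JSON payload Ollama's /api/generate endpoint expects:
    a model name, a text prompt, and base64-encoded image data."""
    return {
        "model": model,
        "prompt": "Describe this image for a document parser.",
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(image_bytes: bytes) -> str:
    """Send one image to the local Ollama server and return its description.
    Requires a running Ollama service with the model pulled."""
    payload = json.dumps(build_request(image_bytes)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Nothing here is parsemux-specific: any client that posts this payload shape to `http://localhost:11434/api/generate` gets the same behavior the flow describes.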
You can also set the provider explicitly:

```bash
parsemux parse doc.pdf --extract-images --describe-images --vlm-provider ollama
```
|
## Performance expectations

Ollama is the zero-cost option, but it trades speed for privacy and local control.

- Speed: slower than hosted APIs, especially on CPU-only machines
- Quality: good enough for many document images, charts, and screenshots, but usually below top cloud vision models
- Privacy: best option when documents must stay on your machine
- Cost: `0.0` direct API cost inside parsemux

For best results:

- use a machine with a capable GPU if available
- keep document batches small when testing locally
- expect longer runtimes for image-heavy PDFs
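To see early whether a local run will be too slow, a small timing harness helps; this is a generic sketch (the `describe` callable stands in for whatever per-image call your pipeline makes, such as a request to the local Ollama server):

```python
import time


def timed_batch(images, describe):
    """Describe each image and record per-image latency, so slow local
    runs surface after the first image instead of at the end of a batch."""
    results = []
    for i, img in enumerate(images):
        start = time.perf_counter()
        description = describe(img)
        elapsed = time.perf_counter() - start
        results.append((description, elapsed))
        print(f"image {i}: {elapsed:.1f}s")
    return results
```

If the first few images each take tens of seconds on CPU, that is your cue to shrink the batch or move to a GPU machine before committing to an image-heavy PDF.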
Review comment on lines +19 to +35:

The example command here (`parsemux parse ... --describe-images` with no `--vlm-key`/`--llm-key`) won't currently trigger image description: in `src/parsemux/core/engine.py` the VLM step is gated behind `if vlm_key:` (derived from `request.vlm_api_key` / `request.llm_api_key` / `PARSEMUX_VLM_API_KEY`). With no key provided, the engine skips VLM entirely, so this section's "falls back to Ollama automatically" claim is inaccurate. Either (a) adjust the docs to require providing a key/env var (even for Ollama), or (b) update the engine to allow Ollama descriptions with an empty key when `--vlm-provider ollama` (or when the provider auto-detects to Ollama).
parsemux parse ... --describe-imageswith no--vlm-key/--llm-key) won’t currently trigger image description: insrc/parsemux/core/engine.pythe VLM step is gated behindif vlm_key:(derived fromrequest.vlm_api_key/request.llm_api_key/PARSEMUX_VLM_API_KEY). With no key provided, the engine skips VLM entirely, so this section’s “falls back to Ollama automatically” claim is inaccurate. Either (a) adjust the docs to require providing a key/env var (even for Ollama), or (b) update the engine to allow Ollama descriptions with an empty key when--vlm-provider ollama(or when provider auto-detects to Ollama).