docs: add Ollama local VLM guide #16
abhijeetnardele24-hash wants to merge 1 commit into vericontext:main
Conversation
Signed-off-by: abhijeet nardele <234410808+abhijeetnardele24-hash@users.noreply.github.com>
@abhijeetnardele24-hash is attempting to deploy a commit to Kiyeon Jeon's projects Team on Vercel. A member of the Team first needs to authorize it.
Pull request overview
Adds documentation for using Ollama as a local Vision-Language Model (VLM) provider in parsemux, aiming to enable “free local image description” with qwen2.5vl:7b and to close issue #9.
Changes:
- Adds an “Ollama local VLM” section to the main README with a quick-start command.
- Introduces a new dedicated setup/performance guide at
docs/ollama-guide.md.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| README.md | Adds a short Ollama local VLM quick-start section and links to the full guide. |
| docs/ollama-guide.md | New guide describing Ollama installation, usage, and performance expectations. |
```bash
ollama pull qwen2.5vl:7b
parsemux parse doc.pdf --extract-images --describe-images
```

When no VLM key is provided, parsemux falls back to Ollama automatically.
See [docs/ollama-guide.md](docs/ollama-guide.md) for setup details and performance notes.
The example command here (`parsemux parse ... --describe-images` with no `--vlm-key`/`--llm-key`) won't currently trigger image description: in `src/parsemux/core/engine.py` the VLM step is gated behind `if vlm_key:` (derived from `request.vlm_api_key` / `request.llm_api_key` / `PARSEMUX_VLM_API_KEY`). With no key provided, the engine skips the VLM entirely, so this section's "falls back to Ollama automatically" claim is inaccurate. Either (a) adjust the docs to require providing a key/env var (even for Ollama), or (b) update the engine to allow Ollama descriptions with an empty key when `--vlm-provider ollama` is passed (or when the provider auto-detects to Ollama).
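Option (b) could look roughly like the sketch below. This is a hypothetical illustration, not the actual parsemux code: `resolve_provider`, `should_run_vlm`, the `VLMProvider` enum values, and the key-prefix checks are all assumed names that mirror the behavior described in this review.

```python
from enum import Enum


class VLMProvider(Enum):
    ANTHROPIC = "anthropic"
    OPENAI = "openai"
    OLLAMA = "ollama"


def resolve_provider(vlm_key, explicit=None):
    """Assumed shape of the auto-detect logic: explicit flag wins,
    otherwise guess from the key prefix, defaulting to Ollama."""
    if explicit:
        return VLMProvider(explicit)
    key = vlm_key or ""
    if key.startswith("sk-ant-"):  # check the longer prefix first
        return VLMProvider.ANTHROPIC
    if key.startswith("sk-"):
        return VLMProvider.OPENAI
    return VLMProvider.OLLAMA  # no recognizable prefix: local server


def should_run_vlm(describe_images, vlm_key, provider):
    # Current gate is effectively `if vlm_key:`, which skips keyless
    # Ollama. Proposed gate: also run when the provider is Ollama.
    return describe_images and (bool(vlm_key) or provider is VLMProvider.OLLAMA)
```

With this gate, `--describe-images` and no key would still reach the local Ollama server, while cloud providers keep requiring a non-empty key.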
When you do not provide `--vlm-key` or `--llm-key`, parsemux auto-detects the VLM provider as Ollama.

```bash
parsemux parse doc.pdf --extract-images --describe-images
```

This flow:

- extracts images from the document
- sends them to the local Ollama server at `http://localhost:11434`
- writes image descriptions back into the parse result

You can also set the provider explicitly:

```bash
parsemux parse doc.pdf --extract-images --describe-images --vlm-provider ollama
```
This section says that omitting `--vlm-key`/`--llm-key` will auto-detect Ollama and run image description, but the current engine only runs the VLM step when a non-empty key is present (`if vlm_key:` in `src/parsemux/core/engine.py`). As written, both example commands will parse and extract images but will not describe them unless a key/env var is provided. Please either update these instructions to include the required key/env var (or explicitly document the current limitation), or update the engine so Ollama can run without a key when selected.
kiyeonjeon21 left a comment
Great guide! One factual correction needed:

> When no VLM key is provided, parsemux falls back to Ollama automatically.

This isn't accurate with the current code. When `describe_images=True` but no key is provided (no `--vlm-key`, no `PARSEMUX_VLM_API_KEY` in `.env`), the VLM step is skipped entirely; it does not auto-fallback to Ollama.

The user must explicitly either:

- have Ollama running and pass `--vlm-provider ollama`, or
- have no key at all, in which case the auto-detect logic in `vlm.py` sends keyless requests to Ollama as the default

Actually, looking at the code again: `_detect_provider()` does fall back to `VLMProvider.OLLAMA` when no key prefix matches, but the engine only calls the VLM when `vlm_key` is truthy. So with an empty key, the VLM step is skipped.

Fix: change the README/guide wording to:

> To use Ollama, explicitly pass the provider:
>
> ```bash
> parsemux parse doc.pdf --extract-images --describe-images --vlm-provider ollama
> ```

Or note that users can set `PARSEMUX_VLM_API_KEY=ollama` (any non-empty string) to trigger the Ollama path.
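To make the env-var workaround concrete, here is an illustrative sketch (not the actual parsemux code) of why any non-empty `PARSEMUX_VLM_API_KEY` value unblocks the VLM step today: the engine's gate is a plain truthiness check on the key, so a dummy value like `ollama` passes it, and provider auto-detect then routes the unrecognized "key" to the local Ollama server.

```python
import os

# Simulate a user putting this in their .env or shell profile.
# The value just needs to be non-empty; it is never sent as a real key.
os.environ["PARSEMUX_VLM_API_KEY"] = "ollama"

vlm_key = os.environ.get("PARSEMUX_VLM_API_KEY", "")
run_vlm_step = bool(vlm_key)  # mirrors the `if vlm_key:` gate in engine.py
```

With an unset or empty variable, `run_vlm_step` is `False` and the description step is skipped, which is exactly the behavior this review is flagging.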
Once updated, I'll approve and merge. Thanks!
Adds a dedicated Ollama guide and README documentation for free local image description with qwen2.5vl:7b.
Closes #9