
docs: add Ollama local VLM guide#16

Closed
abhijeetnardele24-hash wants to merge 1 commit into vericontext:main from abhijeetnardele24-hash:add-ollama-guide

Conversation

@abhijeetnardele24-hash
Contributor

Adds a dedicated Ollama guide and README documentation for free local image description with qwen2.5vl:7b.

Closes #9

Signed-off-by: abhijeet nardele <234410808+abhijeetnardele24-hash@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 2, 2026 17:13
@vercel

vercel bot commented Apr 2, 2026

@abhijeetnardele24-hash is attempting to deploy a commit to Kiyeon Jeon's projects Team on Vercel.

A member of the Team first needs to authorize it.


Copilot AI left a comment


Pull request overview

Adds documentation for using Ollama as a local Vision-Language Model (VLM) provider in parsemux, aiming to enable “free local image description” with qwen2.5vl:7b and to close issue #9.

Changes:

  • Adds an “Ollama local VLM” section to the main README with a quick-start command.
  • Introduces a new dedicated setup/performance guide at docs/ollama-guide.md.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
README.md Adds a short Ollama local VLM quick-start section and links to the full guide.
docs/ollama-guide.md New guide describing Ollama installation, usage, and performance expectations.


Comment on lines +59 to +65
```bash
ollama pull qwen2.5vl:7b
parsemux parse doc.pdf --extract-images --describe-images
```

When no VLM key is provided, parsemux falls back to Ollama automatically.
See [docs/ollama-guide.md](docs/ollama-guide.md) for setup details and performance notes.

Copilot AI Apr 2, 2026


The example command here (parsemux parse ... --describe-images with no --vlm-key/--llm-key) won’t currently trigger image description: in src/parsemux/core/engine.py the VLM step is gated behind if vlm_key: (derived from request.vlm_api_key / request.llm_api_key / PARSEMUX_VLM_API_KEY). With no key provided, the engine skips VLM entirely, so this section’s “falls back to Ollama automatically” claim is inaccurate. Either (a) adjust the docs to require providing a key/env var (even for Ollama), or (b) update the engine to allow Ollama descriptions with an empty key when --vlm-provider ollama (or when provider auto-detects to Ollama).
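The gating described here can be sketched in Python. This is an illustrative reconstruction based only on the review's description, not parsemux's actual code: `ParseRequest`, `should_run_vlm`, and `VLMProvider` are assumed names, and the final branch shows option (b)'s proposed fix.

```python
# Hypothetical sketch of the VLM gating the review describes.
# All names here are assumed for illustration, not parsemux's real API.
import os
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class VLMProvider(Enum):
    OPENAI = "openai"
    OLLAMA = "ollama"


@dataclass
class ParseRequest:
    vlm_api_key: Optional[str] = None
    llm_api_key: Optional[str] = None
    vlm_provider: Optional[VLMProvider] = None


def should_run_vlm(request: ParseRequest) -> bool:
    """Decide whether the engine runs the image-description (VLM) step."""
    vlm_key = (
        request.vlm_api_key
        or request.llm_api_key
        or os.environ.get("PARSEMUX_VLM_API_KEY", "")
    )
    # Current behavior per the review: only a non-empty key enables the
    # VLM step, so a keyless Ollama setup is silently skipped.
    if vlm_key:
        return True
    # Option (b)'s proposed fix: also run when Ollama is explicitly
    # selected, since the local server needs no API key.
    return request.vlm_provider is VLMProvider.OLLAMA
```

With the extra branch, `--vlm-provider ollama` would run the description step even with an empty key, which is what the README example implies.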

Comment on lines +19 to +35
When you do not provide `--vlm-key` or `--llm-key`, parsemux auto-detects the VLM provider as Ollama.

```bash
parsemux parse doc.pdf --extract-images --describe-images
```

This flow:

- extracts images from the document
- sends them to the local Ollama server at `http://localhost:11434`
- writes image descriptions back into the parse result

You can also set the provider explicitly:

```bash
parsemux parse doc.pdf --extract-images --describe-images --vlm-provider ollama
```

Copilot AI Apr 2, 2026


This section says that omitting --vlm-key/--llm-key will auto-detect Ollama and run image description, but the current engine only runs the VLM step when a non-empty key is present (if vlm_key: in src/parsemux/core/engine.py). As written, both example commands will parse and extract images but will not describe them unless a key/env var is provided. Please either update these instructions to include the required key/env var (or explicitly document the current limitation), or update the engine so Ollama can run without a key when selected.

Contributor

@kiyeonjeon21 left a comment


Great guide! One factual correction needed:

> When no VLM key is provided, parsemux falls back to Ollama automatically.

This isn't accurate with the current code. When `describe_images=True` but no key is provided (no `--vlm-key`, no `PARSEMUX_VLM_API_KEY` in `.env`), the VLM step is skipped entirely; it does not fall back to Ollama.

The user must explicitly either:

  • Have Ollama running and use --vlm-provider ollama
  • Or have no key at all, in which case the auto-detect logic in vlm.py sends keyless requests to Ollama as the default

Actually, looking at the code again: `_detect_provider()` does fall back to `VLMProvider.OLLAMA` when no key prefix matches. But the engine only calls the VLM when `vlm_key` is truthy, so with an empty key the VLM step is skipped before provider detection ever runs.
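The prefix-based fallback being described might look roughly like this. The function name mirrors the comment's `_detect_provider()`, but the specific prefixes and provider strings are assumptions for illustration, not taken from parsemux's vlm.py:

```python
# Hypothetical sketch of a key-prefix provider detector; the prefixes
# checked here are illustrative assumptions, not parsemux's real logic.
def detect_provider(key: str) -> str:
    if key.startswith("sk-"):    # OpenAI-style key prefix (assumed)
        return "openai"
    if key.startswith("AIza"):   # Google-style key prefix (assumed)
        return "gemini"
    # No known prefix matched (including an empty or placeholder key):
    # fall back to the local Ollama server, which needs no real key.
    return "ollama"
```

This is why the `PARSEMUX_VLM_API_KEY=ollama` workaround mentioned below works: any non-empty string passes the engine's truthiness gate, and a string with no recognized prefix then lands on the Ollama branch.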

Fix: Change the README/guide wording to:

> To use Ollama, explicitly pass the provider:

```bash
parsemux parse doc.pdf --extract-images --describe-images --vlm-provider ollama
```

Or note that users can set `PARSEMUX_VLM_API_KEY=ollama` (any non-empty string) to trigger the Ollama path.

Once updated, I'll approve and merge. Thanks!

@abhijeetnardele24-hash abhijeetnardele24-hash closed this by deleting the head repository Apr 4, 2026


Development

Successfully merging this pull request may close these issues.

Add Ollama local VLM testing guide

3 participants