fix(llm): forward num_ctx to Ollama (local-model context truncation) by lfnothias · Pull Request #22 · HolobiomicsLab/Perspicacite-AI

lfnothias · 2026-06-19T16:46:50Z

Problem

Local Ollama models (e.g. Mistral) silently fail to produce output on long RAG queries. Root cause: Ollama defaults num_ctx to 2048 tokens, and Perspicacité never sets it. RAG synthesis prompts are assembled up to context.max_tokens (default 8000) — far past 2048 — so the local model only sees a truncated tail and returns empty/garbage. (Reported by a user whose Ollama-Mistral synthesis stage produced nothing on multi-chunk answers.)

Fix

New LLMConfig.ollama_num_ctx (default 8192, documented in config.example.yml).
AsyncLLMClient._provider_extra_params(provider) returns {"num_ctx": ...} for the ollama provider only, merged into the LiteLLM completion call (both non-streaming and streaming paths). No-op for every other provider.
Larger num_ctx = more RAM; users can tune it down.

Why this is the right layer

LiteLLM forwards num_ctx to Ollama's options.num_ctx. Setting it from config means local users get a usable window out of the box instead of the silent 2048 default. It does not change behaviour for API providers.

Test Plan

tests/unit/test_ollama_num_ctx.py — config default/override, num_ctx forwarded for ollama, empty for openai/anthropic/deepseek/minimax (5 passed, hermetic)
Reviewer: live Ollama synthesis on a long query no longer returns empty

Branched off latest main (#18). Unrelated to the docling PR (#12).

🤖 Generated with Claude Code

…truncated Ollama defaults num_ctx to 2048; Perspicacite never set it, so RAG synthesis prompts (assembled up to ~context.max_tokens) overflowed the local window and Mistral/Llama produced empty output. Add LLMConfig.ollama_num_ctx (default 8192) and forward it via the LiteLLM completion call for the ollama provider only (no-op for other providers). Hermetic test + config.example.yml doc. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

lfnothias · 2026-06-19T16:51:11Z

Companion to #12 (docling tables/figures extraction). Together they cover the user report: #12 gets structured content out of PDFs; this PR (#22) makes the local Ollama model actually synthesize over long contexts. The two are independent and can merge in either order.

lfnothias mentioned this pull request Jun 19, 2026

feat(pdf): docling tables/figures extraction — advanced opt-in (R2) #12

Merged

7 tasks

lfnothias merged commit 002161e into main Jun 19, 2026
1 of 2 checks passed

lfnothias mentioned this pull request Jun 19, 2026

Chore/harden perspicacite #11

Closed

8 tasks

lfnothias deleted the fix/ollama-num-ctx branch June 19, 2026 17:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(llm): forward num_ctx to Ollama (local-model context truncation)#22

fix(llm): forward num_ctx to Ollama (local-model context truncation)#22
lfnothias merged 1 commit into
mainfrom
fix/ollama-num-ctx

lfnothias commented Jun 19, 2026

Uh oh!

lfnothias commented Jun 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

lfnothias commented Jun 19, 2026

Problem

Fix

Why this is the right layer

Test Plan

Uh oh!

lfnothias commented Jun 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant