Skip to content

fix(llm): forward num_ctx to Ollama (local-model context truncation)#22

Merged
lfnothias merged 1 commit into
mainfrom
fix/ollama-num-ctx
Jun 19, 2026
Merged

fix(llm): forward num_ctx to Ollama (local-model context truncation)#22
lfnothias merged 1 commit into
mainfrom
fix/ollama-num-ctx

Conversation

@lfnothias

Copy link
Copy Markdown
Collaborator

Problem

Local Ollama models (e.g. Mistral) silently fail to produce output on long RAG queries. Root cause: Ollama defaults num_ctx to 2048 tokens, and Perspicacité never sets it. RAG synthesis prompts are assembled up to context.max_tokens (default 8000) — far past 2048 — so the local model only sees a truncated tail and returns empty/garbage. (Reported by a user whose Ollama-Mistral synthesis stage produced nothing on multi-chunk answers.)

Fix

  • New LLMConfig.ollama_num_ctx (default 8192, documented in config.example.yml).
  • AsyncLLMClient._provider_extra_params(provider) returns {"num_ctx": ...} for the ollama provider only, merged into the LiteLLM completion call (both non-streaming and streaming paths). No-op for every other provider.
  • Larger num_ctx = more RAM; users can tune it down.

Why this is the right layer

LiteLLM forwards num_ctx to Ollama's options.num_ctx. Setting it from config means local users get a usable window out of the box instead of the silent 2048 default. It does not change behaviour for API providers.

Test Plan

  • tests/unit/test_ollama_num_ctx.py — config default/override, num_ctx forwarded for ollama, empty for openai/anthropic/deepseek/minimax (5 passed, hermetic)
  • Reviewer: live Ollama synthesis on a long query no longer returns empty

Branched off latest main (#18). Unrelated to the docling PR (#12).

🤖 Generated with Claude Code

…truncated

Ollama defaults num_ctx to 2048; Perspicacite never set it, so RAG synthesis
prompts (assembled up to ~context.max_tokens) overflowed the local window and
Mistral/Llama produced empty output. Add LLMConfig.ollama_num_ctx (default
8192) and forward it via the LiteLLM completion call for the ollama provider
only (no-op for other providers). Hermetic test + config.example.yml doc.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@lfnothias

Copy link
Copy Markdown
Collaborator Author

Companion to #12 (docling tables/figures extraction). Together they cover the user report: #12 gets structured content out of PDFs; this PR (#22) makes the local Ollama model actually synthesize over long contexts. The two are independent and can merge in either order.

@lfnothias lfnothias merged commit 002161e into main Jun 19, 2026
1 of 2 checks passed
@lfnothias lfnothias mentioned this pull request Jun 19, 2026
8 tasks
@lfnothias lfnothias deleted the fix/ollama-num-ctx branch June 19, 2026 17:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant