Support Ollama as a native LLM provider #232

@cl0ckt0wer

Summary

Ollama is the most popular self-hosted LLM runtime (30K+ GitHub stars, Windows/Linux/macOS). Users with powerful local GPUs (e.g. RX 7900 XTX, RTX 4090) want to run everything locally — embeddings AND the LLM for compression/summarization. Currently there's no path to use Ollama as a compression/summarization provider without running a separate translation proxy (LiteLLM).

Current state

  • Embeddings already work with Ollama via EMBEDDING_PROVIDER=openai + OPENAI_BASE_URL — Ollama speaks the OpenAI embeddings API natively
  • LLM compression/summarization requires the Anthropic protocol, which Ollama does not speak
  • ANTHROPIC_BASE_URL exists and supports proxies, but adding LiteLLM as another container adds ~200MB and operational complexity
  • No existing issue tracks Ollama support, so this gap has not been reported yet

Proposed solution (simplest path)

Add an Ollama provider directly to src/config.ts that uses raw fetch() (no SDK dependency) to call POST http://host:11434/api/chat with an Ollama-compatible payload. This follows the raw-fetch pattern MiniMax already uses to avoid the headers that the Stainless-generated SDK adds to each request.

OLLAMA_BASE_URL=http://host.docker.internal:11434   # default: http://localhost:11434
OLLAMA_MODEL=qwen3:14b                                # default: llama3.2
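
To make the proposal concrete, here is a minimal sketch of how the two variables above could drive a raw-fetch call to Ollama's /api/chat endpoint. All names (OllamaConfig, resolveOllamaConfig, buildChatRequest, ollamaChat, FetchLike) are hypothetical and are not taken from agentmemory; the request and response shapes assume Ollama's non-streaming chat API, where the reply text arrives in message.content.

```typescript
// Hypothetical names throughout; nothing here is existing agentmemory code.
interface OllamaConfig {
  baseUrl: string;
  model: string;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface HttpRequest {
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Minimal structural type so the global fetch or a test double can be injected.
type FetchLike = (
  url: string,
  init: HttpRequest,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Read the two proposed variables, falling back to the defaults named in this issue.
function resolveOllamaConfig(env: Record<string, string | undefined>): OllamaConfig {
  const baseUrl = (env.OLLAMA_BASE_URL || "http://localhost:11434").replace(/\/+$/, "");
  return { baseUrl, model: env.OLLAMA_MODEL || "llama3.2" };
}

// Build the POST /api/chat request. stream: false asks for a single JSON object
// instead of a stream of newline-delimited chunks.
function buildChatRequest(
  config: OllamaConfig,
  messages: ChatMessage[],
): { url: string; init: HttpRequest } {
  return {
    url: `${config.baseUrl}/api/chat`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model: config.model, messages, stream: false }),
    },
  };
}

// Send the request and return the assistant text from message.content.
async function ollamaChat(
  config: OllamaConfig,
  messages: ChatMessage[],
  fetchImpl: FetchLike,
): Promise<string> {
  const { url, init } = buildChatRequest(config, messages);
  const res = await fetchImpl(url, init);
  if (!res.ok) {
    throw new Error(`Ollama request failed with HTTP ${res.status}`);
  }
  const data = (await res.json()) as { message?: { content?: unknown } };
  const content = data.message?.content;
  if (typeof content !== "string") {
    throw new Error("Ollama response has no message.content string");
  }
  return content;
}
```

In a runtime with a global fetch, a call might look like ollamaChat(resolveOllamaConfig(process.env), messages, fetch). Passing fetch in as a parameter is only a convenience for testing; a real provider could call it directly.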

Provider detection in detectProvider():

  • Check whether OLLAMA_BASE_URL is set, or probe for a running Ollama instance on localhost
  • Return provider: "ollama" with raw-fetch-based chat completion
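
The detection steps above could be sketched roughly as follows. This is an assumption-laden illustration, not the real detectProvider(): the names detectOllama, OllamaDetection, and ProbeFetch are hypothetical, and the localhost probe assumes Ollama's GET /api/tags endpoint (which lists local models) is an acceptable liveness check.

```typescript
// Hypothetical helper; the real detectProvider() signature in src/config.ts is not shown in this issue.
type ProbeFetch = (url: string) => Promise<{ ok: boolean }>;

interface OllamaDetection {
  provider: "ollama";
  baseUrl: string;
}

async function detectOllama(
  env: Record<string, string | undefined>,
  probe: ProbeFetch,
): Promise<OllamaDetection | null> {
  // An explicit OLLAMA_BASE_URL wins and skips the network probe entirely.
  const explicit = env.OLLAMA_BASE_URL;
  if (explicit) {
    return { provider: "ollama", baseUrl: explicit.replace(/\/+$/, "") };
  }

  // Otherwise check whether something answers on the default local port.
  const fallback = "http://localhost:11434";
  try {
    const res = await probe(`${fallback}/api/tags`);
    return res.ok ? { provider: "ollama", baseUrl: fallback } : null;
  } catch {
    // Connection refused or similar: treat as "Ollama not present".
    return null;
  }
}
```

Inside a Docker container, localhost refers to the container itself, so the probe would usually miss an Ollama instance running on the host; in that setup the explicit OLLAMA_BASE_URL (for example the host.docker.internal value above) is the path that would apply.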

Why this is the right approach

  1. Same pattern as MiniMax: MiniMax already uses provider: "minimax" with raw fetch for its Anthropic-compatible API. Ollama would follow the same raw-fetch pattern against its own /api/chat endpoint.
  2. Zero new dependencies — just fetch() calls to the Ollama /api/chat endpoint
  3. Completes the self-hosted story — users can already self-host with EMBEDDING_PROVIDER=openai → Ollama. Adding the LLM provider closes the loop.
  4. No proxy needed — eliminates the LiteLLM requirement entirely

Environment

  • OS: Windows 11
  • Hardware: Ryzen 9 5950X, RX 7900 XTX (24GB)
  • agentmemory: v0.9.4 (Docker, iii-engine 0.11.6)
  • Ollama: v0.23.1 with nomic-embed-text (embeddings working)
