Support Ollama as a native LLM provider #232

## Summary

Ollama is one of the most popular self-hosted LLM runtimes (30K+ GitHub stars; Windows, Linux, and macOS). Users with powerful local GPUs (e.g. RX 7900 XTX, RTX 4090) want to run everything locally: both embeddings and the LLM used for compression/summarization. Currently the only way to use Ollama for compression/summarization is to run a separate translation proxy (LiteLLM).

## Current state

- Embeddings already work with Ollama via `EMBEDDING_PROVIDER=openai` + `OPENAI_BASE_URL`, because Ollama speaks the OpenAI embeddings API natively
- LLM compression/summarization requires the Anthropic protocol, which Ollama does not speak
- `ANTHROPIC_BASE_URL` exists and supports proxies, but running LiteLLM as an extra container adds ~200MB and operational complexity
- No Ollama-related issues have been filed so far, so this gap is currently untracked

## Proposed solution (simplest path)

Add an Ollama provider directly to `src/config.ts` that uses raw `fetch()` (no SDK dependency) to call `POST http://host:11434/api/chat` with an Ollama-compatible payload. This mirrors what the MiniMax provider already does: it uses raw fetch to avoid the SDK's Stainless headers.

```env
OLLAMA_BASE_URL=http://host.docker.internal:11434   # default: http://localhost:11434
OLLAMA_MODEL=qwen3:14b                                # default: llama3.2
```
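As a rough illustration of the raw-fetch call, the sketch below resolves the two proposed variables and builds a non-streaming request for Ollama's `/api/chat` endpoint. The names `resolveOllamaConfig`, `buildOllamaChatRequest`, and `ollamaChat` are hypothetical and do not come from agentmemory's source; how this would plug into `src/config.ts` is left open.

```typescript
// Sketch only: every name here is hypothetical, not taken from agentmemory's source.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface OllamaConfig {
  baseUrl: string;
  model: string;
}

interface OllamaRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Reads the proposed variables, falling back to the defaults proposed above.
// A trailing slash on the base URL is stripped so path joining stays predictable.
function resolveOllamaConfig(env: Record<string, string | undefined>): OllamaConfig {
  return {
    baseUrl: (env.OLLAMA_BASE_URL || "http://localhost:11434").replace(/\/+$/, ""),
    model: env.OLLAMA_MODEL || "llama3.2",
  };
}

// Builds the request for Ollama's native chat endpoint.
// `stream: false` asks for one JSON object instead of a stream of chunks.
function buildOllamaChatRequest(config: OllamaConfig, messages: ChatMessage[]): OllamaRequest {
  return {
    url: `${config.baseUrl}/api/chat`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model: config.model, messages, stream: false }),
    },
  };
}

// Sends the request and returns the assistant text from `message.content`.
async function ollamaChat(config: OllamaConfig, messages: ChatMessage[]): Promise<string> {
  const { url, init } = buildOllamaChatRequest(config, messages);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Ollama request failed: ${res.status} ${await res.text()}`);
  }
  const data = (await res.json()) as { message?: { content?: string } };
  return data.message?.content ?? "";
}
```

A caller could then run something like `await ollamaChat(resolveOllamaConfig(process.env), [{ role: "user", content: "Summarize: ..." }])`. Keeping request construction separate from the network call makes the payload easy to unit test without a running Ollama instance.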

Provider detection in `detectProvider()`:
- Check `OLLAMA_BASE_URL`, or probe for a running Ollama instance on localhost
- Return `provider: "ollama"` with raw-fetch-based chat completion
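One possible shape for that detection step is sketched below. The real signature and return type of `detectProvider()` are not shown in this issue, so `OllamaProvider`, `ollamaFromEnv`, `probeLocalOllama`, and `detectOllama` are all hypothetical names. The localhost probe assumes that a `GET /api/tags` request (Ollama's model-listing endpoint) succeeding is a good enough signal that Ollama is running.

```typescript
// Sketch only: hypothetical names, not agentmemory's actual detectProvider() internals.
interface OllamaProvider {
  provider: "ollama";
  baseUrl: string;
  model: string;
}

const DEFAULT_OLLAMA_URL = "http://localhost:11434";
const DEFAULT_OLLAMA_MODEL = "llama3.2";

// Explicit configuration wins: if OLLAMA_BASE_URL is set, no probing is needed.
function ollamaFromEnv(env: Record<string, string | undefined>): OllamaProvider | null {
  const baseUrl = env.OLLAMA_BASE_URL;
  if (!baseUrl) return null;
  return {
    provider: "ollama",
    baseUrl: baseUrl.replace(/\/+$/, ""),
    model: env.OLLAMA_MODEL || DEFAULT_OLLAMA_MODEL,
  };
}

// Fallback: check whether something answers on the default local port.
// A short timeout keeps startup fast when Ollama is not installed.
async function probeLocalOllama(timeoutMs = 500): Promise<boolean> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(`${DEFAULT_OLLAMA_URL}/api/tags`, { signal: controller.signal });
    return res.ok;
  } catch {
    return false; // connection refused, timeout, DNS failure, etc.
  } finally {
    clearTimeout(timer);
  }
}

// Combines both checks in the order listed above.
async function detectOllama(env: Record<string, string | undefined>): Promise<OllamaProvider | null> {
  const explicit = ollamaFromEnv(env);
  if (explicit) return explicit;
  if (await probeLocalOllama()) {
    return {
      provider: "ollama",
      baseUrl: DEFAULT_OLLAMA_URL,
      model: env.OLLAMA_MODEL || DEFAULT_OLLAMA_MODEL,
    };
  }
  return null;
}
```

Whether implicit probing should take priority over other configured providers is a design decision for the maintainers; an explicit `OLLAMA_BASE_URL` is the less surprising trigger.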

## Why this is the right approach

1. **Same pattern as MiniMax**: MiniMax already uses `provider: "minimax"` with raw fetch for its Anthropic-compatible API. Ollama would use the same raw-fetch pattern against its own `/api/chat` endpoint.
2. **Zero new dependencies**: just `fetch()` calls to the Ollama `/api/chat` endpoint
3. **Completes the self-hosted story**: users can already self-host embeddings with `EMBEDDING_PROVIDER=openai` → Ollama. Adding the LLM provider closes the loop.
4. **No proxy needed**: eliminates the LiteLLM requirement entirely

## Environment

- OS: Windows 11
- Hardware: Ryzen 9 5950X, RX 7900 XTX (24GB)
- agentmemory: v0.9.4 (Docker, iii-engine 0.11.6)
- Ollama: v0.23.1 with `nomic-embed-text` (embeddings working)
