quiet-node · quiet-node · Jun 8, 2026 · Jun 8, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -14,10 +14,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Changed
 
 - **BREAKING**: Renamed `[debug] search_trace_enabled` to `trace_enabled` (now covers both chat and search). Rename the field in your `config.toml` after upgrading. Trace file layout also changed to `traces/{chat,search}/<conversation_id>.jsonl`.
-- **Inference providers.** Thuki now reaches models through a typed provider list instead of a single hardcoded Ollama endpoint. The `[inference]` section gains `active_provider` and a `[[inference.providers]]` array (Built-in + Ollama in this release); each provider keeps its own selected model. Existing Ollama users are migrated automatically: a legacy flat `ollama_url` becomes the Ollama provider's `base_url`, and the previously selected model is carried over, so nothing changes for them. The Ollama provider is reached over its native API exactly as before; the Built-in (Thuki) engine is reserved for an upcoming version. Settings gains a Providers section (editable Ollama URL with a non-local-server warning, per-provider model picker).
-- The internal inference command/hook/error model were renamed to be engine-agnostic: `ask_ollama` → `ask_model`, the `useOllama` hook → `useModel`, and `OllamaError`/`OllamaErrorKind` → `EngineError`/`EngineErrorKind` (the `NotRunning` variant is now `EngineUnreachable`). External callers that invoked `ask_ollama` directly must update to `ask_model`.
-- The `ask_model`, `search_pipeline`, and `capture_full_screen_command` Tauri commands now require a `conversationId: String` argument (and `ask_model` additionally requires `isFirstTurn: bool` and `slashCommand: Option<String>`). The frontend's `useModel` hook generates a stable trace id per session and threads it transparently. External callers that invoked these commands directly must update their `invoke()` calls. A new fire-and-forget `record_conversation_end` command lets the frontend signal end-of-conversation (used by `useModel.reset()` and `useModel.loadMessages()`) so the chat-domain trace file gets a clean closing line.
-- **BREAKING**: Renamed the `[model]` section in `config.toml` to `[inference]` and reshaped it from a single `ollama_url` string into the providers schema described above. There is no backward-compatibility shim for the section name: if you had a custom `[model]` section, rename it to `[inference]` after upgrading; a flat `ollama_url` inside `[inference]` is migrated automatically.
+- The `ask_ollama`, `search_pipeline`, and `capture_full_screen_command` Tauri commands now require a `conversationId: String` argument (and `ask_ollama` additionally requires `isFirstTurn: bool` and `slashCommand: Option<String>`). The frontend's `useOllama` hook generates a stable trace id per session and threads it transparently. External callers that invoked these commands directly must update their `invoke()` calls. A new fire-and-forget `record_conversation_end` command lets the frontend signal end-of-conversation (used by `useOllama.reset()` and `useOllama.loadMessages()`) so the chat-domain trace file gets a clean closing line.
+- **BREAKING**: Renamed the `[model]` section in `config.toml` to `[inference]`. Its `ollama_url` field is unchanged. The new name reflects what the section actually configures: how Thuki reaches the inference daemon, not which model it uses. There is no backward-compatibility shim: if you had a custom `[model]` section, rename it to `[inference]` after upgrading.
 - Active model selection is now strictly Option-typed end to end. Ollama's `/api/tags` is the single source of truth: when nothing is installed and nothing is persisted, Thuki refuses to dispatch requests and surfaces a "Pick a model" prompt instead of falling back to a hardcoded slug. The previous `DEFAULT_MODEL_NAME` constant has been removed.
 
 ## [0.14.1](https://github.com/quiet-node/thuki/compare/v0.14.0...v0.14.1) (2026-06-07)

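The changelog entry above asks external callers to pass `conversationId`, `isFirstTurn`, and `slashCommand` in their `invoke()` calls and to signal `record_conversation_end` on reset. The TypeScript sketch below shows one way a caller outside `useOllama` might do that. It is a sketch under assumptions, not the hook's implementation. `TraceSession` and `InvokeFn` are hypothetical names. `invoke` is injected so the code runs without the Tauri runtime; in a Tauri v2 app it would presumably be `invoke` from `@tauri-apps/api/core`. Passing `conversationId` to `record_conversation_end` is a guess, because the entry does not list that command's arguments.

```typescript
// Hypothetical stand-in for Tauri's `invoke`, injected so the sketch runs
// without the Tauri runtime.
type InvokeFn = (cmd: string, args?: Record<string, unknown>) => Promise<unknown>;

// Hypothetical helper: owns one trace id per chat session and adds the
// arguments that `ask_ollama` now requires to whatever the caller already sends.
class TraceSession {
  private conversationId: string;
  private turns = 0;

  constructor(
    private readonly invoke: InvokeFn,
    private readonly newId: () => string,
  ) {
    this.conversationId = newId();
  }

  askArgs(
    base: Record<string, unknown>,
    slashCommand: string | null = null, // `null` deserializes to `None` on the Rust side
  ): Record<string, unknown> {
    const args = {
      ...base,
      conversationId: this.conversationId, // stable for the whole session
      isFirstTurn: this.turns === 0,
      slashCommand,
    };
    this.turns += 1;
    return args;
  }

  // Fire-and-forget end signal (argument shape assumed), then rotate the id
  // so the next conversation gets its own trace file.
  reset(): void {
    void this.invoke("record_conversation_end", {
      conversationId: this.conversationId,
    }).catch(() => undefined);
    this.conversationId = this.newId();
    this.turns = 0;
  }
}
```

Holding the id in one object, rather than generating a fresh one per request, is what lets every turn of a session land in the same `traces/chat/<conversation_id>.jsonl` file.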
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -53,7 +53,7 @@ Thuki is a macOS-only desktop app, a floating AI secretary activated by double-t
 The UI morphs between two states: a compact spotlight-style input bar → an expanded chat window. This morphing is driven by Framer Motion and a single `isChatMode` boolean in `App.tsx`.
 
 - **`App.tsx`** — orchestrates all state: messages, streaming, window resizing via ResizeObserver + Tauri `setSize()`
-- **`hooks/useModel.ts`** — Tauri Channel-based streaming hook (`useModel`); emits `Token`, `Done`, `Cancelled`, `Error` variants
+- **`hooks/useOllama.ts`** — Tauri Channel-based streaming hook; emits `Token`, `Done`, `Cancelled`, `Error` variants
 - **`view/ConversationView.tsx`** — smart auto-scroll (pins to bottom unless user scrolls up)
 - **`view/AskBarView.tsx`** — auto-expanding textarea (max 144px), morphs logo size, renders slash command tab-completion suggestions
 - **`components/ChatBubble.tsx`** — markdown rendering via Streamdown (rehype-sanitize for XSS protection)
@@ -67,8 +67,8 @@ User-facing reference for all commands lives in `docs/commands.md`. **Any new sl
 ### Backend (`src-tauri/src/`)
 
 - **`lib.rs`** — app setup: loads `AppConfig` via `config::load`, converts window to NSPanel (fullscreen overlay), registers tray, spawns hotkey listener, intercepts close events (hides instead of quits)
-- **`config/`** — typed TOML-backed application configuration. Loaded once at startup from `~/Library/Application Support/com.quietnode.thuki/config.toml` (seeded with defaults on first run), installed as Tauri managed state, exposed to the frontend via the `get_config` command. Every subsystem that needs model, prompt, window, activation, or quote values reads from `State<AppConfig>`. The `[inference]` section holds the typed providers list (`active_provider` + `[[inference.providers]]`, each `{id, kind, label, base_url, model}`); the loader migrates a legacy flat `ollama_url` onto a synthesized Ollama provider and `config/migrate.rs` folds the legacy SQLite `active_model` onto it at startup. See `docs/configurations.md` for the user-facing schema.
-- **`commands.rs`** — `ask_model` Tauri command: routes by the active provider's kind (Phase 1 implements Ollama's native `/api/chat` only; a non-Ollama active provider returns a typed `EngineError`), streams newline-delimited JSON, and sends chunks via Tauri Channel. Reads the active provider (base URL + selected model) from `State<RwLock<AppConfig>>`, the resolved system prompt, and the in-memory `ActiveModelState`.
+- **`config/`** — typed TOML-backed application configuration. Loaded once at startup from `~/Library/Application Support/com.quietnode.thuki/config.toml` (seeded with defaults on first run), installed as Tauri managed state, exposed to the frontend via the `get_config` command. Every subsystem that needs model, prompt, window, activation, or quote values reads from `State<AppConfig>`. See `docs/configurations.md` for the user-facing schema.
+- **`commands.rs`** — `ask_ollama` Tauri command: streams newline-delimited JSON from Ollama and sends the chunks over a Tauri Channel. Reads the active model, resolved system prompt, and Ollama URL from `State<AppConfig>`.
 - **`screenshot.rs`** — `capture_full_screen_command` Tauri command: uses CoreGraphics FFI (`CGWindowListCreateImage`) to capture all displays excluding Thuki's own windows, writes a JPEG to a temp dir, and returns the path
 - **`activator.rs`** — Core Graphics event tap watching for double-tap Control key (400 ms window, 600 ms cooldown; timing is a compiled constant, not yet exposed through `AppConfig` because the event-tap callback runs in a thread that cannot trivially read Tauri managed state). The tap MUST use `CGEventTapLocation::HID` and `CGEventTapOptions::Default` — see the critical constraint note in "Key Design Constraints" below.
 

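The `commands.rs` entry above says `ask_ollama` streams newline-delimited JSON from Ollama and forwards chunks over a Tauri Channel, which `useOllama` surfaces as `Token`, `Done`, `Cancelled`, and `Error` variants. The backend does this in Rust. The TypeScript sketch below only illustrates the buffering that such parsing needs: a network chunk can end mid-line, so the partial tail has to be held until the next newline arrives. It assumes Ollama's `/api/chat` streaming shape (`message.content` on each line, `done: true` on the last line, an `error` field on failure). `NdjsonDecoder` and the event payloads are hypothetical. `Cancelled` is left out because it presumably originates in Thuki rather than in Ollama's stream.

```typescript
// Hypothetical event shapes; only the variant names come from the docs.
type StreamEvent =
  | { type: "Token"; content: string }
  | { type: "Done" }
  | { type: "Error"; message: string };

class NdjsonDecoder {
  // Holds an incomplete trailing line between chunks.
  private buffer = "";

  push(chunk: string): StreamEvent[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is either "" (chunk ended on a newline) or a partial line.
    this.buffer = lines.pop() ?? "";

    const events: StreamEvent[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue;
      const obj = JSON.parse(trimmed) as {
        message?: { content?: string };
        done?: boolean;
        error?: string;
      };
      if (obj.error !== undefined) {
        events.push({ type: "Error", message: obj.error });
        continue;
      }
      const content = obj.message?.content ?? "";
      if (content !== "") events.push({ type: "Token", content });
      if (obj.done === true) events.push({ type: "Done" });
    }
    return events;
  }
}
```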
diff --git a/docs/configurations.md b/docs/configurations.md
@@ -27,36 +27,19 @@ open ~/Library/Application\ Support/com.quietnode.thuki/config.toml
 
 ```toml
 [inference]
-# The provider Thuki sends inference to. Phase 1 ships the Ollama provider;
-# the Built-in (Thuki) engine arrives in a later version.
-active_provider = "ollama"
-# Context window size in tokens sent to the active provider with every request.
-# Warmup and chat share this value so Ollama reuses the same runner and its
-# cached KV prefix for the system prompt. Raise to fit longer conversations;
-# lower to reduce GPU memory use. Valid range: 2048-1048576.
-num_ctx = 16384
+# Where Thuki finds your local Ollama server. The active model itself is
+# selected from the in-app picker (which lists whatever is installed in
+# Ollama via /api/tags) and is stored in Thuki's local database, not here.
+ollama_url = "http://127.0.0.1:11434"
 # Minutes of inactivity before Thuki tells Ollama to release the model.
 # 0 = let Ollama manage (its own 5-minute default applies).
-# -1 = never release. Applies to the Ollama provider only.
+# -1 = never release (keep loaded until Ollama itself exits or you unload manually).
 keep_warm_inactivity_minutes = 0
-
-# One block per provider. The built-in entry is always present. A provider's
-# selected model lives on its own `model` field (empty until you pick one in
-# the model picker).
-[[inference.providers]]
-id = "builtin"
-kind = "builtin"
-label = "Built-in (Thuki)"
-model = ""
-
-[[inference.providers]]
-id = "ollama"
-kind = "ollama"
-label = "Ollama"
-# Where Thuki reaches your Ollama server. Defaults to this Mac; point it at
-# another machine to use Ollama running elsewhere (one server at a time).
-base_url = "http://127.0.0.1:11434"
-model = ""
+# Context window size in tokens sent to Ollama with every request.
+# Warmup and chat share this value so Ollama reuses the same runner and its
+# cached KV prefix for the system prompt. Raise to fit longer conversations;
+# lower to reduce GPU memory use. Valid range: 2048–1048576.
+num_ctx = 16384
 
 [prompt]
 # The full secretary persona prompt. Seeded on first run so this file is the
@@ -132,27 +115,15 @@ Every domain below is shown as a single table that lists **all** constants Thuki
 
 ### `[inference]`
 
-Thuki reaches a model through a **provider**. `active_provider` names which one is used; each provider is described by a `[[inference.providers]]` block. Phase 1 ships two providers: **Ollama** (reached over HTTP at a configurable URL, local or remote) and a **Built-in (Thuki)** entry reserved for an upcoming bundled engine. A fresh install defaults to the Ollama provider.
-
-Each provider keeps its own selected `model`. Thuki discovers installed models live from Ollama's `/api/tags` endpoint and lets you pick one from the in-app model picker (or the Providers section of Settings); the choice is written to that provider's `model` field. When no model is installed and none has been chosen, Thuki refuses to dispatch a chat request and surfaces a "Pick a model" prompt. Pull a model with `ollama pull <slug>` and select it.
-
-Upgrading from an older version is automatic: a pre-providers config with a flat `ollama_url` is migrated to an Ollama provider seeded with that URL, and the previously selected model (kept in SQLite) is moved onto it, so existing Ollama users are unaffected.
-
-| Constant          | Default    | Tunable? | Bounds              | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
-| :---------------- | :--------- | :------- | :------------------ | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `active_provider` | `"ollama"` | Yes      | id of a provider    | Which provider receives inference. Must match the `id` of one of the `[[inference.providers]]` entries; an empty or dangling value resets to `ollama`. Phase 1: leave this on `ollama` (the Built-in engine is not available yet).                                                                                                                                                                                                                                                                                              |
-| `num_ctx`         | `16384`    | Yes      | `[2048, 1048576]`   | Context window size in tokens sent to the active provider with every request. Warmup and chat share this value so Ollama reuses the same runner instance and its cached KV prefix for the system prompt: they must match or Ollama creates a second runner and the warmup saves nothing. Ollama silently clamps this to the model's physical maximum. Raise to fit longer conversations: each doubling roughly doubles VRAM for the KV cache; lower to reclaim GPU memory. See [Tuning the Context Window](./tuning-context-window.md). |
-| `keep_warm_inactivity_minutes` | `0` | Yes | `-1` or `[0, 1440]` | Minutes of inactivity before Thuki tells Ollama to release the model from VRAM. Applies to the Ollama provider only. `0` means do not manage: Ollama's own 5-minute default applies. `-1` means never release. Raise for longer sessions between uses; lower to reclaim VRAM sooner.                                                                                                                                                                                                                                            |
+This section tells Thuki where to find your local Ollama server and how to run inference on it. The active model itself is **not** a TOML setting: Thuki discovers installed models live from Ollama's `/api/tags` endpoint, lets you pick one from the in-app model picker, and stores that selection in its local SQLite database (`app_config` table). Storing the active slug in TOML would duplicate ground truth from Ollama and break the moment you remove a model with `ollama rm`, so it lives next to the conversation history instead.
 
-Each `[[inference.providers]]` block has these fields:
+When no model is installed and no choice has been persisted, Thuki refuses to dispatch a chat request and surfaces a "Pick a model" prompt in the input area. Pull a model with `ollama pull <slug>` and select it from the picker chip in the top-right of the overlay.
 
-| Field      | Description                                                                                                                                                  |
-| :--------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `id`       | Stable identifier referenced by `active_provider`. The `builtin` and `ollama` ids are seeded automatically.                                                  |
-| `kind`     | `builtin` or `ollama`. Any other kind is dropped on load. Determines how Thuki talks to the provider (the Ollama kind uses Ollama's native API).             |
-| `label`    | Human-readable name shown in Settings.                                                                                                                       |
-| `base_url` | For the Ollama kind: where Thuki reaches the server (defaults to `http://127.0.0.1:11434`; point it at another machine to use remote Ollama). Empty for the built-in kind. A provider of kind `ollama` with an empty `base_url` is dropped and re-seeded at the localhost default. |
-| `model`    | The model selected for this provider, written when you pick one. Empty means "none chosen yet".                                                              |
+| Constant     | Default                    | Tunable? | Why not tunable | Bounds        | Description                                                                                                                                                                                                          |
+| :----------- | :------------------------- | :------- | :-------------- | :------------ | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `ollama_url` | `"http://127.0.0.1:11434"` | Yes      | —               | non-empty URL | The web address where Thuki finds your local Ollama server. The default works if you run Ollama on this machine with its standard port. Change this only if you moved Ollama to a different port or another machine. |
+| `keep_warm_inactivity_minutes` | `0` | Yes | — | `-1` or `[0, 1440]` | Minutes of inactivity before Thuki tells Ollama to release the model from VRAM. `0` means do not manage: Ollama's own 5-minute default applies. `-1` means never release (stays until Ollama exits or you unload manually). Raise for longer sessions between uses; lower to reclaim VRAM sooner. |
+| `num_ctx` | `16384` | Yes | — | `[2048, 1048576]` | Context window size in tokens sent to Ollama with every request. Warmup and chat share this value so Ollama reuses the same runner instance and its cached KV prefix for the system prompt: they must match or Ollama creates a second runner and the warmup saves nothing. Ollama silently clamps this to the model's physical maximum, so values above the model's capacity are accepted but have no extra effect. Raise to fit longer conversations without the model forgetting early messages: each doubling roughly doubles VRAM for the KV cache; lower to reclaim GPU memory at the cost of a shorter effective history. 16384 is the default because it comfortably holds the full system prompt (~4000 tokens) plus many turns while staying within 8 GB GPU budgets. See [Tuning the Context Window](./tuning-context-window.md) for a 5-minute benchmark recipe to find the right value for your hardware. |
 
 If the active model has been removed from Ollama between launches, Thuki silently falls back to the first installed model the next time you open the picker. If no models are installed at all, the next request surfaces a "Model not found" error with the exact `ollama pull <name>` command to run.
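The model-selection rules in this section can be condensed into one pure function. A persisted choice wins while Ollama still lists it; otherwise the first installed model is used; and the "Pick a model" prompt appears only when nothing is installed and nothing is persisted. The TypeScript sketch below works under stated assumptions. `resolveActiveModel` and `ModelResolution` are hypothetical names. `installed` stands for the slugs Ollama reports from `/api/tags`, in the order it returns them. `string | null` mirrors the Option-typed selection described in the changelog. Falling back to the first installed model when nothing was ever persisted is also an assumption. The real app applies the fallback when the picker opens rather than in a single call.

```typescript
// Hypothetical result type: either a slug to send, or the "Pick a model" prompt.
type ModelResolution =
  | { kind: "dispatch"; slug: string }
  | { kind: "pickModel" };

// `persisted` mirrors the Option-typed slug from the local database;
// `installed` stands for the slugs listed by Ollama's /api/tags.
function resolveActiveModel(
  persisted: string | null,
  installed: readonly string[],
): ModelResolution {
  // A persisted choice wins while Ollama still lists it.
  if (persisted !== null && installed.indexOf(persisted) !== -1) {
    return { kind: "dispatch", slug: persisted };
  }
  // The persisted model was removed (or never chosen): use the first installed one.
  const first = installed[0];
  if (first !== undefined) {
    return { kind: "dispatch", slug: first };
  }
  // Nothing installed but a stale choice remains: per the docs the next
  // request surfaces a "Model not found" error, so the slug is still returned.
  if (persisted !== null) {
    return { kind: "dispatch", slug: persisted };
  }
  // Nothing installed and nothing persisted: refuse to dispatch.
  return { kind: "pickModel" };
}
```

Returning a tagged result instead of a default slug matches the removal of `DEFAULT_MODEL_NAME`: the caller has to handle `pickModel` explicitly rather than silently falling back to a hardcoded model.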