diff --git a/.env.example b/.env.example
index 01d2785..6780d0f 100644
--- a/.env.example
+++ b/.env.example
@@ -51,12 +51,48 @@ LOCAL_MAX_TOOL_ITERATIONS=8
 SOLRAC_INTEGRATIONS_ENABLED=true
 SOLRAC_INTEGRATIONS_DIR=./integrations

+# ── Remote engine (OpenRouter) ──────────────────────────────────────────────
+# Alternative to LOCAL_ENABLED for hosts that can't run an on-host LLM but
+# want a non-Claude default engine. The "local" engine slot dispatches to
+# OpenRouter instead of Ollama/LMStudio; runtime UX is identical (no-prefix
+# routing, /clear local semantics, capability note) but per-token cost is
+# captured into audit.cost_usd so the existing HOURLY_COST_CAP_USD +
+# GLOBAL_HOURLY_COST_CAP_USD ceilings gate remote burn automatically.
+#
+# MUTUALLY EXCLUSIVE with LOCAL_ENABLED — boot rejects both true. Uncomment
+# the block below AND set LOCAL_ENABLED=false above to switch.
+#
+# REMOTE_MODEL is an OpenRouter slug (/). Browse the catalog
+# at https://openrouter.ai/models. Examples:
+#   openai/gpt-4o-mini                → cheap chat, ~$0.15/1M input tokens
+#   anthropic/claude-3.5-sonnet       → parity with the `@` tier via OR
+#   meta-llama/llama-3.3-70b-instruct → open-weight 70B
+#
+# REMOTE_API_KEY is your OpenRouter key (typically sk-or-…). Get one at
+# https://openrouter.ai/keys. The key is scrubbed from the Claude SDK
+# subprocess env (agent.ts::sanitizedSubprocessEnv strips REMOTE_*) so a
+# compromised model can't exfiltrate it via an auto-allowed Bash command.
+# It is NEVER logged or written to SQLite — held only in the frozen Config
+# object and the OpenRouter driver's closure.
+#
+# REMOTE_ENABLED=true
+# REMOTE_BACKEND=openrouter
+# REMOTE_MODEL=openai/gpt-4o-mini
+# REMOTE_API_KEY=sk-or-REPLACE_ME
+# REMOTE_BASE_URL=https://openrouter.ai/api/v1 # default; override for proxies
+# REMOTE_TIMEOUT_MS=60000 # bumps to 120000 when tools-on
+# REMOTE_HISTORY_LIMIT=6 # last N turns reconstructed
+# REMOTE_MAX_TOOL_ITERATIONS=8 # runaway-loop backstop
+# REMOTE_HTTP_REFERER=https://github.com/cjus/solrac # OpenRouter attribution
+# REMOTE_X_TITLE=solrac # OpenRouter attribution
+
 # ── Claude-only deploy alternative ──────────────────────────────────────────
 # Uncomment this block (and comment out the local-engine section above) for
-# hosts that can't run a local model. No-prefix messages then route to Sonnet.
-# `@`/`!` prefixes still work as before.
+# hosts that can't run a local model AND don't want OpenRouter. No-prefix
+# messages then route to Sonnet. `@`/`!` prefixes still work as before.
 # SOLRAC_DEFAULT_ENGINE=primary
 # LOCAL_ENABLED=false
+# REMOTE_ENABLED=false
 # LOCAL_TOOLS_ENABLED=false
 # SOLRAC_INTEGRATIONS_ENABLED=true # still useful for Claude tiers
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 48862c1..75866a0 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,114 @@
 # Changelog

+## Unreleased — OpenRouter as a remote backend for the engine slot
+
+Adds **OpenRouter** as a third option for the no-prefix engine slot, alongside on-host Ollama and LMStudio. New `REMOTE_ENABLED=true` flag — mutually exclusive with `LOCAL_ENABLED` at boot — points the engine slot at OpenRouter so hosts that can't run a local LLM still get a default-engine option. Per-token cost from OpenRouter's streaming `usage.cost` field is captured and written to `audit.cost_usd`, so the existing per-chat (`HOURLY_COST_CAP_USD`) and global (`GLOBAL_HOURLY_COST_CAP_USD`) hourly caps gate remote burn automatically — no new cost-cap knob needed. Claude tiers (`@`, `!`) and the `local` engine routing are unaffected.
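The boot-time engine-slot rules can be folded into a single validation pass:

- `LOCAL_ENABLED` and `REMOTE_ENABLED` may not both be true.
- `SOLRAC_DEFAULT_ENGINE=local` needs one of them enabled.
- `REMOTE_API_KEY` is required in remote mode.
- `REMOTE_BASE_URL` defaults to `https://openrouter.ai/api/v1` and is URL-checked at boot.

Below is a minimal TypeScript sketch of that pass. It assumes env values arrive as plain strings. Several pieces are hypothetical and do not come from Solrac's `config.ts`:

- the names `validateEngineSlot` and `EngineSlotConfig`;
- the `"none"` mode;
- the `REMOTE_BACKEND` default;
- the requirement that `REMOTE_MODEL` be set.

```typescript
// Illustrative sketch only: validateEngineSlot and EngineSlotConfig are
// hypothetical names, not Solrac's config.ts API.
type Env = Record<string, string | undefined>;

interface RemoteConfig {
  backend: "openrouter";
  model: string;
  apiKey: string;
  baseUrl: string;
}

interface EngineSlotConfig {
  mode: "local" | "remote" | "none";
  remote?: RemoteConfig;
}

const DEFAULT_REMOTE_BASE_URL = "https://openrouter.ai/api/v1";

function validateEngineSlot(env: Env): EngineSlotConfig {
  const localOn = env.LOCAL_ENABLED === "true";
  const remoteOn = env.REMOTE_ENABLED === "true";

  // Boot rejects both engine-slot modes being enabled at once.
  if (localOn && remoteOn) {
    throw new Error("LOCAL_ENABLED and REMOTE_ENABLED are mutually exclusive; disable one of them");
  }

  // A no-prefix default of "local" needs some backend behind the engine slot.
  const defaultEngine = env.SOLRAC_DEFAULT_ENGINE ?? "local";
  if (defaultEngine === "local" && !localOn && !remoteOn) {
    throw new Error("SOLRAC_DEFAULT_ENGINE=local requires LOCAL_ENABLED=true or REMOTE_ENABLED=true");
  }

  if (!remoteOn) return { mode: localOn ? "local" : "none" };

  // Assumption: REMOTE_BACKEND defaults to "openrouter", its only accepted value.
  const backend = env.REMOTE_BACKEND ?? "openrouter";
  if (backend !== "openrouter") throw new Error(`Unsupported REMOTE_BACKEND: ${backend}`);

  const apiKey = env.REMOTE_API_KEY;
  if (!apiKey) throw new Error("REMOTE_API_KEY is required when REMOTE_ENABLED=true");

  // Assumption: a model slug is also mandatory in remote mode.
  const model = env.REMOTE_MODEL;
  if (!model) throw new Error("REMOTE_MODEL is required when REMOTE_ENABLED=true");

  // Validate the base URL at boot; `new URL` throws a TypeError on malformed input.
  const baseUrl = env.REMOTE_BASE_URL ?? DEFAULT_REMOTE_BASE_URL;
  const { protocol } = new URL(baseUrl);
  if (protocol !== "https:" && protocol !== "http:") {
    throw new Error(`REMOTE_BASE_URL must be http(s): ${baseUrl}`);
  }

  return { mode: "remote", remote: { backend, model, apiKey, baseUrl } };
}
```

A Claude-only deploy (`SOLRAC_DEFAULT_ENGINE=primary`, both flags false) comes back as `mode: "none"` in this sketch. The slash in an OpenRouter slug passes through untouched.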
+
+- **New env vars** (all `REMOTE_*` namespace, provider-neutral for future vLLM/Anyscale/Together/Groq additions):
+  - `REMOTE_ENABLED` — master switch. Mutually exclusive with `LOCAL_ENABLED` (boot rejects both true).
+  - `REMOTE_BACKEND` — `openrouter` (only value today; type stays open for future providers).
+  - `REMOTE_MODEL` — OpenRouter slug, e.g. `anthropic/claude-3.5-sonnet`, `openai/gpt-4o-mini`, `meta-llama/llama-3.3-70b-instruct`. Contains `/` — verified safe across the codebase (no parser splits on `/`).
+  - `REMOTE_API_KEY` — required when `REMOTE_ENABLED=true`. Scrubbed by `sanitizedSubprocessEnv()` (prefix-match `REMOTE_*`) so the Claude SDK subprocess never sees the OpenRouter credential.
+  - `REMOTE_BASE_URL` — defaults to `https://openrouter.ai/api/v1`; override for proxies/staging. URL validated at boot.
+  - `REMOTE_TIMEOUT_MS` (default 60s / 120s with tools), `REMOTE_HISTORY_LIMIT` (default 6), `REMOTE_MAX_TOOL_ITERATIONS` (default 8) — mirror the `LOCAL_*` knobs.
+  - `REMOTE_HTTP_REFERER` (default `https://github.com/cjus/solrac`) + `REMOTE_X_TITLE` (default `solrac`) — OpenRouter attribution headers.
+- **Engine slot reuse, not a new engine.** The internal `Engine = "primary" | "secondary" | "local"` union is unchanged; each `EngineDriver` factory sets a `mode: "local" | "remote"` field (`createOllamaDriver`/`createLmstudioDriver` → `"local"`; `createOpenrouterDriver` → `"remote"`). `runEngineTurn` reads `driver.mode` directly — no parallel `mode` field on the deps. Mode drives three behaviors: audit-tag prefix (`local:` vs `remote:`), capability-note framing ("cost the operator nothing" vs "cost the operator per-token via OpenRouter, so be concise"), and the `cost_usd` write decision. No new dispatch branch — the same routing, same queue, same mutex, same `/clear local` cutoff.
+- **Audit tag pattern: `remote:openrouter:`.** Symmetric with `local:ollama:` and `claude:primary:`. The model slug's `/` flows through unmodified; no parser splits on `/` (audited via `grep -RnE "model\.(split|substring|substr|slice)" src/`).
+- **Cost capture from the streaming usage chunk.** OpenRouter's trailing usage chunk includes `cost` (USD) alongside `prompt_tokens` / `completion_tokens` — and as of 2026 it's automatic (the historical `usage: { include: true }` / `stream_options: { include_usage: true }` opt-ins are deprecated and have no effect per [OpenRouter docs](https://openrouter.ai/docs/guides/administration/usage-accounting)). `EngineChatEvent.done` carries `costUsd: number | null`; `engine.ts::resolveAuditCost` picks the write:
+  - `mode=local` → 0 (on-host backends are free; driver costUsd ignored).
+  - `mode=remote && costUsd != null` → write the real cost.
+  - `mode=remote && costUsd == null` → write `null` (NOT 0) + log `remote.cost_missing`. Writing 0 would misrecord an unknown cost as a known-free turn. SQL `SUM` skips `NULL`, so in the cap query's `COALESCE(SUM(cost_usd), 0)` a `null` row adds nothing, exactly as a 0 row would; the gain is that the audit row stays distinguishable as "cost unreported" and the log line flags it.
+- **Tool-loop sums cost across rounds.** `runToolLoop` accumulates per-round `costUsd` (each round is a separate API call on a remote backend with its own billed cost), then `engine.ts::resolveAuditCost` writes the sum to `audit.cost_usd`. The `costUsdSeen` flag distinguishes "every round skipped the field" (null) from "every round was a free local round" (0).
+- **Footer cost chip in remote mode.** The engine-slot Telegram footer now appends `· $X.XXXX` when running in remote mode and the driver reported a cost — e.g. `✅ remote:openrouter:openai/gpt-4o-mini · 1.2s · $0.0042`. Local mode is unchanged (no chip; on-host = free). The chip is gated by the same logic as the audit write (`engine.ts::formatFooterCost`) so the UI and `audit.cost_usd` agree: if `remote.cost_missing` fires, the chip is omitted rather than rendered as `$0.0000`. Mirrors the Claude-tier footer's `$X.XXXX` segment so operators get the same cost visibility on both surfaces.
+- **Mutual exclusion at boot.** `LOCAL_ENABLED=true && REMOTE_ENABLED=true` throws with an actionable message. `SOLRAC_DEFAULT_ENGINE=local` now requires `LOCAL_ENABLED OR REMOTE_ENABLED`; the error message lists both paths.
+- **DB cutoff triple-pattern.** `db.hasLocalTurnsSince` and `db.outOfBandForEngine` LIKE clauses extend to `local:%` OR `ollama:%` (legacy) OR `remote:%`. So `/clear local` correctly wipes the engine-slot cutoff for an OpenRouter-only deploy, and Claude's cross-engine bridge honors the local cutoff for remote turns too (otherwise `/clear local` clears Ollama but Claude still recites freshly-cleared OpenRouter turns out of the bridge).
+- **Subprocess env scrub.** `sanitizedSubprocessEnv()` in `agent.ts` adds a `REMOTE_*` prefix exclusion alongside the existing `TELEGRAM_*`, `TG_*`, `LOCAL_*` scrubs. `REMOTE_API_KEY` in particular is a billed credential — exfiltration via `Bash(echo $REMOTE_API_KEY)` would let a compromised model burn operator balance.
+- **Web UI + `/help` mode awareness.** `defaultEngineLabel` renders `remote (openrouter)` when remote mode is active. `/help` engine section gets an `engineSlotMode` field that swaps the cost-framing line ("free" → "per-token via OpenRouter") for the no-prefix path.
+- **Boot probe.** The engine-slot health probe (`probeEngineHealth`) runs against whichever driver is wired, including the OpenRouter `GET /models` probe with bearer auth. 401 surfaces as `auth_failed` so a bad `REMOTE_API_KEY` is visible at startup, not first-turn.
+- **No new runtime deps.** OpenRouter is OpenAI-compatible — no SDK needed. The driver is built on raw `fetch` like the LMStudio driver. No anti-goals reversed.
+- **No SDK pin bump.** Claude Agent SDK pin stays at `0.2.119`.
+- **Tests.** 10 new driver tests cover OpenRouter probe (auth header, model-present, model-absent, 401, network error), streaming (cost captured from trailing usage, cost-missing falls through as null, slash-bearing slug round-trips, auth + attribution headers on every request, 401 surfaces with REMOTE_API_KEY hint, 404 → model_missing, inline error frame terminates, tool-call SSE deltas accumulate). 13 new config tests cover the REMOTE_* validations (required field set, mutex with LOCAL_ENABLED, default-engine=local needs one mode, base URL parse). 4 new runner tests cover the cost-write matrix (local 0, remote populated, remote-null defensive, mode default back-compat) and capability-note mode framing. 3 new DB tests cover the triple-pattern LIKE extension (remote:% matches hasLocalTurnsSince, remote:% hidden by outOfBandForEngine cutoff).
+- **Cleanup debt flagged.** The dual-pattern `local:% OR ollama:%` LIKE clauses (left over from v0.7.0's "removed in a follow-up release once the migration has propagated") become triple-pattern with `remote:%`. The legacy `ollama:%` clause is scheduled for removal in the next minor.
+
+### Refactor: split `engine.ts` / `local-driver.ts` / `remote-driver.ts`
+
+The OpenRouter work originally landed on the `local-*` files because the runner is mode-polymorphic — both modes legitimately share the same streaming + tool-loop + audit plumbing. Naming-wise that made the codebase lie ("local" hosting a remote service). This commit follows up with a structural-only refactor: clearer file names, no behavior change, no env-var change, no DB schema change.
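The cost-write rules in this changelog fit in a few small functions:

- Local rows write 0.
- Remote rows write the driver-reported cost, or `null` plus a `remote.cost_missing` log.
- Tool-loop rounds are summed, with a `costUsdSeen` flag.
- The footer chip appears only when a remote cost was written.

This is a sketch under stated assumptions, not the code in `engine.ts`. The `RoundCost` record and the `log` callback are hypothetical. The real `resolveAuditCost` and `formatFooterCost` signatures may differ.

```typescript
// Illustrative re-implementation of the changelog's cost rules; the real
// engine.ts functions may take different arguments.
type DriverMode = "local" | "remote";

// Hypothetical per-round record produced by a tool loop.
interface RoundCost {
  costUsd: number | null;
}

// Sum billed cost across tool-loop rounds. `costUsdSeen` keeps
// "no round reported a cost" (null) apart from "all rounds cost 0" (0).
function sumRoundCosts(rounds: RoundCost[]): number | null {
  let total = 0;
  let costUsdSeen = false;
  for (const round of rounds) {
    if (round.costUsd != null) {
      total += round.costUsd;
      costUsdSeen = true;
    }
  }
  return costUsdSeen ? total : null;
}

// Decide what goes into audit.cost_usd.
function resolveAuditCost(
  mode: DriverMode,
  costUsd: number | null,
  log: (event: string) => void,
): number | null {
  if (mode === "local") return 0; // on-host backends are free; driver value ignored
  if (costUsd != null) return costUsd; // remote with a reported cost
  log("remote.cost_missing"); // remote without a cost: keep the row, record null
  return null;
}

// Footer chip derived from the resolved audit value, so UI and audit agree.
function formatFooterCost(mode: DriverMode, auditCost: number | null): string | null {
  if (mode !== "remote" || auditCost == null) return null;
  return `$${auditCost.toFixed(4)}`;
}
```

In this sketch a tool loop where only some rounds report a cost sums the reported ones. How the shipped code treats partial reporting is not stated in the changelog.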
+
+**Files renamed (git blame preserved via `git mv`):**
+
+| Was | Now |
+|---|---|
+| `src/local.ts` | `src/engine.ts` |
+| `src/local-tools.ts` | `src/engine-tools.ts` |
+| `src/local.test.ts` | `src/engine.test.ts` |
+| `src/local-tools.test.ts` | `src/engine-tools.test.ts` |
+
+**Files added:**
+
+- `src/engine-driver.ts` — shared abstraction owning `EngineBackend`, `EngineDriver`, `EngineChatEvent`, `EngineDriverError`, `DriverOpts`, plus the cross-driver helpers (`stableStringify`, `maybeLogEmptyStream`).
+- `src/remote-driver.ts` — OpenRouter driver moved out of `local-driver.ts`; sets `driver.mode = "remote"`. New `buildRemoteCapabilityNote` + `buildRemoteToolCapabilityNote` always frame cost as "per-token via OpenRouter".
+- `src/remote-driver.test.ts` — OpenRouter test block extracted (11 tests) from `local-driver.test.ts`.
+
+**Files updated in place:**
+
+- `src/local-driver.ts` — Ollama + LMStudio only, both with `driver.mode = "local"`. Capability builders `buildLocalCapabilityNote` / `buildLocalToolCapabilityNote` always frame cost as "free" (no mode parameter). OpenRouter shim removed.
+- `src/main.ts`, `src/commands.ts`, `src/skill-tools.ts`, `src/instance.ts`, `test/smokes/local.ts` — caller updates: imports from `./engine.ts` / `./engine-driver.ts` / `./engine-tools.ts`; `LocalSkillDeps` → `EngineSkillDeps`; `runLocalTurn` → `runEngineTurn`; `mcpToLocalTools` → `mcpToEngineTools`; `probeLocalHealth` → `probeEngineHealth`. Variable names that describe the engine *slot* (`localDeps`, `localDriver`, `localSkillDeps`) kept — the slot is still named `local` in routing.
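`engine-driver.ts` owns a shared event union, and every stream ends in a `done` event carrying `costUsd`. The sketch below shows how an OpenAI-compatible SSE stream could be folded into that union, picking up `usage.cost` from OpenRouter's trailing chunk.

Several parts are assumptions, not the project's code:

- the `ChunkFrame` shape;
- every event field other than `kind` and `costUsd`;
- the `createSseLineParser` and `parseFrame` helpers, which are hypothetical.

The sketch also leaves out tool-call delta accumulation and the `openrouter.*` log events.

```typescript
// Sketch of an engine event union plus an OpenAI-style SSE line parser that
// captures a trailing usage.cost. Field names beyond `kind` and `costUsd`
// are assumptions; tool-call delta accumulation is omitted.
type EngineChatEvent =
  | { kind: "text"; text: string }
  | { kind: "tool_call"; name: string; argsJson: string }
  | { kind: "done"; costUsd: number | null }
  | { kind: "error"; message: string };

// Minimal shape of one streamed chunk (assumed OpenAI-compatible).
interface ChunkFrame {
  choices?: { delta?: { content?: string | null } }[];
  usage?: { prompt_tokens?: number; completion_tokens?: number; cost?: number };
  error?: { message?: string };
}

function parseFrame(payload: string): ChunkFrame | null {
  try {
    return JSON.parse(payload) as ChunkFrame;
  } catch {
    return null; // a real driver would log a bad-frame event here
  }
}

function createSseLineParser(): (line: string) => EngineChatEvent[] {
  let costUsd: number | null = null; // stays null if no chunk carries `cost`

  return (line: string): EngineChatEvent[] => {
    // SSE comment lines (": ...") and blank keep-alives carry no data.
    if (!line.startsWith("data:")) return [];
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") return [{ kind: "done", costUsd }];

    const frame = parseFrame(payload);
    if (frame === null) return [];

    // An inline error frame ends the turn.
    if (frame.error) {
      return [{ kind: "error", message: frame.error.message ?? "upstream error" }];
    }

    const events: EngineChatEvent[] = [];
    const text = frame.choices?.[0]?.delta?.content;
    if (text) events.push({ kind: "text", text });

    // Remember the billed cost from the usage chunk for the final `done`.
    const cost = frame.usage?.cost;
    if (typeof cost === "number") costUsd = cost;
    return events;
  };
}
```

Keeping the cost in closure state until `[DONE]` mirrors the changelog's point that cost arrives only on the trailing usage chunk. A stream without that field yields `costUsd: null`.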
+
+**Type renames:**
+
+| Was | Now |
+|---|---|
+| `LocalBackend` (wire-format union) | `EngineBackend` |
+| `LocalChatRole`, `LocalChatMessage`, `LocalToolCallRef`, `LocalToolDef`, `LocalChatEvent`, `LocalProbeResult`, `LocalStreamChatOpts` | `Engine*` equivalents |
+| `LocalDriver` | `EngineDriver` (adds `mode: "local" \| "remote"` field) |
+| `LocalDriverError` | `EngineDriverError` |
+| `LocalRunDeps`, `LocalRunInput` | `EngineRunDeps`, `EngineRunInput` |
+| `LocalEngineMode` (discriminator type) | **deleted** — `driver.mode` is now the single source of truth |
+| `LocalSkillDeps` | `EngineSkillDeps` |
+| `mcpToLocalTools` | `mcpToEngineTools` |
+| `LOCAL_DENY_TOOLS` | `ENGINE_DENY_TOOLS` |
+
+Operator-config-layer types (`config.ts::LocalBackend = "ollama" \| "lmstudio"`, `config.ts::RemoteBackend = "openrouter"`) kept — they describe operator-facing env-var values, distinct from the wire-format `EngineBackend` union.
+
+**Log event renames** (`local.*` → `engine.*` for runner events; per-backend prefixes for driver events):
+
+| Was | Now | Source |
+|---|---|---|
+| `local.ollama_bad_frame` | `ollama.bad_frame` | `local-driver.ts` |
+| `local.lmstudio_bad_frame` | `lmstudio.bad_frame` | `local-driver.ts` |
+| `local.lmstudio_empty_stream` | `lmstudio.empty_stream` | `local-driver.ts` |
+| `local.lmstudio_tool_call_deduped` | `lmstudio.tool_call_deduped` | `local-driver.ts` |
+| `local.openrouter_bad_frame` | `openrouter.bad_frame` | `remote-driver.ts` |
+| `local.openrouter_empty_stream` | `openrouter.empty_stream` | `remote-driver.ts` |
+| `local.openrouter_tool_call_deduped` | `openrouter.tool_call_deduped` | `remote-driver.ts` |
+| `local.stub_send_failed` | `engine.stub_send_failed` | `engine.ts` |
+| `local.unexpected_tool_call_single_shot` | `engine.unexpected_tool_call_single_shot` | `engine.ts` |
+| `local.driver_failed` | `engine.driver_failed` | `engine.ts` |
+| `local.unexpected_error` | `engine.unexpected_error` | `engine.ts` |
+| `local.edit_throttled` | `engine.edit_throttled` | `engine.ts` |
+| `local.edit_final_failed` | `engine.edit_final_failed` | `engine.ts` |
+| `local.final_send_failed` | `engine.final_send_failed` | `engine.ts` |
+| `local.done` | `engine.done` | `engine.ts` |
+| `local.boot` | `engine.boot` | `main.ts` |
+| `local.boot_health_ok` | `engine.boot_health_ok` | `main.ts` |
+| `local.boot_health_failed` | `engine.boot_health_failed` | `main.ts` |
+| `local.boot_health_model_missing` | `engine.boot_health_model_missing` | `main.ts` |
+| `local.disabled_ack_failed` | `engine.disabled_ack_failed` | `main.ts` |
+| `local.tools_enabled_but_zero_loaded` | `engine.tools_enabled_but_zero_loaded` | `main.ts` |
+| `local.tool_loop_start` / `local.tool_loop_done` / `local.tool_loop_failed` | `engine.tool_loop_*` | `engine-tools.ts` |
+| `local.tool_loop_detected` / `local.tool_iteration_cap` / `local.cap_finalize_failed` | `engine.tool_*` | `engine-tools.ts` |
+| `local.tool_unknown` / `local.tool_auto_allow` / `local.tool_denied_policy` / `local.tool_denied_hard` / `local.tool_hard_denied` / `local.tool_invalid_args` / `local.tool_handler_threw` / `local.tool_call_ok` | `engine.tool_*` | `engine-tools.ts` |
+| `local.tool_confirm_request` / `local.tool_confirm_resolved` / `local.tool_confirm_send_failed` / `local.tool_confirm_round_cap` / `local.tool_confirm_skipped_round_cap` | `engine.tool_confirm_*` | `engine-tools.ts` |
+| `local.progress_failed` | `engine.progress_failed` | `engine-tools.ts` |
+
+`remote.cost_missing` keeps its name — it's a remote-only signal, accurately scoped already.
+
+**Operator impact.** Any `journalctl ... | jq 'select(.event == "local.done")'` queries break against new boot logs. v0.8.0 audit rows already on disk keep the old event names — those don't change. Update grep patterns once.
+
+**Zero behavior change** — verified by typecheck + the full test suite passing (798 tests across 31 files). Zero env-var change. Zero DB schema change. The audit-tag DB column prefix (`local:` / `remote:`) is intentionally NOT renamed — that would require an SQL migration of `audit.model`; deferred.
+
+**Anti-goals.** No new runtime deps. No new HTTP framework. No SDK pin bump. No sub-agents enabled.
+
 ## v0.7.1 — weak-local-model hardening + docs

 Four post-v0.7.0 fixes that together make LMStudio + small open-weight models (gpt-oss-20b class) usable on long-running chats. No breaking changes, no new env vars, no SDK pin bump.
diff --git a/README.md b/README.md
index e15de6f..24d8db6 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,12 @@
 # Solrac

-> A self-hosted, hackable personal Agent: free local LLM (Ollama or LMStudio) by default, Claude Sonnet/Opus on demand via Anthropic's Claude Agent SDK. Reach it from Telegram or a browser; own every audit row, permission rule, and budget cap.
+> A self-hosted, hackable personal Agent: free local LLM (Ollama or LMStudio) or remote LLM (OpenRouter) by default, with explicit escalation to Anthropic's Claude Sonnet/Opus via the Claude Agent SDK. Reach it from Telegram or a browser; own every audit row, permission rule, and budget cap.

 ## Why Solrac

-Solrac is a single-process agent that bridges Telegram and a browser UI to a local model served by Ollama or LMStudio (operator picks via `LOCAL_BACKEND`), escalating to Claude Sonnet or Opus only when you explicitly ask. It was built as part of [PNXStudios.com](https://pnxstudios.com) to manage that complex monorepo from anywhere — and, in doing so, to explore the mechanics of building a personal agent from first principles while enforcing hard cost controls and behavior auditing on every turn.
+Solrac is a single-process agent that bridges Telegram and a browser UI to a **bring-your-own-model engine slot** — local (Ollama / LMStudio) or remote (OpenRouter) — escalating to Claude Sonnet (`@`) or Opus (`!`) only when you explicitly ask. It was built as part of [PNXStudios.com](https://pnxstudios.com) to manage that complex monorepo from anywhere — and, in doing so, to explore the mechanics of building a personal agent from first principles while enforcing hard cost controls and behavior auditing on every turn.

 It's deliberately smaller and narrower than other personal-assistant projects:
@@ -15,9 +15,9 @@ It's deliberately smaller and narrower than other personal-assistant projects:
 Both are broader and better-resourced.

 **Solrac's distinct value:**
-- **Local-LLM-first economics.** No-prefix messages route to the free local engine (Ollama or LMStudio); `@` and `!` are paid Claude escalations only on operator intent.
-- **Cost enforcement, not just visibility.** Sliding hourly USD caps that *deny* turns when hit, plus a daily cost-report DM.
-- **Audit-before-acting.** Every update (allowed, denied, queue-full) writes a row to one append-only SQLite table.
+- **BYO-model engine slot.** No-prefix messages route to whichever model source you wire — free on-host (Ollama / LMStudio) or pay-per-token remote (OpenRouter). `@` (Sonnet) and `!` (Opus) are paid Claude escalations only on operator intent.
+- **Cost enforcement, not just visibility.** Sliding per-chat and global hourly USD caps that *deny* turns when hit — they sum every `cost_usd` row (Claude or OpenRouter), so remote-mode burn is gated by the same ceilings without extra configuration. Plus a daily cost-report DM.
+- **Audit-before-acting.** Every update (allowed, denied, queue-full) writes a row to one append-only SQLite table, tagged with the engine that served it (`local:ollama:...`, `remote:openrouter:...`, `claude:primary:...`).
 - **Single-process minimalism.** No HTTP framework, no Telegram framework runtime, no queue server, no Docker, no sub-agents. A few thousand lines of TypeScript you can read in an afternoon and fork.

 If you need multi-tenancy, voice wake, mobile companions, or 25 chat platforms, use OpenClaw or Hermes. If you want a small, cost-capped, fully audited foundation you can bend to your shape, Solrac fits.
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
index 90e7a15..774b493 100644
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -64,17 +64,21 @@ src/
 ├── integrations.ts  — operator-authored TS modules + blessed built-ins;
 │                      returns SDK MCP tool definitions + tier map
 ├── agent.ts         — wires Claude Agent SDK; runs one turn
-├── local.ts         — local-engine runner; single-shot + tool-loop dispatcher
-│                      (consumes driver events from local-driver.ts)
-├── local-driver.ts  — backend driver abstraction; createOllamaDriver (NDJSON)
-│                      + createLmstudioDriver (SSE); emits LocalChatEvent union
-├── local-tools.ts   — local-engine tool-loop driver (mcpToLocalTools, runToolLoop,
+├── engine.ts        — engine-slot runner; single-shot + tool-loop dispatcher
+│                      (consumes driver events from local-driver / remote-driver)
+├── engine-driver.ts — shared backend abstraction (EngineDriver, EngineChatEvent,
+│                      EngineDriverError); driver.mode = "local" | "remote"
+├── local-driver.ts  — on-host backends: createOllamaDriver (NDJSON)
+│                      + createLmstudioDriver (SSE); driver.mode = "local"
+├── remote-driver.ts — hosted backends: createOpenrouterDriver (SSE +
+│                      auth headers + cost capture); driver.mode = "remote"
+├── engine-tools.ts  — engine-slot tool-loop driver (mcpToEngineTools, runToolLoop,
 │                      executeToolCall — policy + broker per call)
 │
 ├── commands.ts      — slash command parser + dispatcher
 │                      (/clear, /compact, /context, /help, /status, /tasks)
 ├── skills.ts        — load SKILL.md files; expose as / commands
-├── skill-tools.ts   — bridge tool:true skills to the local tool catalog as
+├── skill-tools.ts   — bridge tool:true skills to the engine-slot tool catalog as
 │                      skills__; AsyncLocalStorage for per-turn context
 ├── scheduler.ts     — load TASK.md files; fire on schedule via the queue
 │
@@ -106,10 +110,13 @@ markdown → telegram (htmlEscape only)
 policy        → db + telegram + log + config
 integrations  → log
 agent         → session + policy + telegram + log + markdown + instance
agent → session + policy + telegram + log + markdown + instance -local-driver → log -local-tools → policy + log + telegram (types) + integrations + local-driver -local → session + policy + telegram + log + markdown + - local-driver + local-tools + skill-tools + integrations + instance +engine-driver → log +local-driver → log + engine-driver +remote-driver → log + engine-driver +engine-tools → policy + log + telegram (types) + integrations + engine-driver +engine → session + policy + telegram + log + markdown + instance + + engine-driver + local-driver + remote-driver + + engine-tools + skill-tools + integrations poll → telegram + db + log skills → log + telegram (types) commands → agent + policy + db + telegram + skills + scheduler @@ -265,7 +272,7 @@ The body is a prompt template; `{{args}}` is the only placeholder and is replace - **Claude tiers (`primary` / `secondary`).** `runSkill` in `commands.ts`. Pre-flight cost cap (chat + global; cap-rejected skills cost $0), then `query()` with `maxTurns: skill.maxTurns`, no `resume` (fresh isolated turn), `tools: { type: "preset", preset: "claude_code" }`, `disallowedTools: ["Agent","Task"]` (sub-agents off; belt-and-suspenders with `policy.ts::SUBAGENT_DENY_TOOLS`). The interactive `canUseTool` factory + `PreToolUse` / `PostToolUse` / `PostToolUseFailure` hooks come from `deps.createCanUseTool` / `policy.ts` — same instances `runAgent` uses, so cost cap, loop detector, and the Telegram-confirm UX behave identically inside a skill. When integrations are loaded, `deps.mcpServer` is wired so the body sees `mcp__solrac__` tools too. Audit row tagged `claude:::skill:`; mid-turn cap or loop denials get promoted into `error_message` as `policy_deny:: …`. - **Local tier.** `runLocalSkill` (and the bare `runSkillBare` helper) in `commands.ts`. 
The helper dispatches on whether `LocalSkillDeps` has `tools + toolTiers + broker` wired: - - **Tools wired** → `runSkillBareWithTools` routes the body through the same `runToolLoop` driver that `runLocalTurnWithTools` uses. `maxIterations = skill.maxTurns`, fresh loop detector, full `mcp__solrac__*` + `skills__*` catalog with the skill's own `skills__` entry filtered out (recursion guard — see below). No history, no SOLRAC.md overlay, no streaming stub. + - **Tools wired** → `runSkillBareWithTools` routes the body through the same `runToolLoop` driver that `runEngineTurnWithTools` uses. `maxIterations = skill.maxTurns`, fresh loop detector, full `mcp__solrac__*` + `skills__*` catalog with the skill's own `skills__` entry filtered out (recursion guard — see below). No history, no SOLRAC.md overlay, no streaming stub. - **Tools absent** → fall through to a single-shot backend round trip (`stream: false`; NDJSON `/api/chat` for Ollama, SSE `/v1/chat/completions` for LMStudio). Preserves pure text-transform skills (no `requires:`, `max_turns: 1`) at minimum latency. Either way: audit row tagged `local:::skill:` with `cost_usd: 0`. Pre-flight Claude cap is skipped (a chat throttled by Claude burn shouldn't lose access to free local inference). @@ -274,7 +281,7 @@ Reply for both: model output verbatim, HTML-escaped, truncated to ≈3,500 chars **Skills as tools (Phase 1: local engine only).** Distinct axis from "skills using tools" above — *that* is shipped on both tiers. *This* is whether the local agent can call a skill **by name** as a tool entry in its catalog. A skill with `tool: true` is exposed as a callable MCP tool to the local agent (`skill-tools.ts::buildSkillTools`). The model sees it in its tool catalog as `mcp__solrac__skills__` (wire format on the local engine: `skills__`) with the operator-authored description. Tool dispatch: 1. 
**Catalog merge.** At boot, eligible skills (`tool: true && tier: local`) become `SdkMcpToolDefinition` entries with input schema `{ args: string }`. They're merged into `integrationTools` and `integrationToolTiers` (all `auto`-allow) before `localDeps` is constructed. -2. **Per-turn context propagation.** `runLocalTurnWithTools` wraps the loop in `skillToolCtx.run({chatId, fromId, updateId, parentAuditId}, () => runToolLoop(...))`. The skill handler reads the store via `AsyncLocalStorage.getStore()` — needed because the SDK tool-handler signature `(args, extra) => ...` leaves no slot for chat context, and concurrent turns require race-free context (the queue runs N chats in parallel). ALS is the standard Node primitive for this. +2. **Per-turn context propagation.** `runEngineTurnWithTools` wraps the loop in `skillToolCtx.run({chatId, fromId, updateId, parentAuditId}, () => runToolLoop(...))`. The skill handler reads the store via `AsyncLocalStorage.getStore()` — needed because the SDK tool-handler signature `(args, extra) => ...` leaves no slot for chat context, and concurrent turns require race-free context (the queue runs N chats in parallel). ALS is the standard Node primitive for this. 3. **Handler.** Reads ALS context, calls `runSkillBare`, writes a fresh audit row with `origin='tool_call'` so operators can distinguish agent-driven invocations from operator-typed `/` calls (`origin='user'`). Returns the model's text as the tool result; the parent local turn composes its final user-facing reply on top. 4. **Permission tier.** Auto-allow. Cost cap is the backstop (Phase 1 local-tier skills are free; Phase 2 unlocks Claude-tier skills with a per-skill cost cap). @@ -421,12 +428,12 @@ Tools surface to the model as `mcp__solrac__`. The full picture: ### Local-engine scope -`runLocalTurn` in `local.ts` branches on `LOCAL_TOOLS_ENABLED`. 
The wire-format work lives in `local-driver.ts`'s `LocalDriver` interface — `createOllamaDriver` (NDJSON `/api/chat`) and `createLmstudioDriver` (SSE `/v1/chat/completions`, with `parallel_tool_calls: false` Gemma-4 workaround + tool-call arg-delta accumulation + `[DONE]` terminator handling). Both drivers emit a uniform `LocalChatEvent` union (`{ kind: "text" | "tool_call" | "done" | "error", ... }`); `local.ts` and `local-tools.ts` are wire-format-agnostic above that line. +`runEngineTurn` in `engine.ts` branches on `LOCAL_TOOLS_ENABLED`. The wire-format work lives behind the `EngineDriver` interface in `engine-driver.ts`, implemented by `local-driver.ts` (`createOllamaDriver` — NDJSON `/api/chat`; `createLmstudioDriver` — SSE `/v1/chat/completions`, with `parallel_tool_calls: false` Gemma-4 workaround + tool-call arg-delta accumulation + `[DONE]` terminator handling) and `remote-driver.ts` (`createOpenrouterDriver` — SSE with auth + attribution headers + trailing `cost` capture). All three drivers emit a uniform `EngineChatEvent` union (`{ kind: "text" | "tool_call" | "done" | "error", ... }`); `engine.ts` and `engine-tools.ts` are wire-format-agnostic above that line. -- **Tools off (default for Claude-only deploys):** single-shot streaming through the driver. No tools exposed; `audit.tool_calls` is `null`. The capability note (`local.ts::buildLocalCapabilityNote`) tells the model it has no tools and nudges users toward `@`/`!` for tool-shaped requests. -- **Tools on (recommended for the local-default deploy; precondition: `SOLRAC_INTEGRATIONS_ENABLED=true`):** multi-round tool loop in `src/local-tools.ts::runToolLoop`. The local model receives the same `mcp__solrac__*` integration tools the Claude tiers see, with per-call gating reused from `policy.ts` (`classifyToolWithIntegrations`, the `LoopDetector`, the `ConfirmationBroker`). `LOCAL_MAX_TOOL_ITERATIONS` (default 8) backstops a single shared `AbortSignal` covering every fetch in the turn. 
`audit.tool_calls` records the executed calls. The capability note advertises the loaded tool names so the model knows what it can call. +- **Tools off (default for Claude-only deploys):** single-shot streaming through the driver. No tools exposed; `audit.tool_calls` is `null`. The capability note (`local-driver.ts::buildLocalCapabilityNote` for local mode, `remote-driver.ts::buildRemoteCapabilityNote` for remote mode; selected by `driver.mode`) tells the model it has no tools and nudges users toward `@`/`!` for tool-shaped requests. +- **Tools on (recommended for the local-default deploy; precondition: `SOLRAC_INTEGRATIONS_ENABLED=true`):** multi-round tool loop in `src/engine-tools.ts::runToolLoop`. The engine-slot model receives the same `mcp__solrac__*` integration tools the Claude tiers see, with per-call gating reused from `policy.ts` (`classifyToolWithIntegrations`, the `LoopDetector`, the `ConfirmationBroker`). `LOCAL_MAX_TOOL_ITERATIONS` (default 8) backstops a single shared `AbortSignal` covering every fetch in the turn. `audit.tool_calls` records the executed calls. The capability note advertises the loaded tool names so the model knows what it can call. -Both paths share the audit row format, the streaming stub UX, the cost-cap-doesn't-apply rule (`cost_usd = 0`), the cross-engine context bridge, and the `disallowedTools` belt-and-suspenders (`LOCAL_DENY_TOOLS` mirrors `agent.ts`'s SDK-level `disallowedTools: ["Agent","Task"]`). Reliability of local-engine tool-calling varies sharply by model — `gemma4:e4b` (Ollama) is the recommended baseline; LMStudio additionally needs the driver's identical-`(name, args)` dedup to work around Gemma-4's duplicate-tool-call quirk (lmstudio-bug-tracker #1756). +Both paths share the audit row format, the streaming stub UX, the cross-engine context bridge, and the `disallowedTools` belt-and-suspenders (`ENGINE_DENY_TOOLS` in `engine-tools.ts` mirrors `agent.ts`'s SDK-level `disallowedTools: ["Agent","Task"]`). 
Local-mode rows write `cost_usd = 0` and bypass the cap window; remote-mode rows carry driver-reported cost and gate against the same per-chat + global caps as Claude. Reliability of engine-slot tool-calling varies sharply by model — `gemma4:e4b` (Ollama) is the recommended local baseline; LMStudio additionally needs the driver's identical-`(name, args)` dedup to work around Gemma-4's duplicate-tool-call quirk (lmstudio-bug-tracker #1756). On OpenRouter, every frontier model handles tool-calling cleanly (Claude, GPT-4o, Gemini, Llama 3.3 70B). --- @@ -687,7 +694,7 @@ Global is checked first because if the host is over its absolute budget, every c **v1 limitation:** both caps measure Anthropic API spend only. Tools that call paid third-party APIs (e.g. a `replicate` CLI) aren't measured; auto-deny rules in the classifier are the v1 mitigation. See [`ROADMAP.md` OQ#5 — cost surprises beyond Anthropic](./ROADMAP.md#oq5-cost-surprises-beyond-anthropic). -**Local-engine tool calls are NOT gated by either cost cap.** The local engine is free; the cap exists to bound Anthropic spend. The `LOCAL_MAX_TOOL_ITERATIONS` ceiling and the per-turn loop detector are the runaway-loop defenses for the local path. Confirm-tier tools still go through the same `ConfirmationBroker` regardless of engine. +**Engine-slot tool calls in local mode are NOT gated by either cost cap.** Local-mode rows write `cost_usd = 0`; the cap exists to bound paid spend. The `LOCAL_MAX_TOOL_ITERATIONS` ceiling and the per-turn loop detector are the runaway-loop defenses for the local path. In remote mode, OpenRouter cost lands in `audit.cost_usd` and participates in the same per-chat + global hourly caps as Claude burn. Confirm-tier tools still go through the same `ConfirmationBroker` regardless of engine or mode. 
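The cap semantics described here can be shown over in-memory rows:

- a sliding one-hour window;
- the global cap checked before the per-chat cap;
- `null` costs adding nothing to `COALESCE(SUM(cost_usd), 0)`;
- local-mode rows adding 0.

This is a sketch only. The `AuditRow` shape, the `>=` comparison and the exclusive window boundary are assumptions. The real check runs as a SQL query against the audit table.

```typescript
// Illustrative sliding-window cap check over audit rows. The row shape,
// the >= comparison, and the window boundary are assumptions.
interface AuditRow {
  chatId: number;
  createdAtMs: number;
  costUsd: number | null; // 0 for local-mode rows, null when cost was unreported
}

const HOUR_MS = 60 * 60 * 1000;

// Equivalent of COALESCE(SUM(cost_usd), 0) over the window: null rows add nothing.
function windowSpendUsd(rows: AuditRow[], nowMs: number, chatId?: number): number {
  let sum = 0;
  for (const row of rows) {
    if (row.createdAtMs <= nowMs - HOUR_MS) continue; // outside the sliding hour
    if (chatId !== undefined && row.chatId !== chatId) continue;
    if (row.costUsd != null) sum += row.costUsd;
  }
  return sum;
}

type CapDecision =
  | { allowed: true }
  | { allowed: false; reason: "global_cap" | "chat_cap"; spentUsd: number };

function checkHourlyCaps(
  rows: AuditRow[],
  chatId: number,
  nowMs: number,
  chatCapUsd: number,
  globalCapUsd: number,
): CapDecision {
  // Global first: if the host is over budget, every chat is denied anyway.
  const globalSpent = windowSpendUsd(rows, nowMs);
  if (globalSpent >= globalCapUsd) {
    return { allowed: false, reason: "global_cap", spentUsd: globalSpent };
  }
  const chatSpent = windowSpendUsd(rows, nowMs, chatId);
  if (chatSpent >= chatCapUsd) {
    return { allowed: false, reason: "chat_cap", spentUsd: chatSpent };
  }
  return { allowed: true };
}
```

Because OpenRouter and Claude rows land in the same column, a remote-mode chat is throttled by exactly the same arithmetic as a Claude-heavy one.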

 ---

@@ -695,33 +702,40 @@ Global is checked first because if the host is over its absolute budget, every c
 ## Engine routing (prefix table)

-The first non-whitespace character of `msg.text` picks the engine; with no prefix, `SOLRAC_DEFAULT_ENGINE` (default `local`) decides. The default routes no-prefix messages to the local-engine path, so Anthropic burn happens only on a deliberate `@` (Sonnet) or `!` (Opus). The local backend is picked at deploy time via `LOCAL_BACKEND` (`ollama` | `lmstudio`); the engine layer is backend-agnostic.
+The first non-whitespace character of `msg.text` picks the engine; with no prefix, `SOLRAC_DEFAULT_ENGINE` (default `local`) decides. The default routes no-prefix messages to the engine-slot path (Ollama / LMStudio / OpenRouter), so Anthropic burn happens only on a deliberate `@` (Sonnet) or `!` (Opus). The engine-slot backend is picked at deploy time via `LOCAL_BACKEND` (`ollama` | `lmstudio`) **or** `REMOTE_BACKEND` (`openrouter`); the engine layer is backend-agnostic.
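
A minimal sketch of the prefix rule, shaped after the `{ engine, explicit, prompt }` result that `policy.ts::parseEnginePrefix` is documented to return. It is a hypothetical re-implementation; in particular, trimming the remaining prompt is an assumption, and the real function may handle whitespace differently.

```typescript
// Hypothetical re-implementation for illustration; not the actual policy.ts code.
type Engine = "primary" | "secondary" | "local";

interface ParsedPrefix {
  engine: Engine;
  explicit: boolean; // true only when "@" or "!" was consumed
  prompt: string;
}

// Map a leading character to an explicit Claude tier, or null for "no prefix".
function prefixEngine(ch: string): Engine | null {
  switch (ch) {
    case "@":
      return "primary";
    case "!":
      return "secondary";
    default:
      return null;
  }
}

function parseEnginePrefix(text: string, defaultEngine: Engine): ParsedPrefix {
  const body = text.replace(/^\s+/, ""); // the first non-whitespace character decides
  const explicitEngine = prefixEngine(body.charAt(0));
  if (explicitEngine !== null) {
    return { engine: explicitEngine, explicit: true, prompt: body.slice(1).trim() };
  }
  // No prefix, including a literal leading ">": route to the configured default.
  return { engine: defaultEngine, explicit: false, prompt: body };
}
```

An empty explicit payload such as `"!"` yields `{ engine: "secondary", explicit: true, prompt: "" }`, which is the case `main.ts` answers with a usage hint.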

 | Prefix | Engine label | Model | Tools | Audit `model` value |
 |--------|--------------|-------|-------|---------------------|
-| (none) | depends on `SOLRAC_DEFAULT_ENGINE` (`local` by default) | `LOCAL_MODEL` on `LOCAL_BACKEND` for default-local; otherwise the matching tier model | integrations only on the local engine (when `LOCAL_TOOLS_ENABLED=true`); `claude_code` preset + integrations on Claude | matches the resolved engine |
+| (none), local mode | engine slot served by `LOCAL_BACKEND` (Ollama / LMStudio) | `LOCAL_MODEL` | integrations when `LOCAL_TOOLS_ENABLED=true` | `local:<backend>:<model>` |
+| (none), remote mode | engine slot served by `REMOTE_BACKEND` (OpenRouter) | `REMOTE_MODEL` (`<provider>/<model>` slug) | integrations when `LOCAL_TOOLS_ENABLED=true` | `remote:<backend>:<model>` |
 | `@` | `primary` (Claude) — escalation | `SOLRAC_PRIMARY_MODEL` (default `claude-sonnet-4-6`) | `claude_code` preset + integrations | `claude:primary:<model>` |
 | `!` | `secondary` (Claude) — heaviest | `SOLRAC_SECONDARY_MODEL` (default `claude-opus-4-7`) | `claude_code` preset + integrations | `claude:secondary:<model>` |

-There is no `>`-style escape prefix. A leading `>` is literal user text routed via no-prefix → `defaultEngine`. The local-engine path is reached only when it is the default engine.
+There is no `>`-style escape prefix. A leading `>` is literal user text routed via no-prefix → `defaultEngine`. The engine slot is reached only when it is the default engine.

 `policy.ts::parseEnginePrefix(text, defaultEngine)` returns `{ engine, explicit, prompt }`. `explicit` is true only when an actual prefix character (`@` or `!`) was consumed; `main.ts` uses it to render usage hints on empty explicit-prefix payloads.

-**Design rationale.** *Claude only when explicitly requested.* Anthropic burn happens on a deliberate `@` or `!`; everything else stays local and free. The integration surface (operator-authored + blessed `mcp__solrac__*` tools) is shared across all three engines — the local engine gets it via `LOCAL_TOOLS_ENABLED=true`, both Claude tiers get it via the `claude_code` preset.
+**Local vs remote mode.** The internal `Engine = "primary" | "secondary" | "local"` union is unchanged — adding OpenRouter did NOT add a new engine. Each `EngineDriver` carries a `mode: "local" | "remote"` field set by its factory (`createOllamaDriver`/`createLmstudioDriver` → `"local"`; `createOpenrouterDriver` → `"remote"`). `runEngineTurn` reads `driver.mode` directly — there's no parallel `mode` field on the deps. Mode drives three behaviors: audit-tag prefix (`local:` vs `remote:`), capability-note cost framing (free vs per-token), and the `cost_usd` write decision (always 0 for local; driver-reported for remote; `null` + `remote.cost_missing` warn if the driver returned no cost). The runner has one code path; mode is a one-line discriminator on the driver.
+
+**Design rationale.** *Claude only when explicitly requested.* Anthropic burn happens on a deliberate `@` or `!`; everything else stays in the engine slot (free in local mode, per-token in remote mode). The integration surface (operator-authored + blessed `mcp__solrac__*` tools) is shared across all three engines — the engine slot gets it via `LOCAL_TOOLS_ENABLED=true` regardless of mode, both Claude tiers get it via the `claude_code` preset.

 **Boot validation enforces reachability:**

-- `defaultEngine === "local" && !localEnabled` → throw (the default would error every turn).
-- `defaultEngine !== "local" && localToolsEnabled` → throw (the local engine runs only as the default; tools-on without it being the default would load tool schemas no engine can call).
+- `defaultEngine === "local" && !localEnabled && !remoteEnabled` → throw (the default would error every turn).
+- `defaultEngine !== "local" && localToolsEnabled` → throw (the engine slot runs only as the default; tools-on without it being the default would load tool schemas no engine can call).
+- `localEnabled && remoteEnabled` → throw (mutually exclusive — the engine slot has one driver per boot).
 - `localEnabled === true && (!localBackend || !localModel)` → throw (the backend driver can't be constructed).
+- `remoteEnabled === true && (!remoteBackend || !remoteModel || !remoteApiKey)` → throw (the OpenRouter driver can't be constructed).
 - `SOLRAC_DEFAULT_ENGINE=ollama` (legacy) → throw with a rename hint pointing at `local` + `LOCAL_BACKEND=ollama`. Same for every legacy `OLLAMA_*` env var.

-When `defaultEngine === "local"`, boot fires a one-shot backend health probe via `driver.probe()` (`/api/tags` for Ollama, `/v1/models` for LMStudio); failures are logged (`local.boot_health_failed`) but non-fatal — the backend may come up after Solrac under systemd, and we don't want to crash the unit on a transient.
+When `defaultEngine === "local"`, boot fires a one-shot health probe via `driver.probe()` against whichever backend is wired (`/api/tags` for Ollama, `/v1/models` for LMStudio, `/models` under `REMOTE_BASE_URL` with bearer auth for OpenRouter); failures are logged (`engine.boot_health_failed`) but non-fatal — the backend may come up after Solrac under systemd, and we don't want to crash the unit on a transient.
+
+**Cost capture (remote mode).** OpenRouter's streaming SSE response includes a `cost` (USD) field in the trailing usage chunk automatically — no opt-in required as of 2026 ([OpenRouter docs](https://openrouter.ai/docs/guides/administration/usage-accounting)). `EngineChatEvent.done` carries `costUsd: number | null`; `engine.ts::resolveAuditCost` writes it to `audit.cost_usd`. The existing `HOURLY_COST_CAP_USD` + `GLOBAL_HOURLY_COST_CAP_USD` queries sum `cost_usd` indiscriminately, so remote burn is gated alongside Claude burn — no new cost-cap knob. Defensive: if the driver returns `null` (field disappeared, parse error), the runner writes `null` (NOT 0) so the row stays distinguishable from a genuinely free local-mode turn. SQL `SUM` skips NULLs, so `COALESCE(SUM(cost_usd), 0)` still counts such a row as $0 toward the caps; the `remote.cost_missing` warn that fires on every such row is the signal that the cap window is under-counting.

 ```
 poll → gate → throttle → queue.enqueue
                            └─ runTurn (queued)
-                                ├─ engine === 'local' → runLocalTurn
+                                ├─ engine === 'local' → runEngineTurn
                                 └─ 'primary' | 'secondary' → runAgent({engine, ...})
 ```
@@ -799,11 +813,12 @@ Pre-tier rows ran on the then-default `SOLRAC_MODEL=claude-opus-4-7`, which is n
 The local engine is the default in the recommended config (`SOLRAC_DEFAULT_ENGINE=local`). No-prefix messages route here; Claude tiers are reached via explicit `@` / `!`. There is no `>`-style escape prefix — the local engine runs only as the default, so an extra prefix character would be redundant.

-Backend selection sits one layer below the engine. `LOCAL_BACKEND` (`ollama` | `lmstudio`) picks the wire driver in `local-driver.ts`:
+Backend selection sits one layer below the engine. `LOCAL_BACKEND` (`ollama` | `lmstudio`) picks the on-host wire driver in `local-driver.ts`; `REMOTE_BACKEND=openrouter` picks the hosted driver in `remote-driver.ts`:

 - `ollama` — NDJSON `/api/chat`, probe `/api/tags`; default port 11434.
 - `lmstudio` — SSE `/v1/chat/completions` (with `parallel_tool_calls: false` Gemma-4 workaround + tool-call argument-delta accumulation + `[DONE]` terminator + optional trailing `usage` chunk), probe `/v1/models`; default port 1234.
+- `openrouter` — SSE `/chat/completions` (OpenAI-compatible; bearer auth + `HTTP-Referer`/`X-Title` attribution headers; trailing `usage.cost` chunk captured to `audit.cost_usd`), probe `/models`; both paths are relative to `REMOTE_BASE_URL`, so `/api/v1/chat/completions` and `/api/v1/models` with the default.

-The `LocalDriver.streamChat` interface emits a uniform `LocalChatEvent` union (`{ kind: "text" | "tool_call" | "done" | "error", ... }`); everything above the driver layer (`local.ts`, `local-tools.ts`, `skill-tools.ts`) is wire-format-agnostic. Adding a third backend (vLLM, llama.cpp) means writing one more `create<Backend>Driver` and registering it in the factory.
+The `EngineDriver.streamChat` interface (defined in `engine-driver.ts`) emits a uniform `EngineChatEvent` union (`{ kind: "text" | "tool_call" | "done" | "error", ... }`); everything above the driver layer (`engine.ts`, `engine-tools.ts`, `skill-tools.ts`) is wire-format-agnostic. Adding a fourth backend (vLLM, llama.cpp, Together.ai) means writing one more `create<Backend>Driver` that sets `mode` to `"local"` or `"remote"`, then registering it in the matching factory.

 Motivation: (1) most casual chat doesn't need Claude's reasoning, so the free local path becomes the workhorse; (2) when `LOCAL_TOOLS_ENABLED=true`, the local model can call the same `mcp__solrac__*` integrations Claude does — the operator's tool surface is what makes default-local useful for tool-driven work.
@@ -818,8 +833,8 @@ Motivation: (1) most casual chat doesn't need Claude's reasoning, so the free lo
 - **No `canUseTool` / `PreToolUse` SDK hooks**: the SDK isn't in the loop. With `LOCAL_TOOLS_ENABLED=true`, the same gates run inside `runToolLoop` (cost cap doesn't apply since cost is zero, but `LoopDetector` and `ConfirmationBroker` do). With tools off, no gates run at all — there are no tool calls to gate.
 - **No `SessionStore` resume**: the backend chat endpoint is stateless per call (both Ollama and LMStudio). Conversation continuity comes from history reconstruction, not session IDs.
-- **No `claude_code` system-prompt preset**: local backends don't know it. The first `system` message is `${soul}\n\n${capabilityNote}` — the operator-editable `SOUL.md` text plus a one-line engine-specific clause built by `local.ts::buildLocalCapabilityNote` (which adapts based on whether tools are on, and whether the local engine is the default vs. an explicit escalation target). When `SOLRAC.md` is present and activated, its content ships as a second `system` message wrapped in `` (a separate turn rather than concatenated, since local models lack RLHF on instruction hierarchy).
-- **`cost_usd = 0`** in audit rows. Cost-cap queries sum over all rows so the local engine doesn't pollute the cap window — the per-chat and global cost caps are unaffected.
+- **No `claude_code` system-prompt preset**: engine-slot backends don't know it. The first `system` message is `${soul}\n\n${capabilityNote}` — the operator-editable `SOUL.md` text plus a one-line clause built by `local-driver.ts::buildLocalCapabilityNote` (mode = "local", "free" framing) or `remote-driver.ts::buildRemoteCapabilityNote` (mode = "remote", "per-token via OpenRouter" framing). Each builder adapts based on whether tools are on and whether the engine slot is the default vs. an explicit escalation target. The runner picks the right builder via `driver.mode`. When `SOLRAC.md` is present and activated, its content ships as a second `system` message wrapped in `` (a separate turn rather than concatenated, since many of these models, small local ones especially, lack RLHF on instruction hierarchy).
+- **`cost_usd = 0` in local mode**; **driver-reported cost in remote mode**. Cost-cap queries sum over all rows; local-mode rows are free (0), remote-mode rows carry OpenRouter's per-call cost so remote burn participates in the same per-chat + global hourly caps as Claude burn.
 - **`agent_session_id = null`** and **`tool_calls = null`** in audit rows when tools are off.

 ### Stateful conversation history
@@ -852,7 +867,7 @@ Default `LOCAL_HISTORY_LIMIT=6` = 3 round-trips. At 256-char truncated prompts
 - **OQ-A**: history is per-chat across all local models. If we later add `>gemma3 ...` vs `>qwen2.5 ...` model selection, the query needs `AND model = ?`.
 - **OQ-B**: history is capped by *count*, not tokens. A 2k-context model will silently truncate.
 - **OQ-C**: per-local-engine concurrency cap. Today the local engine shares the global `MAX_CONCURRENT_TURNS=4` semaphore with Claude. Local inference is GPU-bound; 4 simultaneous local streams thrash a single GPU. Add a separate `MAX_CONCURRENT_LOCAL_TURNS` semaphore in front of the local path if measured.
-- **OQ-D**: no inference-budget cap. The local engine is free, but a flooder could pin the GPU. Allowlist gates strangers.
+- **OQ-D**: no inference-budget cap. The engine slot in local mode is free, but a flooder could pin the GPU. Allowlist gates strangers. Remote-mode OpenRouter burn is already gated by the per-chat + global hourly cost caps.

 ---

@@ -1062,11 +1077,11 @@ If we reordered (offset before claim), a crash between steps 2 and 3 would re-pr
 **Problem.** Ollama signals request-level errors via HTTP status codes (caught by `LocalDriverError` in `streamChat`'s pre-stream `res.ok` check). **LMStudio does not.** Context-length overruns, prompt-template mismatches, and similar server-side rejections arrive with **HTTP 200** and a single SSE frame shaped `{"error":{"message":"…"},"message":"…"}`, often followed by `[DONE]` or an immediate stream close. The driver's frame parser previously only knew about `model`/`choices`/`usage`, so these frames fell through, the consumer saw zero events, and the UI rendered `(empty response)` with no diagnostic — a silent failure where the operator had no way to know LMStudio actually told us what went wrong.

-**Solution.** `LmstudioSseFrame` has an `error?: { message?: string } | string` field; the parse loop in `createLmstudioDriver` checks `frame.error` immediately after JSON-parsing each frame and yields `{ kind: "error", message }` (mirroring the Ollama-side `frame.error` branch at the top of `streamChat`). `runStreamingRound` then breaks on `kind:"error"` and the message propagates through `errorMessage` to the `local.done` audit row and the rendered `❌ error: …` reply.
+**Solution.** `LmstudioSseFrame` has an `error?: { message?: string } | string` field; the parse loop in `createLmstudioDriver` checks `frame.error` immediately after JSON-parsing each frame and yields `{ kind: "error", message }` (mirroring the Ollama-side `frame.error` branch at the top of `streamChat`). `runStreamingRound` then breaks on `kind:"error"` and the message propagates through `errorMessage` to the `engine.done` audit row and the rendered `❌ error: …` reply.

-**Diagnostic layer.** A separate guard, `maybeLogEmptyStream` in `local-driver.ts`, captures up to 30 raw `data:` payloads (truncated to 400 chars each) and emits `local.lmstudio_empty_stream` at warn level if both text and tool-call counters end at zero. This is the safety net for *future* unknown frame shapes — if LMStudio adds a new error envelope or content channel, the raw frames land in the log so the cause is recoverable without a wire-capture rerun. Happy path is silent; only fires when no events were yielded.
+**Diagnostic layer.** A separate guard, `maybeLogEmptyStream` in `engine-driver.ts` (shared between local + remote drivers), captures up to 30 raw `data:` payloads (truncated to 400 chars each) and emits `<backend>.empty_stream` at warn level (`lmstudio.empty_stream` from `local-driver.ts`, `openrouter.empty_stream` from `remote-driver.ts`) if both text and tool-call counters end at zero. This is the safety net for *future* unknown frame shapes — if LMStudio or OpenRouter adds a new error envelope or content channel, the raw frames land in the log so the cause is recoverable without a wire-capture rerun. Happy path is silent; only fires when no events were yielded.

-**Implication.** Adding a new LMStudio response shape requires either (a) extending `LmstudioSseFrame` with the new field and adding a parse branch, or (b) extending the error detection. Either way, when a turn fails, **check `local.lmstudio_empty_stream` warns first** — they're the canary for protocol drift between releases.
+**Implication.** Adding a new LMStudio response shape requires either (a) extending `LmstudioSseFrame` with the new field and adding a parse branch, or (b) extending the error detection. Either way, when a turn fails, **check `lmstudio.empty_stream` warns first** (`openrouter.empty_stream` in remote mode) — they're the canary for protocol drift between releases.

 ---

@@ -1151,7 +1166,7 @@ Off by default. Enabled via `SOLRAC_WEB_ENABLED=true` plus a token. Brings a bro
 ### How it preserves the existing path

-`agent.ts` and `local.ts` already accept any `TelegramClient`. main.ts builds a parallel `WebClient`, a parallel `commandDeps` (with `tg = webClient`), a parallel `LocalRunDeps`, and a parallel `ConfirmationBroker` (also pointed at `webClient`). The single turn queue's `runTurn` dispatches to the web variants when the synthetic `webChatId` is on the update; otherwise the Telegram path runs unchanged.
+`agent.ts` and `engine.ts` already accept any `TelegramClient`. main.ts builds a parallel `WebClient`, a parallel `commandDeps` (with `tg = webClient`), a parallel `EngineRunDeps`, and a parallel `ConfirmationBroker` (also pointed at `webClient`). The single turn queue's `runTurn` dispatches to the web variants when the synthetic `webChatId` is on the update; otherwise the Telegram path runs unchanged.

 ```
 Browser ──HTTP──▶ web.ts (Bun.serve, separate port)
@@ -1165,7 +1180,7 @@ Browser ──HTTP──▶ web.ts (Bun.serve, separate port)
                     │
                     │ runTurn dispatches by chatId → webRunTurn / tgRunTurn
 ◀──events──── WebClient (TelegramClient impl)
                     │
-                    └─▶ runAgent / runLocalTurn (tg = webClient)
+                    └─▶ runAgent / runEngineTurn (tg = webClient)
                         audit row written, cost cap, policy hooks — all unchanged
 ```
@@ -1173,7 +1188,7 @@ Browser ──HTTP──▶ web.ts (Bun.serve, separate port)
 Telegram's HTML parse_mode supports a small subset (``). `agent.ts:495` previously emitted `htmlEscapeText(text)` on Claude's body, which preserved markdown syntax as literal characters in Telegram. The fix:

-- `agent.ts` and `local.ts` now run the response body through `mdToTelegramHtml(text)` for Telegram (proper bold, italic, code blocks; lists flattened to `• item`; headers to ``; tables to ASCII inside `<pre>`).
+- `agent.ts` and `engine.ts` now run the response body through `mdToTelegramHtml(text)` for Telegram (proper bold, italic, code blocks; lists flattened to `• item`; headers to ``; tables to ASCII inside `<pre>`).
 - `SendMessageOpts` and `EditMessageTextOpts` carry an optional `markdownSource: string` sidecar. The real Telegram client (`telegram.ts:205-215`) destructures-and-drops it before `tgCall` — never hits the wire.
 - `WebClient` reads `markdownSource` preferentially; consumer (browser) renders it with `marked` + `sanitizeHtml`. If absent, the html-fallback (already sanitized at the SSE boundary) is used.
 
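A minimal sketch of the `markdownSource` sidecar pattern described above. The trimmed-down `SendMessageOpts` and the two helpers (`toWireParams`, `webBody`) are hypothetical; the real `telegram.ts` and `WebClient` code is not shown in this doc.

```typescript
// Hypothetical, trimmed-down options; the real SendMessageOpts has more fields.
interface SendMessageOpts {
  parse_mode?: "HTML";
  reply_to_message_id?: number;
  markdownSource?: string; // sidecar for WebClient; must never reach the Bot API
}

// Telegram path: destructure the sidecar out before building the wire payload.
function toWireParams(chatId: number, html: string, opts: SendMessageOpts = {}) {
  const { markdownSource: _dropped, ...wire } = opts;
  return { chat_id: chatId, text: html, ...wire };
}

// Web path: prefer the markdown source, else fall back to the sanitized HTML.
function webBody(
  html: string,
  opts: SendMessageOpts = {},
): { kind: "markdown" | "html"; body: string } {
  if (opts.markdownSource !== undefined) {
    return { kind: "markdown", body: opts.markdownSource };
  }
  return { kind: "html", body: html };
}
```

Destructuring at the wire boundary keeps the sidecar a purely in-process field. Callers on both paths share one options type, and only the Telegram client has to know what Telegram accepts.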
diff --git a/docs/CONFIG.md b/docs/CONFIG.md
index 1c780a9..a512766 100644
--- a/docs/CONFIG.md
+++ b/docs/CONFIG.md
@@ -29,6 +29,16 @@ Every Solrac knob is an environment variable, validated and frozen at boot by `s
 | `LOCAL_HISTORY_LIMIT` | no | `6` | positive int | Last N successful turns reconstructed as conversation context per chat (cross-engine: includes Claude turns). At 256-char prompts × 6 turns ≈ ~3k tokens worst case. If you flip `LOCAL_TOOLS_ENABLED` off→on on an existing chat, prior "I do not have tools" turns get replayed and the model learns to refuse — use `/clear local` to wipe the chat's local history. |
 | `LOCAL_TOOLS_ENABLED` | no | `false` | boolean | Local model can call the same `mcp__solrac__*` integration tools the Claude tiers see. Requires `SOLRAC_INTEGRATIONS_ENABLED=true` AND `SOLRAC_DEFAULT_ENGINE=local` (boot rejects the unreachable `default!=local && tools=on` combo). Recommended `true` for local-default deploys. |
 | `LOCAL_MAX_TOOL_ITERATIONS` | no | `8` | positive int | Hard ceiling on tool-loop rounds per turn. Loop detector fires earlier on duplicate calls; this is the runaway-loop backstop. Iteration cap surfaces as `⚠️ stopped after N tool iterations`. |
+| `REMOTE_ENABLED` | no | `false` | boolean | Master switch for the remote-backend path (OpenRouter). **Mutually exclusive with `LOCAL_ENABLED`** — boot rejects both true. When `true`, the engine slot (`SOLRAC_DEFAULT_ENGINE=local` routing) dispatches to OpenRouter instead of an on-host LLM. Audit tag becomes `remote:<backend>:<model>` so cross-engine queries pattern-match correctly. `REMOTE_BACKEND`, `REMOTE_MODEL`, `REMOTE_API_KEY` MUST be set. |
+| `REMOTE_BACKEND` | when `REMOTE_ENABLED=true` | — | `openrouter` | Remote provider. Today only OpenRouter is supported; the type is provider-neutral for future vLLM/Anyscale/Together/Groq additions. |
+| `REMOTE_MODEL` | when `REMOTE_ENABLED=true` | — | string | OpenRouter slug (`<provider>/<model>`). Examples: `anthropic/claude-3.5-sonnet`, `openai/gpt-4o-mini`, `meta-llama/llama-3.3-70b-instruct`. Browse the full list at `https://openrouter.ai/models`. The `/` separator is preserved across the audit log; nothing parses `model` by splitting on `/`. |
+| `REMOTE_API_KEY` | when `REMOTE_ENABLED=true` | — | string | OpenRouter API key (typically prefixed `sk-or-`). Get one at `https://openrouter.ai/keys`. **Scrubbed** from the SDK-spawned `claude` subprocess env (`agent.ts::sanitizedSubprocessEnv` strips the entire `REMOTE_*` prefix) so a compromised model can't exfiltrate the billed credential via an auto-allowed `Bash(echo $REMOTE_API_KEY)`. |
+| `REMOTE_BASE_URL` | no | `https://openrouter.ai/api/v1` | url | Override for proxies or staging. Trailing slash stripped at boot; URL validated (http/https scheme required). |
+| `REMOTE_TIMEOUT_MS` | no | `60000` (or `120000` when `LOCAL_TOOLS_ENABLED=true`) | positive int | Mirrors `LOCAL_TIMEOUT_MS` semantics. |
+| `REMOTE_HISTORY_LIMIT` | no | `6` | positive int | Mirrors `LOCAL_HISTORY_LIMIT` — last N successful turns reconstructed as conversation context per chat (cross-engine). `/clear local` wipes the slot for remote turns too via the triple-pattern LIKE clause in `db.hasLocalTurnsSince` / `db.outOfBandForEngine`. |
+| `REMOTE_MAX_TOOL_ITERATIONS` | no | `8` | positive int | Mirrors `LOCAL_MAX_TOOL_ITERATIONS`. Per-round cost from each iteration is summed into `audit.cost_usd` so the hourly cap gates the full tool-loop burn. |
+| `REMOTE_HTTP_REFERER` | no | `https://github.com/cjus/solrac` | string | Optional OpenRouter attribution header (site URL), used to credit the calling app in OpenRouter's rankings. Override for branded forks. |
+| `REMOTE_X_TITLE` | no | `solrac` | string | Counterpart to `REMOTE_HTTP_REFERER` (site title shown in OpenRouter's rankings). |
 | `SOLRAC_SKILLS_ENABLED` | no | `false` | boolean | Master switch for operator-defined skills. When `true`, Solrac discovers `SKILL.md` files under `SOLRAC_SKILLS_DIR` at boot and exposes each as a `/<name>` slash command. |
 | `SOLRAC_SKILLS_DIR` | no | `./skills` | path | Directory scanned for `<name>/SKILL.md` files. Resolved relative to `SOLRAC_HOME`. Loaded ONCE at boot — edit files and restart. See [USAGE.md#skills-operator-defined-commands](./USAGE.md#skills-operator-defined-commands). |
 | `SOLRAC_TASKS_ENABLED` | no | `false` | boolean | Master switch for scheduled tasks. When `true`, Solrac discovers `TASK.md` files under `SOLRAC_TASKS_DIR` at boot and fires each on its configured schedule (5-field unix `cron:` or absolute `at:`). Fires synthesize updates through the existing turn queue, so cost caps + allowlist gate + policy hooks all apply automatically. |
@@ -55,10 +65,12 @@ Every Solrac knob is an environment variable, validated and frozen at boot by `s
 - **Webhook constraint:** when `SOLRAC_TRANSPORT=webhook`, `TG_WEBHOOK_SECRET` must be set and ≥32 characters.
 - **Legacy `OLLAMA_*` env var rejection:** any `OLLAMA_*` env var still set at boot causes Solrac to fail loud with the full list and a rename mapping (`OLLAMA_ENABLED` → `LOCAL_ENABLED`, etc., plus `add LOCAL_BACKEND=ollama`). Same for `SOLRAC_DEFAULT_ENGINE=ollama`. See [RUNBOOK.md#breaking-local-engine](./RUNBOOK.md#breaking-local-engine).
 - **Default-engine constraints:**
-  - `SOLRAC_DEFAULT_ENGINE=local` requires `LOCAL_ENABLED=true`. Boot throws with the actionable hint to either enable the local engine or pick a different default.
-  - `SOLRAC_DEFAULT_ENGINE=primary|secondary` with `LOCAL_TOOLS_ENABLED=true` is **unreachable** — the local engine only runs as the default engine, so this combination would load tools no engine can call. Boot throws.
+  - `SOLRAC_DEFAULT_ENGINE=local` requires `LOCAL_ENABLED=true` OR `REMOTE_ENABLED=true`. The "local" engine slot is mode-agnostic — on-host or OpenRouter both satisfy it. Boot throws with the actionable hint listing both paths.
+  - `SOLRAC_DEFAULT_ENGINE=primary|secondary` with `LOCAL_TOOLS_ENABLED=true` is **unreachable** — the engine slot only runs as the default engine, so this combination would load tools no engine can call. Boot throws.
   - When `SOLRAC_DEFAULT_ENGINE` is unset, a `solrac.default_engine_implicit` warn fires at boot so deployments never run on an implicit default. Set the variable explicitly (even to `local`) to silence the warning.
 - **Local-engine constraint:** when `LOCAL_ENABLED=true`, both `LOCAL_BACKEND` (∈ `ollama`/`lmstudio`) and `LOCAL_MODEL` must be set and non-blank. `LOCAL_TIMEOUT_MS`, `LOCAL_HISTORY_LIMIT`, and `LOCAL_MAX_TOOL_ITERATIONS` must parse as positive integers if provided. `LOCAL_URL` has its trailing slash stripped at boot.
+- **Remote-engine constraint:** when `REMOTE_ENABLED=true`, all of `REMOTE_BACKEND`, `REMOTE_MODEL`, `REMOTE_API_KEY` must be set and non-blank. `REMOTE_BASE_URL` (default `https://openrouter.ai/api/v1`) has its trailing slash stripped and its scheme validated (http/https). `REMOTE_TIMEOUT_MS`, `REMOTE_HISTORY_LIMIT`, and `REMOTE_MAX_TOOL_ITERATIONS` must parse as positive integers if provided.
+- **Local/remote mutex:** `LOCAL_ENABLED=true && REMOTE_ENABLED=true` is rejected at boot. The engine slot has a single driver per boot; picking between modes is structural, not per-message. Operators who want both must choose one per deploy, or pin `SOLRAC_DEFAULT_ENGINE=primary` and use Claude for the no-prefix path.
 - **Local-tools constraint:** `LOCAL_TOOLS_ENABLED=true` requires `SOLRAC_INTEGRATIONS_ENABLED=true` (else there are no tools to expose; boot throws).
 - **Web UI constraint:** when `SOLRAC_WEB_ENABLED=true`, `SOLRAC_WEB_TOKEN` must be set (any value; ≥32 chars recommended). `SOLRAC_WEB_PORT` must differ from `PORT`. `SOLRAC_WEB_CHAT_ID` must be a negative integer.
 
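A minimal sketch of the engine-slot checks listed above. It assumes a hypothetical `EngineSlotConfig` shape already parsed from env strings; the check order and error texts are illustrative, not the literal boot code in `src/config.ts`.

```typescript
// Hypothetical config shape; the real frozen Config object has many more fields.
type Engine = "primary" | "secondary" | "local";

interface EngineSlotConfig {
  defaultEngine: Engine;
  integrationsEnabled: boolean;
  localEnabled: boolean;
  localBackend?: string;
  localModel?: string;
  localToolsEnabled: boolean;
  remoteEnabled: boolean;
  remoteBackend?: string;
  remoteModel?: string;
  remoteApiKey?: string;
}

const isBlank = (v?: string): boolean => v === undefined || v.trim() === "";

function validateEngineSlot(c: EngineSlotConfig): void {
  if (c.localEnabled && c.remoteEnabled) {
    throw new Error("LOCAL_ENABLED and REMOTE_ENABLED are mutually exclusive");
  }
  if (c.defaultEngine === "local" && !c.localEnabled && !c.remoteEnabled) {
    throw new Error("SOLRAC_DEFAULT_ENGINE=local needs LOCAL_ENABLED=true or REMOTE_ENABLED=true");
  }
  if (c.defaultEngine !== "local" && c.localToolsEnabled) {
    throw new Error("LOCAL_TOOLS_ENABLED=true is unreachable unless SOLRAC_DEFAULT_ENGINE=local");
  }
  if (c.localToolsEnabled && !c.integrationsEnabled) {
    throw new Error("LOCAL_TOOLS_ENABLED=true requires SOLRAC_INTEGRATIONS_ENABLED=true");
  }
  const knownLocal = c.localBackend === "ollama" || c.localBackend === "lmstudio";
  if (c.localEnabled && (!knownLocal || isBlank(c.localModel))) {
    throw new Error("LOCAL_ENABLED=true requires LOCAL_BACKEND (ollama|lmstudio) and LOCAL_MODEL");
  }
  if (c.remoteEnabled && (c.remoteBackend !== "openrouter" || isBlank(c.remoteModel) || isBlank(c.remoteApiKey))) {
    throw new Error("REMOTE_ENABLED=true requires REMOTE_BACKEND=openrouter, REMOTE_MODEL, REMOTE_API_KEY");
  }
}

// REMOTE_BASE_URL: require an http(s) scheme, then strip trailing slashes.
function normalizeBaseUrl(raw: string): string {
  const url = new URL(raw); // throws TypeError on malformed input
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`REMOTE_BASE_URL must use http or https, got ${url.protocol}`);
  }
  return raw.replace(/\/+$/, "");
}
```

Checking the mutex first gives the most actionable message when both flags are set, before the per-mode "missing field" checks run.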
@@ -108,7 +120,7 @@ LMStudio operators must size the **model's loaded context window** to fit Solrac
 
 **Effective vs nominal context.** Most models that advertise 128K+ degrade noticeably past their training distribution (the RULER and "lost in the middle" benchmarks). Loading a 26B Gemma-derived model at its nominal 256K rarely buys real-world quality past 32-64K — wasted RAM either way.
 
-**Model selection gotcha.** Some MLX repacks (notably community Gemma variants like `gemma-4-26b-a4b-it-mlx`) silently dropped the tool-call branch from their chat template. When `tools` is in the request body the model emits zero output. Solrac's diagnostic catches this as `local.lmstudio_empty_stream` — see [RUNBOOK.md](./RUNBOOK.md#diagnosis). First-class OpenAI tool-calling models (Qwen 2.5 instruct, Llama 3.1 instruct) avoid this entirely.
+**Model selection gotcha.** Some MLX repacks (notably community Gemma variants like `gemma-4-26b-a4b-it-mlx`) silently dropped the tool-call branch from their chat template. When `tools` is in the request body the model emits zero output. Solrac's diagnostic catches this as `lmstudio.empty_stream` — see [RUNBOOK.md](./RUNBOOK.md#diagnosis). First-class OpenAI tool-calling models (Qwen 2.5 instruct, Llama 3.1 instruct) avoid this entirely.
 
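A minimal sketch of per-frame SSE handling plus the empty-stream canary, assuming the body has already been split into lines. The `<backend>.empty_stream` event name, the 30-frame / 400-char limits, the string-or-object `error` envelope, and the trailing `usage.cost` field follow the docs in this PR. `parseSseLines`, the event types, and the omission of tool-call delta accumulation are simplifications.

```typescript
// Hypothetical sketch; not the actual local-driver.ts / remote-driver.ts parser.
type ChatEvent = { kind: "text"; text: string } | { kind: "error"; message: string };

interface ParseResult {
  events: ChatEvent[];
  costUsd: number | null; // from a trailing usage chunk, if any
}

const MAX_RAW_FRAMES = 30;
const MAX_FRAME_CHARS = 400;

function parseSseLines(
  backend: string,
  lines: string[],
  warn: (event: string, fields: Record<string, unknown>) => void,
): ParseResult {
  const events: ChatEvent[] = [];
  const raw: string[] = [];
  let costUsd: number | null = null;

  for (const line of lines) {
    if (!line.startsWith("data:")) continue; // skip SSE comments and blank keep-alives
    const payload = line.slice("data:".length).trim();
    if (raw.length < MAX_RAW_FRAMES) raw.push(payload.slice(0, MAX_FRAME_CHARS));
    if (payload === "[DONE]") break;

    let frame: any;
    try {
      frame = JSON.parse(payload);
    } catch {
      continue; // malformed frame: kept in `raw` for diagnosis, yields nothing
    }
    if (frame === null || typeof frame !== "object") continue;

    // HTTP-200 error envelope: {"error":{"message":"..."}} or {"error":"..."}
    if (frame.error != null) {
      const message =
        typeof frame.error === "string" ? frame.error : String(frame.error.message ?? "unknown error");
      events.push({ kind: "error", message });
      break;
    }
    const content = frame.choices?.[0]?.delta?.content;
    if (typeof content === "string" && content !== "") events.push({ kind: "text", text: content });
    if (typeof frame.usage?.cost === "number") costUsd = frame.usage.cost;
  }

  // Canary: nothing was yielded, so log what the server actually sent.
  if (events.length === 0) warn(`${backend}.empty_stream`, { rawFrames: raw });
  return { events, costUsd };
}
```

Recording raw payloads before JSON parsing is deliberate: a frame the parser doesn't understand is exactly the one the operator needs to see in the warn.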
 ## Example `.env`
 
@@ -167,23 +179,49 @@ SOLRAC_WEB_TOKEN=                 # required when enabled; generate: openssl ran
 
 ### Claude-only deploy
 
-For hosts that can't run a local model:
+For hosts that can't run a local model and don't want OpenRouter:
 
 ```sh
 SOLRAC_DEFAULT_ENGINE=primary     # no-prefix → Anthropic Sonnet
 LOCAL_ENABLED=false
+REMOTE_ENABLED=false
 LOCAL_TOOLS_ENABLED=false
 SOLRAC_INTEGRATIONS_ENABLED=true  # still useful for Claude tiers
 ```
 
+### Remote deploy via OpenRouter
+
+For hosts that can't run a local model but want a non-Claude default engine (e.g. `gpt-4o-mini` for cheap chat, with `@` / `!` escalation to Claude tiers for heavier reasoning):
+
+```sh
+SOLRAC_DEFAULT_ENGINE=local       # the engine slot, served by OpenRouter
+LOCAL_ENABLED=false               # mutually exclusive with REMOTE_ENABLED
+REMOTE_ENABLED=true
+REMOTE_BACKEND=openrouter
+REMOTE_MODEL=openai/gpt-4o-mini   # browse https://openrouter.ai/models
+REMOTE_API_KEY=sk-or-…            # get at https://openrouter.ai/keys
+# REMOTE_BASE_URL=https://openrouter.ai/api/v1  # default
+# REMOTE_HTTP_REFERER=https://github.com/cjus/solrac
+# REMOTE_X_TITLE=solrac
+
+# Per-token cost is captured from OpenRouter's streaming usage chunk into
+# audit.cost_usd, so the existing per-chat + global hourly caps gate burn
+# automatically — no separate REMOTE_HOURLY_COST_CAP_USD needed.
+HOURLY_COST_CAP_USD=1.00
+GLOBAL_HOURLY_COST_CAP_USD=4.00
+```
+
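A minimal sketch of how the remote variables above could map onto one OpenRouter chat request. `Authorization: Bearer`, `HTTP-Referer`, and `X-Title` are headers OpenRouter documents. `buildOpenRouterRequest`, the `RemoteEnv` shape, and the minimal body are assumptions; the real driver's body may carry more fields (for example tools when `LOCAL_TOOLS_ENABLED=true`).

```typescript
// Hypothetical helper; the real remote-driver.ts may assemble requests differently.
interface RemoteEnv {
  REMOTE_API_KEY: string;
  REMOTE_MODEL: string;
  REMOTE_BASE_URL?: string;
  REMOTE_HTTP_REFERER?: string;
  REMOTE_X_TITLE?: string;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

function buildOpenRouterRequest(env: RemoteEnv, messages: ChatMessage[]) {
  // Same default and trailing-slash handling the boot-time config applies.
  const base = (env.REMOTE_BASE_URL ?? "https://openrouter.ai/api/v1").replace(/\/+$/, "");
  return {
    url: `${base}/chat/completions`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${env.REMOTE_API_KEY}`,
      "Content-Type": "application/json",
      // Optional attribution headers used for OpenRouter's app rankings.
      "HTTP-Referer": env.REMOTE_HTTP_REFERER ?? "https://github.com/cjus/solrac",
      "X-Title": env.REMOTE_X_TITLE ?? "solrac",
    },
    body: JSON.stringify({ model: env.REMOTE_MODEL, messages, stream: true }),
  };
}
```

The result could be handed to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`, with the SSE body then parsed frame by frame.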
 ## Sensitive-secret handling
 
 The SDK spawns a `claude` subprocess that **inherits parent env**. Solrac scrubs Telegram-related and operator-only secrets before that spawn (see `agent.ts::sanitizedSubprocessEnv`):
 
 - `TELEGRAM_*` (any prefix)
 - `TG_*` (any prefix)
+- `LOCAL_*` (any prefix — backend URL/model)
+- `REMOTE_*` (any prefix — OpenRouter API key + base URL)
 - `STATS_BEARER_TOKEN`
 - `ALLOWLIST_BOOTSTRAP`
+- `NOTION_API_KEY`
 
 `ANTHROPIC_API_KEY`, `SOLRAC_PRIMARY_MODEL`, and `SOLRAC_SECONDARY_MODEL` are passed through (the agent needs them).
 
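A minimal sketch of prefix-plus-exact-key scrubbing over the list above, assuming the caller passes `process.env`. The function body is illustrative, not a copy of `agent.ts::sanitizedSubprocessEnv`.

```typescript
// Hypothetical sketch of the scrub rules listed above; not the literal agent.ts code.
const SCRUB_PREFIXES = ["TELEGRAM_", "TG_", "LOCAL_", "REMOTE_"];
const SCRUB_EXACT = new Set(["STATS_BEARER_TOKEN", "ALLOWLIST_BOOTSTRAP", "NOTION_API_KEY"]);

// Copy the parent env minus operator-only secrets, for the spawned `claude` subprocess.
function sanitizedSubprocessEnv(env: Record<string, string | undefined>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    if (value === undefined) continue;
    if (SCRUB_EXACT.has(key)) continue;
    if (SCRUB_PREFIXES.some((prefix) => key.startsWith(prefix))) continue;
    out[key] = value; // everything else (e.g. ANTHROPIC_API_KEY, PATH) passes through
  }
  return out;
}
```

Prefix matching is what lets `REMOTE_API_KEY`, and any future `REMOTE_*` secret, drop out without editing the list.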
diff --git a/docs/FEATURES.md b/docs/FEATURES.md
index 1e76391..11a488a 100644
--- a/docs/FEATURES.md
+++ b/docs/FEATURES.md
@@ -4,9 +4,10 @@ The complete feature list, grouped by theme. See [../README.md](../README.md) fo
 
 ## Engines & routing
 
-- **Local-first engine routing** — *Claude only when explicitly requested.* No-prefix messages route to the local engine (free) by default; `@` escalates to Sonnet, `!` escalates to Opus. Pinable via `SOLRAC_DEFAULT_ENGINE` (`local` | `primary` | `secondary`) for Claude-only deploys. Boot validation rejects unreachable combinations.
+- **BYO-model engine slot — local OR remote** — *Claude only when explicitly requested.* No-prefix messages route to a single engine slot you wire at deploy time: on-host via `LOCAL_ENABLED=true` (Ollama / LMStudio), or hosted via `REMOTE_ENABLED=true` (OpenRouter). Mutually exclusive at boot. `@` escalates to Sonnet, `!` escalates to Opus. Pinnable via `SOLRAC_DEFAULT_ENGINE` (`local` | `primary` | `secondary`) for Claude-only deploys. Boot validation rejects unreachable combinations.
 - **Multi-backend local engine with tool support** — `LOCAL_BACKEND` selects the wire protocol: `ollama` (NDJSON `/api/chat`) or `lmstudio` (SSE `/v1/chat/completions`). When `LOCAL_TOOLS_ENABLED=true`, the local model (e.g. `gemma4:e4b`, `qwen2.5-7b`) calls the same `mcp__solrac__*` integrations the Claude tiers see. Multi-round tool loop with shared loop detector, broker UX, and iteration cap (`LOCAL_MAX_TOOL_ITERATIONS=8`). Cross-engine context bridge means switching between local and Claude preserves the conversation thread.
-- **Dual-Claude tier routing** — `@` → primary tier (Sonnet by default), `!` → secondary tier (Opus by default). Each tier keeps its own SDK session id so prompt caching survives same-tier turns. Per-tier thinking-stub emoji (💻 local / 🙂 primary / 🤔 secondary) makes the routing visible in chat.
+- **Remote engine via OpenRouter (no on-host GPU required)** — `REMOTE_BACKEND=openrouter` points the engine slot at OpenRouter's catalog (`anthropic/claude-3.5-sonnet`, `openai/gpt-4o-mini`, `meta-llama/llama-3.3-70b-instruct`, … browse at [openrouter.ai/models](https://openrouter.ai/models)). Same runtime UX as local mode (no-prefix routing, `/clear local` semantics, capability note, tool-loop wiring via `LOCAL_TOOLS_ENABLED=true`) but per-token cost is captured from OpenRouter's streaming `usage.cost` chunk into `audit.cost_usd` — so the existing `HOURLY_COST_CAP_USD` + `GLOBAL_HOURLY_COST_CAP_USD` ceilings gate remote burn automatically, no new knob needed. `REMOTE_API_KEY` is scrubbed from the Claude SDK subprocess env (prefix-match `REMOTE_*`) so a compromised model can't exfiltrate the billed credential.
+- **Dual-Claude tier routing** — `@` → primary tier (Sonnet by default), `!` → secondary tier (Opus by default). Each tier keeps its own SDK session id so prompt caching survives same-tier turns. Per-tier thinking-stub emoji (💻 engine slot / 🙂 primary / 🤔 secondary) makes the routing visible in chat.
 
 ## Persona, commands & extensions
 
@@ -25,7 +26,7 @@ The complete feature list, grouped by theme. See [../README.md](../README.md) fo
 - **Three-tier permission policy** — auto-allow / auto-deny / Telegram-inline-keyboard-confirm. Configurable rule tables.
 - **Per-chat hourly cost cap** — sliding 60-minute window over the audit log. Default $1.00/chat/hour.
 - **Loop detector** — denies the third call to the same `(toolName, input)` within a turn. Order-insensitive over JSON keys.
-- **Persistent audit trail** — every turn (allowed, denied, queue-full) writes a SQLite row with prompt, response, tool calls, cost, tokens, session id, status, **and engine** (`claude:primary:` / `claude:secondary:` / `local::`).
+- **Persistent audit trail** — every turn (allowed, denied, queue-full) writes a SQLite row with prompt, response, tool calls, cost, tokens, session id, status, **and engine** (`claude:primary:` / `claude:secondary:` / `local::` / `remote:openrouter:`).
 - **Session resume across restarts** — SDK session ids persisted per chat **and per tier**; conversations survive process death.
 - **Inline-keyboard confirm UX** — 60-second timeout, fail-closed on send failure, verdict stamped into chat history after tap.
 - **Sub-agent default-deny** — `Agent`/`Task` tools disabled at SDK + policy layers.
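
The boot rule described in the FEATURES.md hunk above can be sketched in a few lines of TypeScript. `LOCAL_ENABLED` and `REMOTE_ENABLED` are mutually exclusive, and `SOLRAC_DEFAULT_ENGINE=local` needs exactly one of them. The `resolveEngineSlot` name and the `EngineSlotEnv` shape are hypothetical and are not Solrac's real config API. The return value mirrors the `engineSlotMode` field (`"local"` / `"remote"` / `null`) that this diff adds to `RunCommandDeps`.

```typescript
// Hypothetical config shape; field names mirror the env vars, not Solrac's Config type.
type DefaultEngine = "local" | "primary" | "secondary";
type EngineSlotMode = "local" | "remote" | null;

interface EngineSlotEnv {
  defaultEngine: DefaultEngine; // SOLRAC_DEFAULT_ENGINE
  localEnabled: boolean;        // LOCAL_ENABLED
  remoteEnabled: boolean;       // REMOTE_ENABLED
}

// Returns which backend family serves the engine slot, or throws on a
// combination that boot validation should reject.
function resolveEngineSlot(env: EngineSlotEnv): EngineSlotMode {
  if (env.localEnabled && env.remoteEnabled) {
    throw new Error("LOCAL_ENABLED and REMOTE_ENABLED are mutually exclusive");
  }
  const mode: EngineSlotMode = env.localEnabled
    ? "local"
    : env.remoteEnabled
      ? "remote"
      : null;
  // Without a slot, no-prefix messages would route to an engine that doesn't exist.
  if (env.defaultEngine === "local" && mode === null) {
    throw new Error(
      "SOLRAC_DEFAULT_ENGINE=local requires LOCAL_ENABLED=true or REMOTE_ENABLED=true",
    );
  }
  return mode;
}

// Example: the OpenRouter deploy from SETUP.md §2-remote.
console.log(
  resolveEngineSlot({ defaultEngine: "local", localEnabled: false, remoteEnabled: true }),
); // prints: remote
```

A Claude-only deploy (`SOLRAC_DEFAULT_ENGINE=primary` with both flags off) resolves to `null`, which is the "slot disabled" case the `/help` card keys off.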
diff --git a/docs/OPERATIONS.md b/docs/OPERATIONS.md
index 0141ec8..86e8012 100644
--- a/docs/OPERATIONS.md
+++ b/docs/OPERATIONS.md
@@ -360,16 +360,40 @@ Canonical event names:
 - `agent.oob_local_injected` — cross-engine bridge injected N local-engine turns into the user prompt (only fires when there are out-of-band local exchanges since the last successful Claude turn)
 - `agent.done` — per-turn summary (cost, turns, isError)
 
-### Local engine (default engine path)
-- `local.stub_send_failed` — couldn't send the 💻 stub
-- `local.bad_frame` — wire-format parse failure on a stream chunk (NDJSON for Ollama, SSE for LMStudio; logged, line skipped, stream continues)
-- `local.fetch_failed` — fetch to `LOCAL_URL` threw (unreachable, abort/timeout, etc.)
-- `local.edit_throttled` / `local.edit_final_failed` — Telegram edit failures
-- `local.final_send_failed` — final fallback send (when the stub creation itself failed earlier)
-- `local.disabled_ack_failed` / `local.usage_ack_failed` — couldn't reply with the disabled / usage hint
-- `local.boot_health_failed` — backend health probe failed at boot (`/api/tags` for Ollama, `/v1/models` for LMStudio); non-fatal warn — daemon may come up after Solrac under systemd
-- `local.lmstudio_empty_stream` — LMStudio closed the SSE stream with zero text + zero tool-call events. Carries up to 30 raw `data:` payloads (truncated to 400 chars each) for diagnosis. Fires only when no events were yielded; happy path is silent. See [ARCHITECTURE.md tricky seam §10](./ARCHITECTURE.md#10-lmstudio-inline-error-sse-frames-http-200) and [RUNBOOK.md](./RUNBOOK.md#diagnosis) for triage steps.
-- `local.done` — per-turn summary (backend, model, elapsedSec, inputTokens, outputTokens, isError)
+### Engine slot (default engine path)
+
+**Runner events** (`engine.ts` / `engine-tools.ts` — fire regardless of backend):
+
+- `engine.stub_send_failed` — couldn't send the 💻 stub
+- `engine.driver_failed` — driver threw `EngineDriverError` (unreachable, timeout, `model_missing`, `http_error`)
+- `engine.unexpected_error` — driver threw something other than `EngineDriverError`
+- `engine.edit_throttled` / `engine.edit_final_failed` — Telegram edit failures
+- `engine.final_send_failed` — final fallback send (when the stub creation itself failed earlier)
+- `engine.disabled_ack_failed` — couldn't reply with the disabled hint
+- `engine.boot_health_ok` / `engine.boot_health_failed` / `engine.boot_health_model_missing` — health probe results (`/api/tags` for Ollama, `/v1/models` for LMStudio, `/models` under `REMOTE_BASE_URL` with bearer auth for OpenRouter); non-fatal warn — daemon may come up after Solrac under systemd
+- `engine.boot` — engine-slot boot summary (backend, mode, url, model, isDefaultEngine, toolsEnabled)
+- `engine.tools_enabled_but_zero_loaded` — `LOCAL_TOOLS_ENABLED=true` but no integration tools loaded
+- `engine.unexpected_tool_call_single_shot` — single-shot path saw a `tool_call` event (model called a tool we didn't offer); logged, ignored
+- `engine.tool_loop_start` / `engine.tool_loop_done` / `engine.tool_loop_failed` — tool-loop bookends
+- `engine.tool_loop_detected` / `engine.tool_iteration_cap` / `engine.cap_finalize_failed` — loop-defense events
+- `engine.tool_unknown` / `engine.tool_auto_allow` / `engine.tool_denied_policy` / `engine.tool_denied_hard` / `engine.tool_hard_denied` / `engine.tool_invalid_args` / `engine.tool_handler_threw` / `engine.tool_call_ok` — per-tool dispositions
+- `engine.tool_confirm_request` / `engine.tool_confirm_resolved` / `engine.tool_confirm_send_failed` / `engine.tool_confirm_round_cap` / `engine.tool_confirm_skipped_round_cap` — confirm-tier broker events
+- `engine.progress_failed` — throttled stream-render hook threw
+- `engine.done` — per-turn summary (backend, mode, model, elapsedSec, inputTokens, outputTokens, costUsd, isError)
+
+**Backend-specific driver events** (per-prefix so operators can grep one wire-format):
+
+- `ollama.bad_frame` — NDJSON parse failure (`local-driver.ts`; line skipped, stream continues)
+- `lmstudio.bad_frame` — SSE frame parse failure (`local-driver.ts`)
+- `lmstudio.tool_call_deduped` — Gemma-4 emitted the same `(name, args)` tool call twice; silently deduped
+- `lmstudio.empty_stream` — LMStudio closed the SSE stream with zero text + zero tool-call events. Carries up to 30 raw `data:` payloads (truncated to 400 chars each) for diagnosis. Fires only when no events were yielded; happy path is silent. See [ARCHITECTURE.md tricky seam §10](./ARCHITECTURE.md#10-lmstudio-inline-error-sse-frames-http-200) and [RUNBOOK.md](./RUNBOOK.md#diagnosis) for triage steps.
+- `openrouter.bad_frame` — SSE frame parse failure (`remote-driver.ts`)
+- `openrouter.tool_call_deduped` — identical-`(name, args)` dedup, mirrors LMStudio
+- `openrouter.empty_stream` — same shape as `lmstudio.empty_stream` but for OpenRouter
+
+**Remote-mode signals** (fire from `engine.ts` when `driver.mode === "remote"`):
+
+- `remote.cost_missing` — driver returned `null` for `costUsd` in the usage chunk; runner wrote `null` to `audit.cost_usd` (NOT 0) so the cap query doesn't silently treat it as free. Verify the backend still emits `usage.cost`.
 
 ### Policy
 - `policy.auto_allow` — classifier returned allow
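
The remote cost handling documented in this diff comes down to one decision per turn. The relevant pieces are `remote.cost_missing` here, plus the footer's `$X.XXXX` chip and the `NULL`-not-0 audit write in USAGE.md. This sketch uses hypothetical `TurnUsage` / `CostDecision` shapes and a hypothetical `decideTurnCost` helper. It illustrates the documented rules and is not a copy of `engine.ts`: local mode writes 0, remote mode writes the reported cost, and a missing remote cost writes `NULL` and warns.

```typescript
// Hypothetical shapes; Solrac's actual runner types may differ.
type DriverMode = "local" | "remote";

interface TurnUsage {
  mode: DriverMode;
  costUsd: number | null; // remote: OpenRouter's streaming usage.cost, or null if absent
}

interface CostDecision {
  auditCostUsd: number | null; // value written to audit.cost_usd
  warnCostMissing: boolean;    // true => log `remote.cost_missing` at warn
  footerChip: string;          // suffix for the ✅ footer; "" when suppressed
}

function decideTurnCost(usage: TurnUsage): CostDecision {
  if (usage.mode === "local") {
    // Local mode is free: cost_usd = 0 and the footer carries no cost chip.
    return { auditCostUsd: 0, warnCostMissing: false, footerChip: "" };
  }
  if (usage.costUsd === null) {
    // NULL, not 0, so a missing cost is never mistaken for a free turn.
    // The chip is dropped to keep the UI clean; the gap shows up in logs.
    return { auditCostUsd: null, warnCostMissing: true, footerChip: "" };
  }
  return {
    auditCostUsd: usage.costUsd,
    warnCostMissing: false,
    footerChip: ` · $${usage.costUsd.toFixed(4)}`, // e.g. " · $0.0042"
  };
}

const footer = "✅ remote:openrouter:openai/gpt-4o-mini · 1.2s";
console.log(footer + decideTurnCost({ mode: "remote", costUsd: 0.0042 }).footerChip);
// prints: ✅ remote:openrouter:openai/gpt-4o-mini · 1.2s · $0.0042
```

Keeping the audit value and the chip in one return value means the chat footer and the audit row can't disagree about whether a cost was reported.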
diff --git a/docs/RUNBOOK.md b/docs/RUNBOOK.md
index f00d760..c0be8e2 100644
--- a/docs/RUNBOOK.md
+++ b/docs/RUNBOOK.md
@@ -764,7 +764,7 @@ User sends a no-prefix message (which routes to the local engine under `SOLRAC_D
 - `❌ local timed out after 60s` (or `120s` when `LOCAL_TOOLS_ENABLED=true`)
 - `❌ local error:  `
 - `❌ error: local error: The number of tokens to keep from the initial prompt is greater than the context length…` (LMStudio: model's loaded context window is smaller than Solrac's prompt — SOUL.md + tool schemas + SOLRAC.md + history)
-- `(empty response)` with a `local.lmstudio_empty_stream` warn in the logs (LMStudio emitted an unrecognized frame shape — protocol drift; capture the logged raw frames and file a bug)
+- `(empty response)` with a `lmstudio.empty_stream` warn in the logs (LMStudio emitted an unrecognized frame shape — protocol drift; capture the logged raw frames and file a bug)
 - `⚠️ stopped after N tool iterations` (tool-loop didn't converge)
 - `local disabled in this deployment` (defensive — boot validation should have rejected this; investigate)
 
@@ -779,7 +779,7 @@ Each render maps to a distinct cause. Fixes vary by `LOCAL_BACKEND` (`ollama` vs
 | **timed out** | The model took longer than `LOCAL_TIMEOUT_MS` (default 60s, 120s with tools-on) to finish streaming. **First-turn-after-load is the worst case** — cold KV cache + MLX graph compilation + weight paging stack on top of normal prompt-eval cost. Subsequent turns are dramatically faster. | Bump `LOCAL_TIMEOUT_MS` for slow models / cold-start hardware, or pick a smaller model. Stream timing scales with parameter count and quantization. | Same — `LOCAL_TIMEOUT_MS` is backend-agnostic. LMStudio's `lms log stream` shows per-request timing. On Apple Silicon a 31B model with 25 tools in the schema can easily exceed 60s on first call; 180-300s timeout is reasonable. |
 | **error: 5xx** | Backend crashed or ran out of memory mid-request | Check `ollama serve` stderr / system log. Common cause: GPU OOM (a 31B model on a 24GB GPU). Restart Ollama; downsize model. | Check LMStudio's status indicator and `lms log stream`. Same GPU-OOM symptom; downsize model or quantization. |
 | **context-length exceeded** (`The number of tokens to keep from the initial prompt is greater than the context length…`) | LMStudio-loaded model's context window is smaller than Solrac's input. The system prompt is SOUL.md + capability note + 25 tool schemas (~6-10K tokens with the full integration set) + SOLRAC.md + history. A 4K-window model fails immediately; an 8K-window model fails as history grows. | N/A (Ollama returns HTTP 4xx with a different message; the same fix applies — load a bigger context.) | In LMStudio: open the loaded model's settings → bump **Context Length** to ≥16K (32K is the practical sweet spot — comfortable headroom with no perceptible decode-speed cost). Reload the model. KV cache memory scales linearly with context length, so don't pre-allocate 128K+ unless you genuinely need long-context turns. |
-| **empty response + `local.lmstudio_empty_stream` warn** | LMStudio yielded zero text and zero tool-call events before closing the stream. The diagnostic guard logged up to 30 raw `data:` payloads to identify the shape. | N/A (Ollama path doesn't hit this branch; NDJSON parse failures are surfaced as `local.ollama_bad_frame`.) | Pull the `frames` array out of the `local.lmstudio_empty_stream` warn and inspect: (1) frames containing `{"error":…}` — should have surfaced as `❌ error:` via the parser; if you see this, the error branch in `LmstudioSseFrame` regressed (see [Tricky seams §10](./ARCHITECTURE.md#10-lmstudio-inline-error-sse-frames-http-200)). (2) frames with an unrecognized shape — protocol drift; capture the payload, add a parse branch. (3) only `[DONE]` — the chat template silently dropped the prompt (some Gemma MLX repacks behave this way when `tools` is in the body); try a model with first-class OpenAI tool-calling (Qwen, Llama-3.1). |
+| **empty response + `lmstudio.empty_stream` warn** | LMStudio yielded zero text and zero tool-call events before closing the stream. The diagnostic guard logged up to 30 raw `data:` payloads to identify the shape. | N/A (Ollama path doesn't hit this branch; NDJSON parse failures are surfaced as `ollama.bad_frame`.) | Pull the `frames` array out of the `lmstudio.empty_stream` warn and inspect: (1) frames containing `{"error":…}` — should have surfaced as `❌ error:` via the parser; if you see this, the error branch in `LmstudioSseFrame` regressed (see [Tricky seams §10](./ARCHITECTURE.md#10-lmstudio-inline-error-sse-frames-http-200)). (2) frames with an unrecognized shape — protocol drift; capture the payload, add a parse branch. (3) only `[DONE]` — the chat template silently dropped the prompt (some Gemma MLX repacks behave this way when `tools` is in the body); try a model with first-class OpenAI tool-calling (Qwen, Llama-3.1). |
 | **disabled in this deployment** | Defensive ack — should be unreachable since boot validation throws on `defaultEngine=local && !localEnabled`. If you're seeing this, the boot threw a config error and the instance came up in a degraded state, OR you set `defaultEngine=primary/secondary` and somehow the parser still resolved to `local` (file a bug). | Set `LOCAL_ENABLED=true`, `LOCAL_BACKEND=ollama`, and `LOCAL_MODEL=` in `.env`, restart. See [SETUP.md](./SETUP.md#2-prerequisites-local-model-backend--model-recommended). | Same; set `LOCAL_BACKEND=lmstudio` instead. |
 
 The audit row also captures these:
diff --git a/docs/SETUP.md b/docs/SETUP.md
index ad18b45..20f9992 100644
--- a/docs/SETUP.md
+++ b/docs/SETUP.md
@@ -22,16 +22,22 @@ curl -fsSL https://bun.sh/install | bash
 bun --version   # should be ≥1.3.0
 ```
 
-## 2. Prerequisites: local-model backend + model (recommended)
+## 2. Prerequisites: engine-slot backend + model (recommended)
 
-The recommended Solrac config sets `SOLRAC_DEFAULT_ENGINE=local`, which makes a local-model backend a hard boot requirement. No-prefix Telegram messages route to the local engine for free; `@`/`!` reach Anthropic Sonnet/Opus.
+The recommended Solrac config sets `SOLRAC_DEFAULT_ENGINE=local` (the "engine slot"), which makes an engine-slot backend (local or remote) a hard boot requirement. No-prefix Telegram messages route to the engine slot; `@`/`!` reach Anthropic Sonnet/Opus.
 
-Pick a backend via `LOCAL_BACKEND`:
+You have three paths — pick one:
+
+1. **Local on-host backend (this section, §2)** — Ollama or LMStudio running on the same machine. Free; needs a GPU or a capable CPU.
+2. **Remote OpenRouter backend (§2-remote)** — hosted models via OpenRouter. Per-token cost; no on-host GPU required. Same runtime UX as local mode.
+3. **Claude-only deploy (§2-alt)** — no engine slot at all; every no-prefix message hits Anthropic Sonnet directly.
+
+This section walks through path 1. For OpenRouter, skip to §2-remote. For Claude-only, skip to §2-alt.
+
+For path 1, pick a backend via `LOCAL_BACKEND`:
 - **`ollama`** ([ollama.com](https://ollama.com)) — daemon + CLI; default URL `:11434`; NDJSON wire format.
 - **`lmstudio`** ([lmstudio.ai](https://lmstudio.ai)) — desktop app with a built-in server; default URL `:1234`; OpenAI-compatible SSE wire format.
 
-Don't want either? Skip to **§2-alt** for the Claude-only fallback.
-
 ### 2.1 Install your chosen backend
 
 **Ollama:**
@@ -91,11 +97,31 @@ If you can't run a local backend (no GPU/RAM, or air-gapped from local model hos
 ```sh
 SOLRAC_DEFAULT_ENGINE=primary    # no-prefix → Anthropic Sonnet
 LOCAL_ENABLED=false
+REMOTE_ENABLED=false
 LOCAL_TOOLS_ENABLED=false
 ```
 
 You'll lose the free local default path; every no-prefix message is an Anthropic call. `@` and `!` work as documented. The rest of this guide still applies.
 
+## 2-remote. Remote deploy via OpenRouter (skip if you completed §2 or §2-alt)
+
+If you can't run a local backend but want a non-Claude default engine, point the engine slot at OpenRouter. The runtime UX is identical to local mode (no-prefix routing, `/clear local` semantics, capability note) but per-token cost is captured into `audit.cost_usd` so the existing hourly caps gate burn.
+
+1. Get an OpenRouter API key at https://openrouter.ai/keys (the key prefix is typically `sk-or-`).
+2. Pick a model slug from https://openrouter.ai/models — e.g. `openai/gpt-4o-mini` (cheap chat), `anthropic/claude-3.5-sonnet` (parity with the `@` tier), or `meta-llama/llama-3.3-70b-instruct`.
+3. Add to your `.env`:
+
+```sh
+SOLRAC_DEFAULT_ENGINE=local       # the engine slot, served by OpenRouter
+LOCAL_ENABLED=false               # mutually exclusive with REMOTE_ENABLED
+REMOTE_ENABLED=true
+REMOTE_BACKEND=openrouter
+REMOTE_MODEL=openai/gpt-4o-mini
+REMOTE_API_KEY=sk-or-…
+```
+
+Boot logs the `remote (openrouter)` engine label and probes `GET /models` once with bearer auth — a bad API key surfaces as `auth_failed` at startup. The `@` (Sonnet) and `!` (Opus) prefixes still escalate to Claude tiers as in the local-mode deploy. The rest of this guide still applies.
+
 ## 3. Install Solrac
 
 ```sh
diff --git a/docs/USAGE.md b/docs/USAGE.md
index 6067b9c..f4aa682 100644
--- a/docs/USAGE.md
+++ b/docs/USAGE.md
@@ -54,7 +54,7 @@ The bot responds by editing a single thinking-stub message. The stub emoji tells
 |--------|------|
 | Primary Claude (Sonnet) | `🙂 thinking…` |
 | Secondary Claude (Opus) | `🤔 thinking…` |
-| Local (`ollama` / `lmstudio`) | `💻 thinking…` |
+| Engine slot (Ollama / LMStudio / OpenRouter) | `💻 thinking…` |
 
 You'll see it transition through:
 
@@ -69,12 +69,15 @@ The footer reports turn count and cost in USD.
 ## Engine routing (prefix table)
 
 The first non-whitespace character of your message picks the engine. The
-default routes to the local engine, so Anthropic burn happens only on a
-deliberate `@` or `!`; everything else stays local and free.
+default routes to the **engine slot** — on-host (Ollama / LMStudio) when
+`LOCAL_ENABLED=true`, or hosted (OpenRouter) when `REMOTE_ENABLED=true`.
+Anthropic burn happens only on a deliberate `@` or `!`; everything else
+stays on the engine slot.
 
 | Prefix | Engine | Default model | Use when |
 |--------|--------|---------------|----------|
-| (none) | **Default** (per `SOLRAC_DEFAULT_ENGINE`, ships as `local`) | `LOCAL_MODEL` (recommended `gemma4:e4b`); backend picked by `LOCAL_BACKEND` (`ollama` / `lmstudio`) | The free default. Local model handles casual chat + tool-driven work via integrations. |
+| (none), local mode | **Engine slot — local** (Ollama / LMStudio) | `LOCAL_MODEL` (recommended `gemma4:e4b`); backend picked by `LOCAL_BACKEND` | The free default. Local model handles casual chat + tool-driven work via integrations. |
+| (none), remote mode | **Engine slot — remote** (OpenRouter) | `REMOTE_MODEL` slug (e.g. `openai/gpt-4o-mini`, `anthropic/claude-3.5-sonnet`) | When the host can't run a local LLM. Per-token cost is captured into `audit.cost_usd` so the hourly cap gates burn. |
 | `@` | Primary Claude — escalate | `SOLRAC_PRIMARY_MODEL` (default `claude-sonnet-4-6`) | When the task needs Sonnet-level reasoning, file ops, or the SDK's preset tools. Costs $$$. |
 | `!` | Secondary Claude — heaviest | `SOLRAC_SECONDARY_MODEL` (default `claude-opus-4-7`) | When Sonnet isn't enough. Costs $$$$. Mnemonic: `!` = "important / hardest". |
 
@@ -103,17 +106,17 @@ double it (`!!literal` produces `!literal`).
 
 Reach for `@` (Sonnet) when:
 - The task needs structured tool use beyond the operator's integrations (file edits, web fetches, complex shell).
-- You want strong code reasoning or multi-step planning the local model can't sustain.
-- The conversation needs a long context window the local model truncates.
+- You want strong code reasoning or multi-step planning the engine-slot model can't sustain.
+- The conversation needs a long context window the engine-slot model truncates.
 
 Reach for `!` (Opus) when:
 - `@` already responded but missed the nuance.
 - You're doing architecture review, hard math, or anything where extra cost is justified by extra correctness.
 
-Stay on the default (local engine) when:
+Stay on the default (engine slot) when:
 - The question is casual / one-shot / self-contained.
-- The operator has integrations the local model can call (`LOCAL_TOOLS_ENABLED=true`).
-- You want zero Anthropic burn.
+- The operator has integrations the engine-slot model can call (`LOCAL_TOOLS_ENABLED=true`).
+- You want zero Anthropic burn (local mode is free; remote mode bills per token via OpenRouter, which for casual chat is usually cheaper than the Claude tiers when `REMOTE_MODEL` is a low-cost slug such as `openai/gpt-4o-mini`).
 
 Both Claude tiers run through the same SDK preset (`claude_code`), the same
 tools, the same `canUseTool` policy, and the same ``
@@ -122,29 +125,39 @@ prompt caching survives across same-tier turns.
 
 ### Default engine details
 
-The default-engine identity is server-resolved from `SOLRAC_DEFAULT_ENGINE`:
+The default-engine identity is server-resolved from `SOLRAC_DEFAULT_ENGINE`. When `SOLRAC_DEFAULT_ENGINE=local`, the engine slot is wired by exactly one of `LOCAL_ENABLED=true` (Ollama/LMStudio) or `REMOTE_ENABLED=true` (OpenRouter) — mutually exclusive at boot.
 
 | `SOLRAC_DEFAULT_ENGINE` | What no-prefix routes to | Capability note tone |
 |---|---|---|
-| `local` (default) | Local engine (`LOCAL_MODEL` on `LOCAL_BACKEND`) | "you are the default chat engine; tools when `LOCAL_TOOLS_ENABLED=true`; escalate via `@` / `!`" |
+| `local` + `LOCAL_ENABLED=true` | Engine slot → on-host (`LOCAL_MODEL` on `LOCAL_BACKEND`) | "you are the default chat engine; cost the operator nothing; tools when `LOCAL_TOOLS_ENABLED=true`; escalate via `@` / `!`" |
+| `local` + `REMOTE_ENABLED=true` | Engine slot → OpenRouter (`REMOTE_MODEL`) | "you are the default chat engine; cost the operator per-token via OpenRouter, so be concise; tools when `LOCAL_TOOLS_ENABLED=true`; escalate via `@` / `!`" |
 | `primary` | Anthropic Sonnet | Same as `@` Sonnet (Claude-only deploys) |
 | `secondary` | Anthropic Opus | Same as `!` Opus (Claude-only deploys) |
 
-**Default-local details:**
+**Engine-slot details (local mode):**
 - **Free** — `cost_usd = 0`; the per-chat and global cost caps don't apply.
 - **Footer** — `✅ local:ollama:gemma4:e4b · 1.2s` (or `· N tools · 1.2s` when tools fired). On LMStudio: `local:lmstudio:`.
 - **Tools** — when `LOCAL_TOOLS_ENABLED=true` and integrations are loaded, the local model can call `mcp__solrac__*` tools the same way Claude does.
 - **Cross-engine context** — sees prior Claude turns (both tiers).
 
-**Default-local failure modes:**
+**Engine-slot details (remote mode):**
+- **Per-token billed** — `cost_usd` is captured from OpenRouter's streaming `usage.cost` chunk into the audit row. The existing `HOURLY_COST_CAP_USD` + `GLOBAL_HOURLY_COST_CAP_USD` ceilings gate this burn alongside any Claude tier spend.
+- **Footer** — `✅ remote:openrouter:openai/gpt-4o-mini · 1.2s · $0.0042`. The mode prefix (`remote:`) matches the audit-row tag so log-grepping and chat-footer comparison are symmetric. The trailing `$X.XXXX` chip is the per-turn OpenRouter cost; it appears only in remote mode and only when the driver reported a number (mirroring the audit-write decision). If OpenRouter ever omits `usage.cost`, the chip is silently dropped and `remote.cost_missing` logs at warn — UI stays clean, the gap shows up in the operator's log feed.
+- **Tools** — `LOCAL_TOOLS_ENABLED=true` works the same way (the runner is mode-agnostic; tool-loop summation correctly accumulates per-round cost from OpenRouter).
+- **Cross-engine context** — sees prior Claude turns AND prior local turns (same audit-table query path).
+
+**Engine-slot failure modes:**
 
 | Condition | What you see |
 |-----------|--------------|
 | `@` / `!` alone with no payload | `usage: @ — sends to primary Claude (model: )` |
-| Backend not running | `❌ local unreachable: ` (boot also logs `local.boot_health_failed`) |
-| Model not pulled / loaded on the host | `❌ local model not found:  — pull with 'ollama pull ' (Ollama) or load via the LMStudio UI / 'lms load '` |
+| **Local mode:** backend not running | `❌ local unreachable: ` (boot also logs `engine.boot_health_failed`) |
+| **Local mode:** model not pulled / loaded on the host | `❌ local model not found:  — pull with 'ollama pull ' (Ollama) or load via the LMStudio UI / 'lms load '` |
+| **Remote mode:** bad API key | `❌ auth failed (HTTP 401) — check REMOTE_API_KEY` (boot also logs `engine.boot_health_failed` with `auth_failed`) |
+| **Remote mode:** model slug unknown / unavailable | `❌ model not available on OpenRouter: ` |
+| **Remote mode:** OpenRouter didn't return a cost (defensive) | turn renders normally; `audit.cost_usd` is `NULL` (not 0); `remote.cost_missing` warn fires in logs |
 | Tool loop didn't converge | `⚠️ stopped after N tool iterations` |
-| Inference exceeds `LOCAL_TIMEOUT_MS` | `❌ local timed out after 60s` |
+| Inference exceeds `LOCAL_TIMEOUT_MS` / `REMOTE_TIMEOUT_MS` | `❌ local timed out after 60s` |
 
 See [CONFIG.md](./CONFIG.md) for the full env list.
 
@@ -186,7 +199,7 @@ Examples:
 :context            → same as /context (alternate prefix)
 ```
 
-`/clear local` semantics differ from the Claude tiers because the local engine is stateless — there's no SDK session id to drop. Instead, the dispatcher writes `Date.now()` to `sessions.local_cutoff_ms` for this chat. Subsequent `recentChatTurns` lookups (the local engine's history reconstruction) and `outOfBandForEngine` lookups (Claude's cross-engine bridge) filter out local-engine rows with `started_at <= cutoff`. The audit log itself is untouched — operator queries against `audit` still show every turn. The cutoff is per-chat and survives restarts. A back-to-back `/clear local` with no intervening turn reports "Already clean" (the cutoff is already past every existing row).
+`/clear local` semantics differ from the Claude tiers because the engine slot (Ollama / LMStudio / OpenRouter) is stateless from the SDK's perspective — there's no session id to drop. Instead, the dispatcher writes `Date.now()` to `sessions.local_cutoff_ms` for this chat. Subsequent `recentChatTurns` lookups (the engine slot's history reconstruction) and `outOfBandForEngine` lookups (Claude's cross-engine bridge) filter out engine-slot rows with `started_at <= cutoff`. The triple-pattern LIKE clause matches `local:%` (Ollama, LMStudio), `ollama:%` (legacy, pre-v0.7.0), and `remote:%` (OpenRouter) — so `/clear local` is correct for any engine-slot mode. The audit log itself is untouched — operator queries against `audit` still show every turn. The cutoff is per-chat and survives restarts. A back-to-back `/clear local` with no intervening turn reports "Already clean" (the cutoff is already past every existing row).
 
 ### `/compact` semantics
 
@@ -263,7 +276,7 @@ HTML comments inside `SOLRAC.md` (``) are stripped before the file s
 
 ### Tier independence
 
-Both files apply to **all** engines: the default (local unless overridden), primary Claude (`@`, Sonnet), and secondary Claude (`!`, Opus). The only engine-specific text is a single capability sentence Solrac appends in code (the §3c matrix in `agent.ts::buildClaudeCapabilityNote` and `local.ts::buildLocalCapabilityNote`), so your `SOUL.md` doesn't need conditional sections.
+Both files apply to **all** engines: the default (engine slot unless overridden), primary Claude (`@`, Sonnet), and secondary Claude (`!`, Opus). The only engine-specific text is a single capability sentence Solrac appends in code (the §3c matrix in `agent.ts::buildClaudeCapabilityNote`, `local-driver.ts::buildLocalCapabilityNote`, and `remote-driver.ts::buildRemoteCapabilityNote`), so your `SOUL.md` doesn't need conditional sections.
 
 ### Re-read cadence (`SOLRAC.md`)
 
@@ -583,7 +596,7 @@ See `examples/tasks/` for two ready-to-edit samples.
 
 An **integration** is a TypeScript module under `$SOLRAC_INTEGRATIONS_DIR//index.ts` (or, for shipped reference integrations, `src/integrations-builtin//index.ts`) that adds new tools to the agent without touching solrac's source. Each module default-exports `setup(ctx)` and returns `{ apiVersion, tools, meta }`. Tools surface to the model as `mcp__solrac__`.
 
-> **Engine reach.** Integrations are reachable from both Claude tiers (`@`, `!`) and the local-engine default — the latter when `LOCAL_TOOLS_ENABLED=true` (precondition: `SOLRAC_INTEGRATIONS_ENABLED=true`). With local-engine tools-on, the local model gets the same `mcp__solrac__*` tool surface; `local.ts::buildLocalCapabilityNote` advertises the loaded tool names so the model knows what it can call. With `LOCAL_TOOLS_ENABLED=false`, the local engine falls back to single-shot inference and the capability note tells it to redirect tool-shaped requests to `@`/`!`. Reliability still varies by local model — `gemma4:e4b` (on Ollama) is the recommended baseline.
+> **Engine reach.** Integrations are reachable from both Claude tiers (`@`, `!`) and the engine-slot default — the latter when `LOCAL_TOOLS_ENABLED=true` (precondition: `SOLRAC_INTEGRATIONS_ENABLED=true`). With engine-slot tools-on, the slot's model gets the same `mcp__solrac__*` tool surface; the capability note (`local-driver.ts::buildLocalCapabilityNote` or `remote-driver.ts::buildRemoteCapabilityNote`, selected by `driver.mode`) advertises the loaded tool names so the model knows what it can call. With `LOCAL_TOOLS_ENABLED=false`, the engine slot falls back to single-shot inference and the capability note tells it to redirect tool-shaped requests to `@`/`!`. Reliability still varies by model: `gemma4:e4b` (on Ollama) is the recommended local baseline, and OpenRouter-hosted frontier models generally handle tool calls well, though it is worth confirming that the chosen slug supports tool calling.
 
 ### Shipping model
 
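
The `/clear local` filter described in the USAGE.md hunk above can be modeled in memory. The sketch below is an illustrative TypeScript equivalent of the triple-pattern `LIKE` predicate (`local:%`, `ollama:%`, `remote:%`) combined with `started_at <= cutoff`. `AuditRow`, `visibleAfterClear`, and `alreadyClean` are hypothetical names. The "Already clean" check is an assumed reading of how that reply is derived, not Solrac's code.

```typescript
// Hypothetical row shape; the real filter is a SQLite query over the audit table.
interface AuditRow {
  engine: string;    // e.g. "local:ollama:gemma4:e4b", "remote:openrouter:openai/gpt-4o-mini"
  startedAt: number; // epoch ms (audit.started_at)
}

// Same three prefixes as the LIKE patterns 'local:%', 'ollama:%' (legacy), 'remote:%'.
const ENGINE_SLOT_PREFIXES = ["local:", "ollama:", "remote:"];

function isEngineSlotRow(row: AuditRow): boolean {
  return ENGINE_SLOT_PREFIXES.some((prefix) => row.engine.startsWith(prefix));
}

// Drops engine-slot rows with started_at <= cutoff; Claude rows always survive.
// `cutoffMs` stands in for sessions.local_cutoff_ms (null if /clear local never ran).
function visibleAfterClear(rows: AuditRow[], cutoffMs: number | null): AuditRow[] {
  if (cutoffMs === null) return rows;
  const cutoff = cutoffMs; // narrowed to number for use inside the callback
  return rows.filter((row) => !(isEngineSlotRow(row) && row.startedAt <= cutoff));
}

// Assumed reading of "Already clean": no engine-slot row is newer than the cutoff.
function alreadyClean(rows: AuditRow[], cutoffMs: number | null): boolean {
  if (cutoffMs === null) return false;
  const cutoff = cutoffMs;
  return !rows.some((row) => isEngineSlotRow(row) && row.startedAt > cutoff);
}
```

The filter only affects what history reconstruction and the cross-engine bridge see. The input array, like the `audit` table itself, is never modified.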
diff --git a/src/agent.ts b/src/agent.ts
index a57d489..e2df33e 100644
--- a/src/agent.ts
+++ b/src/agent.ts
@@ -601,6 +601,13 @@ export function sanitizedSubprocessEnv(): Record {
     // network topology (e.g. http://lmstudio.internal:1234) via an
     // auto-allowed Bash(echo $LOCAL_URL).
     if (key.startsWith("LOCAL_")) continue;
+    // REMOTE_* mirrors the LOCAL_* rationale for the OpenRouter path.
+    // REMOTE_API_KEY in particular is a billed credential — exfiltration via
+    // Bash(echo $REMOTE_API_KEY) would let a compromised model burn the
+    // operator's OpenRouter balance. The whole prefix is scrubbed so any
+    // future REMOTE_* secret (additional providers, BYO keys, etc.) is
+    // covered without needing to revisit this list.
+    if (key.startsWith("REMOTE_")) continue;
     if (key === "STATS_BEARER_TOKEN") continue;
     if (key === "ALLOWLIST_BOOTSTRAP") continue;
     if (key === "NOTION_API_KEY") continue;
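
The prefix rule added to `sanitizedSubprocessEnv` can be tested on its own with a sketch like the one below. `scrubEnv` is a hypothetical standalone function over a plain record. It lists only the prefixes and exact keys visible in this hunk, so the real function may scrub more keys or pass through more keys than shown.

```typescript
// Hypothetical standalone helper; Solrac's sanitizedSubprocessEnv reads
// process.env directly and may apply more rules than shown here.
const SCRUBBED_PREFIXES = ["LOCAL_", "REMOTE_"];
const SCRUBBED_KEYS = new Set(["STATS_BEARER_TOKEN", "ALLOWLIST_BOOTSTRAP", "NOTION_API_KEY"]);

function scrubEnv(env: Record<string, string | undefined>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    if (value === undefined) continue;
    // Prefix match: any future REMOTE_* secret is covered without editing this list.
    if (SCRUBBED_PREFIXES.some((prefix) => key.startsWith(prefix))) continue;
    if (SCRUBBED_KEYS.has(key)) continue;
    out[key] = value;
  }
  return out;
}

// Usage with Node/Bun's environment:
// const childEnv = scrubEnv(process.env);
```

Matching on the prefix is the design choice the code comment in this hunk argues for. A new `REMOTE_*` credential is dropped by default, so the only way to leak one is to add an explicit pass-through.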
diff --git a/src/commands.ts b/src/commands.ts
index 1a319f9..4b4fd3d 100644
--- a/src/commands.ts
+++ b/src/commands.ts
@@ -76,13 +76,14 @@ import type { ChatHistoryRow, SolracDb } from "./db.ts";
 import type { IntegrationTier } from "./integrations.ts";
 import { log } from "./log.ts";
 import { mdToTelegramHtml } from "./markdown.ts";
-import { buildToolCapabilityNote } from "./local.ts";
 import {
-  type LocalChatMessage,
-  type LocalDriver,
-  LocalDriverError,
-} from "./local-driver.ts";
-import { mcpToLocalTools, runToolLoop } from "./local-tools.ts";
+  type EngineChatMessage,
+  type EngineDriver,
+  EngineDriverError,
+} from "./engine-driver.ts";
+import { buildLocalToolCapabilityNote } from "./local-driver.ts";
+import { buildRemoteToolCapabilityNote } from "./remote-driver.ts";
+import { mcpToEngineTools, runToolLoop } from "./engine-tools.ts";
 import {
   createLoopDetector,
   createPostToolUseHook,
@@ -340,15 +341,15 @@ export function parseCommand(text: string, deps: ParseCommandDeps): ParseCommand
 // Dispatcher
 // ---------------------------------------------------------------------------
 
-// Subset of LocalRunDeps the skill path needs. Skills don't reuse runLocalTurn
+// Subset of EngineRunDeps the skill path needs. Skills don't reuse runEngineTurn
 // because they don't carry history or SOLRAC.md overlays and have no streaming
 // stub — but they DO route through the same tool loop (`runToolLoop`) when
 // tool deps are wired, so the skill body can call `mcp__solrac__*` / `skills__*`
 // tools end-to-end. When tool deps are absent or `tools` is empty, `runSkillBare`
 // falls through to a single-shot driver call (preserving back-compat for pure
 // text-transform skills like `tldr`).
-export interface LocalSkillDeps {
-  driver: LocalDriver;
+export interface EngineSkillDeps {
+  driver: EngineDriver;
   model: string;
   timeoutMs: number;
   // SOUL.md text loaded once at boot. Sent as the system message so local
@@ -391,12 +392,17 @@ export interface RunCommandDeps {
   // `null` when the local engine isn't configured — a `tier: local` skill in
   // that case fails loud with a config error rather than silently routing to
   // Claude.
-  localSkillDeps: LocalSkillDeps | null;
-  // `/help` renders the engine section dynamically from these two fields so
+  localSkillDeps: EngineSkillDeps | null;
+  // `/help` renders the engine section dynamically from these fields so
-  // the card matches the deploy. Static text would lie in three of four
-  // config combinations (default-local vs default-Claude × tools on/off).
+  // the card matches the deploy. Static text would be wrong in most config
+  // combinations (default-local vs default-Claude × tools on/off ×
+  // local-mode vs remote-mode).
   defaultEngine: "local" | "primary" | "secondary";
   localToolsEnabled: boolean;
+  // Engine-slot mode. `null` when the slot is disabled (Claude-only deploy);
+  // `"local"` for on-host Ollama/LMStudio; `"remote"` for OpenRouter. Drives
+  // the help card's cost framing for the no-prefix path.
+  engineSlotMode?: "local" | "remote" | null;
   // Phase 2 — scheduled tasks operator surface. Both optional so deploys
   // with `SOLRAC_TASKS_ENABLED=false` can build the deps object without
   // dummy values; `/tasks` surfaces a "scheduler disabled" reply when the
@@ -1126,6 +1132,7 @@ async function runHelp(
   updateId: number,
 ): Promise<void> {
   const md = renderHelpMarkdown(deps.skillRegistry, {
+    engineSlotMode: deps.engineSlotMode,
     defaultEngine: deps.defaultEngine,
     localToolsEnabled: deps.localToolsEnabled,
   });
@@ -1136,19 +1143,26 @@ async function runHelp(
   writeSystemAudit(deps, msg, updateId, "help_shown", "ok");
 }
 
-// Engine section reads `defaultEngine` + `localToolsEnabled` and renders
-// one of the matrix-shaped descriptions. Static text would lie in three
-// of four deploys (default-Claude vs default-local, tools on/off); the
-// dynamic render is one config-read per `/help` call which is free.
+// Engine section reads `defaultEngine` + `localToolsEnabled` + engine-slot
+// mode and renders one of the matrix-shaped descriptions. Static text would
+// lie in many of these deploys (default-Claude vs default-engine-slot,
+// local vs remote mode, tools on/off); the dynamic render is one config-read
+// per `/help` call which is free.
 function renderEngineSection(opts: {
   defaultEngine: "local" | "primary" | "secondary";
   localToolsEnabled: boolean;
+  // null when the engine slot is disabled (no on-host LLM, no OpenRouter).
+  // `"local"` = Ollama/LMStudio on-host; `"remote"` = OpenRouter. The cost
+  // framing in the description differs — local is free, remote is billed.
+  engineSlotMode?: "local" | "remote" | null;
 }): string[] {
   const lines: string[] = ["**Engines** (first character of your message):", ""];
   if (opts.defaultEngine === "local") {
+    const mode = opts.engineSlotMode ?? "local";
+    const costFraming = mode === "remote" ? "per-token via OpenRouter" : "free";
     const localDesc = opts.localToolsEnabled
-      ? "local engine (free, with operator-authored tools)"
-      : "local engine (free, no tools)";
+      ? `${mode} engine (${costFraming}, with operator-authored tools)`
+      : `${mode} engine (${costFraming}, no tools)`;
     lines.push(`- plain text → ${localDesc} *(default)*`);
     lines.push("- `@` → primary Claude (Sonnet) — heavier reasoning");
     lines.push("- `!` → secondary Claude (Opus) — heaviest reasoning, costs more");
@@ -1191,6 +1205,7 @@ export function renderHelpMarkdown(
   opts: {
     defaultEngine: "local" | "primary" | "secondary";
     localToolsEnabled: boolean;
+    engineSlotMode?: "local" | "remote" | null;
   },
 ): string {
   const head = ["## 🤖 Solrac help", "", ...renderEngineSection(opts), "", HELP_COMMANDS_MD];
@@ -1523,7 +1538,7 @@ function writeSkillAudit(
 // (`runLocalSkill`) and the tool-call path (`skill-tools.ts::dispatch`) wrap
 // this with their own audit + reply / return-string handling.
 //
-// **RECURSION SAFETY INVARIANT** — when `LocalSkillDeps` is wired with
+// **RECURSION SAFETY INVARIANT** — when `EngineSkillDeps` is wired with
 // `tools/toolTiers/broker`, the skill body sees the full MCP catalog MINUS
 // its own `skills__` entry (recursion guard). The regression test in
 // `skill-tools.test.ts` asserts that filter — keep both in sync.
@@ -1539,13 +1554,13 @@ export interface RunSkillBareResult {
 }
 
 export async function runSkillBare(
-  local: LocalSkillDeps,
+  local: EngineSkillDeps,
   skill: Skill,
   args: string,
 ): Promise<RunSkillBareResult> {
   // Tool surface wired → route through the tool loop so the body can call
   // `mcp__solrac__*` / `skills__*` exactly like a regular local turn.
-  // Mirrors the same gate in `runLocalTurn`.
+  // Mirrors the same gate in `runEngineTurn`.
   if (
     local.tools !== undefined &&
     local.tools.length > 0 &&
@@ -1556,7 +1571,7 @@ export async function runSkillBare(
   }
 
   const prompt = renderSkillTemplate(skill.body, args);
-  const messages: LocalChatMessage[] = [
+  const messages: EngineChatMessage[] = [
     { role: "system", content: local.soul },
     { role: "user", content: prompt },
   ];
@@ -1585,7 +1600,7 @@ export async function runSkillBare(
       }
     }
   } catch (err) {
-    if (err instanceof LocalDriverError) {
+    if (err instanceof EngineDriverError) {
       errorMessage = err.message;
     } else {
       errorMessage = `local unexpected error: ${(err as Error).message}`;
@@ -1616,7 +1631,7 @@ export async function runSkillBare(
 // runSkillBareWithTools — PR-skills-tools tool-loop path
 // ---------------------------------------------------------------------------
 //
-// Mirrors `runLocalTurnWithTools` (local.ts) but skill-shaped:
+// Mirrors `runEngineTurnWithTools` (engine.ts) but skill-shaped:
 //   - No history, no SOLRAC.md overlay, no streaming UX (skills already cap
 //     their reply by template; live rendering would muddy the operator's
 //     intent baked into the skill body).
@@ -1630,7 +1645,7 @@ export async function runSkillBare(
 // agent-driven invocations) is responsible for wrapping this in
 // `skillToolCtx.run(...)` so any nested `skills__*` calls have ALS context.
 async function runSkillBareWithTools(
-  local: LocalSkillDeps,
+  local: EngineSkillDeps,
   skill: Skill,
   args: string,
 ): Promise<RunSkillBareResult> {
@@ -1659,16 +1674,21 @@ async function runSkillBareWithTools(
   const selfToolName = `${SKILL_TOOL_PREFIX}${skill.name}`;
   const filteredTools = allTools.filter((t) => t.name !== selfToolName);
   const toolMap = new Map(filteredTools.map((t) => [t.name, t]));
-  const toolDefs = mcpToLocalTools(filteredTools);
+  const toolDefs = mcpToEngineTools(filteredTools);
   const toolNames = filteredTools.map((t) => t.name);
 
   const prompt = renderSkillTemplate(skill.body, args);
   // Skills are tier-stable (`tier: local` for tool-callable skills, per
   // skills.ts). Build the capability note as the default-engine variant —
-  // accurate when the skill body runs on the deploy's main local model.
-  const capabilityNote = buildToolCapabilityNote(toolNames, true);
-
-  const initialMessages: LocalChatMessage[] = [
+  // accurate when the skill body runs on the deploy's main engine slot.
+  // Dispatch on `driver.mode` so remote-backed engine slots get per-token
+  // cost framing instead of the "free" framing.
+  const capabilityNote =
+    local.driver.mode === "remote"
+      ? buildRemoteToolCapabilityNote(toolNames, true)
+      : buildLocalToolCapabilityNote(toolNames, true);
+
+  const initialMessages: EngineChatMessage[] = [
     { role: "system", content: `${local.soul}\n\n${capabilityNote}` },
     { role: "user", content: prompt },
   ];
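The plain-text row in `renderEngineSection` reduces to a small mapping from `engineSlotMode` and `localToolsEnabled` to a description string. The following is a minimal sketch of that mapping pulled out as a standalone function for illustration. `describeDefaultEngine` is a hypothetical name, not a Solrac export; the real code builds this line inline and pushes it onto the help card's `lines` array.

```typescript
// Sketch only: a standalone copy of the plain-text row that renderEngineSection
// builds when defaultEngine === "local". `describeDefaultEngine` is hypothetical.
type EngineSlotMode = "local" | "remote" | null;

function describeDefaultEngine(
  engineSlotMode: EngineSlotMode | undefined,
  localToolsEnabled: boolean,
): string {
  // `??` maps both undefined (field omitted) and null (slot disabled) to "local".
  const mode = engineSlotMode ?? "local";
  // Only remote mode is billed; on-host backends are free.
  const costFraming = mode === "remote" ? "per-token via OpenRouter" : "free";
  const desc = localToolsEnabled
    ? `${mode} engine (${costFraming}, with operator-authored tools)`
    : `${mode} engine (${costFraming}, no tools)`;
  return `- plain text → ${desc} *(default)*`;
}

console.log(describeDefaultEngine("remote", true));
// prints: - plain text → remote engine (per-token via OpenRouter, with operator-authored tools) *(default)*
console.log(describeDefaultEngine(undefined, false));
// prints: - plain text → local engine (free, no tools) *(default)*
```

Because `??` treats `null` and `undefined` alike, a deploy that never passes `engineSlotMode` keeps the pre-OpenRouter "free" wording.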
diff --git a/src/config.test.ts b/src/config.test.ts
index 4016664..834f7f7 100644
--- a/src/config.test.ts
+++ b/src/config.test.ts
@@ -486,3 +486,155 @@ describe("loadConfig — solracHome resolution", () => {
     expect(cfg.dataDir).toBe(`${TEST_HOME}/mydata`);
   });
 });
+
+describe("loadConfig — REMOTE_* (OpenRouter)", () => {
+  test("default: remoteEnabled=false, no provider, defaults populated", () => {
+    const cfg = loadConfig({ ...baseEnv });
+    expect(cfg.remoteEnabled).toBe(false);
+    expect(cfg.remoteBackend).toBe(null);
+    expect(cfg.remoteModel).toBe(null);
+    expect(cfg.remoteApiKey).toBe(null);
+    // The base URL is empty when no backend is selected (there is no default
+    // to resolve); URL validation only runs when REMOTE_ENABLED=true.
+    expect(cfg.remoteBaseUrl).toBe("");
+    expect(cfg.remoteHttpReferer).toBe("https://github.com/cjus/solrac");
+    expect(cfg.remoteXTitle).toBe("solrac");
+  });
+
+  test("REMOTE_ENABLED=true requires REMOTE_BACKEND", () => {
+    expect(() =>
+      loadConfig({ ...baseEnv, REMOTE_ENABLED: "true" }),
+    ).toThrow(/REMOTE_BACKEND is required/);
+  });
+
+  test("REMOTE_ENABLED=true requires REMOTE_MODEL", () => {
+    expect(() =>
+      loadConfig({
+        ...baseEnv,
+        REMOTE_ENABLED: "true",
+        REMOTE_BACKEND: "openrouter",
+        REMOTE_API_KEY: "sk-or-test",
+      }),
+    ).toThrow(/REMOTE_MODEL is required/);
+  });
+
+  test("REMOTE_ENABLED=true requires REMOTE_API_KEY", () => {
+    expect(() =>
+      loadConfig({
+        ...baseEnv,
+        REMOTE_ENABLED: "true",
+        REMOTE_BACKEND: "openrouter",
+        REMOTE_MODEL: "anthropic/claude-3.5-sonnet",
+      }),
+    ).toThrow(/REMOTE_API_KEY is required/);
+  });
+
+  test("invalid REMOTE_BACKEND value throws", () => {
+    expect(() =>
+      loadConfig({
+        ...baseEnv,
+        REMOTE_ENABLED: "true",
+        REMOTE_BACKEND: "totally-real-provider",
+      }),
+    ).toThrow(/REMOTE_BACKEND must be "openrouter"/);
+  });
+
+  test("full REMOTE_* config + SOLRAC_DEFAULT_ENGINE=local passes", () => {
+    const cfg = loadConfig({
+      ...baseEnv,
+      SOLRAC_DEFAULT_ENGINE: "local",
+      REMOTE_ENABLED: "true",
+      REMOTE_BACKEND: "openrouter",
+      REMOTE_MODEL: "anthropic/claude-3.5-sonnet",
+      REMOTE_API_KEY: "sk-or-test",
+    });
+    expect(cfg.remoteEnabled).toBe(true);
+    expect(cfg.remoteBackend).toBe("openrouter");
+    expect(cfg.remoteModel).toBe("anthropic/claude-3.5-sonnet");
+    expect(cfg.remoteApiKey).toBe("sk-or-test");
+    expect(cfg.remoteBaseUrl).toBe("https://openrouter.ai/api/v1");
+    expect(cfg.defaultEngine).toBe("local");
+  });
+
+  test("REMOTE_BASE_URL override strips trailing slash", () => {
+    const cfg = loadConfig({
+      ...baseEnv,
+      REMOTE_ENABLED: "true",
+      REMOTE_BACKEND: "openrouter",
+      REMOTE_MODEL: "x/y",
+      REMOTE_API_KEY: "k",
+      REMOTE_BASE_URL: "https://proxy.example.com/api/v1/",
+    });
+    expect(cfg.remoteBaseUrl).toBe("https://proxy.example.com/api/v1");
+  });
+
+  test("malformed REMOTE_BASE_URL throws at boot", () => {
+    expect(() =>
+      loadConfig({
+        ...baseEnv,
+        REMOTE_ENABLED: "true",
+        REMOTE_BACKEND: "openrouter",
+        REMOTE_MODEL: "x/y",
+        REMOTE_API_KEY: "k",
+        REMOTE_BASE_URL: "openrouter.ai/api/v1",
+      }),
+    ).toThrow(/REMOTE_BASE_URL/);
+  });
+
+  test("LOCAL_ENABLED + REMOTE_ENABLED both true is mutually exclusive", () => {
+    expect(() =>
+      loadConfig({
+        ...baseEnv,
+        LOCAL_ENABLED: "true",
+        LOCAL_BACKEND: "ollama",
+        LOCAL_MODEL: "gemma3",
+        REMOTE_ENABLED: "true",
+        REMOTE_BACKEND: "openrouter",
+        REMOTE_MODEL: "anthropic/claude-3.5-sonnet",
+        REMOTE_API_KEY: "k",
+      }),
+    ).toThrow(/mutually exclusive/);
+  });
+
+  test("SOLRAC_DEFAULT_ENGINE=local works with REMOTE_ENABLED=true (no on-host LLM)", () => {
+    // Motivating case: a host that can't run a local LLM still gets a
+    // default-engine option via OpenRouter.
+    const cfg = loadConfig({
+      ...baseEnv,
+      SOLRAC_DEFAULT_ENGINE: "local",
+      REMOTE_ENABLED: "true",
+      REMOTE_BACKEND: "openrouter",
+      REMOTE_MODEL: "openai/gpt-4o-mini",
+      REMOTE_API_KEY: "k",
+    });
+    expect(cfg.defaultEngine).toBe("local");
+    expect(cfg.localEnabled).toBe(false);
+    expect(cfg.remoteEnabled).toBe(true);
+  });
+
+  test("SOLRAC_DEFAULT_ENGINE=local with neither LOCAL nor REMOTE enabled throws", () => {
+    expect(() =>
+      loadConfig({
+        ...baseEnv,
+        SOLRAC_DEFAULT_ENGINE: "local",
+      }),
+    ).toThrow(/requires LOCAL_ENABLED=true.*or REMOTE_ENABLED=true/s);
+  });
+
+  test("REMOTE_HTTP_REFERER + REMOTE_X_TITLE overrides are accepted", () => {
+    const cfg = loadConfig({
+      ...baseEnv,
+      REMOTE_HTTP_REFERER: "https://my-fork.example",
+      REMOTE_X_TITLE: "my-solrac-fork",
+    });
+    expect(cfg.remoteHttpReferer).toBe("https://my-fork.example");
+    expect(cfg.remoteXTitle).toBe("my-solrac-fork");
+  });
+
+  test("REMOTE_TIMEOUT_MS / REMOTE_HISTORY_LIMIT / REMOTE_MAX_TOOL_ITERATIONS defaults", () => {
+    const cfg = loadConfig({ ...baseEnv });
+    expect(cfg.remoteTimeoutMs).toBe(60_000);
+    expect(cfg.remoteHistoryLimit).toBe(6);
+    expect(cfg.remoteMaxToolIterations).toBe(8);
+  });
+});
diff --git a/src/config.ts b/src/config.ts
index 720ca64..7bf8388 100644
--- a/src/config.ts
+++ b/src/config.ts
@@ -62,6 +62,13 @@ export type DefaultEngine = "local" | "primary" | "secondary";
 // (driver factory, UI label) only runs in the enabled path.
 export type LocalBackend = "ollama" | "lmstudio";
 
+// Backend driver behind the `remote` engine slot. Mutually exclusive with
+// `LocalBackend` at the operator-config level — only one of LOCAL_ENABLED /
+// REMOTE_ENABLED may be true per boot. Today `openrouter` is the only value;
+// the type is kept open for future remote providers (vLLM/Anyscale/Together/
+// Groq), which all share the OpenAI-compatible wire shape.
+export type RemoteBackend = "openrouter";
+
 // Cap on prompt text persisted to the audit table. A single user can flood
 // strings of arbitrary length; truncating before insert bounds per-row size.
 // Not env-tunable in v1 — the value is a defensive constant, not policy.
@@ -128,6 +135,34 @@ export interface Config {
   // infinite-loop bug too much rope. Loop detector bites earlier on duplicate
   // calls.
   readonly localMaxToolIterations: number;
+  // Remote-engine routing — mutually exclusive with `localEnabled`. When true,
+  // the `local` Engine slot dispatches to a hosted provider (OpenRouter) via
+  // the same runner + history + cost-cap path that backs Ollama/LMStudio.
+  // Boot validation rejects `localEnabled && remoteEnabled`. The runner
+  // distinguishes the two via a `mode: "local" | "remote"` flag — the audit
+  // tag becomes `remote:<backend>:<model>` and per-turn cost is captured from
+  // the backend's usage chunk so the existing per-chat + global hourly caps
+  // gate it without additional plumbing.
+  readonly remoteEnabled: boolean;
+  // Provider — `null` when remote is disabled.
+  readonly remoteBackend: RemoteBackend | null;
+  // OpenRouter's base URL already contains `/api/v1`; trailing slash stripped
+  // by the parser so endpoint composition (`${url}/chat/completions`) is
+  // unambiguous.
+  readonly remoteBaseUrl: string;
+  readonly remoteModel: string | null;
+  // `null` when remote is disabled. Subprocess scrub (`agent.ts::
+  // sanitizedSubprocessEnv`) removes `REMOTE_*` keys from the Claude SDK's
+  // env so the per-token API key never leaks to the SDK subprocess.
+  readonly remoteApiKey: string | null;
+  readonly remoteTimeoutMs: number;
+  readonly remoteHistoryLimit: number;
+  readonly remoteMaxToolIterations: number;
+  // OpenRouter attribution headers. Defaults send the public solrac repo URL
+  // and "solrac" title so usage shows up correctly on OpenRouter's per-model
+  // leaderboard; operators with branded forks override.
+  readonly remoteHttpReferer: string;
+  readonly remoteXTitle: string;
   // PNX-167.1 — operator-defined skills loaded from the filesystem at boot.
   // `skillsEnabled` is the master switch; `skillsDir` is resolved from cwd
   // so the same Solrac binary can ship to multiple operators each with their
@@ -238,6 +273,13 @@ function parseLocalBackend(raw: string | undefined): LocalBackend | null {
   throw new Error(`LOCAL_BACKEND must be "ollama" or "lmstudio", got "${raw}"`);
 }
 
+function parseRemoteBackend(raw: string | undefined): RemoteBackend | null {
+  if (raw === undefined || raw.trim() === "") return null;
+  const v = raw.trim().toLowerCase();
+  if (v === "openrouter") return v;
+  throw new Error(`REMOTE_BACKEND must be "openrouter", got "${raw}"`);
+}
+
 /**
  * Resolve `SOLRAC_HOME` to an absolute path. Order:
  *
@@ -404,25 +446,111 @@ export function loadConfig(env: NodeJS.ProcessEnv = process.env): Config {
     );
   }
 
+  // Remote-engine parsing — read alongside local so the cross-mode validations
+  // (mutex, default-engine requires one) have both values in scope. The
+  // `REMOTE_*` namespace is provider-neutral; today only OpenRouter is
+  // supported behind `REMOTE_BACKEND=openrouter`.
+  const remoteEnabled = parseBoolean("REMOTE_ENABLED", env.REMOTE_ENABLED, false);
+  const remoteBackend = parseRemoteBackend(env.REMOTE_BACKEND);
+  if (remoteEnabled && remoteBackend === null) {
+    throw new Error(
+      'REMOTE_BACKEND is required when REMOTE_ENABLED=true (set to "openrouter")',
+    );
+  }
+  const remoteModel =
+    env.REMOTE_MODEL && env.REMOTE_MODEL.trim() !== "" ? env.REMOTE_MODEL.trim() : null;
+  if (remoteEnabled && !remoteModel) {
+    throw new Error(
+      "REMOTE_MODEL is required when REMOTE_ENABLED=true " +
+        "(OpenRouter slug, e.g. anthropic/claude-3.5-sonnet, openai/gpt-4o-mini)",
+    );
+  }
+  const remoteApiKey =
+    env.REMOTE_API_KEY && env.REMOTE_API_KEY.trim() !== "" ? env.REMOTE_API_KEY.trim() : null;
+  if (remoteEnabled && !remoteApiKey) {
+    throw new Error(
+      "REMOTE_API_KEY is required when REMOTE_ENABLED=true " +
+        "(get one at https://openrouter.ai/keys)",
+    );
+  }
+  // Backend-aware base-URL default. OpenRouter's `/api/v1` is the canonical
+  // root; operator-set `REMOTE_BASE_URL` always wins (e.g. for an internal
+  // proxy or a staging environment).
+  const remoteBaseUrlDefault =
+    remoteBackend === "openrouter" ? "https://openrouter.ai/api/v1" : "";
+  const remoteBaseUrl =
+    env.REMOTE_BASE_URL && env.REMOTE_BASE_URL.trim() !== ""
+      ? env.REMOTE_BASE_URL.trim().replace(/\/$/, "")
+      : remoteBaseUrlDefault;
+  if (remoteEnabled) {
+    let remoteProtocol: string;
+    try {
+      remoteProtocol = new URL(remoteBaseUrl).protocol;
+    } catch {
+      throw new Error(`REMOTE_BASE_URL is not a valid URL: "${remoteBaseUrl}"`);
+    }
+    if (remoteProtocol !== "http:" && remoteProtocol !== "https:") {
+      throw new Error(
+        `REMOTE_BASE_URL must use http:// or https://, got "${remoteProtocol}//" in "${remoteBaseUrl}"`,
+      );
+    }
+  }
+  // Mutual exclusion: the operator may pick at most one BYO model source.
+  // Boot-time enforcement is structural: the `local` engine slot has one
+  // driver per boot. Allowing both would force a per-message backend-pick
+  // surface (a new prefix? a config key?) that this PR explicitly avoided.
+  if (localEnabled && remoteEnabled) {
+    throw new Error(
+      "LOCAL_ENABLED and REMOTE_ENABLED are mutually exclusive — set exactly one. " +
+        "Local mode runs an on-host LLM (Ollama/LMStudio); " +
+        "remote mode dispatches to OpenRouter.",
+    );
+  }
+  const remoteTimeoutDefault = localToolsEnabled ? 120_000 : 60_000;
+  const remoteTimeoutMs = parsePositiveInt(
+    "REMOTE_TIMEOUT_MS",
+    env.REMOTE_TIMEOUT_MS,
+    remoteTimeoutDefault,
+  );
+  const remoteHistoryLimit = parsePositiveInt(
+    "REMOTE_HISTORY_LIMIT",
+    env.REMOTE_HISTORY_LIMIT,
+    6,
+  );
+  const remoteMaxToolIterations = parsePositiveInt(
+    "REMOTE_MAX_TOOL_ITERATIONS",
+    env.REMOTE_MAX_TOOL_ITERATIONS,
+    8,
+  );
+  const remoteHttpReferer =
+    env.REMOTE_HTTP_REFERER && env.REMOTE_HTTP_REFERER.trim() !== ""
+      ? env.REMOTE_HTTP_REFERER.trim()
+      : "https://github.com/cjus/solrac";
+  const remoteXTitle =
+    env.REMOTE_X_TITLE && env.REMOTE_X_TITLE.trim() !== ""
+      ? env.REMOTE_X_TITLE.trim()
+      : "solrac";
+
   // Default-engine validation. Two cells of the capability matrix are
   // unreachable; refuse them at boot rather than letting them run with
-  // confusing UX (local engine unreachable, or a default engine that errors
+  // confusing UX (engine slot unreachable, or a default engine that errors
   // every turn).
   const defaultEngine = parseDefaultEngine(env.SOLRAC_DEFAULT_ENGINE);
   const defaultEngineExplicit =
     env.SOLRAC_DEFAULT_ENGINE !== undefined && env.SOLRAC_DEFAULT_ENGINE.trim() !== "";
-  if (defaultEngine === "local" && !localEnabled) {
+  if (defaultEngine === "local" && !localEnabled && !remoteEnabled) {
     throw new Error(
-      "SOLRAC_DEFAULT_ENGINE=local requires LOCAL_ENABLED=true; " +
-        "set LOCAL_ENABLED=true (and LOCAL_BACKEND=ollama|lmstudio, LOCAL_MODEL=) " +
-        "to run the local engine as the default, or " +
-        "SOLRAC_DEFAULT_ENGINE=primary to make Anthropic Sonnet the default",
+      "SOLRAC_DEFAULT_ENGINE=local requires LOCAL_ENABLED=true (on-host) " +
+        "or REMOTE_ENABLED=true (OpenRouter); " +
+        "set LOCAL_ENABLED=true (with LOCAL_BACKEND + LOCAL_MODEL) for an on-host LLM, " +
+        "REMOTE_ENABLED=true (with REMOTE_BACKEND + REMOTE_MODEL + REMOTE_API_KEY) for OpenRouter, " +
+        "or SOLRAC_DEFAULT_ENGINE=primary to make Anthropic Sonnet the default",
     );
   }
   if (defaultEngine !== "local" && localToolsEnabled) {
     throw new Error(
       `SOLRAC_DEFAULT_ENGINE=${defaultEngine} with LOCAL_TOOLS_ENABLED=true is unreachable: ` +
-        "the local engine only runs when it's the default. " +
+        "the local engine slot only runs when it's the default. " +
         "Set LOCAL_TOOLS_ENABLED=false or SOLRAC_DEFAULT_ENGINE=local",
     );
   }
@@ -499,6 +627,16 @@ export function loadConfig(env: NodeJS.ProcessEnv = process.env): Config {
     localHistoryLimit,
     localToolsEnabled,
     localMaxToolIterations,
+    remoteEnabled,
+    remoteBackend,
+    remoteBaseUrl,
+    remoteModel,
+    remoteApiKey,
+    remoteTimeoutMs,
+    remoteHistoryLimit,
+    remoteMaxToolIterations,
+    remoteHttpReferer,
+    remoteXTitle,
     skillsEnabled: parseBoolean("SOLRAC_SKILLS_ENABLED", env.SOLRAC_SKILLS_ENABLED, false),
     skillsDir: resolveAgainstHome(solracHome, skillsDirRaw),
     tasksEnabled: parseBoolean("SOLRAC_TASKS_ENABLED", env.SOLRAC_TASKS_ENABLED, false),
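`REMOTE_BASE_URL` resolution in `loadConfig` combines a backend-aware default, trailing-slash normalization, and a two-stage check (parse, then scheme). Below is a standalone sketch of the same steps. The helper name is hypothetical; it mirrors the inline code above rather than any exported API.

```typescript
// Sketch only: REMOTE_BASE_URL resolution as done inline in loadConfig.
// `resolveRemoteBaseUrl` is a hypothetical helper name.
type RemoteBackend = "openrouter";

function resolveRemoteBaseUrl(
  raw: string | undefined,
  backend: RemoteBackend | null,
  enabled: boolean,
): string {
  // Backend-aware default; no backend means nothing to default to.
  const fallback = backend === "openrouter" ? "https://openrouter.ai/api/v1" : "";
  // An operator override wins; one trailing slash is stripped so that
  // `${url}/chat/completions` never contains "//".
  const url = raw !== undefined && raw.trim() !== "" ? raw.trim().replace(/\/$/, "") : fallback;
  if (!enabled) return url; // validation only runs on the enabled path
  let protocol: string;
  try {
    protocol = new URL(url).protocol;
  } catch {
    throw new Error(`REMOTE_BASE_URL is not a valid URL: "${url}"`);
  }
  if (protocol !== "http:" && protocol !== "https:") {
    throw new Error(`REMOTE_BASE_URL must use http:// or https://, got "${protocol}//"`);
  }
  return url;
}

console.log(resolveRemoteBaseUrl("https://proxy.example.com/api/v1/", "openrouter", true));
// prints: https://proxy.example.com/api/v1
```

The two stages catch different mistakes: a schemeless `openrouter.ai/api/v1` fails to parse at all, while `localhost:8080/v1` parses successfully with `localhost:` as its scheme, so only the protocol check rejects it.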
diff --git a/src/db.test.ts b/src/db.test.ts
index 54e725e..eecb8f6 100644
--- a/src/db.test.ts
+++ b/src/db.test.ts
@@ -633,6 +633,64 @@ describe("openDb engine-scoped helpers (PNX-167)", () => {
     expect(db.hasLocalTurnsSince(1, 100)).toBe(false);
   });
 
+  test("hasLocalTurnsSince triple-pattern: also matches remote:% (OpenRouter) rows", async () => {
+    // OpenRouter-only deploys never write local:% rows; `/clear local` must
+    // still report "Already clean" correctly by recognizing remote:%.
+    const dir = newDir();
+    const db = await openDb(dir);
+    dbs.push(db);
+    seedTurns(db, [
+      {
+        chatId: 1,
+        model: "remote:openrouter:anthropic/claude-3.5-sonnet",
+        startedAt: 100,
+        response: "hi",
+        cost: 0.00007,
+        status: "ok",
+      },
+    ]);
+    expect(db.hasLocalTurnsSince(1, 0)).toBe(true);
+    expect(db.hasLocalTurnsSince(1, 100)).toBe(false);
+  });
+
+  test("outOfBandForEngine triple-pattern: remote:% rows hidden by local cutoff", async () => {
+    // Claude reading the engine-slot bridge must see remote turns as part
+    // of the local-cutoff scope — otherwise `/clear local` clears Ollama
+    // history but Claude still recites the freshly-cleared OpenRouter turns.
+    const dir = newDir();
+    const db = await openDb(dir);
+    dbs.push(db);
+    seedTurns(db, [
+      {
+        chatId: 1,
+        model: "remote:openrouter:openai/gpt-4o-mini",
+        startedAt: 100,
+        response: "remote-pre",
+        cost: 0.00005,
+        status: "ok",
+      },
+      {
+        chatId: 1,
+        model: "claude:secondary:m",
+        startedAt: 150,
+        response: "opus",
+        cost: 0.02,
+        status: "ok",
+      },
+    ]);
+    // Without cutoff: both rows appear out-of-band for the primary tier.
+    const all = db
+      .outOfBandForEngine(1, "claude:primary:%", 10)
+      .map((r) => r.response);
+    expect(all).toContain("remote-pre");
+    // With cutoff at 150: remote:% pre-cutoff row is hidden.
+    const filtered = db
+      .outOfBandForEngine(1, "claude:primary:%", 10, 150)
+      .map((r) => r.response);
+    expect(filtered).not.toContain("remote-pre");
+    expect(filtered).toContain("opus");
+  });
+
   test("sumChatBytesForEngine totals prompt+response over status='ok' rows", async () => {
     const dir = newDir();
     const db = await openDb(dir);
diff --git a/src/db.ts b/src/db.ts
index 3b0a666..0f0764c 100644
--- a/src/db.ts
+++ b/src/db.ts
@@ -548,15 +548,20 @@ export async function openDb(dataDir: string): Promise<SolracDb> {
   // engine naturally sees an empty window because the cutoff has advanced.
   // Excludes 'system' rows (denials/queue-full).
   //
-  // The cutoff clause matches BOTH `local:%` (post-migration) AND `ollama:%`
-  // (legacy, pre-migration) so a partial migration / rollback still hides
-  // pre-cutoff local-engine rows. The legacy clause is removed in a
-  // follow-up release once the migration has propagated.
+  // The cutoff clause matches THREE engine-slot patterns:
+  //   - `local:%`  (current local-mode tag — Ollama, LMStudio)
+  //   - `ollama:%` (legacy, pre-v0.7.0; removed in a follow-up release)
+  //   - `remote:%` (current remote-mode tag — OpenRouter)
+  // `/clear local` wipes the engine slot for ALL of these in one shot via
+  // `local_cutoff_ms`. The legacy `ollama:%` pattern stays in the clause until
+  // the v0.7.0 cleanup ships; the `remote:%` pattern is added here so cross-engine
+  // bridge queries (Claude reading prior engine-slot turns) honor the cutoff
+  // for remote turns too.
   const stOutOfBandOther = db.prepare(
     "SELECT prompt, response, model FROM audit " +
       "WHERE chat_id = ? AND model NOT LIKE ? AND status = 'ok' " +
       "AND prompt IS NOT NULL AND response IS NOT NULL " +
-      "AND ((model NOT LIKE 'local:%' AND model NOT LIKE 'ollama:%') OR started_at > ?) " +
+      "AND ((model NOT LIKE 'local:%' AND model NOT LIKE 'ollama:%' AND model NOT LIKE 'remote:%') OR started_at > ?) " +
       "AND started_at > COALESCE(" +
       "  (SELECT MAX(started_at) FROM audit WHERE chat_id = ? AND model LIKE ? AND status = 'ok'), " +
       "  0" +
@@ -564,10 +569,12 @@ export async function openDb(dataDir: string): Promise {
       "ORDER BY started_at ASC LIMIT ?",
   );
   // Existence probe used by `/clear local` for the "Already clean" reply.
-  // Dual-pattern: matches both `local:%` and legacy `ollama:%`.
+  // Triple-pattern: matches `local:%`, legacy `ollama:%`, and `remote:%` so
+  // an engine-slot deploy on OpenRouter (no local Ollama/LMStudio rows ever)
+  // still reports "Already clean" correctly after the cutoff is bumped.
   const stHasLocalSince = db.prepare(
     "SELECT 1 FROM audit " +
-      "WHERE chat_id = ? AND (model LIKE 'local:%' OR model LIKE 'ollama:%') AND status = 'ok' " +
+      "WHERE chat_id = ? AND (model LIKE 'local:%' OR model LIKE 'ollama:%' OR model LIKE 'remote:%') AND status = 'ok' " +
       "AND prompt IS NOT NULL AND response IS NOT NULL " +
       "AND started_at > ? LIMIT 1",
   );
diff --git a/src/engine-driver.ts b/src/engine-driver.ts
new file mode 100644
index 0000000..ab41d8e
--- /dev/null
+++ b/src/engine-driver.ts
@@ -0,0 +1,215 @@
+/**
+ * @fileoverview Engine-slot driver abstraction — the wire-format-agnostic
+ *               contract that `local-driver.ts` (Ollama, LMStudio) and
+ *               `remote-driver.ts` (OpenRouter) both implement.
+ * @purpose Centralize the normalized event stream + error class + shared
+ *          helpers so the runner in `engine.ts` consumes one driver shape
+ *          regardless of whether the engine slot is filled by an on-host
+ *          daemon (Ollama, LMStudio) or a hosted provider (OpenRouter).
+ *
+ * Position in the dependency graph:
+ *   log → engine-driver → local-driver, remote-driver
+ *                       → engine-tools → engine
+ *
+ * Exports:
+ *   - `EngineBackend` — `"ollama" | "lmstudio" | "openrouter"`.
+ *   - `EngineChatRole`, `EngineChatMessage`, `EngineToolCallRef`, `EngineToolDef`.
+ *   - `EngineChatEvent` — `text | tool_call | done | error`.
+ *   - `EngineProbeResult` — `{ ok; reason?; modelMissing? }`.
+ *   - `EngineStreamChatOpts` — `streamChat` argument shape.
+ *   - `EngineDriver` — interface (`backend`, `mode`, `probe`, `streamChat`).
+ *   - `EngineDriverError` — typed error for connection/HTTP failures.
+ *   - `DriverOpts` — base factory options (`url`, optional `fetch`).
+ *   - `stableStringify` — order-insensitive JSON stringify for tool-call dedup.
+ *   - `maybeLogEmptyStream` — diagnostic log helper for the empty-stream
+ *     failure mode (parsed zero text + zero tool calls). Per-driver
+ *     `logEvent` parameter so operators can grep distinct events per backend.
+ *
+ * Key invariants:
+ *   - `streamChat` ALWAYS returns the async iterable, even on errors. Failures
+ *     surface during iteration, either as `kind: "error"` events or as a thrown
+ *     `EngineDriverError` for network-level failures (connection refused, timeout, 4xx/5xx).
+ *   - `EngineDriverError` carries a `code` discriminant so callers can render
+ *     different UX for `unreachable` vs `model_missing` vs `timeout` vs
+ *     `http_error`.
+ *   - `EngineDriver.mode` is the engine-slot identity — `"local"` for on-host
+ *     backends (cost is always free; audit tag prefix `local:`), `"remote"`
+ *     for hosted providers (cost captured per-turn from the usage chunk;
+ *     audit tag prefix `remote:`). The runner reads this field to pick the
+ *     capability-note framing and the cost-write matrix.
+ */
+
+import { log } from "./log.ts";
+
+export type EngineBackend = "ollama" | "lmstudio" | "openrouter";
+
+export type EngineChatRole = "system" | "user" | "assistant" | "tool";
+
+/**
+ * Reference to one tool call emitted by an assistant message. `id` is set
+ * by backends that namespace calls (LMStudio, OpenRouter); for Ollama, the
+ * consumer synthesizes one (`call__`) so cross-backend message
+ * arrays carry a stable identifier.
+ */
+export interface EngineToolCallRef {
+  id?: string;
+  function: { name: string; arguments: unknown };
+}
+
+/**
+ * Unified chat message shape. Each driver maps to its backend's wire shape:
+ *   - Ollama matches tool results by `tool_name`.
+ *   - LMStudio / OpenRouter match by `tool_call_id`.
+ * Consumers populate both on tool-result messages; drivers pick what they
+ * need. Extra fields are harmless on either wire.
+ */
+export interface EngineChatMessage {
+  role: EngineChatRole;
+  content: string;
+  tool_calls?: ReadonlyArray<EngineToolCallRef>;
+  tool_call_id?: string;
+  tool_name?: string;
+}
+
+/**
+ * Wire-shape tool definition shared by all backends — Ollama adopted OpenAI's
+ * function-calling JSON Schema directly; LMStudio + OpenRouter are
+ * OpenAI-compatible.
+ */
+export interface EngineToolDef {
+  readonly type: "function";
+  readonly function: {
+    readonly name: string;
+    readonly description: string;
+    readonly parameters: Readonly<Record<string, unknown>>;
+  };
+}
+
+/**
+ * One event from `EngineDriver.streamChat`. Driver consumers iterate until
+ * the stream ends or a `done`/`error` event arrives.
+ */
+export type EngineChatEvent =
+  | { kind: "text"; delta: string }
+  | { kind: "tool_call"; call: EngineToolCallRef }
+  | {
+      kind: "done";
+      inputTokens: number | null;
+      outputTokens: number | null;
+      // USD cost reported by the backend's usage chunk. Always `null` for
+      // on-host backends (Ollama, LMStudio); set by remote backends that
+      // bill per-token (OpenRouter populates from `usage.cost` in the
+      // trailing SSE chunk). Consumed by the runner to decide what to
+      // write to `audit.cost_usd` — `null` here means "this backend
+      // didn't tell us the cost," not "the cost was zero." The runner
+      // distinguishes the two based on `driver.mode`.
+      costUsd: number | null;
+    }
+  | { kind: "error"; message: string };
+
+export interface EngineProbeResult {
+  ok: boolean;
+  reason?: string;
+  modelMissing?: boolean;
+}
+
+export interface EngineStreamChatOpts {
+  model: string;
+  messages: ReadonlyArray<EngineChatMessage>;
+  tools?: ReadonlyArray<EngineToolDef>;
+  signal?: AbortSignal;
+}
+
+/**
+ * Engine-slot driver contract. Implementations live in `local-driver.ts`
+ * (Ollama, LMStudio) and `remote-driver.ts` (OpenRouter); the runner in
+ * `engine.ts` consumes only this interface.
+ *
+ * `mode` is the engine-slot identity:
+ *   - `"local"` — on-host backend (Ollama, LMStudio). Cost always 0; audit
+ *     tag prefix `local:`. The runner writes `cost_usd = 0` regardless of
+ *     what the driver reports.
+ *   - `"remote"` — hosted backend (OpenRouter). Cost captured from the
+ *     driver's `done` event; audit tag prefix `remote:`. The runner writes
+ *     either the captured cost or `null` (with a `remote.cost_missing` warn)
+ *     when the driver returned no cost.
+ */
+export interface EngineDriver {
+  readonly backend: EngineBackend;
+  readonly mode: "local" | "remote";
+  probe(model: string, signal?: AbortSignal): Promise<EngineProbeResult>;
+  streamChat(opts: EngineStreamChatOpts): AsyncIterable<EngineChatEvent>;
+}
+
+/**
+ * Typed error surface for `streamChat` and `probe`. `code` lets callers
+ * render distinct UX for "ollama daemon not running" (`unreachable`) vs
+ * "model not pulled" (`model_missing`) without parsing the message.
+ */
+export class EngineDriverError extends Error {
+  readonly backend: EngineBackend;
+  readonly code: "unreachable" | "timeout" | "model_missing" | "http_error";
+  readonly status?: number;
+  constructor(
+    backend: EngineBackend,
+    code: "unreachable" | "timeout" | "model_missing" | "http_error",
+    message: string,
+    status?: number,
+  ) {
+    super(message);
+    this.name = "EngineDriverError";
+    this.backend = backend;
+    this.code = code;
+    this.status = status;
+  }
+}
+
+export interface DriverOpts {
+  url: string; // base, no trailing slash
+  fetch?: typeof fetch;
+}
+
+// ---------------------------------------------------------------------------
+// Shared helpers
+// ---------------------------------------------------------------------------
+
+/**
+ * Order-insensitive JSON stringify so `{a:1,b:2}` and `{b:2,a:1}` hash to the
+ * same dedup key. Used by SSE drivers (LMStudio, OpenRouter) to suppress
+ * duplicate tool calls inside one assistant message (Gemma-4
+ * `parallel_tool_calls: false` workaround).
+ */
+export function stableStringify(value: unknown): string {
+  if (value === null || typeof value !== "object") return JSON.stringify(value) ?? "null";
+  if (Array.isArray(value)) return `[${value.map(stableStringify).join(",")}]`;
+  const obj = value as Record<string, unknown>;
+  const keys = Object.keys(obj).sort();
+  return `{${keys.map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`).join(",")}}`;
+}
+
+// Caps on the empty-stream diagnostic buffer — bounded so a runaway stream
+// (huge content + zero parsed events somehow) can't blow up the log line.
+export const RAW_FRAME_BUFFER_MAX = 30;
+export const RAW_FRAME_TRUNC = 400;
+
+/**
+ * Diagnostic log for the empty-stream failure mode (some models close the SSE
+ * with `[DONE]` immediately when `tools` is in the body — template lacks tool
+ * branch). Driver buffers raw `data:` payloads and calls this on close; the
+ * log fires only when zero text + zero tool calls were emitted. Per-driver
+ * `logEvent` parameter so operators can grep distinct events per backend.
+ */
+export function maybeLogEmptyStream(args: {
+  model: string;
+  textCharsEmitted: number;
+  toolCallsEmitted: number;
+  rawFrameBuffer: ReadonlyArray<string>;
+  logEvent: string;
+}): void {
+  if (args.textCharsEmitted > 0 || args.toolCallsEmitted > 0) return;
+  log.warn(args.logEvent, {
+    model: args.model,
+    frameCount: args.rawFrameBuffer.length,
+    frames: args.rawFrameBuffer,
+  });
+}
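`stableStringify` exists so SSE drivers can drop duplicate tool calls inside one assistant message, but the dedup loop itself lives in the driver implementations, which this section doesn't show. A minimal sketch, assuming the dedup key is the tool name plus the order-insensitive argument string; the `dedupToolCalls` helper and the trimmed `ToolCallRef` type are hypothetical, not exports of `engine-driver.ts`:

```typescript
// Trimmed copy of EngineToolCallRef, so the sketch stands alone.
interface ToolCallRef {
  id?: string;
  function: { name: string; arguments: unknown };
}

// Same algorithm as the exported stableStringify: recursively sort object
// keys so {a:1,b:2} and {b:2,a:1} produce the same string.
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value) ?? "null";
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(",")}]`;
  const obj = value as Record<string, unknown>;
  const keys = Object.keys(obj).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`).join(",")}}`;
}

// Hypothetical helper: keep the first call for each (name, args) pair and
// drop later repeats, preserving the original order of the survivors.
function dedupToolCalls(calls: ReadonlyArray<ToolCallRef>): ToolCallRef[] {
  const seen = new Set<string>();
  const kept: ToolCallRef[] = [];
  for (const call of calls) {
    const key = `${call.function.name}:${stableStringify(call.function.arguments)}`;
    if (seen.has(key)) continue; // duplicate of an earlier call in this message
    seen.add(key);
    kept.push(call);
  }
  return kept;
}
```

With this key, two `time_now` calls whose arguments differ only in key order collapse to one, while a call with a different `tz` survives.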
diff --git a/src/local-tools.test.ts b/src/engine-tools.test.ts
similarity index 87%
rename from src/local-tools.test.ts
rename to src/engine-tools.test.ts
index 23f2509..84ebf2c 100644
--- a/src/local-tools.test.ts
+++ b/src/engine-tools.test.ts
@@ -1,35 +1,36 @@
 /**
- * @fileoverview Unit tests for `local-tools.ts`.
+ * @fileoverview Unit tests for `engine-tools.ts`.
  * @proves Schema converter shape, thought-fence stripper, and the
  *         multi-round `runToolLoop` driver behaviors that the
- *         `local-driver.test.ts` event-stream tests don't already cover.
+ *         `local-driver.test.ts` / `remote-driver.test.ts` event-stream
+ *         tests don't already cover.
  *
- * `runToolLoop` is tested via a hand-rolled fake `LocalDriver` that yields
- * scripted `LocalChatEvent` sequences — that isolates loop logic from
- * wire-format concerns (already covered in `local-driver.test.ts`).
+ * `runToolLoop` is tested via a hand-rolled fake `EngineDriver` that yields
+ * scripted `EngineChatEvent` sequences — that isolates loop logic from
+ * wire-format concerns (already covered in the driver-impl test files).
  */
 
 import { describe, expect, test } from "bun:test";
 import { z } from "zod";
 import type { SdkMcpToolDefinition } from "@anthropic-ai/claude-agent-sdk";
 import {
-  type LocalChatEvent,
-  type LocalDriver,
-  type LocalStreamChatOpts,
-} from "./local-driver.ts";
+  type EngineChatEvent,
+  type EngineDriver,
+  type EngineStreamChatOpts,
+} from "./engine-driver.ts";
 import {
-  mcpToLocalTools,
+  mcpToEngineTools,
   runToolLoop,
   stripThoughts,
   TOOL_RESULT_MAX_LEN,
-} from "./local-tools.ts";
+} from "./engine-tools.ts";
 import { createLoopDetector, type ConfirmationBroker } from "./policy.ts";
 
 // ---------------------------------------------------------------------------
 // Pure converter tests
 // ---------------------------------------------------------------------------
 
-describe("mcpToLocalTools", () => {
+describe("mcpToEngineTools", () => {
   function makeTool(
     name: string,
     inputSchema: z.ZodRawShape,
@@ -44,7 +45,7 @@ describe("mcpToLocalTools", () => {
   }
 
   test("converts a simple object schema with required + optional fields", () => {
-    const out = mcpToLocalTools([
+    const out = mcpToEngineTools([
       makeTool("time_now", {
         tz: z.string().describe("IANA timezone"),
         format: z.enum(["iso", "human"]).optional(),
@@ -65,7 +66,7 @@ describe("mcpToLocalTools", () => {
   });
 
   test("preserves descriptions on individual fields", () => {
-    const out = mcpToLocalTools([
+    const out = mcpToEngineTools([
       makeTool("t", { foo: z.string().describe("the foo") }),
     ]);
     const params = out[0]!.function.parameters as Record<string, unknown>;
@@ -74,7 +75,7 @@ describe("mcpToLocalTools", () => {
   });
 
   test("empty tools list → empty output", () => {
-    expect(mcpToLocalTools([])).toEqual([]);
+    expect(mcpToEngineTools([])).toEqual([]);
   });
 });
 
@@ -115,14 +116,15 @@ describe("TOOL_RESULT_MAX_LEN", () => {
 // ---------------------------------------------------------------------------
 
 // Scriptable fake — each call to `streamChat` consumes the next event batch.
-function scriptedDriver(rounds: Array<LocalChatEvent[]>): LocalDriver {
+function scriptedDriver(rounds: Array<EngineChatEvent[]>): EngineDriver {
   let i = 0;
   return {
     backend: "ollama",
+    mode: "local",
     async probe() {
       return { ok: true };
     },
-    async *streamChat(_opts: LocalStreamChatOpts): AsyncIterable<LocalChatEvent> {
+    async *streamChat(_opts: EngineStreamChatOpts): AsyncIterable<EngineChatEvent> {
       const events = rounds[i++] ?? [];
       for (const evt of events) yield evt;
     },
@@ -142,7 +144,7 @@ describe("runToolLoop — single round, no tools", () => {
     const driver = scriptedDriver([
       [
         { kind: "text", delta: "hello world" },
-        { kind: "done", inputTokens: 5, outputTokens: 3 },
+        { kind: "done", inputTokens: 5, outputTokens: 3, costUsd: null },
       ],
     ]);
     const result = await runToolLoop(
@@ -208,12 +210,12 @@ describe("runToolLoop — with tool calls", () => {
           kind: "tool_call",
           call: { id: "call_1", function: { name: "echo", arguments: { msg: "hi" } } },
         },
-        { kind: "done", inputTokens: 8, outputTokens: 4 },
+        { kind: "done", inputTokens: 8, outputTokens: 4, costUsd: null },
       ],
       // Round 2: text-only finalization
       [
         { kind: "text", delta: "done!" },
-        { kind: "done", inputTokens: 20, outputTokens: 2 },
+        { kind: "done", inputTokens: 20, outputTokens: 2, costUsd: null },
       ],
     ]);
 
@@ -235,7 +237,7 @@ describe("runToolLoop — with tool calls", () => {
         signal: new AbortController().signal,
         tools: new Map([["echo", echoTool]]),
         toolTiers: new Map([["echo", "auto"]]),
-        toolDefs: mcpToLocalTools([echoTool]),
+        toolDefs: mcpToEngineTools([echoTool]),
         broker: noopBroker,
         loopDetector: createLoopDetector(),
         maxIterations: 4,
@@ -263,11 +265,11 @@ describe("runToolLoop — with tool calls", () => {
           kind: "tool_call",
           call: { function: { name: "dangerous", arguments: {} } },
         },
-        { kind: "done", inputTokens: 5, outputTokens: 1 },
+        { kind: "done", inputTokens: 5, outputTokens: 1, costUsd: null },
       ],
       [
         { kind: "text", delta: "ok, moving on" },
-        { kind: "done", inputTokens: 10, outputTokens: 3 },
+        { kind: "done", inputTokens: 10, outputTokens: 3, costUsd: null },
       ],
     ]);
 
@@ -289,7 +291,7 @@ describe("runToolLoop — with tool calls", () => {
         signal: new AbortController().signal,
         tools: new Map([["dangerous", dangerousTool]]),
         toolTiers: new Map([["dangerous", "auto"]]),
-        toolDefs: mcpToLocalTools([dangerousTool]),
+        toolDefs: mcpToEngineTools([dangerousTool]),
         broker: noopBroker,
         loopDetector: createLoopDetector(),
         maxIterations: 4,
@@ -310,17 +312,17 @@ describe("runToolLoop — iteration cap", () => {
   test("cap hit fires the finalize round and sets iterationCapHit", async () => {
     // Build N+1 scripted rounds: N tool-calling rounds (cap) + 1 finalize round.
     const cap = 2;
-    const rounds: Array<LocalChatEvent[]> = [];
+    const rounds: Array<EngineChatEvent[]> = [];
     for (let i = 0; i < cap; i++) {
       rounds.push([
         { kind: "tool_call", call: { function: { name: "echo", arguments: { i } } } },
-        { kind: "done", inputTokens: i === 0 ? 5 : 30, outputTokens: 2 },
+        { kind: "done", inputTokens: i === 0 ? 5 : 30, outputTokens: 2, costUsd: null },
       ]);
     }
     // The finalize round (after cap nudge).
     rounds.push([
       { kind: "text", delta: "best effort answer" },
-      { kind: "done", inputTokens: 40, outputTokens: 5 },
+      { kind: "done", inputTokens: 40, outputTokens: 5, costUsd: null },
     ]);
     const driver = scriptedDriver(rounds);
 
@@ -340,7 +342,7 @@ describe("runToolLoop — iteration cap", () => {
         signal: new AbortController().signal,
         tools: new Map([["echo", echoTool]]),
         toolTiers: new Map([["echo", "auto"]]),
-        toolDefs: mcpToLocalTools([echoTool]),
+        toolDefs: mcpToEngineTools([echoTool]),
         broker: noopBroker,
         loopDetector: createLoopDetector(),
         maxIterations: cap,
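The scripted fake in `engine-tools.test.ts` only exercises `mode: "local"` with `costUsd: null` on every `done` event. A remote-mode counterpart could replay per-round usage costs, the way `engine-driver.ts` describes OpenRouter populating `costUsd` from its trailing usage chunk. This is a sketch only: `scriptedRemoteDriver`, `drainRound`, and the trimmed `FakeDriver` / `EngineChatEvent` types are illustrative stand-ins, not code from this PR.

```typescript
// Trimmed copy of the engine-driver.ts event union (tool_call omitted).
type EngineChatEvent =
  | { kind: "text"; delta: string }
  | { kind: "done"; inputTokens: number | null; outputTokens: number | null; costUsd: number | null }
  | { kind: "error"; message: string };

// Hypothetical minimal driver shape: just enough to replay scripted rounds.
interface FakeDriver {
  readonly backend: "openrouter";
  readonly mode: "remote";
  streamChat(): AsyncIterable<EngineChatEvent>;
}

// Each streamChat() call replays the next scripted round, mirroring the
// test file's scriptedDriver but with a remote identity.
function scriptedRemoteDriver(rounds: ReadonlyArray<ReadonlyArray<EngineChatEvent>>): FakeDriver {
  let i = 0;
  return {
    backend: "openrouter",
    mode: "remote",
    async *streamChat() {
      const events = rounds[i++] ?? [];
      for (const evt of events) yield evt;
    },
  };
}

// Drain one round: concatenate text deltas and keep the done event's cost.
async function drainRound(driver: FakeDriver): Promise<{ text: string; costUsd: number | null }> {
  let text = "";
  let costUsd: number | null = null;
  for await (const evt of driver.streamChat()) {
    if (evt.kind === "text") text += evt.delta;
    else if (evt.kind === "done") costUsd = evt.costUsd;
    else if (evt.kind === "error") break;
  }
  return { text, costUsd };
}
```

A round whose `done` event carries `costUsd: null` models a remote response that omitted `usage.cost`, which is the case the runner reports as `remote.cost_missing`.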
diff --git a/src/local-tools.ts b/src/engine-tools.ts
similarity index 84%
rename from src/local-tools.ts
rename to src/engine-tools.ts
index 237386c..efc4f7a 100644
--- a/src/local-tools.ts
+++ b/src/engine-tools.ts
@@ -1,18 +1,19 @@
 /**
- * @fileoverview Local-engine tool-calling support — schema converter,
+ * @fileoverview Engine-slot tool-calling support — schema converter,
  *               per-call executor, and multi-round loop driver.
  * @purpose Bridge solrac integrations (`SdkMcpToolDefinition`, designed for
  *          the Anthropic-hosted Claude Agent SDK) into the OpenAI-compatible
- *          tool format both local backends (Ollama, LMStudio) accept, and
- *          run a single tool call through the same safety layers (loop
- *          detector, classifier, broker) the SDK path uses on Claude tiers.
- *          One source of truth for the tool surface — the same operator-
- *          authored integrations reach Claude tiers AND every local backend.
+ *          tool format every non-Anthropic backend accepts — local (Ollama,
+ *          LMStudio) and remote (OpenRouter) alike — and run a single tool
+ *          call through the same safety layers (loop detector, classifier,
+ *          broker) the SDK path uses on Claude tiers. One source of truth
+ *          for the tool surface — the same operator-authored integrations
+ *          reach Claude tiers AND every engine-slot backend.
  *
  * Why a converter at all:
  *   `SdkMcpToolDefinition.inputSchema` is a raw `ZodRawShape` (object of zod
  *   field defs), NOT a wrapped `z.object(...)`. The SDK applies the wrap
- *   internally; for the local path we wrap before producing JSON Schema.
+ *   internally; for the engine-slot path we wrap before producing JSON Schema.
  *
  * Why `z.toJSONSchema` and not a hand-rolled walker:
  *   Verified empirically that zod 4.4.3's output is already OpenAI-compatible
@@ -21,10 +22,10 @@
  *   the top-level `$schema` JSON-Schema-version marker (some strict models
  *   reject unrecognized fields). Pin or vendor zod if churn becomes an issue.
  *
- * Why a separate executor for the local path (vs reusing the SDK's path):
+ * Why a separate executor for the engine-slot path (vs reusing the SDK's path):
  *   The Anthropic SDK drives the tool-call loop internally — every classified
  *   `mcp__solrac__*` call lands at the integration's handler without solrac
- *   needing to invoke it. The local backends return one assistant message;
+ *   needing to invoke it. Engine-slot backends return one assistant message;
  *   if it contains `tool_calls`, WE execute them and feed results back. So
  *   we re-implement the per-call gate path (loop → classify → broker → invoke)
  *   that `agent.ts` gets for free from the SDK. The same `policy.ts` building
@@ -40,28 +41,29 @@
  *   6. handler invoke — the integration's own code.
  *
  * Cost cap is intentionally NOT checked here. Anthropic per-chat + global
- * caps gate Anthropic burn only. Local is $0; the loop detector and the
- * iteration cap are the runaway-loop defenses.
+ * caps gate Anthropic burn only. The loop detector and the iteration cap
+ * are the runaway-loop defenses.
  *
  * Position in the dependency graph:
- *   integrations + policy + telegram + log + zod + local-driver → local-tools → local
+ *   integrations + policy + telegram + log + zod + engine-driver
+ *     → engine-tools → engine
  *
  * Cross-references:
  *   - src/integrations.ts — the producer side
  *   - src/policy.ts — `classifyToolWithIntegrations`, `LoopDetector`,
  *     `ConfirmationBroker` (all reused as-is)
- *   - src/local-driver.ts — backend abstraction this loop consumes
+ *   - src/engine-driver.ts — shared backend abstraction this loop consumes
  */
 
 import { z } from "zod";
 import type { SdkMcpToolDefinition } from "@anthropic-ai/claude-agent-sdk";
 import {
-  type LocalChatMessage,
-  type LocalDriver,
-  LocalDriverError,
-  type LocalToolCallRef,
-  type LocalToolDef,
-} from "./local-driver.ts";
+  type EngineChatMessage,
+  type EngineDriver,
+  EngineDriverError,
+  type EngineToolCallRef,
+  type EngineToolDef,
+} from "./engine-driver.ts";
 import {
   classifyToolWithIntegrations,
   type ConfirmationBroker,
@@ -72,26 +74,27 @@ import type { IntegrationTier } from "./integrations.ts";
 import { log } from "./log.ts";
 
 /**
- * Re-export the wire-shape tool def under the local-tools-flavored name so
+ * Re-export the wire-shape tool def under the engine-tools-flavored name so
  * downstream callers can import everything tool-related from one module.
  */
-export type { LocalToolDef } from "./local-driver.ts";
+export type { EngineToolDef } from "./engine-driver.ts";
 
 /**
- * Convert solrac integration tools to the wire-shape both local backends use.
+ * Convert solrac integration tools to the wire-shape every engine-slot
+ * backend uses.
  *
  * Names pass through unchanged — integrations register short names like
  * `time_now`; the `mcp__solrac__` prefix is added at the SDK boundary in
- * `agent.ts` and is NOT used over the local wire (both backends use flat
- * tool registries).
+ * `agent.ts` and is NOT used over the engine-slot wire (every backend uses
+ * a flat tool registry).
  *
  * The `<any>` schema generic mirrors the SDK's own `tools?: Array<…>`
  * field (`sdk.d.ts:426`) — heterogeneous tool arrays can't share a single
  * concrete schema type.
  */
-export function mcpToLocalTools(
+export function mcpToEngineTools(
   tools: ReadonlyArray<SdkMcpToolDefinition<any>>,
-): LocalToolDef[] {
+): EngineToolDef[] {
   return tools.map((t) => {
     const objectSchema = z.object(t.inputSchema as z.ZodRawShape);
     const parameters = z.toJSONSchema(objectSchema) as Record<string, unknown>;
@@ -164,7 +167,7 @@ export interface ExecuteToolCallDeps {
   readonly broker: Pick;
   readonly loopDetector: LoopDetector;
   /**
-   * `LOCAL_DENY_TOOLS` belt-and-suspenders set. Names in this set bypass the
+   * `ENGINE_DENY_TOOLS` belt-and-suspenders set. Names in this set bypass the
    * classifier and broker; any call whose name appears here is denied
    * immediately with `denied_policy`. Mirrors `disallowedTools: ["Agent","Task"]`
    * for the SDK path.
@@ -202,7 +205,7 @@ export async function executeToolCall(
 
   if (deps.loopDetector.check(fullName, args) === "loop") {
     const reason = `loop_detected: ${shortName} called ${deps.loopDetector.threshold}× with same input`;
-    log.warn("local.tool_loop_detected", {
+    log.warn("engine.tool_loop_detected", {
       auditId: deps.auditId,
       chatId: deps.chatId,
       tool: shortName,
@@ -214,7 +217,7 @@ export async function executeToolCall(
   const tool = deps.tools.get(shortName);
   if (!tool) {
     const reason = `unknown tool: ${shortName}`;
-    log.warn("local.tool_unknown", {
+    log.warn("engine.tool_unknown", {
       auditId: deps.auditId,
       chatId: deps.chatId,
       tool: shortName,
@@ -227,8 +230,8 @@ export async function executeToolCall(
   }
 
   if (deps.deniedTools?.has(shortName)) {
-    const reason = `tool ${shortName} is in LOCAL_DENY_TOOLS`;
-    log.warn("local.tool_denied_hard", {
+    const reason = `tool ${shortName} is in ENGINE_DENY_TOOLS`;
+    log.warn("engine.tool_denied_hard", {
       auditId: deps.auditId,
       chatId: deps.chatId,
       tool: shortName,
@@ -238,7 +241,7 @@ export async function executeToolCall(
 
   const decision = classifyToolWithIntegrations(fullName, args, deps.toolTiers);
   if (decision.kind === "deny") {
-    log.warn("local.tool_denied_policy", {
+    log.warn("engine.tool_denied_policy", {
       auditId: deps.auditId,
       chatId: deps.chatId,
       tool: shortName,
@@ -252,7 +255,7 @@ export async function executeToolCall(
   }
 
   if (decision.kind === "confirm" && deps.autoAllow) {
-    log.info("local.tool_auto_allow", {
+    log.info("engine.tool_auto_allow", {
       auditId: deps.auditId,
       chatId: deps.chatId,
       tool: shortName,
@@ -260,7 +263,7 @@ export async function executeToolCall(
   } else if (decision.kind === "confirm") {
     if (deps.roundState && deps.roundState.confirmsRemaining <= 0) {
       const reason = "only one confirmable tool per round; retry one at a time";
-      log.warn("local.tool_confirm_round_cap", {
+      log.warn("engine.tool_confirm_round_cap", {
         auditId: deps.auditId,
         chatId: deps.chatId,
         tool: shortName,
@@ -268,7 +271,7 @@ export async function executeToolCall(
       return { content: `denied: ${reason}`, disposition: "denied_policy", reason };
     }
     if (deps.roundState) deps.roundState.confirmsRemaining -= 1;
-    log.info("local.tool_confirm_request", {
+    log.info("engine.tool_confirm_request", {
       auditId: deps.auditId,
       chatId: deps.chatId,
       tool: shortName,
@@ -282,7 +285,7 @@ export async function executeToolCall(
       });
     } catch (err) {
       const msg = (err as Error).message;
-      log.warn("local.tool_confirm_send_failed", {
+      log.warn("engine.tool_confirm_send_failed", {
         auditId: deps.auditId,
         chatId: deps.chatId,
         tool: shortName,
@@ -294,7 +297,7 @@ export async function executeToolCall(
         reason: msg,
       };
     }
-    log.info("local.tool_confirm_resolved", {
+    log.info("engine.tool_confirm_resolved", {
       auditId: deps.auditId,
       chatId: deps.chatId,
       tool: shortName,
@@ -322,7 +325,7 @@ export async function executeToolCall(
     const issues = parsed.error.issues
       .map((i) => `${i.path.join(".") || "(root)"}: ${i.message}`)
       .join("; ");
-    log.warn("local.tool_invalid_args", {
+    log.warn("engine.tool_invalid_args", {
       auditId: deps.auditId,
       chatId: deps.chatId,
       tool: shortName,
@@ -341,7 +344,7 @@ export async function executeToolCall(
     result = await tool.handler(parsed.data, {});
   } catch (err) {
     const msg = (err as Error).message;
-    log.warn("local.tool_handler_threw", {
+    log.warn("engine.tool_handler_threw", {
       auditId: deps.auditId,
       chatId: deps.chatId,
       tool: shortName,
@@ -356,7 +359,7 @@ export async function executeToolCall(
   }
 
   const { content, truncated } = coalesceResultContent(result);
-  log.debug("local.tool_call_ok", {
+  log.debug("engine.tool_call_ok", {
     auditId: deps.auditId,
     chatId: deps.chatId,
     tool: shortName,
@@ -507,7 +510,7 @@ const EDIT_THROTTLE_MS = 1500;
  * `disallowedTools: ["Agent","Task"]`. Any tool name in this set is rejected
  * before the executor is called.
  */
-export const LOCAL_DENY_TOOLS: ReadonlySet<string> = Object.freeze(new Set<string>());
+export const ENGINE_DENY_TOOLS: ReadonlySet<string> = Object.freeze(new Set<string>());
 
 export interface ToolLoopResult {
   readonly assistantText: string;
@@ -516,6 +519,14 @@ export interface ToolLoopResult {
   readonly inputTokens: number | null;
   /** Sum of `outputTokens` across all rounds (true total generated). */
   readonly outputTokens: number | null;
+  /**
+   * Sum of `costUsd` across every round (including cap-finalize). `null` only
+   * if NO round reported a cost — for local backends this is always null;
+   * for the openrouter backend a null here is the `remote.cost_missing` signal.
+   * Per-round sums (vs round-0-only) match how the tool-loop actually bills
+   * on a remote backend: each round is a separate API call with its own cost.
+   */
+  readonly costUsd: number | null;
   readonly rounds: number;
   readonly toolsFired: number;
   readonly iterationCapHit: boolean;
@@ -539,7 +550,7 @@ export interface RunToolLoopRenderer {
 }
 
 export interface RunToolLoopDeps {
-  readonly driver: LocalDriver;
+  readonly driver: EngineDriver;
   readonly model: string;
   /**
    * Single shared `AbortSignal` for every fetch this turn — model rounds AND
@@ -549,7 +560,7 @@ export interface RunToolLoopDeps {
   readonly signal: AbortSignal;
   readonly tools: ReadonlyMap<string, SdkMcpToolDefinition<any>>;
   readonly toolTiers: ReadonlyMap<string, IntegrationTier>;
-  readonly toolDefs: ReadonlyArray<LocalToolDef>;
+  readonly toolDefs: ReadonlyArray<EngineToolDef>;
   readonly broker: Pick;
   readonly loopDetector: LoopDetector;
   readonly maxIterations: number;
@@ -561,7 +572,7 @@ export interface RunToolLoopDeps {
 }
 
 export interface RunToolLoopInput {
-  readonly initialMessages: ReadonlyArray<LocalChatMessage>;
+  readonly initialMessages: ReadonlyArray<EngineChatMessage>;
 }
 
 /**
@@ -587,12 +598,20 @@ export async function runToolLoop(
   deps: RunToolLoopDeps,
   input: RunToolLoopInput,
 ): Promise<ToolLoopResult> {
-  const denyTools = deps.denyTools ?? LOCAL_DENY_TOOLS;
-  const messages: LocalChatMessage[] = input.initialMessages.map((m) => ({ ...m }));
+  const denyTools = deps.denyTools ?? ENGINE_DENY_TOOLS;
+  const messages: EngineChatMessage[] = input.initialMessages.map((m) => ({ ...m }));
 
   let inputTokens: number | null = null;
   let outputTokens = 0;
   let outputTokensSeen = false;
+  // Remote-backend cost accumulator. Each round writes its own usage chunk
+  // with its own per-round cost; the tool-loop is a series of independent API
+  // calls so the summed cost is the real billed total. `costUsdSeen` tracks
+  // whether ANY round reported a cost — a tool-loop where every round skips
+  // the field must return `costUsd: null` (not 0) so `resolveAuditCost` in
+  // `engine.ts` writes null + emits `remote.cost_missing`.
+  let costUsd = 0;
+  let costUsdSeen = false;
   const toolCallSummaries: Array<{ name: string; input: unknown }> = [];
   let assistantText = "";
   let errorMessage: string | null = null;
@@ -602,7 +621,7 @@ export async function runToolLoop(
   let lastEditedKey = "";
   let round = 0;
 
-  log.info("local.tool_loop_start", {
+  log.info("engine.tool_loop_start", {
     auditId: deps.auditId,
     chatId: deps.chatId,
     backend: deps.driver.backend,
@@ -621,6 +640,7 @@ export async function runToolLoop(
     toolCalls: LocalToolCall[];
     inputTokens: number | null;
     outputTokens: number | null;
+    costUsd: number | null;
     error: string | null;
   }> {
     const result = {
@@ -628,6 +648,7 @@ export async function runToolLoop(
       toolCalls: [] as LocalToolCall[],
       inputTokens: null as number | null,
       outputTokens: null as number | null,
+      costUsd: null as number | null,
       error: null as string | null,
     };
 
@@ -652,7 +673,7 @@ export async function runToolLoop(
                 try {
                   await deps.renderer.onProgress(result.text, toolNames);
                 } catch (renderErr) {
-                  log.debug("local.progress_failed", {
+                  log.debug("engine.progress_failed", {
                     auditId: deps.auditId,
                     error: (renderErr as Error).message,
                   });
@@ -669,13 +690,14 @@ export async function runToolLoop(
         } else if (evt.kind === "done") {
           result.inputTokens = evt.inputTokens;
           result.outputTokens = evt.outputTokens;
+          result.costUsd = evt.costUsd;
         } else if (evt.kind === "error") {
           result.error = `local error: ${evt.message}`;
           break;
         }
       }
     } catch (err) {
-      if (err instanceof LocalDriverError) {
+      if (err instanceof EngineDriverError) {
         result.error = formatDriverErrorForLoop(err);
       } else {
         const e = err as Error;
@@ -699,6 +721,10 @@ export async function runToolLoop(
         outputTokens += r.outputTokens;
         outputTokensSeen = true;
       }
+      if (r.costUsd !== null) {
+        costUsd += r.costUsd;
+        costUsdSeen = true;
+      }
       assistantText = r.text;
 
       if (r.error !== null) {
@@ -730,7 +756,7 @@ export async function runToolLoop(
 
         if (denyTools.has(call.name)) {
           const denyMsg = `denied: ${call.name} is hard-disabled in this build`;
-          log.warn("local.tool_hard_denied", {
+          log.warn("engine.tool_hard_denied", {
             auditId: deps.auditId,
             chatId: deps.chatId,
             tool: call.name,
@@ -751,7 +777,7 @@ export async function runToolLoop(
         const wouldConfirm = tier !== "auto" && !deps.autoAllow;
         if (wouldConfirm && confirmsUsedThisRound > 0) {
           const msg = "denied: only one confirmable tool per round; retry separately";
-          log.info("local.tool_confirm_skipped_round_cap", {
+          log.info("engine.tool_confirm_skipped_round_cap", {
             auditId: deps.auditId,
             chatId: deps.chatId,
             tool: call.name,
@@ -803,7 +829,7 @@ export async function runToolLoop(
     // tool stream as the final UX state.
     if (round >= deps.maxIterations && errorMessage === null && !isAborted()) {
       iterationCapHit = true;
-      log.warn("local.tool_iteration_cap", {
+      log.warn("engine.tool_iteration_cap", {
         auditId: deps.auditId,
         chatId: deps.chatId,
         cap: deps.maxIterations,
@@ -830,6 +856,10 @@ export async function runToolLoop(
         outputTokens += finalRound.outputTokens;
         outputTokensSeen = true;
       }
+      if (finalRound.costUsd !== null) {
+        costUsd += finalRound.costUsd;
+        costUsdSeen = true;
+      }
     }
   } catch (err) {
     const e = err as Error;
@@ -837,7 +867,7 @@ export async function runToolLoop(
       // Caller aborted (timeout / shutdown). Distinct from a fetch failure.
     } else {
       errorMessage = `local unexpected error: ${e.message}`;
-      log.error("local.tool_loop_failed", {
+      log.error("engine.tool_loop_failed", {
         auditId: deps.auditId,
         backend: deps.driver.backend,
         error: e.message,
@@ -852,6 +882,7 @@ export async function runToolLoop(
     toolCallSummaries,
     inputTokens,
     outputTokens: outputTokensSeen ? outputTokens : null,
+    costUsd: costUsdSeen ? costUsd : null,
     rounds: round + (iterationCapHit ? 1 : 0),
     toolsFired,
     iterationCapHit,
@@ -861,7 +892,7 @@ export async function runToolLoop(
     aborted,
   };
 
-  log.info("local.tool_loop_done", {
+  log.info("engine.tool_loop_done", {
     auditId: deps.auditId,
     chatId: deps.chatId,
     backend: deps.driver.backend,
@@ -869,6 +900,7 @@ export async function runToolLoop(
     rounds: result.rounds,
     inputTokens: result.inputTokens,
     outputTokens: result.outputTokens,
+    costUsd: result.costUsd,
     toolsFired,
     iterationCapHit,
     aborted,
@@ -879,8 +911,8 @@ export async function runToolLoop(
 }
 
 // Format a driver error into a loop-level message. Mirrors the formatting in
-// local.ts but kept local so the loop driver doesn't depend back on the runner.
-function formatDriverErrorForLoop(err: LocalDriverError): string {
+// engine.ts but kept local so the loop driver doesn't depend back on the runner.
+function formatDriverErrorForLoop(err: EngineDriverError): string {
   if (err.code === "model_missing") return err.message;
   return `local ${err.backend} ${err.code}: ${err.message}`;
 }
@@ -889,13 +921,14 @@ function formatDriverErrorForLoop(err: LocalDriverError): string {
 // Used by the cap-finalize path where we want a closing message but no tools
 // surface and no UI throttling.
 async function collectFinalText(opts: {
-  driver: LocalDriver;
+  driver: EngineDriver;
   model: string;
-  messages: ReadonlyArray<LocalChatMessage>;
+  messages: ReadonlyArray<EngineChatMessage>;
   signal: AbortSignal;
-}): Promise<{ text: string; outputTokens: number | null }> {
+}): Promise<{ text: string; outputTokens: number | null; costUsd: number | null }> {
   let text = "";
   let outputTokens: number | null = null;
+  let costUsd: number | null = null;
   try {
     for await (const evt of opts.driver.streamChat({
       model: opts.model,
@@ -903,18 +936,20 @@ async function collectFinalText(opts: {
       signal: opts.signal,
     })) {
       if (evt.kind === "text") text += evt.delta;
-      else if (evt.kind === "done") outputTokens = evt.outputTokens;
-      else if (evt.kind === "error") break;
+      else if (evt.kind === "done") {
+        outputTokens = evt.outputTokens;
+        costUsd = evt.costUsd;
+      } else if (evt.kind === "error") break;
     }
   } catch (err) {
-    log.warn("local.cap_finalize_failed", {
+    log.warn("engine.cap_finalize_failed", {
       error: (err as Error).message,
     });
   }
-  return { text, outputTokens };
+  return { text, outputTokens, costUsd };
 }
 
 /**
- * Re-export `LocalToolCallRef` so consumers don't need a second import.
+ * Re-export `EngineToolCallRef` so consumers don't need a second import.
  */
-export type { LocalToolCallRef };
+export type { EngineToolCallRef };
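The `done`-event handling in `collectFinalText` and the `costUsdSeen ? costUsd : null` field in the tool-loop result follow one rule: a counter is reported as `null` unless at least one round actually reported it, so "unknown" never collapses into "zero". The sketch below shows that fold. It assumes the loop sums per-round values (the excerpt shows only the final assignment). `DoneEvent`, `RoundTotals`, and `foldRounds` are hypothetical names, and `DoneEvent` is a simplified stand-in for the real `EngineChatEvent` in `engine-driver.ts`.

```typescript
// Simplified stand-in for the driver's `done` event (assumed shape, not the
// real EngineChatEvent union from engine-driver.ts).
interface DoneEvent {
  kind: "done";
  inputTokens: number | null;
  outputTokens: number | null;
  costUsd: number | null;
}

interface RoundTotals {
  outputTokens: number | null;
  costUsd: number | null;
}

// Hypothetical helper: sums per-round values across a multi-round tool loop.
// A field stays null only if no round reported it; otherwise it is the sum of
// the rounds that did report (rounds with a null value contribute nothing).
function foldRounds(rounds: ReadonlyArray<DoneEvent>): RoundTotals {
  let outputTokens = 0;
  let outputTokensSeen = false;
  let costUsd = 0;
  let costUsdSeen = false;
  for (const r of rounds) {
    if (r.outputTokens !== null) {
      outputTokens += r.outputTokens;
      outputTokensSeen = true;
    }
    if (r.costUsd !== null) {
      costUsd += r.costUsd;
      costUsdSeen = true;
    }
  }
  return {
    outputTokens: outputTokensSeen ? outputTokens : null,
    costUsd: costUsdSeen ? costUsd : null,
  };
}
```

For example, a two-round loop where only the first round reports cost yields that round's cost, while a loop where no round reports cost yields `null`, which the audit layer can then treat as missing rather than free.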
diff --git a/src/local.test.ts b/src/engine.test.ts
similarity index 58%
rename from src/local.test.ts
rename to src/engine.test.ts
index 5d5700d..6ba0c18 100644
--- a/src/local.test.ts
+++ b/src/engine.test.ts
@@ -1,30 +1,27 @@
 /**
- * @fileoverview Unit tests for `local.ts`.
+ * @fileoverview Unit tests for `engine.ts`.
  * @proves Capability-note matrix (pure), audit-tag invariant
- *         (`local:<backend>:<model>`), driver-error → render translation,
+ *         (`<mode>:<backend>:<model>`), driver-error → render translation,
  *         and token-count capture from `done` events.
  *
  * Wire-format edge cases (NDJSON / SSE parsing) belong in
- * `local-driver.test.ts`. Tool-loop behavior belongs in
- * `local-tools.test.ts`. This file exercises only the runner-level
- * concerns that survive the driver abstraction.
+ * `local-driver.test.ts` / `remote-driver.test.ts`. Tool-loop behavior
+ * belongs in `engine-tools.test.ts`. This file exercises only the
+ * runner-level concerns that survive the driver abstraction.
  */
 
 import { describe, expect, test } from "bun:test";
 import { mkdir, rm } from "node:fs/promises";
 import type { Message } from "@grammyjs/types";
+import { runEngineTurn, scrubLocalControlTokens } from "./engine.ts";
 import {
-  buildLocalCapabilityNote,
-  buildToolCapabilityNote,
-  runLocalTurn,
-  scrubLocalControlTokens,
-} from "./local.ts";
-import {
-  type LocalChatEvent,
-  type LocalDriver,
-  type LocalStreamChatOpts,
-  LocalDriverError,
-} from "./local-driver.ts";
+  type EngineChatEvent,
+  type EngineDriver,
+  type EngineStreamChatOpts,
+  EngineDriverError,
+} from "./engine-driver.ts";
+import { buildLocalCapabilityNote } from "./local-driver.ts";
+import { buildRemoteCapabilityNote } from "./remote-driver.ts";
 import { openDb, type SolracDb } from "./db.ts";
 import type { SendMessageOpts, TelegramClient } from "./telegram.ts";
 
@@ -74,15 +71,16 @@ function makeFakeTg(): {
 }
 
 function fakeDriver(
-  backend: "ollama" | "lmstudio",
-  events: LocalChatEvent[] | Error,
-): LocalDriver {
+  backend: "ollama" | "lmstudio" | "openrouter",
+  events: EngineChatEvent[] | Error,
+): EngineDriver {
   return {
     backend,
+    mode: backend === "openrouter" ? "remote" : "local",
     async probe() {
       return { ok: true };
     },
-    async *streamChat(_opts: LocalStreamChatOpts): AsyncIterable<LocalChatEvent> {
+    async *streamChat(_opts: EngineStreamChatOpts): AsyncIterable<EngineChatEvent> {
       if (events instanceof Error) throw events;
       for (const evt of events) yield evt;
     },
@@ -137,32 +135,20 @@ describe("buildLocalCapabilityNote", () => {
   });
 });
 
-describe("buildToolCapabilityNote", () => {
-  test("defers to buildLocalCapabilityNote with toolsEnabled=true", () => {
-    const a = buildToolCapabilityNote(["x"], true);
-    const b = buildLocalCapabilityNote({
-      toolsEnabled: true,
-      isDefaultEngine: true,
-      toolNames: ["x"],
-    });
-    expect(a).toBe(b);
-  });
-});
-
 // ---------------------------------------------------------------------------
-// runLocalTurn — integration with real db + fake tg + fake driver
+// runEngineTurn — integration with real db + fake tg + fake driver
 // ---------------------------------------------------------------------------
 
-describe("runLocalTurn — audit tag invariant", () => {
+describe("runEngineTurn — audit tag invariant", () => {
   test("ollama backend writes audit.model = 'local:ollama:<model>'", async () => {
-    const { db, dir } = await freshDb("local-audit-ollama");
+    const { db, dir } = await freshDb("engine-audit-ollama");
     try {
       const { tg } = makeFakeTg();
       const driver = fakeDriver("ollama", [
         { kind: "text", delta: "hello" },
-        { kind: "done", inputTokens: 5, outputTokens: 3 },
+        { kind: "done", inputTokens: 5, outputTokens: 3, costUsd: null },
       ]);
-      await runLocalTurn(
+      await runEngineTurn(
         {
           tg,
           db,
@@ -185,14 +171,14 @@ describe("runLocalTurn — audit tag invariant", () => {
   });
 
   test("lmstudio backend writes audit.model = 'local:lmstudio:<model>'", async () => {
-    const { db, dir } = await freshDb("local-audit-lmstudio");
+    const { db, dir } = await freshDb("engine-audit-lmstudio");
     try {
       const { tg } = makeFakeTg();
       const driver = fakeDriver("lmstudio", [
         { kind: "text", delta: "hello" },
-        { kind: "done", inputTokens: 5, outputTokens: 3 },
+        { kind: "done", inputTokens: 5, outputTokens: 3, costUsd: null },
       ]);
-      await runLocalTurn(
+      await runEngineTurn(
         {
           tg,
           db,
@@ -216,16 +202,16 @@ describe("runLocalTurn — audit tag invariant", () => {
   });
 });
 
-describe("runLocalTurn — error rendering", () => {
-  test("LocalDriverError unreachable → audit status='error', edit shows error", async () => {
-    const { db, dir } = await freshDb("local-err-unreachable");
+describe("runEngineTurn — error rendering", () => {
+  test("EngineDriverError unreachable → audit status='error', edit shows error", async () => {
+    const { db, dir } = await freshDb("engine-err-unreachable");
     try {
       const { tg, edits } = makeFakeTg();
       const driver = fakeDriver(
         "ollama",
-        new LocalDriverError("ollama", "unreachable", "unreachable: http://x"),
+        new EngineDriverError("ollama", "unreachable", "unreachable: http://x"),
       );
-      await runLocalTurn(
+      await runEngineTurn(
         {
           tg,
           db,
@@ -252,20 +238,20 @@ describe("runLocalTurn — error rendering", () => {
     }
   });
 
-  test("LocalDriverError model_missing → error_message preserves pull hint", async () => {
-    const { db, dir } = await freshDb("local-err-model");
+  test("EngineDriverError model_missing → error_message preserves pull hint", async () => {
+    const { db, dir } = await freshDb("engine-err-model");
     try {
       const { tg } = makeFakeTg();
       const driver = fakeDriver(
         "ollama",
-        new LocalDriverError(
+        new EngineDriverError(
           "ollama",
           "model_missing",
           "model not found: gemma3:e4b — pull with `ollama pull gemma3:e4b` on the host",
           404,
         ),
       );
-      await runLocalTurn(
+      await runEngineTurn(
         {
           tg,
           db,
@@ -290,14 +276,14 @@ describe("runLocalTurn — error rendering", () => {
   });
 
   test("in-stream error event also lands as audit status='error'", async () => {
-    const { db, dir } = await freshDb("local-err-stream");
+    const { db, dir } = await freshDb("engine-err-stream");
     try {
       const { tg } = makeFakeTg();
       const driver = fakeDriver("ollama", [
         { kind: "text", delta: "started" },
         { kind: "error", message: "OOM" },
       ]);
-      await runLocalTurn(
+      await runEngineTurn(
         {
           tg,
           db,
@@ -322,16 +308,16 @@ describe("runLocalTurn — error rendering", () => {
   });
 });
 
-describe("runLocalTurn — token capture", () => {
+describe("runEngineTurn — token capture", () => {
   test("done event token counts flow into audit", async () => {
-    const { db, dir } = await freshDb("local-tokens");
+    const { db, dir } = await freshDb("engine-tokens");
     try {
       const { tg } = makeFakeTg();
       const driver = fakeDriver("ollama", [
         { kind: "text", delta: "answer" },
-        { kind: "done", inputTokens: 42, outputTokens: 17 },
+        { kind: "done", inputTokens: 42, outputTokens: 17, costUsd: null },
       ]);
-      await runLocalTurn(
+      await runEngineTurn(
         {
           tg,
           db,
@@ -349,7 +335,7 @@ describe("runLocalTurn — token capture", () => {
         .get() as { input_tokens: number; output_tokens: number; cost_usd: number };
       expect(row.input_tokens).toBe(42);
       expect(row.output_tokens).toBe(17);
-      // Local engine is always zero-cost.
+      // Local mode is always zero-cost.
       expect(row.cost_usd).toBe(0);
     } finally {
       await rm(dir, { recursive: true, force: true });
@@ -357,6 +343,159 @@ describe("runLocalTurn — token capture", () => {
   });
 });
 
+describe("runEngineTurn — remote mode (OpenRouter)", () => {
+  test("audit.model is 'remote:openrouter:<model>' with the slash-bearing slug intact", async () => {
+    const { db, dir } = await freshDb("remote-audit-tag");
+    try {
+      const { tg } = makeFakeTg();
+      const driver = fakeDriver("openrouter", [
+        { kind: "text", delta: "hi" },
+        { kind: "done", inputTokens: 12, outputTokens: 3, costUsd: 0.00007 },
+      ]);
+      await runEngineTurn(
+        {
+          tg,
+          db,
+          driver,
+          model: "anthropic/claude-3.5-sonnet",
+          timeoutMs: 5000,
+          historyLimit: 6,
+          soul: SOUL,
+          instanceMdPath: "/dev/null/nope",
+          isDefaultEngine: true,
+        },
+        { chatId: 99, fromId: 7, updateId: 1, prompt: "hello" },
+      );
+      const row = db.raw
+        .query("SELECT model FROM audit")
+        .get() as { model: string };
+      expect(row.model).toBe("remote:openrouter:anthropic/claude-3.5-sonnet");
+    } finally {
+      await rm(dir, { recursive: true, force: true });
+    }
+  });
+
+  test("driver-reported costUsd is written to audit.cost_usd in remote mode", async () => {
+    // The load-bearing path. Without this, remote turns bypass the hourly
+    // cost cap (COALESCE(SUM(cost_usd), 0) treats a 0 write as free).
+    const { db, dir } = await freshDb("remote-cost-capture");
+    try {
+      const { tg } = makeFakeTg();
+      const driver = fakeDriver("openrouter", [
+        { kind: "text", delta: "answer" },
+        { kind: "done", inputTokens: 100, outputTokens: 50, costUsd: 0.00125 },
+      ]);
+      await runEngineTurn(
+        {
+          tg,
+          db,
+          driver,
+          model: "openai/gpt-4o-mini",
+          timeoutMs: 5000,
+          historyLimit: 6,
+          soul: SOUL,
+          instanceMdPath: "/dev/null/nope",
+        },
+        { chatId: 1, fromId: 2, updateId: 1, prompt: "hi" },
+      );
+      const row = db.raw
+        .query("SELECT cost_usd FROM audit")
+        .get() as { cost_usd: number };
+      expect(row.cost_usd).toBe(0.00125);
+    } finally {
+      await rm(dir, { recursive: true, force: true });
+    }
+  });
+
+  test("remote mode with null costUsd → audit.cost_usd is NULL (not 0)", async () => {
+    // Defensive: if OpenRouter ever stops including cost, write NULL rather
+    // than 0 so the row reads as "cost unknown", not "free". (SUM skips NULL,
+    // so neither value adds to the cap sum; the distinction is for auditing.)
+    const { db, dir } = await freshDb("remote-cost-missing");
+    try {
+      const { tg } = makeFakeTg();
+      const driver = fakeDriver("openrouter", [
+        { kind: "text", delta: "answer" },
+        { kind: "done", inputTokens: 10, outputTokens: 2, costUsd: null },
+      ]);
+      await runEngineTurn(
+        {
+          tg,
+          db,
+          driver,
+          model: "openai/gpt-4o-mini",
+          timeoutMs: 5000,
+          historyLimit: 6,
+          soul: SOUL,
+          instanceMdPath: "/dev/null/nope",
+        },
+        { chatId: 1, fromId: 2, updateId: 1, prompt: "hi" },
+      );
+      const row = db.raw
+        .query("SELECT cost_usd FROM audit")
+        .get() as { cost_usd: number | null };
+      expect(row.cost_usd).toBe(null);
+    } finally {
+      await rm(dir, { recursive: true, force: true });
+    }
+  });
+
+  test("local mode ignores driver costUsd and writes 0", async () => {
+    // Symmetric guard: even if a future local-mode driver started reporting
+    // a cost field, we ignore it — local mode is always free.
+    const { db, dir } = await freshDb("local-mode-cost-zero");
+    try {
+      const { tg } = makeFakeTg();
+      const driver = fakeDriver("ollama", [
+        { kind: "text", delta: "answer" },
+        // Pretend a driver erroneously reports cost even in local mode.
+        { kind: "done", inputTokens: 5, outputTokens: 3, costUsd: 0.99 },
+      ]);
+      await runEngineTurn(
+        {
+          tg,
+          db,
+          driver,
+          model: "gemma3:e4b",
+          timeoutMs: 5000,
+          historyLimit: 6,
+          soul: SOUL,
+          instanceMdPath: "/dev/null/nope",
+        },
+        { chatId: 1, fromId: 2, updateId: 1, prompt: "hi" },
+      );
+      const row = db.raw
+        .query("SELECT cost_usd FROM audit")
+        .get() as { cost_usd: number };
+      expect(row.cost_usd).toBe(0);
+    } finally {
+      await rm(dir, { recursive: true, force: true });
+    }
+  });
+});
+
+describe("buildRemoteCapabilityNote", () => {
+  test("injects per-token cost framing", () => {
+    const note = buildRemoteCapabilityNote({
+      toolsEnabled: false,
+      isDefaultEngine: true,
+      toolNames: [],
+    });
+    expect(note).toMatch(/per-token via OpenRouter/);
+    expect(note).not.toMatch(/cost the operator nothing/);
+  });
+
+  test("local builder keeps the free-cost framing", () => {
+    const note = buildLocalCapabilityNote({
+      toolsEnabled: false,
+      isDefaultEngine: true,
+      toolNames: [],
+    });
+    expect(note).toMatch(/cost the operator nothing/);
+  });
+
+});
+
 describe("scrubLocalControlTokens", () => {
   test("strips paired <|channel>... block including header content", () => {
     const input = "<|channel>thoughtIt is 8:44 PM in London.";
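The remote-mode tests above pin two behaviors: the audit tag is `mode:backend:model` with the model string kept verbatim, and the value written to `audit.cost_usd` depends on the mode. The sketch below restates both as pure functions, plus a parser showing why only the first two colons can be treated as separators (Ollama tags such as `gemma3:e4b` contain colons; OpenRouter slugs contain slashes). It also models `COALESCE(SUM(cost_usd), 0)`: because SQL `SUM` skips NULLs, a NULL row and a 0 row add the same amount to the cap sum. All function names here are hypothetical; this is an illustration under those assumptions, not the code in `engine.ts`.

```typescript
type EngineMode = "local" | "remote";

// Hypothetical mirror of the `${mode}:${backend}:${model}` tag built in
// runEngineTurn. The model string (e.g. "anthropic/claude-3.5-sonnet") is
// appended verbatim.
function auditModelTag(mode: EngineMode, backend: string, model: string): string {
  return `${mode}:${backend}:${model}`;
}

// Splits only on the first two colons, so model names that themselves
// contain ":" (Ollama tags) survive the round-trip.
function parseAuditModelTag(
  tag: string,
): { mode: string; backend: string; model: string } | null {
  const first = tag.indexOf(":");
  const second = first < 0 ? -1 : tag.indexOf(":", first + 1);
  if (first < 0 || second < 0) return null;
  return {
    mode: tag.slice(0, first),
    backend: tag.slice(first + 1, second),
    model: tag.slice(second + 1),
  };
}

// Hypothetical mirror of the documented cost-write matrix:
//   local                 -> 0 (driver-reported cost ignored)
//   remote, cost reported -> the reported cost
//   remote, cost missing  -> null (caller is expected to log a warning)
function costForAudit(mode: EngineMode, driverCostUsd: number | null): number | null {
  if (mode === "local") return 0;
  return driverCostUsd;
}

// Models COALESCE(SUM(cost_usd), 0): NULLs are skipped, and an all-NULL or
// empty set of rows yields 0.
function capSum(rows: ReadonlyArray<number | null>): number {
  let total = 0;
  for (const c of rows) if (c !== null) total += c;
  return total;
}
```

`capSum([0.5, null])` and `capSum([0.5, 0])` both return `0.5`, which is why a NULL cost marks a turn as unpriced in the audit log but does not make it count toward the hourly cap.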
diff --git a/src/local.ts b/src/engine.ts
similarity index 67%
rename from src/local.ts
rename to src/engine.ts
index 0954023..f15a689 100644
--- a/src/local.ts
+++ b/src/engine.ts
@@ -1,65 +1,76 @@
 /**
- * @fileoverview Local-engine runner for Telegram messages routed to the
- *               `local` engine (default no-prefix path).
- * @purpose Stream a chat completion from a `LocalDriver` (Ollama or LMStudio)
- *          into the same Telegram throttled-edit UX that `agent.ts` uses for
- *          the Anthropic SDK path.
+ * @fileoverview Engine-slot runner for Telegram messages routed to the
+ *               `local` engine slot (default no-prefix path). Despite the slot
+ *               name, the runner is mode-polymorphic: both on-host backends
+ *               (Ollama, LMStudio — `driver.mode === "local"`) and hosted
+ *               providers (OpenRouter — `driver.mode === "remote"`) reach
+ *               Telegram through this path.
+ * @purpose Stream a chat completion from an `EngineDriver` into the same
+ *          Telegram throttled-edit UX that `agent.ts` uses for the Anthropic
+ *          SDK path.
  *
- * One call to `runLocalTurn` = one turn against the local model. The function:
- *   1. inserts the in-progress audit row tagged `model='local:<backend>:<model>'`;
+ * One call to `runEngineTurn` = one turn against the driver. The function:
+ *   1. inserts the in-progress audit row tagged
+ *      `model='<mode>:<backend>:<model>'`;
  *   2. assembles a chat-style messages array — system prompt + capability note,
  *      optional SOLRAC.md overlay, prior history reconstructed from `audit`,
  *      current user prompt;
- *   3. iterates the driver's normalized `LocalChatEvent` stream — `text`,
+ *   3. iterates the driver's normalized `EngineChatEvent` stream — `text`,
  *      `tool_call` (single-shot path ignores them), `done`, `error`;
  *   4. throttle-edits the 💻 stub with rendered partial text;
- *   5. finalizes the audit row with token counts, `cost_usd = 0`,
+ *   5. finalizes the audit row with token counts, `cost_usd`,
  *      `agent_session_id = null`, `tool_calls = null`;
- *   6. on error, renders a clear failure (`❌ local unreachable`, etc.) and
+ *   6. on error, renders a clear failure (`❌  unreachable`, etc.) and
  *      writes `status='error'` with the diagnostic in `error_message`.
  *
  * Why a sibling module (not a branch in `agent.ts`):
  *   - The Anthropic SDK runner depends on `@anthropic-ai/claude-agent-sdk`,
  *     `policy.ts` hooks, the per-chat `SessionStore`, the SDK preset prompt,
- *     the SDK env scrub. The local path needs none of that.
+ *     the SDK env scrub. The engine-slot path needs none of that.
  *   - Pure inference (single-shot): no `canUseTool`, no `PreToolUse` hook,
- *     no `disallowedTools`. The cost cap is unaffected because local writes
- *     `cost_usd = 0`; the global cap query sums every row regardless.
+ *     no `disallowedTools`. For `mode === "local"`, `cost_usd` is always 0
+ *     and the global cap query sums every row regardless. For
+ *     `mode === "remote"`, the driver-reported cost is summed too.
  *
  * Stateful history: conversation continuity within a chat, across every engine
  * boundary. `db.recentChatTurns(chatId, limit)` returns the last N successful
  * turns in chronological order regardless of which engine produced them. Each
  * contributes a user/assistant pair before the current turn. Default limit is
- * `LOCAL_HISTORY_LIMIT=6` (three round-trips). Cross-engine means a local
+ * `LOCAL_HISTORY_LIMIT=6` or `REMOTE_HISTORY_LIMIT=6` (three round-trips). Cross-engine means an engine-slot
  * follow-up to a prior Claude exchange sees the Claude response.
  *
  * Position in the dependency graph:
- *   db + policy + telegram + log + local-driver → local → consumed by main
+ *   db + policy + telegram + log + engine-driver + local-driver + remote-driver
+ *     → engine → consumed by main
  *
  * Exports:
- *   - `runLocalTurn(deps, input)` — runs one local turn end-to-end.
- *   - `LocalRunDeps` — runtime deps (tg, db, driver, model, timeout, history).
- *   - `LocalRunInput` — per-turn input (chatId, fromId, updateId, prompt).
- *   - `buildLocalCapabilityNote` — engine-specific clause appended to SOUL.md
- *     before it ships as the first `system` message.
- *   - `buildToolCapabilityNote` — convenience for the tools-on path.
+ *   - `runEngineTurn(deps, input)` — runs one engine-slot turn end-to-end.
+ *   - `EngineRunDeps` — runtime deps (tg, db, driver, model, timeout, history).
+ *   - `EngineRunInput` — per-turn input (chatId, fromId, updateId, prompt).
+ *   - `buildToolCapabilityNote` — back-compat dispatcher that picks the
+ *     mode-specific tool-capability note (local/remote driver files own the
+ *     actual builders). To be removed once commands.ts migrates in Phase 5.
  *
  * Key invariants:
  *   - Audit row is inserted BEFORE the driver call (`status='in_progress'`)
  *     and updated to `'ok'`/`'error'` after; lifecycle drain prevents
  *     orphaned in-progress rows on graceful shutdown.
- *   - `cost_usd` is always `0` and `agent_session_id` is always `null`.
+ *   - `agent_session_id` is always `null`. For `mode === "local"`, `cost_usd`
+ *     is always `0`; remote mode writes the driver-reported cost or `null`.
  *   - The streaming editor reuses the `lastEditedContent` no-op-edit guard
  *     and 1.5s throttle so the UX matches the Claude path.
- *   - The footer (`✅ local:<backend>:<model> · Ns`) is load-bearing —
- *     guarantees the final edit differs from any streaming render so Telegram
- *     doesn't 400 on a no-op.
+ *   - The footer (`<mode>:<backend>:<model> · Ns [· $C.CCCC]`) is
+ *     load-bearing — guarantees the final edit differs from any streaming
+ *     render so Telegram doesn't 400 on a no-op. The cost segment appears
+ *     only in remote mode when the driver reported a cost (matches audit).
  *
  * Cross-references:
  *   - docs/ARCHITECTURE.md#local-routing — design discussion
  *   - policy.ts::parseEnginePrefix — engine prefix detection (called from main.ts)
- *   - main.ts::makeRunTurn — dispatcher between runAgent and runLocalTurn
- *   - local-driver.ts — wire-format abstraction (Ollama NDJSON / LMStudio SSE)
+ *   - main.ts::makeRunTurn — dispatcher between runAgent and runEngineTurn
+ *   - engine-driver.ts — shared wire-format-agnostic contract
+ *   - local-driver.ts — Ollama NDJSON / LMStudio SSE
+ *   - remote-driver.ts — OpenRouter SSE
  */
 
 import type { SdkMcpToolDefinition } from "@anthropic-ai/claude-agent-sdk";
@@ -68,10 +79,18 @@ import type { SessionStore } from "./session.ts";
 import { readInstanceMd, wrapInstanceMd } from "./instance.ts";
 import type { IntegrationTier } from "./integrations.ts";
 import {
-  type LocalChatMessage,
-  type LocalDriver,
-  LocalDriverError,
+  type EngineChatMessage,
+  type EngineDriver,
+  EngineDriverError,
+} from "./engine-driver.ts";
+import {
+  buildLocalCapabilityNote,
+  buildLocalToolCapabilityNote,
 } from "./local-driver.ts";
+import {
+  buildRemoteCapabilityNote,
+  buildRemoteToolCapabilityNote,
+} from "./remote-driver.ts";
 import { log } from "./log.ts";
 import {
   createLoopDetector,
@@ -80,10 +99,10 @@ import {
 } from "./policy.ts";
 import { mdToTelegramHtml } from "./markdown.ts";
 import {
-  mcpToLocalTools,
+  mcpToEngineTools,
   runToolLoop,
   type RunToolLoopRenderer,
-} from "./local-tools.ts";
+} from "./engine-tools.ts";
 import { skillToolCtx } from "./skill-tools.ts";
 import { htmlEscapeText, type TelegramClient } from "./telegram.ts";
 
@@ -91,79 +110,19 @@ const TELEGRAM_TEXT_MAX = 3800;
 const EDIT_THROTTLE_MS = 1500;
 const THINKING_STUB = "💻 thinking…";
 
-/**
- * Engine-specific capability statement appended to SOUL.md before it ships
- * as the first `system` message. The appropriate cell is picked at boot from
- * `(toolsEnabled, isDefaultEngine)`. SOUL.md ships engine-agnostic so the
- * same file serves every engine path; this builder is where engine-specific
- * facts (tools surface, escalation prefixes) get layered in.
- *
- * Matrix:
- *   tools=off, default=local    → "you are the default; for tool-driven work prefix @ or !"
- *   tools=off, default=Claude   → "you do not have tools; redirect tool requests to @ or !"
- *   tools=on,  default=local    → "you are the default; you have these tools: ; escalate via @ / !"
- *   tools=on,  default=Claude   → unreachable (boot validation in config.ts rejects this combo);
- *                                 falls through to the tools-on default-engine cell defensively.
- */
-export interface LocalCapabilityNoteOpts {
-  toolsEnabled: boolean;
-  isDefaultEngine: boolean;
-  toolNames: ReadonlyArray<string>;
-}
-
-export function buildLocalCapabilityNote(opts: LocalCapabilityNoteOpts): string {
-  const { toolsEnabled, isDefaultEngine, toolNames } = opts;
-  if (toolsEnabled) {
-    const list = toolNames.join(", ");
-    return (
-      "You are the default chat engine; your replies cost the operator nothing. " +
-      `You have these tools available: ${list}. ` +
-      "Call them when the user's request needs information or actions you " +
-      "can't deliver from your training alone (current data, external APIs, " +
-      "operator-authored integrations). Tool results return into your " +
-      "context — never tell the user 'I cannot do that' if a listed tool can. " +
-      "If a request is too complex for these tools or for local reasoning, " +
-      "suggest the user re-send with `@` (Sonnet) or `!` (Opus) for heavier reasoning."
-    );
-  }
-  if (isDefaultEngine) {
-    return (
-      "You are the default chat engine; your replies cost the operator nothing. " +
-      "You do not have tools — answer from what you know. " +
-      "If the user asks for something that needs tools (file edits, API calls, " +
-      "web fetches), tell them to re-send the message prefixed with `@` (Sonnet) " +
-      "or `!` (Opus) to escalate to a Claude tier."
-    );
-  }
-  return (
-    "You do not have tools; answer from what you know. " +
-    "If the user asks for something that needs tools (file edits, API calls, " +
-    "web fetches), tell them to re-send the message prefixed with `@` (Sonnet) " +
-    "or `!` (Opus)."
-  );
-}
-
-/**
- * Convenience for the tools-on path. Defers to `buildLocalCapabilityNote` so
- * the matrix has a single source of truth. Exported so the skill tool-loop
- * runner in commands.ts can build the same capability note for skill bodies
- * without duplicating the matrix.
- */
-export function buildToolCapabilityNote(
-  toolNames: ReadonlyArray<string>,
-  isDefaultEngine: boolean,
-): string {
-  return buildLocalCapabilityNote({ toolsEnabled: true, isDefaultEngine, toolNames });
-}
-
-export interface LocalRunDeps {
+export interface EngineRunDeps {
   tg: TelegramClient;
   db: SolracDb;
   // `/clear local` cutoff store. Reads `getLocalCutoff(chatId)` once per
   // turn before assembling history. Optional for back-compat with tests that
   // construct deps inline; production wiring in main.ts always provides it.
   sessions?: SessionStore;
-  driver: LocalDriver;
+  // The driver is the single source of truth for engine mode — each impl
+  // sets `mode` (`createOllamaDriver`/`createLmstudioDriver` → "local";
+  // `createOpenrouterDriver` → "remote"). The runner reads `driver.mode`
+  // wherever it needs to branch: audit tag prefix, capability-note framing,
+  // cost-write decision.
+  driver: EngineDriver;
   model: string;
   timeoutMs: number;
   historyLimit: number;
@@ -176,18 +135,18 @@ export interface LocalRunDeps {
   // capability note's tone (default chat engine vs. tools-less escape hatch).
   isDefaultEngine?: boolean;
   // Tools surface. When `toolEnabled === true && tools.length > 0`,
-  // `runLocalTurn` dispatches through `runToolLoop` so the local model can
+  // `runEngineTurn` dispatches through `runToolLoop` so the model can
   // call the same `mcp__solrac__*` integrations Claude tiers see.
   toolEnabled?: boolean;
   tools?: ReadonlyArray>;
   toolTiers?: ReadonlyMap;
   broker?: Pick;
-  // `LOCAL_MAX_TOOL_ITERATIONS`. Defaults to 8; only consulted when tools
-  // are enabled.
+  // `LOCAL_MAX_TOOL_ITERATIONS` / `REMOTE_MAX_TOOL_ITERATIONS`. Defaults to
+  // 8; only consulted when tools are enabled.
   maxToolIterations?: number;
 }
 
-export interface LocalRunInput {
+export interface EngineRunInput {
   chatId: number;
   fromId: number;
   // Nullable for synthesized scheduler updates — they don't ride the poll
@@ -200,24 +159,31 @@ export interface LocalRunInput {
   scheduledTaskName?: string | null;
 }
 
-export async function runLocalTurn(
-  deps: LocalRunDeps,
-  input: LocalRunInput,
+export async function runEngineTurn(
+  deps: EngineRunDeps,
+  input: EngineRunInput,
 ): Promise<void> {
   const backend = deps.driver.backend;
+  const mode = deps.driver.mode;
+  // Audit-tag prefix tracks `mode` (not backend) so cross-engine queries
+  // (`/clear local`, the bridge in `outOfBandForEngine`, `/status` counters)
+  // can pattern-match the engine SLOT, not the wire protocol. `remote:%`
+  // rows also carry real costs in `cost_usd` so the existing hourly caps
+  // gate them — `local:%` rows are always free (cost_usd = 0).
+  const modelTag = `${mode}:${backend}:${deps.model}`;
   const auditId = deps.db.insertAudit({
     chatId: input.chatId,
     fromId: input.fromId,
     updateId: input.updateId,
     prompt: truncateAuditPrompt(input.prompt),
     startedAt: Date.now(),
-    model: `local:${backend}:${deps.model}`,
+    model: modelTag,
     origin: input.scheduledTaskName ? "scheduled" : "user",
     taskName: input.scheduledTaskName ?? null,
   });
 
   const stub = await deps.tg.sendMessage(input.chatId, THINKING_STUB).catch((err) => {
-    log.warn("local.stub_send_failed", { auditId, error: (err as Error).message });
+    log.warn("engine.stub_send_failed", { auditId, error: (err as Error).message });
     return null;
   });
   const stubId = stub && typeof stub === "object" ? stub.message_id : null;
@@ -232,15 +198,19 @@ export async function runLocalTurn(
     deps.toolTiers !== undefined &&
     deps.broker !== undefined
   ) {
-    return runLocalTurnWithTools(deps, input, auditId, stubId);
+    return runEngineTurnWithTools(deps, input, auditId, stubId);
   }
 
-  const capabilityNote = buildLocalCapabilityNote({
+  const noteOpts = {
     toolsEnabled: false,
     isDefaultEngine: deps.isDefaultEngine === true,
     toolNames: [],
-  });
-  const messages: LocalChatMessage[] = [
+  };
+  const capabilityNote =
+    mode === "remote"
+      ? buildRemoteCapabilityNote(noteOpts)
+      : buildLocalCapabilityNote(noteOpts);
+  const messages: EngineChatMessage[] = [
     { role: "system", content: `${deps.soul}\n\n${capabilityNote}` },
   ];
   // Re-read SOLRAC.md per turn so operator edits land on the next message.
@@ -276,6 +246,12 @@ export async function runLocalTurn(
   let lastEditedContent = THINKING_STUB;
   let inputTokens: number | null = null;
   let outputTokens: number | null = null;
+  // Captured from the driver's done event. Distinct from the value written to
+  // `audit.cost_usd` — local mode writes 0 regardless of what the driver
+  // reported (always null in practice for on-host backends); remote mode
+  // writes this value, or null + a `remote.cost_missing` warn if the driver
+  // didn't report one. See the audit-write site below for the matrix.
+  let driverCostUsd: number | null = null;
   let isError = false;
   let errorMessage: string | null = null;
 
@@ -301,6 +277,7 @@ export async function runLocalTurn(
       } else if (evt.kind === "done") {
         inputTokens = evt.inputTokens;
         outputTokens = evt.outputTokens;
+        driverCostUsd = evt.costUsd;
       } else if (evt.kind === "error") {
         errorMessage = `local error: ${evt.message}`;
         isError = true;
@@ -310,7 +287,7 @@ export async function runLocalTurn(
       // didn't offer. Surface to logs but don't break — the model will likely
       // also produce text we can show.
       else if (evt.kind === "tool_call") {
-        log.warn("local.unexpected_tool_call_single_shot", {
+        log.warn("engine.unexpected_tool_call_single_shot", {
           auditId,
           tool: evt.call.function.name,
         });
@@ -318,9 +295,9 @@ export async function runLocalTurn(
     }
   } catch (err) {
     isError = true;
-    if (err instanceof LocalDriverError) {
+    if (err instanceof EngineDriverError) {
       errorMessage = formatDriverError(err, deps.timeoutMs);
-      log.error("local.driver_failed", {
+      log.error("engine.driver_failed", {
         auditId,
         backend,
         code: err.code,
@@ -329,8 +306,8 @@ export async function runLocalTurn(
       });
     } else {
       const e = err as Error;
-      errorMessage = `local unexpected error: ${e.message}`;
-      log.error("local.unexpected_error", {
+      errorMessage = `engine unexpected error: ${e.message}`;
+      log.error("engine.unexpected_error", {
         auditId,
         backend,
         error: e.message,
@@ -348,7 +325,7 @@ export async function runLocalTurn(
   const elapsedSec = (Date.now() - startedAt) / 1000;
   const finalRender: Rendered = isError
     ? renderError(errorMessage ?? "unknown")
-    : renderFinal(assistantText, backend, deps.model, elapsedSec);
+    : renderFinal(assistantText, mode, backend, deps.model, elapsedSec, driverCostUsd);
 
   if (stubId !== null) {
     if (finalRender.html !== lastEditedContent) {
@@ -358,7 +335,7 @@ export async function runLocalTurn(
         stubId,
         finalRender.html,
         finalRender.markdown,
-        "local.edit_final_failed",
+        "engine.edit_final_failed",
       );
     }
   } else if (!isError && assistantText.trim()) {
@@ -367,9 +344,20 @@ export async function runLocalTurn(
         parse_mode: "HTML",
         markdownSource: finalRender.markdown,
       })
-      .catch((err) => log.warn("local.final_send_failed", { error: (err as Error).message }));
+      .catch((err) => log.warn("engine.final_send_failed", { error: (err as Error).message }));
   }
 
+  // Cost-write matrix:
+  //   mode=local  → 0 (on-host backends are free; driverCostUsd ignored).
+  //   mode=remote && driverCostUsd != null → driverCostUsd (real billed cost).
+  //   mode=remote && driverCostUsd == null → null + remote.cost_missing warn.
+  // The third row writes `null` (NOT 0). SUM skips NULLs, so the cap query's
+  // COALESCE(SUM(cost_usd), 0) adds nothing for either value, but a 0 would
+  // record the turn as genuinely free and hide that the cap is under-counting.
+  // NULL keeps the audit row while marking its cost as unknown; operators see
+  // the warn and can react (e.g. capture cost from a /generation lookup if
+  // OpenRouter ever stops including it in the streaming chunk).
+  const costForAudit = resolveAuditCost(mode, driverCostUsd, auditId, backend);
+
   deps.db.updateAuditEnd({
     id: auditId,
     response: assistantText || null,
@@ -379,31 +367,59 @@ export async function runLocalTurn(
     // Local engine doesn't expose cache telemetry — the API is stateless per call.
     cacheCreationInputTokens: null,
     cacheReadInputTokens: null,
-    costUsd: 0,
+    costUsd: costForAudit,
     agentSessionId: null,
     status: isError ? "error" : "ok",
     errorMessage,
     endedAt: Date.now(),
   });
 
-  log.info("local.done", {
+  log.info("engine.done", {
     auditId,
     chatId: input.chatId,
     backend,
+    mode,
     model: deps.model,
     elapsedSec,
     inputTokens,
     outputTokens,
+    costUsd: costForAudit,
     isError,
   });
 }
 
+/**
+ * Pick the value written to `audit.cost_usd` for an engine-slot turn.
+ * See the matrix comment at the call site for the rationale; this helper is
+ * extracted so the tools-on path reuses the same decision (and so a single
+ * future bug fix lands in one place).
+ */
+function resolveAuditCost(
+  mode: "local" | "remote",
+  driverCostUsd: number | null,
+  auditId: number,
+  backend: string,
+): number | null {
+  if (mode === "local") return 0;
+  if (driverCostUsd === null) {
+    log.warn("remote.cost_missing", {
+      auditId,
+      backend,
+      hint:
+        "driver returned no cost in the usage chunk; audit.cost_usd written as NULL " +
+        "to keep the row out of cap sums. Verify the backend still emits usage.cost.",
+    });
+    return null;
+  }
+  return driverCostUsd;
+}
+
 interface Rendered {
   html: string;
   markdown: string;
 }
 
-function formatDriverError(err: LocalDriverError, timeoutMs: number): string {
+function formatDriverError(err: EngineDriverError, timeoutMs: number): string {
   switch (err.code) {
     case "timeout":
       return `local timed out after ${(timeoutMs / 1000).toFixed(0)}s`;
@@ -445,17 +461,23 @@ function renderStreamingStub(text: string): Rendered {
 
 function renderFinal(
   text: string,
+  mode: "local" | "remote",
   backend: string,
   model: string,
   elapsedSec: number,
+  costUsd: number | null,
 ): Rendered {
   const scrubbed = scrubLocalControlTokens(text);
   const hasText = scrubbed.trim().length > 0;
   const htmlBody = hasText ? mdToTelegramHtml(scrubbed) : "(empty response)";
   const mdBody = hasText ? scrubbed : "(empty response)";
-  const tag = `local:${backend}:${model}`;
-  const htmlFooter = `✅ ${htmlEscapeText(tag)} · ${elapsedSec.toFixed(1)}s`;
-  const mdFooter = `*✅ ${tag} · ${elapsedSec.toFixed(1)}s*`;
+  // Footer tag mirrors the audit-row tag so operators reading either source
+  // see the same identifier — `remote:openrouter:anthropic/claude-3.5-sonnet`
+  // or `local:ollama:gemma3:4b` — and can grep across both surfaces.
+  const tag = `${mode}:${backend}:${model}`;
+  const costChip = formatFooterCost(mode, costUsd);
+  const htmlFooter = `✅ ${htmlEscapeText(tag)} · ${elapsedSec.toFixed(1)}s${costChip}`;
+  const mdFooter = `*✅ ${tag} · ${elapsedSec.toFixed(1)}s${costChip}*`;
   return {
     html: truncate(`${htmlBody}\n\n${htmlFooter}`, TELEGRAM_TEXT_MAX),
     markdown: `${mdBody}\n\n${mdFooter}`,
@@ -475,7 +497,7 @@ async function tryEdit(
   messageId: number,
   text: string,
   markdownSource: string | undefined,
-  errEvent: string = "local.edit_throttled",
+  errEvent: string = "engine.edit_throttled",
 ): Promise<void> {
   await tg
     .editMessageText(chatId, messageId, text, { parse_mode: "HTML", markdownSource })
@@ -486,15 +508,25 @@ function truncate(s: string, max: number): string {
   return s.length <= max ? s : s.slice(0, max - 1) + "…";
 }
 
+// Cost chip for the engine-slot footer. Mirrors the audit-write matrix in
+// `resolveAuditCost`: local mode is always free (no chip even if a driver
+// erroneously reports cost); remote mode shows the chip only when the driver
+// surfaced a real number. A null in remote mode means OpenRouter omitted the
+// usage.cost field — `remote.cost_missing` already logs; the UI just stays quiet.
+function formatFooterCost(mode: "local" | "remote", costUsd: number | null): string {
+  if (mode !== "remote" || costUsd === null) return "";
+  return ` · $${costUsd.toFixed(4)}`;
+}
+
 // ---------------------------------------------------------------------------
 // Tools-on path — dispatches through `runToolLoop`
 // ---------------------------------------------------------------------------
 
 const DEFAULT_MAX_TOOL_ITERATIONS = 8;
 
-async function runLocalTurnWithTools(
-  deps: LocalRunDeps,
-  input: LocalRunInput,
+async function runEngineTurnWithTools(
+  deps: EngineRunDeps,
+  input: EngineRunInput,
   auditId: number,
   stubId: number | null,
 ): Promise<void> {
@@ -503,16 +535,21 @@ async function runLocalTurnWithTools(
   const broker = deps.broker!;
   const maxIterations = deps.maxToolIterations ?? DEFAULT_MAX_TOOL_ITERATIONS;
   const backend = deps.driver.backend;
+  const mode = deps.driver.mode;
 
   const toolNames = tools.map((t) => t.name);
-  const capabilityNote = buildToolCapabilityNote(toolNames, deps.isDefaultEngine === true);
-  const toolDefs = mcpToLocalTools(tools);
+  const isDefault = deps.isDefaultEngine === true;
+  const capabilityNote =
+    mode === "remote"
+      ? buildRemoteToolCapabilityNote(toolNames, isDefault)
+      : buildLocalToolCapabilityNote(toolNames, isDefault);
+  const toolDefs = mcpToEngineTools(tools);
   const toolMap = new Map(tools.map((t) => [t.name, t]));
 
   // Build initial messages — same shape as the single-shot path, only the
   // capability note differs. Inlined rather than factored to keep the diff
   // for the tools-on path scoped.
-  const initialMessages: LocalChatMessage[] = [
+  const initialMessages: EngineChatMessage[] = [
     { role: "system", content: `${deps.soul}\n\n${capabilityNote}` },
   ];
   const instanceMd = readInstanceMd(deps.instanceMdPath);
@@ -589,11 +626,13 @@ async function runLocalTurnWithTools(
     ? renderError(result.errorMessage ?? "unknown")
     : renderToolLoopFinal(
         result.assistantText,
+        mode,
         backend,
         deps.model,
         elapsedSec,
         result.toolsFired,
         result.iterationCapHit,
+        result.costUsd,
       );
 
   if (stubId !== null) {
@@ -604,7 +643,7 @@ async function runLocalTurnWithTools(
         stubId,
         finalRender.html,
         finalRender.markdown,
-        "local.edit_final_failed",
+        "engine.edit_final_failed",
       );
     }
   } else if (!isError && result.assistantText.trim()) {
@@ -614,10 +653,18 @@ async function runLocalTurnWithTools(
         markdownSource: finalRender.markdown,
       })
       .catch((err) =>
-        log.warn("local.final_send_failed", { error: (err as Error).message }),
+        log.warn("engine.final_send_failed", { error: (err as Error).message }),
       );
   }
 
+  // Tool-loop cost: under remote mode, each round emits its own usage chunk
+  // with its own per-round cost. Taking only the FINAL round's costUsd would
+  // undercount multi-round remote turns, so the per-round costs are summed in
+  // `local-tools.ts` and `result.costUsd` carries the total. Here we pass that
+  // total through the same `resolveAuditCost` matrix as the single-shot path.
+  const costForAudit = resolveAuditCost(mode, result.costUsd, auditId, backend);
+
   deps.db.updateAuditEnd({
     id: auditId,
     response: result.assistantText || null,
@@ -629,21 +676,23 @@ async function runLocalTurnWithTools(
     outputTokens: result.outputTokens,
     cacheCreationInputTokens: null,
     cacheReadInputTokens: null,
-    costUsd: 0,
+    costUsd: costForAudit,
     agentSessionId: null,
     status: isError ? "error" : "ok",
     errorMessage: result.errorMessage,
     endedAt: Date.now(),
   });
 
-  log.info("local.done", {
+  log.info("engine.done", {
     auditId,
     chatId: input.chatId,
     backend,
+    mode,
     model: deps.model,
     elapsedSec,
     inputTokens: result.inputTokens,
     outputTokens: result.outputTokens,
+    costUsd: costForAudit,
     toolsFired: result.toolsFired,
     iterationCapHit: result.iterationCapHit,
     isError,
@@ -682,11 +731,13 @@ function renderToolLoopStub(
 
 function renderToolLoopFinal(
   text: string,
+  mode: "local" | "remote",
   backend: string,
   model: string,
   elapsedSec: number,
   toolsFired: number,
   iterationCapHit: boolean,
+  costUsd: number | null,
 ): Rendered {
   const scrubbed = scrubLocalControlTokens(text);
   const hasText = scrubbed.trim().length > 0;
@@ -696,9 +747,10 @@ function renderToolLoopFinal(
     ? `⚠️ stopped after ${toolsFired} tool iterations · `
     : "";
   const toolsChip = toolsFired > 0 ? `${toolsFired} tools · ` : "";
-  const tag = `local:${backend}:${model}`;
-  const htmlFooter = `✅ ${htmlEscapeText(tag)} · ${capChip}${toolsChip}${elapsedSec.toFixed(1)}s`;
-  const mdFooter = `*✅ ${tag} · ${capChip}${toolsChip}${elapsedSec.toFixed(1)}s*`;
+  const tag = `${mode}:${backend}:${model}`;
+  const costChip = formatFooterCost(mode, costUsd);
+  const htmlFooter = `✅ ${htmlEscapeText(tag)} · ${capChip}${toolsChip}${elapsedSec.toFixed(1)}s${costChip}`;
+  const mdFooter = `*✅ ${tag} · ${capChip}${toolsChip}${elapsedSec.toFixed(1)}s${costChip}*`;
   return {
     html: truncate(`${htmlBody}\n\n${htmlFooter}`, TELEGRAM_TEXT_MAX),
     markdown: `${mdBody}\n\n${mdFooter}`,
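The cost-write matrix, the footer chip, and the tool-loop summing note above are easiest to check side by side. The following is a minimal, self-contained TypeScript sketch, not code from this PR: `resolveAuditCost` and `formatFooterCost` re-implement the two helpers in this diff without logging, while `sumRoundCosts` and `capSum` are hypothetical. `sumRoundCosts` assumes that one round without `usage.cost` makes the whole turn unknown (the real policy in `local-tools.ts` is not shown here), and `capSum` emulates SQL `COALESCE(SUM(cost_usd), 0)`.

```typescript
// Minimal sketch of the engine-slot cost rules. `resolveAuditCost` and
// `formatFooterCost` mirror this diff (minus logging); `sumRoundCosts` and
// `capSum` are hypothetical helpers for illustration only.

type Mode = "local" | "remote";

// Local is always free; remote passes the driver's cost through, and a
// missing remote cost stays null ("unknown") instead of becoming 0.
function resolveAuditCost(mode: Mode, driverCostUsd: number | null): number | null {
  if (mode === "local") return 0;
  return driverCostUsd;
}

// Only remote turns with a real cost get a footer chip.
function formatFooterCost(mode: Mode, costUsd: number | null): string {
  if (mode !== "remote" || costUsd === null) return "";
  return ` · $${costUsd.toFixed(4)}`;
}

// ASSUMPTION: one round without usage.cost makes the whole tool-loop turn
// unknown (null) rather than a silently undercounted partial sum.
function sumRoundCosts(roundCosts: ReadonlyArray<number | null>): number | null {
  let total = 0;
  for (const cost of roundCosts) {
    if (cost === null) return null;
    total += cost;
  }
  return total;
}

// Emulates COALESCE(SUM(cost_usd), 0): SUM ignores NULL values, and an
// empty or all-NULL set falls back to 0.
interface AuditRow {
  costUsd: number | null;
}
function capSum(rows: ReadonlyArray<AuditRow>): number {
  return rows
    .map((r) => r.costUsd)
    .filter((c): c is number => c !== null)
    .reduce((acc, c) => acc + c, 0);
}

// Usage: one billed two-round remote turn, one remote turn with no cost,
// and one local turn whose driver (wrongly) reported a cost.
const rows: AuditRow[] = [
  { costUsd: resolveAuditCost("remote", sumRoundCosts([0.25, 0.5])) },
  { costUsd: resolveAuditCost("remote", null) },
  { costUsd: resolveAuditCost("local", 0.5) },
];
console.log(capSum(rows)); // logs: 0.75
console.log(formatFooterCost("remote", 0.75)); // logs:  · $0.7500
```

The `capSum` rows show why the matrix comment matters: the cap sees `0` and `NULL` identically, so the only difference between them is what the audit row claims about the turn.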
diff --git a/src/instance.ts b/src/instance.ts
index f0f03ec..dcacecb 100644
--- a/src/instance.ts
+++ b/src/instance.ts
@@ -9,10 +9,11 @@
  *
  *   - `SOUL.md` — voice, stance, safety. Read once at boot via `loadSoul`;
  *     hard-fails if missing or empty. Joined into Claude's
- *     `systemPrompt.append` and the local engine's first `system` message.
+ *     `systemPrompt.append` and the engine-slot's first `system` message.
  *     Per-engine capability deltas ("you have tools" / "you don't") stay in
- *     code next to each engine's wiring (see `agent.ts::buildClaudeCapabilityNote`
- *     and `local.ts::buildLocalCapabilityNote`) so SOUL.md stays portable.
+ *     code next to each engine's wiring (see `agent.ts::buildClaudeCapabilityNote`,
+ *     `local-driver.ts::buildLocalCapabilityNote`, and
+ *     `remote-driver.ts::buildRemoteCapabilityNote`) so SOUL.md stays portable.
  *
  *   - `SOLRAC.md` — operator overlay (operator name, channel posture, project
  *     hints). Re-read per turn via `readInstanceMd` so live edits take effect
@@ -36,7 +37,7 @@
  *   their voice edits.
  *
  * Position in the dependency graph:
- *   log → instance → consumed by main, agent, local
+ *   log → instance → consumed by main, agent, engine
  *
  * Exports:
  *   - `INSTANCE_FILE_NAMES` — `{ SOUL: "SOUL.md", SOLRAC: "SOLRAC.md" }`.
@@ -64,7 +65,7 @@
  *   - SOUL.md — canonical default voice (embedded into the binary)
  *   - SOLRAC.md — operator overlay template (embedded into the binary)
  *   - agent.ts::runAgent — Claude path consumer
- *   - local.ts::runLocalTurn — local path consumer
+ *   - engine.ts::runEngineTurn — engine-slot path consumer
  *   - main.ts — boot wires bootstrap + load
  *   - text-modules.d.ts — ambient string type for `*.md` text imports
  */
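The split described in the `instance.ts` header (SOUL.md shared, capability deltas kept per engine) can be sketched as below. This is a minimal illustration that assumes only the `${soul}\n\n${capabilityNote}` join visible in the tools-on path of this diff; the note wording and the helper names `buildCapabilityNote` and `buildSystemMessage` are hypothetical and do not reproduce `buildLocalCapabilityNote` or `buildRemoteCapabilityNote`.

```typescript
// Hypothetical sketch: SOUL.md stays engine-agnostic and each engine appends
// its own capability delta. The note wording and helper names are placeholders.

type Mode = "local" | "remote";

interface SystemMessage {
  role: "system";
  content: string;
}

// Placeholder note builder; the real builders live next to each driver.
function buildCapabilityNote(mode: Mode, toolNames: ReadonlyArray<string>): string {
  const billing =
    mode === "remote"
      ? "You run on a remote API billed per token."
      : "You run on-host at no per-token cost.";
  const tools =
    toolNames.length > 0 ? `Tools available: ${toolNames.join(", ")}.` : "You have no tools.";
  return `${billing} ${tools}`;
}

// SOUL first, capability delta second, joined by a blank line.
function buildSystemMessage(
  soul: string,
  mode: Mode,
  toolNames: ReadonlyArray<string>,
): SystemMessage {
  return { role: "system", content: `${soul}\n\n${buildCapabilityNote(mode, toolNames)}` };
}

const msg = buildSystemMessage("Be concise.", "remote", ["time_now"]);
console.log(msg.content);
```

Keeping the delta in code means swapping the engine slot between local and remote changes only the second paragraph of the system message; the operator's SOUL.md is reused verbatim.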
diff --git a/src/local-driver.test.ts b/src/local-driver.test.ts
index dffd2a2..91c3c27 100644
--- a/src/local-driver.test.ts
+++ b/src/local-driver.test.ts
@@ -1,10 +1,12 @@
 /**
- * @fileoverview Unit tests for `local-driver.ts` — both backends.
+ * @fileoverview Unit tests for `local-driver.ts` — on-host backends only
+ *               (Ollama + LMStudio). OpenRouter tests live in
+ *               `remote-driver.test.ts`.
  * @proves NDJSON and SSE wire-format parsing, partial-line buffering,
  *         multi-event-per-chunk, tool-call arg-delta accumulation,
  *         Gemma-4 dedup, usage-chunk capture, error paths.
  *
- * Both drivers ship with handwritten-fake fetches (no mocking framework,
+ * The drivers are exercised against handwritten fake fetches (no mocking framework,
  * per CLAUDE.md Testing Philosophy). Each test constructs a `Response` with
  * a `ReadableStream` body so the driver consumes real chunk boundaries —
  * partial-line / partial-event behavior is exercised by hand-splitting the
@@ -13,11 +15,10 @@
 
 import { describe, expect, test } from "bun:test";
 import {
-  createLmstudioDriver,
-  createOllamaDriver,
-  LocalDriverError,
-  type LocalChatEvent,
-} from "./local-driver.ts";
+  EngineDriverError,
+  type EngineChatEvent,
+} from "./engine-driver.ts";
+import { createLmstudioDriver, createOllamaDriver } from "./local-driver.ts";
 
 // ---------------------------------------------------------------------------
 // Test helpers
@@ -49,9 +50,9 @@ function fakeFetch(
 }
 
 async function collectEvents(
-  iter: AsyncIterable<LocalChatEvent>,
-): Promise<LocalChatEvent[]> {
-  const out: LocalChatEvent[] = [];
+  iter: AsyncIterable<EngineChatEvent>,
+): Promise<EngineChatEvent[]> {
+  const out: EngineChatEvent[] = [];
   for await (const evt of iter) out.push(evt);
   return out;
 }
@@ -119,7 +120,7 @@ describe("OllamaDriver — streamChat text", () => {
     );
     expect(events).toEqual([
       { kind: "text", delta: "hello" },
-      { kind: "done", inputTokens: 5, outputTokens: 3 },
+      { kind: "done", inputTokens: 5, outputTokens: 3, costUsd: null },
     ]);
   });
 
@@ -140,9 +141,9 @@ describe("OllamaDriver — streamChat text", () => {
     const events = await collectEvents(
       driver.streamChat({ model: "m", messages: [{ role: "user", content: "hi" }] }),
     );
-    const texts = events.filter((e): e is LocalChatEvent & { kind: "text" } => e.kind === "text").map((e) => e.delta);
+    const texts = events.filter((e): e is EngineChatEvent & { kind: "text" } => e.kind === "text").map((e) => e.delta);
     expect(texts.join("")).toBe("hello");
-    expect(events.at(-1)).toEqual({ kind: "done", inputTokens: 1, outputTokens: 2 });
+    expect(events.at(-1)).toEqual({ kind: "done", inputTokens: 1, outputTokens: 2, costUsd: null });
   });
 
   test("tool_calls on final frame produces tool_call events", async () => {
@@ -165,7 +166,7 @@ describe("OllamaDriver — streamChat text", () => {
     const events = await collectEvents(
       driver.streamChat({ model: "m", messages: [{ role: "user", content: "what time?" }] }),
     );
-    const toolEvt = events.find((e): e is LocalChatEvent & { kind: "tool_call" } => e.kind === "tool_call");
+    const toolEvt = events.find((e): e is EngineChatEvent & { kind: "tool_call" } => e.kind === "tool_call");
     expect(toolEvt?.call.function.name).toBe("time_now");
     expect(toolEvt?.call.function.arguments).toEqual({ tz: "UTC" });
   });
@@ -203,7 +204,7 @@ describe("OllamaDriver — streamChat text", () => {
 });
 
 describe("OllamaDriver — streamChat errors", () => {
-  test("HTTP 404 → LocalDriverError model_missing with pull hint", async () => {
+  test("HTTP 404 → EngineDriverError model_missing with pull hint", async () => {
     const fetch = fakeFetch(
       () => new Response(JSON.stringify({ error: "model not found" }), { status: 404 }),
     );
@@ -214,13 +215,13 @@ describe("OllamaDriver — streamChat errors", () => {
       );
       throw new Error("expected throw");
     } catch (err) {
-      expect(err).toBeInstanceOf(LocalDriverError);
-      expect((err as LocalDriverError).code).toBe("model_missing");
-      expect((err as LocalDriverError).message).toMatch(/ollama pull gemma3:e4b/);
+      expect(err).toBeInstanceOf(EngineDriverError);
+      expect((err as EngineDriverError).code).toBe("model_missing");
+      expect((err as EngineDriverError).message).toMatch(/ollama pull gemma3:e4b/);
     }
   });
 
-  test("HTTP 500 → LocalDriverError http_error", async () => {
+  test("HTTP 500 → EngineDriverError http_error", async () => {
     const fetch = fakeFetch(() => new Response("oom", { status: 500 }));
     const driver = createOllamaDriver({ url: "http://x", fetch });
     try {
@@ -229,13 +230,13 @@ describe("OllamaDriver — streamChat errors", () => {
       );
       throw new Error("expected throw");
     } catch (err) {
-      expect(err).toBeInstanceOf(LocalDriverError);
-      expect((err as LocalDriverError).code).toBe("http_error");
-      expect((err as LocalDriverError).status).toBe(500);
+      expect(err).toBeInstanceOf(EngineDriverError);
+      expect((err as EngineDriverError).code).toBe("http_error");
+      expect((err as EngineDriverError).status).toBe(500);
     }
   });
 
-  test("network error → LocalDriverError unreachable", async () => {
+  test("network error → EngineDriverError unreachable", async () => {
     const fetch = (() => Promise.reject(new TypeError("fetch failed"))) as unknown as typeof globalThis.fetch;
     const driver = createOllamaDriver({ url: "http://x", fetch });
     try {
@@ -244,12 +245,12 @@ describe("OllamaDriver — streamChat errors", () => {
       );
       throw new Error("expected throw");
     } catch (err) {
-      expect(err).toBeInstanceOf(LocalDriverError);
-      expect((err as LocalDriverError).code).toBe("unreachable");
+      expect(err).toBeInstanceOf(EngineDriverError);
+      expect((err as EngineDriverError).code).toBe("unreachable");
     }
   });
 
-  test("AbortSignal pre-fetch → LocalDriverError timeout", async () => {
+  test("AbortSignal pre-fetch → EngineDriverError timeout", async () => {
     const fetch = ((_url: string, init?: RequestInit) => {
       const e = new Error("aborted");
       e.name = "AbortError";
@@ -270,8 +271,8 @@ describe("OllamaDriver — streamChat errors", () => {
       );
       throw new Error("expected throw");
     } catch (err) {
-      expect(err).toBeInstanceOf(LocalDriverError);
-      expect((err as LocalDriverError).code).toBe("timeout");
+      expect(err).toBeInstanceOf(EngineDriverError);
+      expect((err as EngineDriverError).code).toBe("timeout");
     }
   });
 });
@@ -323,9 +324,9 @@ describe("LmstudioDriver — streamChat text", () => {
     const events = await collectEvents(
       driver.streamChat({ model: "m", messages: [{ role: "user", content: "hi" }] }),
     );
-    const texts = events.filter((e) => e.kind === "text") as Array<LocalChatEvent & { kind: "text" }>;
+    const texts = events.filter((e) => e.kind === "text") as Array<EngineChatEvent & { kind: "text" }>;
     expect(texts.map((t) => t.delta).join("")).toBe("hello world");
-    expect(events.at(-1)).toEqual({ kind: "done", inputTokens: null, outputTokens: null });
+    expect(events.at(-1)).toEqual({ kind: "done", inputTokens: null, outputTokens: null, costUsd: null });
   });
 
   test("multiple SSE events in one chunk are all parsed", async () => {
@@ -340,7 +341,7 @@ describe("LmstudioDriver — streamChat text", () => {
     const events = await collectEvents(
       driver.streamChat({ model: "m", messages: [{ role: "user", content: "hi" }] }),
     );
-    const texts = events.filter((e) => e.kind === "text") as Array<LocalChatEvent & { kind: "text" }>;
+    const texts = events.filter((e) => e.kind === "text") as Array<EngineChatEvent & { kind: "text" }>;
     expect(texts.map((t) => t.delta).join("")).toBe("abc");
   });
 
@@ -358,7 +359,7 @@ describe("LmstudioDriver — streamChat text", () => {
     const events = await collectEvents(
       driver.streamChat({ model: "m", messages: [{ role: "user", content: "hi" }] }),
     );
-    const texts = events.filter((e) => e.kind === "text") as Array<LocalChatEvent & { kind: "text" }>;
+    const texts = events.filter((e) => e.kind === "text") as Array<EngineChatEvent & { kind: "text" }>;
     expect(texts.map((t) => t.delta).join("")).toBe("hello");
   });
 
@@ -371,7 +372,7 @@ describe("LmstudioDriver — streamChat text", () => {
     const events = await collectEvents(
       driver.streamChat({ model: "m", messages: [{ role: "user", content: "hi" }] }),
     );
-    const text = (events.find((e) => e.kind === "text") as LocalChatEvent & { kind: "text" }).delta;
+    const text = (events.find((e) => e.kind === "text") as EngineChatEvent & { kind: "text" }).delta;
     expect(text).toBe("ok");
   });
 
@@ -387,7 +388,7 @@ describe("LmstudioDriver — streamChat text", () => {
     const events = await collectEvents(
       driver.streamChat({ model: "m", messages: [{ role: "user", content: "hi" }] }),
     );
-    expect(events.at(-1)).toEqual({ kind: "done", inputTokens: 12, outputTokens: 4 });
+    expect(events.at(-1)).toEqual({ kind: "done", inputTokens: 12, outputTokens: 4, costUsd: null });
   });
 
   test("missing usage chunk → null token counts", async () => {
@@ -401,7 +402,7 @@ describe("LmstudioDriver — streamChat text", () => {
     const events = await collectEvents(
       driver.streamChat({ model: "m", messages: [{ role: "user", content: "hi" }] }),
     );
-    expect(events.at(-1)).toEqual({ kind: "done", inputTokens: null, outputTokens: null });
+    expect(events.at(-1)).toEqual({ kind: "done", inputTokens: null, outputTokens: null, costUsd: null });
   });
 });
 
@@ -437,7 +438,7 @@ describe("LmstudioDriver — tool calls", () => {
       driver.streamChat({ model: "m", messages: [{ role: "user", content: "hi" }] }),
     );
     const calls = events.filter((e) => e.kind === "tool_call") as Array<
-      LocalChatEvent & { kind: "tool_call" }
+      EngineChatEvent & { kind: "tool_call" }
     >;
     expect(calls).toHaveLength(1);
     expect(calls[0]!.call.id).toBe("call_abc");
@@ -560,7 +561,7 @@ describe("LmstudioDriver — tool calls", () => {
 });
 
 describe("LmstudioDriver — streamChat errors", () => {
-  test("HTTP 404 → LocalDriverError model_missing", async () => {
+  test("HTTP 404 → EngineDriverError model_missing", async () => {
     const fetch = fakeFetch(
       () =>
         new Response(JSON.stringify({ error: { message: "model not loaded" } }), { status: 404 }),
@@ -572,12 +573,12 @@ describe("LmstudioDriver — streamChat errors", () => {
       );
       throw new Error("expected throw");
     } catch (err) {
-      expect(err).toBeInstanceOf(LocalDriverError);
-      expect((err as LocalDriverError).code).toBe("model_missing");
+      expect(err).toBeInstanceOf(EngineDriverError);
+      expect((err as EngineDriverError).code).toBe("model_missing");
     }
   });
 
-  test("HTTP 500 → LocalDriverError http_error", async () => {
+  test("HTTP 500 → EngineDriverError http_error", async () => {
     const fetch = fakeFetch(() => new Response("oom", { status: 500 }));
     const driver = createLmstudioDriver({ url: "http://x", fetch });
     try {
@@ -586,9 +587,9 @@ describe("LmstudioDriver — streamChat errors", () => {
       );
       throw new Error("expected throw");
     } catch (err) {
-      expect(err).toBeInstanceOf(LocalDriverError);
-      expect((err as LocalDriverError).code).toBe("http_error");
-      expect((err as LocalDriverError).status).toBe(500);
+      expect(err).toBeInstanceOf(EngineDriverError);
+      expect((err as EngineDriverError).code).toBe("http_error");
+      expect((err as EngineDriverError).status).toBe(500);
     }
   });
 
@@ -612,11 +613,11 @@ describe("LmstudioDriver — streamChat errors", () => {
       );
       throw new Error("expected throw");
     } catch (err) {
-      expect(err).toBeInstanceOf(LocalDriverError);
-      expect((err as LocalDriverError).code).toBe("model_missing");
-      expect((err as LocalDriverError).message).toContain("requested-but-not-loaded");
-      expect((err as LocalDriverError).message).toContain("actually-loaded-model");
-      expect((err as LocalDriverError).message).toContain("lms load");
+      expect(err).toBeInstanceOf(EngineDriverError);
+      expect((err as EngineDriverError).code).toBe("model_missing");
+      expect((err as EngineDriverError).message).toContain("requested-but-not-loaded");
+      expect((err as EngineDriverError).message).toContain("actually-loaded-model");
+      expect((err as EngineDriverError).message).toContain("lms load");
     }
   });
 
@@ -630,7 +631,7 @@ describe("LmstudioDriver — streamChat errors", () => {
     const events = await collectEvents(
       driver.streamChat({ model: "qwen", messages: [{ role: "user", content: "hi" }] }),
     );
-    const texts = events.filter((e) => e.kind === "text") as Array<LocalChatEvent & { kind: "text" }>;
+    const texts = events.filter((e) => e.kind === "text") as Array<EngineChatEvent & { kind: "text" }>;
     expect(texts.map((t) => t.delta).join("")).toBe("hi");
     expect(events.at(-1)?.kind).toBe("done");
   });
@@ -655,7 +656,7 @@ describe("LmstudioDriver — streamChat errors", () => {
         messages: [{ role: "user", content: "hi" }],
       }),
     );
-    const texts = events.filter((e) => e.kind === "text") as Array<LocalChatEvent & { kind: "text" }>;
+    const texts = events.filter((e) => e.kind === "text") as Array<EngineChatEvent & { kind: "text" }>;
     expect(texts.map((t) => t.delta).join("")).toBe("ok");
     expect(events.at(-1)?.kind).toBe("done");
   });
@@ -675,8 +676,9 @@ describe("LmstudioDriver — streamChat errors", () => {
       );
       throw new Error("expected throw");
     } catch (err) {
-      expect(err).toBeInstanceOf(LocalDriverError);
-      expect((err as LocalDriverError).code).toBe("model_missing");
+      expect(err).toBeInstanceOf(EngineDriverError);
+      expect((err as EngineDriverError).code).toBe("model_missing");
     }
   });
 });
+
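The test-file header describes fakes that hand the driver a `Response` with a `ReadableStream` body so chunk boundaries are real, and tests that hand-split frames to exercise partial-line buffering. Below is a minimal sketch of that pattern, assuming a runtime with WHATWG `fetch`, `Response`, and `ReadableStream` globals (Bun, Deno, or Node 18+). `streamResponse`, `fakeStreamingFetch`, and `readNdjson` are hypothetical names; `readNdjson` is a simplified stand-in for the Ollama driver's NDJSON buffering, not its actual code.

```typescript
// Builds a Response whose body yields exactly the given chunks, so the
// consumer sees hand-chosen chunk boundaries (including mid-line splits).
function streamResponse(chunks: ReadonlyArray<string>, status = 200): Response {
  const enc = new TextEncoder();
  const body = new ReadableStream<Uint8Array>({
    start(controller) {
      for (const c of chunks) controller.enqueue(enc.encode(c));
      controller.close();
    },
  });
  return new Response(body, { status });
}

// Handwritten fake: ignores the request and returns a freshly built Response.
function fakeStreamingFetch(make: () => Response): typeof fetch {
  return (async () => make()) as unknown as typeof fetch;
}

// Simplified NDJSON reader: buffers partial lines across chunk boundaries
// and tolerates a final frame without a trailing newline.
async function readNdjson(res: Response): Promise<unknown[]> {
  const out: unknown[] = [];
  const reader = res.body!.getReader();
  const dec = new TextDecoder();
  let buf = "";
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buf += dec.decode(value, { stream: true });
    let nl: number;
    while ((nl = buf.indexOf("\n")) !== -1) {
      const line = buf.slice(0, nl).trim();
      buf = buf.slice(nl + 1);
      if (line) out.push(JSON.parse(line));
    }
  }
  buf += dec.decode(); // flush any buffered multi-byte sequence
  const tail = buf.trim();
  if (tail) out.push(JSON.parse(tail));
  return out;
}

// Usage: the first frame is split mid-string, the second mid-key, and the
// last frame has no trailing newline.
const fetchFake = fakeStreamingFetch(() =>
  streamResponse(['{"message":{"content":"hel', 'lo"}}\n{"done":', "true}"]),
);
void fetchFake("http://x")
  .then(readNdjson)
  .then((frames) => console.log(frames));
```

Building a fresh `Response` on every call matters because a stream body can be read only once; a fake that returned one shared `Response` would fail on the second request in a multi-round tool-loop test.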
diff --git a/src/local-driver.ts b/src/local-driver.ts
index 8a0dea0..d39c433 100644
--- a/src/local-driver.ts
+++ b/src/local-driver.ts
@@ -23,20 +23,25 @@
  *   LMStudio emits a parsable-after-prefix-strip JSON line — what's different?"
  *
  * Position in the dependency graph:
- *   log → local-driver → local, local-tools
+ *   log + engine-driver → local-driver → engine-tools, engine
+ *
+ * Shared types (`EngineBackend`, `EngineDriver`, `EngineChatEvent`,
+ * `EngineDriverError`, …) and the cross-driver helpers (`stableStringify`,
+ * `maybeLogEmptyStream`) live in `engine-driver.ts`. This file imports them
+ * under local-flavored aliases (`LocalDriver`, `LocalChatEvent`, …) for
+ * internal readability: it implements only the two on-host backends, so the
+ * local-vs-remote distinction stays visible without restating the slot
+ * semantics.
  *
  * Exports:
- *   - `LocalBackend` — `"ollama" | "lmstudio"`.
- *   - `LocalChatRole`, `LocalChatMessage`, `LocalToolCallRef`, `LocalToolDef`.
- *   - `LocalChatEvent` — `text | tool_call | done | error`.
- *   - `LocalProbeResult` — `{ ok; reason?; modelMissing? }`.
- *   - `LocalDriver` — interface (`backend`, `probe`, `streamChat`).
- *   - `LocalDriverError` — typed error for connection/HTTP failures.
- *   - `createOllamaDriver(opts)`, `createLmstudioDriver(opts)` — factories.
+ *   - `createOllamaDriver(opts)`, `createLmstudioDriver(opts)`,
+ *     `createLocalDriver(backend, opts)`.
+ *   - `buildLocalCapabilityNote(opts)`, `buildLocalToolCapabilityNote(...)`
+ *     — local-mode framing ("free"). Remote-mode counterparts live in
+ *     `remote-driver.ts`.
  *
  * Key invariants:
  *   - `streamChat` ALWAYS resolves the async iterable, even on errors —
- *     errors surface as `kind: "error"` events OR throw `LocalDriverError`
+ *     errors surface as `kind: "error"` events OR throw `EngineDriverError`
  *     for network-level failures (connection refused, timeout, 4xx/5xx).
  *   - The Ollama driver's tool-call extraction reads `message.tool_calls`
  *     from any frame (Ollama emits them on the final `done:true` frame
@@ -47,7 +52,7 @@
  *   - Tool-call dedup (Gemma-4 workaround) compares stableStringify-ed
  *     `(name, args)` pairs; identical duplicates within one assistant
  *     message are skipped silently.
- *   - `LocalDriverError` carries a `code` discriminant so callers can
+ *   - `EngineDriverError` carries a `code` discriminant so callers can
  *     render different UX for `unreachable` vs `model_missing` vs
  *     `timeout` vs `http_error`.
  *
@@ -63,123 +68,21 @@
  *     a trailing newline — driver tolerates.
  */
 
+import {
+  type EngineChatEvent as LocalChatEvent,
+  type EngineChatMessage as LocalChatMessage,
+  type EngineDriver as LocalDriver,
+  type EngineProbeResult as LocalProbeResult,
+  type EngineToolDef as LocalToolDef,
+  type DriverOpts,
+  EngineDriverError as LocalDriverError,
+  RAW_FRAME_BUFFER_MAX,
+  RAW_FRAME_TRUNC,
+  maybeLogEmptyStream,
+  stableStringify,
+} from "./engine-driver.ts";
 import { log } from "./log.ts";
 
-export type LocalBackend = "ollama" | "lmstudio";
-
-export type LocalChatRole = "system" | "user" | "assistant" | "tool";
-
-/**
- * Reference to one tool call emitted by an assistant message. `id` is set
- * by backends that namespace calls (LMStudio); for Ollama, the consumer
- * synthesizes one (`call__`) so cross-backend message arrays
- * carry a stable identifier.
- */
-export interface LocalToolCallRef {
-  id?: string;
-  function: { name: string; arguments: unknown };
-}
-
-/**
- * Unified chat message shape. Each driver maps to its backend's wire shape:
- *   - Ollama matches tool results by `tool_name`.
- *   - LMStudio matches by `tool_call_id`.
- * Consumers populate both on tool-result messages; drivers pick what they
- * need. Extra fields are harmless on either wire.
- */
-export interface LocalChatMessage {
-  role: LocalChatRole;
-  content: string;
-  tool_calls?: ReadonlyArray;
-  tool_call_id?: string;
-  tool_name?: string;
-}
-
-/**
- * Wire-shape tool definition shared by both backends — Ollama adopted OpenAI's
- * function-calling JSON Schema directly; LMStudio is OpenAI-compatible.
- */
-export interface LocalToolDef {
-  readonly type: "function";
-  readonly function: {
-    readonly name: string;
-    readonly description: string;
-    readonly parameters: Readonly<Record<string, unknown>>;
-  };
-}
-
-/**
- * One event from `LocalDriver.streamChat`. Driver consumers iterate until the
- * stream ends or a `done`/`error` event arrives.
- */
-export type LocalChatEvent =
-  | { kind: "text"; delta: string }
-  | { kind: "tool_call"; call: LocalToolCallRef }
-  | { kind: "done"; inputTokens: number | null; outputTokens: number | null }
-  | { kind: "error"; message: string };
-
-export interface LocalProbeResult {
-  ok: boolean;
-  reason?: string;
-  modelMissing?: boolean;
-}
-
-export interface LocalStreamChatOpts {
-  model: string;
-  messages: ReadonlyArray<LocalChatMessage>;
-  tools?: ReadonlyArray<LocalToolDef>;
-  signal?: AbortSignal;
-}
-
-export interface LocalDriver {
-  readonly backend: LocalBackend;
-  probe(model: string, signal?: AbortSignal): Promise<LocalProbeResult>;
-  streamChat(opts: LocalStreamChatOpts): AsyncIterable<LocalChatEvent>;
-}
-
-/**
- * Typed error surface for `streamChat` and `probe`. `code` lets callers
- * render distinct UX for "ollama daemon not running" (`unreachable`) vs
- * "model not pulled" (`model_missing`) without parsing the message.
- */
-export class LocalDriverError extends Error {
-  readonly backend: LocalBackend;
-  readonly code: "unreachable" | "timeout" | "model_missing" | "http_error";
-  readonly status?: number;
-  constructor(
-    backend: LocalBackend,
-    code: "unreachable" | "timeout" | "model_missing" | "http_error",
-    message: string,
-    status?: number,
-  ) {
-    super(message);
-    this.name = "LocalDriverError";
-    this.backend = backend;
-    this.code = code;
-    this.status = status;
-  }
-}
-
-export interface DriverOpts {
-  url: string; // base, no trailing slash
-  fetch?: typeof fetch;
-}
-
-// ---------------------------------------------------------------------------
-// Stable stringify (tool-call dedup key)
-// ---------------------------------------------------------------------------
-
-// Order-insensitive JSON stringify so `{a:1,b:2}` and `{b:2,a:1}` hash to the
-// same dedup key. Used by the LMStudio driver to suppress duplicate tool calls
-// inside one assistant message (Gemma-4 `parallel_tool_calls: false` bug).
-function stableStringify(value: unknown): string {
-  if (value === null || typeof value !== "object") return JSON.stringify(value) ?? "null";
-  if (Array.isArray(value)) return `[${value.map(stableStringify).join(",")}]`;
-  const obj = value as Record<string, unknown>;
-  const keys = Object.keys(obj).sort();
-  return `{${keys.map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`).join(",")}}`;
-}
-
 // ---------------------------------------------------------------------------
 // Ollama driver — NDJSON
 // ---------------------------------------------------------------------------
@@ -215,6 +118,7 @@ export function createOllamaDriver(opts: DriverOpts): LocalDriver {
 
   return {
     backend: "ollama",
+    mode: "local",
 
     async probe(model, signal): Promise<LocalProbeResult> {
       let res: Response;
@@ -313,7 +217,7 @@ export function createOllamaDriver(opts: DriverOpts): LocalDriver {
             try {
               frame = JSON.parse(line) as OllamaFrame;
             } catch (parseErr) {
-              log.warn("local.ollama_bad_frame", {
+              log.warn("ollama.bad_frame", {
                 error: (parseErr as Error).message,
                 line: line.slice(0, 120),
               });
@@ -358,7 +262,7 @@ export function createOllamaDriver(opts: DriverOpts): LocalDriver {
         throw new LocalDriverError("ollama", "unreachable", `stream failed: ${e.message}`);
       }
 
-      yield { kind: "done", inputTokens, outputTokens };
+      yield { kind: "done", inputTokens, outputTokens, costUsd: null };
     },
   };
 }
@@ -391,7 +295,16 @@ interface LmstudioSseFrame {
   // requested model on the first chunk to catch mid-session model swaps.
   model?: string;
   choices?: ReadonlyArray;
-  usage?: { prompt_tokens?: number; completion_tokens?: number };
+  // OpenAI's streaming usage shape. LMStudio populates prompt/completion
+  // tokens only; OpenRouter also populates `cost` (USD) and `is_byok`. The
+  // shared typedef keeps both drivers' SSE parsers single-source — backends
+  // that don't emit a field simply leave it undefined.
+  usage?: {
+    prompt_tokens?: number;
+    completion_tokens?: number;
+    cost?: number;
+    is_byok?: boolean;
+  };
   // LMStudio inlines errors inside the SSE stream rather than via HTTP status —
   // e.g. context-length overruns arrive here with HTTP 200. Without surfacing
   // this, the stream looks "empty" to the consumer and the user sees the
@@ -405,25 +318,6 @@ interface ToolCallAccumulator {
   argsBuffer: string;
 }
 
-// Caps on the empty-stream diagnostic buffer — bounded so a runaway stream
-// (huge content + zero parsed events somehow) can't blow up the log line.
-const RAW_FRAME_BUFFER_MAX = 30;
-const RAW_FRAME_TRUNC = 400;
-
-function maybeLogEmptyStream(args: {
-  model: string;
-  textCharsEmitted: number;
-  toolCallsEmitted: number;
-  rawFrameBuffer: ReadonlyArray<string>;
-}): void {
-  if (args.textCharsEmitted > 0 || args.toolCallsEmitted > 0) return;
-  log.warn("local.lmstudio_empty_stream", {
-    model: args.model,
-    frameCount: args.rawFrameBuffer.length,
-    frames: args.rawFrameBuffer,
-  });
-}
-
 function lmstudioSerializeMessage(m: LocalChatMessage): Record<string, unknown> {
   const out: Record<string, unknown> = { role: m.role, content: m.content };
   if (m.tool_calls) {
@@ -450,6 +344,7 @@ export function createLmstudioDriver(opts: DriverOpts): LocalDriver {
 
   return {
     backend: "lmstudio",
+    mode: "local",
 
     async probe(model, signal): Promise<LocalProbeResult> {
       let res: Response;
@@ -587,7 +482,7 @@ export function createLmstudioDriver(opts: DriverOpts): LocalDriver {
           }
           const dedupKey = stableStringify({ name: acc.name, args: parsedArgs });
           if (emittedDedup.has(dedupKey)) {
-            log.info("local.lmstudio_tool_call_deduped", { name: acc.name });
+            log.info("lmstudio.tool_call_deduped", { name: acc.name });
             continue;
           }
           emittedDedup.add(dedupKey);
@@ -638,15 +533,16 @@ export function createLmstudioDriver(opts: DriverOpts): LocalDriver {
                 textCharsEmitted,
                 toolCallsEmitted,
                 rawFrameBuffer,
+                logEvent: "lmstudio.empty_stream",
               });
-              yield { kind: "done", inputTokens, outputTokens };
+              yield { kind: "done", inputTokens, outputTokens, costUsd: null };
               return;
             }
             let frame: LmstudioSseFrame;
             try {
               frame = JSON.parse(data) as LmstudioSseFrame;
             } catch (parseErr) {
-              log.warn("local.lmstudio_bad_frame", {
+              log.warn("lmstudio.bad_frame", {
                 error: (parseErr as Error).message,
                 data: data.slice(0, 120),
               });
@@ -743,20 +639,107 @@ export function createLmstudioDriver(opts: DriverOpts): LocalDriver {
         textCharsEmitted,
         toolCallsEmitted,
         rawFrameBuffer,
+        logEvent: "lmstudio.empty_stream",
       });
-      yield { kind: "done", inputTokens, outputTokens };
+      yield { kind: "done", inputTokens, outputTokens, costUsd: null };
     },
   };
 }
 
+// OpenRouter driver moved to `src/remote-driver.ts` in Phase 2 of the
+// engine-driver refactor. Local-mode drivers (Ollama, LMStudio) stay here;
+// remote-mode drivers live there. Boot wiring in `main.ts` dispatches per
+// `LOCAL_ENABLED` vs `REMOTE_ENABLED`. A back-compat re-export of
+// `createOpenrouterDriver` + `OpenrouterDriverOpts` at the top of this file
+// keeps existing tests in `local-driver.test.ts` compiling until Phase 6
+// moves them into `remote-driver.test.ts`.
+
 /**
- * Pick the driver implementation for the configured backend. Centralized so
- * callers (main.ts boot wiring, test harness) don't duplicate the switch.
+ * Pick the local-mode driver implementation for the configured backend.
+ * Centralized so callers (main.ts boot wiring, test harness) don't
+ * duplicate the switch.
+ *
+ * Remote-mode dispatch lives in `remote-driver.ts::createRemoteDriver`.
+ * `main.ts` calls one factory or the other based on `LOCAL_ENABLED` vs
+ * `REMOTE_ENABLED` (mutually exclusive at boot per `config.ts`).
  */
 export function createLocalDriver(
-  backend: LocalBackend,
+  backend: "ollama" | "lmstudio",
   opts: DriverOpts,
 ): LocalDriver {
   if (backend === "ollama") return createOllamaDriver(opts);
   return createLmstudioDriver(opts);
 }
+
+// ---------------------------------------------------------------------------
+// Capability note (local-mode system-prompt clause)
+// ---------------------------------------------------------------------------
+
+/**
+ * Local-mode capability statement appended to SOUL.md before it ships as the
+ * first `system` message. Always frames the cost as "free" because on-host
+ * backends (Ollama, LMStudio) don't bill per token. Remote-mode (OpenRouter)
+ * has a parallel `buildRemoteCapabilityNote` in `remote-driver.ts` that
+ * frames cost as per-token. The runner in `engine.ts` picks the right
+ * builder from `driver.mode`.
+ *
+ * Matrix:
+ *   tools=off, default=local   → "you are the default; for tool-driven work prefix @ or !"
+ *   tools=off, default=Claude  → "you do not have tools; redirect tool requests to @ or !"
+ *   tools=on,  default=local   → "you are the default; you have these tools: ; escalate via @ / !"
+ *   tools=on,  default=Claude  → unreachable (boot validation in config.ts rejects this combo);
+ *                                 falls through to the tools-on default-engine cell defensively.
+ */
+export interface LocalCapabilityNoteOpts {
+  toolsEnabled: boolean;
+  isDefaultEngine: boolean;
+  toolNames: ReadonlyArray<string>;
+}
+
+export function buildLocalCapabilityNote(opts: LocalCapabilityNoteOpts): string {
+  const { toolsEnabled, isDefaultEngine, toolNames } = opts;
+  const costClause = "your replies cost the operator nothing";
+  if (toolsEnabled) {
+    const list = toolNames.join(", ");
+    return (
+      `You are the default chat engine; ${costClause}. ` +
+      `You have these tools available: ${list}. ` +
+      "Call them when the user's request needs information or actions you " +
+      "can't deliver from your training alone (current data, external APIs, " +
+      "operator-authored integrations). Tool results return into your " +
+      "context — never tell the user 'I cannot do that' if a listed tool can. " +
+      "If a request is too complex for these tools or for local reasoning, " +
+      "suggest the user re-send with `@` (Sonnet) or `!` (Opus) for heavier reasoning."
+    );
+  }
+  if (isDefaultEngine) {
+    return (
+      `You are the default chat engine; ${costClause}. ` +
+      "You do not have tools — answer from what you know. " +
+      "If the user asks for something that needs tools (file edits, API calls, " +
+      "web fetches), tell them to re-send the message prefixed with `@` (Sonnet) " +
+      "or `!` (Opus) to escalate to a Claude tier."
+    );
+  }
+  return (
+    "You do not have tools; answer from what you know. " +
+    "If the user asks for something that needs tools (file edits, API calls, " +
+    "web fetches), tell them to re-send the message prefixed with `@` (Sonnet) " +
+    "or `!` (Opus)."
+  );
+}
+
+/**
+ * Convenience for the tools-on path. Defers to `buildLocalCapabilityNote` so
+ * the matrix has a single source of truth.
+ */
+export function buildLocalToolCapabilityNote(
+  toolNames: ReadonlyArray<string>,
+  isDefaultEngine: boolean,
+): string {
+  return buildLocalCapabilityNote({
+    toolsEnabled: true,
+    isDefaultEngine,
+    toolNames,
+  });
+}
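The widened `usage` typedef in this diff is what lets one SSE parser feed both backends. Below is a minimal sketch of folding streamed `data:` payloads into the `done` event, assuming OpenAI-style frames where the usage object arrives in a trailing chunk before `[DONE]`. `foldUsage`, `UsageFrame`, and `DoneEvent` are hypothetical names for illustration, not exports of the real drivers. The rule it illustrates comes from the tests: a missing `cost` stays `null`, never `0`, so the runner can tell "not reported" apart from "free".

```typescript
// Hypothetical mirror of the `done` event shape added in this diff.
type DoneEvent = {
  kind: "done";
  inputTokens: number | null;
  outputTokens: number | null;
  costUsd: number | null;
};

// Same optional fields as the shared `usage` typedef: LMStudio fills the
// token counts only; OpenRouter also fills `cost` (USD) and `is_byok`.
interface UsageFrame {
  usage?: {
    prompt_tokens?: number;
    completion_tokens?: number;
    cost?: number;
    is_byok?: boolean;
  };
}

// Fold the `data:` payloads (prefix already stripped) into a `done` event.
function foldUsage(payloads: ReadonlyArray<string>): DoneEvent {
  let inputTokens: number | null = null;
  let outputTokens: number | null = null;
  let costUsd: number | null = null; // stays null unless a number arrives
  for (const data of payloads) {
    if (data === "[DONE]") break; // end-of-stream sentinel
    let frame: UsageFrame;
    try {
      frame = JSON.parse(data) as UsageFrame;
    } catch {
      continue; // skip a malformed frame instead of aborting the turn
    }
    const u = frame.usage;
    if (!u) continue; // content/tool-call deltas carry no usage
    if (typeof u.prompt_tokens === "number") inputTokens = u.prompt_tokens;
    if (typeof u.completion_tokens === "number") outputTokens = u.completion_tokens;
    if (typeof u.cost === "number") costUsd = u.cost;
  }
  return { kind: "done", inputTokens, outputTokens, costUsd };
}
```

The real drivers do this fold inline as frames arrive; in this diff the Ollama and LMStudio paths still yield `costUsd: null` unconditionally.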
diff --git a/src/main.ts b/src/main.ts
index fed64af..b88d81e 100644
--- a/src/main.ts
+++ b/src/main.ts
@@ -78,7 +78,7 @@ import {
   BOT_COMMAND_REGISTRY,
   parseCommand,
   runCommand,
-  type LocalSkillDeps,
+  type EngineSkillDeps,
   type RunCommandDeps,
 } from "./commands.ts";
 import { loadConfig, type Config } from "./config.ts";
@@ -92,11 +92,13 @@ import {
 } from "./instance.ts";
 import { installShutdown } from "./lifecycle.ts";
 import { log } from "./log.ts";
-import { runLocalTurn, type LocalRunDeps } from "./local.ts";
 import {
-  createLocalDriver,
-  type LocalDriver,
-} from "./local-driver.ts";
+  runEngineTurn,
+  type EngineRunDeps,
+} from "./engine.ts";
+import type { EngineDriver } from "./engine-driver.ts";
+import { createLocalDriver } from "./local-driver.ts";
+import { createRemoteDriver } from "./remote-driver.ts";
 import { acquirePidFile, startPolling } from "./poll.ts";
 import {
   createConfirmationBroker,
@@ -173,9 +175,9 @@ interface RunTurnDeps {
     pendingHandles: Map;
   }) => CanUseTool;
   // Present iff `LOCAL_ENABLED=true`. When set, no-prefix messages route to
-  // runLocalTurn instead of runAgent. Both paths share the queue, mutex,
+  // runEngineTurn instead of runAgent. Both paths share the queue, mutex,
   // semaphore, and tracker drain — dispatch happens inside the queued worker.
-  localDeps: LocalRunDeps | null;
+  localDeps: EngineRunDeps | null;
   // PNX-167 — slash command surface. `commandDeps` carries the dispatcher's
   // dependencies (allowlist, queue snapshot, startedAt, etc.) so the
   // command path stays self-contained. `botUsername` is the cached lowercase
@@ -261,7 +263,7 @@ function makeRunTurn(deps: RunTurnDeps): (update: Update) => Promise<void> {
         await deps.tg
           .sendMessage(msg.chat.id, "local engine disabled in this deployment")
           .catch((err) =>
-            log.warn("local.disabled_ack_failed", { error: (err as Error).message }),
+            log.warn("engine.disabled_ack_failed", { error: (err as Error).message }),
           );
         log.info("turn.done", {
           update_id: update.update_id,
@@ -273,7 +275,7 @@ function makeRunTurn(deps: RunTurnDeps): (update: Update) => Promise<void> {
       // Empty body is unreachable on Telegram (the platform rejects empty
       // messages) and the web UI guards against it. Send the user's text
       // straight to the runner.
-      await runLocalTurn(deps.localDeps, {
+      await runEngineTurn(deps.localDeps, {
         chatId: msg.chat.id,
         fromId: msg.from.id,
         updateId: scheduledCtx ? null : update.update_id,
@@ -486,13 +488,18 @@ export function auditQueueFull(update: Update, db: SolracDb, tg: TelegramClient,
 // Operator-readable label for the web UI's default-engine pill. The pill
 // itself ships with the empty `data-prefix=""`, but the title attr is
 // substituted at serve time so the user hovers over a label matching the
-// deploy. Local-engine deploys carry the backend name in parentheses so
-// the operator sees which backend served the turn at a glance.
+// deploy. Engine-slot deploys carry the mode + backend in parentheses
+// (e.g. `local (ollama)`, `remote (openrouter)`) so the operator sees which
+// backend served the turn at a glance.
 function defaultEngineLabel(
   engine: "local" | "primary" | "secondary",
   localBackend: "ollama" | "lmstudio" | null,
+  remoteBackend: "openrouter" | null,
 ): string {
-  if (engine === "local") return `local (${localBackend ?? "?"})`;
+  if (engine === "local") {
+    if (remoteBackend) return `remote (${remoteBackend})`;
+    return `local (${localBackend ?? "?"})`;
+  }
   if (engine === "primary") return "primary Claude (Sonnet)";
   return "secondary Claude (Opus)";
 }
@@ -503,19 +510,19 @@ function defaultEngineLabel(
 // turn will succeed once the daemon is reachable. Delegates the probe to
 // the driver so each backend hits its own probe URL (`/api/tags` for Ollama,
 // `/v1/models` for LMStudio).
-async function probeLocalHealth(driver: LocalDriver, model: string): Promise<void> {
+async function probeEngineHealth(driver: EngineDriver, model: string): Promise<void> {
   const backend = driver.backend;
   try {
     const result = await driver.probe(model, AbortSignal.timeout(5_000));
     if (!result.ok) {
       if (result.modelMissing) {
-        log.warn("local.boot_health_model_missing", {
+        log.warn("engine.boot_health_model_missing", {
           backend,
           model,
           hint: result.reason,
         });
       } else {
-        log.warn("local.boot_health_failed", {
+        log.warn("engine.boot_health_failed", {
           backend,
           model,
           hint: result.reason,
@@ -523,9 +530,9 @@ async function probeLocalHealth(driver: LocalDriver, model: string): Promise {
     localBackend: config.localBackend,
     localModel: config.localModel,
     localUrl: config.localUrl,
+    remoteEnabled: config.remoteEnabled,
+    remoteBackend: config.remoteBackend,
+    remoteModel: config.remoteModel,
+    remoteBaseUrl: config.remoteBaseUrl,
   });
   // One-release-cycle silent-flip guard. Operators upgrading without setting
   // `SOLRAC_DEFAULT_ENGINE` would see no-prefix messages start hitting the
@@ -611,7 +622,7 @@ async function main(): Promise<void> {
     // wins on tool-name collisions so a stale operator copy can't shadow a
     // blessed integration. Tools registered here surface to Claude tiers as
     // `mcp__solrac__`. Local path does NOT see integrations on the
-    // tools-off branch — see local.ts.
+    // tools-off branch — see engine.ts.
     let integrationsMcpServer: McpSdkServerConfigWithInstance | null = null;
     let integrationToolTiers: ReadonlyMap = new Map();
     let integrationConfirmFormatters: ReadonlyMap = new Map();
@@ -655,24 +666,48 @@ async function main(): Promise<void> {
       }
     }
 
-    // Local-engine driver — backend selected per `LOCAL_BACKEND`. Built once
-    // at boot and shared by every consumer (run path, skill path, scheduler).
-    // `null` when the local engine is disabled.
-    const localDriver: LocalDriver | null =
-      config.localEnabled && config.localBackend && config.localModel
-        ? createLocalDriver(config.localBackend, { url: config.localUrl })
-        : null;
+    // Engine-slot driver — backend selected per `LOCAL_BACKEND` (local mode)
+    // OR `REMOTE_BACKEND` (remote mode). The two modes are mutually exclusive
+    // at boot (config.ts validates), so at most one constructor fires. The
+    // resulting driver fills the same `local` engine slot; the runner picks
+    // mode-aware behavior from the driver's `mode` field.
+    let localDriver: EngineDriver | null = null;
+    let localSlotMode: "local" | "remote" = "local";
+    let localSlotModel: string | null = null;
+    let localSlotTimeoutMs = 60_000;
+    let localSlotHistoryLimit = 6;
+    let localSlotMaxToolIterations = 8;
+    if (config.localEnabled && config.localBackend && config.localModel) {
+      localDriver = createLocalDriver(config.localBackend, { url: config.localUrl });
+      localSlotMode = "local";
+      localSlotModel = config.localModel;
+      localSlotTimeoutMs = config.localTimeoutMs;
+      localSlotHistoryLimit = config.localHistoryLimit;
+      localSlotMaxToolIterations = config.localMaxToolIterations;
+    } else if (config.remoteEnabled && config.remoteBackend && config.remoteModel && config.remoteApiKey) {
+      localDriver = createRemoteDriver(config.remoteBackend, {
+        url: config.remoteBaseUrl,
+        apiKey: config.remoteApiKey,
+        referer: config.remoteHttpReferer,
+        title: config.remoteXTitle,
+      });
+      localSlotMode = "remote";
+      localSlotModel = config.remoteModel;
+      localSlotTimeoutMs = config.remoteTimeoutMs;
+      localSlotHistoryLimit = config.remoteHistoryLimit;
+      localSlotMaxToolIterations = config.remoteMaxToolIterations;
+    }
 
     // Skill-side local deps (one-shot, no tool loop, no streaming). Built
     // from config directly (not derived from `localDeps` below) so it's
     // available for `buildSkillTools` before the main `localDeps` is
     // assembled. Both consumers see the same driver instance.
-    const localSkillDeps: LocalSkillDeps | null =
-      localDriver && config.localModel
+    const localSkillDeps: EngineSkillDeps | null =
+      localDriver && localSlotModel
         ? {
             driver: localDriver,
-            model: config.localModel,
-            timeoutMs: config.localTimeoutMs,
+            model: localSlotModel,
+            timeoutMs: localSlotTimeoutMs,
             soul,
           }
         : null;
@@ -717,7 +752,7 @@ async function main(): Promise<void> {
     // a typo broke every module. Fail-soft (start anyway) but make the
     // misconfiguration loud in the boot log.
     if (config.localToolsEnabled && integrationTools.length === 0) {
-      log.warn("local.tools_enabled_but_zero_loaded", {
+      log.warn("engine.tools_enabled_but_zero_loaded", {
         integrationsDir: config.integrationsDir,
         hint: "set SOLRAC_INTEGRATIONS_DIR or add modules under integrations-builtin/",
       });
@@ -754,27 +789,32 @@ async function main(): Promise<void> {
         pendingHandles,
       });
     };
-    // Local-engine deps are constructed once iff the feature is on. When
-    // off, dispatch in makeRunTurn falls through to a "disabled" reply.
+    // Engine-slot deps are constructed once iff the feature is on. When off,
+    // dispatch in makeRunTurn falls through to a "disabled" reply.
     //
     // Tool-loop wiring: when BOTH `localToolsEnabled=true` AND we actually
     // loaded integration tools, surface the tools + tier map + broker into
-    // the deps so `runLocalTurn` dispatches through the tool-loop driver.
+    // the deps so `runEngineTurn` dispatches through the tool-loop driver.
     // When tools are off (or zero loaded), the same deps shape carries
     // `toolEnabled: false` and the single-shot path runs.
+    //
+    // `LOCAL_TOOLS_ENABLED` gates the tool-loop for BOTH local and remote
+    // mode — the env-var name predates the remote mode but the code path is
+    // identical. (Renaming to `BYO_TOOLS_ENABLED` is a v0.7.0-class hard
+    // cutover and out of scope for this PR.)
     const localToolsActive =
       config.localToolsEnabled && integrationTools.length > 0;
     const localIsDefault = config.defaultEngine === "local";
-    const localDeps: LocalRunDeps | null =
-      localDriver && config.localModel
+    const localDeps: EngineRunDeps | null =
+      localDriver && localSlotModel
         ? {
             tg,
             db,
             sessions,
             driver: localDriver,
-            model: config.localModel,
-            timeoutMs: config.localTimeoutMs,
-            historyLimit: config.localHistoryLimit,
+            model: localSlotModel,
+            timeoutMs: localSlotTimeoutMs,
+            historyLimit: localSlotHistoryLimit,
             soul,
             instanceMdPath: solracMdPath,
             isDefaultEngine: localIsDefault,
@@ -782,21 +822,20 @@ async function main(): Promise<void> {
             tools: localToolsActive ? integrationTools : undefined,
             toolTiers: localToolsActive ? integrationToolTiers : undefined,
             broker: localToolsActive ? broker : undefined,
-            maxToolIterations: config.localMaxToolIterations,
+            maxToolIterations: localSlotMaxToolIterations,
           }
         : null;
     if (localDeps && localDriver) {
-      log.info("local.boot", {
+      log.info("engine.boot", {
         backend: localDriver.backend,
-        url: config.localUrl,
-        model: config.localModel,
+        mode: localSlotMode,
+        url: localSlotMode === "local" ? config.localUrl : config.remoteBaseUrl,
+        model: localSlotModel,
         isDefaultEngine: localIsDefault,
         toolsEnabled: localToolsActive,
         toolCount: localToolsActive ? integrationTools.length : 0,
-        maxToolIterations: localToolsActive
-          ? config.localMaxToolIterations
-          : null,
-        timeoutMs: config.localTimeoutMs,
+        maxToolIterations: localToolsActive ? localSlotMaxToolIterations : null,
+        timeoutMs: localSlotTimeoutMs,
       });
     }
     // Attach the tool surface to localSkillDeps AFTER integrationTools/
@@ -808,13 +847,12 @@ async function main(): Promise<void> {
       localSkillDeps.toolTiers = integrationToolTiers;
       localSkillDeps.broker = broker;
     }
-    // The local engine is the recommended default; probe the backend at boot
-    // so operators see a misconfiguration immediately (vs. on first user
-    // turn). Non-fatal: a slow-starting daemon may not be ready yet under
-    // systemd, and crashing Solrac because of a transient probe failure is
-    // worse than logging it.
-    if (localIsDefault && localDeps && localDriver && config.localModel) {
-      void probeLocalHealth(localDriver, config.localModel);
+    // Engine-slot health probe — runs for whichever backend (local or remote)
+    // is wired and selected as the default. Non-fatal: a slow-starting daemon
+    // may not be ready yet under systemd, and a transient OpenRouter network
+    // blip shouldn't crash Solrac.
+    if (localIsDefault && localDeps && localDriver && localSlotModel) {
+      void probeEngineHealth(localDriver, localSlotModel);
     }
     // PNX-167 — boot-time bot identity for `/cmd@` group-chat targeting.
     // Failure is non-fatal: we proceed with `botUsername=null`, which causes
@@ -883,6 +921,9 @@ async function main(): Promise<void> {
       localSkillDeps,
       defaultEngine: config.defaultEngine,
       localToolsEnabled: config.localToolsEnabled,
+      // null when neither LOCAL_ENABLED nor REMOTE_ENABLED — `/help` then
+      // renders the Claude-only engine section without a cost-framing chip.
+      engineSlotMode: localDeps ? localSlotMode : null,
       taskRegistry,
       triggerScheduledTask: (name) =>
         schedulerRef
@@ -899,13 +940,13 @@ async function main(): Promise<void> {
     // events flow through one subscriber set.
     const webClient: WebClient | null = tgWebClient;
     let webCommandDeps: RunCommandDeps | null = null;
-    let webLocalDeps: LocalRunDeps | null = null;
+    let webLocalDeps: EngineRunDeps | null = null;
     if (webClient) {
       // Web-routed / invocations: rewrite the broker so confirm
       // prompts ride the SSE bus rather than Telegram (mirrors the
       // webLocalDeps swap below). `tools` and `toolTiers` are unchanged —
       // only the broker differs per transport.
-      const webLocalSkillDeps: LocalSkillDeps | null = commandDeps.localSkillDeps
+      const webEngineSkillDeps: EngineSkillDeps | null = commandDeps.localSkillDeps
         ? {
             ...commandDeps.localSkillDeps,
             broker:
@@ -917,7 +958,7 @@ async function main(): Promise<void> {
       webCommandDeps = {
         ...commandDeps,
         tg: webClient,
-        localSkillDeps: webLocalSkillDeps,
+        localSkillDeps: webEngineSkillDeps,
       };
       // Local-engine-on-web path needs the web broker (not the Telegram
       // broker) so confirm prompts ride the SSE bus to the operator's
@@ -1007,7 +1048,11 @@ async function main(): Promise<void> {
         token: config.webToken,
         webChatId: config.webChatId,
         webClient,
-        defaultEngineLabel: defaultEngineLabel(config.defaultEngine, config.localBackend),
+        defaultEngineLabel: defaultEngineLabel(
+          config.defaultEngine,
+          config.localBackend,
+          config.remoteBackend,
+        ),
         onMessage: (text) => {
           const id = nextWebUpdateId++;
           const update: Update = {
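The `let` bindings in the engine-slot hunk of `main.ts` amount to a pure selection over `Config`. Here is a hedged sketch of that selection as a single function, assuming a trimmed config shape. `SlotConfig`, `EngineSlot`, `selectEngineSlot`, and `slotLabel` are hypothetical names, not part of this PR. The sketch keeps the two behaviors visible in the diff: local mode is checked first, and remote mode requires a non-empty `remoteApiKey`.

```typescript
// Hypothetical, trimmed config shape. Field names follow the ones main.ts
// reads in this diff; the real Config carries many more fields.
interface SlotConfig {
  localEnabled: boolean;
  localBackend: "ollama" | "lmstudio" | null;
  localModel: string | null;
  localTimeoutMs: number;
  remoteEnabled: boolean;
  remoteBackend: "openrouter" | null;
  remoteModel: string | null;
  remoteApiKey: string | null;
  remoteTimeoutMs: number;
}

// What the engine slot needs once a backend is chosen (hypothetical type).
interface EngineSlot {
  mode: "local" | "remote";
  backend: "ollama" | "lmstudio" | "openrouter";
  model: string;
  timeoutMs: number;
}

// Local is checked first, as in main.ts. Boot validation already rejects
// LOCAL_ENABLED=true together with REMOTE_ENABLED=true, so the order only
// matters defensively. A missing or empty API key leaves the slot unset.
function selectEngineSlot(c: SlotConfig): EngineSlot | null {
  if (c.localEnabled && c.localBackend && c.localModel) {
    return {
      mode: "local",
      backend: c.localBackend,
      model: c.localModel,
      timeoutMs: c.localTimeoutMs,
    };
  }
  if (c.remoteEnabled && c.remoteBackend && c.remoteModel && c.remoteApiKey) {
    return {
      mode: "remote",
      backend: c.remoteBackend,
      model: c.remoteModel,
      timeoutMs: c.remoteTimeoutMs,
    };
  }
  // Neither mode configured: localDeps stays null and local-engine
  // dispatch falls through to the "disabled" reply.
  return null;
}

// Same `local (ollama)` / `remote (openrouter)` strings that
// defaultEngineLabel() renders when the default engine is the slot.
function slotLabel(slot: EngineSlot): string {
  return `${slot.mode} (${slot.backend})`;
}
```

Returning one object instead of six parallel `let` variables is one way to keep model, timeout, and mode from drifting apart when a third backend is added.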
diff --git a/src/remote-driver.test.ts b/src/remote-driver.test.ts
new file mode 100644
index 0000000..3667a46
--- /dev/null
+++ b/src/remote-driver.test.ts
@@ -0,0 +1,379 @@
+/**
+ * @fileoverview Unit tests for `remote-driver.ts` — OpenRouter.
+ * @proves SSE wire-format parsing (text + tool-call deltas), trailing usage
+ *         chunk capture (the load-bearing `costUsd` path), auth + attribution
+ *         header injection on probe + streamChat, and HTTP error mapping
+ *         (`http_error`, `model_missing`).
+ *
+ * Tests inject a handwritten fake fetch (no mocking framework, per
+ * CLAUDE.md Testing Philosophy). Each test constructs a `Response` with a
+ * `ReadableStream` body so the driver consumes real chunk boundaries —
+ * partial-event behavior is exercised by hand-splitting payloads into
+ * multiple `controller.enqueue` calls.
+ */
+
+import { describe, expect, test } from "bun:test";
+import {
+  EngineDriverError,
+  type EngineChatEvent,
+} from "./engine-driver.ts";
+import { createOpenrouterDriver } from "./remote-driver.ts";
+
+// ---------------------------------------------------------------------------
+// Test helpers
+// ---------------------------------------------------------------------------
+
+function streamResponse(chunks: string[], status = 200): Response {
+  const stream = new ReadableStream({
+    start(controller) {
+      const encoder = new TextEncoder();
+      for (const chunk of chunks) controller.enqueue(encoder.encode(chunk));
+      controller.close();
+    },
+  });
+  return new Response(stream, { status });
+}
+
+function jsonResponse(obj: unknown, status = 200): Response {
+  return new Response(JSON.stringify(obj), {
+    status,
+    headers: { "content-type": "application/json" },
+  });
+}
+
+function fakeFetch(
+  impl: (url: string, init?: RequestInit) => Response | Promise<Response>,
+): typeof fetch {
+  return ((url: string | URL | Request, init?: RequestInit) =>
+    Promise.resolve(impl(String(url), init))) as unknown as typeof fetch;
+}
+
+async function collectEvents(
+  iter: AsyncIterable<EngineChatEvent>,
+): Promise<EngineChatEvent[]> {
+  const out: EngineChatEvent[] = [];
+  for await (const evt of iter) out.push(evt);
+  return out;
+}
+
+// ---------------------------------------------------------------------------
+// OpenrouterDriver — probe
+// ---------------------------------------------------------------------------
+
+describe("OpenrouterDriver — probe", () => {
+  test("model present in data[] → ok; auth + attribution headers sent", async () => {
+    let observedAuth = "";
+    let observedReferer = "";
+    let observedTitle = "";
+    const fetch = fakeFetch((url, init) => {
+      expect(url).toBe("https://openrouter.ai/api/v1/models");
+      const headers = (init?.headers ?? {}) as Record<string, string>;
+      observedAuth = headers.authorization ?? "";
+      observedReferer = headers["http-referer"] ?? "";
+      observedTitle = headers["x-title"] ?? "";
+      return jsonResponse({
+        data: [
+          { id: "anthropic/claude-3.5-sonnet" },
+          { id: "openai/gpt-4o-mini" },
+        ],
+      });
+    });
+    const driver = createOpenrouterDriver({
+      url: "https://openrouter.ai/api/v1",
+      apiKey: "sk-or-test-key",
+      referer: "https://example.com",
+      title: "test-title",
+      fetch,
+    });
+    const result = await driver.probe("anthropic/claude-3.5-sonnet");
+    expect(result.ok).toBe(true);
+    expect(observedAuth).toBe("Bearer sk-or-test-key");
+    expect(observedReferer).toBe("https://example.com");
+    expect(observedTitle).toBe("test-title");
+  });
+
+  test("model absent → modelMissing with actionable hint", async () => {
+    const fetch = fakeFetch(() =>
+      jsonResponse({ data: [{ id: "openai/gpt-4o-mini" }] }),
+    );
+    const driver = createOpenrouterDriver({
+      url: "https://openrouter.ai/api/v1",
+      apiKey: "sk-or-test-key",
+      fetch,
+    });
+    const result = await driver.probe("anthropic/claude-3.5-sonnet");
+    expect(result.ok).toBe(false);
+    expect(result.modelMissing).toBe(true);
+    expect(result.reason).toMatch(/openrouter\.ai\/models/);
+  });
+
+  test("HTTP 401 → auth_failed reason", async () => {
+    const fetch = fakeFetch(() => new Response("unauthorized", { status: 401 }));
+    const driver = createOpenrouterDriver({
+      url: "https://openrouter.ai/api/v1",
+      apiKey: "bad",
+      fetch,
+    });
+    const result = await driver.probe("anthropic/claude-3.5-sonnet");
+    expect(result.ok).toBe(false);
+    expect(result.reason).toMatch(/auth_failed/);
+  });
+
+  test("network error → ok:false unreachable", async () => {
+    const fetch = (() =>
+      Promise.reject(new TypeError("fetch failed"))) as unknown as typeof globalThis.fetch;
+    const driver = createOpenrouterDriver({
+      url: "https://openrouter.ai/api/v1",
+      apiKey: "sk-or-test-key",
+      fetch,
+    });
+    const result = await driver.probe("anthropic/claude-3.5-sonnet");
+    expect(result.ok).toBe(false);
+    expect(result.reason).toMatch(/unreachable/);
+  });
+});
+
+// ---------------------------------------------------------------------------
+// OpenrouterDriver — streamChat (the load-bearing path: cost capture)
+// ---------------------------------------------------------------------------
+
+describe("OpenrouterDriver — streamChat", () => {
+  test("trailing usage chunk populates costUsd alongside tokens", async () => {
+    // The whole point of the OpenRouter driver — without this, cost cap
+    // accounting silently treats remote turns as free (post-COALESCE).
+    const fetch = fakeFetch(
+      () =>
+        streamResponse([
+          'data: {"choices":[{"delta":{"content":"hello"}}]}\n\n',
+          'data: {"choices":[{"delta":{},"finish_reason":"stop"}]}\n\n',
+          'data: {"usage":{"prompt_tokens":12,"completion_tokens":3,"total_tokens":15,"cost":0.00007,"is_byok":false}}\n\n',
+          "data: [DONE]\n\n",
+        ]),
+    );
+    const driver = createOpenrouterDriver({
+      url: "https://openrouter.ai/api/v1",
+      apiKey: "sk-or-test-key",
+      fetch,
+    });
+    const events = await collectEvents(
+      driver.streamChat({
+        model: "anthropic/claude-3.5-sonnet",
+        messages: [{ role: "user", content: "hi" }],
+      }),
+    );
+    const last = events.at(-1);
+    expect(last).toEqual({
+      kind: "done",
+      inputTokens: 12,
+      outputTokens: 3,
+      costUsd: 0.00007,
+    });
+  });
+
+  test("usage chunk without cost → costUsd stays null (NOT 0)", async () => {
+    // Defensive case: if OpenRouter ever stops emitting cost, the runner must
+    // see `null` (so it logs remote.cost_missing) rather than `0` (which would
+    // silently bypass the cap query's COALESCE(SUM(cost_usd), 0)).
+    const fetch = fakeFetch(
+      () =>
+        streamResponse([
+          'data: {"choices":[{"delta":{"content":"ok"}}]}\n\n',
+          'data: {"choices":[{"delta":{},"finish_reason":"stop"}]}\n\n',
+          'data: {"usage":{"prompt_tokens":5,"completion_tokens":1}}\n\n',
+          "data: [DONE]\n\n",
+        ]),
+    );
+    const driver = createOpenrouterDriver({
+      url: "https://openrouter.ai/api/v1",
+      apiKey: "sk-or-test-key",
+      fetch,
+    });
+    const events = await collectEvents(
+      driver.streamChat({
+        model: "anthropic/claude-3.5-sonnet",
+        messages: [{ role: "user", content: "hi" }],
+      }),
+    );
+    const last = events.at(-1);
+    expect(last).toEqual({
+      kind: "done",
+      inputTokens: 5,
+      outputTokens: 1,
+      costUsd: null,
+    });
+  });
+
+  test("model slug with / flows through unchanged", async () => {
+    // OpenRouter slugs take the form provider/model; verify the slash doesn't trip
+    // any parser between the request body and the audit-tag composition.
+    let observedModel: unknown = null;
+    const fetch = fakeFetch((url, init) => {
+      const body = JSON.parse((init?.body as string) ?? "{}");
+      observedModel = body.model;
+      return streamResponse([
+        'data: {"choices":[{"delta":{"content":"x"}}]}\n\n',
+        'data: {"choices":[{"delta":{},"finish_reason":"stop"}]}\n\n',
+        'data: {"usage":{"prompt_tokens":1,"completion_tokens":1,"cost":0.000001}}\n\n',
+        "data: [DONE]\n\n",
+      ]);
+    });
+    const driver = createOpenrouterDriver({
+      url: "https://openrouter.ai/api/v1",
+      apiKey: "k",
+      fetch,
+    });
+    await collectEvents(
+      driver.streamChat({
+        model: "meta-llama/llama-3.3-70b-instruct",
+        messages: [{ role: "user", content: "hi" }],
+      }),
+    );
+    expect(observedModel).toBe("meta-llama/llama-3.3-70b-instruct");
+  });
+
+  test("auth + attribution headers ride every streamChat request", async () => {
+    let observedAuth = "";
+    let observedReferer = "";
+    let observedTitle = "";
+    const fetch = fakeFetch((url, init) => {
+      expect(url).toBe("https://openrouter.ai/api/v1/chat/completions");
+      const headers = (init?.headers ?? {}) as Record<string, string>;
+      observedAuth = headers.authorization ?? "";
+      observedReferer = headers["http-referer"] ?? "";
+      observedTitle = headers["x-title"] ?? "";
+      return streamResponse(["data: [DONE]\n\n"]);
+    });
+    const driver = createOpenrouterDriver({
+      url: "https://openrouter.ai/api/v1",
+      apiKey: "sk-or-test-key",
+      referer: "https://solrac.dev",
+      title: "solrac-prod",
+      fetch,
+    });
+    await collectEvents(
+      driver.streamChat({
+        model: "anthropic/claude-3.5-sonnet",
+        messages: [{ role: "user", content: "hi" }],
+      }),
+    );
+    expect(observedAuth).toBe("Bearer sk-or-test-key");
+    expect(observedReferer).toBe("https://solrac.dev");
+    expect(observedTitle).toBe("solrac-prod");
+  });
+
+  test("HTTP 401 → EngineDriverError http_error with REMOTE_API_KEY hint", async () => {
+    const fetch = fakeFetch(
+      () =>
+        new Response(
+          JSON.stringify({ error: { message: "Invalid API key", code: 401 } }),
+          { status: 401 },
+        ),
+    );
+    const driver = createOpenrouterDriver({
+      url: "https://openrouter.ai/api/v1",
+      apiKey: "bad",
+      fetch,
+    });
+    try {
+      await collectEvents(
+        driver.streamChat({
+          model: "anthropic/claude-3.5-sonnet",
+          messages: [{ role: "user", content: "hi" }],
+        }),
+      );
+      throw new Error("expected throw");
+    } catch (err) {
+      expect(err).toBeInstanceOf(EngineDriverError);
+      const drvErr = err as EngineDriverError;
+      expect(drvErr.code).toBe("http_error");
+      expect(drvErr.status).toBe(401);
+      expect(drvErr.message).toMatch(/REMOTE_API_KEY/);
+    }
+  });
+
+  test("HTTP 404 → EngineDriverError model_missing", async () => {
+    const fetch = fakeFetch(
+      () =>
+        new Response(
+          JSON.stringify({ error: { message: "Model not found: foo/bar" } }),
+          { status: 404 },
+        ),
+    );
+    const driver = createOpenrouterDriver({
+      url: "https://openrouter.ai/api/v1",
+      apiKey: "k",
+      fetch,
+    });
+    try {
+      await collectEvents(
+        driver.streamChat({
+          model: "foo/bar",
+          messages: [{ role: "user", content: "hi" }],
+        }),
+      );
+      throw new Error("expected throw");
+    } catch (err) {
+      expect(err).toBeInstanceOf(EngineDriverError);
+      expect((err as EngineDriverError).code).toBe("model_missing");
+    }
+  });
+
+  test("inline error frame → kind:error event terminates stream", async () => {
+    const fetch = fakeFetch(
+      () =>
+        streamResponse([
+          'data: {"error":{"message":"context length exceeded"}}\n\n',
+          "data: [DONE]\n\n",
+        ]),
+    );
+    const driver = createOpenrouterDriver({
+      url: "https://openrouter.ai/api/v1",
+      apiKey: "k",
+      fetch,
+    });
+    const events = await collectEvents(
+      driver.streamChat({
+        model: "anthropic/claude-3.5-sonnet",
+        messages: [{ role: "user", content: "hi" }],
+      }),
+    );
+    expect(events.length).toBe(1);
+    expect(events[0]).toEqual({
+      kind: "error",
+      message: "context length exceeded",
+    });
+  });
+
+  test("tool_call SSE chunks accumulate then emit on finish_reason", async () => {
+    // OpenAI tool-call deltas split function.arguments across multiple chunks.
+    // Verify the OpenRouter driver assembles them like the LMStudio driver does.
+    const fetch = fakeFetch(
+      () =>
+        streamResponse([
+          'data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_1","function":{"name":"time_now","arguments":"{\\"tz\\""}}]}}]}\n\n',
+          'data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":":\\"UTC\\"}"}}]}}]}\n\n',
+          'data: {"choices":[{"delta":{},"finish_reason":"tool_calls"}]}\n\n',
+          'data: {"usage":{"prompt_tokens":20,"completion_tokens":5,"cost":0.0001}}\n\n',
+          "data: [DONE]\n\n",
+        ]),
+    );
+    const driver = createOpenrouterDriver({
+      url: "https://openrouter.ai/api/v1",
+      apiKey: "k",
+      fetch,
+    });
+    const events = await collectEvents(
+      driver.streamChat({
+        model: "anthropic/claude-3.5-sonnet",
+        messages: [{ role: "user", content: "what time is it?" }],
+      }),
+    );
+    const toolCalls = events.filter((e) => e.kind === "tool_call") as Array<
+      EngineChatEvent & { kind: "tool_call" }
+    >;
+    expect(toolCalls.length).toBe(1);
+    expect(toolCalls[0]?.call.function.name).toBe("time_now");
+    expect(toolCalls[0]?.call.function.arguments).toEqual({ tz: "UTC" });
+    expect(toolCalls[0]?.call.id).toBe("call_1");
+  });
+});
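The probe and streamChat tests above depend on three helpers, `fakeFetch`, `streamResponse`, and `collectEvents`, whose definitions fall outside this excerpt. The sketch below is a hypothetical reconstruction of how such helpers could be written. It assumes Bun or Node 18+ globals (`Response`, `ReadableStream`, `TextEncoder`), and the real definitions in the test file may differ.

```typescript
// Hypothetical test helpers, sketched to show the mocking pattern the
// OpenRouter driver tests rely on; not the repo's actual definitions.

type FetchInput = Parameters<typeof globalThis.fetch>[0];
type FetchHandler = (url: string, init?: RequestInit) => Response | Promise<Response>;

// Wrap a handler so it can stand in for globalThis.fetch. The driver passes
// string URLs, but URL and Request inputs are normalized to a string too.
function fakeFetch(handler: FetchHandler): typeof globalThis.fetch {
  const impl = async (input: FetchInput, init?: RequestInit): Promise<Response> => {
    const url =
      typeof input === "string" ? input : input instanceof URL ? input.href : input.url;
    return handler(url, init);
  };
  return impl as unknown as typeof globalThis.fetch;
}

// Build a 200 text/event-stream Response whose body yields each string as a
// separate chunk, so SSE frames reach the driver one read() at a time.
function streamResponse(chunks: ReadonlyArray<string>): Response {
  const encoder = new TextEncoder();
  const body = new ReadableStream<Uint8Array>({
    start(controller) {
      for (const chunk of chunks) controller.enqueue(encoder.encode(chunk));
      controller.close();
    },
  });
  return new Response(body, {
    status: 200,
    headers: { "content-type": "text/event-stream" },
  });
}

// Drain an async iterable (e.g. driver.streamChat(...)) into an array.
async function collectEvents<T>(iter: AsyncIterable<T>): Promise<T[]> {
  const out: T[] = [];
  for await (const evt of iter) out.push(evt);
  return out;
}
```

Feeding each SSE frame as its own chunk exercises the driver's buffer-and-split loop. A stricter variant could split frames mid-line to test reassembly across `reader.read()` calls.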
diff --git a/src/remote-driver.ts b/src/remote-driver.ts
new file mode 100644
index 0000000..5bf8c97
--- /dev/null
+++ b/src/remote-driver.ts
@@ -0,0 +1,576 @@
+/**
+ * @fileoverview Engine-slot driver for hosted/remote LLM providers — currently
+ *               OpenRouter; designed so future remote backends (Together,
+ *               Groq, Fireworks, …) slot in as additional factories.
+ * @purpose Hide every wire-format detail of remote OpenAI-compatible APIs
+ *          behind the normalized `EngineChatEvent` stream so the runner in
+ *          `engine.ts` consumes one shape regardless of whether the engine
+ *          slot is filled by an on-host daemon or a hosted provider.
+ *
+ * Today: one implementation.
+ *   - `OpenrouterDriver` — `POST ${baseUrl}/chat/completions` SSE (`data:`
+ *     events, `[DONE]` terminator). Probe: `GET ${baseUrl}/models`. Auth via
+ *     `Authorization: Bearer` plus the API key. Attribution via `HTTP-Referer` +
+ *     `X-Title`. Captures `usage.cost` (USD) from the trailing chunk so the
+ *     runner writes a real per-turn cost to `audit.cost_usd`, gating the
+ *     same per-chat + global hourly cost caps as Anthropic burn.
+ *
+ * Why a dedicated file (vs sharing the LMStudio SSE driver):
+ *   The bodies are 95% identical but each backend has quirks that have
+ *   already drifted (LMStudio silent-substitution, Gemma-4 dedup, harmony
+ *   tokens) and will keep drifting. Duplication scopes regression risk
+ *   per-backend; the cross-cutting concern (cost capture) lives in a single
+ *   line per driver — easy to keep in sync, hard to accidentally couple.
+ *
+ * Differences from the LMStudio driver in `local-driver.ts`:
+ *   - Endpoint is `${baseUrl}/chat/completions` (OpenRouter's base URL
+ *     already contains `/api/v1`; LMStudio's base URL does not).
+ *   - `Authorization: Bearer` with the API key is required.
+ *   - `HTTP-Referer` + `X-Title` are recommended attribution headers;
+ *     OpenRouter uses them for analytics + listing in the model-usage
+ *     leaderboard.
+ *   - No `parallel_tool_calls: false` flag — OpenRouter routes to the actual
+ *     provider's API, which handles parallel calls correctly (the Gemma-4
+ *     bug was specifically LMStudio's adapter).
+ *   - No silent model substitution — OpenRouter returns HTTP 4xx with an
+ *     error message when the model slug is unknown, so the first-chunk model
+ *     check is skipped (it would also break legitimate provider-fallback
+ *     responses where `chunk.model` is the served model, not the requested
+ *     one).
+ *   - `usage.cost` (USD) is included in the trailing chunk automatically —
+ *     no opt-in required (the historical `usage: { include: true }` /
+ *     `stream_options: { include_usage: true }` flags are deprecated and
+ *     have no effect as of 2026 per OpenRouter docs). Captured and emitted
+ *     via the `done` event so the runner can write a real cost to
+ *     `audit.cost_usd`.
+ *   - Probe path is `${baseUrl}/models`; no extra `/v1` since the base URL has `/api/v1`.
+ *
+ * Position in the dependency graph:
+ *   log + engine-driver → remote-driver → engine-tools, engine
+ *
+ * Exports:
+ *   - `OpenrouterDriverOpts` — factory options (extends `DriverOpts` with
+ *     `apiKey`, optional `referer`, optional `title`).
+ *   - `createOpenrouterDriver(opts)` — OpenRouter factory.
+ *   - `createRemoteDriver(backend, opts)` — dispatch factory; mirrors
+ *     `createLocalDriver` in shape so boot wiring stays symmetric.
+ *
+ * Key invariants:
+ *   - Each driver sets `mode: "remote"` so the runner writes the
+ *     `remote::` audit-tag prefix and captures cost.
+ *   - The OpenRouter driver populates `EngineChatEvent.done.costUsd` from
+ *     `usage.cost` in the trailing SSE chunk. If the field is absent (the
+ *     model omitted it, or OpenRouter changed shape), `costUsd` stays
+ *     `null` — the runner writes `audit.cost_usd = null` plus a
+ *     `remote.cost_missing` warn, NOT `0` (a `0` would silently bypass the
+ *     `COALESCE(SUM(cost_usd), 0)` cost cap).
+ */
+
+import {
+  type DriverOpts,
+  type EngineBackend,
+  type EngineChatEvent,
+  type EngineChatMessage,
+  type EngineDriver,
+  type EngineProbeResult,
+  EngineDriverError,
+  RAW_FRAME_BUFFER_MAX,
+  RAW_FRAME_TRUNC,
+  maybeLogEmptyStream,
+  stableStringify,
+} from "./engine-driver.ts";
+import { log } from "./log.ts";
+
+/**
+ * OpenRouter-specific factory options. Required auth + recommended
+ * attribution headers. Distinct from `DriverOpts` so callers can't
+ * accidentally pass an Ollama/LMStudio config to the openrouter factory.
+ */
+export interface OpenrouterDriverOpts extends DriverOpts {
+  apiKey: string;
+  // OpenRouter attribution headers — recommended (not required). Optional
+  // here so tests don't need to set them; main.ts boot wiring supplies
+  // operator-configurable defaults.
+  referer?: string;
+  title?: string;
+}
+
+// ---------------------------------------------------------------------------
+// OpenRouter SSE frame shapes
+// ---------------------------------------------------------------------------
+//
+// Cloned from `LmstudioSseFrame` in local-driver.ts intentionally — each
+// backend's quirks have drifted before and will drift again (e.g. provider-
+// fallback metadata, BYOK signaling). Keeping the typedef per-driver scopes
+// future churn to one file at a time.
+
+interface OpenrouterSseToolCallDelta {
+  index?: number;
+  id?: string;
+  type?: string;
+  function?: { name?: string; arguments?: string };
+}
+
+interface OpenrouterSseChoice {
+  index?: number;
+  delta?: {
+    role?: string;
+    content?: string | null;
+    tool_calls?: ReadonlyArray<OpenrouterSseToolCallDelta>;
+  };
+  finish_reason?: string | null;
+}
+
+interface OpenrouterSseFrame {
+  // OpenAI streaming includes the model id on every chunk. OpenRouter echoes
+  // the served-model id (which differs from the requested one under provider-
+  // fallback). Driver does NOT compare this against the requested model —
+  // legitimate fallbacks would otherwise look like substitution.
+  model?: string;
+  choices?: ReadonlyArray<OpenrouterSseChoice>;
+  // OpenAI's streaming usage shape. OpenRouter populates `cost` (USD) and
+  // `is_byok` in addition to the standard token counts; the runner reads
+  // `cost` to write `audit.cost_usd`.
+  usage?: {
+    prompt_tokens?: number;
+    completion_tokens?: number;
+    cost?: number;
+    is_byok?: boolean;
+  };
+  // OpenRouter inlines errors inside the SSE stream rather than via HTTP
+  // status when the upstream provider returns mid-stream. Surface to the
+  // consumer as a normalized `error` event.
+  error?: { message?: string } | string;
+}
+
+interface ToolCallAccumulator {
+  id?: string;
+  name: string;
+  argsBuffer: string;
+}
+
+function openrouterSerializeMessage(m: EngineChatMessage): Record<string, unknown> {
+  // Identical to LMStudio's serializer — OpenRouter is OpenAI-compatible at
+  // the message-shape level. Duplicated rather than shared because the
+  // shapes can diverge per-backend in the future (e.g. provider-specific
+  // message metadata) and a shared helper would invite premature unification.
+  const out: Record<string, unknown> = { role: m.role, content: m.content };
+  if (m.tool_calls) {
+    out.tool_calls = m.tool_calls.map((tc, idx) => ({
+      id: tc.id ?? `call_${idx}`,
+      type: "function",
+      function: {
+        name: tc.function.name,
+        arguments:
+          typeof tc.function.arguments === "string"
+            ? tc.function.arguments
+            : JSON.stringify(tc.function.arguments ?? {}),
+      },
+    }));
+  }
+  if (m.tool_call_id) out.tool_call_id = m.tool_call_id;
+  return out;
+}
+
+export function createOpenrouterDriver(opts: OpenrouterDriverOpts): EngineDriver {
+  const fetchImpl = opts.fetch ?? globalThis.fetch;
+  const url = opts.url;
+  const apiKey = opts.apiKey;
+  // Attribution headers — recommended by OpenRouter docs; not required, but
+  // omitting them means our usage shows as "anonymous" in OpenRouter's model
+  // leaderboards. Defaults keep dev-mode boots from needing extra env wiring.
+  const referer = opts.referer ?? "https://github.com/cjus/solrac";
+  const title = opts.title ?? "solrac";
+
+  function authHeaders(): Record<string, string> {
+    return {
+      authorization: `Bearer ${apiKey}`,
+      "http-referer": referer,
+      "x-title": title,
+    };
+  }
+
+  return {
+    backend: "openrouter",
+    mode: "remote",
+
+    async probe(model, signal): Promise<EngineProbeResult> {
+      let res: Response;
+      try {
+        res = await fetchImpl(`${url}/models`, {
+          signal,
+          headers: authHeaders(),
+        });
+      } catch (err) {
+        return { ok: false, reason: `unreachable: ${(err as Error).message}` };
+      }
+      if (!res.ok) {
+        // 401 here usually means a bad API key; surface as `auth_failed` text
+        // so the operator's boot log makes the fix obvious.
+        if (res.status === 401) {
+          return { ok: false, reason: `auth_failed: invalid REMOTE_API_KEY (HTTP 401)` };
+        }
+        return { ok: false, reason: `probe HTTP ${res.status}` };
+      }
+      const body = (await res.json().catch(() => null)) as
+        | { data?: ReadonlyArray<{ id?: string }> }
+        | null;
+      const models = body?.data ?? [];
+      const found = models.some((m) => m?.id === model);
+      if (!found) {
+        return {
+          ok: false,
+          modelMissing: true,
+          reason: `model ${model} not listed on OpenRouter — check the slug at https://openrouter.ai/models`,
+        };
+      }
+      return { ok: true };
+    },
+
+    async *streamChat(opts): AsyncIterable<EngineChatEvent> {
+      const requestBody: Record<string, unknown> = {
+        model: opts.model,
+        messages: opts.messages.map(openrouterSerializeMessage),
+        stream: true,
+      };
+      if (opts.tools && opts.tools.length > 0) {
+        requestBody.tools = opts.tools;
+      }
+
+      let res: Response;
+      try {
+        res = await fetchImpl(`${url}/chat/completions`, {
+          method: "POST",
+          headers: {
+            "content-type": "application/json",
+            ...authHeaders(),
+          },
+          body: JSON.stringify(requestBody),
+          signal: opts.signal,
+        });
+      } catch (err) {
+        const e = err as Error;
+        if (e.name === "AbortError") {
+          throw new EngineDriverError("openrouter", "timeout", "request aborted");
+        }
+        throw new EngineDriverError("openrouter", "unreachable", `unreachable: ${url}`);
+      }
+
+      if (!res.ok) {
+        const bodyText = await res.text().catch(() => "");
+        let parsed: { error?: { message?: string; code?: number } | string } = {};
+        try {
+          parsed = JSON.parse(bodyText) as {
+            error?: { message?: string; code?: number } | string;
+          };
+        } catch {
+          // not JSON
+        }
+        const errObj = parsed.error;
+        const errMsg =
+          typeof errObj === "string"
+            ? errObj
+            : (errObj?.message ?? (bodyText.slice(0, 200) || res.statusText));
+        if (res.status === 401) {
+          throw new EngineDriverError(
+            "openrouter",
+            "http_error",
+            `auth failed (HTTP 401): ${errMsg} — check REMOTE_API_KEY`,
+            401,
+          );
+        }
+        // OpenRouter returns 404 (or an error whose message says the model is
+        // not found / does not exist / is not available) for an unknown slug.
+        // Surface as `model_missing` so the boot probe + per-turn render share the same diagnostic.
+        if (
+          res.status === 404 ||
+          (typeof errMsg === "string" && /model.*not.*(found|exist|available)/i.test(errMsg))
+        ) {
+          throw new EngineDriverError(
+            "openrouter",
+            "model_missing",
+            `model not available on OpenRouter: ${opts.model} — ${errMsg}`,
+            res.status,
+          );
+        }
+        throw new EngineDriverError(
+          "openrouter",
+          "http_error",
+          `HTTP ${res.status} ${errMsg}`,
+          res.status,
+        );
+      }
+      if (!res.body) {
+        throw new EngineDriverError("openrouter", "http_error", "empty body");
+      }
+
+      const reader = res.body.getReader();
+      const decoder = new TextDecoder();
+      let buffer = "";
+      let inputTokens: number | null = null;
+      let outputTokens: number | null = null;
+      // OpenRouter populates `usage.cost` (USD) in the trailing usage chunk,
+      // independently of token counts. `null` here means the field never
+      // arrived; the runner treats `null` as "cost unknown" (NOT zero) and
+      // writes `audit.cost_usd = null` plus a `remote.cost_missing` warn so
+      // operators see the issue. Either value adds nothing to the cap query's
+      // `COALESCE(SUM(cost_usd), 0)` (SUM skips NULLs), but a `0` would hide the gap.
+      let costUsd: number | null = null;
+      const toolAccum = new Map<number, ToolCallAccumulator>();
+      const emittedDedup = new Set<string>();
+      const rawFrameBuffer: string[] = [];
+      let textCharsEmitted = 0;
+      let toolCallsEmitted = 0;
+
+      function emitAccumulated(): EngineChatEvent[] {
+        const events: EngineChatEvent[] = [];
+        const indices = [...toolAccum.keys()].sort((a, b) => a - b);
+        for (const i of indices) {
+          const acc = toolAccum.get(i);
+          if (!acc) continue;
+          let parsedArgs: unknown;
+          try {
+            parsedArgs = acc.argsBuffer === "" ? {} : JSON.parse(acc.argsBuffer);
+          } catch {
+            parsedArgs = acc.argsBuffer;
+          }
+          const dedupKey = stableStringify({ name: acc.name, args: parsedArgs });
+          if (emittedDedup.has(dedupKey)) {
+            log.info("openrouter.tool_call_deduped", { name: acc.name });
+            continue;
+          }
+          emittedDedup.add(dedupKey);
+          events.push({
+            kind: "tool_call",
+            call: {
+              id: acc.id,
+              function: { name: acc.name, arguments: parsedArgs },
+            },
+          });
+        }
+        toolAccum.clear();
+        return events;
+      }
+
+      try {
+        while (true) {
+          const { done, value } = await reader.read();
+          if (done) break;
+          buffer += decoder.decode(value, { stream: true });
+          buffer = buffer.replace(/\r\n/g, "\n");
+          let evtEnd: number;
+          while ((evtEnd = buffer.indexOf("\n\n")) !== -1) {
+            const rawEvent = buffer.slice(0, evtEnd);
+            buffer = buffer.slice(evtEnd + 2);
+            const dataLines: string[] = [];
+            for (const line of rawEvent.split("\n")) {
+              if (line.startsWith("data:")) {
+                dataLines.push(line.slice(5).replace(/^ /, ""));
+              }
+            }
+            if (dataLines.length === 0) continue;
+            const data = dataLines.join("\n");
+            // OpenRouter sometimes emits keep-alive SSE comments (`: ...`) on
+            // long-running streams. Those have no `data:` line; the dataLines
+            // filter above already drops them. Belt-and-suspenders: skip empty
+            // payloads.
+            if (data === "") continue;
+            if (rawFrameBuffer.length < RAW_FRAME_BUFFER_MAX) {
+              rawFrameBuffer.push(data.slice(0, RAW_FRAME_TRUNC));
+            }
+            if (data === "[DONE]") {
+              const pending = emitAccumulated();
+              toolCallsEmitted += pending.length;
+              for (const evt of pending) yield evt;
+              maybeLogEmptyStream({
+                model: opts.model,
+                textCharsEmitted,
+                toolCallsEmitted,
+                rawFrameBuffer,
+                logEvent: "openrouter.empty_stream",
+              });
+              yield { kind: "done", inputTokens, outputTokens, costUsd };
+              return;
+            }
+            let frame: OpenrouterSseFrame;
+            try {
+              frame = JSON.parse(data) as OpenrouterSseFrame;
+            } catch (parseErr) {
+              log.warn("openrouter.bad_frame", {
+                error: (parseErr as Error).message,
+                data: data.slice(0, 120),
+              });
+              continue;
+            }
+            if (frame.error !== undefined) {
+              const msg =
+                typeof frame.error === "string"
+                  ? frame.error
+                  : (frame.error.message ?? "openrouter returned an error frame");
+              yield { kind: "error", message: msg };
+              return;
+            }
+            // Capture usage (tokens + cost) from any chunk that carries it.
+            // OpenRouter sends it on a dedicated trailing chunk AFTER the
+            // finish_reason chunk; the parser tolerates either position.
+            if (frame.usage) {
+              if (typeof frame.usage.prompt_tokens === "number") {
+                inputTokens = frame.usage.prompt_tokens;
+              }
+              if (typeof frame.usage.completion_tokens === "number") {
+                outputTokens = frame.usage.completion_tokens;
+              }
+              if (typeof frame.usage.cost === "number") {
+                costUsd = frame.usage.cost;
+              }
+            }
+            const choices = frame.choices;
+            if (!Array.isArray(choices) || choices.length === 0) continue;
+            const choice = choices[0]!;
+            const delta = choice.delta;
+            if (delta) {
+              if (typeof delta.content === "string" && delta.content.length > 0) {
+                textCharsEmitted += delta.content.length;
+                yield { kind: "text", delta: delta.content };
+              }
+              if (Array.isArray(delta.tool_calls)) {
+                for (const tc of delta.tool_calls) {
+                  const idx = typeof tc.index === "number" ? tc.index : 0;
+                  let acc = toolAccum.get(idx);
+                  if (!acc) {
+                    acc = { name: "", argsBuffer: "" };
+                    toolAccum.set(idx, acc);
+                  }
+                  if (typeof tc.id === "string") acc.id = tc.id;
+                  if (tc.function?.name) acc.name = tc.function.name;
+                  if (typeof tc.function?.arguments === "string") {
+                    acc.argsBuffer += tc.function.arguments;
+                  }
+                }
+              }
+            }
+            if (choice.finish_reason) {
+              const pending = emitAccumulated();
+              toolCallsEmitted += pending.length;
+              for (const evt of pending) yield evt;
+            }
+          }
+        }
+      } catch (err) {
+        if (err instanceof EngineDriverError) throw err;
+        const e = err as Error;
+        if (e.name === "AbortError") {
+          throw new EngineDriverError("openrouter", "timeout", "stream aborted");
+        }
+        throw new EngineDriverError(
+          "openrouter",
+          "unreachable",
+          `stream failed: ${e.message}`,
+        );
+      }
+
+      // Stream ended without a `[DONE]` line. Flush pending tool calls and
+      // emit done with whatever usage we captured.
+      const pending = emitAccumulated();
+      toolCallsEmitted += pending.length;
+      for (const evt of pending) yield evt;
+      maybeLogEmptyStream({
+        model: opts.model,
+        textCharsEmitted,
+        toolCallsEmitted,
+        rawFrameBuffer,
+        logEvent: "openrouter.empty_stream",
+      });
+      yield { kind: "done", inputTokens, outputTokens, costUsd };
+    },
+  };
+}
+
+/**
+ * Pick the driver implementation for the configured remote backend. Mirrors
+ * `createLocalDriver` in `local-driver.ts` so boot wiring in `main.ts` stays
+ * symmetric across modes.
+ *
+ * Today only `openrouter` is supported. Future remote providers (Together,
+ * Groq, Fireworks, …) add new factories above and a new branch here.
+ */
+export function createRemoteDriver(
+  backend: EngineBackend,
+  opts: OpenrouterDriverOpts,
+): EngineDriver {
+  if (backend === "openrouter") return createOpenrouterDriver(opts);
+  throw new Error(
+    `createRemoteDriver: unsupported backend "${backend}" (expected "openrouter")`,
+  );
+}
+
+// ---------------------------------------------------------------------------
+// Capability note (remote-mode system-prompt clause)
+// ---------------------------------------------------------------------------
+
+/**
+ * Remote-mode capability statement appended to SOUL.md before it ships as
+ * the first `system` message. The tools-on and default-engine variants
+ * frame the cost as per-token via OpenRouter to encourage concise replies (the
+ * per-chat + global hourly cost caps gate burn; this is the operator-friendly nudge
+ * in the prompt itself). Local-mode (Ollama / LMStudio) has a parallel
+ * `buildLocalCapabilityNote` in `local-driver.ts`. The runner in `engine.ts`
+ * picks the right builder from `driver.mode`.
+ *
+ * Matrix mirrors `LocalCapabilityNoteOpts`; the only diff is the cost
+ * clause. Kept as two separate builders (rather than one with a `mode`
+ * parameter) so each builder reads as a coherent story for its mode.
+ */
+export interface RemoteCapabilityNoteOpts {
+  toolsEnabled: boolean;
+  isDefaultEngine: boolean;
+  toolNames: ReadonlyArray<string>;
+}
+
+export function buildRemoteCapabilityNote(opts: RemoteCapabilityNoteOpts): string {
+  const { toolsEnabled, isDefaultEngine, toolNames } = opts;
+  const costClause = "your replies cost the operator per-token via OpenRouter, so be concise";
+  if (toolsEnabled) {
+    const list = toolNames.join(", ");
+    return (
+      `You are the default chat engine; ${costClause}. ` +
+      `You have these tools available: ${list}. ` +
+      "Call them when the user's request needs information or actions you " +
+      "can't deliver from your training alone (current data, external APIs, " +
+      "operator-authored integrations). Tool results return into your " +
+      "context — never tell the user 'I cannot do that' if a listed tool can. " +
+      "If a request is too complex for these tools or for your own reasoning, " +
+      "suggest the user re-send with `@` (Sonnet) or `!` (Opus) for heavier reasoning."
+    );
+  }
+  if (isDefaultEngine) {
+    return (
+      `You are the default chat engine; ${costClause}. ` +
+      "You do not have tools — answer from what you know. " +
+      "If the user asks for something that needs tools (file edits, API calls, " +
+      "web fetches), tell them to re-send the message prefixed with `@` (Sonnet) " +
+      "or `!` (Opus) to escalate to a Claude tier."
+    );
+  }
+  return (
+    "You do not have tools; answer from what you know. " +
+    "If the user asks for something that needs tools (file edits, API calls, " +
+    "web fetches), tell them to re-send the message prefixed with `@` (Sonnet) " +
+    "or `!` (Opus)."
+  );
+}
+
+/**
+ * Convenience for the tools-on path. Defers to `buildRemoteCapabilityNote`
+ * so the matrix has a single source of truth.
+ */
+export function buildRemoteToolCapabilityNote(
+  toolNames: ReadonlyArray<string>,
+  isDefaultEngine: boolean,
+): string {
+  return buildRemoteCapabilityNote({
+    toolsEnabled: true,
+    isDefaultEngine,
+    toolNames,
+  });
+}
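The driver applies a small set of usage-capture rules: the last numeric value wins, frames after `[DONE]` are ignored, and an absent `cost` stays `null`, never `0`. These rules can be isolated into a pure function for experimentation. The sketch below is a hypothetical, simplified reconstruction. `captureUsage` and `capSum` are illustrative names, the function parses a complete SSE body rather than a chunked stream, and `capSum` models SQL `COALESCE(SUM(cost_usd), 0)`, where `SUM` skips NULL rows.

```typescript
// Hypothetical sketch of the driver's usage-capture rules, applied to a
// complete SSE body instead of a streamed one.

interface UsageSnapshot {
  inputTokens: number | null;
  outputTokens: number | null;
  costUsd: number | null;
}

interface UsageFrame {
  usage?: { prompt_tokens?: unknown; completion_tokens?: unknown; cost?: unknown };
}

function captureUsage(sseBody: string): UsageSnapshot {
  const snap: UsageSnapshot = { inputTokens: null, outputTokens: null, costUsd: null };
  // Normalize CRLF, then split into SSE events on blank lines.
  for (const rawEvent of sseBody.replace(/\r\n/g, "\n").split("\n\n")) {
    // Join the event's `data:` lines; comment-only events (": ...") yield "".
    const data = rawEvent
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, ""))
      .join("\n");
    if (data === "") continue;
    if (data === "[DONE]") break; // terminator: anything after it is ignored
    let frame: UsageFrame | null;
    try {
      frame = JSON.parse(data) as UsageFrame | null;
    } catch {
      continue; // malformed frame: skipped (the driver also logs it)
    }
    const u = frame?.usage;
    if (!u) continue;
    if (typeof u.prompt_tokens === "number") snap.inputTokens = u.prompt_tokens;
    if (typeof u.completion_tokens === "number") snap.outputTokens = u.completion_tokens;
    // Only a numeric `cost` replaces null; a missing field never becomes 0.
    if (typeof u.cost === "number") snap.costUsd = u.cost;
  }
  return snap;
}

// Models SQL `COALESCE(SUM(cost_usd), 0)`: SUM ignores NULL rows and returns
// NULL when no non-NULL rows remain, which COALESCE maps to 0.
function capSum(costs: ReadonlyArray<number | null>): number {
  const known = costs.filter((c): c is number => c !== null);
  return known.length === 0 ? 0 : known.reduce((acc, c) => acc + c, 0);
}
```

Because `SUM` skips NULLs, a `null` turn and a `0` turn add the same amount to the hourly cap. The difference is that `null` stays distinguishable in `audit.cost_usd` and pairs with the `remote.cost_missing` warn.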
diff --git a/src/skill-tools.test.ts b/src/skill-tools.test.ts
index d223a0e..6cd2eff 100644
--- a/src/skill-tools.test.ts
+++ b/src/skill-tools.test.ts
@@ -2,22 +2,23 @@
  * @fileoverview Unit tests for skill-as-tool dispatcher.
  * @proves Tool definition shape, naming, registry filtering, audit-row tag.
  *
- * Wire-format edge cases live in `local-driver.test.ts`. Tool-loop logic
- * lives in `local-tools.test.ts`. This file scopes to skill-tools shape
- * + filtering invariants that survive the driver abstraction.
+ * Wire-format edge cases live in `local-driver.test.ts` /
+ * `remote-driver.test.ts`. Tool-loop logic lives in `engine-tools.test.ts`.
+ * This file scopes to skill-tools shape + filtering invariants that survive
+ * the driver abstraction.
  */
 
 import { afterEach, beforeEach, describe, expect, test } from "bun:test";
 import { mkdtempSync, rmSync } from "node:fs";
 import { tmpdir } from "node:os";
 import { join } from "node:path";
-import type { LocalSkillDeps } from "./commands.ts";
+import type { EngineSkillDeps } from "./commands.ts";
 import { openDb, type SolracDb } from "./db.ts";
 import {
-  type LocalChatEvent,
-  type LocalDriver,
-  type LocalStreamChatOpts,
-} from "./local-driver.ts";
+  type EngineChatEvent,
+  type EngineDriver,
+  type EngineStreamChatOpts,
+} from "./engine-driver.ts";
 import {
   buildSkillErrorPayload,
   buildSkillTools,
@@ -78,19 +79,20 @@ function makeRegistry(skills: ReadonlyArray): SkillRegistry {
   }) as unknown as SkillRegistry;
 }
 
-function noopDriver(): LocalDriver {
+function noopDriver(): EngineDriver {
   return {
     backend: "ollama",
+    mode: "local",
     async probe() {
       return { ok: true };
     },
-    async *streamChat(_opts: LocalStreamChatOpts): AsyncIterable<LocalChatEvent> {
-      yield { kind: "done", inputTokens: null, outputTokens: null };
+    async *streamChat(_opts: EngineStreamChatOpts): AsyncIterable<EngineChatEvent> {
+      yield { kind: "done", inputTokens: null, outputTokens: null, costUsd: null };
     },
   };
 }
 
-function makeDeps(): LocalSkillDeps {
+function makeDeps(): EngineSkillDeps {
   return {
     driver: noopDriver(),
     model: "test-m",
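The test double now has to satisfy a wider driver contract: a `mode` field alongside `backend`, and a `costUsd` field on the terminal `done` event. The sketch below is a self-contained approximation built only from the fields visible in this diff. `DriverSketch`, `TextEvent`, `noopDriverSketch`, and `lastDone` are hypothetical, and the real `EngineDriver` in `src/engine-driver.ts` may declare more members.

```typescript
// Assumed minimal shape, inferred only from the fields this diff touches;
// the real EngineDriver / EngineChatEvent types may carry more members.
type EngineBackend = "ollama" | "lmstudio" | "openrouter";

type DoneEvent = {
  kind: "done";
  inputTokens: number | null;
  outputTokens: number | null;
  costUsd: number | null; // new in this change; null when no cost is reported
};
type TextEvent = { kind: "text"; text: string }; // hypothetical streaming chunk
type ChatEvent = TextEvent | DoneEvent;

interface DriverSketch {
  readonly backend: EngineBackend;
  readonly mode: "local" | "remote"; // new discriminator next to backend
  probe(): Promise<{ ok: boolean }>;
  streamChat(opts: { model: string }): AsyncIterable<ChatEvent>;
}

// Test double in the spirit of noopDriver(): emits a single terminal event.
function noopDriverSketch(): DriverSketch {
  return {
    backend: "ollama",
    mode: "local",
    async probe() {
      return { ok: true };
    },
    async *streamChat(_opts: { model: string }): AsyncIterable<ChatEvent> {
      yield { kind: "done", inputTokens: null, outputTokens: null, costUsd: null };
    },
  };
}

// Drains a stream and returns the last "done" event, which carries the
// token counts and cost a caller would record for the turn.
async function lastDone(driver: DriverSketch): Promise<DoneEvent | null> {
  let done: DoneEvent | null = null;
  for await (const ev of driver.streamChat({ model: "test-m" })) {
    if (ev.kind === "done") done = ev;
  }
  return done;
}
```

Because cost travels on the same `done` event as the token counts, a caller can read all three in one place. This sketch does not show how a null cost becomes the audit value; the smoke test attributes that step to `resolveAuditCost`.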
diff --git a/src/skill-tools.ts b/src/skill-tools.ts
index 008d1fb..3546d1a 100644
--- a/src/skill-tools.ts
+++ b/src/skill-tools.ts
@@ -43,7 +43,7 @@
  *
  * Cross-references:
  *   - src/commands.ts::runSkillBare — pure execution helper, recursion-safe
- *   - src/local.ts::runLocalTurnWithTools — wraps loop in skillToolCtx.run
+ *   - src/engine.ts::runEngineTurnWithTools — wraps loop in skillToolCtx.run
  *   - docs/USAGE.md#skills-as-tools — operator-facing docs
  */
 
@@ -53,7 +53,7 @@ import type { SdkMcpToolDefinition } from "@anthropic-ai/claude-agent-sdk";
 import { z } from "zod";
 import {
   runSkillBare,
-  type LocalSkillDeps,
+  type EngineSkillDeps,
 } from "./commands.ts";
 import type { SolracDb } from "./db.ts";
 import { log } from "./log.ts";
@@ -84,7 +84,7 @@ export const skillToolCtx = new AsyncLocalStorage();
 // The leading `skills` segment is the synthetic-integration namespace; the
 // trailing `` matches the operator's `name:` frontmatter. The
 // `mcp__solrac__` prefix the policy layer expects is added by
-// `local-tools.ts::executeToolCall` when reconstructing the full name.
+// `engine-tools.ts::executeToolCall` when reconstructing the full name.
 export const SKILL_TOOL_PREFIX = "skills__";
 
 export function skillToolName(skillName: string): string {
@@ -139,7 +139,7 @@ export interface BuildSkillToolsDeps {
   // skills (the tool-eligible filter below catches the contradiction). When
   // null and the registry contains tool-eligible skills, we log + return
   // empty rather than crash.
-  readonly localSkillDeps: LocalSkillDeps | null;
+  readonly localSkillDeps: EngineSkillDeps | null;
 }
 
 /**
@@ -179,7 +179,7 @@ export function buildSkillTools(
 function buildOneSkillTool(
   skill: Skill,
   db: SolracDb,
-  local: LocalSkillDeps,
+  local: EngineSkillDeps,
 ): SdkMcpToolDefinition {
   // The `args` schema mirrors the only template variable supported by skill
   // bodies (`{{args}}`). We expose it as a single string parameter rather
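The naming comments in this file describe a two-stage tool name. The engine sees `skills__<name>`, and `engine-tools.ts::executeToolCall` re-adds the `mcp__solrac__` prefix before the policy check. Below is a minimal sketch of that round trip, assuming plain string concatenation (this diff does not show the body of `skillToolName`). `MCP_SERVER_PREFIX` and the `*Sketch` helpers are hypothetical.

```typescript
// Sketch of the naming scheme described in the comments above.
// SKILL_TOOL_PREFIX and the "mcp__solrac__" prefix come from the diff; the
// function bodies are assumptions, not the real skill-tools.ts code.
const SKILL_TOOL_PREFIX = "skills__";
const MCP_SERVER_PREFIX = "mcp__solrac__"; // constant name is hypothetical

// Bare name handed to the engine: "skills__<name>" (assumed concatenation).
function skillToolNameSketch(skillName: string): string {
  return `${SKILL_TOOL_PREFIX}${skillName}`;
}

// Full name the policy layer sees, matching how executeToolCall is described
// as rebuilding it: "mcp__solrac__skills__<name>".
function fullToolNameSketch(bareName: string): string {
  return `${MCP_SERVER_PREFIX}${bareName}`;
}

// Inverse mapping: recover the skill name from a full tool name, or null
// when the tool belongs to some other integration.
function skillNameFromFullSketch(fullName: string): string | null {
  const prefix = MCP_SERVER_PREFIX + SKILL_TOOL_PREFIX;
  return fullName.startsWith(prefix) ? fullName.slice(prefix.length) : null;
}
```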
diff --git a/test/smokes/local.ts b/test/smokes/local.ts
index b5b0bb6..cf6597f 100644
--- a/test/smokes/local.ts
+++ b/test/smokes/local.ts
@@ -1,9 +1,9 @@
-// Live smoke: drives runLocalTurn against a real local backend (Ollama or
-// LMStudio) with a stub Telegram client. Proves the whole pipeline end-to-end
-// without touching the live bot or .env:
+// Live smoke: drives runEngineTurn against a real engine-slot backend (Ollama,
+// LMStudio, or OpenRouter) with a stub Telegram client. Proves the whole
+// pipeline end-to-end without touching the live bot or .env:
 //   1. Driver event-stream parsing matches what the real backend emits.
-//   2. Audit row finalizes correctly (model='local:<backend>:<model>',
-//      cost_usd=0, tokens populated).
+//   2. Audit row finalizes correctly (model='<mode>:<backend>:<model>',
+//      cost_usd is 0 for local and > 0 for remote, tokens populated).
 //   3. History reconstruction (turn 2 sees turn 1's prompt+response).
 //   4. Telegram render path (stub-then-edit) produces sensible final text.
 //   5. (tools-on) A time_now tool call round-trips via runToolLoop; audit
@@ -15,16 +15,17 @@
 //   LOCAL_BACKEND=ollama LOCAL_MODEL=gemma4:e4b npm run smoke:local
 //   LOCAL_BACKEND=lmstudio LOCAL_MODEL=qwen2.5-7b npm run smoke:local
 //
+// OpenRouter (remote) mode requires REMOTE_API_KEY and is billed per token:
+//   LOCAL_BACKEND=openrouter LOCAL_MODEL=openai/gpt-4o-mini REMOTE_API_KEY=sk-or-… npm run smoke:local
+//
 // To exercise the tools-on path:
 //   LOCAL_TOOLS_ENABLED=true npm run smoke:local
 
 import type { Message } from "@grammyjs/types";
-import { runLocalTurn } from "../../src/local.ts";
-import {
-  createLocalDriver,
-  type LocalBackend,
-  type LocalDriver,
-} from "../../src/local-driver.ts";
+import { runEngineTurn } from "../../src/engine.ts";
+import type { EngineBackend, EngineDriver } from "../../src/engine-driver.ts";
+import { createLocalDriver } from "../../src/local-driver.ts";
+import { createRemoteDriver } from "../../src/remote-driver.ts";
 import type { ConfirmationBroker } from "../../src/policy.ts";
 import type { SdkMcpToolDefinition } from "@anthropic-ai/claude-agent-sdk";
 import timeIntegration from "../../src/integrations-builtin/time/index.ts";
@@ -33,20 +34,50 @@ import type { TelegramClient } from "../../src/telegram.ts";
 import { openTestDb, reportAndExit, type Phase } from "./harness.ts";
 
 const BACKEND = parseBackend(process.env.LOCAL_BACKEND);
+// Remote mode kicks in only when LOCAL_BACKEND=openrouter. The same env knob
+// covers all three backends to keep the smoke surface small.
+const MODE: "local" | "remote" = BACKEND === "openrouter" ? "remote" : "local";
+const REMOTE_API_KEY = process.env.REMOTE_API_KEY ?? "";
 const URL =
   process.env.LOCAL_URL ??
-  (BACKEND === "lmstudio" ? "http://localhost:1234" : "http://localhost:11434");
+  (BACKEND === "openrouter"
+    ? "https://openrouter.ai/api/v1"
+    : BACKEND === "lmstudio"
+      ? "http://localhost:1234"
+      : "http://localhost:11434");
 const MODEL =
-  process.env.LOCAL_MODEL ?? (BACKEND === "lmstudio" ? "qwen2.5-7b" : "gemma4:e4b");
+  process.env.LOCAL_MODEL ??
+  (BACKEND === "openrouter"
+    ? "openai/gpt-4o-mini"
+    : BACKEND === "lmstudio"
+      ? "qwen2.5-7b"
+      : "gemma4:e4b");
 
-function parseBackend(raw: string | undefined): LocalBackend {
+function parseBackend(raw: string | undefined): EngineBackend {
   if (raw === "lmstudio") return "lmstudio";
+  if (raw === "openrouter") return "openrouter";
   return "ollama";
 }
 
 // Backend-specific hint substring expected in error messages for the
 // bad-model phase. Each backend has its own "model missing" copy.
-const PULL_HINT = BACKEND === "lmstudio" ? "lms load" : "ollama pull";
+const PULL_HINT =
+  BACKEND === "openrouter"
+    ? "openrouter"
+    : BACKEND === "lmstudio"
+      ? "lms load"
+      : "ollama pull";
+
+// Hard-skip OpenRouter without a key — the smoke makes real billed API
+// calls. Print a clear message + exit 0 so CI / sweep runs don't break.
+if (BACKEND === "openrouter" && !REMOTE_API_KEY) {
+  // eslint-disable-next-line no-console
+  console.log(
+    "local-smoke: BACKEND=openrouter requested but REMOTE_API_KEY is not set. " +
+      "Skipping (this smoke makes real billed API calls; set REMOTE_API_KEY to run).",
+  );
+  process.exit(0);
+}
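The smoke derives the mode, URL, default model, and pull hint from `LOCAL_BACKEND` through four parallel ternaries. One possible alternative, sketched here and not taken from the repo, collects the same defaults in a single lookup table so that adding a backend touches one place. It also isolates the turn-1 cost check as a predicate. The default values are copied from the diff. `SMOKE_DEFAULTS`, `parseBackendSketch`, `expectedModelTagSketch`, and `costPassesSketch` are hypothetical names.

```typescript
// Hypothetical table-driven version of the smoke's backend defaults.
type EngineBackend = "ollama" | "lmstudio" | "openrouter";
type Mode = "local" | "remote";

interface BackendDefaults {
  readonly mode: Mode;
  readonly url: string;
  readonly model: string;
  readonly pullHint: string; // substring expected in the bad-model error
}

// Values copied from the ternaries in test/smokes/local.ts.
const SMOKE_DEFAULTS: Record<EngineBackend, BackendDefaults> = {
  ollama: { mode: "local", url: "http://localhost:11434", model: "gemma4:e4b", pullHint: "ollama pull" },
  lmstudio: { mode: "local", url: "http://localhost:1234", model: "qwen2.5-7b", pullHint: "lms load" },
  openrouter: { mode: "remote", url: "https://openrouter.ai/api/v1", model: "openai/gpt-4o-mini", pullHint: "openrouter" },
};

// Unknown or missing values fall back to ollama, as in parseBackend().
function parseBackendSketch(raw: string | undefined): EngineBackend {
  if (raw === "lmstudio" || raw === "openrouter") return raw;
  return "ollama";
}

// Audit model tag: "<mode>:<backend>:<model>".
function expectedModelTagSketch(backend: EngineBackend, model: string): string {
  return `${SMOKE_DEFAULTS[backend].mode}:${backend}:${model}`;
}

// Turn-1 cost check: local must be exactly 0, remote a positive number.
function costPassesSketch(mode: Mode, costUsd: number | null): boolean {
  return mode === "remote" ? typeof costUsd === "number" && costUsd > 0 : costUsd === 0;
}
```

Typing the table as `Record<EngineBackend, BackendDefaults>` makes the compiler reject the object literal if a backend added to the union has no entry, a check the nested ternaries cannot provide.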
 
 interface CapturedTg extends TelegramClient {
   sent: { chatId: number; text: string }[];
@@ -82,7 +113,10 @@ async function main(): Promise {
 
   const { db } = await openTestDb("local-smoke");
   const tg = makeCapturedTg();
-  const driver: LocalDriver = createLocalDriver(BACKEND, { url: URL });
+  const driver: EngineDriver =
+    BACKEND === "openrouter"
+      ? createRemoteDriver(BACKEND, { url: URL, apiKey: REMOTE_API_KEY })
+      : createLocalDriver(BACKEND, { url: URL });
   const deps = {
     tg,
     db,
@@ -99,11 +133,11 @@ async function main(): Promise {
   const phases: Phase[] = [];
   const CHAT = 8001;
   const FROM = 9001;
-  const EXPECTED_MODEL_TAG = `local:${BACKEND}:${MODEL}`;
+  const EXPECTED_MODEL_TAG = `${MODE}:${BACKEND}:${MODEL}`;
 
   // ── Turn 1: cold call. No history, just system + user. ────────────────────
   const t0 = Date.now();
-  await runLocalTurn(deps, {
+  await runEngineTurn(deps, {
     chatId: CHAT,
     fromId: FROM,
     updateId: 1001,
@@ -135,16 +169,26 @@ async function main(): Promise {
     pass: row1.status === "ok",
   });
   phases.push({
-    name: "turn 1: model column tagged local:<backend>:<model>",
+    name: `turn 1: model column tagged ${MODE}:<backend>:<model>`,
     expected: `model=${EXPECTED_MODEL_TAG}`,
     actual: `model=${row1.model}`,
     pass: row1.model === EXPECTED_MODEL_TAG,
   });
+  // Cost expectation forks on mode:
+  //   local  → exactly 0 (on-host backends are free).
+  //   remote → > 0 (OpenRouter bills per turn; the streaming usage.cost
+  //            field flowed through resolveAuditCost into audit.cost_usd).
   phases.push({
-    name: "turn 1: cost_usd is 0 (local engine is free)",
-    expected: "cost_usd=0",
+    name:
+      MODE === "remote"
+        ? "turn 1: cost_usd > 0 (remote mode captures OpenRouter cost)"
+        : "turn 1: cost_usd is 0 (local engine is free)",
+    expected: MODE === "remote" ? "cost_usd > 0" : "cost_usd=0",
     actual: `cost_usd=${row1.cost_usd}`,
-    pass: row1.cost_usd === 0,
+    pass:
+      MODE === "remote"
+        ? typeof row1.cost_usd === "number" && row1.cost_usd > 0
+        : row1.cost_usd === 0,
   });
   phases.push({
     name: "turn 1: agent_session_id null (local engine is sessionless)",
@@ -202,7 +246,7 @@ async function main(): Promise {
 
   // ── Turn 2: follow-up. recentChatTurns now returns turn 1. ────────────────
   const tg2 = makeCapturedTg();
-  await runLocalTurn(
+  await runEngineTurn(
     { ...deps, tg: tg2 },
     {
       chatId: CHAT,
@@ -221,7 +265,7 @@ async function main(): Promise {
     pass: row2.status === "ok",
   });
   phases.push({
-    name: "turn 2: model column tagged local:<backend>:<model>",
+    name: `turn 2: model column tagged ${MODE}:<backend>:<model>`,
     expected: `model=${EXPECTED_MODEL_TAG}`,
     actual: `model=${row2.model}`,
     pass: row2.model === EXPECTED_MODEL_TAG,
@@ -240,7 +284,7 @@ async function main(): Promise {
 
   // ── Error path: bad model name → audit row status='error' with hint ───────
   const tg3 = makeCapturedTg();
-  await runLocalTurn(
+  await runEngineTurn(
     { ...deps, tg: tg3, model: "definitely-not-a-real-model" },
     {
       chatId: CHAT,
@@ -291,7 +335,7 @@ async function main(): Promise {
     };
 
     const tg4 = makeCapturedTg();
-    await runLocalTurn(
+    await runEngineTurn(
       {
         ...deps,
         tg: tg4,
@@ -327,7 +371,7 @@ async function main(): Promise {
       pass: row4.status === "ok",
     });
     phases.push({
-      name: "tools-on: model column tagged local:<backend>:<model>",
+      name: `tools-on: model column tagged ${MODE}:<backend>:<model>`,
       expected: `model=${EXPECTED_MODEL_TAG}`,
       actual: `model=${row4.model}`,
       pass: row4.model === EXPECTED_MODEL_TAG,