Background
Joel called out a pattern in our inference layer (2026-04-17 chat session):
"context window limits are defined BY the model as are features such as audio or vision. this is probably why those attempts also failed"
"if you need a var require it / pass the entire struct around / has all the info you need / or grab it"
"make it declarative"
Today we have parallel model-info plumbing:
- system/shared/ModelContextWindows.ts — TS lookup tables (getContextWindow, getInferenceSpeed, isSlowLocalModel, getLatencyAwareTokenLimit)
- workers/continuum-core/src/ai/types.rs::ModelInfo — Rust struct with Option<> on max_output_tokens and cost_per_1k_tokens
- 21 hardcoded ModelInfo {…} constructions across openai_adapter.rs, candle_adapter.rs, anthropic_adapter.rs, embedding.rs — each adapter maintains a static catalog instead of querying its source
- system/core/src/models/mod.rs — yet another parallel Option<u32> max_output_tokens definition
Symptoms this caused (visible on M5 PR #914 verification today):
- ChatRAGBuilder computed totalBudget = floor(contextWindow × 0.75). For Qwen3.5-4b's 262k window that is 196k tokens. RAG actually filled only ~14k per request, yet llama-server allocated the full 262k KV cache per persona slot → com.docker.llama-server at 20.87 GB resident on M5, 44 GB total vs 32 GB physical = swap.
- Vision/audio attempts have failed silently when the hardcoded TS table claims a model supports a capability the actual model doesn't.
- getInferenceSpeed is a TS const — fundamentally can't reflect what's measured at runtime.
Scope
Single coherent refactor, ~25 files, its own branch. Not to be sprinkled into other PRs.
1. ModelMetadata (replaces ModelInfo), all fields required
#[derive(Debug, Clone, Serialize, Deserialize, TS)]
#[ts(export, export_to = "../../../shared/generated/ai/ModelMetadata.ts")]
#[serde(rename_all = "camelCase")]
pub struct ModelMetadata {
pub id: String,
pub name: String,
pub provider: String,
pub capabilities: Vec<ModelCapability>,
pub context_window: u32,
pub max_output_tokens: u32,
pub cost_per_1k_tokens: CostPer1kTokens, // local = {0,0}
pub tokens_per_second: f32,
pub supports_streaming: bool,
pub supports_tools: bool,
}
No Option<>. Local-cost = {0,0} is still a declaration, not an absence.
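The struct leans on two supporting types this doc doesn't pin down. A minimal sketch of what they could look like, assuming ModelCapability stays a flat enum and CostPer1kTokens a two-field struct — the variant and field names here are illustrative, not decided:

#[derive(Debug, Clone, Serialize, Deserialize, TS)]
#[ts(export, export_to = "../../../shared/generated/ai/ModelCapability.ts")]
#[serde(rename_all = "camelCase")]
pub enum ModelCapability {
    Text,
    Vision,
    Audio,
    Embedding,
}

#[derive(Debug, Clone, Serialize, Deserialize, TS)]
#[ts(export, export_to = "../../../shared/generated/ai/CostPer1kTokens.ts")]
#[serde(rename_all = "camelCase")]
pub struct CostPer1kTokens {
    pub input: f32,  // USD per 1k input tokens; 0.0 for local models
    pub output: f32, // USD per 1k output tokens; 0.0 for local models
}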
2. Adapters query their source, not hardcoded vec![ModelInfo {…}]
- DMR: GET http://localhost:12434/engines/v1/models returns the live catalog; docker model inspect <id> exposes GGUF metadata for fields the catalog doesn't. (See the sketch after this list.)
- OpenAI / Anthropic / DeepSeek / etc.: their /v1/models endpoint. Cache at adapter initialize().
- Candle: GGUF metadata directly from the loaded file.
Delete the 21 hardcoded literals.
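A minimal sketch of the DMR side at initialize(), assuming a reqwest-based client (with the json feature) and the OpenAI-compatible list shape ({ "data": [ { "id": … } ] }). The local types and the function name are illustrative; the mapping from catalog entry to ModelMetadata is elided because it depends on what docker model inspect exposes:

use std::collections::HashMap;
use serde::Deserialize;

/// Wire format of GET /engines/v1/models (OpenAI-compatible list response).
#[derive(Debug, Deserialize)]
struct ModelList {
    data: Vec<ModelEntry>,
}

#[derive(Debug, Deserialize)]
struct ModelEntry {
    id: String,
}

/// Fetch the live DMR catalog once at adapter initialize() and cache it,
/// instead of shipping a hardcoded vec![ModelInfo {…}].
async fn fetch_dmr_catalog(
    client: &reqwest::Client,
) -> Result<HashMap<String, ModelEntry>, reqwest::Error> {
    let list: ModelList = client
        .get("http://localhost:12434/engines/v1/models")
        .send()
        .await?
        .error_for_status()?
        .json()
        .await?;

    // Keyed by model id so model_metadata(model_id) is a plain map lookup later.
    Ok(list.data.into_iter().map(|m| (m.id.clone(), m)).collect())
}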
3. AIProviderAdapter::model_metadata(model_id) returns the full struct
fn model_metadata(&self, model_id: &str) -> Option<ModelMetadata>; // None ONLY when not in adapter's live catalog
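At the call site, a None should surface as an explicit error, never a silent fallback to a guessed window. A sketch of that contract — AIError::UnknownModel and the helper name are hypothetical:

/// Hypothetical orchestration-layer helper: resolve metadata up front, fail loudly on a miss.
fn resolve_model(
    adapter: &dyn AIProviderAdapter,
    model_id: &str,
) -> Result<ModelMetadata, AIError> {
    adapter
        .model_metadata(model_id)
        // Not in the adapter's live catalog → explicit error, no default context window.
        .ok_or_else(|| AIError::UnknownModel(model_id.to_string()))
}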
4. Thread ModelMetadata through the chain
- PersonaResponseGenerator receives ModelMetadata at request entry.
- ChatRAGBuilder.buildContext(model: ModelMetadata, …) reads model.context_window, model.tokens_per_second, model.capabilities directly — see the budget sketch after this list.
- Vision attachment, tool injection — gated by model.capabilities and model.supports_tools.
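A sketch of the kind of derivation this threading enables (shown in Rust; the TS side mirrors it through the generated type). The function names, the latency parameter, and the exact formula are illustrative, not decided here:

/// Derive a prompt-token budget for one request from declared model metadata.
/// Illustrative only: the window share and latency target are placeholders.
fn prompt_token_budget(model: &ModelMetadata, max_latency_secs: f32) -> u32 {
    // Upper bound from the declared context window, leaving room for output tokens.
    let window_budget = model.context_window.saturating_sub(model.max_output_tokens);

    // Upper bound from how many tokens this model can process within the
    // latency target, using the declared tokens_per_second.
    let latency_budget = (model.tokens_per_second * max_latency_secs) as u32;

    window_budget.min(latency_budget)
}

/// Capability gating: attach images only when the model declares Vision.
fn supports_vision(model: &ModelMetadata) -> bool {
    model.capabilities.iter().any(|c| matches!(c, ModelCapability::Vision))
}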
5. Delete the lookup-helper layer
- system/shared/ModelContextWindows.ts — fully deletable.
- system/core/src/models/mod.rs — collapse into ai/types.rs.
Acceptance
grep -r "Option<u32>" workers/continuum-core/src/ai/ returns zero hits.
grep -rn "ModelInfo {" workers/continuum-core/src/ only matches ai/types.rs (the definition itself).
system/shared/ModelContextWindows.ts deleted.
ChatRAGBuilder and PersonaResponseGenerator take ModelMetadata; never reconstruct it from loose strings.
- Live test on M5: persona chat sends prompts that respect
model.context_window AND the latency budget derived from model.tokens_per_second. KV cache pressure drops from 20+ GB to single-GB range.
Why separate
Touching 21+ adapter sites + consumer chain + IPC export + TS plumbing has to land atomically. Half of it sprinkled into other PRs leaves the codebase worse than it started.