57 changes: 57 additions & 0 deletions CHANGELOG.md

## v0.7.4 - 2026-05-06

### Chat experience (the headline)

**Phase 1 — UX foundations**
- Syntax highlighting in code blocks, in-thread search, conversation export, real cancel (mid-stream abort), reasoning-effort levels.
- Reasoning panel: collapsible streaming preview, fixed first-paragraph gap.

**Phase 2.0 — perf surface + watchdogs**
- Prompt-processing feedback + TTFT (time-to-first-token) live indicator.
- Prompt-eval timeout, memory gate, runaway guards (token-rate floor, repetition guard), panic + thermal banners, and image/video gates that block starting a generation when VRAM/RAM headroom is unsafe.
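The repetition guard among those runaway checks can be approximated by n-gram counting over the recent token stream. This is a sketch with hypothetical window and threshold values, not the shipped implementation:

```python
from collections import Counter

def repetition_guard(tokens, ngram=4, max_repeats=3, window=200):
    """Return True when the most recent `ngram`-gram has occurred more
    than `max_repeats` times in the recent window -- a sign of a
    runaway generation loop. All thresholds here are illustrative."""
    if len(tokens) < ngram:
        return False
    tail = tuple(tokens[-ngram:])
    recent = tokens[-window:]  # only inspect recent output
    grams = Counter(
        tuple(recent[i:i + ngram]) for i in range(len(recent) - ngram + 1)
    )
    return grams[tail] > max_repeats

# A looping stream trips the guard; varied text does not.
looping = ["I", "am", "sorry", "."] * 10
varied = list("abcdefghijklmnopqrstuvwxyz")
```

A guard like this fires only on exact repeats; the token-rate floor covers the complementary failure mode of a stalled stream.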

**Phase 2.1 — refactor**
- Decomposed monolithic `ChatTab.tsx` into `ChatSidebar` / `ChatHeader` / `ChatThread` / `ChatComposer`.

**Phase 2.2 — sampler control**
- Full sampler exposure: `top_p`, `top_k`, `min_p`, `repeat_penalty`, `seed`, `mirostat`, `reasoning_effort`.
- JSON-schema constrained-output opt-in (`json_schema` field).

**Phase 2.4 / 2.5 — message-tree workflows**
- Conversation branching: fork from any assistant message into a sibling thread.
- In-thread compare: render sibling variants side-by-side under the assistant bubble.

**Phase 2.6 / 2.7 — context & prompts**
- Cross-platform RAG: semantic embedding via `llama-embedding` + cosine retrieval over local docs.
- Prompt presets + variables: a fill-in form before "Use in Chat" so reusable prompts can take inputs.
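The retrieval half of that RAG pipeline reduces to cosine ranking over document embeddings (which `llama-embedding` would produce). A minimal sketch with toy vectors standing in for real embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]

# Toy 2-d "embeddings"; real ones would be hundreds of dimensions.
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
```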

**Phase 2.8 — structured tool output**
- Tool call results render as table / code / markdown / image based on returned shape, not raw JSON.
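That shape-based dispatch can be sketched as follows; the heuristics here are illustrative assumptions, not the exact rules shipped:

```python
def pick_renderer(result):
    """Choose a renderer from the shape of a tool-call result.
    Heuristics are illustrative, not the product's actual rules."""
    if isinstance(result, list) and result and all(
        isinstance(row, dict) for row in result
    ):
        return "table"      # list of uniform records -> table
    if isinstance(result, dict) and result.get("mime", "").startswith("image/"):
        return "image"      # image payloads render inline
    if isinstance(result, str) and result.lstrip().startswith(("def ", "{", "<")):
        return "code"       # code- or markup-looking strings
    return "markdown"       # default: render as markdown text
```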

**Phase 2.10 — MCP client**
- Stdio JSON-RPC transport + tool adapter so any local MCP server is callable from chat. Provenance shown per tool result.
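Assuming the newline-delimited JSON-RPC 2.0 framing that MCP specifies for its stdio transport, a `tools/call` request looks roughly like this (the tool name and arguments below are hypothetical):

```python
import json

def mcp_tool_call(request_id, tool_name, arguments):
    """Frame an MCP `tools/call` request for the stdio transport:
    one JSON-RPC 2.0 object per line, newline-terminated."""
    msg = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(msg) + "\n"
```

The adapter's job is then mapping each server-advertised tool (from `tools/list`) into a chat-callable function and tagging results with their origin for the provenance display.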

**Phase 2.11 / 2.12 — model-aware composer**
- Typed capability declarations (vision / tools / json_schema / reasoning) surface as badges in every model picker.
- Composer auto-gating (e.g. attach-image button hidden when active model has no vision).
- Mid-thread model swap with one-turn override (try a different model for a single response, then revert).
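The auto-gating step amounts to mapping the typed capability declarations onto composer controls. A minimal sketch, with illustrative control names:

```python
def composer_controls(capabilities):
    """Decide which composer controls to show for the active model.
    Capability keys mirror the typed declarations above; the control
    names on the left are illustrative, not the real component ids."""
    return {
        "attach_image": capabilities.get("vision", False),
        "tool_picker": capabilities.get("tools", False),
        "schema_field": capabilities.get("json_schema", False),
        "effort_slider": capabilities.get("reasoning", False),
    }
```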

**Phase 2.13 — OpenAI-compatible server**
- Full sampler chain + embeddings parity. Apps that talk to `/v1/chat/completions` no longer lose advanced sampler params on the way through.
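A client hitting such a server just includes the extended sampler fields in the request body. A sketch of building that payload (whether a given backend honors every field is model-dependent):

```python
import json

def chat_request(model, messages, **samplers):
    """Build a /v1/chat/completions request body that carries the
    extended sampler fields (top_p, top_k, min_p, repeat_penalty,
    seed, mirostat) end to end alongside the standard fields."""
    body = {"model": model, "messages": messages, **samplers}
    return json.dumps(body)
```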

**Phase 2.14 — catalog browser**
- VRAM-fit hints on every Discover variant card so you see at a glance what'll actually run on your machine.
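A fit hint like that can be derived from the variant's file size against free VRAM with an overhead factor for KV cache and activations; the 1.2× factor and three-way labels here are illustrative assumptions, not the product's exact heuristic:

```python
def vram_fit_hint(model_bytes, free_vram_bytes, overhead=1.2):
    """Classify whether a model variant is likely to fit in free VRAM.
    The 1.2x overhead factor (weights + KV cache + activations) and
    the labels are illustrative, not the shipped heuristic."""
    needed = model_bytes * overhead
    if needed <= free_vram_bytes:
        return "fits"
    if model_bytes <= free_vram_bytes:
        return "tight"      # weights fit, but headroom is marginal
    return "too large"

GiB = 1024 ** 3
```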

**Phase 3.x — substrate transparency**
- KV strategy chip in composer: per-turn cache override (native / chaosengine / rotorquant / turboquant / triattention) without touching launch settings.
- DDTree accepted-token overlay: substrate truth view of which speculative draft tokens were accepted.
- Logprobs viz (advanced-mode gated): per-message confidence summary, MLX logprobs streaming passthrough.
- Substrate routing inspector: per-turn badge above the metrics row showing which engine + binary served the response.
- Per-turn host strip: cross-platform perf telemetry (CPU / GPU / RAM / temp).
- Delve mode: critic-pass on assistant messages.
- Workspace knowledge stacks: shared RAG corpus across sessions.
- Chat-template inspection: detect Gemma + ChatML quirks, llama.cpp chat-template fix.

**Vision / multimodal**
- `--mmproj` wired for llama.cpp vision with sibling detection + `visionEnabled` flag flip.
- `visionEnabled` flag gates image attach across all runtimes.
- mlx-vlm torchvision dep added for Qwen2.5-VL processor build.

### Cache strategies & generation quality (FU-015 → FU-021, FU-026)
- **First Block Cache** (cross-platform diffusion cache hook, registry id `fbcache`) backed by `diffusers.hooks.apply_first_block_cache`. Applies to image + video DiTs (FLUX, SD3.5, Wan2.1/2.2, HunyuanVideo, LTX-Video, CogVideoX, Mochi). Default threshold 0.12 (≈1.8× speedup on FLUX.1-dev with imperceptible drift). Closes the FU-007 Wan TeaCache deferral by replacing per-model vendoring with a model-agnostic hook.
- **TaylorSeer / MagCache / PyramidAttentionBroadcast / FasterCache** strategies wired against the diffusers 0.38 native `enable_cache(<Config>)` API (registry ids `taylorseer`, `magcache`, `pab`, `fastercache`). MagCache is FLUX-only without calibration UX; other DiTs raise a "calibration required" message.