From 0aaabc03eb043242dcf0ac0d035f8d91c1d0a3f3 Mon Sep 17 00:00:00 2001
From: Cryptopoly <31970407+cryptopoly@users.noreply.github.com>
Date: Wed, 6 May 2026 20:16:33 +0100
Subject: [PATCH] CHANGELOG v0.7.4: add chat uplift + multimodal sections

The original v0.7.4 entry focused on FU-### IDs and missed the chat
uplift work that was actually the headline of the release: Phases 1
through 3.x landed via PR #32 and shipped a major chat experience
overhaul (sampler exposure, conversation branching, in-thread compare,
RAG, MCP client, capability badges, mid-thread model swap,
OpenAI-compatible server, KV strategy chip, substrate routing
inspector, Delve mode, workspace knowledge stacks). Backfill those
sections at the top of the v0.7.4 entry so the changelog matches the
GitHub release notes.

Also document the vision / multimodal work (mmproj wiring, vision flag
gating across runtimes, mlx-vlm torchvision dep).
---
 CHANGELOG.md | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 9d94f23..5ef7acb 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,63 @@
 
 ## v0.7.4 - 2026-05-06
 
+### Chat experience (the headline)
+
+**Phase 1 — UX foundations**
+- Syntax highlighting in code blocks, in-thread search, conversation export, real cancel (mid-stream abort), reasoning-effort levels.
+- Reasoning panel: collapsible streaming preview, fixed first-paragraph gap.
+
+**Phase 2.0 — perf surface + watchdogs**
+- Prompt-processing feedback + TTFT (time-to-first-token) live indicator.
+- Prompt-eval timeout, memory gate, runaway guards (token rate floor, repetition guard), panic + thermal banners, and image/video gates that block kicking off a generation when VRAM/RAM headroom is unsafe.
+
+**Phase 2.1 — refactor**
+- Decomposed the monolithic `ChatTab.tsx` into `ChatSidebar` / `ChatHeader` / `ChatThread` / `ChatComposer`.
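Editor's note: the "real cancel (mid-stream abort)" and TTFT items above can be sketched as a single stream consumer. This is a minimal illustration with a fake token stream and hypothetical callback names, not the shipped implementation:

```python
import threading
import time

def consume_stream(token_iter, cancel_event, on_token):
    """Drain a token stream, honoring a mid-stream abort (real cancel)
    and recording time-to-first-token (TTFT)."""
    start = time.monotonic()
    ttft = None
    for token in token_iter:
        if cancel_event.is_set():  # abort check before emitting each token
            break
        if ttft is None:  # first token observed -> TTFT sample
            ttft = time.monotonic() - start
        on_token(token)
    return ttft

# Fake stream; a real UI would set cancel_event when the user hits Stop.
cancel_event = threading.Event()
seen = []

def on_token(tok):
    seen.append(tok)
    if len(seen) == 3:  # simulate the user cancelling mid-stream
        cancel_event.set()

ttft = consume_stream(iter(["Hel", "lo", ",", " wor", "ld"]), cancel_event, on_token)
```

Checking the cancel flag between tokens (rather than killing the worker) is what makes the abort "real": the generator can release its resources cleanly once the loop exits.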
+
+**Phase 2.2 — sampler control**
+- Full sampler exposure: `top_p`, `top_k`, `min_p`, `repeat_penalty`, `seed`, `mirostat`, `reasoning_effort`.
+- JSON-schema constrained-output opt-in (`json_schema` field).
+
+**Phase 2.4 / 2.5 — message-tree workflows**
+- Conversation branching: fork from any assistant message into a sibling thread.
+- In-thread compare: render sibling variants side-by-side under the assistant bubble.
+
+**Phase 2.6 / 2.7 — context & prompts**
+- Cross-platform RAG: semantic embedding via `llama-embedding` + cosine retrieval over local docs.
+- Prompt presets + variables: fill in a form before "Use in Chat" so reusable prompts can take inputs.
+
+**Phase 2.8 — structured tool output**
+- Tool call results render as table / code / markdown / image based on the returned shape, not raw JSON.
+
+**Phase 2.10 — MCP client**
+- Stdio JSON-RPC transport + tool adapter so any local MCP server is callable from chat. Provenance is shown per tool result.
+
+**Phase 2.11 / 2.12 — model-aware composer**
+- Typed capability declarations (vision / tools / json_schema / reasoning) surface as badges in every model picker.
+- Composer auto-gating (e.g. the attach-image button is hidden when the active model has no vision).
+- Mid-thread model swap with one-turn override (try a different model for a single response, then revert).
+
+**Phase 2.13 — OpenAI-compatible server**
+- Full sampler chain + embeddings parity. Apps that talk to `/v1/chat/completions` no longer lose advanced sampler params on the way through.
+
+**Phase 2.14 — catalog browser**
+- VRAM-fit hints on every Discover variant card so you can see at a glance what will actually run on your machine.
+
+**Phase 3.x — substrate transparency**
+- KV strategy chip in the composer: per-turn cache override (native / chaosengine / rotorquant / turboquant / triattention) without touching launch settings.
+- DDTree accepted-token overlay: substrate truth view of which speculative draft tokens were accepted.
+- Logprobs viz (advanced-mode gated): per-message confidence summary, MLX logprobs streaming passthrough.
+- Substrate routing inspector: per-turn badge above the metrics row showing which engine + binary served the response.
+- Per-turn host strip: cross-platform perf telemetry (CPU / GPU / RAM / temp).
+- Delve mode: critic-pass on assistant messages.
+- Workspace knowledge stacks: shared RAG corpus across sessions.
+- Chat-template inspection: detect Gemma + ChatML quirks, llama.cpp chat-template fix.
+
+**Vision / multimodal**
+- `--mmproj` wired for llama.cpp vision with sibling detection + `visionEnabled` flag flip.
+- `visionEnabled` flag gates image attach across all runtimes.
+- mlx-vlm torchvision dep added for Qwen2.5-VL processor build.
+
 ### Cache strategies & generation quality (FU-015 → FU-021, FU-026)
 - **First Block Cache** (cross-platform diffusion cache hook, registry id `fbcache`) backed by `diffusers.hooks.apply_first_block_cache`. Applies to image + video DiTs (FLUX, SD3.5, Wan2.1/2.2, HunyuanVideo, LTX-Video, CogVideoX, Mochi). Default threshold 0.12 (≈1.8× speedup on FLUX.1-dev with imperceptible drift). Closes the FU-007 Wan TeaCache deferral by replacing per-model vendoring with a model-agnostic hook.
 - **TaylorSeer / MagCache / PyramidAttentionBroadcast / FasterCache** strategies wired against the diffusers 0.38 native `enable_cache()` API (registry ids `taylorseer`, `magcache`, `pab`, `fastercache`). MagCache is FLUX-only without calibration UX; other DiTs raise a "calibration required" message.
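Editor's note: the cross-platform RAG work pairs `llama-embedding` vectors with cosine retrieval. A minimal sketch of the retrieval half follows; the embeddings here are hard-coded stand-ins for real `llama-embedding` output, and the names are illustrative only:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, doc_vecs, top_k=2):
    """Rank document chunks by cosine similarity to the query embedding."""
    scored = [(cosine(query_vec, v), doc_id) for doc_id, v in doc_vecs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]

# Stand-in 3-dim embeddings; real ones would come from `llama-embedding`.
docs = {
    "gpu-setup.md": [0.9, 0.1, 0.0],
    "recipes.md":   [0.0, 0.2, 0.9],
    "vram-tips.md": [0.8, 0.3, 0.1],
}
top = retrieve([1.0, 0.0, 0.0], docs)
```

The top-k chunks would then be prepended to the prompt; cosine over normalized embeddings keeps retrieval cheap and dependency-free across platforms.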
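Editor's note: the per-message confidence summary in the logprobs viz has to collapse per-token logprobs into one number. One plausible reduction is the geometric-mean token probability, `exp(mean logprob)`; the release's exact formula is not stated here, so treat this as an assumption:

```python
import math

def message_confidence(token_logprobs):
    """Collapse per-token logprobs into a 0-1 confidence score:
    the geometric-mean token probability, exp(mean logprob)."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))

# Logprobs near 0 mean the model was near-certain of each token.
confident = message_confidence([-0.01, -0.02, -0.05])
hedgy = message_confidence([-1.2, -2.3, -0.9])
```

Using the geometric mean (rather than the sum) keeps the score length-independent, so long and short messages are comparable in the UI.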