Releases: cryptopoly/ChaosEngineAI
ChaosEngineAI v0.7.4
v0.7.4 — chat uplift + image/video gen polish
Chat experience (the headline)
Phase 1 — UX foundations
- Syntax highlighting in code blocks, in-thread search, conversation export, real cancel (mid-stream abort), reasoning-effort levels.
- Reasoning panel: collapsible streaming preview, fixed first-paragraph gap.
Phase 2.0 — perf surface
- Prompt-processing feedback + TTFT (time-to-first-token) live indicator.
- Watchdogs: prompt-eval timeout, memory gate, runaway guards (token rate floor, repetition guard), panic + thermal banners, image/video gates that block kicking off a generation when VRAM/RAM headroom is unsafe.
Phase 2.1 — refactor
- Decomposed the monolithic `ChatTab.tsx` into `ChatSidebar` / `ChatHeader` / `ChatThread` / `ChatComposer`.
Phase 2.2 — sampler control
- Full sampler exposure: `top_p`, `top_k`, `min_p`, `repeat_penalty`, `seed`, `mirostat`, `reasoning_effort`.
- JSON-schema constrained-output opt-in (`json_schema` field).
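The constrained-output opt-in can be pictured with a minimal sketch. Only the `json_schema` field name and the sampler parameter names come from the notes above; the overall request shape here is an assumption for illustration.

```python
import json

# Hypothetical chat request using the constrained-output opt-in.
# The `json_schema` field name and sampler names come from the release
# notes; the surrounding request shape is an assumption.
request = {
    "messages": [{"role": "user", "content": "Extract the city and year."}],
    "top_p": 0.9,
    "top_k": 40,
    "min_p": 0.05,
    "repeat_penalty": 1.1,
    "seed": 42,
    "json_schema": {  # constrains the model's output to this shape
        "type": "object",
        "properties": {"city": {"type": "string"}, "year": {"type": "integer"}},
        "required": ["city", "year"],
    },
}
payload = json.dumps(request)
```

With a schema attached, the sampler is restricted at decode time to tokens that keep the output valid against the schema, so the response parses as JSON every time.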
Phase 2.4–2.5 — message-tree workflows
- Conversation branching: fork from any assistant message into a sibling thread.
- In-thread compare: render sibling variants side-by-side under the assistant bubble.
Phase 2.6–2.7 — context & prompts
- Cross-platform RAG: semantic embedding via `llama-embedding` + cosine retrieval over local docs.
- Prompt presets + variables: a fill-in form before "Use in Chat" so reusable prompts can take inputs.
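The retrieval half of that RAG pipeline is plain cosine similarity over embedding vectors. A toy sketch, assuming embeddings have already been produced (e.g. by `llama-embedding`); function names here are illustrative, not the app's real API:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, doc_vecs, k=2):
    """Indices of the k document embeddings most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]

docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
hits = top_k([1.0, 0.1], docs, k=2)  # nearest documents first
```

The retrieved chunks are then prepended to the chat context before generation.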
Phase 2.8 — structured tool output
- Tool call results render as table / code / markdown / image based on returned shape, not raw JSON.
Phase 2.10 — MCP client
- Stdio JSON-RPC transport + tool adapter so any local MCP server is callable from chat. Provenance shown per tool result.
Phase 2.11–2.12 — model-aware composer
- Typed capability declarations (vision / tools / json_schema / reasoning) surface as badges in every model picker.
- Composer auto-gating (e.g. attach-image button hidden when active model has no vision).
- Mid-thread model swap with one-turn override (try a different model for a single response, then revert).
Phase 2.13 — OpenAI-compatible server
- Full sampler chain + embeddings parity. Apps that talk to `/v1/chat/completions` no longer lose advanced sampler params on the way through.
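A sketch of what a client request might look like. The standard fields follow the usual OpenAI chat-completions shape; the extended sampler fields are the ones the notes above say are now forwarded instead of dropped. Values are examples only.

```python
import json

# Request body for the OpenAI-compatible endpoint. Standard OpenAI fields
# plus the extended sampler params that previously got lost in transit.
body = {
    "model": "local-model",  # whatever model the server has loaded
    "messages": [{"role": "user", "content": "hello"}],
    "temperature": 0.7,
    "top_p": 0.9,
    # Extended params now passed through to the sampler chain:
    "top_k": 40,
    "min_p": 0.05,
    "repeat_penalty": 1.1,
    "seed": 7,
}
data = json.dumps(body).encode("utf-8")
# A real client would POST `data` to http://localhost:<port>/v1/chat/completions
```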
Phase 2.14 — catalog browser
- VRAM-fit hints on every Discover variant card so you see at a glance what'll actually run on your machine.
Phase 3.x — substrate transparency
- KV strategy chip in composer: per-turn cache override (native / chaosengine / rotorquant / turboquant / triattention) without touching launch settings.
- DDTree accepted-token overlay: substrate truth view of which speculative draft tokens were accepted.
- Logprobs viz (advanced-mode gated): per-message confidence summary, MLX logprobs streaming passthrough.
- Substrate routing inspector: per-turn badge above the metrics row showing which engine + binary served the response.
- Per-turn host strip: cross-platform perf telemetry (CPU / GPU / RAM / temp).
- Delve mode: critic-pass on assistant messages.
- Workspace knowledge stacks: shared RAG corpus across sessions.
- Chat-template inspection: detect Gemma + ChatML quirks, llama.cpp chat-template fix.
Image generation
- First Block Cache cross-platform diffusion cache hook (`diffusers.hooks.apply_first_block_cache`). Default threshold 0.12, ≈1.8× speedup on FLUX.1-dev with imperceptible drift. Replaces the deferred per-model TeaCache vendoring.
- TaylorSeer / MagCache / PyramidAttentionBroadcast / FasterCache strategies wired against the diffusers 0.38 native API.
- SDXL VAE fp16 fix on MPS / CUDA — keeps SDXL on Apple Silicon in fp16 instead of the slow fp32 fallback.
- Distill LoRA support — Hyper-SD-8step + Turbo-Alpha for FLUX.1-dev.
- AYS (Align Your Steps) sampler for SD/SDXL.
- CFG decay parity with the video runtime (opt-in `cfgDecay` field).
- Live denoise thumbnails via `callback_on_step_end` — a TAESD/TAEHV preview-VAE swap decodes per-step latents into ≤192 px PNG thumbnails streamed to the UI. Handles 4D `(B, C, H, W)` and FLUX's packed 3D `(B, seq_len, 64)` shapes.
- MLX-native LLM prompt enhancer (Apple Silicon) — `mlx-community/Qwen2.5-0.5B-Instruct-4bit` rewrites your prompt into the active DiT's training distribution. Per-family system prompts for FLUX / Wan / LTX / HunyuanVideo / SDXL / SD3.
- Vision attach gating — a `visionEnabled` flag gates image attach across all runtimes; `--mmproj` wired for llama.cpp vision with sibling detection.
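The First Block Cache idea above reduces to a simple decision rule: skip the rest of the transformer when the first block's output has barely moved since the last fully computed step. A toy reimplementation of that rule, not the `diffusers` hook itself:

```python
def rel_l1(curr, prev):
    """Mean relative L1 change between two activation vectors."""
    num = sum(abs(c - p) for c, p in zip(curr, prev))
    den = sum(abs(p) for p in prev) or 1.0
    return num / den

def should_reuse_cache(curr, prev, threshold=0.12):
    """True -> reuse the cached residual; False -> run the full forward."""
    return prev is not None and rel_l1(curr, prev) < threshold

prev = [1.0, 2.0, 3.0]
reuse = should_reuse_cache([1.01, 2.02, 3.01], prev)  # tiny drift -> reuse
recompute = should_reuse_cache([2.0, 0.5, 4.0], prev)  # big change -> recompute
```

A lower threshold recomputes more often (higher quality, less speedup); the 0.12 default is the quality/speed trade-off calibrated for FLUX.1-dev.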
Video generation
- mlx-video Wan runtime end-to-end (Apple Silicon):
  - One-shot convert pipeline for `Wan-AI/Wan2.{1-T2V-1.3B,1-T2V-14B,2-TI2V-5B,2-T2V-A14B,2-I2V-A14B}` — wraps a `python -m mlx_video.models.wan_2.convert` subprocess.
  - Runtime routing through `mlx_video_runtime.py` with a Wan-shaped CLI (`--model-dir`, `--guide-scale`, `--scheduler`).
  - GUI install panel under Video Discover with per-repo install buttons + live install log.
  - Live Wan2.1 MLX smoke run validated: 19.6 s end-to-end at 480×272, 5 frames, 4 steps.
- Distill transformer support for Wan 2.2 A14B I2V (lightx2v 4-step, bf16 + fp8_e4m3 variants) — full transformer swap via `_swap_distill_transformers`.
- STG (Spatial-Temporal Guidance) slider wired through to the mlx-video subprocess for LTX-2.
- CogVideoX footprints right-sized + T5EncoderModel error diagnosed.
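The convert wrapper above boils down to assembling a subprocess command. A sketch, where the module path and repo id come from the notes but the flag names (`--repo`, `--out-dir`) are assumptions for illustration:

```python
import sys

def build_convert_cmd(repo_id: str, out_dir: str) -> list:
    """Assemble the one-shot convert command for a Wan repo."""
    return [
        sys.executable, "-m", "mlx_video.models.wan_2.convert",
        "--repo", repo_id,      # hypothetical flag name
        "--out-dir", out_dir,   # hypothetical flag name
    ]

cmd = build_convert_cmd("Wan-AI/Wan2.1-T2V-1.3B", "/tmp/wan21-mlx")
# A real wrapper would then run:
#   subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Running the convert in a subprocess keeps a crash in `mlx_video` from taking down the backend and lets its stdout stream into the install log panel.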
CUDA quantization (Windows / Linux foundation)
- Nunchaku / SVDQuant transformer load (FLUX.1-dev + FLUX.1-schnell svdq-int4 variants). Preferred over NF4/int8wo when `nunchaku>=1.2.1` is installed.
- FP8 layerwise casting for non-FLUX DiTs (Wan / Qwen-Image / SD3 / LTX). Family-correct fp8 dtype (E5M2 for HunyuanVideo, E4M3 elsewhere). A compute-capability gate refuses pre-Ada GPUs (SM < 8.9).
- NVIDIA/kvpress install action staged (`kvpress>=0.5.3` registered).
- Studio FP8 layerwise toggle in both Image and Video Studio.
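The dtype pick and the capability gate can be sketched in a few lines. Dtypes appear as strings so the sketch stays torch-free (in the app these would be `torch.float8_e5m2` / `torch.float8_e4m3fn`); function names are illustrative.

```python
def fp8_dtype_for(family: str) -> str:
    """Family-correct fp8 storage dtype per the release notes."""
    # E5M2 for HunyuanVideo, E4M3 everywhere else.
    return "float8_e5m2" if family == "HunyuanVideo" else "float8_e4m3fn"

def supports_fp8(sm_major: int, sm_minor: int) -> bool:
    """Refuse pre-Ada GPUs: layerwise fp8 casting needs SM >= 8.9."""
    return (sm_major, sm_minor) >= (8, 9)

hunyuan_dtype = fp8_dtype_for("HunyuanVideo")  # float8_e5m2
wan_dtype = fp8_dtype_for("Wan")               # float8_e4m3fn
ampere_ok = supports_fp8(8, 6)                 # Ampere (RTX 30xx) -> False
ada_ok = supports_fp8(8, 9)                    # Ada (RTX 40xx) -> True
```

E5M2 trades mantissa bits for range, which suits HunyuanVideo's wider activation distribution; E4M3 keeps more precision for the other families.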
Speculative decoding
- dflash-mlx pin bump f825ffb → 8d8545d (v0.1.4.1 → v0.1.5.1) — `target_ops` adapter pattern, draft-model quantization with Metal MMA kernels, branchless Metal kernels, fused draft KV projections.
Windows / CUDA stability
- PowerShell ports of `build-llama-turbo` + `build-sdcpp`.
- MSVC + CUDA detection helpers: accept VS Build Tools installs that report `isComplete=0`, append `version=` to `CMAKE_GENERATOR_INSTANCE` for unregistered installs, fix the CUDA-integration elevated copy + invalidate stale CMake cache, and auto-sync CUDA VS integration before the cmake configure step.
- Install CUDA torch: a self-debugging button with an expandable per-attempt log + Restart prompt.
- Windows CUDA detection fix + post-install runtime probe.
- Preserve Windows GPU runtime on uninstall + lock extras path.
- The Video Studio dropped-GPU warning now surfaces an inline Install button.
- T5 lazy-import diagnostic on generate paths (catches missing-dep failures before kicking off long generations).
Studio polish
- Restored pre-aec1975 card layout for Image / Video Discover + My Models. Dropped the duplicate Wan panel.
- KV cache chip filter harmonized with launch-settings modal so toggle states stay consistent across surfaces.
- Chat cache-fit warning VRAM-aware on CUDA hosts.
- Surfaced CPU torch on CUDA hosts. Raised the chat default `maxTokens` to 4096.
- Fixed the Studio cache preview returning 0 GB on chat model selection.
- Hide MLX-only catalog variants on non-Apple platforms.
- Qwen 3.6 catalog entry.
Test infrastructure
- `backend_service/runtime_paths.py` — append extras to `sys.path` instead of `insert(1, ...)`. Repo-local adapter shims (notably `turboquant_mlx`) keep import authority across pytest, the dev venv, and Tauri-bundled launches. This was also a latent runtime bug that masked the shim's adapter hooks after Setup → Install turboquant-mlx-full on the desktop app.
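The append-not-insert rule can be shown in miniature: extras go at the end of `sys.path` so repo-local shims shadow the bundled extras rather than the other way round. A sketch with an illustrative path:

```python
import sys

def register_extras(extras_dir: str) -> None:
    """Add an extras dir at LOWEST import priority, idempotently."""
    if extras_dir not in sys.path:
        sys.path.append(extras_dir)  # NOT sys.path.insert(1, extras_dir)

register_extras("/tmp/chaosengine-extras/site-packages")
register_extras("/tmp/chaosengine-extras/site-packages")  # no duplicate entry
```

With `insert(1, ...)`, a freshly installed extras package would win the import race against the repo-local shim, which is exactly the masking bug described above.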
Bundles below: macOS aarch64 dmg + Linux x64 AppImage / deb + Windows x64 setup.exe. `latest.json` for the Tauri auto-updater.
ChaosEngineAI v0.7.2
ChaosEngineAI v0.7.2 packages everything that landed since v0.5.2 — 133 commits over two weeks. First stable release of the video generation line, alongside major work on diffusion cache compression, the Apple Silicon mlx-video runtime, the Windows installer pipeline, and a brand-new in-app Diagnostics panel. Includes ten post-tag smoke-test fixes (PRs #21–#30) that landed before the final rebuild.
Highlights
- Video Studio — full video generation tab with model picker, prompt input, runtime status overlay, real-time progress, and cancellable generation.
- Multi-engine video catalog — Wan 2.1 T2V (1.3B / 14B), Wan 2.2, Lightricks LTX-Video 2.0/2.0-distilled/2.3/2.3-distilled, HunyuanVideo, CogVideoX, Mochi, plus FLUX.1 for image.
- TeaCache diffusion cache compression — five DiT families wired (FLUX, HunyuanVideo, LTX-Video, CogVideoX, Mochi) with per-model rescale coefficients.
- In-app Diagnostics panel under Settings, with per-section error fallback and platform-aware repair actions.
- Windows installer pipeline stabilised — embedded Python sidecar + llama-server now ship in the NSIS bundle; PowerShell 5.1 build path hardened; CUDA torch DLL lock fixed; GGUF downloads scoped to lightweight base + selected quant.
- macOS Python framework switched to python-build-standalone (Astral) for reliable framework relocation.
Smoke-test fixes (post-tag, in this rebuild)
- #21 — MLX-only catalog variants (`FLUX.1 Dev · mflux`, LTX-2 MLX entries) hidden on Linux / Windows where mlx is not installable. New `backend_service/helpers/platform_filter.py` + 15 unit tests.
- #22 — Windows CUDA detection no longer reports system RAM as VRAM. Reads `torch.cuda.get_device_properties(0).total_memory` first; falls back to `nvidia-smi`; returns `vram_total_gb=None` when neither answers. The image-runtime probe calls `importlib.invalidate_caches()` so newly installed GPU bundle packages are picked up without a full backend restart.
- #23 — Tauri NSIS installer hook (`src-tauri/installer.nsh`) documents the contract that `%LOCALAPPDATA%\ChaosEngineAI\extras\cp{maj}{min}\site-packages` must survive uninstall + reinstall. InstallLogPanel CSS establishes a stacking context so the streaming pip output renders above the Prompt + Recent Outputs cards on Windows during a long GPU bundle install.
- #24 — CI test reliability: the `imageDiscoverMemoryEstimate` describe block pins `navigator.userAgentData.platform` to `macOS` so the MPS-calibrated expectations pass on Linux runners.
- #25 — The torch.cuda probe now runs in a short-lived subprocess so the backend never holds locks on `torch/lib/*.dll`. Without this, pip's `--target` install of a fresh torch fails with `[WinError 5] Access is denied`. InstallLogPanel background switched to opaque `var(--surface)` + `contain: layout` so it can never visually overlap the sibling Prompt / Recent Outputs cards.
- #26 — `test_gpu.py` accepts `vram_total_gb=None` as the no-GPU response (orphaned PR #24 fix landed directly).
- #27 / #28 — Tauri 2.10.3 → 2.11.0; tauri-build 2.5.6 → 2.6.0 (Dependabot Cargo bumps + Windows package version alignment).
- #29 — HunyuanVideo NF4 + 1280×720 × 33 frames CUDA estimate no longer crosses the danger ratio. Discount sliced attention to 60% of the dense fp16 8-slab estimate when CUDA + runtime override are present (real fp8-KV / attention-slicing behaviour).
- #30 — `test_legacy_tauri_app_data_extras_are_import_candidates` patches `sys.platform = "win32"` so the Linux CI runner exercises the `LOCALAPPDATA` branch.
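The #22 fallback chain (torch first, `nvidia-smi` second, `None` when neither answers) can be sketched as a self-contained function. This is a simplified version, not the backend's actual code:

```python
import shutil
import subprocess

def detect_vram_gb():
    """Total VRAM in GB, or None when no GPU answer is available."""
    try:  # 1) preferred: ask torch directly
        import torch
        if torch.cuda.is_available():
            return torch.cuda.get_device_properties(0).total_memory / 1024**3
    except Exception:
        pass
    if shutil.which("nvidia-smi"):  # 2) fall back to nvidia-smi
        try:
            out = subprocess.run(
                ["nvidia-smi", "--query-gpu=memory.total",
                 "--format=csv,noheader,nounits"],
                capture_output=True, text=True, timeout=10, check=True,
            ).stdout.strip().splitlines()
            if out:
                return float(out[0]) / 1024.0  # MiB -> GiB
        except Exception:
            pass
    return None  # 3) neither source answered -> vram_total_gb=None

vram = detect_vram_gb()
```

Returning `None` rather than a guessed number is what lets `test_gpu.py` (#26) treat "no GPU" as a first-class response instead of mistaking system RAM for VRAM.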
Video Generation
New: Video Studio
- New Video Studio tab with model picker, prompt input, runtime status overlay, and live progress.
- Cancellable image and video generation across the app.
- Async library scan with a persisted cache (`library_cache.json`) — startup no longer blocks on filesystem walks.
- Tooltip portal — info hovers now render via a top-level portal with viewport-clamped placement.
- Platform-aware mlx-video reinstall button in Diagnostics.
- Auto-recover Video Studio after a backend sidecar crash.
- Variant-specific video download status (sibling Q4/Q6/Q8 rows no longer marked active just because they share repos).
- Cleaned-up Video Studio runtime action layout; GPU runtime installs now recover from CPU-only torch wheels.
New: Video catalog
- Wan 2.1 T2V 1.3B / 14B (with Wan 2.2 catalog fix).
- Lightricks LTX-Video 2.0 / 2.0-distilled / 2.3 / 2.3-distilled (via mlx-video on Apple Silicon).
- HunyuanVideo.
- CogVideoX.
- Mochi.
- FLUX.1 (image).
New: Apple Silicon mlx-video LTX-2 engine
- LTX-2 (`prince-canuma/LTX-2-{distilled,dev,2.3-distilled,2.3-dev}`) routed through a subprocess engine — `backend_service/mlx_video_runtime.py`.
- Spatial upscaler resolver, distilled-pipeline accounting, `dist-info` cleanup.
- Real LTX-2 module path + corrected CLI flags.
- LTX-2 MLX download false-positive fix + LTX prompt-length hint.
New: stable-diffusion.cpp engine scaffold (cross-platform)
- Binary staging in `scripts/stage-runtime.mjs`; path resolution in `src-tauri/src/lib.rs` (`resolve_sd_cpp` + `CHAOSENGINE_SDCPP_BIN_DIR`).
- Engine class `SdCppVideoEngine` in `backend_service/sdcpp_video_runtime.py`.
- Manager exposes `sdcpp_video_capabilities()` so Setup / Studio can surface staging state.
New: LongLive scaffold (CUDA only)
- Real-time causal long video generation for Wan 2.1 T2V 1.3B.
- Install path — collapsible terminal panel for streaming progress, hang fix on Windows.
- Discover Install CTA in Video Studio.
Quality + safety
- Phase E1 — auto-enhance short video prompts with model-tuned suffixes.
- Phase E2 — CFG decay schedule + extend prompt enhancer to LTX-2 family.
- Three Phase E2 regressions caught on real Mac runs.
- Tuned video safety estimator + LTX-2 subprocess error visibility.
- Improved video gen quality across LTX / Wan / HunyuanVideo + new engine scaffolds.
- Scale video gen safety by device memory and show Studio capacity.
- Detect corrupt diffusers snapshots, add output-folder pickers, MPS-safe defaults.
- In-app install for mp4 encoder deps; isolated video output dir in test harness.
- HunyuanVideo NF4 + 1280×720 × 33 frames now lands on caution rather than danger on a 4090; long-clip danger warnings still trigger for genuinely risky configs.
Cache compression for diffusion DiTs (TeaCache)
- TeaCache integration with vendored `teacache_forward` patches under `cache_compression/_teacache_patches/`.
- Five model families wired: FLUX, HunyuanVideo, LTX-Video, CogVideoX, Mochi.
- Per-model rescale coefficients pulled from upstream calibration tables.
- Quality knob `rel_l1_thresh` (default 0.4).
- TeaCache strategies are filtered out of the LLM RuntimeControls picker via the `appliesTo` domain field.
Memory + warm pool
- Memory-budgeted warm pool + library-only chat picker.
- Improved model catalog UX and memory estimates.
- Base LLM library RAM estimate on real on-disk size.
- Include model footprint in video gen memory estimate.
- Fix TurboQuant install detection + warm pool memory guard.
- Show release dates and accurate on-disk sizes across model listings.
Diagnostics + Settings UI
- New in-app Diagnostics panel under Settings with per-section error fallback.
- Reject `load_model` for models not on disk; disable chat Send until the model is loaded.
- Rename the dashboard engine label from "No backend" to "Idle".
- Split Settings page into logical sections with sub-navigation.
- Drop the side-menu Settings layout, always use the tab bar.
- Redesign the Storage section and surface resolved paths.
- Diagnostics: catch `find_spec` on a missing namespace + scroll body.
- Fix Diagnostics panel scroll on tall viewports + long-path wrapping.
- Tabs-mode sidebar, SubtabBar, and Settings → Appearance toggle.
- Group sidebar tabs into Models / Images / Benchmarks / Tools with SVG icons.
Windows pipeline
- Ship embedded Python + llama-server in the installer — no source checkout needed at first launch.
- Fix Windows restart deadlock.
- PowerShell 5.1 hardening — strip non-ASCII, drop `&&`, avoid backslash-before-paren, strip apostrophes / ampersands.
- Stop the video runtime probe timing out on Windows.
- Stabilise the Windows dev/build loop and surface CUDA-vs-CPU torch detection.
- Unblock Windows video / image studios during first-boot torch import.
- Warn on CPU fallback and stop Video Studio stalling on Windows.
- Stop `build.ps1` dying on `git checkout`'s success message.
- Fix LongLive install on Windows.
- Real CUDA VRAM detection via `torch.cuda` (was reporting 12 GB on a 24 GB RTX 4090).
- Image-runtime probe re-checks importable packages after a GPU bundle install.
- CUDA torch clobber + DLL lock during GPU bundle install fixed (subprocess probe).
- Persist installed runtime path across backend restarts; auto-restart when required.
- Logical install package counts reported correctly.
- GGUF video downloads fetch only the lightweight base pipeline + the selected quantized transformer file.
- Windows test + runtime edge cases for directory sizing, library fingerprints, path formatting, optional GGUF import failures.
- Tauri 2.10.3 → 2.11.0; tauri-build 2.5.6 → 2.6.0 alignment.
macOS Python framework
- Switched to python-build-standalone (Astral) so `@loader_path`/`@rpath` references are baked in — avoids the `install_name_tool` regression with Xcode 16.4 + `actions/setup-python` that previously crashed at launch with `Library not loaded: Python.framework`.
- Persistent extras dir namespaced by Python ABI ...
ChaosEngineAI v0.5.2
ChaosEngineAI v0.5.2 packages everything that landed since the first v0.5.0 release, with a focus on stability, inference improvements, release hardening, and cleaner operator-facing behavior.
Highlights
- Improved backend and engine reliability with fixes for backend load failures, auth token cache poisoning, reasoning split handling, MLX profile application, DFlash resolution, engine issues, and stale orphaned-worker notifications.
- Expanded inference functionality with a CLI inference runner, broader inference and model updates, benchmark model-selection and caching fixes, and additional inference test coverage.
- Hardened release and tooling workflows with cross-platform updater artifact fixes, stronger sidecar/local auth surfaces, Image Studio installed-model filtering, manual-dispatch CI build gating, documentation/discovery updates, and release workflow improvements.
Commit Summary Since v0.5.0
- `1d50fb5` Initial release.
- `0e63748` Fixed reasoning split, MLX profile application, and DFlash resolution.
- `245cddf` Applied broader binary-related fixes and cleanup.
- `6f0a158` Fixed engine issues.
- `8138b18` Added inference tests.
- `b030283` Added the CLI inference runner.
- `b39c0bf` Improved inference flows and related tests.
- `e9039fc` Updated models.
- `b70a10b` Fixed test modules.
- `f3d2c3e` Fixed benchmark model selection, caching strategies, and DFlash behavior.
- `d4c00b8` Improved discovery behavior and updated the README.
- `db54bd4` Addressed the Copilot licensing/auth error path.
- `7eef544` Hardened sidecar auth and local tool surfaces.
- `a189863` Hardened the release workflow during the `v0.5.1` bump.
- `a9d619a` Fixed cross-platform updater release artifacts.
- `3100386` Bumped to `v0.5.2` and fixed backend load failures plus auth token cache poisoning.
- `78a729c` Limited Image Studio to installed models and made the CI build manual-dispatch only.
- `d041efa` Stopped stale or false-positive orphaned-worker notifications.
ChaosEngineAI Launch v0.5.0
Welcome to ChaosEngineAI