
Releases: cryptopoly/ChaosEngineAI

ChaosEngineAI v0.7.4

06 May 18:47
cfa1d53


v0.7.4 — chat uplift + image/video gen polish

Chat experience (the headline)

Phase 1 — UX foundations

  • Syntax highlighting in code blocks, in-thread search, conversation export, real cancel (mid-stream abort), reasoning-effort levels.
  • Reasoning panel: collapsible streaming preview, fixed first-paragraph gap.

Phase 2.0 — perf surface

  • Prompt-processing feedback + TTFT (time-to-first-token) live indicator.
  • Watchdogs: prompt-eval timeout, memory gate, runaway guards (token rate floor, repetition guard), panic + thermal banners, image/video gates that block kicking off a generation when VRAM/RAM headroom is unsafe.
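The runaway guards above can be sketched as a token-rate floor: abort any stream whose sustained rate drops below a minimum after a grace window. Names, defaults, and the abort mechanism here are illustrative, not the app's actual implementation.

```python
import time

class TokenRateGuard:
    """Abort a stream whose sustained token rate falls below a floor.

    Hypothetical sketch: floor_tok_per_s and window_s are stand-in defaults.
    """

    def __init__(self, floor_tok_per_s: float = 0.5, window_s: float = 10.0):
        self.floor = floor_tok_per_s
        self.window = window_s
        self.start = time.monotonic()
        self.count = 0

    def on_token(self) -> bool:
        """Record one token; return False when the stream should be aborted."""
        self.count += 1
        elapsed = time.monotonic() - self.start
        if elapsed < self.window:   # grace period before enforcing the floor
            return True
        return (self.count / elapsed) >= self.floor
```

A repetition guard would sit alongside this, tracking n-gram recurrence rather than rate; both feed the same mid-stream abort path as the real cancel feature.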

Phase 2.1 — refactor

  • Decomposed monolithic ChatTab.tsx into ChatSidebar / ChatHeader / ChatThread / ChatComposer.

Phase 2.2 — sampler control

  • Full sampler exposure: top_p, top_k, min_p, repeat_penalty, seed, mirostat, reasoning_effort.
  • JSON-schema constrained-output opt-in (json_schema field).
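A request using the json_schema opt-in might be shaped like this. The OpenAI-style response_format nesting is an assumption based on convention; check the server's actual field layout.

```python
import json

def build_constrained_request(prompt: str, schema: dict) -> dict:
    """Build a chat payload that opts into JSON-schema constrained output.

    The response_format shape mirrors the OpenAI structured-output convention;
    the server's exact nesting may differ.
    """
    return {
        "model": "local",
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "answer", "schema": schema},
        },
    }

payload = build_constrained_request(
    "List three colors.",
    {
        "type": "object",
        "properties": {"colors": {"type": "array", "items": {"type": "string"}}},
        "required": ["colors"],
    },
)
```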

Phase 2.4–2.5 — message-tree workflows

  • Conversation branching: fork from any assistant message into a sibling thread.
  • In-thread compare: render sibling variants side-by-side under the assistant bubble.

Phase 2.6–2.7 — context & prompts

  • Cross-platform RAG: semantic embedding via llama-embedding + cosine retrieval over local docs.
  • Prompt presets + variables: fill-form before "Use in Chat" so reusable prompts can take inputs.
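The cosine-retrieval step of the RAG pipeline can be sketched in a few lines. Pure Python for clarity; the real code would score llama-embedding vectors from a persisted index rather than toy lists.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, doc_vecs, k=3):
    """Return (doc_index, score) pairs, best match first."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]
```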

Phase 2.8 — structured tool output

  • Tool call results render as table / code / markdown / image based on returned shape, not raw JSON.
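Shape-based rendering amounts to a small classifier over the returned value. The heuristics below are illustrative guesses at the dispatch rules, not the app's actual logic.

```python
def classify_tool_result(result):
    """Pick a render mode from the shape of a tool call's return value."""
    # A uniform list of records renders as a table.
    if isinstance(result, list) and result and all(isinstance(r, dict) for r in result):
        return "table"
    # Strings that look like source or markup render as code.
    if isinstance(result, str) and result.lstrip().startswith(("def ", "{", "<")):
        return "code"
    # A payload carrying an image MIME type renders inline.
    if isinstance(result, dict) and result.get("mime", "").startswith("image/"):
        return "image"
    return "markdown"
```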

Phase 2.10 — MCP client

  • Stdio JSON-RPC transport + tool adapter so any local MCP server is callable from chat. Provenance shown per tool result.
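At the wire level, the stdio transport writes JSON-RPC 2.0 messages to the server's stdin. The newline-delimited framing shown here is how MCP stdio servers commonly read messages, but treat it as an assumption and defer to the MCP spec; the tool name and arguments are placeholders.

```python
import json

def make_jsonrpc_request(req_id: int, method: str, params: dict) -> bytes:
    """Encode one newline-delimited JSON-RPC 2.0 message for a stdio server."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}
    return (json.dumps(msg) + "\n").encode("utf-8")

# Hypothetical tools/call frame; the real adapter also handles initialize
# and tools/list before calling anything.
frame = make_jsonrpc_request(
    1, "tools/call",
    {"name": "read_file", "arguments": {"path": "README.md"}},
)
```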

Phase 2.11–2.12 — model-aware composer

  • Typed capability declarations (vision / tools / json_schema / reasoning) surface as badges in every model picker.
  • Composer auto-gating (e.g. attach-image button hidden when active model has no vision).
  • Mid-thread model swap with one-turn override (try a different model for a single response, then revert).
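The capability-to-UI gating can be sketched as a simple lookup. The model names and field set below are illustrative, not the app's actual schema.

```python
# Stand-in capability declarations; the real app types these per model.
CAPS = {
    "llava-1.6": {"vision": True, "tools": False, "json_schema": True},
    "qwen2.5-7b": {"vision": False, "tools": True, "json_schema": True},
}

def composer_features(model: str) -> dict:
    """Derive composer UI gating from a model's declared capabilities.

    Unknown models get everything gated off, matching the conservative
    default described above (e.g. no attach-image button without vision).
    """
    caps = CAPS.get(model, {})
    return {
        "show_attach_image": caps.get("vision", False),
        "show_tool_picker": caps.get("tools", False),
    }
```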

Phase 2.13 — OpenAI-compatible server

  • Full sampler chain + embeddings parity. Apps that talk to /v1/chat/completions no longer lose advanced sampler params on the way through.
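As a rough illustration, a client request exercising the sampler chain might look like the payload below. Whether fields like min_p and repeat_penalty ride top-level (llama.cpp style) or under a vendor extension key is server-specific, so treat the exact shape as an assumption.

```python
# Hypothetical body for POST /v1/chat/completions against the local server.
payload = {
    "model": "local",
    "messages": [{"role": "user", "content": "hi"}],
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,            # extended sampler params: the parity fix above
    "min_p": 0.05,          # means these survive the compatibility layer
    "repeat_penalty": 1.1,
    "seed": 42,
}
```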

Phase 2.14 — catalog browser

  • VRAM-fit hints on every Discover variant card so you see at a glance what'll actually run on your machine.

Phase 3.x — substrate transparency

  • KV strategy chip in composer: per-turn cache override (native / chaosengine / rotorquant / turboquant / triattention) without touching launch settings.
  • DDTree accepted-token overlay: substrate truth view of which speculative draft tokens were accepted.
  • Logprobs viz (advanced-mode gated): per-message confidence summary, MLX logprobs streaming passthrough.
  • Substrate routing inspector: per-turn badge above the metrics row showing which engine + binary served the response.
  • Per-turn host strip: cross-platform perf telemetry (CPU / GPU / RAM / temp).
  • Delve mode: critic-pass on assistant messages.
  • Workspace knowledge stacks: shared RAG corpus across sessions.
  • Chat-template inspection: detect Gemma + ChatML quirks, llama.cpp chat-template fix.

Image generation

  • First Block Cache cross-platform diffusion cache hook (diffusers.hooks.apply_first_block_cache). Default threshold 0.12, ≈1.8× speedup on FLUX.1-dev with imperceptible drift. Replaces the per-model TeaCache vendoring deferral.
  • TaylorSeer / MagCache / PyramidAttentionBroadcast / FasterCache strategies wired against diffusers 0.38 native API.
  • SDXL VAE fp16 fix on MPS / CUDA — keeps SDXL on Apple Silicon in fp16 instead of the slow fp32 fallback.
  • Distill LoRA support — Hyper-SD-8step + Turbo-Alpha for FLUX.1-dev.
  • AYS (Align Your Steps) sampler for SD/SDXL.
  • CFG decay parity with the video runtime (opt-in cfgDecay field).
  • Live denoise thumbnails via callback_on_step_end — TAESD/TAEHV preview VAE swap decodes per-step latents into ≤192 px PNG thumbnails streamed to the UI. Handles 4D (B, C, H, W) and FLUX's packed 3D (B, seq_len, 64) shapes.
  • MLX-native LLM prompt enhancer (Apple Silicon) — mlx-community/Qwen2.5-0.5B-Instruct-4bit rewrites your prompt into the active DiT's training distribution. Per-family system prompts for FLUX / Wan / LTX / HunyuanVideo / SDXL / SD3.
  • Vision attach gating — visionEnabled flag gates image attach across all runtimes; --mmproj wired for llama.cpp vision with sibling detection.
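The First Block Cache entry above wires diffusers' own hook; the toy below only illustrates the skip criterion behind it: run the first transformer block each step, and when its output has barely moved (relative L1) since the previous step, reuse the cached result of the remaining blocks. The 0.12 threshold matches the stated default; everything else is a simplified stand-in.

```python
def rel_l1(a, b):
    """Relative L1 distance between two flat activation vectors."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(abs(y) for y in b) or 1.0
    return num / den

class FirstBlockCache:
    """Toy first-block cache: skip the deep blocks when the first block's
    output changed less than `threshold` since the previous denoise step."""

    def __init__(self, threshold: float = 0.12):
        self.threshold = threshold
        self.prev_first = None
        self.cached_tail = None

    def step(self, first_block_out, run_tail):
        """run_tail: callable computing the remaining blocks' output."""
        if (self.prev_first is not None and self.cached_tail is not None
                and rel_l1(first_block_out, self.prev_first) < self.threshold):
            out = self.cached_tail            # cache hit: skip deep blocks
        else:
            out = run_tail(first_block_out)   # cache miss: full forward
            self.cached_tail = out
        self.prev_first = first_block_out
        return out
```

The ≈1.8× speedup comes from how often consecutive diffusion steps land in the "hit" branch at low thresholds.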

Video generation

  • mlx-video Wan runtime end-to-end (Apple Silicon):
    • One-shot convert pipeline for Wan-AI/Wan2.{1-T2V-1.3B,1-T2V-14B,2-TI2V-5B,2-T2V-A14B,2-I2V-A14B} — wraps python -m mlx_video.models.wan_2.convert subprocess.
    • Runtime routing through mlx_video_runtime.py with Wan-shaped CLI (--model-dir, --guide-scale, --scheduler).
    • GUI install panel under Video Discover with per-repo install buttons + live install log.
    • Live Wan2.1 MLX smoke validated: 19.6s end-to-end at 480×272, 5 frames, 4 steps.
  • Distill transformer support for Wan 2.2 A14B I2V (lightx2v 4-step, bf16 + fp8_e4m3 variants) — full transformer swap via _swap_distill_transformers.
  • STG (Spatial Temporal Guidance) slider wired through to mlx-video subprocess for LTX-2.
  • CogVideoX footprints right-sized + T5EncoderModel error diagnosed.
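The one-shot convert pipeline boils down to building a subprocess command line around the mlx_video convert module. Only the module path comes from the notes above; the flag names and output layout here are placeholders.

```python
import sys

def wan_convert_cmd(repo_id: str, out_dir: str) -> list:
    """Assemble the (hypothetical) CLI for converting a Wan checkpoint to MLX."""
    return [
        sys.executable, "-m", "mlx_video.models.wan_2.convert",
        "--hf-repo", repo_id,     # assumed flag names: check the module's --help
        "--out-dir", out_dir,
    ]

cmd = wan_convert_cmd("Wan-AI/Wan2.1-T2V-1.3B", "models/wan21-1.3b-mlx")
# subprocess.run(cmd, check=True)  # left commented: requires mlx-video installed
```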

CUDA quantization (Windows / Linux foundation)

  • Nunchaku / SVDQuant transformer load (FLUX.1-dev + FLUX.1-schnell svdq-int4 variants). Preferred over NF4/int8wo when nunchaku>=1.2.1 is installed.
  • FP8 layerwise casting for non-FLUX DiTs (Wan / Qwen-Image / SD3 / LTX). Family-correct fp8 dtype (E5M2 for HunyuanVideo, E4M3 elsewhere). Compute capability gate refuses pre-Ada GPUs (SM <8.9).
  • NVIDIA/kvpress install action staged (kvpress>=0.5.3 registered).
  • Studio FP8 layerwise toggle in both Image + Video Studio.
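The family-correct dtype choice and the Ada-or-newer gate reduce to a few lines of pure logic; the real code hands the chosen storage dtype to torch/diffusers layerwise casting (torch.float8_e5m2 vs torch.float8_e4m3fn).

```python
def fp8_storage_dtype(family: str) -> str:
    """HunyuanVideo wants E5M2 storage; the other families use E4M3."""
    return "float8_e5m2" if family == "hunyuanvideo" else "float8_e4m3fn"

def supports_fp8(compute_capability) -> bool:
    """Refuse pre-Ada GPUs: fp8 needs compute capability (SM) >= 8.9."""
    major, minor = compute_capability
    return (major, minor) >= (8, 9)
```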

Speculative decoding

  • dflash-mlx pin bump f825ffb → 8d8545d (v0.1.4.1 → v0.1.5.1) — target_ops adapter pattern, draft model quantization with Metal MMA kernels, branchless Metal kernels, fused draft KV projections.

Windows / CUDA stability

  • PowerShell ports of build-llama-turbo + build-sdcpp.
  • MSVC + CUDA detection helpers: accept VS Build Tools installs that report isComplete=0, append version= to CMAKE_GENERATOR_INSTANCE for unregistered installs, fix CUDA-integration elevated copy + invalidate stale CMake cache, auto-sync CUDA VS integration before cmake configure.
  • Install CUDA torch: self-debugging button with expandable per-attempt log + Restart prompt.
  • Windows CUDA detection fix + post-install runtime probe.
  • Preserve Windows GPU runtime on uninstall + lock extras path.
  • Video Studio's dropped-GPU warning now surfaces an inline Install button.
  • T5 lazy-import diagnostic on generate paths (catches missing-dep failures before kicking off long generations).

Studio polish

  • Restored pre-aec1975 card layout for Image / Video Discover + My Models. Dropped the duplicate Wan panel.
  • KV cache chip filter harmonized with launch-settings modal so toggle states stay consistent across surfaces.
  • Chat cache-fit warning VRAM-aware on CUDA hosts.
  • Warn when CPU-only torch is detected on a CUDA host. Raised chat default maxTokens to 4096.
  • Fixed Studio cache preview returning 0 GB on chat model selection.
  • Hide MLX-only catalog variants on non-Apple platforms.
  • Qwen 3.6 catalog entry.

Test infrastructure

  • backend_service/runtime_paths.py — append extras to sys.path instead of insert(1, ...). Repo-local adapter shims (notably turboquant_mlx) keep import authority across pytest, dev .venv, and Tauri-bundled launches. This also fixes a latent runtime bug that masked the shim's adapter hooks after Setup → Install turboquant-mlx-full on the desktop app.

Bundles below: macOS aarch64 dmg + Linux x64 AppImage / deb + Windows x64 setup.exe. latest.json for Tauri auto-updater.

ChaosEngineAI v0.7.2

02 May 22:28
bc5c3b0


ChaosEngineAI v0.7.2 packages everything that landed since v0.5.2 — 133 commits over two weeks. It is the first stable release of the video generation line, alongside major work on diffusion cache compression, the Apple Silicon mlx-video runtime, the Windows installer pipeline, and a brand-new in-app Diagnostics panel. Includes ten post-tag smoke-test fixes (PRs #21–#30) that landed before the final rebuild.

Highlights

  • Video Studio — full video generation tab with model picker, prompt input, runtime status overlay, real-time progress, and cancellable generation.
  • Multi-engine video catalog — Wan 2.1 T2V (1.3B / 14B), Wan 2.2, Lightricks LTX-Video 2.0/2.0-distilled/2.3/2.3-distilled, HunyuanVideo, CogVideoX, Mochi, plus FLUX.1 for image.
  • TeaCache diffusion cache compression — five DiT families wired (FLUX, HunyuanVideo, LTX-Video, CogVideoX, Mochi) with per-model rescale coefficients.
  • In-app Diagnostics panel under Settings, with per-section error fallback and platform-aware repair actions.
  • Windows installer pipeline stabilised — embedded Python sidecar + llama-server now ship in the NSIS bundle; PowerShell 5.1 build path hardened; CUDA torch DLL lock fixed; GGUF downloads scoped to lightweight base + selected quant.
  • macOS Python framework switched to python-build-standalone (Astral) for reliable framework relocation.

Smoke-test fixes (post-tag, in this rebuild)

  • #21 — MLX-only catalog variants (FLUX.1 Dev · mflux, LTX-2 MLX entries) hidden on Linux / Windows where mlx is not installable. New backend_service/helpers/platform_filter.py + 15 unit tests.
  • #22 — Windows CUDA detection no longer reports system RAM as VRAM. Reads torch.cuda.get_device_properties(0).total_memory first; falls back to nvidia-smi; returns vram_total_gb=None when neither answers. Image-runtime probe calls importlib.invalidate_caches() so newly-installed GPU bundle packages are picked up without a full backend restart.
  • #23 — Tauri NSIS installer hook (src-tauri/installer.nsh) documents the contract that %LOCALAPPDATA%\ChaosEngineAI\extras\cp{maj}{min}\site-packages must survive uninstall + reinstall. InstallLogPanel CSS establishes a stacking context so the streaming pip output renders above the Prompt + Recent Outputs cards on Windows during a long GPU bundle install.
  • #24 — CI test reliability: imageDiscoverMemoryEstimate describe block pins navigator.userAgentData.platform to macOS so the MPS-calibrated expectations pass on Linux runners.
  • #25 — torch.cuda probe now runs in a short-lived subprocess so the backend never holds locks on torch/lib/*.dll. Without this, pip's --target install of a fresh torch fails with [WinError 5] Access is denied. InstallLogPanel background switched to opaque var(--surface) + contain: layout so it can never visually overlap sibling Prompt / Recent Outputs cards.
  • #26 — test_gpu.py accepts vram_total_gb=None as the no-GPU response (the orphaned PR #24 fix landed directly).
  • #27 / #28 — Tauri 2.10.3 → 2.11.0; tauri-build 2.5.6 → 2.6.0 (Dependabot Cargo bumps + Windows package version alignment).
  • #29 — HunyuanVideo NF4 + 1280×720 × 33 frames CUDA estimate no longer crosses the danger ratio. Discount sliced attention to 60% of the dense fp16 8-slab estimate when CUDA + runtime override are present (real fp8-KV / attention-slicing behaviour).
  • #30 — test_legacy_tauri_app_data_extras_are_import_candidates patches sys.platform = "win32" so the Linux CI runner exercises the LOCALAPPDATA branch.
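The #22 probe order is worth sketching: try torch.cuda first, fall back to nvidia-smi, and report None (not zero) when neither answers. The probes are injected as callables here so the logic is testable without a GPU; the real code reads torch.cuda.get_device_properties(0).total_memory and parses nvidia-smi output.

```python
def detect_vram_gb(torch_probe, smi_probe):
    """Return total VRAM in GB, or None when no probe can answer.

    torch_probe / smi_probe: callables returning total bytes, or raising /
    returning a falsy value when unavailable (injection point for tests).
    """
    for probe in (torch_probe, smi_probe):
        try:
            total_bytes = probe()
        except Exception:
            continue
        if total_bytes:
            return round(total_bytes / (1024 ** 3), 1)
    return None  # no GPU: callers must treat None as "unknown", never as 0
```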

Video Generation

New: Video Studio

  • New Video Studio tab with model picker, prompt input, runtime status overlay, and live progress.
  • Cancellable image and video generation across the app.
  • Async library scan with persisted cache (library_cache.json) — startup no longer blocks on filesystem walks.
  • Tooltip portal — info hovers now render via a top-level portal with viewport-clamped placement.
  • Platform-aware mlx-video reinstall button in Diagnostics.
  • Auto-recover Video Studio after a backend sidecar crash.
  • Variant-specific video download status (sibling Q4/Q6/Q8 rows no longer marked active just because they share repos).
  • Cleaned-up Video Studio runtime action layout; GPU runtime installs now recover from CPU-only torch wheels.

New: Video catalog

  • Wan 2.1 T2V 1.3B / 14B (with Wan 2.2 catalog fix).
  • Lightricks LTX-Video 2.0 / 2.0-distilled / 2.3 / 2.3-distilled (via mlx-video on Apple Silicon).
  • HunyuanVideo.
  • CogVideoX.
  • Mochi.
  • FLUX.1 (image).

New: Apple Silicon mlx-video LTX-2 engine

  • LTX-2 (prince-canuma/LTX-2-{distilled,dev,2.3-distilled,2.3-dev}) routed through a subprocess engine — backend_service/mlx_video_runtime.py.
  • Spatial upscaler resolver, distilled-pipeline accounting, dist-info cleanup.
  • Real LTX-2 module path + corrected CLI flags.
  • LTX-2 MLX download false-positive fix + LTX prompt-length hint.

New: stable-diffusion.cpp engine scaffold (cross-platform)

New: LongLive scaffold (CUDA only)

  • Real-time causal long video generation for Wan 2.1 T2V 1.3B.
  • Install path — collapsible terminal panel for streaming progress, hang fix on Windows.
  • Discover Install CTA in Video Studio.

Quality + safety

  • Phase E1 — auto-enhance short video prompts with model-tuned suffixes.
  • Phase E2 — CFG decay schedule + extend prompt enhancer to LTX-2 family.
  • Three Phase E2 regressions caught on real Mac runs.
  • Tuned video safety estimator + LTX-2 subprocess error visibility.
  • Improved video gen quality across LTX / Wan / HunyuanVideo + new engine scaffolds.
  • Scale video gen safety by device memory and show Studio capacity.
  • Detect corrupt diffusers snapshots, add output-folder pickers, MPS-safe defaults.
  • In-app install for mp4 encoder deps; isolated video output dir in test harness.
  • HunyuanVideo NF4 + 1280×720 × 33 frames now lands on caution rather than danger on a 4090; long-clip danger warnings still trigger for genuinely risky configs.

Cache compression for diffusion DiTs (TeaCache)

  • TeaCache integration with vendored teacache_forward patches under cache_compression/_teacache_patches/.
  • Five model families wired: FLUX, HunyuanVideo, LTX-Video, CogVideoX, Mochi.
  • Per-model rescale coefficients pulled from upstream calibration tables.
  • Quality knob rel_l1_thresh (default 0.4).
  • TeaCache strategies are filtered out of the LLM RuntimeControls picker via the appliesTo domain field.
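The rel_l1_thresh knob governs a running skip decision: accumulate the rescaled relative-L1 change of the timestep-modulated input, reuse the cached residual while the total stays under the threshold, and recompute (resetting the accumulator) once it crosses. The toy below uses a stand-in linear rescale where TeaCache applies per-model polynomial calibration coefficients.

```python
class TeaCacheGate:
    """Toy TeaCache skip rule; coeffs stand in for the calibrated polynomial."""

    def __init__(self, rel_l1_thresh: float = 0.4, coeffs=(1.0, 0.0)):
        self.thresh = rel_l1_thresh
        self.coeffs = coeffs          # stand-in linear rescale a*x + b
        self.acc = 0.0
        self.prev = None

    def should_skip(self, modulated_inp) -> bool:
        """True when the transformer forward can reuse the cached residual."""
        if self.prev is None:
            self.prev = modulated_inp
            return False              # always compute the first step
        num = sum(abs(x - y) for x, y in zip(modulated_inp, self.prev))
        den = sum(abs(y) for y in self.prev) or 1.0
        a, b = self.coeffs
        self.acc += a * (num / den) + b
        self.prev = modulated_inp
        if self.acc >= self.thresh:
            self.acc = 0.0
            return False              # drift built up: recompute this step
        return True                   # still close: skip the forward pass
```

Raising rel_l1_thresh trades quality for more skipped steps, which is why it is surfaced as the single quality knob.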

Memory + warm pool

  • Memory-budgeted warm pool + library-only chat picker.
  • Improved model catalog UX and memory estimates.
  • Base LLM library RAM estimate on real on-disk size.
  • Include model footprint in video gen memory estimate.
  • Fix TurboQuant install detection + warm pool memory guard.
  • Show release dates and accurate on-disk sizes across model listings.

Diagnostics + Settings UI

  • New in-app Diagnostics panel under Settings with per-section error fallback.
  • Reject load_model for models not on disk; disable chat Send until loaded.
  • Rename dashboard engine label from "No backend" to "Idle".
  • Split Settings page into logical sections with sub-navigation.
  • Drop the side-menu Settings layout, always use the tab bar.
  • Redesign the Storage section and surface resolved paths.
  • Diagnostics: catch find_spec on missing namespace + scroll body.
  • Fix Diagnostics panel scroll on tall viewports + long path wrapping.
  • Tabs-mode sidebar, SubtabBar, and Settings → Appearance toggle.
  • Group sidebar tabs into Models / Images / Benchmarks / Tools with SVG icons.

Windows pipeline

  • Ship embedded Python + llama-server in the installer — no source checkout needed at first launch.
  • Fix Windows restart deadlock.
  • PowerShell 5.1 hardening — strip non-ASCII, drop &&, avoid backslash-before-paren, strip apostrophes / ampersands.
  • Stop the video runtime probe timing out on Windows.
  • Stabilise the Windows dev/build loop and surface CUDA-vs-CPU torch detection.
  • Unblock Windows video / image studios during first-boot torch import.
  • Warn on CPU fallback and stop Video Studio stalling on Windows.
  • Stop build.ps1 dying on git checkout's success message.
  • Fix LongLive install on Windows.
  • Real CUDA VRAM detection via torch.cuda (was reporting 12 GB on a 24 GB RTX 4090).
  • Image-runtime probe re-checks importable packages after a GPU bundle install.
  • CUDA torch clobber + DLL lock during GPU bundle install fixed (subprocess probe).
  • Persist installed runtime path across backend restarts; auto-restart when required.
  • Logical install package counts reported correctly.
  • GGUF video downloads fetch only the lightweight base pipeline + the selected quantized transformer file.
  • Windows test + runtime edge cases for directory sizing, library fingerprints, path formatting, optional GGUF import failures.
  • Tauri 2.10.3 → 2.11.0; tauri-build 2.5.6 → 2.6.0 alignment.

macOS Python framework

  • Switched to python-build-standalone (Astral) so @loader_path / @rpath references are baked in — avoids the install_name_tool regression with Xcode 16.4 + actions/setup-python that previously crashed at launch with Library not loaded: Python.framework.
  • Persistent extras dir namespaced by Python ABI ...

ChaosEngineAI v0.5.2

18 Apr 16:31


ChaosEngineAI v0.5.2 packages everything that landed since the first v0.5.0 release, with a focus on stability, inference improvements, release hardening, and cleaner operator-facing behavior.

Highlights

  • Improved backend and engine reliability with fixes for backend load failures, auth token cache poisoning, reasoning split handling, MLX profile application, DFlash resolution, engine issues, and stale orphaned-worker notifications.
  • Expanded inference functionality with a CLI inference runner, broader inference and model updates, benchmark model-selection and caching fixes, and additional inference test coverage.
  • Hardened release and tooling workflows with cross-platform updater artifact fixes, stronger sidecar/local auth surfaces, Image Studio installed-model filtering, manual-dispatch CI build gating, documentation/discovery updates, and release workflow improvements.

Commit Summary Since v0.5.0

  • 1d50fb5 Initial release.
  • 0e63748 Fixed reasoning split, MLX profile application, and DFlash resolution.
  • 245cddf Applied broader binary-related fixes and cleanup.
  • 6f0a158 Fixed engine issues.
  • 8138b18 Added inference tests.
  • b030283 Added the CLI inference runner.
  • b39c0bf Improved inference flows and related tests.
  • e9039fc Updated models.
  • b70a10b Fixed test modules.
  • f3d2c3e Fixed benchmark model selection, caching strategies, and DFlash behavior.
  • d4c00b8 Improved discovery behavior and updated the README.
  • db54bd4 Addressed the Copilot licensing/auth error path.
  • 7eef544 Hardened sidecar auth and local tool surfaces.
  • a189863 Hardened the release workflow during the v0.5.1 bump.
  • a9d619a Fixed cross-platform updater release artifacts.
  • 3100386 Bumped to v0.5.2 and fixed backend load failures plus auth token cache poisoning.
  • 78a729c Limited Image Studio to installed models and made the CI build manual-dispatch only.
  • d041efa Stopped stale or false-positive orphaned-worker notifications.

ChaosEngineAI Launch v0.5.0

15 Apr 19:21


Welcome to ChaosEngineAI