
Optimize ACE-Step: NLC VAE, compiled decode, LoRA support#498

Open
fspecii wants to merge 1 commit into Blaizzy:pc/add-ace from fspecii:pc/add-ace

Conversation


@fspecii fspecii commented Feb 15, 2026

Summary

  • VAE rewrite to NLC format: Replace WeightNormConv1d/WeightNormConvTranspose1d with nn.Conv1d and FastConvTranspose1d. Weight-norm parameters (weight_g, weight_v) are fused into regular weights at load time via sanitize(), eliminating per-forward-pass normalization overhead.
  • Compiled VAE decode: mx.compile(model.vae.decode) with auto-conversion to mlx_weights.safetensors on first load (skips PT→MLX conversion on subsequent runs).
  • mx.fast.scaled_dot_product_attention: Replaces manual attention (matmul → mask → softmax → matmul) with MLX's fused kernel.
  • Simplified turbo diffusion: Single-pass inference without CFG/APG (turbo model was distilled without guidance). Removes ~80 lines of unused guidance code.
  • LoRA adapter support: load_lora() / unload_lora() with weight fusion (W + scale * (alpha/r) * B @ A), base weight backup/restore for hot-swapping adapters.
  • Quantized 5Hz LM variants: Added 0.6B-8bit and 0.6B-4bit model IDs for lower-memory language model inference.
  • Music metadata: bpm, keyscale, timesignature parameters forwarded to prompt formatting.
  • Model loading: custom_loading class attribute + acestep remapping for clean integration with mlx_audio.utils.base_load_model.
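
The weight-norm fusion described above is just precomputing W = g * v / ||v|| once at load time. A minimal NumPy sketch of that math (the function name `fuse_weight_norm` is illustrative, not the PR's actual `sanitize()` implementation):

```python
import numpy as np

def fuse_weight_norm(weight_g, weight_v):
    """Fuse weight-norm parameters into a plain conv weight: W = g * v / ||v||.

    The norm is taken per output channel (axis 0) over all remaining axes,
    matching PyTorch's weight_norm with dim=0. After fusion, the layer can
    use a regular nn.Conv1d weight with no per-forward normalization.
    """
    out_channels = weight_v.shape[0]
    # Per-output-channel L2 norm of v.
    norm = np.linalg.norm(weight_v.reshape(out_channels, -1), axis=1)
    # Broadcast g and the norm back over the kernel / in-channel axes.
    shape = (out_channels,) + (1,) * (weight_v.ndim - 1)
    return weight_g.reshape(shape) * weight_v / norm.reshape(shape)
```

A useful sanity check: the per-channel norm of the fused weight equals |g|, since v / ||v|| is unit-norm per channel.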
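
For context on the attention change: the manual path being replaced is the standard matmul → mask → softmax → matmul chain, which MLX's fused `mx.fast.scaled_dot_product_attention` kernel computes in one call. A reference sketch of that chain in NumPy:

```python
import numpy as np

def manual_attention(q, k, v, mask=None):
    """Reference attention: softmax(q @ k^T / sqrt(d) + mask) @ v.

    This is the unfused chain the PR replaces; a fused kernel avoids
    materializing the full (seq, seq) score matrix in separate passes.
    """
    d = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d)
    if mask is not None:
        scores = scores + mask
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v
```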
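
The LoRA hot-swap scheme can be sketched as follows; this is an illustrative NumPy mock of the fuse/backup/restore cycle, not the PR's actual `load_lora()` / `unload_lora()` code:

```python
import numpy as np

class LoraFusable:
    """Minimal sketch of fuse-on-load LoRA with hot-swapping.

    load_lora fuses W' = W + scale * (alpha / r) * B @ A into the base
    weight and keeps a backup of W, so unload_lora can restore the base
    weight before a different adapter is fused in.
    """

    def __init__(self, weight):
        self.weight = weight
        self._backup = None

    def load_lora(self, lora_a, lora_b, alpha, scale=1.0):
        r = lora_a.shape[0]  # LoRA rank: A is (r, in), B is (out, r)
        self._backup = self.weight.copy()
        self.weight = self.weight + scale * (alpha / r) * (lora_b @ lora_a)

    def unload_lora(self):
        if self._backup is not None:
            self.weight = self._backup
            self._backup = None
```

Fusing at load time keeps inference cost identical to the base model, at the price of a one-time weight copy per adapter swap.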

Test plan

  • Generate 30s instrumental with --model ACE-Step/ACE-Step1.5 — verified output WAV
  • Generate 30s track with lyrics and vocal_language param
  • Generate with bpm/keyscale/timesignature metadata params
  • Verify mlx_audio.tts.load() API path works with custom_loading
  • Generate 60s track — verified correct duration and stereo output
  • Verify other TTS models (Kokoro, Spark) unaffected by utils changes
  • Test with LoRA adapter loading/unloading
  • Test quantized LM variants (0.6B-8bit, 0.6B-4bit)
  • Test audio-to-audio tasks (cover, extract, complete)

- Rewrite VAE to native NLC format with nn.Conv1d and FastConvTranspose1d,
  fusing weight-norm (g*v/||v||) at load time instead of every forward pass
- Replace manual attention with mx.fast.scaled_dot_product_attention
- Simplify turbo diffusion to single-pass (no CFG/APG) matching upstream behavior
- Add compiled VAE decode (mx.compile) with auto-conversion to mlx_weights.safetensors
- Add LoRA adapter support (load/unload with weight fusion)
- Add quantized 5Hz LM variants (0.6B-8bit, 0.6B-4bit)
- Add music metadata params (bpm, keyscale, timesignature)
- Add acestep model remapping and custom_loading support in utils
@lucasnewman
Collaborator

I don't see the weight-norm conv changes; did those get excluded? Also, it looks like you removed CFG entirely. Was it not needed even for existing models? If you could explain the intent and goal of the changes rather than just the mechanical pieces, that would be helpful context.

@Blaizzy
Owner

Blaizzy commented Feb 23, 2026

@lucasnewman

This PR will be closed in favour of #499.

I will add @fspecii as a contributor there.

