feat(tts): Add batched ICL generation to improve speed for Qwen3-TTS #644
Draft
HFrost0 wants to merge 2 commits into Blaizzy:main from
Conversation
…arallel reference audio encoding and batched input embedding construction.
Context
This PR addresses slow generation speeds when doing voice cloning (ICL) on long texts for Qwen3-TTS.
Previously, handling long text meant choosing between two bad workarounds: calling `generate` sequentially for each chunk avoided OOMs, but was extremely slow because the heavily-parameterized reference audio and speaker embeddings had to be recalculated for every chunk.

Description
This PR introduces batched ICL generation to fix these issues natively:
The text is now split internally via `split_pattern` and evaluated in parallel. The reference audio features (`ICLSharedEmbeddings`) are computed exactly once and shared across the batch using left-padding. Each segment's decode follows the same ref-code-prepend + proportional-trim approach as single-segment ICL, ensuring consistent acoustic quality. This completely removes the redundant decoding overhead and provides a massive speedup.

Changes in the codebase
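The shared-reference, left-padded batch construction can be sketched roughly as follows. This is a minimal illustration with invented pad id and helper names, not the PR's actual code:

```python
# Hypothetical sketch of left-padded batch construction: each segment's
# token sequence is prepended with pad tokens so all rows share one length,
# and the shared reference (ICL) tokens are built once and reused across
# the batch instead of being recomputed per segment.

PAD = 0  # hypothetical pad token id


def left_pad_batch(segments: list[list[int]], pad: int = PAD):
    """Left-pad variable-length token lists into a rectangular batch."""
    max_len = max(len(s) for s in segments)
    batch, mask = [], []
    for s in segments:
        n_pad = max_len - len(s)
        batch.append([pad] * n_pad + s)          # pad on the LEFT
        mask.append([0] * n_pad + [1] * len(s))  # 1 marks real tokens
    return batch, mask


def build_icl_inputs(ref_tokens: list[int], segments: list[list[int]]):
    """Prepend the shared reference tokens once per row, then left-pad."""
    rows = [ref_tokens + seg for seg in segments]  # ref built once, reused
    return left_pad_batch(rows)


batch, mask = build_icl_inputs([9, 9], [[1, 2, 3], [4, 5]])
print(batch)  # → [[9, 9, 1, 2, 3], [0, 9, 9, 4, 5]]
```

Left-padding (rather than right-padding) keeps every row's last real token aligned at the end of the sequence, which is the convenient layout for autoregressive decoding.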
- Added `_prepare_icl_shared_context` and created `ICLSharedEmbeddings` to manage shared parameters.
- Added `_batch_generate_icl` for executing parallel batched generations.
- Updated `generate()` to split text and route multi-segment ICL to the batch path, while preserving single-segment streaming support.

Changes outside the codebase
Additional information - Benchmark
A quick speed test on an M3 Max doing voice cloning for an 8-segment long text (~225 Chinese characters):
Speedup: 3.8× (results are M3 Max specific; other Apple Silicon may vary)
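The speedup comes from amortizing the reference encoding across segments. A toy cost model (timings invented for illustration, not measured from this PR) shows the shape of the effect:

```python
# Toy cost model (illustrative only) of why batching helps: sequential ICL
# re-encodes the reference audio for every chunk, batched ICL encodes once.
REF_COST = 7.0   # hypothetical seconds to encode the reference audio
SEG_COST = 1.0   # hypothetical seconds to decode one text segment


def sequential_time(n_segments: int) -> float:
    # reference re-encoded for every segment
    return n_segments * (REF_COST + SEG_COST)


def batched_time(n_segments: int) -> float:
    # reference encoded once, shared across the batch
    return REF_COST + n_segments * SEG_COST


n = 8  # same segment count as the benchmark above
print(round(sequential_time(n) / batched_time(n), 1))  # → 4.3
```

With these invented costs the ratio is 4.3×; as the segment count grows, it approaches (REF_COST + SEG_COST) / SEG_COST, so the heavier the reference encoding relative to per-segment decoding, the larger the win.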
On `main`, the ICL path ignores `split_pattern` entirely: it always passes the full text to `_generate_icl` as a single segment. This PR changes that: ICL now respects `split_pattern` (default `"\n"`), which means:

- If the input text contains `\n`, it will be split and processed in batch.
- Users who previously received a single `GenerationResult` will now receive N results. Downstream code that assumes a single result may need updating.
- If `stream=True` is combined with multi-segment ICL, streaming is not supported. A warning is emitted, but the user gets non-streaming batch output instead of the real-time chunks they might expect.

Options to discuss:
1. Keep the new behavior: always split on `split_pattern`. Simple, consistent, and fast, but a breaking change for users with newlines in text.
2. Make batching opt-in (treat `split_pattern` as non-applicable for ICL unless some flag is set). Fully backward compatible but requires an API change to opt into batching.
3. When `stream=True` with multiple segments, fall back to sequential `_generate_icl` per segment (preserves streaming, loses batch speedup). Batch only when `stream=False`.

Happy to adjust based on your preference.
Checklist