Add continuous batching support for Qwen3 TTS to the server#674

Open
lucasnewman wants to merge 2 commits into Blaizzy:main from lucasnewman:qwen3-continuous-batching
Conversation

Collaborator

@lucasnewman lucasnewman commented Apr 23, 2026

This implements vLLM Omni-style stepwise continuous batching for the Qwen3 TTS model (currently the only model with batch support). I get good parallelism with 8 concurrent requests across 30 requests sent 300ms apart on my M5 Max:

Throughput
  Requests: 30/30 succeeded
  Client wall time: 21.70s
  Service window: 21.70s
  Requests/sec: 1.38
  Input chars/sec: 115.0
  Response bytes/sec: 378.02 KiB/s
  Audio throughput: 8.06x realtime (174.96s audio)
  End-to-end RTF: 0.12
  Parallelism factor: 7.07x

Latency
  min/mean/median/p95/max: 2.94s / 5.11s / 5.30s / 5.99s / 6.11s

Audio
  Decoded files: 30
  Sample rates: 24000
  Total samples: 4199040

It currently doesn't support streaming since the patch is already somewhat large and I want to stage in the complexity, but I'll look at both that and adding batch support for other models as follow-ons.
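For reviewers unfamiliar with the approach: in stepwise continuous batching, the server advances all active requests by one decode step per scheduler iteration, admitting newly queued requests into the batch between steps and retiring finished ones, rather than draining a fixed batch before accepting new work. Here's a minimal sketch of that scheduling loop; all names (`StepwiseBatcher`, `Request`, etc.) are illustrative and not the actual API in this patch, and the model forward pass is stubbed out:

```python
# Illustrative sketch of stepwise continuous batching; names are
# hypothetical and the decode step is stubbed with a placeholder token.
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    rid: int
    steps_needed: int                 # decode steps until this request finishes
    tokens: list = field(default_factory=list)


class StepwiseBatcher:
    def __init__(self, max_batch: int):
        self.max_batch = max_batch
        self.queue: deque = deque()   # requests waiting for a batch slot
        self.active: list = []        # requests currently being decoded
        self.finished: list = []

    def submit(self, req: Request):
        self.queue.append(req)

    def step(self):
        # Admit queued requests up to the concurrency cap, so new work
        # joins mid-generation instead of waiting for the batch to drain.
        while self.queue and len(self.active) < self.max_batch:
            self.active.append(self.queue.popleft())
        # One forward pass over the whole active batch (stubbed here as
        # appending a placeholder token per request).
        for req in self.active:
            req.tokens.append(len(req.tokens))
        # Retire requests that just produced their final step.
        self.active, done = (
            [r for r in self.active if len(r.tokens) < r.steps_needed],
            [r for r in self.active if len(r.tokens) >= r.steps_needed],
        )
        self.finished.extend(done)

    def run(self):
        while self.queue or self.active:
            self.step()


batcher = StepwiseBatcher(max_batch=8)
for i in range(30):
    batcher.submit(Request(rid=i, steps_needed=5 + (i % 3)))
batcher.run()
print(len(batcher.finished))  # → 30
```

The key property is that admission happens inside `step()`, so a request arriving while others are mid-generation starts decoding on the very next scheduler iteration, which is where the ~7x parallelism factor above comes from.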

@lucasnewman lucasnewman requested a review from Blaizzy April 23, 2026 23:26
