Add continuous batching support for Qwen3 TTS to the server#674

Open
lucasnewman wants to merge 2 commits into Blaizzy:main from lucasnewman:qwen3-continuous-batching
Conversation

Collaborator

@lucasnewman lucasnewman commented Apr 23, 2026

This implements vLLM Omni-style stepwise continuous batching for the Qwen3 TTS model (currently the only model with batch support). I get good parallelism with 8 concurrent requests across 30 requests sent 300ms apart on my M5 Max:

Throughput
  Requests: 30/30 succeeded
  Client wall time: 21.70s
  Service window: 21.70s
  Requests/sec: 1.38
  Input chars/sec: 115.0
  Response bytes/sec: 378.02 KiB/s
  Audio throughput: 8.06x realtime (174.96s audio)
  End-to-end RTF: 0.12
  Parallelism factor: 7.07x

Latency
  min/mean/median/p95/max: 2.94s / 5.11s / 5.30s / 5.99s / 6.11s

Audio
  Decoded files: 30
  Sample rates: 24000
  Total samples: 4199040

It currently doesn't support streaming since the patch is already somewhat large and I want to stage in the complexity, but I'll look at both that and adding batch support for other models as follow-ons.
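For reviewers unfamiliar with the approach: in stepwise continuous batching, the server advances all active requests by one decode step per scheduler iteration, admitting newly queued requests into the batch between steps and retiring finished ones, rather than draining a fixed batch before accepting new work. Here's a minimal sketch of that scheduling loop; all names (`StepwiseBatcher`, `Request`, etc.) are illustrative and not the actual API in this patch, and the model forward pass is stubbed out:

```python
# Illustrative sketch of stepwise continuous batching; names are
# hypothetical and the decode step is stubbed with a placeholder token.
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    rid: int
    steps_needed: int                 # decode steps until this request finishes
    tokens: list = field(default_factory=list)


class StepwiseBatcher:
    def __init__(self, max_batch: int):
        self.max_batch = max_batch
        self.queue: deque = deque()   # requests waiting for a batch slot
        self.active: list = []        # requests currently being decoded
        self.finished: list = []

    def submit(self, req: Request):
        self.queue.append(req)

    def step(self):
        # Admit queued requests up to the concurrency cap, so new work
        # joins mid-generation instead of waiting for the batch to drain.
        while self.queue and len(self.active) < self.max_batch:
            self.active.append(self.queue.popleft())
        # One forward pass over the whole active batch (stubbed here as
        # appending a placeholder token per request).
        for req in self.active:
            req.tokens.append(len(req.tokens))
        # Retire requests that just produced their final step.
        self.active, done = (
            [r for r in self.active if len(r.tokens) < r.steps_needed],
            [r for r in self.active if len(r.tokens) >= r.steps_needed],
        )
        self.finished.extend(done)

    def run(self):
        while self.queue or self.active:
            self.step()


batcher = StepwiseBatcher(max_batch=8)
for i in range(30):
    batcher.submit(Request(rid=i, steps_needed=5 + (i % 3)))
batcher.run()
print(len(batcher.finished))  # → 30
```

The key property is that admission happens inside `step()`, so a request arriving while others are mid-generation starts decoding on the very next scheduler iteration, which is where the ~7x parallelism factor above comes from.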

@lucasnewman lucasnewman requested a review from Blaizzy April 23, 2026 23:26
