docs(qwen3-tts): add real-time streaming tuning guide#646

Draft
akaiHuang wants to merge 2 commits into Blaizzy:main from akaiHuang:codex/qwen3-tts-realtime-tuning-docs

Conversation


@akaiHuang akaiHuang commented Apr 11, 2026

Summary

Add a practical real-time tuning section for Qwen3-TTS streaming in the model docs.

What this adds

  • Explains the streaming_interval = chunk_tokens / 12.5 relation, derived from the Qwen3-TTS codec rate of 12.5 tokens per second.
  • Adds a small tuning matrix (chunk_tokens = 2/4/6/8) covering the trade-off between latency and smoothness.
  • Recommends stream=True, streaming_interval=0.32 as a pragmatic default for real-time usage, with guidance to move to 0.48 when smoother chunks matter more than latency.
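The relation above is plain arithmetic; a minimal sketch (the 12.5 tokens/s codec rate is the figure stated in this PR, and the chunk sizes are the ones in the tuning matrix):

```python
# Qwen3-TTS codec rate as documented in this guide: 12.5 codec tokens per second.
CODEC_TOKENS_PER_SEC = 12.5

def streaming_interval(chunk_tokens: int) -> float:
    """Seconds of audio covered by each streamed chunk of `chunk_tokens` tokens."""
    return chunk_tokens / CODEC_TOKENS_PER_SEC

# Reproduce the tuning matrix rows: smaller chunks = lower first-chunk
# latency, larger chunks = smoother playback.
for chunk in (2, 4, 6, 8):
    print(f"chunk_tokens={chunk} -> streaming_interval={streaming_interval(chunk):.2f}s")
```

Note how chunk_tokens=4 and chunk_tokens=6 land on the 0.32 s and 0.48 s values recommended in the guide.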

Why

Users deploying live translation or assistant pipelines often need concrete first-pass values rather than API descriptions alone. This doc update shortens trial-and-error and sets clear expectations for the trade-off between chunk boundaries and latency.

Scope

Docs-only change. No runtime behavior changes.

@akaiHuang (Author)

Hi @Prince_Canuma @Blaizzy,

I've added a practical real-time streaming tuning guide for Qwen3-TTS in the documentation.

This is based on extensive benchmarking on an M1 Max (4-bit quantization). It covers:

  • How to tune chunk_tokens / streaming_interval
  • The trade-off between first-chunk latency and audio smoothness
  • Recommended settings for real-time use cases (live translation, voice agents, P2P translator, etc.)

The guide turns the previous ~4.31 s first-chunk latency into a tunable 0.317 s–0.546 s range.
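A sketch of how first-chunk latency might be measured when validating these settings. Note that `generate_stream` is a hypothetical stand-in, not the actual mlx-audio API; substitute the real streaming entry point and parameters from the docs:

```python
import time

def generate_stream(text: str, streaming_interval: float = 0.32):
    """HYPOTHETICAL placeholder for a streaming TTS call.

    Replace the body with the real Qwen3-TTS streaming invocation;
    here it just simulates chunk production at the streaming interval.
    """
    for _ in range(3):
        time.sleep(streaming_interval)  # stand-in for model generation time
        yield b"\x00" * 1024            # stand-in for an audio chunk

start = time.perf_counter()
first_chunk_latency = None
for i, chunk in enumerate(generate_stream("Hello, world", streaming_interval=0.32)):
    if i == 0:
        # Time from the call until the first audio chunk arrives.
        first_chunk_latency = time.perf_counter() - start
        print(f"first chunk after {first_chunk_latency:.3f}s")
```

In a real pipeline the loop body would hand each chunk to the audio sink; the first-iteration timestamp is the latency figure the tuning matrix trades against smoothness.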

Would love your feedback on whether this is useful and if any adjustments are needed before marking it ready for review.

Thanks!

