Stream TTS audio playback for immediate speech output by alexkroman · Pull Request #221 · AssemblyAI/cli

alexkroman · 2026-06-17T15:19:18Z

Enable incremental audio playback during text-to-speech synthesis instead of buffering the entire clip before playing. This allows speech to start on the first audio frame rather than waiting for synthesis to complete, significantly improving perceived latency for long-form content.

Changes

New PcmPlayer class (aai_cli/tts/audio.py): A context-manager-based incremental PCM player that opens the audio device lazily on the first feed() call and reuses it for all subsequent chunks. Handles graceful shutdown (drain on normal exit, abort on error/Ctrl-C) and wraps device failures in clean CLIError messages. Includes comprehensive docstring explaining streaming behavior and error handling.
Refactored play_pcm() function: Simplified to a thin convenience wrapper over PcmPlayer for callers that already hold the complete PCM buffer (multi-voice dialogue path). Updated docstring to clarify its role.
Streaming callback in session.synthesize() (aai_cli/tts/session.py): Added optional on_audio(chunk, sample_rate) callback parameter that receives each decoded Audio frame as it arrives from the server. New _consume_audio_frame() helper handles frame decoding, buffering, and callback invocation. Full PCM is still accumulated and returned for compatibility.
Single-voice synthesis streaming (aai_cli/commands/speak/_exec.py): When no --out file is specified, _speak_single() now wires PcmPlayer.feed as the on_audio callback, enabling immediate playback. Buffered playback via play_pcm() is preserved for the --out path (file output).
Test coverage: Added comprehensive tests for PcmPlayer behavior (device lifecycle, chunk handling, error cases, no-op when unused) and synthesize() streaming callback. Updated speak command tests to verify streaming playback via the callback mechanism.
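
The PcmPlayer lifecycle described above (lazy open on the first feed(), drain on normal exit, abort on error or Ctrl-C, no-op when unused) can be sketched roughly as follows. This is a minimal illustration, not the code from aai_cli/tts/audio.py: `SketchPlayer` and `FakeDevice` are hypothetical names, and the in-memory fake stands in for whatever audio backend the CLI actually uses, so the sketch runs without sound hardware.

```python
from typing import Callable, List, Optional


class FakeDevice:
    """Hypothetical stand-in for an audio output stream; it only records calls."""

    def __init__(self, sample_rate: int) -> None:
        self.sample_rate = sample_rate
        self.written: List[bytes] = []
        self.closed_with: Optional[str] = None

    def write(self, data: bytes) -> None:
        self.written.append(data)

    def drain(self) -> None:
        # Normal shutdown: a real device would play out its buffered frames here.
        self.closed_with = "drain"

    def abort(self) -> None:
        # Immediate stop: a real device would discard its buffered frames here.
        self.closed_with = "abort"


class SketchPlayer:
    """Illustrative incremental player: opens lazily, drains or aborts on exit."""

    def __init__(self, open_device: Callable[[int], FakeDevice] = FakeDevice) -> None:
        self._open_device = open_device
        self.device: Optional[FakeDevice] = None

    def __enter__(self) -> "SketchPlayer":
        # Nothing is opened yet; the sample rate is not known at this point.
        return self

    def feed(self, chunk: bytes, sample_rate: int) -> None:
        # Open the device on the first chunk, then reuse it for every later chunk.
        if self.device is None:
            self.device = self._open_device(sample_rate)
        self.device.write(chunk)

    def __exit__(self, exc_type, exc, tb) -> bool:
        if self.device is not None:
            if exc_type is None:
                self.device.drain()
            else:
                self.device.abort()
        # Never swallow the exception (KeyboardInterrupt included).
        return False
```

Because feed() takes (chunk, sample_rate), a bound `player.feed` has the same shape as the on_audio callback, which is what lets the single-voice path pass it straight through.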

Implementation details

Audio is written in bounded chunks (_PLAYBACK_CHUNK_BYTES) so Ctrl-C interrupts promptly between writes rather than blocking on a large write.
Device opening is deferred until the first feed() call, allowing the sample rate from the server's Begin frame to be used.
On error or Ctrl-C, the stream is aborted (buffered frames discarded) for immediate stop; on normal exit it drains gracefully.
The on_audio callback is optional and additive—omitting it preserves the existing buffered behavior for backward compatibility.
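
To illustrate the bounded-write point above: splitting a buffer into fixed-size pieces gives the interpreter a chance to deliver KeyboardInterrupt between writes. The sketch below is an assumption-laden example, not the PR's code. The 4096-byte size is made up (the PR does not state the value of _PLAYBACK_CHUNK_BYTES), and `iter_bounded` and `write_bounded` are hypothetical helper names.

```python
from typing import Callable, Iterator

# Assumed value for illustration only; the real _PLAYBACK_CHUNK_BYTES is not given in the PR.
PLAYBACK_CHUNK_BYTES = 4096


def iter_bounded(pcm: bytes, size: int = PLAYBACK_CHUNK_BYTES) -> Iterator[bytes]:
    """Yield consecutive slices of at most `size` bytes."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


def write_bounded(
    pcm: bytes,
    write: Callable[[bytes], None],
    size: int = PLAYBACK_CHUNK_BYTES,
) -> int:
    """Write `pcm` piece by piece and return the number of writes issued.

    Each call to `write` handles a small slice, so a Ctrl-C is noticed after
    at most one slice instead of after the whole buffer.
    """
    count = 0
    for piece in iter_bounded(pcm, size):
        write(piece)
        count += 1
    return count
```

For 16-bit PCM an even chunk size keeps each slice aligned to whole samples; whether the real constant is chosen with that in mind is not stated in the PR.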

https://claude.ai/code/session_01F24PozqxFy2sCAApA1Ne1b

Single-voice `assembly speak` (default playback, no --out) now plays each TTS Audio frame the moment it arrives instead of buffering the whole synthesis first, so speech starts on the first frame: the win for a long `assembly speak --url <page>`.

- session.synthesize gains an on_audio(chunk, sample_rate) callback, invoked per Audio frame; the full PCM is still accumulated and returned.
- audio.PcmPlayer is an incremental context-manager player that opens the device lazily on the first chunk (the rate is only known at Begin) and drains on normal exit / aborts on Ctrl-C. play_pcm now delegates to it.
- --out (needs the full buffer) and the multi-voice dialogue path stay buffered.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01F24PozqxFy2sCAApA1Ne1b
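
The callback contract in the commit message (on_audio invoked per Audio frame, full PCM still accumulated and returned) might look something like the sketch below. It is not the real session.synthesize: `synthesize_sketch` is a hypothetical function, the frames are assumed to be already-decoded PCM bytes, and the (pcm, sample_rate) return shape is an assumption made for the example.

```python
from typing import Callable, Iterable, Optional, Tuple

# Same shape as the on_audio(chunk, sample_rate) callback named in the PR.
OnAudio = Callable[[bytes, int], None]


def synthesize_sketch(
    frames: Iterable[bytes],
    sample_rate: int,
    on_audio: Optional[OnAudio] = None,
) -> Tuple[bytes, int]:
    """Accumulate every frame and, if a callback is given, forward each one as it arrives."""
    buffer = bytearray()
    for chunk in frames:
        buffer.extend(chunk)  # the full PCM is always kept for the return value
        if on_audio is not None:
            on_audio(chunk, sample_rate)  # streaming path: hand the chunk over immediately
    return bytes(buffer), sample_rate
```

Callers that omit on_audio get the same return value as before, which is the sense in which the callback is additive.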

alexkroman enabled auto-merge June 17, 2026 15:19

alexkroman disabled auto-merge June 17, 2026 15:27

alexkroman force-pushed the claude/peaceful-edison-bbue6j branch from 8fbf1bf to 3d070b4 Compare June 17, 2026 17:08

alexkroman enabled auto-merge June 17, 2026 17:20

alexkroman added this pull request to the merge queue Jun 17, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 17, 2026

alexkroman mentioned this pull request Jun 17, 2026

test(config): fix the concurrent-writers test hang/flake (test-only, no prod change) #223

Merged

alexkroman added this pull request to the merge queue Jun 17, 2026

alexkroman removed this pull request from the merge queue due to a manual request Jun 17, 2026

alexkroman force-pushed the claude/peaceful-edison-bbue6j branch from 3d070b4 to e8c0912 Compare June 17, 2026 19:50

Merge branch 'main' into claude/peaceful-edison-bbue6j

56b1927

alexkroman enabled auto-merge June 17, 2026 20:21

alexkroman disabled auto-merge June 17, 2026 20:39

Merge branch 'main' into claude/peaceful-edison-bbue6j

1ef6651

alexkroman enabled auto-merge June 17, 2026 20:56

alexkroman added this pull request to the merge queue Jun 17, 2026

Merged via the queue into main with commit 4ee44f8 Jun 17, 2026
19 checks passed

alexkroman deleted the claude/peaceful-edison-bbue6j branch June 17, 2026 21:05

alexkroman merged 3 commits into main from claude/peaceful-edison-bbue6j

2 participants
