Stream TTS audio playback for immediate speech output #221
Merged
Single-voice `assembly speak` (default playback, no `--out`) now plays each TTS Audio frame the moment it arrives instead of buffering the whole synthesis first, so speech starts on the first frame. That is the win for a long `assembly speak --url <page>`.

- `session.synthesize` gains an `on_audio(chunk, sample_rate)` callback, invoked per Audio frame; the full PCM is still accumulated and returned.
- `audio.PcmPlayer` is an incremental context-manager player that opens the device lazily on the first chunk (the rate is only known at Begin) and drains on normal exit / aborts on Ctrl-C. `play_pcm` now delegates to it.
- `--out` (needs the full buffer) and the multi-voice dialogue path stay buffered.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01F24PozqxFy2sCAApA1Ne1b
Enable incremental audio playback during text-to-speech synthesis instead of buffering the entire clip before playing. This allows speech to start on the first audio frame rather than waiting for synthesis to complete, significantly improving perceived latency for long-form content.
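The difference between buffered and streamed delivery can be sketched as below. This is a minimal illustration, not the code in `aai_cli/tts/session.py`: the function name `synthesize_sketch` and the `(sample_rate, pcm_bytes)` frame shape are assumptions made for the example. Only the `on_audio(chunk, sample_rate)` callback shape comes from this PR.

```python
from typing import Callable, Iterable, Optional, Tuple

# Callback shape described in this PR: on_audio(chunk, sample_rate).
OnAudio = Callable[[bytes, int], None]


def synthesize_sketch(
    frames: Iterable[Tuple[int, bytes]],  # hypothetical frame shape: (sample_rate, pcm_bytes)
    on_audio: Optional[OnAudio] = None,
) -> Tuple[bytes, int]:
    """Accumulate all PCM, optionally handing each chunk to on_audio as it arrives."""
    pcm = bytearray()
    sample_rate = 0
    for rate, chunk in frames:
        sample_rate = rate
        pcm.extend(chunk)  # the full buffer is still built for callers that need it
        if on_audio is not None:
            on_audio(chunk, rate)  # streaming consumers see the chunk immediately
    return bytes(pcm), sample_rate
```

Omitting `on_audio` gives the buffered behavior: the caller receives nothing until the loop finishes. Passing it lets a player start on the first chunk while the return value stays the same.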
Changes
- New `PcmPlayer` class (`aai_cli/tts/audio.py`): A context-manager-based incremental PCM player that opens the audio device lazily on the first `feed()` call and reuses it for all subsequent chunks. Handles graceful shutdown (drain on normal exit, abort on error/Ctrl-C) and wraps device failures in clean `CLIError` messages. Includes comprehensive docstring explaining streaming behavior and error handling.
- Refactored `play_pcm()` function: Simplified to a thin convenience wrapper over `PcmPlayer` for callers that already hold the complete PCM buffer (multi-voice dialogue path). Updated docstring to clarify its role.
- Streaming callback in `session.synthesize()` (`aai_cli/tts/session.py`): Added optional `on_audio(chunk, sample_rate)` callback parameter that receives each decoded Audio frame as it arrives from the server. New `_consume_audio_frame()` helper handles frame decoding, buffering, and callback invocation. Full PCM is still accumulated and returned for compatibility.
- Single-voice synthesis streaming (`aai_cli/commands/speak/_exec.py`): When no `--out` file is specified, `_speak_single()` now wires `PcmPlayer.feed` as the `on_audio` callback, enabling immediate playback. Buffered playback via `play_pcm()` is preserved for the `--out` path (file output).
- Test coverage: Added comprehensive tests for `PcmPlayer` behavior (device lifecycle, chunk handling, error cases, no-op when unused) and `synthesize()` streaming callback. Updated `speak` command tests to verify streaming playback via the callback mechanism.

Implementation details
- Playback writes are split into small chunks (`_PLAYBACK_CHUNK_BYTES`) so Ctrl-C interrupts promptly between writes rather than blocking on a large write.
- The audio device is opened lazily on the first `feed()` call, allowing the sample rate from the server's Begin frame to be used.
- The `on_audio` callback is optional and additive: omitting it preserves the existing buffered behavior for backward compatibility.

https://claude.ai/code/session_01F24PozqxFy2sCAApA1Ne1b
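The lazy-open, chunked-write, and drain-or-abort behavior described for `PcmPlayer` can be sketched roughly as follows. This is an illustration under stated assumptions, not the class in `aai_cli/tts/audio.py`: `LazyPlayer`, `FakeDevice`, and the `CHUNK_BYTES` value are hypothetical stand-ins, and the real player talks to an audio backend and wraps failures in `CLIError`, which this sketch omits.

```python
from typing import Callable, List, Optional

CHUNK_BYTES = 4  # stand-in for _PLAYBACK_CHUNK_BYTES; the real value is not given in this PR


class FakeDevice:
    """Hypothetical in-memory device that records what a real audio stream would receive."""

    def __init__(self, sample_rate: int) -> None:
        self.sample_rate = sample_rate
        self.writes: List[bytes] = []
        self.closed_with: Optional[str] = None

    def write(self, data: bytes) -> None:
        self.writes.append(data)

    def drain(self) -> None:
        self.closed_with = "drain"  # let queued audio finish playing

    def abort(self) -> None:
        self.closed_with = "abort"  # drop queued audio immediately


class LazyPlayer:
    """Sketch of an incremental player: open on first feed, drain or abort on exit."""

    def __init__(self, open_device: Callable[[int], FakeDevice] = FakeDevice) -> None:
        self._open_device = open_device
        self.device: Optional[FakeDevice] = None

    def __enter__(self) -> "LazyPlayer":
        return self  # no device yet: the sample rate is unknown until the first chunk

    def feed(self, chunk: bytes, sample_rate: int) -> None:
        if self.device is None:
            self.device = self._open_device(sample_rate)
        # Small writes give an interrupt a chance to land between them.
        for start in range(0, len(chunk), CHUNK_BYTES):
            self.device.write(chunk[start:start + CHUNK_BYTES])

    def __exit__(self, exc_type, exc, tb) -> bool:
        if self.device is not None:
            if exc_type is None:
                self.device.drain()
            else:
                self.device.abort()
        return False  # never swallow the exception (including KeyboardInterrupt)
```

Because `feed` has the same `(chunk, sample_rate)` shape as the `on_audio` callback, it can be passed directly as that callback. A player that is entered but never fed opens no device at all.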