Stream TTS audio playback for immediate speech output #221

Merged: alexkroman merged 3 commits into main from claude/peaceful-edison-bbue6j on Jun 17, 2026

@alexkroman

Enable incremental audio playback during text-to-speech synthesis instead of buffering the entire clip before playing. Speech now starts on the first audio frame rather than after synthesis completes, which noticeably reduces perceived latency for long-form content.

Changes

  • New PcmPlayer class (aai_cli/tts/audio.py): A context-manager-based incremental PCM player that opens the audio device lazily on the first feed() call and reuses it for all subsequent chunks. It handles shutdown gracefully (drain on normal exit, abort on error/Ctrl-C) and wraps device failures in clean CLIError messages. Its docstring explains the streaming behavior and error handling.

  • Refactored play_pcm() function: Simplified to a thin convenience wrapper over PcmPlayer for callers that already hold the complete PCM buffer (multi-voice dialogue path). Updated docstring to clarify its role.

  • Streaming callback in session.synthesize() (aai_cli/tts/session.py): Added optional on_audio(chunk, sample_rate) callback parameter that receives each decoded Audio frame as it arrives from the server. New _consume_audio_frame() helper handles frame decoding, buffering, and callback invocation. Full PCM is still accumulated and returned for compatibility.

  • Single-voice synthesis streaming (aai_cli/commands/speak/_exec.py): When no --out file is specified, _speak_single() now wires PcmPlayer.feed as the on_audio callback, enabling immediate playback. Buffered playback via play_pcm() is preserved for the --out path (file output).

  • Test coverage: Added comprehensive tests for PcmPlayer behavior (device lifecycle, chunk handling, error cases, no-op when unused) and synthesize() streaming callback. Updated speak command tests to verify streaming playback via the callback mechanism.
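
The callback contract described above can be sketched roughly as follows. This is a minimal illustration, not the code in aai_cli/tts/session.py: the function name `synthesize_sketch`, the `("begin", rate)` / `("audio", bytes)` frame tuples, and the in-memory frame list are all assumptions made so the example runs without a server.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple

# Hypothetical frame shape for this sketch only:
# ("begin", sample_rate) or ("audio", pcm_bytes).
Frame = Tuple[str, Any]
OnAudio = Callable[[bytes, int], None]


def synthesize_sketch(
    frames: Iterable[Frame],
    on_audio: Optional[OnAudio] = None,
) -> Tuple[bytes, int]:
    """Consume frames, hand each audio chunk to on_audio, return the full PCM."""
    sample_rate = 0
    buffered = bytearray()
    for kind, payload in frames:
        if kind == "begin":
            # The sample rate is announced before any audio arrives.
            sample_rate = int(payload)
        elif kind == "audio":
            chunk = bytes(payload)
            # Always accumulate, so callers that ignore the callback
            # still get the complete buffer back.
            buffered.extend(chunk)
            if on_audio is not None:
                on_audio(chunk, sample_rate)
    return bytes(buffered), sample_rate


frames: List[Frame] = [("begin", 24000), ("audio", b"\x01\x02"), ("audio", b"\x03\x04")]
received: List[Tuple[bytes, int]] = []
pcm, rate = synthesize_sketch(frames, on_audio=lambda c, r: received.append((c, r)))
```

Calling `synthesize_sketch(frames)` without `on_audio` returns the same `(pcm, rate)` pair, which is the backward-compatible buffered path.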

Implementation details

  • Audio is written in bounded chunks (_PLAYBACK_CHUNK_BYTES) so Ctrl-C interrupts promptly between writes rather than blocking on a large write.
  • Device opening is deferred until the first feed() call, allowing the sample rate from the server's Begin frame to be used.
  • On error or Ctrl-C, the stream is aborted (buffered frames discarded) for immediate stop; on normal exit it drains gracefully.
  • The on_audio callback is optional and additive—omitting it preserves the existing buffered behavior for backward compatibility.
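
A rough sketch of the lifecycle listed above (lazy open, bounded writes, drain versus abort) might look like this. It is not the PcmPlayer implementation: `SketchPlayer`, `FakeStream`, `open_fake`, and the tiny `CHUNK_BYTES` value are hypothetical stand-ins, and a recording fake replaces the real audio device so the example runs anywhere. The CLIError wrapping of device failures is left out.

```python
from typing import Callable, List, Optional

CHUNK_BYTES = 4  # stand-in for _PLAYBACK_CHUNK_BYTES; tiny so the output is easy to read


class FakeStream:
    """Hypothetical output device that records what happens to it."""

    def __init__(self, sample_rate: int) -> None:
        self.sample_rate = sample_rate
        self.writes: List[bytes] = []
        self.closed_by: Optional[str] = None

    def write(self, data: bytes) -> None:
        self.writes.append(data)

    def drain(self) -> None:
        self.closed_by = "drain"

    def abort(self) -> None:
        self.closed_by = "abort"


class SketchPlayer:
    """Context manager that opens its stream on the first feed() call."""

    def __init__(self, open_stream: Callable[[int], FakeStream]) -> None:
        self._open_stream = open_stream
        self.stream: Optional[FakeStream] = None

    def __enter__(self) -> "SketchPlayer":
        return self

    def feed(self, chunk: bytes, sample_rate: int) -> None:
        if self.stream is None:
            # Lazy open: the rate is only known once audio starts arriving.
            self.stream = self._open_stream(sample_rate)
        # Bounded writes, so an interrupt can land between two of them.
        for start in range(0, len(chunk), CHUNK_BYTES):
            self.stream.write(chunk[start:start + CHUNK_BYTES])

    def __exit__(self, exc_type, exc, tb) -> bool:
        if self.stream is not None:
            if exc_type is None:
                self.stream.drain()  # normal exit: let queued audio finish
            else:
                self.stream.abort()  # error or Ctrl-C: stop immediately
        return False  # never swallow the exception


opened: List[FakeStream] = []


def open_fake(rate: int) -> FakeStream:
    stream = FakeStream(rate)
    opened.append(stream)
    return stream


with SketchPlayer(open_fake) as player:
    player.feed(b"abcdefghij", 24000)
    player.feed(b"kl", 24000)
```

Because `feed` takes `(chunk, sample_rate)`, a bound method such as `player.feed` can be passed directly as an `on_audio` callback, which matches the wiring the description gives for the single-voice path.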

https://claude.ai/code/session_01F24PozqxFy2sCAApA1Ne1b

@alexkroman alexkroman enabled auto-merge June 17, 2026 15:19
@alexkroman alexkroman disabled auto-merge June 17, 2026 15:27
@alexkroman alexkroman force-pushed the claude/peaceful-edison-bbue6j branch from 8fbf1bf to 3d070b4 Compare June 17, 2026 17:08
@alexkroman alexkroman enabled auto-merge June 17, 2026 17:20
@alexkroman alexkroman added this pull request to the merge queue Jun 17, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 17, 2026
@alexkroman alexkroman added this pull request to the merge queue Jun 17, 2026
@alexkroman alexkroman removed this pull request from the merge queue due to a manual request Jun 17, 2026
Single-voice `assembly speak` (default playback, no --out) now plays each
TTS Audio frame the moment it arrives instead of buffering the whole
synthesis first, so speech starts on the first frame — the win for a long
`assembly speak --url <page>`.

- session.synthesize gains an on_audio(chunk, sample_rate) callback,
  invoked per Audio frame; the full PCM is still accumulated and returned.
- audio.PcmPlayer is an incremental context-manager player that opens the
  device lazily on the first chunk (the rate is only known at Begin) and
  drains on normal exit / aborts on Ctrl-C. play_pcm now delegates to it.
- --out (needs the full buffer) and the multi-voice dialogue path stay
  buffered.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01F24PozqxFy2sCAApA1Ne1b
@alexkroman alexkroman force-pushed the claude/peaceful-edison-bbue6j branch from 3d070b4 to e8c0912 Compare June 17, 2026 19:50
@alexkroman alexkroman enabled auto-merge June 17, 2026 20:21
@alexkroman alexkroman disabled auto-merge June 17, 2026 20:39
@alexkroman alexkroman enabled auto-merge June 17, 2026 20:56
@alexkroman alexkroman added this pull request to the merge queue Jun 17, 2026
Merged via the queue into main with commit 4ee44f8 Jun 17, 2026
19 checks passed
@alexkroman alexkroman deleted the claude/peaceful-edison-bbue6j branch June 17, 2026 21:05