Chunk speak text and disable TTS keepalive deadline for long --url input #231
Merged
`assembly speak --url <pdf>` failed with "TTS session failed: sent 1011 (internal error) keepalive ping timeout" on long documents. Two causes, both fixed:

- The entire extracted document was sent in a single PocketTTS `Generate` frame. PocketTTS is a streaming model meant to be fed incrementally; a whole paper stalls the server so it stops answering the websocket keepalive ping, and the client closes the still-alive socket with 1011. `synthesize_chunked` now splits the text into sentence-aligned chunks (packed to a safe char budget; an over-long terminator-less PDF blob is sliced) and synthesizes one connection per chunk, the same one-sentence-per-connection pattern agent-cascade already uses. Audio also starts on the first chunk instead of after the whole document.
- Independently, disable websockets' 20s keepalive pong deadline on the TTS socket (`ping_timeout=None`). `_RECV_TIMEOUT_SECONDS` (60s) is already the liveness authority per frame, so a server slow to emit the first frame under load is no longer killed prematurely.

Also splits the oversized tts session test module along its natural seam (single-synthesis vs dialogue) via a shared `_tts_session_helpers` module to stay under the 500-line file gate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Problem
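`assembly speak --url https://arxiv.org/pdf/...` (a long PDF or article) failed with "TTS session failed: sent 1011 (internal error) keepalive ping timeout". The "no close frame received" in the full error message means the client (`websockets`) gave up on a still-alive socket, not that the server crashed.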
Root cause
Two independent contributors, both fixed here:
- The whole document was sent in a single PocketTTS `Generate` frame. PocketTTS is a streaming model meant to be fed incrementally; a whole paper stalls the server long enough that it stops answering the websocket keepalive ping, and the client closes the socket with code 1011.
- `websockets`' default 20s keepalive pong deadline is too aggressive for this workload. A server slow to emit the first Audio frame under load has its connection killed before producing anything.
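To make the second contributor concrete: in the `websockets` library, `ping_timeout` is the keyword argument to `connect()` that sets the pong deadline, and `None` disables it. The sketch below is an illustration, not the code from this PR. It shows how a per-frame receive deadline can serve as the liveness check once the pong deadline is off. `recv_frame`, `RECV_TIMEOUT_SECONDS`, and `TTS_CONNECT_KWARGS` are hypothetical names; only the 60s value and `ping_timeout=None` come from the PR description.

```python
import asyncio

# 60s is the value the PR quotes for `_RECV_TIMEOUT_SECONDS`;
# the constant name used here is an assumption for illustration.
RECV_TIMEOUT_SECONDS = 60.0

# Keyword arguments intended for websockets.connect(). ping_timeout=None
# turns off the pong deadline; keepalive pings are still sent.
TTS_CONNECT_KWARGS = {"ping_timeout": None}


async def recv_frame(ws, timeout: float = RECV_TIMEOUT_SECONDS):
    """Wait for the next frame; raise asyncio.TimeoutError if none arrives.

    `ws` is any object with an awaitable `recv()` method, for example a
    connection opened with websockets.connect(url, **TTS_CONNECT_KWARGS).
    """
    return await asyncio.wait_for(ws.recv(), timeout=timeout)
```

With this shape, a server that is slow but still producing frames only has to beat the per-frame deadline, while a connection that never yields a frame still fails with a timeout.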
Fix
- `session.synthesize_chunked` splits the text into sentence-aligned chunks packed to a safe char budget and synthesizes one connection per chunk, the same one-sentence-per-connection pattern `agent-cascade` already uses. An over-long, terminator-less PDF blob is hard-sliced so no single `Generate` can blow past the server's input ceiling. Bonus: audio now starts on the first chunk instead of after the whole document.
- `ping_timeout=None` on the TTS socket disables the redundant pong deadline. `_RECV_TIMEOUT_SECONDS` (60s) is already the per-frame liveness authority, so a slow-but-alive server is no longer killed; a genuinely dead connection still fails cleanly.
Notes
- `aai_cli/tts/text.py` holds the pure `split_sentences` / `chunk_text` helpers (Rich-free, unit-tested).
- `test_tts_session.py` was split along its natural seam (single-synthesis vs dialogue) via a shared `tests/_tts_session_helpers.py` to stay under the 500-line file gate.
- `scripts/check.sh` gate passes (coverage, 100% patch coverage, mutation gate, build).
🤖 Generated with Claude Code
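As a rough illustration of the sentence-aligned chunking described in this PR, here is a minimal sketch. The names `split_sentences` and `chunk_text` are the ones the PR mentions, but the signatures, the regular expression, and the `MAX_CHARS` value are assumptions made for this example; the actual helpers in `aai_cli/tts/text.py` may differ.

```python
import re

# Assumed budget for illustration only; the PR does not state the real value.
MAX_CHARS = 400

# Split on whitespace that directly follows a sentence terminator.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Return the non-empty sentences of `text`, stripped of outer whitespace."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Pack whole sentences into chunks of at most `max_chars` characters."""
    chunks: list[str] = []
    current = ""
    for sentence in split_sentences(text):
        # A terminator-less blob longer than the budget is hard-sliced.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The sentence does not fit; close the chunk and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be synthesized on its own connection, so playback can begin as soon as the first chunk's audio arrives.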