Switch speak/dub default voices to the requested language's native voice #136
Merged
Each streaming-TTS voice speaks exactly one language, but `assembly speak` and `assembly dub` always defaulted to the English voices (jane / the English rotation) regardless of the requested language, so a German dub came out in jane's voice.

A new aai_cli/tts/voices.py maps every voice to its language (giovanni=it, lola=es, juergen=de, rafael=pt, estelle=fr, the rest en). With no explicit --voice, both commands now rotate through the requested language's native voices. Most languages ship exactly one voice, so the language alone selects it. English keeps the curated multi-speaker rotation, a language without a catalog voice falls back to that English rotation, and an explicit --voice (bare or SPEAKER=VOICE) still always wins.

https://claude.ai/code/session_01PPkdXahnabDwMBwCWcSEGA
alexkroman pushed a commit that referenced this pull request on Jun 13, 2026
…mediafile refactor

Reconciles the shared-scaffolding refactor with three upstream changes: the language-native voice rotation (#136), the dub --video flag and the new caption command (#139), and the exec-module splits (#138).

- run_dub keeps upstream's YouTube-download branch and _dub_and_emit split, with this branch's early validation (--voice parse, URL echo, out-path checks) threaded through; the parsed --voice pair rides in a frozen _VoicePlan.
- caption_exec now uses the shared mediafile helpers too (it had copied the same scaffolding), which also gives caption the upfront out-path validation, the samefile self-overwrite guard, the transcript status check, and the './-' ffmpeg path hardening.
- mediafile grows the caption-shaped pieces: validate_out (hoisted from dub_exec), a general resolve_transcript (diarized variant delegates), a kind= parameter for validate_local_media, and a suggestion override for ffmpeg_failure.
- test_dub_pipeline's YouTube-source tests move to test_dub_sources.py to stay under the 500-line file gate.

https://claude.ai/code/session_018TuAQTvp9PVy5EdhsDWo2h
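For readers unfamiliar with the two path guards mentioned in the commit message, here is a minimal sketch of what a samefile self-overwrite check and a './-' path hardening step typically look like. The function names `is_self_overwrite` and `ffmpeg_safe_path` are hypothetical and are not taken from the mediafile module; the actual helpers may differ in signature and behavior.

```python
import os
from pathlib import Path


def is_self_overwrite(src: Path, out: Path) -> bool:
    """Return True when `out` already exists and is the same file as `src`.

    os.path.samefile compares the underlying file identity, so it also
    catches symlinks and differently spelled paths to the same file.
    """
    if not (src.exists() and out.exists()):
        return False
    return os.path.samefile(src, out)


def ffmpeg_safe_path(path: str) -> str:
    """Prefix a path that starts with '-' with './'.

    A leading '-' would otherwise be read as a command-line option
    when the path is passed to ffmpeg as an argument.
    """
    if path.startswith("-"):
        return "./" + path
    return path
```

Under these assumptions, `ffmpeg_safe_path("-clip.mp4")` yields `"./-clip.mp4"`, while ordinary paths pass through unchanged.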