Add assembly dub command for video/audio dubbing #135
Merged
One command that runs the whole platform end to end: the media is
transcribed with diarized utterance timestamps, each utterance is
translated to the target language by an LLM Gateway model, each
translation is synthesized with streaming TTS (rotation voice per
speaker, --voice/SPEAKER=VOICE overrides like speak), the segments are
laid on a silence timeline at their original start times, and ffmpeg
swaps the new track over the original media with the video stream
copied untouched (-map 0:v? -c:v copy, so audio-only input works too).

Usage: assembly --sandbox dub talk.mp4 --lang de

- --lang takes an ISO code (mapped to a language name) or a name as-is
- --transcript-id reuses an existing diarized transcript
- default output <name>.dub.<lang><ext>; --out overrides (and refuses
  to overwrite the input)
- sandbox-only, like speak: streaming TTS has no production host yet

Follows the options/run split (commands/dub.py parses argv into a
frozen DubOptions; dub_exec.run_dub does the work), with the LLM, TTS,
and ffmpeg boundaries seamed for hermetic tests.

https://claude.ai/code/session_01Mcran5xqMHcrt4RUxSHrkX
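The "silence timeline" step described above can be sketched roughly as follows. This is an illustration only, not the code in dub_exec: it assumes raw 16-bit mono PCM at a fixed sample rate, the names `Segment`, `ms_to_offset`, and `lay_on_timeline` are hypothetical, and the overlap policy (a later segment overwrites an earlier one) is an assumption the PR does not state.

```python
from __future__ import annotations

from dataclasses import dataclass

SAMPLE_RATE = 24_000   # assumed sample rate in Hz; not stated in the PR
BYTES_PER_SAMPLE = 2   # assumed signed 16-bit mono PCM


@dataclass(frozen=True)
class Segment:
    """Hypothetical container: synthesized audio for one utterance."""

    start_ms: int  # original utterance start time from the transcript
    pcm: bytes     # raw PCM returned by TTS


def ms_to_offset(ms: int) -> int:
    """Convert milliseconds to a byte offset aligned to a whole sample."""
    return (ms * SAMPLE_RATE // 1000) * BYTES_PER_SAMPLE


def lay_on_timeline(segments: list[Segment], total_ms: int) -> bytes:
    """Place each segment on a silent track at its original start time."""
    # The track must cover the media duration and any segment that runs past it.
    ends = [ms_to_offset(s.start_ms) + len(s.pcm) for s in segments]
    track = bytearray(max([ms_to_offset(total_ms), *ends]))  # zeros are silence
    for seg in sorted(segments, key=lambda s: s.start_ms):
        offset = ms_to_offset(seg.start_ms)
        # Equal-length slice assignment keeps the track length unchanged.
        track[offset:offset + len(seg.pcm)] = seg.pcm
    return bytes(track)
```

Anchoring every segment to its original start time keeps the dub aligned with the picture even when a translated utterance is shorter or longer than the source speech.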
Union resolutions: dictate + dub both land in the importlinter contracts, the run help-group, and the README features list; the run-group help snapshots were regenerated (not hand-merged) on top of main's copy.

https://claude.ai/code/session_01Mcran5xqMHcrt4RUxSHrkX
CI renders CliRunner output with color, so style codes interleave inside
flag names ("--lang") and the human summary line, breaking substring
asserts that pass locally without color. Strip SGR sequences first via a
shared plain() helper, the same convention test_help_rendering and the
clip suite use.
https://claude.ai/code/session_01Mcran5xqMHcrt4RUxSHrkX
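A helper of the kind described above might look like the sketch below. The project's actual plain() is not shown in this PR, so the regex and the sample string are assumptions; the sketch only illustrates why stripping SGR sequences makes substring asserts stable.

```python
import re

# SGR ("Select Graphic Rendition") sequences have the form ESC [ <params> m.
_SGR = re.compile(r"\x1b\[[0-9;:]*m")


def plain(text: str) -> str:
    """Return text with ANSI color/style codes removed."""
    return _SGR.sub("", text)


# With color on, style codes can land inside a flag name, so a naive
# substring check fails until the codes are stripped.
colored = "\x1b[1m--\x1b[0m\x1b[36mlang\x1b[0m TEXT"
assert "--lang" not in colored
assert "--lang" in plain(colored)
```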
alexkroman pushed a commit that referenced this pull request on Jun 12, 2026
All confirmed findings from the dub (#135) code review:

- Self-overwrite guard now also catches the same file under another spelling (samefile when --out exists): on case-insensitive filesystems (macOS APFS) `--out TALK.MP4` against talk.mp4 passed the path comparison and ffmpeg corrupted the input.
- Fresh transcriptions auto-detect the source language (dub input is typically not English, which is the API default); a new --source-lang flag pins it instead.
- --out viability is validated before the billed pipeline: existing directory, missing parent directory, and missing file extension (ffmpeg picks the container from it) now fail upfront, and a language that slugs to nothing (e.g. 中文) asks for an explicit --out instead of colliding every such dub onto "<stem>.dub..<ext>".
- --voice is parsed before any billed work, and SPEAKER=VOICE pins for speakers absent from the diarized transcript warn instead of being dropped silently (mirrors assembly speak).
- A --transcript-id that is queued/processing/errored is rejected with the real reason (shared resolve_diarized_transcript, so clip gets the same fix) instead of a misleading "no utterances" error.
- Translations truncated at max_tokens (finish_reason length/max_tokens) raise instead of dubbing speech that stops mid-sentence.
- The success line escapes user-controlled --lang/--voice text (an embedded "[/]" crashed with MarkupError after the dub succeeded).
- URLs are rejected with the URL echoed intact (Path() collapsed "s3://…" to "s3:/…") and a download hint.
- ffmpeg output paths starting with "-" are passed as "./-…" so they can't be parsed as ffmpeg options (clip's cut destinations too).

https://claude.ai/code/session_018TuAQTvp9PVy5EdhsDWo2h
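The output-path findings above could be implemented along these lines. This is a hedged sketch, not the merged code: `validate_out` and `ffmpeg_safe` are hypothetical names, and the error type and messages are invented for illustration. Only `os.path.samefile` and the `pathlib` calls are real standard-library APIs.

```python
import os
from pathlib import Path


def validate_out(media: Path, out: Path) -> None:
    """Hypothetical upfront checks, meant to run before any billed work."""
    if out.is_dir():
        raise ValueError(f"--out is a directory: {out}")
    if not out.parent.is_dir():
        raise ValueError(f"--out parent directory does not exist: {out.parent}")
    if not out.suffix:
        raise ValueError(f"--out needs a file extension (ffmpeg picks the container from it): {out}")
    # A plain path comparison misses the same file under another spelling,
    # e.g. TALK.MP4 vs talk.mp4 on a case-insensitive filesystem. samefile
    # compares file identity, but it requires both paths to exist.
    if out == media or (out.exists() and os.path.samefile(media, out)):
        raise ValueError(f"--out would overwrite the input: {out}")


def ffmpeg_safe(path: Path) -> str:
    """Prefix './' so a leading '-' cannot be parsed as an ffmpeg option."""
    text = str(path)
    return f"./{text}" if text.startswith("-") else text
```

Running every check before transcription starts means a bad --out costs nothing, rather than failing after the translation and TTS calls have already been billed.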
Implements the `assembly dub` command, a complete end-to-end dubbing pipeline that transcribes media with speaker diarization, translates utterances via LLM Gateway, synthesizes translations with streaming TTS (one voice per speaker), and muxes the dubbed audio back into the original media using ffmpeg.

Key Changes

- New module `aai_cli/dub_exec.py`: Core dubbing pipeline orchestration
  - … `--transcript-id`)
- New command module `aai_cli/commands/dub.py`: CLI interface
  - `MEDIA` (local file), `--lang` (target language)
  - `--transcript-id` (reuse transcript), `--voice` (voice assignment), `--model`/`--max-tokens` (LLM config), `--out` (output path)
- Comprehensive test suite:
  - `tests/test_dub_exec.py`: Pure helpers (language resolution, output naming, timeline assembly, utterance extraction) and validation order
  - `tests/test_dub_pipeline.py`: End-to-end faked pipeline runs with mocked transcription, translation, TTS, and ffmpeg
  - `tests/test_dub_command.py`: Argv parsing and flag mapping
  - `tests/_dub_helpers.py`: Shared test fixtures and fake boundary recorders
- Integration updates:
  - `dub` command in main app and help group ordering
  - … `.importlinter`)

Implementation Details

- `dialogue` module logic for voice rotation and speaker-to-voice mapping (bare `--voice` applies to all speakers; `SPEAKER=VOICE` pins individuals)

https://claude.ai/code/session_01Mcran5xqMHcrt4RUxSHrkX
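The voice-mapping rules in the last bullet can be illustrated with a small sketch. It is not the `dialogue` module's code: the function names, the rotation list, and the voice names are all hypothetical, and the precedence shown (pin, then bare --voice, then rotation) is an assumption inferred from the description.

```python
from __future__ import annotations


def parse_voice_flags(values: list[str]) -> tuple[str | None, dict[str, str]]:
    """Split --voice values into a bare default and SPEAKER=VOICE pins."""
    default: str | None = None
    pins: dict[str, str] = {}
    for value in values:
        speaker, sep, voice = value.partition("=")
        if sep:
            pins[speaker] = voice   # SPEAKER=VOICE pins one speaker
        else:
            default = value         # a bare voice applies to all speakers
    return default, pins


def assign_voices(
    speakers: list[str],
    rotation: list[str],
    default: str | None,
    pins: dict[str, str],
) -> tuple[dict[str, str], list[str]]:
    """Map each diarized speaker to a voice; also report unmatched pins."""
    voices = {
        spk: pins.get(spk) or default or rotation[i % len(rotation)]
        for i, spk in enumerate(speakers)
    }
    # Pins for speakers missing from the transcript are returned so the
    # caller can warn about them instead of dropping them silently.
    absent = [spk for spk in pins if spk not in speakers]
    return voices, absent


default, pins = parse_voice_flags(["A=alto", "Z=zed"])
voices, absent = assign_voices(["A", "B", "C"], ["v1", "v2"], default, pins)
print(voices)  # {'A': 'alto', 'B': 'v2', 'C': 'v1'}
print(absent)  # ['Z']
```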