Deduplicate clip/dub media scaffolding into shared mediafile module by alexkroman · Pull Request #137 · AssemblyAI/cli

alexkroman · 2026-06-12T23:25:33Z

The dub command (#135) copied clip_exec's local-media validation, ffmpeg discovery and invocation, ffmpeg-failure mapping, and diarized-transcript resolution near-verbatim, along with speak_exec's sandbox guard; the new caption command (#139) then shipped a third copy of the same scaffolding. This PR hoists it all into one shared module and fixes the correctness bugs that a review of #135 surfaced.

Deduplication (aai_cli/mediafile.py)

validate_local_media, validate_out, require_ffmpeg, run_ffmpeg, ffmpeg_failure, path_arg, and resolve_transcript/resolve_diarized_transcript, used by clip_exec, dub_exec, and caption_exec (parameterized by command/purpose strings).
tts/session.require_available(command): one sandbox guard for speak and dub.
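dub_exec: the API key is resolved once per run instead of twice (config.resolve_api_key makes a keyring IPC call each time), the single-use starts generator is folded into the zip comprehension, and assemble_timeline returns its bytearray directly instead of copying the whole dubbed track into bytes.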
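A minimal sketch of how the shared ffmpeg helpers can be parameterized by a command string so clip, dub, and caption each get their own wording. The names require_ffmpeg, run_ffmpeg, ffmpeg_failure, and path_arg come from this PR, but every signature, message, and the CLIError type below are assumptions for illustration, not the code in aai_cli/mediafile.py.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path
from typing import Callable, Optional


class CLIError(Exception):
    """Hypothetical stand-in for the CLI's user-facing error type."""


def require_ffmpeg(command: str, which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return the ffmpeg executable or fail with a hint naming the calling command."""
    exe = which("ffmpeg")
    if exe is None:
        raise CLIError(f"assembly {command} needs ffmpeg on PATH; install it and retry.")
    return exe


def path_arg(path: Path | str) -> str:
    """Render a path for ffmpeg's argv so a leading '-' is never parsed as an option."""
    text = str(path)
    return f"./{text}" if text.startswith("-") else text


def ffmpeg_failure(command: str, stderr: str, suggestion: str | None = None) -> CLIError:
    """Map a failed ffmpeg run to one error, keeping only the last non-empty stderr line."""
    lines = [line for line in stderr.splitlines() if line.strip()]
    detail = lines[-1].strip() if lines else "no error output"
    hint = suggestion or f"check that the input to assembly {command} is a readable media file"
    return CLIError(f"ffmpeg failed during {command}: {detail} ({hint})")


def run_ffmpeg(command: str, args: list[str]) -> None:
    """Run ffmpeg with only errors on stderr; raise the mapped error on a non-zero exit."""
    exe = require_ffmpeg(command)
    proc = subprocess.run(
        [exe, "-hide_banner", "-loglevel", "error", *args],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        raise ffmpeg_failure(command, proc.stderr)
```

Injecting `which` keeps require_ffmpeg testable without ffmpeg installed, and routing every output path through path_arg means a destination such as `-intro.mp4` reaches ffmpeg as `./-intro.mp4`.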

Bug fixes (all commands sharing the helpers get them)

The self-overwrite guard now catches the same file under another spelling (on case-insensitive macOS APFS, --out TALK.MP4 against talk.mp4 corrupted the input).
Fresh dub transcriptions auto-detect the source language, since dub input is typically not English (the API default); a new --source-lang flag pins it.
--out viability (existing directory, missing parent, no extension) and --voice syntax are validated before the billed pipeline runs; languages that slug to nothing (e.g. 中文) ask for an explicit --out instead of colliding on <stem>.dub..<ext>.
Queued, processing, or errored --transcript-ids are rejected with the real reason instead of a misleading "no utterances" error.
Translations truncated at max tokens raise instead of dubbing speech that stops mid-sentence; --voice pins for speakers absent from the transcript warn instead of vanishing silently.
The success line escapes user-controlled --lang/--voice text (an embedded [/] crashed with MarkupError after the dub had already succeeded).
Bucket URLs are rejected with the URL echoed intact (Path() collapsed s3://… to s3:/…), and ffmpeg output paths starting with - are passed as ./-… so ffmpeg can't read them as options.
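Two of these fixes come down to checking the right thing at the right time: URLs must be detected on the raw argument before Path() normalizes it, and the self-overwrite check needs file identity, not just path equality. A sketch under those assumptions; local_media_path, is_self_overwrite, the URL regex, and CLIError are hypothetical names, not the PR's code. The `src` file is assumed to exist already (validated earlier).

```python
from __future__ import annotations

import os
import re
from pathlib import Path

# scheme:// prefix such as s3://, gs://, https://; a Windows "C:\" path does not match.
_URL_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


class CLIError(Exception):
    """Hypothetical stand-in for the CLI's user-facing error type."""


def local_media_path(raw: str, command: str) -> Path:
    """Reject URLs on the raw string, before Path() collapses '//' to '/'."""
    if _URL_RE.match(raw):
        raise CLIError(f"assembly {command} needs a local file, not {raw}; download it first.")
    return Path(raw)


def is_self_overwrite(src: Path, out: Path) -> bool:
    """True if writing `out` would clobber `src`, even under another spelling."""
    if src.resolve() == out.resolve():
        return True
    # When out exists, samefile compares device and inode numbers, which also
    # catches TALK.MP4 vs talk.mp4 on a case-insensitive filesystem, and hard links.
    return out.exists() and os.path.samefile(src, out)
```

Path comparison alone misses aliases the filesystem treats as one file; samefile only works when both paths exist, which is exactly the case where an overwrite can happen.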

Merged with main's language-native voice rotation (#136), the dub --video flag and caption command (#139), and the exec-module splits (#138); the YouTube-source dub tests moved to tests/test_dub_sources.py to keep the test file under the 500-line gate.

https://claude.ai/code/session_018TuAQTvp9PVy5EdhsDWo2h

The dub command (#135) copied clip_exec's local-media validation, ffmpeg discovery/invocation, ffmpeg-failure mapping, and diarized-transcript resolution near-verbatim, and speak_exec's sandbox guard. Hoist the shared scaffolding so the two commands can't drift apart:

- aai_cli/mediafile.py: validate_local_media, require_ffmpeg, run_ffmpeg, ffmpeg_failure, resolve_diarized_transcript, used by both clip_exec and dub_exec (parameterized by command/purpose strings).
- tts/session.require_available(command): one sandbox guard for speak and dub (speak's message now also names streaming TTS as the reason).
- dub_exec: resolve the API key once per run instead of twice (config.resolve_api_key hits keyring IPC each call), pass the already-computed transcript id into _utterances_of, fold the single-use `starts` generator into the zip comprehension, and return assemble_timeline's bytearray directly instead of copying the whole dubbed track into bytes (write_wav accepts any buffer).

Tests: ffmpeg fakes now patch mediafile.run_ffmpeg; the duplicated plain() ANSI-stripper in _dub_helpers imports from _clip_helpers; new dub status-message test; suggestion asserts pin the per-command parameterization of the shared helpers.

https://claude.ai/code/session_018TuAQTvp9PVy5EdhsDWo2h
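Returning the bytearray directly works because Python's wave module accepts any bytes-like object in writeframes, so no bytes() copy is needed. A minimal sketch, assuming 16-bit mono PCM, an assumed 24 kHz default rate, and a hypothetical assemble_timeline that overlays segments at frame offsets; the real signatures in dub_exec are not shown in this PR.

```python
from __future__ import annotations

import wave

SAMPLE_WIDTH = 2  # assumption: 16-bit signed PCM, where zero bytes are silence


def assemble_timeline(segments: list[tuple[int, bytes]], total_frames: int) -> bytearray:
    """Hypothetical: overlay mono PCM segments onto a silent track at frame offsets."""
    track = bytearray(total_frames * SAMPLE_WIDTH)
    for start_frame, pcm in segments:
        offset = start_frame * SAMPLE_WIDTH
        if offset >= len(track):
            continue  # segment starts past the end of the track
        end = min(offset + len(pcm), len(track))
        track[offset:end] = pcm[: end - offset]  # same length, so the track never grows
    return track  # handed back as-is: no bytes(track) copy of the whole dub


def write_wav(dest, pcm, rate: int = 24_000) -> None:
    """Write mono PCM (the 24 kHz default is an assumption).

    wave's writeframes accepts bytes, bytearray, or memoryview alike.
    """
    with wave.open(dest, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(rate)
        wav.writeframes(pcm)
```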

All confirmed findings from the dub (#135) code review:

- Self-overwrite guard now also catches the same file under another spelling (samefile when --out exists): on case-insensitive filesystems (macOS APFS) `--out TALK.MP4` against talk.mp4 passed the path comparison and ffmpeg corrupted the input.
- Fresh transcriptions auto-detect the source language (dub input is typically not English, which is the API default); a new --source-lang flag pins it instead.
- --out viability is validated before the billed pipeline: existing directory, missing parent directory, and missing file extension (ffmpeg picks the container from it) now fail upfront, and a language that slugs to nothing (e.g. 中文) asks for an explicit --out instead of colliding every such dub onto "<stem>.dub..<ext>".
- --voice is parsed before any billed work, and SPEAKER=VOICE pins for speakers absent from the diarized transcript warn instead of being dropped silently (mirrors assembly speak).
- A --transcript-id that is queued/processing/errored is rejected with the real reason (shared resolve_diarized_transcript, so clip gets the same fix) instead of a misleading "no utterances" error.
- Translations truncated at max_tokens (finish_reason length/max_tokens) raise instead of dubbing speech that stops mid-sentence.
- The success line escapes user-controlled --lang/--voice text (an embedded "[/]" crashed with MarkupError after the dub succeeded).
- URLs are rejected with the URL echoed intact (Path() collapsed "s3://…" to "s3:/…") and a download hint.
- ffmpeg output paths starting with "-" are passed as "./-…" so they can't be parsed as ffmpeg options (clip's cut destinations too).

https://claude.ai/code/session_018TuAQTvp9PVy5EdhsDWo2h
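The source-language, transcript-status, and truncation fixes can each be sketched as a small check. speaker_labels, language_code, language_detection, and the queued/processing/completed/error statuses are fields of AssemblyAI's public transcript API, and "length"/"max_tokens" are the finish_reason values the commit message names; the function names, messages, and CLIError are hypothetical.

```python
from __future__ import annotations


class CLIError(Exception):
    """Hypothetical stand-in for the CLI's user-facing error type."""


def transcript_request(audio_url: str, source_lang: str | None) -> dict:
    """Body for a diarized transcription: pin the language or let the API detect it."""
    body = {"audio_url": audio_url, "speaker_labels": True}
    if source_lang:
        body["language_code"] = source_lang  # --source-lang pins it
    else:
        body["language_detection"] = True  # without this the API assumes English
    return body


def check_transcript_status(transcript: dict, transcript_id: str) -> dict:
    """Reject an unfinished --transcript-id with the real reason, not 'no utterances'."""
    status = transcript.get("status")
    if status == "error":
        reason = transcript.get("error", "unknown error")
        raise CLIError(f"transcript {transcript_id} failed: {reason}")
    if status in ("queued", "processing"):
        raise CLIError(f"transcript {transcript_id} is still {status}; retry when it completes.")
    if status != "completed":
        raise CLIError(f"transcript {transcript_id} has unexpected status {status!r}.")
    return transcript


def check_translation_complete(text: str, finish_reason: str | None) -> str:
    """Refuse a translation cut off at the token limit rather than dub half a sentence."""
    if finish_reason in ("length", "max_tokens"):
        raise CLIError("translation was truncated at the token limit; not dubbing partial speech.")
    return text
```

Running these checks on the fetched transcript and the translation response keeps the failure close to its cause, before any TTS work is billed.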

…mediafile refactor

Reconciles the shared-scaffolding refactor with three upstream changes: the language-native voice rotation (#136), the dub --video flag and the new caption command (#139), and the exec-module splits (#138).

- run_dub keeps upstream's YouTube-download branch and _dub_and_emit split, with this branch's early validation (--voice parse, URL echo, out-path checks) threaded through; the parsed --voice pair rides in a frozen _VoicePlan.
- caption_exec now uses the shared mediafile helpers too (it had copied the same scaffolding), which also gives caption the upfront out-path validation, the samefile self-overwrite guard, the transcript status check, and the './-' ffmpeg path hardening.
- mediafile grows the caption-shaped pieces: validate_out (hoisted from dub_exec), a general resolve_transcript (diarized variant delegates), a kind= parameter for validate_local_media, and a suggestion override for ffmpeg_failure.
- test_dub_pipeline's YouTube-source tests move to test_dub_sources.py to stay under the 500-line file gate.

https://claude.ai/code/session_018TuAQTvp9PVy5EdhsDWo2h
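A sketch of the upfront --out checks and the empty-slug case. validate_out is named in the PR, but this body, the slug rule (lowercase ASCII letters and digits only), the hypothetical default_dub_out helper, and the messages are assumptions; the grounded detail is that an empty slug would otherwise produce <stem>.dub..<ext>.

```python
from __future__ import annotations

import re
from pathlib import Path


class CLIError(Exception):
    """Hypothetical stand-in for the CLI's user-facing error type."""


def slug(text: str) -> str:
    """Lowercase ASCII slug; empty for scripts with no ASCII letters (e.g. '中文')."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def default_dub_out(src: Path, lang: str) -> Path:
    """<stem>.dub.<slug><ext> next to the input, or demand --out if the slug is empty."""
    tag = slug(lang)
    if not tag:
        # Without this check, every such dub would land on "<stem>.dub..<ext>".
        raise CLIError(f"can't derive a file name from --lang {lang!r}; pass --out explicitly.")
    return src.with_name(f"{src.stem}.dub.{tag}{src.suffix}")


def validate_out(out: Path) -> Path:
    """Fail before any billed work if ffmpeg could not write to `out`."""
    if out.is_dir():
        raise CLIError(f"--out {out} is an existing directory; give a file path.")
    if not out.parent.is_dir():
        raise CLIError(f"--out {out}: parent directory {out.parent} does not exist.")
    if not out.suffix:
        raise CLIError(f"--out {out} has no extension; ffmpeg picks the container from it.")
    return out
```

Because these checks only touch the local filesystem, they can run before transcription, translation, or TTS, so a bad --out costs nothing.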

Keeps mediafile.require_ffmpeg in run_dub's download branch, ports the --download-sections fixture/tests into tests/test_dub_sources.py (where the YouTube-source dub tests moved), and regenerates the run-group help snapshot. https://claude.ai/code/session_018TuAQTvp9PVy5EdhsDWo2h

alexkroman enabled auto-merge June 12, 2026 23:25

claude added 3 commits June 12, 2026 23:43

alexkroman added this pull request to the merge queue Jun 13, 2026

Merged via the queue into main with commit 637af5f Jun 13, 2026
15 checks passed

alexkroman deleted the claude/kind-mendel-jw53hz branch June 13, 2026 00:25

alexkroman merged 4 commits into main from claude/kind-mendel-jw53hz
