Deduplicate clip/dub media scaffolding into shared mediafile module #137
Merged
The dub command (#135) copied clip_exec's local-media validation, ffmpeg discovery/invocation, ffmpeg-failure mapping, and diarized-transcript resolution near-verbatim, and speak_exec's sandbox guard. Hoist the shared scaffolding so the two commands can't drift apart:

- aai_cli/mediafile.py: validate_local_media, require_ffmpeg, run_ffmpeg, ffmpeg_failure, resolve_diarized_transcript, used by both clip_exec and dub_exec (parameterized by command/purpose strings).
- tts/session.require_available(command): one sandbox guard for speak and dub (speak's message now also names streaming TTS as the reason).
- dub_exec: resolve the API key once per run instead of twice (config.resolve_api_key hits keyring IPC each call), pass the already-computed transcript id into _utterances_of, fold the single-use `starts` generator into the zip comprehension, and return assemble_timeline's bytearray directly instead of copying the whole dubbed track into bytes (write_wav accepts any buffer).

Tests: ffmpeg fakes now patch mediafile.run_ffmpeg; the duplicated plain() ANSI-stripper in _dub_helpers imports from _clip_helpers; new dub status-message test; suggestion asserts pin the per-command parameterization of the shared helpers.

https://claude.ai/code/session_018TuAQTvp9PVy5EdhsDWo2h
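As a rough illustration of what "parameterized by command/purpose strings" could look like, here is a minimal sketch of a shared ffmpeg lookup. The signature, the `MissingToolError` type, the injectable `which` parameter, and the message wording are all assumptions made for this example; the real `mediafile.require_ffmpeg` may differ.

```python
import shutil


class MissingToolError(RuntimeError):
    """Hypothetical error type; the real CLI likely has its own."""


def require_ffmpeg(command: str, purpose: str, which=shutil.which) -> str:
    """Return the ffmpeg path, or fail with a message naming the caller.

    `command` and `purpose` let clip and dub share one helper while each
    still produces its own error text. `which` is injectable so tests can
    fake discovery without touching PATH (an assumption of this sketch).
    """
    path = which("ffmpeg")
    if path is None:
        raise MissingToolError(
            f"{command} needs ffmpeg to {purpose}; install ffmpeg and retry"
        )
    return path
```

A caller such as dub would pass its own strings, for example `require_ffmpeg("dub", "mux the dubbed track")`, so the shared helper never hard-codes one command's wording.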
All confirmed findings from the dub (#135) code review:

- Self-overwrite guard now also catches the same file under another spelling (samefile when --out exists): on case-insensitive filesystems (macOS APFS) `--out TALK.MP4` against talk.mp4 passed the path comparison and ffmpeg corrupted the input.
- Fresh transcriptions auto-detect the source language (dub input is typically not English, which is the API default); a new --source-lang flag pins it instead.
- --out viability is validated before the billed pipeline: existing directory, missing parent directory, and missing file extension (ffmpeg picks the container from it) now fail upfront, and a language that slugs to nothing (e.g. 中文) asks for an explicit --out instead of colliding every such dub onto "<stem>.dub..<ext>".
- --voice is parsed before any billed work, and SPEAKER=VOICE pins for speakers absent from the diarized transcript warn instead of being dropped silently (mirrors assembly speak).
- A --transcript-id that is queued/processing/errored is rejected with the real reason (shared resolve_diarized_transcript, so clip gets the same fix) instead of a misleading "no utterances" error.
- Translations truncated at max_tokens (finish_reason length/max_tokens) raise instead of dubbing speech that stops mid-sentence.
- The success line escapes user-controlled --lang/--voice text (an embedded "[/]" crashed with MarkupError after the dub succeeded).
- URLs are rejected with the URL echoed intact (Path() collapsed "s3://…" to "s3:/…") and a download hint.
- ffmpeg output paths starting with "-" are passed as "./-…" so they can't be parsed as ffmpeg options (clip's cut destinations too).

https://claude.ai/code/session_018TuAQTvp9PVy5EdhsDWo2h
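Three of the path fixes above (the samefile guard, the URL check on the raw argument, and the leading-dash hardening) can be sketched with the standard library alone. The function names `is_same_file` and `looks_like_url`, and the regex, are hypothetical names chosen for this sketch; only `path_arg` is named in the PR, and its body here is a guess at the described behavior, not the merged code.

```python
import os
import re
from pathlib import Path

# Matches a scheme prefix such as "s3://" or "https://" on the RAW string,
# before Path() has a chance to collapse the double slash.
_URL_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def looks_like_url(raw: str) -> bool:
    """True when the user passed a URL rather than a local path."""
    return bool(_URL_RE.match(raw))


def is_same_file(src: Path, out: Path) -> bool:
    """Detect that `out` would overwrite `src`, even under another spelling.

    When `out` exists, os.path.samefile compares the underlying file, so a
    case-variant name on a case-insensitive filesystem is still caught.
    When it does not exist yet, fall back to comparing resolved paths.
    """
    if out.exists():
        return os.path.samefile(src, out)
    return src.resolve() == out.resolve()


def path_arg(path: Path) -> str:
    """Render a path for an ffmpeg argv so it cannot be read as an option."""
    text = str(path)
    return f"./{text}" if text.startswith("-") else text
```

The URL check deliberately takes a `str`: once the argument has gone through `Path()`, the `//` after the scheme is already gone and the original URL can no longer be echoed intact.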
…mediafile refactor

Reconciles the shared-scaffolding refactor with three upstream changes: the language-native voice rotation (#136), the dub --video flag and the new caption command (#139), and the exec-module splits (#138).

- run_dub keeps upstream's YouTube-download branch and _dub_and_emit split, with this branch's early validation (--voice parse, URL echo, out-path checks) threaded through; the parsed --voice pair rides in a frozen _VoicePlan.
- caption_exec now uses the shared mediafile helpers too (it had copied the same scaffolding), which also gives caption the upfront out-path validation, the samefile self-overwrite guard, the transcript status check, and the './-' ffmpeg path hardening.
- mediafile grows the caption-shaped pieces: validate_out (hoisted from dub_exec), a general resolve_transcript (diarized variant delegates), a kind= parameter for validate_local_media, and a suggestion override for ffmpeg_failure.
- test_dub_pipeline's YouTube-source tests move to test_dub_sources.py to stay under the 500-line file gate.

https://claude.ai/code/session_018TuAQTvp9PVy5EdhsDWo2h
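The upfront out-path checks that validate_out centralizes (existing directory, missing parent directory, missing extension) might look roughly like the sketch below. The signature, the `OutPathError` type, and the messages are assumptions for illustration; the hoisted helper in `mediafile` may take different arguments.

```python
from pathlib import Path


class OutPathError(ValueError):
    """Hypothetical error type used only in this sketch."""


def validate_out(out: Path, command: str) -> Path:
    """Fail before any billed work if `out` cannot be written by ffmpeg."""
    if out.is_dir():
        raise OutPathError(f"{command}: --out {out} is a directory; pass a file path")
    if not out.parent.is_dir():
        raise OutPathError(f"{command}: parent directory {out.parent} does not exist")
    if not out.suffix:
        # ffmpeg picks the output container from the file extension.
        raise OutPathError(f"{command}: --out {out} needs a file extension")
    return out
```

Running these checks first means a typo in `--out` costs nothing, whereas failing at the final ffmpeg step would come after transcription and synthesis had already been billed.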
Keeps mediafile.require_ffmpeg in run_dub's download branch, ports the --download-sections fixture/tests into tests/test_dub_sources.py (where the YouTube-source dub tests moved), and regenerates the run-group help snapshot. https://claude.ai/code/session_018TuAQTvp9PVy5EdhsDWo2h
The dub command (#135) copied clip_exec's local-media validation, ffmpeg discovery/invocation, ffmpeg-failure mapping, and diarized-transcript resolution near-verbatim, and speak_exec's sandbox guard — and the new caption command (#139) shipped a third copy of the same scaffolding. This PR hoists it all into one shared module and fixes the correctness bugs a review of #135 surfaced.
Deduplication (`aai_cli/mediafile.py`)

- `validate_local_media`, `validate_out`, `require_ffmpeg`, `run_ffmpeg`, `ffmpeg_failure`, `path_arg`, and `resolve_transcript`/`resolve_diarized_transcript`, used by `clip_exec`, `dub_exec`, and `caption_exec` (parameterized by command/purpose strings).
- `tts/session.require_available(command)`: one sandbox guard for speak and dub.
- `assemble_timeline` returns its bytearray instead of copying the whole dubbed track.

Bug fixes (all commands sharing the helpers get them)

- The self-overwrite guard also catches the same file under another spelling (`--out TALK.MP4` on case-insensitive macOS APFS corrupted the input).
- Fresh transcriptions auto-detect the source language; a new `--source-lang` flag pins it.
- `--out` viability (existing dir, missing parent, no extension) and `--voice` syntax validated before the billed pipeline; unsluggable languages (e.g. 中文) ask for an explicit `--out` instead of colliding on `<stem>.dub..<ext>`.
- Queued/processing/errored `--transcript-id`s are rejected with the real reason instead of a misleading "no utterances" error.
- `--voice` pins for absent speakers warn instead of vanishing.
- The success line escapes user-controlled `--lang`/`--voice` text (an embedded `[/]` crashed with MarkupError after the dub succeeded).
- URLs are rejected with the URL echoed intact (`Path()` collapsed `s3://…` to `s3:/…`); ffmpeg output paths starting with `-` are passed as `./-…`.

Merged with main's native-voice rotation (#136), `--video`/caption (#139), and exec splits (#138); the YouTube-source dub tests moved to `tests/test_dub_sources.py` to stay under the 500-line gate.

https://claude.ai/code/session_018TuAQTvp9PVy5EdhsDWo2h