
Deduplicate clip/dub media scaffolding into shared mediafile module #137

Merged: alexkroman merged 4 commits into main from claude/kind-mendel-jw53hz on Jun 13, 2026

alexkroman commented Jun 12, 2026

The dub command (#135) copied clip_exec's local-media validation, ffmpeg discovery/invocation, ffmpeg-failure mapping, and diarized-transcript resolution near-verbatim, along with speak_exec's sandbox guard. The new caption command (#139) then shipped a third copy of the same scaffolding. This PR hoists it all into one shared module and fixes the correctness bugs that a review of #135 surfaced.

Deduplication (aai_cli/mediafile.py)

  • validate_local_media, validate_out, require_ffmpeg, run_ffmpeg, ffmpeg_failure, path_arg, and resolve_transcript/resolve_diarized_transcript, used by clip_exec, dub_exec, and caption_exec (parameterized by command/purpose strings).
  • tts/session.require_available(command): one sandbox guard for speak and dub.
  • dub_exec: API key resolved once per run instead of twice (each config.resolve_api_key call hits keyring IPC), single-use generator folded away, and assemble_timeline returns its bytearray instead of copying the whole dubbed track.
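
To illustrate what "parameterized by command/purpose strings" can look like, here is a minimal sketch of a shared ffmpeg helper pair. The signatures, the `MediaError` type, and the `binary` parameter are assumptions made for this example and are not taken from aai_cli/mediafile.py.

```python
import shutil
import subprocess


class MediaError(Exception):
    """Hypothetical error type carrying a message and a suggestion."""

    def __init__(self, message: str, suggestion: str = "") -> None:
        super().__init__(message)
        self.suggestion = suggestion


def require_ffmpeg(command: str, purpose: str, binary: str = "ffmpeg") -> str:
    """Return the ffmpeg path, or fail with a message naming the calling command."""
    found = shutil.which(binary)
    if found is None:
        raise MediaError(
            f"`{command}` needs ffmpeg to {purpose}, but it was not found on PATH.",
            suggestion="Install ffmpeg and re-run.",
        )
    return found


def run_ffmpeg(ffmpeg: str, args: list[str], command: str) -> None:
    """Run ffmpeg and map a non-zero exit code to the shared error type."""
    result = subprocess.run([ffmpeg, *args], capture_output=True, text=True)
    if result.returncode != 0:
        # Surface only the last stderr line; ffmpeg's banner is noise here.
        last = result.stderr.strip().splitlines()[-1:] or ["unknown error"]
        raise MediaError(f"`{command}`: ffmpeg failed: {last[0]}")
```

Because each caller passes its own command name and purpose, one helper can produce messages such as "`dub` needs ffmpeg to mux the dubbed track" without a copy per command.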

Bug fixes (all commands sharing the helpers get them)

  • Self-overwrite guard now catches the same file under another spelling (--out TALK.MP4 on case-insensitive macOS APFS corrupted the input).
  • Fresh dub transcriptions auto-detect the source language (dub input is typically not English, which is the API default); a new --source-lang flag pins it instead.
  • --out viability (existing dir, missing parent, no extension) and --voice syntax validated before the billed pipeline; unsluggable languages (e.g. 中文) ask for an explicit --out instead of colliding on <stem>.dub..<ext>.
  • Queued/processing/errored --transcript-ids are rejected with the real reason instead of a misleading "no utterances" error.
  • Translations truncated at max-tokens raise instead of dubbing speech that stops mid-sentence; --voice pins for absent speakers warn instead of vanishing.
  • The success line escapes user-controlled --lang/--voice text (an embedded [/] crashed with MarkupError after the dub succeeded).
  • Bucket URLs are rejected with the URL echoed intact (Path() collapsed s3://… to s3:/…); ffmpeg output paths starting with - are passed as ./-….
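
The last bullet's two path issues can be reproduced with the standard library alone. The sketch below shows the slash collapse that pathlib performs and one way to guard against both problems. The function names mirror the PR's vocabulary, but their bodies and signatures are assumptions for illustration.

```python
from pathlib import PurePosixPath


def looks_like_url(raw: str) -> bool:
    """Detect URL-shaped input on the raw string, before any Path() conversion."""
    return "://" in raw


def reject_url(raw: str) -> str:
    """Build the rejection message from the raw string so the URL stays intact."""
    return f"{raw} is a URL, not a local file; download it first."


def path_arg(path: str) -> str:
    """Prefix a leading-dash path with ./ so ffmpeg cannot read it as an option."""
    return f"./{path}" if path.startswith("-") else path


# pathlib drops empty path components, which mangles the scheme separator.
collapsed = str(PurePosixPath("s3://bucket/talk.mp4"))
```

Here `collapsed` is `s3:/bucket/talk.mp4`, which is why the check and the error message need to work from the raw argument rather than from a `Path` object.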

Merged with main's native-voice rotation (#136), --video/caption (#139), and exec splits (#138); the YouTube-source dub tests moved to tests/test_dub_sources.py to stay under the 500-line gate.

https://claude.ai/code/session_018TuAQTvp9PVy5EdhsDWo2h

The dub command (#135) copied clip_exec's local-media validation, ffmpeg
discovery/invocation, ffmpeg-failure mapping, and diarized-transcript
resolution near-verbatim, and speak_exec's sandbox guard. Hoist the
shared scaffolding so the two commands can't drift apart:

- aai_cli/mediafile.py: validate_local_media, require_ffmpeg,
  run_ffmpeg, ffmpeg_failure, resolve_diarized_transcript, used by both
  clip_exec and dub_exec (parameterized by command/purpose strings).
- tts/session.require_available(command): one sandbox guard for speak
  and dub (speak's message now also names streaming TTS as the reason).
- dub_exec: resolve the API key once per run instead of twice
  (config.resolve_api_key hits keyring IPC each call), pass the
  already-computed transcript id into _utterances_of, fold the
  single-use `starts` generator into the zip comprehension, and return
  assemble_timeline's bytearray directly instead of copying the whole
  dubbed track into bytes (write_wav accepts any buffer).
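
The bytearray point can be sketched with the standard `wave` module, whose `writeframes` accepts any bytes-like object. The `assemble_timeline` and `write_wav` bodies below are hypothetical stand-ins that only share names with the functions mentioned in this commit; the real ones may differ.

```python
import io
import wave


def assemble_timeline(clips: list[tuple[int, bytes]], total_bytes: int) -> bytearray:
    """Place each PCM clip at its byte offset in a silent (zero-filled) track."""
    track = bytearray(total_bytes)
    for offset, pcm in clips:
        track[offset:offset + len(pcm)] = pcm
    # Returned as-is: bytes(track) would copy the whole buffer a second time.
    return track


def write_wav(dest, pcm, sample_rate: int = 16000) -> None:
    """Write 16-bit mono PCM; writeframes accepts bytes, bytearray, or memoryview."""
    with wave.open(dest, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)


buffer = io.BytesIO()
write_wav(buffer, assemble_timeline([(2, b"\x01\x02"), (6, b"\x03\x04")], 8))
```

For a long dubbed track the avoided copy is the size of the entire audio buffer, so skipping it is a cheap win.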

Tests: ffmpeg fakes now patch mediafile.run_ffmpeg; the duplicated
plain() ANSI-stripper in _dub_helpers imports from _clip_helpers; new
dub status-message test; suggestion asserts pin the per-command
parameterization of the shared helpers.

https://claude.ai/code/session_018TuAQTvp9PVy5EdhsDWo2h

alexkroman enabled auto-merge June 12, 2026 23:25

claude added 3 commits June 12, 2026 23:43

All confirmed findings from the dub (#135) code review:

- Self-overwrite guard now also catches the same file under another
  spelling (samefile when --out exists): on case-insensitive
  filesystems (macOS APFS) `--out TALK.MP4` against talk.mp4 passed the
  path comparison and ffmpeg corrupted the input.
- Fresh transcriptions auto-detect the source language (dub input is
  typically not English, which is the API default); a new --source-lang
  flag pins it instead.
- --out viability is validated before the billed pipeline: existing
  directory, missing parent directory, and missing file extension
  (ffmpeg picks the container from it) now fail upfront, and a language
  that slugs to nothing (e.g. 中文) asks for an explicit --out instead
  of colliding every such dub onto "<stem>.dub..<ext>".
- --voice is parsed before any billed work, and SPEAKER=VOICE pins for
  speakers absent from the diarized transcript warn instead of being
  dropped silently (mirrors assembly speak).
- A --transcript-id that is queued/processing/errored is rejected with
  the real reason (shared resolve_diarized_transcript, so clip gets the
  same fix) instead of a misleading "no utterances" error.
- Translations truncated at max_tokens (finish_reason length/max_tokens)
  raise instead of dubbing speech that stops mid-sentence.
- The success line escapes user-controlled --lang/--voice text (an
  embedded "[/]" crashed with MarkupError after the dub succeeded).
- URLs are rejected with the URL echoed intact (Path() collapsed
  "s3://…" to "s3:/…") and a download hint.
- ffmpeg output paths starting with "-" are passed as "./-…" so they
  can't be parsed as ffmpeg options (clip's cut destinations too).
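
The upfront --out checks and the samefile guard described above can be sketched as follows. This is an illustrative version under assumptions: the `OutPathError` type, the messages, and the exact ordering of checks are hypothetical, not the code in this commit.

```python
from pathlib import Path


class OutPathError(ValueError):
    """Hypothetical error raised before any billed work starts."""


def validate_out(src: Path, out: Path) -> None:
    """Reject unusable --out values early, including the input under another spelling."""
    if out.is_dir():
        raise OutPathError(f"{out} is a directory; pass a file path to --out.")
    if not out.parent.is_dir():
        raise OutPathError(f"{out.parent} does not exist.")
    if not out.suffix:
        raise OutPathError(f"{out} has no extension; ffmpeg picks the container from it.")
    # samefile compares the underlying file rather than the path text, so it
    # also catches TALK.MP4 vs talk.mp4 on a case-insensitive filesystem.
    if out.exists() and out.samefile(src):
        raise OutPathError(f"{out} is the input file; choose a different --out.")
```

A plain `src == out` comparison is not enough because two different path strings can name one file; `samefile` only works when the output already exists, which is exactly the case where an overwrite would happen.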

https://claude.ai/code/session_018TuAQTvp9PVy5EdhsDWo2h
…mediafile refactor

Reconciles the shared-scaffolding refactor with three upstream changes:
the language-native voice rotation (#136), the dub --video flag and the
new caption command (#139), and the exec-module splits (#138).

- run_dub keeps upstream's YouTube-download branch and _dub_and_emit
  split, with this branch's early validation (--voice parse, URL echo,
  out-path checks) threaded through; the parsed --voice pair rides in a
  frozen _VoicePlan.
- caption_exec now uses the shared mediafile helpers too (it had copied
  the same scaffolding), which also gives caption the upfront out-path
  validation, the samefile self-overwrite guard, the transcript status
  check, and the './-' ffmpeg path hardening.
- mediafile grows the caption-shaped pieces: validate_out (hoisted from
  dub_exec), a general resolve_transcript (diarized variant delegates),
  a kind= parameter for validate_local_media, and a suggestion override
  for ffmpeg_failure.
- test_dub_pipeline's YouTube-source tests move to test_dub_sources.py
  to stay under the 500-line file gate.
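
The "diarized variant delegates" arrangement, together with the transcript status check, might look like the sketch below. The `Transcript` dataclass, the status strings, and the error wording are assumptions made for this example; the real helpers work against the API's transcript objects.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    """Hypothetical stand-in for the API's transcript object."""

    id: str
    status: str  # assumed values: queued, processing, completed, error
    error: str = ""
    utterances: list = field(default_factory=list)


class TranscriptError(Exception):
    """Hypothetical error type for unusable transcripts."""


def resolve_transcript(transcript: Transcript, command: str) -> Transcript:
    """Reject unfinished or failed transcripts with the real reason."""
    if transcript.status in ("queued", "processing"):
        raise TranscriptError(
            f"`{command}`: transcript {transcript.id} is still {transcript.status}."
        )
    if transcript.status == "error":
        raise TranscriptError(
            f"`{command}`: transcript {transcript.id} failed: {transcript.error}"
        )
    return transcript


def resolve_diarized_transcript(transcript: Transcript, command: str) -> Transcript:
    """Delegate the status check, then require speaker-labelled utterances."""
    resolved = resolve_transcript(transcript, command)
    if not resolved.utterances:
        raise TranscriptError(
            f"`{command}`: transcript {resolved.id} has no utterances."
        )
    return resolved
```

Running the status check first means a queued transcript reports that it is queued, and the "no utterances" message is reserved for completed transcripts that genuinely lack speaker labels.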

https://claude.ai/code/session_018TuAQTvp9PVy5EdhsDWo2h
Keeps mediafile.require_ffmpeg in run_dub's download branch, ports the
--download-sections fixture/tests into tests/test_dub_sources.py (where
the YouTube-source dub tests moved), and regenerates the run-group help
snapshot.

https://claude.ai/code/session_018TuAQTvp9PVy5EdhsDWo2h

alexkroman added this pull request to the merge queue Jun 13, 2026
Merged via the queue into main with commit 637af5f Jun 13, 2026
15 checks passed
alexkroman deleted the claude/kind-mendel-jw53hz branch June 13, 2026 00:25