
Add assembly clip command to cut media by transcript content #129

Merged: alexkroman merged 1 commit into main from claude/bold-volta-b0y5c2 on Jun 12, 2026


@alexkroman

Collaborator

Implements the assembly clip command, which cuts clips out of audio/video files based on speaker labels, text search, LLM-driven selection, or explicit time ranges.

Summary

This PR adds a complete new command (assembly clip) that orchestrates media cutting via ffmpeg, driven by transcript-based selection. The command supports multiple selection sources (speaker/search filters, LLM Gateway model picks, explicit ranges), handles YouTube/media-page downloads via yt-dlp, accepts piped transcripts on stdin, and outputs clips with optional padding and merging.
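
For the stdin path, a minimal sketch of turning piped JSON into utterances might look like the following. It assumes the piped payload follows AssemblyAI's transcript shape (an `utterances` array whose `start`/`end` values are in milliseconds); `parse_piped_transcript` is a hypothetical helper, not the function in `clip_exec.py`.

```python
import json
from typing import Any


def parse_piped_transcript(raw: str) -> list[dict[str, Any]]:
    """Parse transcript JSON (e.g. read from stdin for `-t -`) into utterances in seconds.

    Hypothetical helper; assumes AssemblyAI-style `utterances` with millisecond times.
    """
    data = json.loads(raw)
    utterances = data.get("utterances") or []
    if not utterances:
        # Without diarized utterances there is nothing to select from.
        raise ValueError("piped transcript has no speaker-labelled utterances")
    return [
        {
            "speaker": u["speaker"],
            "text": u["text"],
            "start": u["start"] / 1000,  # ms -> seconds
            "end": u["end"] / 1000,
        }
        for u in utterances
    ]
```

A caller would pass `sys.stdin.read()` when the transcript option is `-`, which is what lets a previous `transcribe --json` run be reused without re-transcribing.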

Key Changes

Core Implementation:

  • aai_cli/clip_exec.py (369 lines): Main orchestration logic for validation, transcript resolution, segment selection, and ffmpeg invocation. Handles local files, YouTube URLs, piped transcripts (-t -), and LLM-driven selection via the LLM Gateway.
  • aai_cli/clip_select.py (198 lines): Pure selection logic—range parsing (seconds and clock times like 1:30-2:45), utterance filtering by speaker/search, segment merging with padding, and LLM reply parsing.
  • aai_cli/commands/clip.py (128 lines): Typer CLI command definition with all flags (--speaker, --search, --llm, --range, --padding, --out-dir, --transcript-id, etc.) and help text.
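
The range parsing described for `clip_select.py` accepts plain seconds and clock times. A minimal sketch, assuming `START-END` with `-` as the only separator and `M:SS` / `H:MM:SS` clock forms; `parse_time`, `parse_range`, and `format_clock` are illustrative names, not necessarily the module's.

```python
def parse_time(token: str) -> float:
    """Parse '90', '90.5', '1:30' or '1:02:03' into seconds."""
    parts = token.strip().split(":")
    if not 1 <= len(parts) <= 3 or any(p == "" for p in parts):
        raise ValueError(f"bad time: {token!r}")
    seconds = 0.0
    for part in parts:
        # Each colon-separated field is worth 60x the one after it.
        seconds = seconds * 60 + float(part)
    return seconds


def parse_range(spec: str) -> tuple[float, float]:
    """Parse 'START-END' (e.g. '1:30-2:45' or '90-165') into (start, end) seconds."""
    start_s, sep, end_s = spec.partition("-")
    if not sep:
        raise ValueError(f"range needs START-END: {spec!r}")
    start, end = parse_time(start_s), parse_time(end_s)
    if end <= start:
        raise ValueError(f"range end must be after start: {spec!r}")
    return start, end


def format_clock(seconds: float) -> str:
    """Render seconds as M:SS, or H:MM:SS past an hour, for listings and messages."""
    total = int(round(seconds))
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}" if h else f"{m}:{s:02d}"
```

Because the split happens on the first `-`, a spec like `-5-10` yields an empty start field and is rejected rather than misparsed.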

Test Suite:

  • tests/test_clip_exec.py (362 lines): Tests validation, ffmpeg orchestration, range-only cutting, and transcript-backed selection (ffmpeg boundary faked).
  • tests/test_clip_select.py (209 lines): Tests pure selection logic—range parsing, segment merging, utterance filtering, LLM listing/reply contract, and clock formatting.
  • tests/test_clip_sources.py (294 lines): Tests YouTube/media-page downloads, stdin transcript piping (-t -), and LLM-driven selection (all boundaries faked).
  • tests/test_clip_command.py (158 lines): CLI-level tests for argv parsing, error rendering, and command placement in help.
  • tests/_clip_helpers.py (67 lines): Shared test builders (option defaults, transcript fakes, ffmpeg recording).

Integration:

  • Updated aai_cli/main.py to register the clip command sub-app.
  • Updated .importlinter to allow clip_exec and clip_select modules.
  • Updated AGENTS.md to document the command layer architecture.
  • Updated aai_cli/skills/aai-cli/references/transcription.md with clip command documentation.
  • Updated help snapshot tests in tests/__snapshots__/test_snapshots_help_run.ambr.
  • Updated tests/_snapshot_surface.py to include clip in the help group.
  • Updated README.md to list the new command.
  • Updated tests/test_smoke.py to verify command ordering.

Notable Implementation Details

  • Selection composition: --speaker and --search filter utterances first; --llm then picks windows from the filtered set (or the whole transcript if unfiltered). --range adds explicit segments. All sources merge and overlap-coalesce.
  • Transcript sources: Transcripts can be made fresh (with speaker labels), fetched by ID, or piped as JSON on stdin (-t -), avoiding re-transcription.
  • YouTube support: Media-page URLs are downloaded via youtube.download_audio() into a temp directory; clips land in --out-dir or the current directory.
  • LLM integration: The LLM Gateway receives a timestamped utterance listing and returns JSON segment picks; the reply is parsed robustly (handles markdown code blocks, surrounding text).
  • Padding & merging: Segments are padded (clamped at 0), sorted, and coalesced where they touch or overlap, so consecutive utterances don't shatter into per-sentence files.
  • ffmpeg orchestration: Each surviving segment is re-encoded into its own file (`.clip
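
The padding-and-merge step lends itself to a short sketch. It assumes segments are `(start, end)` pairs in seconds and that "touch" means no gap remains after padding; `pad_and_merge` is a hypothetical name, and clamping the end to the media duration is left out.

```python
def pad_and_merge(
    segments: list[tuple[float, float]], padding: float
) -> list[tuple[float, float]]:
    """Pad each (start, end) window, clamp at 0, sort, and coalesce touching/overlapping ones."""
    padded = sorted((max(0.0, s - padding), e + padding) for s, e in segments)
    merged: list[tuple[float, float]] = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Touches or overlaps the previous window: extend it instead of starting a new clip.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged
```

Merging after padding is what keeps back-to-back utterances in one clip: two sentences separated by a 0.4 s pause fuse into a single window once each side gets 0.5 s of padding.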

https://claude.ai/code/session_011SdBCjATahktayRZfjmwWk

A FunClip-style transcript-driven clipping command. assembly clip cuts a
local audio/video file (or a YouTube/media-page URL, downloaded via
yt-dlp) with ffmpeg, selecting segments in four composable ways:

- --speaker A / --search "topic": filter diarized utterances (the file is
  transcribed with speaker labels on the fly; alternatively, reuse an
  existing transcript with -t TRANSCRIPT_ID or pipe `transcribe --json`
  output in with -t -)
- --llm "the best moments": the timestamped utterances go to LLM Gateway
  and the model picks the windows (composes with the filters)
- --range 1:30-2:45: explicit windows, no transcript needed

Selections are padded (--padding), merged where they touch or overlap, and each
surviving segment is re-encoded to <name>.clipNN<ext> (next to the
input, or --out-dir; downloads land in the cwd). --json emits the
written clips with start/end/duration.

The pure selection logic (range parsing, utterance filtering, LLM reply
parsing, merging) lives in clip_select; orchestration (transcript
resolution, yt-dlp, ffmpeg) in clip_exec, following the options/run
split with commands/clip.py as the thin argv surface.
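
For the LLM reply parsing that lives in clip_select, a tolerant parser might strip a markdown code fence and any surrounding prose before decoding. The reply shape (a JSON array of objects with `start` and `end` in seconds) is an assumption, as is the name `parse_llm_reply`.

```python
import json
import re

# A markdown code fence, optionally tagged "json", capturing its body.
FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def parse_llm_reply(reply: str) -> list[tuple[float, float]]:
    """Extract (start, end) picks from a model reply that may wrap JSON in a fence or prose."""
    fenced = FENCE.search(reply)
    body = fenced.group(1) if fenced else reply
    # Take the outermost [...] so leading or trailing chatter is ignored.
    first, last = body.find("["), body.rfind("]")
    if first == -1 or last < first:
        raise ValueError("no JSON array in LLM reply")
    picks = json.loads(body[first : last + 1])
    segments = []
    for pick in picks:
        start, end = float(pick["start"]), float(pick["end"])
        if end > start:  # drop empty or inverted windows
            segments.append((start, end))
    return segments
```

Dropping empty or inverted picks here, rather than failing, means one bad window from the model does not discard the rest of its selection.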

https://claude.ai/code/session_011SdBCjATahktayRZfjmwWk

alexkroman enabled auto-merge on June 12, 2026 21:06
alexkroman added this pull request to the merge queue on Jun 12, 2026
Merged via the queue into main with commit e53d28e on Jun 12, 2026
16 checks passed
alexkroman deleted the claude/bold-volta-b0y5c2 branch on June 12, 2026 21:13
alexkroman pushed a commit that referenced this pull request on Jun 12, 2026
Combines the new clip command from #129 with dictate: both registered in
the command order, help groups, import-linter contracts, and AGENTS.md
(seven options/run-split exec modules now); run-group help snapshot
regenerated.

Note: the full check.sh gate was deliberately skipped for this merge
commit at the operator's request; the default pytest suite passed on the
merged tree (2210 passed).

https://claude.ai/code/session_01FCXQLAyo8xpZiXrQ7hCMAf