Add assembly clip command to cut media by transcript content #129
Merged
A FunClip-style transcript-driven clipping command. assembly clip cuts a local audio/video file (or a YouTube/media-page URL, downloaded via yt-dlp) with ffmpeg, selecting segments four composable ways:

- --speaker A / --search "topic": filter diarized utterances (the file is transcribed with speaker labels on the fly, or reuse one with -t TRANSCRIPT_ID, or pipe `transcribe --json` output in with -t -)
- --llm "the best moments": the timestamped utterances go to LLM Gateway and the model picks the windows (composes with the filters)
- --range 1:30-2:45: explicit windows, no transcript needed

Selections are padded (--padding), merged where they touch, and each surviving segment is re-encoded to <name>.clipNN<ext> (next to the input, or --out-dir; downloads land in the cwd). --json emits the written clips with start/end/duration.

The pure selection logic (range parsing, utterance filtering, LLM reply parsing, merging) lives in clip_select; orchestration (transcript resolution, yt-dlp, ffmpeg) in clip_exec, following the options/run split with commands/clip.py as the thin argv surface.

https://claude.ai/code/session_011SdBCjATahktayRZfjmwWk
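For readers skimming the description, here is a minimal sketch of the range parsing and pad-then-merge step it describes. It is not the code in clip_select: the names `parse_clock`, `parse_range`, and `pad_and_merge` are hypothetical, and the exact merge rule (segments that touch or overlap after padding are joined) is an assumption based on "merged where they touch".

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def parse_clock(text: str) -> float:
    """Parse "90", "1:30", or "0:01:30" into seconds (hypothetical helper)."""
    seconds = 0.0
    for part in text.strip().split(":"):
        # Each colon-separated field shifts the running total by a factor of 60.
        seconds = seconds * 60 + float(part)
    return seconds


def parse_range(text: str) -> Segment:
    """Parse a START-END string such as "1:30-2:45" into seconds."""
    start_text, sep, end_text = text.partition("-")
    if not sep:
        raise ValueError(f"expected START-END, got {text!r}")
    start, end = parse_clock(start_text), parse_clock(end_text)
    if end <= start:
        raise ValueError(f"range end must be after start: {text!r}")
    return (start, end)


def pad_and_merge(segments: List[Segment], padding: float = 0.0) -> List[Segment]:
    """Pad each segment, clamp starts at zero, and join segments that touch or overlap."""
    padded = sorted((max(0.0, s - padding), e + padding) for s, e in segments)
    merged: List[Segment] = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Touches or overlaps the previous segment: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Under these assumptions, `parse_range("1:30-2:45")` gives `(90.0, 165.0)`, and two selections one second apart collapse into a single clip once `padding=1.0` is applied.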
alexkroman pushed a commit that referenced this pull request Jun 12, 2026
Combines the new clip command from #129 with dictate: both registered in the command order, help groups, import-linter contracts, and AGENTS.md (seven options/run-split exec modules now); run-group help snapshot regenerated. Note: the full check.sh gate was deliberately skipped for this merge commit at the operator's request; the default pytest suite passed on the merged tree (2210 passed). https://claude.ai/code/session_01FCXQLAyo8xpZiXrQ7hCMAf
Implements the `assembly clip` command, which cuts clips out of audio/video files based on speaker labels, text search, LLM-driven selection, or explicit time ranges.

Summary

This PR adds a complete new command (`assembly clip`) that orchestrates media cutting via ffmpeg, driven by transcript-based selection. The command supports multiple selection sources (speaker/search filters, LLM Gateway model picks, explicit ranges), handles YouTube/media-page downloads via yt-dlp, accepts piped transcripts on stdin, and outputs clips with optional padding and merging.

Key Changes

Core Implementation:
- `aai_cli/clip_exec.py` (369 lines): Main orchestration logic for validation, transcript resolution, segment selection, and ffmpeg invocation. Handles local files, YouTube URLs, piped transcripts (`-t -`), and LLM-driven selection via the LLM Gateway.
- `aai_cli/clip_select.py` (198 lines): Pure selection logic: range parsing (seconds and clock times like `1:30-2:45`), utterance filtering by speaker/search, segment merging with padding, and LLM reply parsing.
- `aai_cli/commands/clip.py` (128 lines): Typer CLI command definition with all flags (`--speaker`, `--search`, `--llm`, `--range`, `--padding`, `--out-dir`, `--transcript-id`, etc.) and help text.

Test Suite:
- `tests/test_clip_exec.py` (362 lines): Tests validation, ffmpeg orchestration, range-only cutting, and transcript-backed selection (ffmpeg boundary faked).
- `tests/test_clip_select.py` (209 lines): Tests pure selection logic: range parsing, segment merging, utterance filtering, LLM listing/reply contract, and clock formatting.
- `tests/test_clip_sources.py` (294 lines): Tests YouTube/media-page downloads, stdin transcript piping (`-t -`), and LLM-driven selection (all boundaries faked).
- `tests/test_clip_command.py` (158 lines): CLI-level tests for argv parsing, error rendering, and command placement in help.
- `tests/_clip_helpers.py` (67 lines): Shared test builders (option defaults, transcript fakes, ffmpeg recording).

Integration:
- Updated `aai_cli/main.py` to register the `clip` command sub-app.
- Updated `.importlinter` to allow the `clip_exec` and `clip_select` modules.
- Updated `AGENTS.md` to document the command layer architecture.
- Updated `aai_cli/skills/aai-cli/references/transcription.md` with `clip` command documentation.
- Updated `tests/__snapshots__/test_snapshots_help_run.ambr`.
- Updated `tests/_snapshot_surface.py` to include `clip` in the help group.
- Updated `README.md` to list the new command.
- Updated `tests/test_smoke.py` to verify command ordering.

Notable Implementation Details

- `--speaker` and `--search` filter utterances first; `--llm` then picks windows from the filtered set (or the whole transcript if unfiltered). `--range` adds explicit segments. Segments from all sources are merged, and overlapping ones are coalesced.
- Transcripts can be piped in on stdin (`-t -`), avoiding re-transcription.
- YouTube/media-page URLs are downloaded via `youtube.download_audio()` into a temp directory; clips land in `--out-dir` or the current directory.

https://claude.ai/code/session_011SdBCjATahktayRZfjmwWk
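To illustrate the filter-then-cut flow described above, here is a rough sketch rather than the code in this PR. Everything named here is hypothetical (`Utterance`, `filter_utterances`, `clip_path`, `ffmpeg_args`). It assumes utterances carry a speaker label, text, and millisecond timestamps, that `--speaker` and `--search` combine with AND, that clip numbering starts at 01, and that each clip is cut with ffmpeg's standard `-ss`/`-to` options; the flags `clip_exec` actually passes are not shown in this PR page.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


@dataclass
class Utterance:
    """Hypothetical utterance shape: speaker label, text, millisecond bounds."""
    speaker: str
    text: str
    start_ms: int
    end_ms: int


def filter_utterances(
    utterances: List[Utterance],
    speaker: Optional[str] = None,
    search: Optional[str] = None,
) -> List[Segment]:
    """Keep utterances matching every given filter; return their spans in seconds."""
    needle = search.casefold() if search else None
    segments: List[Segment] = []
    for u in utterances:
        if speaker is not None and u.speaker != speaker:
            continue
        if needle is not None and needle not in u.text.casefold():
            continue
        segments.append((u.start_ms / 1000, u.end_ms / 1000))
    return segments


def clip_path(source: Path, index: int, out_dir: Optional[Path] = None) -> Path:
    """Build <name>.clipNN<ext> next to the input, or inside out_dir if given."""
    directory = out_dir if out_dir is not None else source.parent
    return directory / f"{source.stem}.clip{index:02d}{source.suffix}"


def ffmpeg_args(source: Path, segment: Segment, dest: Path) -> List[str]:
    """Build an ffmpeg argv that re-encodes one window (no stream copy)."""
    start, end = segment
    return [
        "ffmpeg", "-y",
        "-i", str(source),
        "-ss", f"{start:.3f}",
        "-to", f"{end:.3f}",
        str(dest),
    ]
```

The functions only build values and never run ffmpeg, which mirrors the PR's split between pure selection logic and orchestration and makes the ffmpeg boundary easy to fake in tests.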