Commit e53d28e

alexkroman and claude authored
Add assembly clip command to cut media by transcript content (#129)
Implements the `assembly clip` command, which cuts clips out of audio/video files based on speaker labels, text search, LLM-driven selection, or explicit time ranges.

## Summary

This PR adds a complete new command (`assembly clip`) that orchestrates media cutting via ffmpeg, driven by transcript-based selection. The command supports multiple selection sources (speaker/search filters, LLM Gateway model picks, explicit ranges), handles YouTube/media-page downloads via yt-dlp, accepts piped transcripts on stdin, and outputs clips with optional padding and merging.

## Key Changes

**Core Implementation:**
- `aai_cli/clip_exec.py` (369 lines): Main orchestration logic for validation, transcript resolution, segment selection, and ffmpeg invocation. Handles local files, YouTube URLs, piped transcripts (`-t -`), and LLM-driven selection via the LLM Gateway.
- `aai_cli/clip_select.py` (198 lines): Pure selection logic: range parsing (seconds and clock times like `1:30-2:45`), utterance filtering by speaker/search, segment merging with padding, and LLM reply parsing.
- `aai_cli/commands/clip.py` (128 lines): Typer CLI command definition with all flags (`--speaker`, `--search`, `--llm`, `--range`, `--padding`, `--out-dir`, `--transcript-id`, etc.) and help text.

**Test Suite:**
- `tests/test_clip_exec.py` (362 lines): Tests validation, ffmpeg orchestration, range-only cutting, and transcript-backed selection (ffmpeg boundary faked).
- `tests/test_clip_select.py` (209 lines): Tests pure selection logic: range parsing, segment merging, utterance filtering, LLM listing/reply contract, and clock formatting.
- `tests/test_clip_sources.py` (294 lines): Tests YouTube/media-page downloads, stdin transcript piping (`-t -`), and LLM-driven selection (all boundaries faked).
- `tests/test_clip_command.py` (158 lines): CLI-level tests for argv parsing, error rendering, and command placement in help.
- `tests/_clip_helpers.py` (67 lines): Shared test builders (option defaults, transcript fakes, ffmpeg recording).

**Integration:**
- Updated `aai_cli/main.py` to register the `clip` command sub-app.
- Updated `.importlinter` to allow `clip_exec` and `clip_select` modules.
- Updated `AGENTS.md` to document the command layer architecture.
- Updated `aai_cli/skills/aai-cli/references/transcription.md` with `clip` command documentation.
- Updated help snapshot tests in `tests/__snapshots__/test_snapshots_help_run.ambr`.
- Updated `tests/_snapshot_surface.py` to include `clip` in the help group.
- Updated `README.md` to list the new command.
- Updated `tests/test_smoke.py` to verify command ordering.

## Notable Implementation Details

- **Selection composition**: `--speaker` and `--search` filter utterances first; `--llm` then picks windows from the filtered set (or the whole transcript if unfiltered). `--range` adds explicit segments. All sources merge and overlap-coalesce.
- **Transcript sources**: Transcripts can be made fresh (with speaker labels), fetched by ID, or piped as JSON on stdin (`-t -`), avoiding re-transcription.
- **YouTube support**: Media-page URLs are downloaded via `youtube.download_audio()` into a temp directory; clips land in `--out-dir` or the current directory.
- **LLM integration**: The LLM Gateway receives a timestamped utterance listing and returns JSON segment picks; the reply is parsed robustly (handles markdown code blocks, surrounding text).
- **Padding & merging**: Segments are padded (clamped at 0), sorted, and coalesced where they touch or overlap, so consecutive utterances don't shatter into per-sentence files.
- **ffmpeg orchestration**: Each surviving segment is re-encoded into its own file (`<name>.clip…

https://claude.ai/code/session_011SdBCjATahktayRZfjmwWk

Co-authored-by: Claude <noreply@anthropic.com>
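The padding-and-merging behavior described in the commit message can be sketched roughly as follows. This is an illustration only, not the code in `aai_cli/clip_select.py`: the function name `pad_and_merge`, the `Segment` alias, and the choice to pad both ends by the same amount are assumptions made for the example.

```python
from typing import Iterable

# Hypothetical representation: (start_seconds, end_seconds).
Segment = tuple[float, float]


def pad_and_merge(segments: Iterable[Segment], padding: float = 0.0) -> list[Segment]:
    """Pad each segment, clamp starts at 0, sort, and coalesce touching or overlapping ones."""
    # Pad both ends (an assumption), never letting a start go below zero.
    padded = sorted((max(0.0, start - padding), end + padding) for start, end in segments)

    merged: list[Segment] = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Touches or overlaps the previous segment: extend it instead of adding a new one.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Two back-to-back utterances collapse into one clip; the distant one stays separate.
print(pad_and_merge([(10.0, 12.0), (12.0, 15.0), (30.0, 31.0)]))
# [(10.0, 15.0), (30.0, 31.0)]
```

Sorting before merging means a single left-to-right pass is enough: each segment only ever needs to be compared with the most recently merged one.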
1 parent 2cc701f commit e53d28e

16 files changed

Lines changed: 1912 additions & 6 deletions

.importlinter

Lines changed: 4 additions & 0 deletions
@@ -11,6 +11,8 @@ source_modules =
     aai_cli.argscan
     aai_cli.auth
     aai_cli.client
+    aai_cli.clip_exec
+    aai_cli.clip_select
     aai_cli.code_gen
     aai_cli.coding_agent
     aai_cli.config
@@ -54,6 +56,7 @@ modules =
     aai_cli.commands.account
     aai_cli.commands.agent
     aai_cli.commands.audit
+    aai_cli.commands.clip
     aai_cli.commands.deploy
     aai_cli.commands.dev
     aai_cli.commands.doctor
@@ -77,6 +80,7 @@ type = forbidden
 source_modules =
     aai_cli.argscan
     aai_cli.client
+    aai_cli.clip_select
     aai_cli.config
     aai_cli.config_builder
     aai_cli.debuglog

AGENTS.md

Lines changed: 2 additions & 2 deletions
@@ -162,9 +162,9 @@ A Typer CLI. `aai_cli/main.py` builds the `app`, registers each command sub-app,
 
 ### Command layer
 
-Each file in `aai_cli/commands/` is a Typer sub-app (`transcribe`, `stream`, `agent`, `speak`, `llm`, `transcripts`, `login` (login/logout/whoami), `doctor`, `init`, `dev`, `share`, `deploy`, `setup`, `onboard`, `account` (balance/usage/limits), `keys`, `sessions`, `audit`, `telemetry` (status/enable/disable), `webhooks` (listen)). Command bodies run through `context.run_command(ctx, fn, json=...)`, which maps any `CLIError` to clean stderr output + the error's exit code. Commands never print tracebacks for expected failures.
+Each file in `aai_cli/commands/` is a Typer sub-app (`transcribe`, `stream`, `agent`, `speak`, `llm`, `clip`, `transcripts`, `login` (login/logout/whoami), `doctor`, `init`, `dev`, `share`, `deploy`, `setup`, `onboard`, `account` (balance/usage/limits), `keys`, `sessions`, `audit`, `telemetry` (status/enable/disable), `webhooks` (listen)). Command bodies run through `context.run_command(ctx, fn, json=...)`, which maps any `CLIError` to clean stderr output + the error's exit code. Commands never print tracebacks for expected failures.
 
-**Options/run split for flag-heavy commands** (gh-CLI style): the Typer function only parses argv into a frozen `<Cmd>Options` dataclass and hands it to a module-level `run_<cmd>(opts, state, *, json_mode)` through a thin lambda adapter in `run_command(ctx, ..., json=...)`. The five run commands follow it — `aai_cli/stream_exec.py` (the reference implementation), `transcribe_exec.py`, `agent_exec.py`, `speak_exec.py`, `llm_exec.py`. Because the run path is a plain function of data, tests construct options directly (`dataclasses.replace` off a defaults instance, see `tests/test_stream_exec.py` and `tests/test_command_options_seam.py`) instead of round-tripping argv through `CliRunner` — which is also the cheap way to kill mutation-gate mutants on orchestration lines. Follow this for new or heavily-reworked commands with long bodies; small commands keep the inline `body()` closure — the dataclass is pure ceremony there.
+**Options/run split for flag-heavy commands** (gh-CLI style): the Typer function only parses argv into a frozen `<Cmd>Options` dataclass and hands it to a module-level `run_<cmd>(opts, state, *, json_mode)` through a thin lambda adapter in `run_command(ctx, ..., json=...)`. The six run commands follow it — `aai_cli/stream_exec.py` (the reference implementation), `transcribe_exec.py`, `agent_exec.py`, `speak_exec.py`, `llm_exec.py`, `clip_exec.py`. Because the run path is a plain function of data, tests construct options directly (`dataclasses.replace` off a defaults instance, see `tests/test_stream_exec.py` and `tests/test_command_options_seam.py`) instead of round-tripping argv through `CliRunner` — which is also the cheap way to kill mutation-gate mutants on orchestration lines. Follow this for new or heavily-reworked commands with long bodies; small commands keep the inline `body()` closure — the dataclass is pure ceremony there.
 
 ### Cross-cutting state (resolution order matters)
 
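The options/run split that `AGENTS.md` describes might look roughly like the sketch below. The names `ClipOptions` and `run_clip` follow the `<Cmd>Options` / `run_<cmd>(opts, state, *, json_mode)` convention quoted in the diff, but the fields, the `state` type, and the function body are hypothetical and do not reflect the real `aai_cli/clip_exec.py`.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ClipOptions:
    """Hypothetical option set; the real dataclass carries more flags."""

    source: str
    speaker: Optional[str] = None
    search: Optional[str] = None
    padding: float = 0.0


def run_clip(opts: ClipOptions, state: dict, *, json_mode: bool) -> dict:
    """Stand-in run function: a plain function of data, so tests can call it directly."""
    filters = {
        name: value
        for name, value in (("speaker", opts.speaker), ("search", opts.search))
        if value is not None
    }
    return {"source": opts.source, "filters": filters, "padding": opts.padding, "json": json_mode}


# Test-style usage: derive variations from one defaults instance, with no argv parsing involved.
DEFAULTS = ClipOptions(source="talk.mp4")
opts = replace(DEFAULTS, speaker="A", padding=0.5)
result = run_clip(opts, state={}, json_mode=True)
print(result)
# {'source': 'talk.mp4', 'filters': {'speaker': 'A'}, 'padding': 0.5, 'json': True}
```

Because the dataclass is frozen, `dataclasses.replace` returns a new instance and the shared defaults cannot be mutated by one test and leak into another.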

README.md

Lines changed: 1 addition & 0 deletions
@@ -173,6 +173,7 @@ assembly init # scaffold a starter app
 - **Real-time streaming**: `assembly stream` transcribes the microphone, a file, or a URL live — on macOS it can capture system audio too.
 - **Voice agent**: `assembly agent` runs a full-duplex spoken conversation in your terminal.
 - **LLM Gateway**: `assembly llm` prompts an LLM over a transcript, stdin, or a live stream (`assembly stream --llm "summarize as I talk"`).
+- **Transcript-driven clipping**: `assembly clip` cuts an audio/video file (or a YouTube/podcast URL) with ffmpeg by diarized speaker (`--speaker A`), text match (`--search "pricing"`), LLM pick (`--llm "the three best moments"`), or explicit time range (`--range 1:30-2:45`) — transcribing on the fly, reusing a finished transcript with `-t ID`, or reading one from a pipe (`assembly transcribe x.mp4 --speaker-labels --json | assembly clip x.mp4 -t - --llm "…"`).
 - **Model evaluation**: `assembly eval` transcribes a Hugging Face dataset (with built-in aliases for common benchmarks: `assembly eval tedlium`) or a local `.csv`/`.jsonl` manifest and scores WER against its references — handy for picking a speech model.
 - **Starter apps**: `assembly init` scaffolds a self-contained FastAPI + HTML app (`audio-transcription`, `live-captions`, `voice-agent`); `assembly dev` runs it, `assembly share` exposes it on a public URL, and `assembly deploy` ships it to Vercel, Railway, or Fly.io.
 - **Webhook testing**: `assembly webhooks listen` opens a public dev URL (cloudflared quick tunnel) that prints webhook deliveries as they arrive and can forward them to your local app with `--forward-to`.
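As a rough illustration of how a `--range` value such as `1:30-2:45` could be turned into seconds, here is a minimal sketch. The helper names `parse_clock` and `parse_range` and the validation rules are assumptions made for the example; the parser in `aai_cli/clip_select.py` may accept other forms or report errors differently.

```python
def parse_clock(value: str) -> float:
    """Convert "90", "1:30", or "1:02:03.5" into seconds."""
    parts = value.strip().split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unrecognized time: {value!r}")
    seconds = 0.0
    for part in parts:
        # Each colon shifts the running total up by a factor of 60.
        seconds = seconds * 60 + float(part)
    return seconds


def parse_range(spec: str) -> tuple[float, float]:
    """Split "START-END" into a (start_seconds, end_seconds) pair."""
    start_text, sep, end_text = spec.partition("-")
    if not sep:
        raise ValueError(f"expected START-END, got {spec!r}")
    start, end = parse_clock(start_text), parse_clock(end_text)
    if end <= start:
        raise ValueError(f"range end must come after start: {spec!r}")
    return start, end


print(parse_range("1:30-2:45"))  # (90.0, 165.0)
print(parse_range("10-20.5"))    # (10.0, 20.5)
```

Treating plain seconds as a one-part clock time keeps both accepted forms on a single code path.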
