Merged
8 changes: 5 additions & 3 deletions README.md
@@ -4,7 +4,7 @@
[![License](https://img.shields.io/badge/license-MIT-D6402E)](https://github.com/AssemblyAI/cli/blob/main/LICENSE)
[![Docs](https://img.shields.io/badge/docs-assemblyai-D6402E)](https://www.assemblyai.com/docs)

-The AssemblyAI CLI (`assembly`) brings speech AI directly into your terminal: transcribe files, URLs, and YouTube/podcast pages, stream live audio, talk to a two-way voice agent, prompt the LLM Gateway, benchmark speech models, and scaffold ready-to-deploy starter apps.
+The AssemblyAI CLI (`assembly`) brings speech AI directly into your terminal: transcribe files, URLs, YouTube/podcast pages, and whole podcast RSS feeds, stream live audio, talk to a two-way voice agent, prompt the LLM Gateway, benchmark speech models, and scaffold ready-to-deploy starter apps.

<p align="center">
<img src="assets/welcome.png" alt="The assembly CLI welcome screen, listing command groups for transcription, streaming, voice agents, app scaffolding, and account management" width="820">
@@ -44,7 +44,7 @@ That's it. Run `assembly onboard` for a guided tour, or see [Installation](#-ins

| Command | What it does |
| :--- | :--- |
-| `assembly transcribe` | Transcribe files, URLs, YouTube/podcast pages, directories, globs, or bucket storage (`s3://`, `gs://`, `az://`) — with speaker labels, PII redaction, summarization, SRT/VTT captions, and resumable batch runs |
+| `assembly transcribe` | Transcribe files, URLs, YouTube/podcast pages, podcast RSS feeds, directories, globs, or bucket storage (`s3://`, `gs://`, `az://`) — with speaker labels, PII redaction, summarization, SRT/VTT captions, and resumable batch runs |
| `assembly stream` | Real-time transcription from your microphone, a file, or a URL — on macOS it can capture system audio too |
| `assembly dictate` | Push-to-talk dictation: press Enter to record, Enter again for instant text (Sync STT API, up to 120 s per utterance) |
| `assembly agent` | Full-duplex spoken conversation with a voice agent, right in your terminal |
@@ -285,11 +285,13 @@ assembly transcribe video.mp4 -o srt # captions
assembly transcribe call.mp3 --speaker-labels --summarization --json
```

-Transcribe in batches — a directory, a glob, or a piped list, resumable on re-run:
+Transcribe in batches — a directory, a glob, a piped list, or a whole podcast
+RSS feed (every episode becomes one source), resumable on re-run:

```sh
assembly transcribe ./recordings
assembly transcribe "s3://bucket/calls/*.mp3" # needs: pip install s3fs
+assembly transcribe "https://feeds.simplecast.com/54nAGcIl" # every episode in the feed
find . -name "*.wav" | assembly transcribe --from-stdin
```

123 changes: 123 additions & 0 deletions aai_cli/app/transcribe/feed.py
@@ -0,0 +1,123 @@
"""Podcast RSS/Atom feed expansion for ``assembly transcribe``.

A feed URL names a whole show, so transcribing it means transcribing every
episode. ``feed_episode_urls`` fetches the URL and, when ``feedparser`` recognizes
it as an RSS or Atom feed, returns its episode enclosure URLs (in feed order —
newest first) for the batch path to transcribe, one resumable sidecar per episode.
The enclosures are direct media URLs the API fetches itself, so — unlike a YouTube
or podcast *page*, which yt-dlp downloads first — no local download step is needed.

Detection is deliberately narrow so a direct media URL or ordinary web page still
falls through to the single-source path untouched (and is never fetched twice):
only an http(s) URL whose path is feed-shaped — no extension, or one of
``.xml``/``.rss``/``.atom`` — and that no dedicated yt-dlp extractor already claims
is sniffed, the response body is bounded, and only content ``feedparser`` parses as
a real feed with at least one enclosure is treated as a feed. We hand ``feedparser``
the already-fetched bytes (never the URL) so our bounded, safe fetch below stays the
only network path.
"""

from __future__ import annotations

from pathlib import PurePosixPath
from urllib.parse import urlsplit

from pydantic import BaseModel, Field

from aai_cli.core import youtube

# A feed lives at an extensionless URL (e.g. feeds.simplecast.com/<id>) or a feed
# document (.xml/.rss/.atom). Every other path — .mp3, .txt, .pdf — is never a feed,
# so it is left for the single-source path and never fetched here.
_FEED_URL_SUFFIXES = frozenset({"", ".xml", ".rss", ".atom"})

# Bound the download so a hostile or huge URL can't exhaust memory; 10 MB of feed
# already holds thousands of episodes, far past any realistic batch.
_MAX_FEED_BYTES = 10 * 1024 * 1024 # pragma: no mutate -- tuning knob, not behavior
_FETCH_TIMEOUT_SECONDS = 15.0 # pragma: no mutate -- tuning knob, not behavior


class _Enclosure(BaseModel):
    """One ``<enclosure>`` / Atom enclosure link; ``href`` is the media URL."""

    href: str = ""


class _Entry(BaseModel):
    # default_factory (not a shared `= []`) so each entry gets its own list, and the
    # typed factory keeps the field's element type known under pyright strict.
    enclosures: list[_Enclosure] = Field(default_factory=list[_Enclosure])


class _ParsedFeed(BaseModel):
    """The slice of feedparser's untyped result we use, validated into a real type
    (the project pattern for untyped third-party returns — cf. core/wer.py)."""

    # feedparser sets ``version`` to a non-empty id ("rss20", "atom10", …) for a
    # recognized feed and to "" for anything it doesn't recognize as one.
    version: str = ""
    entries: list[_Entry] = Field(default_factory=list[_Entry])


def feed_episode_urls(url: str) -> list[str] | None:
    """The episode media URLs if `url` is a podcast feed, else ``None``.

    Returns ``None`` (stay single-source) for a direct-media URL, a yt-dlp page,
    an unreachable URL, or any content that isn't a feed carrying enclosures.
    """
    if not _looks_like_feed_url(url) or youtube.is_downloadable_url(url):
        return None
    body = _fetch(url)
    if body is None:
        return None
    return _episode_urls(body)


def _looks_like_feed_url(url: str) -> bool:
    """True when the URL path is feed-shaped: extensionless or a feed document."""
    suffix = PurePosixPath(urlsplit(url).path).suffix.lower()
    return suffix in _FEED_URL_SUFFIXES


def _episode_urls(body: str) -> list[str] | None:
    """The enclosure URLs in a feed body, deduped in document order; ``None`` when
    feedparser doesn't recognize it as a feed or it carries no enclosures."""
    import feedparser

    # feedparser ships only partial inline types (its parse signature is Unknown),
    # so the result is validated through _ParsedFeed below; mirror remotefs.py's
    # fsspec shim in ignoring the unavoidable unknown-member report on the call.
    raw = feedparser.parse(body) # pyright: ignore[reportUnknownMemberType]
    parsed = _ParsedFeed.model_validate(raw)
    if not parsed.version:
        return None
    urls = [enc.href for entry in parsed.entries for enc in entry.enclosures if enc.href]
    deduped = list(dict.fromkeys(urls))
    return deduped or None


def _fetch(url: str) -> str | None:
    """Up to ``_MAX_FEED_BYTES`` of `url` decoded as text, or ``None`` on any failure
    or when the response is obviously binary media (audio/video/image)."""
    import httpx2 as httpx

    chunks: list[bytes] = []
    try:
        with (
            httpx.Client(timeout=_FETCH_TIMEOUT_SECONDS, follow_redirects=True) as client,
            client.stream("GET", url) as response,
        ):
            if not response.is_success:
                return None
            content_type = response.headers.get("content-type", "").lower()
            if content_type.startswith(("audio/", "video/", "image/")):
                return None
            total = 0
            for chunk in response.iter_bytes():
                chunks.append(chunk)
                total += len(chunk)
                if total >= _MAX_FEED_BYTES:
                    break
    except (httpx.HTTPError, OSError):
        return None
    return b"".join(chunks).decode("utf-8", "replace")
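
A minimal standalone sketch of three small techniques used in `feed.py` above: the path-suffix check, order-preserving deduplication, and the bounded chunk read. The names `looks_like_feed_url`, `dedupe_in_order`, and `read_bounded` are illustrative and not part of the PR. The chunk source here is assumed to be a plain iterable rather than an HTTP response, so nothing touches the network.

```python
from collections.abc import Iterable
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Same suffix set as the diff: extensionless, or a feed document.
FEED_SUFFIXES = frozenset({"", ".xml", ".rss", ".atom"})


def looks_like_feed_url(url: str) -> bool:
    """True when the URL path has no extension or a feed-document extension."""
    # urlsplit drops the query string, so "?x=1" never affects the suffix.
    suffix = PurePosixPath(urlsplit(url).path).suffix.lower()
    return suffix in FEED_SUFFIXES


def dedupe_in_order(urls: Iterable[str]) -> list[str]:
    """Drop empty and repeated URLs, keeping first-seen order."""
    # dict preserves insertion order, so fromkeys acts as an ordered set.
    return list(dict.fromkeys(u for u in urls if u))


def read_bounded(chunks: Iterable[bytes], limit: int) -> bytes:
    """Concatenate chunks, stopping once at least `limit` bytes are buffered.

    As in the loop in the diff, the chunk that crosses the limit is kept, so
    the result can exceed `limit` by up to one chunk.
    """
    buffered: list[bytes] = []
    total = 0
    for chunk in chunks:
        buffered.append(chunk)
        total += len(chunk)
        if total >= limit:
            break
    return b"".join(buffered)


if __name__ == "__main__":
    print(looks_like_feed_url("https://feeds.simplecast.com/54nAGcIl"))
    print(looks_like_feed_url("https://example.com/ep1.mp3"))
    print(dedupe_in_order(["a", "", "b", "a"]))
    print(read_bounded([b"abcd", b"efgh", b"ijkl"], limit=6))
```

The suffix check is only a cheap pre-filter: an extensionless URL can still be an ordinary web page, which is why the PR goes on to fetch the body and let `feedparser` decide.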
7 changes: 6 additions & 1 deletion aai_cli/app/transcribe/run.py
@@ -356,7 +356,12 @@ def run_transcribe(opts: TranscribeOptions, state: AppState, *, json_mode: bool)
     transcribe_validate.validate_speakers_expected(merged)

     sources = transcribe_sources.expand_sources(
-        opts.source, from_stdin=opts.from_stdin, sample=opts.sample
+        opts.source,
+        from_stdin=opts.from_stdin,
+        sample=opts.sample,
+        # --show-code must never touch the network; skip the feed probe and treat a
+        # URL as a single source for code generation.
+        detect_feeds=not opts.show_code,
     )
     if sources is not None:
         transcribe_sources.reject_single_source_flags(
27 changes: 22 additions & 5 deletions aai_cli/app/transcribe/sources.py
@@ -49,24 +49,41 @@
 _GLOB_CHARS = frozenset("*?[")


-def expand_sources(source: str | None, *, from_stdin: bool, sample: bool) -> list[str] | None:
+def expand_sources(
+    source: str | None, *, from_stdin: bool, sample: bool, detect_feeds: bool = True
+) -> list[str] | None:
     """The batch source list, or ``None`` when this is a single-source invocation.

     Batch mode triggers on ``--from-stdin``, a directory (scanned recursively for
-    audio files), a glob pattern that names no existing file, or a bucket URL
-    that is a glob or trailing-slash folder. A plain file, URL, ``-`` (audio
-    piped on stdin), or ``--sample`` stays on the single-source path.
+    audio files), a glob pattern that names no existing file, a bucket URL that is
+    a glob or trailing-slash folder, or an http(s) URL that turns out to be a
+    podcast RSS/Atom feed (each episode becomes one batch source). A plain file,
+    direct media URL, ``-`` (audio piped on stdin), or ``--sample`` stays on the
+    single-source path. ``detect_feeds=False`` skips the feed probe (and its
+    network fetch) for paths that must not touch the network, e.g. ``--show-code``.
     """
     if from_stdin:
         return _stdin_sources(source, sample=sample)
     # `not source` (rather than `is None`) also catches the empty string — e.g. an
     # unset shell variable in `assembly transcribe "$FILE"`. `Path("")` is `Path(".")`,
     # so it would otherwise fall into the directory branch and batch-transcribe the
     # whole working directory; instead it stays single-source and fails validation.
-    if not source or sample or source == "-" or source.startswith(URL_PREFIXES):
+    if not source or sample or source == "-":
         return None
+    if source.startswith(URL_PREFIXES):
+        # A podcast feed URL expands into its episode enclosure URLs (batch mode);
+        # a direct media URL or ordinary page returns None and stays single-source.
+        from aai_cli.app.transcribe import feed
+
+        return feed.feed_episode_urls(source) if detect_feeds else None
     if remotefs.is_remote_url(source):
         return _remote_sources(source)
+    return _local_sources(source)
+
+
+def _local_sources(source: str) -> list[str] | None:
+    """Batch sources for a local path: a directory's audio files or a glob's matches,
+    else ``None`` (a single file, which the single-source path handles)."""
     path = Path(source)
     if path.is_dir():
         return _directory_sources(path)
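
The branch order in `expand_sources` can be sketched roughly as follows. This is a simplified illustration, not the PR's code: `route`, its string labels, and the injectable `probe` callable are hypothetical, and the value of `URL_PREFIXES` is an assumption because the diff shows only its name.

```python
from __future__ import annotations

from collections.abc import Callable

# Assumed value; the diff references URL_PREFIXES but does not show its definition.
URL_PREFIXES = ("http://", "https://")


def route(
    source: str | None,
    *,
    from_stdin: bool = False,
    sample: bool = False,
    detect_feeds: bool = True,
    probe: Callable[[str], list[str] | None] = lambda url: None,
) -> str:
    """Name the branch a source would take (hypothetical labels)."""
    if from_stdin:
        return "batch: stdin list"
    # `not source` also catches "", e.g. an unset shell variable.
    if not source or sample or source == "-":
        return "single"
    if source.startswith(URL_PREFIXES):
        # The probe stands in for the feed fetch; it must not run when
        # detect_feeds is False (the --show-code case in the PR).
        if not detect_feeds:
            return "single"
        return "batch: feed" if probe(source) else "single"
    return "local or bucket path"


if __name__ == "__main__":
    fake_probe = lambda url: ["https://cdn.example.com/ep1.mp3"]
    print(route("https://example.com/feed", probe=fake_probe))
    print(route("https://example.com/feed", detect_feeds=False, probe=fake_probe))
    print(route(""))
```

Passing the probe in as a parameter is only a convenience for this sketch; it makes the "no network when `detect_feeds` is false" property easy to check without a real fetch.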
17 changes: 11 additions & 6 deletions aai_cli/commands/transcribe.py
@@ -31,6 +31,10 @@
     ("Try it with the hosted sample", "assembly transcribe --sample"),
     ("Transcribe a YouTube video", "assembly transcribe https://youtu.be/dtp6b76pMak"),
     ("Transcribe a podcast page", 'assembly transcribe "https://podcasts.apple.com/…"'),
+    (
+        "Transcribe a whole podcast feed",
+        'assembly transcribe "https://feeds.simplecast.com/…"',
+    ),
     ("Label who said what", "assembly transcribe call.mp3 --speaker-labels"),
     ("Redact PII for compliance", "assembly transcribe call.mp3 --redact-pii"),
     ("Summarize a recording", "assembly transcribe call.mp3 --summarization"),
@@ -43,8 +47,8 @@ def transcribe(
     ctx: typer.Context,
     source: str | None = typer.Argument(
         None,
-        help="Audio file, URL, YouTube/podcast URL, bucket URL (s3://, gs://, …), or a "
-        "directory/glob (batch mode)",
+        help="Audio file, URL, YouTube/podcast URL, podcast RSS feed, bucket URL "
+        "(s3://, gs://, …), or a directory/glob (batch mode)",
     ),
     sample: bool = typer.Option(False, "--sample", help="Use the hosted wildfires.mp3 sample"),
     # batch mode
@@ -362,10 +366,11 @@ def transcribe(
     URLs (any page yt-dlp can extract) are downloaded first, then transcribed.

     Batch mode: pass a directory or glob (or pipe a list with --from-stdin) to
-    transcribe many sources concurrently. Each source gets a .aai.json sidecar
-    with the full result (including any --llm responses), and a re-run skips
-    sources already transcribed — with changed --llm prompts it replays just
-    the LLM step, never a second transcription.
+    transcribe many sources concurrently. A podcast RSS/Atom feed URL also expands
+    to batch mode — every episode enclosure becomes one source. Each source gets a
+    .aai.json sidecar with the full result (including any --llm responses), and a
+    re-run skips sources already transcribed — with changed --llm prompts it
+    replays just the LLM step, never a second transcription.

     Bucket URLs (s3://, gs://, az://, sftp://, …) work for single files and for
     batches (a glob, or a folder ending in /); install the matching fsspec
9 changes: 6 additions & 3 deletions aai_cli/skills/aai-cli/references/transcription.md
@@ -5,12 +5,14 @@ Five commands. All accept `--json` (auto-enabled when piped); `transcribe`,
`transcribe`, `stream`, and `agent` accept `--show-code` to print equivalent
Python SDK code without calling the API.

-## `assembly transcribe [SOURCE]` — file / URL / YouTube / podcast page
+## `assembly transcribe [SOURCE]` — file / URL / YouTube / podcast page / RSS feed

`SOURCE` is a local file path, public URL, or a media-page URL yt-dlp can extract
(YouTube, Apple Podcasts, Spreaker, SoundCloud, …) — those are downloaded first.
-Use `--sample` for the hosted `wildfires.mp3`. Analysis results (summary,
-chapters, sentiment, …) render automatically in human mode.
+A podcast RSS/Atom feed URL expands into a resumable batch run over every episode
+enclosure (one `.aai.json` sidecar apiece). Use `--sample` for the hosted
+`wildfires.mp3`. Analysis results (summary, chapters, sentiment, …) render
+automatically in human mode.

High-value flags (run `assembly transcribe --help` for the full set):

@@ -37,6 +39,7 @@ assembly transcribe --sample
assembly transcribe call.mp3 --speaker-labels --speakers-expected 2 --redact-pii
assembly transcribe call.mp3 -o text
assembly transcribe call.mp3 --show-code
+assembly transcribe "https://feeds.simplecast.com/54nAGcIl" # every episode in the feed
```

## `assembly stream [SOURCE]` — live real-time transcription
5 changes: 5 additions & 0 deletions pyproject.toml
@@ -56,6 +56,11 @@ dependencies = [
     # imported lazily). fsspec core only — each protocol's backend (s3fs, gcsfs, adlfs,
     # …) stays a user-installed extra surfaced via a clean install hint.
     "fsspec>=2026.4.0",
+    # Podcast RSS/Atom feed parsing for `assembly transcribe <feed-url>` (feed.py,
+    # imported lazily). The de-facto standard feed parser; pure-Python, no compiled
+    # deps. We hand it already-fetched bytes (never a URL) so our bounded, safe
+    # httpx fetch stays the only network path.
+    "feedparser>=6.0.11",
 ]

 [project.urls]
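
The dependency comment above says the parser is handed content that was already fetched, never a URL. As a rough illustration of what extracting enclosures from an in-memory body involves, here is a stdlib-only stand-in. It is not what the PR does: the PR uses `feedparser`, which also handles Atom and malformed feeds, while this sketch covers only well-formed RSS 2.0 and is not hardened against hostile XML. The function name `rss_enclosure_urls` and the sample feed are made up for the example.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Invented sample feed: newest episode first, one repost, one item with no audio.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <item><title>Ep 2</title><enclosure url="https://cdn.example.com/ep2.mp3" type="audio/mpeg"/></item>
    <item><title>Ep 1</title><enclosure url="https://cdn.example.com/ep1.mp3" type="audio/mpeg"/></item>
    <item><title>Ep 1 (repost)</title><enclosure url="https://cdn.example.com/ep1.mp3" type="audio/mpeg"/></item>
    <item><title>Text only</title></item>
  </channel>
</rss>
"""


def rss_enclosure_urls(body: str) -> list[str] | None:
    """Enclosure URLs from an RSS 2.0 body in document order, deduplicated.

    Returns None when the body is not parseable XML, is not an <rss> document,
    or carries no enclosures, mirroring the "else stay single-source" contract.
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return None
    if root.tag != "rss":
        return None
    urls = [enc.get("url", "") for enc in root.iter("enclosure")]
    # dict.fromkeys keeps first-seen order while dropping repeats.
    return list(dict.fromkeys(u for u in urls if u)) or None


if __name__ == "__main__":
    print(rss_enclosure_urls(SAMPLE_RSS))
    print(rss_enclosure_urls("<html><body/></html>"))
```

Parsing from a string keeps the fetch policy (timeout, size cap, content-type check) in one place, which is the property the comment in `pyproject.toml` is describing.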