From 5af9dc08b35ab1714f571bd7850a061fb2186084 Mon Sep 17 00:00:00 2001
From: Claude
Date: Sat, 13 Jun 2026 00:15:07 +0000
Subject: [PATCH 1/2] Refactor README in the style of leading CLI READMEs

Restructure to combine patterns from the Codex, Gemini, and Antigravity CLI READMEs: an up-front Quickstart, a features-at-a-glance command table, GitHub [!NOTE]/[!WARNING] callouts, a collapsible system-deps block, Option 1/2 authentication with best-for guidance, a categorized Documentation section, and grouped legal links.

https://claude.ai/code/session_01GAo4XYyoxt8LgcYQU3sN1d
---
 README.md | 121 +++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 88 insertions(+), 33 deletions(-)

diff --git a/README.md b/README.md
index a83cb5bb..9985193e 100644
--- a/README.md
+++ b/README.md
@@ -4,12 +4,32 @@
 [![License](https://img.shields.io/badge/license-MIT-D6402E)](https://github.com/AssemblyAI/cli/blob/main/LICENSE)
 [![Docs](https://img.shields.io/badge/docs-assemblyai-D6402E)](https://www.assemblyai.com/docs)

-The AssemblyAI CLI (`assembly`) brings speech AI to your terminal: transcribe files, URLs, and YouTube/podcast pages, stream live audio, talk to a two-way voice agent, prompt the LLM Gateway, benchmark speech models, and scaffold ready-to-deploy starter apps.
+The AssemblyAI CLI (`assembly`) brings speech AI directly into your terminal: transcribe files, URLs, and YouTube/podcast pages, stream live audio, talk to a two-way voice agent, prompt the LLM Gateway, benchmark speech models, and scaffold ready-to-deploy starter apps.

The assembly CLI welcome screen, listing command groups for transcription, streaming, voice agents, app scaffolding, and account management

+Learn more about the platform in the [AssemblyAI docs](https://www.assemblyai.com/docs).
+
+## ⚡ Quickstart
+
+Install on macOS or Linux with Homebrew:
+
+```sh
+brew tap assemblyai/cli https://github.com/AssemblyAI/cli
+brew install assembly
+```
+
+Sign in (stores your API key in the OS keyring) and run your first transcription:
+
+```sh
+assembly login
+assembly transcribe --sample
+```
+
+That's it. Run `assembly onboard` for a guided tour, or see [Installation](#-installation) for pipx/uv and other options.
+
 ## 🚀 Why the AssemblyAI CLI?

 - **🎯 One command for everything**: transcription, real-time streaming, voice agents, LLM prompts, and WER benchmarking — no SDK boilerplate.
@@ -19,9 +39,33 @@ The AssemblyAI CLI (`assembly`) brings speech AI to your terminal: transcribe fi
 - **🤖 Agent-ready**: `assembly setup install` wires your coding agent up with the AssemblyAI docs MCP server and skills.
 - **📖 Open source**: MIT licensed.

+## 📋 Features at a glance
+
+| Command | What it does |
+| :--- | :--- |
+| `assembly transcribe` | Transcribe files, URLs, YouTube/podcast pages, directories, globs, or bucket storage (`s3://`, `gs://`, `az://`) — with speaker labels, PII redaction, summarization, SRT/VTT captions, and resumable batch runs |
+| `assembly stream` | Real-time transcription from your microphone, a file, or a URL — on macOS it can capture system audio too |
+| `assembly dictate` | Push-to-talk dictation: press Enter to record, Enter again for instant text (Sync STT API, up to 120 s per utterance) |
+| `assembly agent` | Full-duplex spoken conversation with a voice agent, right in your terminal |
+| `assembly llm` | Prompt the LLM Gateway over a transcript, stdin, or a live stream |
+| `assembly clip` | Cut audio/video with ffmpeg by diarized speaker, text match, LLM pick, or time range — clip boundaries snap into nearby silence |
+| `assembly dub` | Re-voice an audio/video file in another language: transcription, LLM translation, per-speaker TTS, ffmpeg track-swap (sandbox-only) |
+| `assembly speak` | Synthesize text to speech over the streaming-TTS WebSocket (sandbox-only) |
+| `assembly eval` | Benchmark WER against Hugging Face datasets (built-in aliases: `librispeech`, `tedlium`, …) or local manifests |
+| `assembly init` / `dev` / `share` / `deploy` | Scaffold a FastAPI + HTML starter app, run it locally, expose it on a public URL, ship it to Vercel / Railway / Fly.io |
+| `assembly webhooks listen` | Open a public dev URL that prints webhook deliveries and can forward them to your local app |
+| `assembly setup` | Wire a coding agent up with the AssemblyAI docs MCP server and skills |
+| `assembly keys` / `balance` / `usage` / `limits` / `sessions` / `audit` | Account self-service via browser login |
+| `assembly doctor` | Check your environment: API key, network, ffmpeg, microphone |
+
+Add `--show-code` to `transcribe` / `stream` / `agent` to print the equivalent Python SDK script instead of running — the built-in path from CLI experiment to SDK code.
+
 ## ✨ Things you can do with it

-A few one-liners that show what `assembly` can do. These are the fun ones; the everyday basics live under **Quick examples** below.
+A few one-liners that show what `assembly` can do. The everyday basics live under [Getting started](#-getting-started) below.
+
+> [!NOTE]
+> `speak` and `dub` are sandbox-only today — that's why the examples below pass `--sandbox`.

 **Recreate a scene with synthetic voices** — transcribe and diarize a YouTube clip, then pipe it straight into TTS with a different voice per speaker:

@@ -30,7 +74,7 @@ assembly transcribe "https://www.youtube.com/watch?v=awmCtXzFsJo" --speaker-labe
 | assembly --sandbox speak --voice A=jane --voice B=mary --out scene.wav
 ```

-`speak` auto-detects `Speaker A:` labels, merges each speaker's turns, and rotates voices. (`speak` is sandbox-only today, hence `--sandbox`.)
+`speak` auto-detects `Speaker A:` labels, merges each speaker's turns, and rotates voices.

 **Dub a video into another language** — the whole platform in one command: transcription with utterance timestamps, per-utterance LLM translation, TTS for each line (one voice per speaker), and ffmpeg laying the new track over the original video:

@@ -38,7 +82,7 @@ assembly transcribe "https://www.youtube.com/watch?v=awmCtXzFsJo" --speaker-labe
 assembly --sandbox dub talk.mp4 --lang de
 ```

-The video stream is copied untouched; each dubbed line lands at its original start time. (Sandbox-only, like `speak`.)
+The video stream is copied untouched; each dubbed line lands at its original start time.

 **Turn a podcast into audio** — Apple and Spotify podcast pages work too (yt-dlp ingestion):

@@ -115,7 +159,8 @@ assembly eval librispeech --speech-model universal-3-pro --limit 50

 Requires Python 3.12+ (Homebrew brings its own; for pipx/uv see the `--python` hint below).

-> ⚠️ The `assemblyai-cli` package on PyPI is **not** this project — install with one of the
+> [!WARNING]
+> The `assemblyai-cli` package on PyPI is **not** this project — install with one of the
 > commands below, not `pip install assemblyai-cli`.

 ### Homebrew (recommended — macOS / Linux)
@@ -139,25 +184,39 @@ uv tool install "git+https://github.com/AssemblyAI/cli.git"
 If your default interpreter is older than Python 3.12, add `--python python3.12` (pipx) or `--python 3.12` (uv) to the install command.

+<details>
+<summary>System dependencies for the live-audio commands (pipx/uv installs only)</summary>
+
 Only the live-audio commands need anything extra: `stream`, `dictate`, and `agent` use PortAudio for
-microphone capture (Debian/Ubuntu: `sudo apt-get install libportaudio2`; Fedora:
-`sudo dnf install portaudio`) and [`ffmpeg`](https://ffmpeg.org) on `PATH` to stream
-non-WAV audio. Plain `transcribe` uploads your file directly and needs neither.
+microphone capture and [`ffmpeg`](https://ffmpeg.org) on `PATH` to stream non-WAV audio.
+Plain `transcribe` uploads your file directly and needs neither.
+
+- Debian/Ubuntu: `sudo apt-get install libportaudio2 ffmpeg`
+- Fedora: `sudo dnf install portaudio ffmpeg`
+- macOS (Homebrew): `brew install portaudio ffmpeg`
+
+</details>

 ## 🔐 Authentication

 New to AssemblyAI? Create a free account at [assemblyai.com/dashboard](https://www.assemblyai.com/dashboard) to get an API key.

-The easiest path is browser login, which stores your API key in the OS keyring
-(Keychain / Credential Manager / Secret Service):
+### Option 1: Browser login (recommended)
+
+**✨ Best for:** day-to-day use on your own machine.
+
+Browser login stores your API key in the OS keyring (Keychain / Credential Manager / Secret Service) — nothing lands in a dotfile, and it unlocks the account commands (`keys`, `balance`, `usage`, `limits`, `sessions`, `audit`):

 ```sh
 assembly login
 ```

-In CI — or anywhere a browser isn't an option — set the key as an environment variable
-instead. It's checked before the keyring, and nothing is written to disk:
+### Option 2: API key environment variable
+
+**✨ Best for:** CI, containers, and anywhere a browser isn't an option.
+
+The environment variable is checked before the keyring, and nothing is written to disk:

 ```sh
 export ASSEMBLYAI_API_KEY="YOUR_API_KEY"
@@ -165,13 +224,15 @@ export ASSEMBLYAI_API_KEY="YOUR_API_KEY"

 ## 🚀 Getting started

-For a guided tour — sign in, run a first transcription, start building:
+### Guided tour
+
+Sign in, run a first transcription, start building:

 ```sh
 assembly onboard
 ```

-Or jump straight in:
+### Basic usage

 ```sh
 assembly transcribe --sample # transcribe the hosted sample file
@@ -182,22 +243,6 @@ assembly agent # talk to a voice agent (use headphones)
 assembly init # scaffold a starter app
 ```

-## 📋 Key features
-
-- **Transcription**: `assembly transcribe` handles files, URLs, and YouTube/podcast pages, with flags for speaker labels, PII redaction, summarization, sentiment, chapters, and more.
-- **Batch transcription**: point `assembly transcribe` at a directory or glob — local, or in bucket storage (`"s3://bucket/calls/*.mp3"`, `gs://`, `az://`, …, with the matching fsspec backend such as `s3fs` installed) — or pipe paths with `--from-stdin` to transcribe everything concurrently, with sidecar files that make re-runs resumable. Add `--llm "prompt"` to run an LLM prompt over each finished transcript, saved into the sidecars.
-- **Real-time streaming**: `assembly stream` transcribes the microphone, a file, or a URL live — on macOS it can capture system audio too.
-- **Dictation**: `assembly dictate` is push-to-talk for your terminal — press Enter to record, Enter again to get the utterance back instantly from the Sync API (up to 120 s per utterance).
-- **Voice agent**: `assembly agent` runs a full-duplex spoken conversation in your terminal.
-- **LLM Gateway**: `assembly llm` prompts an LLM over a transcript, stdin, or a live stream (`assembly stream --llm "summarize as I talk"`).
-- **Transcript-driven clipping**: `assembly clip` cuts an audio/video file (or a YouTube/podcast URL) with ffmpeg by diarized speaker (`--speaker A`), text match (`--search "pricing"`), LLM pick (`--llm "the three best moments"`), or explicit time range (`--range 1:30-2:45`) — transcribing on the fly, reusing a finished transcript with `-t ID`, or reading one from a pipe (`assembly transcribe x.mp4 --speaker-labels --json | assembly clip x.mp4 -t - --llm "…"`). Clip boundaries snap into nearby silence (ffmpeg `silencedetect`) so cuts don't land mid-word; `--no-snap` cuts at the exact selected times.
-- **Dubbing**: `assembly dub` re-voices an audio/video file in another language (`assembly --sandbox dub talk.mp4 --lang de`): diarized transcription, per-utterance LLM translation, streaming TTS per speaker, and an ffmpeg track-swap that leaves the video untouched. Sandbox-only today, like `speak`.
-- **Model evaluation**: `assembly eval` transcribes a Hugging Face dataset (with built-in aliases for common benchmarks: `assembly eval tedlium`) or a local `.csv`/`.jsonl` manifest and scores WER against its references — handy for picking a speech model.
-- **Starter apps**: `assembly init` scaffolds a self-contained FastAPI + HTML app (`audio-transcription`, `live-captions`, `voice-agent`); `assembly dev` runs it, `assembly share` exposes it on a public URL, and `assembly deploy` ships it to Vercel, Railway, or Fly.io.
-- **Webhook testing**: `assembly webhooks listen` opens a public dev URL (cloudflared quick tunnel) that prints webhook deliveries as they arrive and can forward them to your local app with `--forward-to`.
-- **Code generation**: add `--show-code` to `transcribe`/`stream`/`agent` to print the equivalent Python SDK script instead of running.
-- **Account self-service**: `assembly keys` / `balance` / `usage` / `limits` / `sessions` / `audit` via browser login.
-
 ### Quick examples

 Pull exactly the output you need:
@@ -231,10 +276,18 @@ assembly transcribe --sample --speaker-labels --show-code

 ## 📚 Documentation

+### In the terminal
+
 - Run `assembly --help` or `assembly <command> --help` for flags and examples.
 - Run `assembly doctor` to check your environment (API key, network, ffmpeg, microphone).
-- [AssemblyAI docs](https://www.assemblyai.com/docs)
-- [API reference](https://www.assemblyai.com/docs/api-reference)
+- Run `assembly onboard` for the guided tour.
+
+### Resources
+
+- [**AssemblyAI docs**](https://www.assemblyai.com/docs) — guides for every model and feature.
+- [**API reference**](https://www.assemblyai.com/docs/api-reference) — the REST and streaming APIs the CLI drives.
+- [**Dashboard**](https://www.assemblyai.com/dashboard) — manage your account and API keys.
+- [**AGENTS.md**](AGENTS.md) — development conventions and architecture notes for contributors.

 ## 🤝 Contributing

@@ -250,4 +303,6 @@ See [AGENTS.md](AGENTS.md) for development conventions and architecture notes.

 ## 📄 Legal

-Released under the [MIT license](LICENSE).
+- **License**: released under the [MIT license](LICENSE).
+- **Privacy**: [AssemblyAI privacy policy](https://www.assemblyai.com/legal/privacy-policy) — the CLI's anonymous usage telemetry is opt-out (`assembly telemetry disable`, `AAI_TELEMETRY_DISABLED=1`, or `DO_NOT_TRACK=1`).
+- **Terms**: [AssemblyAI terms of service](https://www.assemblyai.com/legal/terms-of-service).

From 753e0be81cb33d77deba87f276079b9d5f6e8593 Mon Sep 17 00:00:00 2001
From: Claude
Date: Sat, 13 Jun 2026 00:21:30 +0000
Subject: [PATCH 2/2] Second pass over the README

Complete the features table against the real command surface (add caption and transcripts/sessions, move sessions out of the account row, reorder rows to match the CLI's own help grouping), point the karaoke example at assembly caption, mirror the brew trust line in Quickstart so either install block copy-pastes safely, and dedupe repeated onboard, AGENTS.md, and --show-code mentions.

https://claude.ai/code/session_01GAo4XYyoxt8LgcYQU3sN1d
---
 README.md | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/README.md b/README.md
index 9985193e..31741e48 100644
--- a/README.md
+++ b/README.md
@@ -18,6 +18,7 @@ Install on macOS or Linux with Homebrew:

 ```sh
 brew tap assemblyai/cli https://github.com/AssemblyAI/cli
+brew trust assemblyai/cli # only needed when HOMEBREW_REQUIRE_TAP_TRUST is set; harmless otherwise
 brew install assembly
 ```

@@ -47,16 +48,18 @@ That's it. Run `assembly onboard` for a guided tour, or see [Installation](#-ins
 | `assembly stream` | Real-time transcription from your microphone, a file, or a URL — on macOS it can capture system audio too |
 | `assembly dictate` | Push-to-talk dictation: press Enter to record, Enter again for instant text (Sync STT API, up to 120 s per utterance) |
 | `assembly agent` | Full-duplex spoken conversation with a voice agent, right in your terminal |
+| `assembly speak` | Synthesize text to speech over the streaming-TTS WebSocket (sandbox-only) |
 | `assembly llm` | Prompt the LLM Gateway over a transcript, stdin, or a live stream |
 | `assembly clip` | Cut audio/video with ffmpeg by diarized speaker, text match, LLM pick, or time range — clip boundaries snap into nearby silence |
 | `assembly dub` | Re-voice an audio/video file in another language: transcription, LLM translation, per-speaker TTS, ffmpeg track-swap (sandbox-only) |
-| `assembly speak` | Synthesize text to speech over the streaming-TTS WebSocket (sandbox-only) |
+| `assembly caption` | Burn always-visible captions into a video: transcribe (or reuse a transcript), fetch SRT, ffmpeg burns it in — audio untouched |
 | `assembly eval` | Benchmark WER against Hugging Face datasets (built-in aliases: `librispeech`, `tedlium`, …) or local manifests |
-| `assembly init` / `dev` / `share` / `deploy` | Scaffold a FastAPI + HTML starter app, run it locally, expose it on a public URL, ship it to Vercel / Railway / Fly.io |
 | `assembly webhooks listen` | Open a public dev URL that prints webhook deliveries and can forward them to your local app |
+| `assembly init` / `dev` / `share` / `deploy` | Scaffold a FastAPI + HTML starter app, run it locally, expose it on a public URL, ship it to Vercel / Railway / Fly.io |
 | `assembly setup` | Wire a coding agent up with the AssemblyAI docs MCP server and skills |
-| `assembly keys` / `balance` / `usage` / `limits` / `sessions` / `audit` | Account self-service via browser login |
 | `assembly doctor` | Check your environment: API key, network, ffmpeg, microphone |
+| `assembly transcripts` / `sessions` | Browse and fetch past transcripts and streaming sessions |
+| `assembly keys` / `balance` / `usage` / `limits` / `audit` | Account self-service via browser login |

 Add `--show-code` to `transcribe` / `stream` / `agent` to print the equivalent Python SDK script instead of running — the built-in path from CLI experiment to SDK code.

@@ -106,6 +109,8 @@ assembly transcribe video.mp4 -o srt --chars-per-caption 24 > lyrics.srt
 ffmpeg -i video.mp4 -vf "subtitles=lyrics.srt:force_style='Fontsize=28,PrimaryColour=&H00FFFF&'" karaoke.mp4
 ```

+Prefer one step over styling control? `assembly caption video.mp4` transcribes and burns the captions in for you.
+
 **Keep a live to-do list from your mic** — `llm -f` re-runs the prompt over the growing transcript, updating in place:

 ```sh
@@ -268,26 +273,18 @@ ffmpeg -i talk.mp4 -f wav - | assembly transcribe -
 git log --oneline -30 | assembly llm "write release notes grouped by feature/fix"
 ```

-Graduate to the SDK — print the equivalent Python script instead of running:
-
-```sh
-assembly transcribe --sample --speaker-labels --show-code
-```
-
 ## 📚 Documentation

 ### In the terminal

 - Run `assembly --help` or `assembly <command> --help` for flags and examples.
 - Run `assembly doctor` to check your environment (API key, network, ffmpeg, microphone).
-- Run `assembly onboard` for the guided tour.

 ### Resources

 - [**AssemblyAI docs**](https://www.assemblyai.com/docs) — guides for every model and feature.
 - [**API reference**](https://www.assemblyai.com/docs/api-reference) — the REST and streaming APIs the CLI drives.
 - [**Dashboard**](https://www.assemblyai.com/dashboard) — manage your account and API keys.
-- [**AGENTS.md**](AGENTS.md) — development conventions and architecture notes for contributors.

 ## 🤝 Contributing