diff --git a/README.md b/README.md
index ab7aaba0..5080dd62 100644
--- a/README.md
+++ b/README.md
@@ -4,12 +4,33 @@
[](https://github.com/AssemblyAI/cli/blob/main/LICENSE)
[](https://www.assemblyai.com/docs)
-The AssemblyAI CLI (`assembly`) brings speech AI to your terminal: transcribe files, URLs, and YouTube/podcast pages, stream live audio, talk to a two-way voice agent, prompt the LLM Gateway, benchmark speech models, and scaffold ready-to-deploy starter apps.
+The AssemblyAI CLI (`assembly`) brings speech AI directly into your terminal: transcribe files, URLs, and YouTube/podcast pages, stream live audio, talk to a two-way voice agent, prompt the LLM Gateway, benchmark speech models, and scaffold ready-to-deploy starter apps.
+Learn more about the platform in the [AssemblyAI docs](https://www.assemblyai.com/docs).
+
+## ⚡ Quickstart
+
+Install on macOS or Linux with Homebrew:
+
+```sh
+brew tap assemblyai/cli https://github.com/AssemblyAI/cli
+brew trust assemblyai/cli # only needed when HOMEBREW_REQUIRE_TAP_TRUST is set; harmless otherwise
+brew install assembly
+```
+
+Sign in (stores your API key in the OS keyring) and run your first transcription:
+
+```sh
+assembly login
+assembly transcribe --sample
+```
+
+That's it. Run `assembly onboard` for a guided tour, or see [Installation](#-installation) for pipx/uv and other options.
+
## 🚀 Why the AssemblyAI CLI?
- **🎯 One command for everything**: transcription, real-time streaming, voice agents, LLM prompts, and WER benchmarking — no SDK boilerplate.
@@ -19,9 +40,35 @@ The AssemblyAI CLI (`assembly`) brings speech AI to your terminal: transcribe fi
- **🤖 Agent-ready**: `assembly setup install` wires your coding agent up with the AssemblyAI docs MCP server and skills.
- **📖 Open source**: MIT licensed.
+## 📋 Features at a glance
+
+| Command | What it does |
+| :--- | :--- |
+| `assembly transcribe` | Transcribe files, URLs, YouTube/podcast pages, directories, globs, or bucket storage (`s3://`, `gs://`, `az://`, with the matching fsspec backend such as `s3fs` installed) — with speaker labels, PII redaction, summarization, SRT/VTT captions, and resumable batch runs |
+| `assembly stream` | Real-time transcription from your microphone, a file, or a URL — on macOS it can capture system audio too |
+| `assembly dictate` | Push-to-talk dictation: press Enter to record, Enter again for instant text (Sync STT API, up to 120 s per utterance) |
+| `assembly agent` | Full-duplex spoken conversation with a voice agent, right in your terminal |
+| `assembly speak` | Synthesize text to speech over the streaming-TTS WebSocket (sandbox-only) |
+| `assembly llm` | Prompt the LLM Gateway over a transcript, stdin, or a live stream |
+| `assembly clip` | Cut audio/video with ffmpeg by diarized speaker, text match, LLM pick, or time range (`--video` keeps the picture for URL sources) — clip boundaries snap into nearby silence |
+| `assembly dub` | Re-voice an audio/video file or URL in another language: transcription, LLM translation, per-speaker TTS, ffmpeg track-swap (sandbox-only) |
+| `assembly caption` | Burn always-visible captions into a video: transcribe (or reuse a transcript), fetch the SRT export, and burn it in with ffmpeg, leaving the audio untouched |
+| `assembly eval` | Benchmark WER against Hugging Face datasets (built-in aliases: `librispeech`, `tedlium`, …) or local manifests |
+| `assembly webhooks listen` | Open a public dev URL that prints webhook deliveries and can forward them to your local app |
+| `assembly init` / `dev` / `share` / `deploy` | Scaffold a FastAPI + HTML starter app, run it locally, expose it on a public URL, ship it to Vercel / Railway / Fly.io |
+| `assembly setup` | Wire a coding agent up with the AssemblyAI docs MCP server and skills |
+| `assembly doctor` | Check your environment: API key, network, ffmpeg, microphone |
+| `assembly transcripts` / `sessions` | Browse and fetch past transcripts and streaming sessions |
+| `assembly keys` / `balance` / `usage` / `limits` / `audit` | Account self-service via browser login |
+
+Add `--show-code` to `transcribe` / `stream` / `agent` to print the equivalent Python SDK script instead of running the command — the built-in path from CLI experiment to SDK code.
+
## ✨ Things you can do with it
-A few one-liners that show what `assembly` can do. These are the fun ones; the everyday basics live under **Quick examples** below.
+A few one-liners that show what `assembly` can do. The everyday basics live under [Getting started](#-getting-started) below.
+
+> [!NOTE]
+> `speak` and `dub` are sandbox-only today — that's why the examples below pass `--sandbox`.
**Recreate a scene with synthetic voices** — transcribe and diarize a YouTube clip, then pipe it straight into TTS with a different voice per speaker:
@@ -30,7 +77,7 @@ assembly transcribe "https://www.youtube.com/watch?v=awmCtXzFsJo" --speaker-labe
| assembly --sandbox speak --voice A=jane --voice B=mary --out scene.wav
```
-`speak` auto-detects `Speaker A:` labels, merges each speaker's turns, and rotates voices. (`speak` is sandbox-only today, hence `--sandbox`.)
+`speak` auto-detects `Speaker A:` labels, merges each speaker's turns, and rotates voices.
**Dub a video into another language** — the whole platform in one command: transcription with utterance timestamps, per-utterance LLM translation, TTS for each line (one voice per speaker), and ffmpeg laying the new track over the original video. A great demo is the first YouTube video ever, "Me at the zoo" — it's 19 seconds long, a single clear English speaker, and instantly recognizable, so the dub finishes fast and the before/after is obvious:
@@ -38,7 +85,7 @@ assembly transcribe "https://www.youtube.com/watch?v=awmCtXzFsJo" --speaker-labe
assembly --sandbox dub "https://www.youtube.com/watch?v=jNQXAC9IVRw" -l de --video
```
-The video stream is copied untouched; each dubbed line lands at its original start time. (Sandbox-only, like `speak`.)
+The video stream is copied untouched; each dubbed line lands at its original start time.
**Turn a podcast into audio** — Apple and Spotify podcast pages work too (yt-dlp ingestion):
@@ -114,7 +161,8 @@ assembly eval librispeech --speech-model universal-3-pro --limit 50
Requires Python 3.12+ (Homebrew brings its own; for pipx/uv see the `--python` hint below).
-> ⚠️ The `assemblyai-cli` package on PyPI is **not** this project — install with one of the
+> [!WARNING]
+> The `assemblyai-cli` package on PyPI is **not** this project — install with one of the
> commands below, not `pip install assemblyai-cli`.
### Homebrew (recommended — macOS / Linux)
@@ -138,25 +186,39 @@ uv tool install "git+https://github.com/AssemblyAI/cli.git"
If your default interpreter is older than Python 3.12, add `--python python3.12` (pipx) or
`--python 3.12` (uv) to the install command.
+
+**System dependencies for the live-audio commands (pipx/uv installs only)**
+
Only the live-audio commands need anything extra: `stream`, `dictate`, and `agent` use PortAudio for
-microphone capture (Debian/Ubuntu: `sudo apt-get install libportaudio2`; Fedora:
-`sudo dnf install portaudio`) and [`ffmpeg`](https://ffmpeg.org) on `PATH` to stream
-non-WAV audio. Plain `transcribe` uploads your file directly and needs neither.
+microphone capture and [`ffmpeg`](https://ffmpeg.org) on `PATH` to stream non-WAV audio.
+Plain `transcribe` uploads your file directly and needs neither.
+
+- Debian/Ubuntu: `sudo apt-get install libportaudio2 ffmpeg`
+- Fedora: `sudo dnf install portaudio ffmpeg`
+- macOS (Homebrew): `brew install portaudio ffmpeg`
+
+
## 🔐 Authentication
New to AssemblyAI? Create a free account at
[assemblyai.com/dashboard](https://www.assemblyai.com/dashboard) to get an API key.
-The easiest path is browser login, which stores your API key in the OS keyring
-(Keychain / Credential Manager / Secret Service):
+### Option 1: Browser login (recommended)
+
+**✨ Best for:** day-to-day use on your own machine.
+
+Browser login stores your API key in the OS keyring (Keychain / Credential Manager / Secret Service) — nothing lands in a dotfile, and it unlocks the account commands (`keys`, `balance`, `usage`, `limits`, `sessions`, `audit`):
```sh
assembly login
```
-In CI — or anywhere a browser isn't an option — set the key as an environment variable
-instead. It's checked before the keyring, and nothing is written to disk:
+### Option 2: API key environment variable
+
+**✨ Best for:** CI, containers, and anywhere a browser isn't an option.
+
+The environment variable is checked before the keyring, and nothing is written to disk:
```sh
export ASSEMBLYAI_API_KEY="YOUR_API_KEY"
@@ -164,13 +226,15 @@ export ASSEMBLYAI_API_KEY="YOUR_API_KEY"
## 🚀 Getting started
-For a guided tour — sign in, run a first transcription, start building:
+### Guided tour
+
+Sign in, run a first transcription, start building:
```sh
assembly onboard
```
-Or jump straight in:
+### Basic usage
```sh
assembly transcribe --sample # transcribe the hosted sample file
@@ -181,23 +245,6 @@ assembly agent # talk to a voice agent (use headphones)
assembly init # scaffold a starter app
```
-## 📋 Key features
-
-- **Transcription**: `assembly transcribe` handles files, URLs, and YouTube/podcast pages, with flags for speaker labels, PII redaction, summarization, sentiment, chapters, and more.
-- **Batch transcription**: point `assembly transcribe` at a directory or glob — local, or in bucket storage (`"s3://bucket/calls/*.mp3"`, `gs://`, `az://`, …, with the matching fsspec backend such as `s3fs` installed) — or pipe paths with `--from-stdin` to transcribe everything concurrently, with sidecar files that make re-runs resumable. Add `--llm "prompt"` to run an LLM prompt over each finished transcript, saved into the sidecars.
-- **Real-time streaming**: `assembly stream` transcribes the microphone, a file, or a URL live — on macOS it can capture system audio too.
-- **Dictation**: `assembly dictate` is push-to-talk for your terminal — press Enter to record, Enter again to get the utterance back instantly from the Sync API (up to 120 s per utterance).
-- **Voice agent**: `assembly agent` runs a full-duplex spoken conversation in your terminal.
-- **LLM Gateway**: `assembly llm` prompts an LLM over a transcript, stdin, or a live stream (`assembly stream --llm "summarize as I talk"`).
-- **Transcript-driven clipping**: `assembly clip` cuts an audio/video file (or a YouTube/podcast URL — add `--video` to download the full video so the clips keep the picture) with ffmpeg by diarized speaker (`--speaker A`), text match (`--search "pricing"`), LLM pick (`--llm "the three best moments"`), or explicit time range (`--range 1:30-2:45`) — transcribing on the fly, reusing a finished transcript with `-t ID`, or reading one from a pipe (`assembly transcribe x.mp4 --speaker-labels --json | assembly clip x.mp4 -t - --llm "…"`). Clip boundaries snap into nearby silence (ffmpeg `silencedetect`) so cuts don't land mid-word; `--no-snap` cuts at the exact selected times.
-- **Dubbing**: `assembly dub` re-voices an audio/video file or URL in another language (`assembly --sandbox dub talk.mp4 -l de`): diarized transcription, per-utterance LLM translation, streaming TTS per speaker, and an ffmpeg track-swap that leaves the video untouched. Sandbox-only today, like `speak`.
-- **Captioning**: `assembly caption` burns always-visible captions into a video (`assembly caption talk.mp4`) — it transcribes the file (or reuses a transcript with `-t ID`), fetches the SRT export, and renders it into the picture with ffmpeg, leaving the audio untouched; `--chars-per-caption` and `--font-size` shape the captions.
-- **Model evaluation**: `assembly eval` transcribes a Hugging Face dataset (with built-in aliases for common benchmarks: `assembly eval tedlium`) or a local `.csv`/`.jsonl` manifest and scores WER against its references — handy for picking a speech model.
-- **Starter apps**: `assembly init` scaffolds a self-contained FastAPI + HTML app (`audio-transcription`, `live-captions`, `voice-agent`); `assembly dev` runs it, `assembly share` exposes it on a public URL, and `assembly deploy` ships it to Vercel, Railway, or Fly.io.
-- **Webhook testing**: `assembly webhooks listen` opens a public dev URL (cloudflared quick tunnel) that prints webhook deliveries as they arrive and can forward them to your local app with `--forward-to`.
-- **Code generation**: add `--show-code` to `transcribe`/`stream`/`agent` to print the equivalent Python SDK script instead of running.
-- **Account self-service**: `assembly keys` / `balance` / `usage` / `limits` / `sessions` / `audit` via browser login.
-
### Quick examples
Pull exactly the output you need:
@@ -223,18 +270,18 @@ ffmpeg -i talk.mp4 -f wav - | assembly transcribe -
git log --oneline -30 | assembly llm "write release notes grouped by feature/fix"
```
-Graduate to the SDK — print the equivalent Python script instead of running:
-
-```sh
-assembly transcribe --sample --speaker-labels --show-code
-```
-
## 📚 Documentation
+### In the terminal
+
- Run `assembly --help` or `assembly <command> --help` for flags and examples.
- Run `assembly doctor` to check your environment (API key, network, ffmpeg, microphone).
-- [AssemblyAI docs](https://www.assemblyai.com/docs)
-- [API reference](https://www.assemblyai.com/docs/api-reference)
+
+### Resources
+
+- [**AssemblyAI docs**](https://www.assemblyai.com/docs) — guides for every model and feature.
+- [**API reference**](https://www.assemblyai.com/docs/api-reference) — the REST and streaming APIs the CLI drives.
+- [**Dashboard**](https://www.assemblyai.com/dashboard) — manage your account and API keys.
## 🤝 Contributing
@@ -250,4 +297,6 @@ See [AGENTS.md](AGENTS.md) for development conventions and architecture notes.
## 📄 Legal
-Released under the [MIT license](LICENSE).
+- **License**: released under the [MIT license](LICENSE).
+- **Privacy**: [AssemblyAI privacy policy](https://www.assemblyai.com/legal/privacy-policy). The CLI collects anonymous usage telemetry by default; opt out with `assembly telemetry disable`, `AAI_TELEMETRY_DISABLED=1`, or `DO_NOT_TRACK=1`.
+- **Terms**: [AssemblyAI terms of service](https://www.assemblyai.com/legal/terms-of-service).