Merged
156 changes: 44 additions & 112 deletions README.md
@@ -4,14 +4,14 @@
[![License](https://img.shields.io/badge/license-MIT-D6402E)](https://github.com/AssemblyAI/cli/blob/main/LICENSE)
[![Docs](https://img.shields.io/badge/docs-assemblyai-D6402E)](https://www.assemblyai.com/docs)

The AssemblyAI CLI (`assembly`) brings speech AI to your terminal: transcribe files, stream live audio, run a two-way voice agent, prompt the LLM Gateway, and scaffold ready-to-deploy starter apps.
The AssemblyAI CLI (`assembly`) brings speech AI to your terminal: transcribe files, URLs, and YouTube/podcast pages, stream live audio, talk to a two-way voice agent, prompt the LLM Gateway, benchmark speech models, and scaffold ready-to-deploy starter apps.

## 🚀 Why the AssemblyAI CLI?

- **🎯 Everything in one command**: transcription, real-time streaming, voice agents, and LLM prompts — no SDK boilerplate.
- **🔌 Pipeline-friendly**: data goes to stdout, errors to stderr, `--json` for stable machine-readable output, `-` reads audio from stdin.
- **🔐 Secure by default**: your API key lives in the OS keyring, never in a dotfile, and run commands have no `--api-key` flag so keys can't leak into shell history.
- **🛠️ From demo to app**: `assembly init` scaffolds a runnable FastAPI starter app, and `--show-code` prints the equivalent Python SDK script for any command.
- **🎯 One command for everything**: transcription, real-time streaming, voice agents, LLM prompts, and WER benchmarking — no SDK boilerplate.
- **🔌 Built for pipelines**: data goes to stdout, errors to stderr, `--json` gives stable machine-readable output, and `-` reads audio from stdin.
- **🔐 Secure by default**: your API key lives in the OS keyring, never in a dotfile, and run commands have no `--api-key` flag, so keys can't leak into `ps` or shell history.
- **🛠️ From demo to deployed app**: `assembly init` scaffolds a runnable FastAPI starter, `assembly dev` / `share` / `deploy` run, tunnel, and ship it, and `--show-code` prints the equivalent Python SDK script for any run command.
- **🤖 Agent-ready**: `assembly setup install` wires your coding agent up with the AssemblyAI docs MCP server and skills.
- **📖 Open source**: MIT licensed.
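
The stdout/stderr split described in the list above is what makes the CLI composable in scripts. A minimal sketch of how a script might rely on it is below; `run_split` is a hypothetical helper written for illustration, not something the CLI ships.

```sh
# Hypothetical helper (not part of the CLI): run any command, keep its
# stdout as data in one file and its stderr as a log in another.
# The function's exit status is the exit status of the wrapped command.
run_split() {
  out_file=$1
  err_file=$2
  shift 2
  "$@" >"$out_file" 2>"$err_file"
}

# Example call, assuming call.mp3 exists and you are signed in:
#   run_split transcript.json errors.log assembly transcribe call.mp3 --json
```

Because the data and the diagnostics land in separate files, a later step can parse the output without having to filter out log lines.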

@@ -30,156 +30,100 @@ brew trust assemblyai/cli # only needed when HOMEBREW_REQUIRE_TAP_TRUST is set
brew install assembly
```

Homebrew pulls in `ffmpeg` and `portaudio`, so `stream` and `agent` work out of the box.
Plain `transcribe` uploads your file directly and needs neither.
Homebrew pulls in `ffmpeg` and `portaudio`, so every command works out of the box.

### pipx / uv

With pipx:

```sh
pipx install "git+https://github.com/AssemblyAI/cli.git"
```

Or with uv:

```sh
# or
uv tool install "git+https://github.com/AssemblyAI/cli.git"
```

If your default interpreter is older than Python 3.12, add `--python python3.12` (pipx) or
`--python 3.12` (uv) to the install command.

Only `stream` and `agent` need extras: on Linux, install PortAudio once for microphone support
(Debian/Ubuntu: `sudo apt-get install libportaudio2`; Fedora: `sudo dnf install portaudio`), and
have [`ffmpeg`](https://ffmpeg.org) on `PATH` to stream non-WAV audio. Plain `transcribe` needs
neither.

## 📋 Key Features

- **Transcription**: `assembly transcribe` handles files, URLs, and YouTube/podcast pages, with flags for speaker labels, PII redaction, summarization, sentiment, chapters, and more.
- **Batch transcription**: point `assembly transcribe` at a directory or glob (or pipe paths with `--from-stdin`) to transcribe everything concurrently, with sidecar files that make re-runs resumable. Add `--llm "prompt"` to run an LLM prompt over each finished transcript, saved into the sidecars.
- **Real-time streaming**: `assembly stream` transcribes the microphone, a file, or a URL live — on macOS it can capture system audio too.
- **Voice agent**: `assembly agent` runs a full-duplex spoken conversation in your terminal (use headphones).
- **LLM Gateway**: `assembly llm` prompts an LLM over a transcript, stdin, or a live stream (`assembly stream --llm "summarize as I talk"`).
- **Model evaluation**: `assembly eval` transcribes a Hugging Face dataset (with built-in aliases for common benchmarks: `assembly eval tedlium`) or a local `.csv`/`.jsonl` manifest and scores WER against its references — handy for picking a speech model.
- **Starter apps**: `assembly init` scaffolds a self-contained FastAPI + HTML app (`audio-transcription`, `live-captions`, `voice-agent`).
- **Webhook testing**: `assembly webhooks listen` opens a public dev URL (cloudflared quick tunnel) that prints webhook deliveries as they arrive and can forward them to your local app with `--forward-to`.
- **Code generation**: add `--show-code` to `transcribe`/`stream`/`agent` to print the equivalent Python SDK script instead of running.
- **Account self-service**: `assembly keys` / `balance` / `usage` / `limits` / `sessions` / `audit` via browser login.
Only the live-audio commands need anything extra: `stream` and `agent` need PortAudio for
microphone capture (on Linux, Debian/Ubuntu: `sudo apt-get install libportaudio2`; Fedora:
`sudo dnf install portaudio`) and [`ffmpeg`](https://ffmpeg.org) on `PATH` to stream
non-WAV audio. Plain `transcribe` uploads your file directly and needs neither.

## 🔐 Authentication

New to AssemblyAI? Create a free account at
[assemblyai.com/dashboard](https://www.assemblyai.com/dashboard) to get an API key.

### Option 1: Browser login (recommended)
The easiest path is browser login, which stores your API key in the OS keyring
(Keychain / Credential Manager / Secret Service):

```sh
assembly login
```

Stores your API key in the OS keyring (Keychain / Credential Manager / Secret Service).

### Option 2: Environment variable
In CI — or anywhere a browser isn't an option — set the key as an environment variable
instead. It's checked before the keyring, and nothing is written to disk:

```sh
export ASSEMBLYAI_API_KEY="YOUR_API_KEY"
```

Checked before the keyring, so nothing is written to disk — ideal for CI (set it as a masked secret).
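
In a CI job that relies on the environment variable, it can help to fail early with a clear message when the secret was not injected. The guard below is a small sketch under that assumption; `require_api_key` is a hypothetical helper, not a CLI command. It is not meant for a workstation, where the keyring may supply the key instead.

```sh
# Hypothetical CI guard (not part of the CLI): stop early when the
# ASSEMBLYAI_API_KEY variable is missing or empty.
require_api_key() {
  if [ -z "${ASSEMBLYAI_API_KEY:-}" ]; then
    echo "ASSEMBLYAI_API_KEY is not set; add it as a masked CI secret" >&2
    return 1
  fi
}

# Example call at the top of a CI script:
#   require_api_key || exit 1
```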

## 🚀 Getting Started

### Basic usage
## 🚀 Getting started

Guided setup: sign in, first transcription, start building:
For a guided tour — sign in, run a first transcription, start building:

```sh
assembly onboard
```

Transcribe the hosted sample:
Or jump straight in:

```sh
assembly transcribe --sample
assembly transcribe --sample # transcribe the hosted sample file
assembly transcribe call.mp3 # then your own audio
assembly stream --sample # live streaming, no microphone needed
assembly stream # stream your microphone (Ctrl-C to stop)
assembly agent # talk to a voice agent (use headphones)
assembly init # scaffold a starter app
```

Then your own audio:
## 📋 Key features

```sh
assembly transcribe call.mp3
```

Stream the hosted sample live (no microphone needed):

```sh
assembly stream --sample
```

Or stream your microphone (Ctrl-C to stop):

```sh
assembly stream
```

Talk to a voice agent:

```sh
assembly agent
```

Scaffold a starter app:

```sh
assembly init
```
- **Transcription**: `assembly transcribe` handles files, URLs, and YouTube/podcast pages, with flags for speaker labels, PII redaction, summarization, sentiment, chapters, and more.
- **Batch transcription**: point `assembly transcribe` at a directory or glob (or pipe paths with `--from-stdin`) to transcribe everything concurrently, with sidecar files that make re-runs resumable. Add `--llm "prompt"` to run an LLM prompt over each finished transcript, saved into the sidecars.
- **Real-time streaming**: `assembly stream` transcribes the microphone, a file, or a URL live — on macOS it can capture system audio too.
- **Voice agent**: `assembly agent` runs a full-duplex spoken conversation in your terminal.
- **LLM Gateway**: `assembly llm` prompts an LLM over a transcript, stdin, or a live stream (`assembly stream --llm "summarize as I talk"`).
- **Model evaluation**: `assembly eval` transcribes a Hugging Face dataset (with built-in aliases for common benchmarks: `assembly eval tedlium`) or a local `.csv`/`.jsonl` manifest and scores WER against its references — handy for picking a speech model.
- **Starter apps**: `assembly init` scaffolds a self-contained FastAPI + HTML app (`audio-transcription`, `live-captions`, `voice-agent`); `assembly dev` runs it, `assembly share` exposes it on a public URL, and `assembly deploy` ships it to Vercel, Railway, or Fly.io.
- **Webhook testing**: `assembly webhooks listen` opens a public dev URL (cloudflared quick tunnel) that prints webhook deliveries as they arrive and can forward them to your local app with `--forward-to`.
- **Code generation**: add `--show-code` to `transcribe`/`stream`/`agent` to print the equivalent Python SDK script instead of running.
- **Account self-service**: `assembly keys` / `balance` / `usage` / `limits` / `sessions` / `audit` via browser login.
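
For readers new to the WER score mentioned under model evaluation: word error rate is conventionally the word-level edit distance between a reference and a hypothesis, divided by the number of reference words. The sketch below illustrates that standard definition only. The `wer` function is hypothetical, and how `assembly eval` normalizes text or computes its score is not stated here, so its numbers may differ.

```sh
# Hypothetical illustration of the standard WER definition:
#   WER = (substitutions + deletions + insertions) / reference word count
# No case or punctuation normalization is applied; words must match exactly.
wer() {
  awk -v ref="$1" -v hyp="$2" 'BEGIN {
    n = split(ref, r, " ")   # reference words
    m = split(hyp, h, " ")   # hypothesis words
    if (n == 0) { print "nan"; exit }
    # d[i, j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    for (i = 0; i <= n; i++) d[i, 0] = i
    for (j = 0; j <= m; j++) d[0, j] = j
    for (i = 1; i <= n; i++) {
      for (j = 1; j <= m; j++) {
        cost = ((r[i] "") == (h[j] "")) ? 0 : 1        # compare as strings
        best = d[i-1, j-1] + cost                      # match or substitution
        if (d[i-1, j] + 1 < best) best = d[i-1, j] + 1 # deletion
        if (d[i, j-1] + 1 < best) best = d[i, j-1] + 1 # insertion
        d[i, j] = best
      }
    }
    printf "%.3f\n", d[n, m] / n
  }'
}

# Example call:
#   wer "the cat sat on the mat" "the cat sat on mat"
```

A lower score means the hypothesis is closer to the reference; a score above 1 is possible when the hypothesis contains many inserted words.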

### Quick examples

Just the text:

```sh
assembly transcribe call.mp3 -o text
```

Or captions:

```sh
assembly transcribe video.mp4 -o srt
```

Speaker labels + summary, as JSON:
Pull exactly the output you need:

```sh
assembly transcribe call.mp3 -o text # just the text
assembly transcribe video.mp4 -o srt # captions
assembly transcribe call.mp3 --speaker-labels --summarization --json
```

Batch: a whole directory or glob, resumable on re-run:
Transcribe in batches — a directory, a glob, or a piped list, resumable on re-run:

```sh
assembly transcribe ./recordings
```

Or pipe paths in:

```sh
find . -name "*.wav" | assembly transcribe --from-stdin
```

Pipe audio in, pipe text out:
Compose with other tools — audio in, text out:

```sh
ffmpeg -i talk.mp4 -f wav - | assembly transcribe -
```

Prompt the LLM Gateway over any text:

```sh
git log --oneline -30 | assembly llm "write release notes grouped by feature/fix"
```

Print the equivalent Python SDK script instead of running:
Graduate to the SDK — print the equivalent Python script instead of running:

```sh
assembly transcribe --sample --speaker-labels --show-code
@@ -196,22 +140,10 @@ assembly transcribe --sample --speaker-labels --show-code

This project uses [uv](https://docs.astral.sh/uv/):

Create/refresh the venv:

```sh
uv sync
```

Run the CLI from the locked environment:

```sh
uv run assembly --help
```

Run the full gate CI runs:

```sh
./scripts/check.sh
uv sync # create/refresh the venv
uv run assembly --help # run the CLI from the locked environment
./scripts/check.sh # the full gate CI runs
```

See [AGENTS.md](AGENTS.md) for development conventions and architecture notes.