diff --git a/README.md b/README.md
index ab7aaba0..5080dd62 100644
--- a/README.md
+++ b/README.md
@@ -4,12 +4,33 @@
 [![License](https://img.shields.io/badge/license-MIT-D6402E)](https://github.com/AssemblyAI/cli/blob/main/LICENSE)
 [![Docs](https://img.shields.io/badge/docs-assemblyai-D6402E)](https://www.assemblyai.com/docs)

-The AssemblyAI CLI (`assembly`) brings speech AI to your terminal: transcribe files, URLs, and YouTube/podcast pages, stream live audio, talk to a two-way voice agent, prompt the LLM Gateway, benchmark speech models, and scaffold ready-to-deploy starter apps.
+The AssemblyAI CLI (`assembly`) brings speech AI directly into your terminal: transcribe files, URLs, and YouTube/podcast pages, stream live audio, talk to a two-way voice agent, prompt the LLM Gateway, benchmark speech models, and scaffold ready-to-deploy starter apps.

[Image: The assembly CLI welcome screen, listing command groups for transcription, streaming, voice agents, app scaffolding, and account management]

+Learn more about the platform in the [AssemblyAI docs](https://www.assemblyai.com/docs).
+
+## ⚡ Quickstart
+
+Install on macOS or Linux with Homebrew:
+
+```sh
+brew tap assemblyai/cli https://github.com/AssemblyAI/cli
+brew trust assemblyai/cli # only needed when HOMEBREW_REQUIRE_TAP_TRUST is set; harmless otherwise
+brew install assembly
+```
+
+Sign in (stores your API key in the OS keyring) and run your first transcription:
+
+```sh
+assembly login
+assembly transcribe --sample
+```
+
+That's it. Run `assembly onboard` for a guided tour, or see [Installation](#-installation) for pipx/uv and other options.
+
 ## 🚀 Why the AssemblyAI CLI?

 - **🎯 One command for everything**: transcription, real-time streaming, voice agents, LLM prompts, and WER benchmarking — no SDK boilerplate.
@@ -19,9 +40,35 @@ The AssemblyAI CLI (`assembly`) brings speech AI to your terminal: transcribe fi
 - **🤖 Agent-ready**: `assembly setup install` wires your coding agent up with the AssemblyAI docs MCP server and skills.
 - **📖 Open source**: MIT licensed.
+## 📋 Features at a glance
+
+| Command | What it does |
+| :--- | :--- |
+| `assembly transcribe` | Transcribe files, URLs, YouTube/podcast pages, directories, globs, or bucket storage (`s3://`, `gs://`, `az://`) — with speaker labels, PII redaction, summarization, SRT/VTT captions, and resumable batch runs |
+| `assembly stream` | Real-time transcription from your microphone, a file, or a URL — on macOS it can capture system audio too |
+| `assembly dictate` | Push-to-talk dictation: press Enter to record, Enter again for instant text (Sync STT API, up to 120 s per utterance) |
+| `assembly agent` | Full-duplex spoken conversation with a voice agent, right in your terminal |
+| `assembly speak` | Synthesize text to speech over the streaming-TTS WebSocket (sandbox-only) |
+| `assembly llm` | Prompt the LLM Gateway over a transcript, stdin, or a live stream |
+| `assembly clip` | Cut audio/video with ffmpeg by diarized speaker, text match, LLM pick, or time range (`--video` keeps the picture for URL sources) — clip boundaries snap into nearby silence |
+| `assembly dub` | Re-voice an audio/video file or URL in another language: transcription, LLM translation, per-speaker TTS, ffmpeg track-swap (sandbox-only) |
+| `assembly caption` | Burn always-visible captions into a video: transcribe (or reuse a transcript), fetch SRT, ffmpeg burns it in — audio untouched |
+| `assembly eval` | Benchmark WER against Hugging Face datasets (built-in aliases: `librispeech`, `tedlium`, …) or local manifests |
+| `assembly webhooks listen` | Open a public dev URL that prints webhook deliveries and can forward them to your local app |
+| `assembly init` / `dev` / `share` / `deploy` | Scaffold a FastAPI + HTML starter app, run it locally, expose it on a public URL, ship it to Vercel / Railway / Fly.io |
+| `assembly setup` | Wire a coding agent up with the AssemblyAI docs MCP server and skills |
+| `assembly doctor` | Check your environment: API key, network, ffmpeg, microphone |
+| `assembly transcripts` / `sessions` | Browse and fetch past transcripts and streaming sessions |
+| `assembly keys` / `balance` / `usage` / `limits` / `audit` | Account self-service via browser login |
+
+Add `--show-code` to `transcribe` / `stream` / `agent` to print the equivalent Python SDK script instead of running — the built-in path from CLI experiment to SDK code.
+
 ## ✨ Things you can do with it

-A few one-liners that show what `assembly` can do. These are the fun ones; the everyday basics live under **Quick examples** below.
+A few one-liners that show what `assembly` can do. The everyday basics live under [Getting started](#-getting-started) below.
+
+> [!NOTE]
+> `speak` and `dub` are sandbox-only today — that's why the examples below pass `--sandbox`.

 **Recreate a scene with synthetic voices** — transcribe and diarize a YouTube clip, then pipe it straight into TTS with a different voice per speaker:

@@ -30,7 +77,7 @@ assembly transcribe "https://www.youtube.com/watch?v=awmCtXzFsJo" --speaker-labe
 | assembly --sandbox speak --voice A=jane --voice B=mary --out scene.wav
 ```

-`speak` auto-detects `Speaker A:` labels, merges each speaker's turns, and rotates voices. (`speak` is sandbox-only today, hence `--sandbox`.)
+`speak` auto-detects `Speaker A:` labels, merges each speaker's turns, and rotates voices.

 **Dub a video into another language** — the whole platform in one command: transcription with utterance timestamps, per-utterance LLM translation, TTS for each line (one voice per speaker), and ffmpeg laying the new track over the original video. A great demo is the first YouTube video ever, "Me at the zoo" — it's 19 seconds long, a single clear English speaker, and instantly recognizable, so the dub finishes fast and the before/after is obvious:

@@ -38,7 +85,7 @@ assembly transcribe "https://www.youtube.com/watch?v=awmCtXzFsJo" --speaker-labe
 assembly --sandbox dub "https://www.youtube.com/watch?v=jNQXAC9IVRw" -l de --video
 ```

-The video stream is copied untouched; each dubbed line lands at its original start time. (Sandbox-only, like `speak`.)
+The video stream is copied untouched; each dubbed line lands at its original start time.

 **Turn a podcast into audio** — Apple and Spotify podcast pages work too (yt-dlp ingestion):

@@ -114,7 +161,8 @@ assembly eval librispeech --speech-model universal-3-pro --limit 50

 Requires Python 3.12+ (Homebrew brings its own; for pipx/uv see the `--python` hint below).

-> ⚠️ The `assemblyai-cli` package on PyPI is **not** this project — install with one of the
+> [!WARNING]
+> The `assemblyai-cli` package on PyPI is **not** this project — install with one of the
 > commands below, not `pip install assemblyai-cli`.

 ### Homebrew (recommended — macOS / Linux)
@@ -138,25 +186,39 @@ uv tool install "git+https://github.com/AssemblyAI/cli.git"
 If your default interpreter is older than Python 3.12, add `--python python3.12` (pipx) or `--python 3.12` (uv) to the install command.

+
+System dependencies for the live-audio commands (pipx/uv installs only)
+
 Only the live-audio commands need anything extra: `stream`, `dictate`, and `agent` use PortAudio for
-microphone capture (Debian/Ubuntu: `sudo apt-get install libportaudio2`; Fedora:
-`sudo dnf install portaudio`) and [`ffmpeg`](https://ffmpeg.org) on `PATH` to stream
-non-WAV audio. Plain `transcribe` uploads your file directly and needs neither.
+microphone capture and [`ffmpeg`](https://ffmpeg.org) on `PATH` to stream non-WAV audio.
+Plain `transcribe` uploads your file directly and needs neither.
+
+- Debian/Ubuntu: `sudo apt-get install libportaudio2 ffmpeg`
+- Fedora: `sudo dnf install portaudio ffmpeg`
+- macOS (Homebrew): `brew install portaudio ffmpeg`
+
+
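A quick preflight for the dependencies listed above could look like the following. This is only a sketch: `check_tool` is a hypothetical helper, not part of the `assembly` CLI, and it only confirms that a binary is on `PATH`. It cannot detect the PortAudio shared library, which has no command of its own; `assembly doctor` is the check the README itself names for the full environment.

```sh
# Hypothetical helper (not part of the assembly CLI): report whether a command is on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# ffmpeg is only needed by the live-audio commands when streaming non-WAV audio.
check_tool ffmpeg
```

Using `command -v` rather than `which` keeps the helper portable across POSIX shells.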
 ## 🔐 Authentication

 New to AssemblyAI? Create a free account at [assemblyai.com/dashboard](https://www.assemblyai.com/dashboard) to get an API key.

-The easiest path is browser login, which stores your API key in the OS keyring
-(Keychain / Credential Manager / Secret Service):
+### Option 1: Browser login (recommended)
+
+**✨ Best for:** day-to-day use on your own machine.
+
+Browser login stores your API key in the OS keyring (Keychain / Credential Manager / Secret Service) — nothing lands in a dotfile, and it unlocks the account commands (`keys`, `balance`, `usage`, `limits`, `sessions`, `audit`):

 ```sh
 assembly login
 ```

-In CI — or anywhere a browser isn't an option — set the key as an environment variable
-instead. It's checked before the keyring, and nothing is written to disk:
+### Option 2: API key environment variable
+
+**✨ Best for:** CI, containers, and anywhere a browser isn't an option.
+
+The environment variable is checked before the keyring, and nothing is written to disk:

 ```sh
 export ASSEMBLYAI_API_KEY="YOUR_API_KEY"
@@ -164,13 +226,15 @@ export ASSEMBLYAI_API_KEY="YOUR_API_KEY"

 ## 🚀 Getting started

-For a guided tour — sign in, run a first transcription, start building:
+### Guided tour
+
+Sign in, run a first transcription, start building:

 ```sh
 assembly onboard
 ```

-Or jump straight in:
+### Basic usage

 ```sh
 assembly transcribe --sample # transcribe the hosted sample file
@@ -181,23 +245,6 @@ assembly agent # talk to a voice agent (use headphones)
 assembly init # scaffold a starter app
 ```

-## 📋 Key features
-
-- **Transcription**: `assembly transcribe` handles files, URLs, and YouTube/podcast pages, with flags for speaker labels, PII redaction, summarization, sentiment, chapters, and more.
-- **Batch transcription**: point `assembly transcribe` at a directory or glob — local, or in bucket storage (`"s3://bucket/calls/*.mp3"`, `gs://`, `az://`, …, with the matching fsspec backend such as `s3fs` installed) — or pipe paths with `--from-stdin` to transcribe everything concurrently, with sidecar files that make re-runs resumable. Add `--llm "prompt"` to run an LLM prompt over each finished transcript, saved into the sidecars.
-- **Real-time streaming**: `assembly stream` transcribes the microphone, a file, or a URL live — on macOS it can capture system audio too.
-- **Dictation**: `assembly dictate` is push-to-talk for your terminal — press Enter to record, Enter again to get the utterance back instantly from the Sync API (up to 120 s per utterance).
-- **Voice agent**: `assembly agent` runs a full-duplex spoken conversation in your terminal.
-- **LLM Gateway**: `assembly llm` prompts an LLM over a transcript, stdin, or a live stream (`assembly stream --llm "summarize as I talk"`).
-- **Transcript-driven clipping**: `assembly clip` cuts an audio/video file (or a YouTube/podcast URL — add `--video` to download the full video so the clips keep the picture) with ffmpeg by diarized speaker (`--speaker A`), text match (`--search "pricing"`), LLM pick (`--llm "the three best moments"`), or explicit time range (`--range 1:30-2:45`) — transcribing on the fly, reusing a finished transcript with `-t ID`, or reading one from a pipe (`assembly transcribe x.mp4 --speaker-labels --json | assembly clip x.mp4 -t - --llm "…"`). Clip boundaries snap into nearby silence (ffmpeg `silencedetect`) so cuts don't land mid-word; `--no-snap` cuts at the exact selected times.
-- **Dubbing**: `assembly dub` re-voices an audio/video file or URL in another language (`assembly --sandbox dub talk.mp4 -l de`): diarized transcription, per-utterance LLM translation, streaming TTS per speaker, and an ffmpeg track-swap that leaves the video untouched. Sandbox-only today, like `speak`.
-- **Captioning**: `assembly caption` burns always-visible captions into a video (`assembly caption talk.mp4`) — it transcribes the file (or reuses a transcript with `-t ID`), fetches the SRT export, and renders it into the picture with ffmpeg, leaving the audio untouched; `--chars-per-caption` and `--font-size` shape the captions.
-- **Model evaluation**: `assembly eval` transcribes a Hugging Face dataset (with built-in aliases for common benchmarks: `assembly eval tedlium`) or a local `.csv`/`.jsonl` manifest and scores WER against its references — handy for picking a speech model.
-- **Starter apps**: `assembly init` scaffolds a self-contained FastAPI + HTML app (`audio-transcription`, `live-captions`, `voice-agent`); `assembly dev` runs it, `assembly share` exposes it on a public URL, and `assembly deploy` ships it to Vercel, Railway, or Fly.io.
-- **Webhook testing**: `assembly webhooks listen` opens a public dev URL (cloudflared quick tunnel) that prints webhook deliveries as they arrive and can forward them to your local app with `--forward-to`.
-- **Code generation**: add `--show-code` to `transcribe`/`stream`/`agent` to print the equivalent Python SDK script instead of running.
-- **Account self-service**: `assembly keys` / `balance` / `usage` / `limits` / `sessions` / `audit` via browser login.
-
 ### Quick examples

 Pull exactly the output you need:
@@ -223,18 +270,18 @@ ffmpeg -i talk.mp4 -f wav - | assembly transcribe -
 git log --oneline -30 | assembly llm "write release notes grouped by feature/fix"
 ```

-Graduate to the SDK — print the equivalent Python script instead of running:
-
-```sh
-assembly transcribe --sample --speaker-labels --show-code
-```
-
 ## 📚 Documentation

+### In the terminal
+
 - Run `assembly --help` or `assembly <command> --help` for flags and examples.
 - Run `assembly doctor` to check your environment (API key, network, ffmpeg, microphone).
-- [AssemblyAI docs](https://www.assemblyai.com/docs)
-- [API reference](https://www.assemblyai.com/docs/api-reference)
+
+### Resources
+
+- [**AssemblyAI docs**](https://www.assemblyai.com/docs) — guides for every model and feature.
+- [**API reference**](https://www.assemblyai.com/docs/api-reference) — the REST and streaming APIs the CLI drives.
+- [**Dashboard**](https://www.assemblyai.com/dashboard) — manage your account and API keys.

 ## 🤝 Contributing

@@ -250,4 +297,6 @@ See [AGENTS.md](AGENTS.md) for development conventions and architecture notes.

 ## 📄 Legal

-Released under the [MIT license](LICENSE).
+- **License**: released under the [MIT license](LICENSE).
+- **Privacy**: [AssemblyAI privacy policy](https://www.assemblyai.com/legal/privacy-policy) — the CLI's anonymous usage telemetry is opt-out (`assembly telemetry disable`, `AAI_TELEMETRY_DISABLED=1`, or `DO_NOT_TRACK=1`).
+- **Terms**: [AssemblyAI terms of service](https://www.assemblyai.com/legal/terms-of-service).
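For CI jobs or container images, the opt-out variables named in the Privacy bullet can be exported before any `assembly` command runs. A minimal sketch, assuming a POSIX shell; the bullet lists the two variables as alternatives, so setting both is redundant and shown only for illustration.

```sh
# Opt out of the CLI's anonymous usage telemetry for this shell session.
# Variable names are taken from the Privacy bullet above.
export AAI_TELEMETRY_DISABLED=1
export DO_NOT_TRACK=1

# Echo the settings so the CI log records the choice.
echo "AAI_TELEMETRY_DISABLED=$AAI_TELEMETRY_DISABLED DO_NOT_TRACK=$DO_NOT_TRACK"
```

Environment variables only last for the current session; on a workstation, the `assembly telemetry disable` command from the same bullet is the alternative.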