Add AssemblyAI CLI (aai) by alexkroman · Pull Request #1 · AssemblyAI/cli

alexkroman · 2026-06-03T14:19:28Z

Summary

Initial implementation of the aai onboarding CLI for AssemblyAI. This populates the repo (previously just a README) with the full CLI package, tests, and tooling.

Commands

aai login — store the API key via the system keyring
aai transcribe — file/sample transcription, emits a runnable code template
aai transcripts — list / get past transcripts
aai samples — scaffold key-injected sample scripts
aai stream — real-time transcription from a file or microphone
aai agent — live two-way voice conversation with an AssemblyAI voice agent
aai claude — wire Claude Code to AssemblyAI's docs MCP server + skill

Stack

typer + rich, packaged with hatchling
Optional extras: [mic] (PyAudio) for streaming/agent mic input, [dev] for tooling
ruff + pre-commit configured

Tests

Full pytest suite — 157 tests passing locally.

pip install -e ".[dev]"
python -m pytest -q

🤖 Generated with Claude Code

Initial implementation of the `aai` onboarding CLI for AssemblyAI. Commands: - login: store the API key via keyring - transcribe: file/sample transcription with a runnable code template - transcripts: list/get past transcripts - samples: scaffold key-injected sample scripts - stream: real-time transcription from a file or microphone - agent: live two-way voice conversation with an AssemblyAI voice agent - claude: wire Claude Code to AssemblyAI's docs MCP server + skill Built on typer + rich; packaged with hatchling. Includes a full pytest suite (157 tests) plus ruff and pre-commit configuration. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

CI: - scripts/check.sh runs ruff (lint + format), mypy, and pytest - .github/workflows/ci.yml runs it on every PR and push to main Type safety: - generic output.emit(); typed audio stream, ffmpeg stdout, stream source - [tool.mypy] config (ignore_missing_imports); mypy is now green Robustness: - clean "not authenticated" errors across all commands: detect rejected keys (incl. the Voice Agent's 1008 policy-violation close) and raise NotAuthenticated instead of a raw protocol/APIError - `aai claude install` no longer hangs: detach child stdin, add a timeout, and pass `npx -y` so an invisible prompt can't block forever Install + docs: - install.sh for `curl -fsSL .../install.sh | sh` (pipx/pip, no clone) - README rewritten with the curl install as the top path - remove DEMO.md; gitignore .claude/, .mypy_cache/, docs/ Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- python -m assemblyai_cli entrypoint - client.py auth/error branches (get/transcribe/stream -> NotAuthenticated vs APIError) - session.py should_send_audio gate and _send_audio_loop (forward/drop/stop-on-error) - renderer human-mode lines + broken-pipe write swallowing - human-mode command paths (transcripts table, agent half-duplex notice, stream Ctrl-C, interactive login with/without a working browser) - MicCapture closes a closeable stream Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Quality gates (CI runs all of these on every PR): - coverage gate: pytest-cov with --cov-fail-under=90 (currently 96%) - broader ruff rules: BLE, C4, SIM, RET, PTH, ARG, S, RUF (+ fixes) - stricter mypy: disallow_untyped_defs, warn_unused_ignores, warn_return_any, no_implicit_optional (fully annotated) - new CI jobs: `pre-commit run --all-files` (hooks can't drift) and `python -m build` + `twine check` (package always builds) Features / UX: - `aai samples create agent` scaffolds a runnable voice-agent script - bundle PyAudio as a core dependency (mic/speaker work out of the box; drop the [mic] extra); CI installs portaudio19-dev for the build - groups (`aai claude`, `aai samples`) print their subcommands instead of "Missing command" (no_args_is_help) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- extract BaseRenderer (assemblyai_cli/render.py) for the shared NDJSON + in-place-line plumbing; AgentRenderer/StreamRenderer now only map their own events. stopped() is shared, so `aai stream` Ctrl-C reuses it. - unify the two near-identical mic classes (MicSource + MicCapture) into a single MicrophoneSource (assemblyai_cli/microphone.py) used by both `stream` and `agent`; consolidate their tests into test_microphone.py. Behavior is unchanged (outputs preserved); ~30 fewer lines, 96% coverage. Note on Rich: transcripts already uses rich.table; evaluated rich.Live for the live transcript line but it complicates the JSON/threaded paths and testability without a real simplification, so kept the lightweight helper. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Correctness: - renderer _write re-raises BrokenPipeError so `aai stream/agent --json | head` stops cleanly again (the dedup had swallowed it); agent handles it like stream - is_auth_failure no longer matches bare numbers (401/403/1008) anywhere in a message — those caused valid keys/real errors to be reported as "key rejected". Voice Agent 1008 and pre-upgrade HTTP 401/403 are now detected structurally (close code / status code) in session.py - mic-open failures in the agent's daemon capture thread are now surfaced to the user (clean CLIError/exit) instead of vanishing with a hung session - ffmpeg no longer SIGTERM'd after natural EOF, removing a spurious "exit -15" decode error on fully-streamed files - validate_key reuses the shared is_auth_failure (catches forbidden/403) Cleanup: - shared status_str() for transcript status (was copy-pasted 3x) - shared pyaudio_missing_error() (was duplicated with divergent wording) - claude.py honors CLAUDE_CONFIG_DIR; Step TypedDict removes the type: ignore Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Verified against the real `skills` CLI (vercel-labs/skills): - `skills add` auto-selects PROJECT scope when run inside a project, so a bare `skills add` from a repo never reached ~/.claude/skills — hence install said "installed" while status said "not_installed". Pass --global (+ --yes) to pin user scope, matching where status looks. Skill name confirmed: "assemblyai". - the skill is symlinked into ~/.claude/skills from the skills store, so shutil.rmtree couldn't remove it; `_remove_skill` now shells out to `npx skills remove assemblyai --global` (and verifies it's gone). End-to-end: `aai claude install` → `status` → `remove` now agree on a real machine. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

`aai stream` now mirrors `aai transcribe`: - `--sample` streams the same hosted wildfires.mp3 clip - a positional source can be a local file OR an http(s) URL (decoded via ffmpeg, which reads URLs natively — verified it yields 16k mono PCM from the sample URL) Shared the source-resolution logic in client.resolve_audio_source() so transcribe and stream don't duplicate the --sample / "provide a path or URL" handling. FileSource grew a URL branch (skips the local is_file/WAV fast-path, always ffmpeg). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

PyAudio ships no macOS/Linux wheels, so a fresh install had to compile it from source against PortAudio headers (brew install portaudio + a compiler). sounddevice bundles PortAudio in its macOS/Windows wheels, so `pip install` now works with zero system dependencies on those platforms; Linux needs only the libportaudio2 runtime (no headers/compiler). - microphone.py: replace SDK PyAudio MicrophoneStream with a sounddevice RawInputStream iterator (_SoundDeviceMic); rename pyaudio_missing_error -> audio_missing_error - agent/audio.py: Player uses sounddevice RawOutputStream; simpler teardown - pyproject: pyaudio -> sounddevice; agent.py.tmpl + README updated - tests: cover _SoundDeviceMic, both factories, and missing-dep/device-failure branches (coverage 97.4%) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Supply-chain: - pin GitHub Actions to commit SHAs (a moved tag can't change what runs) - add least-privilege `permissions: contents: read` to the workflow - new `pip-audit` CI job that fails on known dependency CVEs (deps are clean today; documented the --ignore-vuln escape hatch for unfixable transitives) - .github/dependabot.yml to keep pip deps and the pinned Actions current - fix stale "PyAudio" CI label left by the sounddevice migration Tests: - branch coverage gate (pytest --cov-branch, still >=90; currently 96%) - add hypothesis property tests: NDJSON renderers preserve arbitrary text (quotes/newlines/unicode), and WAV chunking is byte-exact and bounded Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Drive the real `aai` CLI as a subprocess against the live AssemblyAI API, synthesizing speech locally with kokoro TTS. Marked `e2e`, they skip when the API key / kokoro / numpy is unavailable, so CI and keyless contributors are never blocked. A new precommit `pytest-e2e` hook runs them; the default unit run and coverage gate exclude them via `-m "not e2e"`. To make the agent drivable, `aai agent` now accepts a positional source / --sample (mirroring stream/transcribe): it streams a clip as the user's speech via a NullPlayer (headless), suppresses the greeting, runs full-duplex so nothing is muted, waits for session.ready before streaming, and exits after the agent's first reply. Live-mic behavior is unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add a --prompt switch to transcribe and stream that transforms the transcript through AssemblyAI's LLM Gateway, plus a standalone `aai llm` command. The gateway is OpenAI-compatible and has no assemblyai-SDK client for the synchronous endpoint, so we talk to it via the openai SDK pointed at https://llm-gateway.assemblyai.com/v1 (Bearer auth, transcript_id injection). - transcribe --prompt: transforms the finished transcript server-side via the transcript id ({{ transcript }} injection). Human prints the transform only; --json keeps raw text + transform{model,prompt,output}. Conflicts with --srt/--vtt. - stream --prompt: accumulates finalized turns, then runs one transform on the full transcript when the stream ends (native per-turn streaming gateway is not provisioned for general accounts, so we don't rely on it). - aai llm: prompt the gateway directly, with --transcript-id injection, --model/--system/--max-tokens, and --list-models. Adds openai>=1.40. Unit suite covers the gateway client, both switches, and the new command (coverage gate green); e2e tests exercise aai llm, transcribe --prompt, and stream --prompt against the live API. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>