Add `assembly dictate`: hotkey-driven push-to-talk transcription by alexkroman · Pull Request #131 · AssemblyAI/cli

alexkroman · 2026-06-12T21:35:32Z

Implements the `assembly dictate` command for real-time, hotkey-driven transcription using the AssemblyAI Sync STT API. This new interactive command lets users press Enter to record audio and receive the transcript directly in the API response, with no polling.

Key Changes

New modules:

aai_cli/sync_stt.py: HTTP boundary for the Sync STT API (sync.assemblyai.com). Handles multipart PCM + JSON config uploads, normalizes errors into the CLIError hierarchy, and parses transcript responses. Stays Rich-free per the import-linter contract.
aai_cli/hotkey.py: Single-keypress input via termios cbreak mode. Scopes terminal state changes to a with block, handles POSIX-only constraints gracefully (raises clean CLIError on Windows), and supports both blocking reads and zero-timeout polls for responsive recording.
aai_cli/dictate_exec.py: Session orchestration for the dictate command. Implements the push-to-talk loop: wait for hotkey → record mic → transcribe → repeat. Supports language/prompt/word-boost flags, duration caps, and both human-readable and JSON output modes.
aai_cli/commands/dictate.py: Typer CLI surface. Maps argv flags to DictateOptions and delegates to run_dictate.
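
For reviewers unfamiliar with cbreak scoping, here is a minimal sketch of the idea behind aai_cli/hotkey.py. It is not the code in this PR: the names `cbreak` and `read_key` and the use of RuntimeError (instead of the CLIError hierarchy) are assumptions made for illustration, and the module-level `termios` import means this sketch only loads on POSIX.

```python
import os
import select
import termios
import tty
from contextlib import contextmanager


@contextmanager
def cbreak(fd):
    """Put the terminal on `fd` into cbreak mode for the duration of a with-block."""
    if not os.isatty(fd):
        # Stand-in for the clean "needs a terminal" error described above.
        raise RuntimeError("dictate needs a terminal (stdin is not a TTY)")
    saved = termios.tcgetattr(fd)
    try:
        tty.setcbreak(fd)  # keys arrive one at a time; Ctrl-C still signals
        yield fd
    finally:
        # Restore the saved settings even if the block raised.
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)


def read_key(fd, timeout=None):
    """Return one keypress, or None if nothing arrives within `timeout`.

    timeout=None blocks until a key is pressed; timeout=0 polls without waiting.
    """
    ready, _, _ = select.select([fd], [], [], timeout)
    if not ready:
        return None
    data = os.read(fd, 1)
    return data.decode(errors="replace") if data else None  # b"" means EOF
```

Typical use would be `with cbreak(sys.stdin.fileno()) as fd: key = read_key(fd)`, so the terminal returns to its previous mode as soon as the block exits.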

Test coverage:

tests/test_sync_stt.py (222 lines): HTTP boundary tests covering request shape, multipart encoding, error normalization (auth, rate limits, audio validation), response parsing, and fallback error detail extraction.
tests/test_hotkey.py (119 lines): Terminal I/O tests using real pty pairs, verifying cbreak scoping, single-keypress reads, EOF handling, and clean error modes for non-TTY/non-POSIX platforms.
tests/test_dictate_exec.py (312 lines): Session behavior tests with fully injected boundaries (scripted keys, canned PCM, mocked HTTP). Covers hotkey toggles, recording stops, duration caps, language/prompt handling, JSON mode, and Ctrl-C cleanup.
tests/test_dictate_command.py (84 lines): CLI surface tests verifying argv → options mapping and the terminal-validation-before-credentials ordering.
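
The error normalization that tests/test_sync_stt.py exercises can be pictured roughly as follows. This is a hedged sketch only: the class names, exit code, message wording, and the `normalize_error` / `error_detail` helpers are hypothetical stand-ins, not the real CLIError hierarchy or the sync_stt.py API. The status-code mapping (401/403 to an auth failure, 429/503 to a retryable error, `error_code`/`message` bodies to a clean message) follows the commit message further down.

```python
import json


class CLIError(Exception):
    """Hypothetical stand-in for the CLI's base error type."""
    exit_code = 1


class AuthError(CLIError):
    exit_code = 2  # illustrative value, not the CLI's real exit code


class APIError(CLIError):
    def __init__(self, message, retryable=False):
        super().__init__(message)
        self.retryable = retryable


def error_detail(body):
    """Pull a readable message out of an error body, falling back to raw text."""
    try:
        payload = json.loads(body)
    except ValueError:
        return body.strip() or "no error detail"
    if isinstance(payload, dict):
        code = payload.get("error_code")
        message = payload.get("message")
        if code and message:
            return f"{code}: {message}"
        if message or code:
            return str(message or code)
    return body.strip() or "no error detail"


def normalize_error(status, body):
    """Map an HTTP failure to one error type with a clean message."""
    if status in (401, 403):
        return AuthError("authentication failed; check your API key")
    detail = error_detail(body)
    if status in (429, 503):
        return APIError(f"HTTP {status}: {detail}", retryable=True)
    return APIError(f"HTTP {status}: {detail}")
```

Keeping the mapping in a pure function like this is what makes it cheap to test without a network: each case is a status code and a body string in, one exception object out.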

Integration:

Updated aai_cli/environments.py to add sync_base URL per environment (production and sandbox).
Registered dictate command in aai_cli/main.py under the "Run AssemblyAI" workflow group.
Updated help snapshot tests and tests/_snapshot_surface.py to include the new command.
Updated .importlinter to allow aai_cli.dictate_exec as a source module.

Notable Implementation Details

Boundary injection for testability: The session is driven by constructing DictateOptions and injecting three boundaries (TerminalKeys, MicrophoneSource, sync_stt.transcribe_pcm), so no test needs a real terminal, microphone, or network.
Responsive recording: In-recording key polls use zero timeout (non-blocking) so audio chunks never stall behind keyboard reads; idle waits use None (blocking).
Validation ordering: Terminal validation (TerminalKeys entry) happens before credential resolution, so piped stdin surfaces as "needs a terminal" rather than triggering a login prompt.
Error normalization: All HTTP failures (auth, rate limits, audio validation) are normalized into the CLIError hierarchy with appropriate exit codes and suggestions.
Immutable options: DictateOptions is a frozen dataclass, preventing accidental mutation during the session.
Language handling: Comma-separated language codes are parsed into a list for code-switching audio; blank input is treated as unset.
Prompt + language warning: When both --prompt and --language are set, a warning notes that the server ignores language_code with a custom prompt.
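
To make the boundary-injection and polling points above concrete, here is a small sketch of a push-to-talk loop driven entirely by injected callables. It is an illustration under assumptions, not the code in dictate_exec.py: `Options`, `parse_languages`, and `run_session` are hypothetical names, the option set is far smaller than the real DictateOptions, and quit keys pressed mid-recording are ignored for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional, Tuple


def parse_languages(raw: Optional[str]) -> Optional[List[str]]:
    """Split 'en, es' into ['en', 'es']; blank or missing input means unset."""
    if raw is None:
        return None
    codes = [part.strip() for part in raw.split(",") if part.strip()]
    return codes or None


@dataclass(frozen=True)
class Options:  # hypothetical, much smaller than the real DictateOptions
    languages: Optional[Tuple[str, ...]] = None
    once: bool = False


START_STOP = {"\n", "\r", " "}  # Enter or Space toggles recording
QUIT = {"q", "\x1b"}            # q or Esc ends the session


def run_session(
    options: Options,
    read_key: Callable[[Optional[float]], Optional[str]],
    mic_chunks: Callable[[], Iterator[bytes]],
    transcribe: Callable[[bytes, Optional[Tuple[str, ...]]], str],
) -> List[str]:
    """Wait for hotkey, record, transcribe, repeat; return the transcripts."""
    transcripts = []
    while True:
        key = read_key(None)  # idle: block until a key arrives
        if key is None or key in QUIT:  # None is treated as EOF here
            break
        if key not in START_STOP:
            continue
        recorded = bytearray()
        for chunk in mic_chunks():
            recorded += chunk
            if read_key(0) in START_STOP:  # recording: poll, never wait
                break
        transcripts.append(transcribe(bytes(recorded), options.languages))
        if options.once:
            break
    return transcripts
```

Because the loop only sees three callables, a test can pass a scripted key sequence, a generator of canned PCM, and a fake transcriber, and then assert on both the transcripts and the timeouts the loop requested.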

https://claude.ai/code/session_01FCXQLAyo8xpZiXrQ7hCMAf

A new run command for hands-free terminal dictation: press Enter (or Space) to start recording the microphone, press again to stop, and the utterance is POSTed to the Sync API (sync.assemblyai.com/transcribe, X-AAI-Model: u3-sync-pro), which returns the transcript in the response body, with no polling. q/Esc/Ctrl-C ends the session.

- sync_stt.py: the Sync API HTTP boundary (httpx2, multipart raw-PCM + JSON config), normalizing 401/403 to auth_failure(), 429/503 to a retryable APIError, and error_code/message bodies into clean messages. Environment gains a sync_base host (prod + sandbox).
- hotkey.py: TerminalKeys reads single keypresses with cbreak scoped to a with-block (Ctrl-C still signals); clean not-a-tty / no-termios errors instead of tracebacks.
- dictate_exec.py: the options/run split. Capture is resampled to 16 kHz PCM16; the key poll runs with zero timeout between ~100 ms mic chunks; recordings are capped at the API's 120 s limit, and ones under its 80 ms floor are skipped with a warning instead of a server 400.

Flags: --language (comma list for code-switching), --prompt, --word-boost, --device, --once, --max-seconds, --json (one NDJSON object per utterance).

Tests cover the HTTP boundary via MockTransport, the termios behavior via a real pty pair, and the session loop via injected key/mic/HTTP seams; the terminal requirement is validated before credentials.

https://claude.ai/code/session_01FCXQLAyo8xpZiXrQ7hCMAf
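
The 16 kHz PCM16 format, the 120 s cap, and the 80 ms floor mentioned in this commit message translate into simple byte counts. The sketch below works through that arithmetic. Mono capture, the inclusive floor comparison, and the helper names (`pcm_bytes`, `should_send`, `cap_reached`) are assumptions for illustration, not details confirmed by the PR.

```python
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # PCM16
CHANNELS = 1          # assumption: mono capture


def pcm_bytes(milliseconds: int) -> int:
    """Size in bytes of `milliseconds` of audio at the format above."""
    samples = SAMPLE_RATE_HZ * milliseconds // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


MIN_BYTES = pcm_bytes(80)       # 80 ms floor
MAX_BYTES = pcm_bytes(120_000)  # 120 s cap
CHUNK_BYTES = pcm_bytes(100)    # one ~100 ms mic chunk


def should_send(pcm: bytes) -> bool:
    """False for recordings under the floor, which are skipped with a warning."""
    return len(pcm) >= MIN_BYTES


def cap_reached(recorded_bytes: int, max_ms: int = 120_000) -> bool:
    """True once a recording hits the user's cap or the 120 s limit."""
    return recorded_bytes >= pcm_bytes(min(max_ms, 120_000))
```

Under these assumptions one second of audio is 32,000 bytes, so the floor is 2,560 bytes, a 100 ms chunk is 3,200 bytes, and a full 120 s recording is 3,840,000 bytes.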

Combines the new clip command from #129 with dictate: both registered in the command order, help groups, import-linter contracts, and AGENTS.md (seven options/run-split exec modules now); run-group help snapshot regenerated. Note: the full check.sh gate was deliberately skipped for this merge commit at the operator's request; the default pytest suite passed on the merged tree (2210 passed). https://claude.ai/code/session_01FCXQLAyo8xpZiXrQ7hCMAf

alexkroman enabled auto-merge June 12, 2026 21:41

claude and others added 2 commits June 12, 2026 21:58

Merge branch 'main' into claude/adoring-cerf-ynr8gc

8fc5b14

alexkroman added this pull request to the merge queue Jun 12, 2026

Merged via the queue into main with commit a324f69 Jun 12, 2026
15 checks passed

alexkroman deleted the claude/adoring-cerf-ynr8gc branch June 12, 2026 22:07

alexkroman merged 3 commits into main from claude/adoring-cerf-ynr8gc
