Add assembly dictate: hotkey-driven push-to-talk transcription #131

Merged
alexkroman merged 3 commits into main from claude/adoring-cerf-ynr8gc
Jun 12, 2026

@alexkroman

Collaborator

Implements the assembly dictate command: hotkey-driven, push-to-talk transcription using the AssemblyAI Sync STT API. This new interactive command lets users press Enter (or Space) to start recording, press again to stop, and receive the transcript directly in the HTTP response, with no polling.

Key Changes

New modules:

  • aai_cli/sync_stt.py: HTTP boundary for the Sync STT API (sync.assemblyai.com). Handles multipart PCM + JSON config uploads, normalizes errors into the CLIError hierarchy, and parses transcript responses. Stays Rich-free per the import-linter contract.
  • aai_cli/hotkey.py: Single-keypress input via termios cbreak mode. Scopes terminal state changes to a with block, fails cleanly where termios is unavailable (raises a CLIError on Windows instead of a traceback), and supports both blocking reads and zero-timeout polls for responsive recording.
  • aai_cli/dictate_exec.py: Session orchestration for the dictate command. Implements the push-to-talk loop: wait for hotkey → record mic → transcribe → repeat. Supports language/prompt/word-boost flags, duration caps, and both human-readable and JSON output modes.
  • aai_cli/commands/dictate.py: Typer CLI surface. Maps argv flags to DictateOptions and delegates to run_dictate.
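
The cbreak scoping and the two read modes described for hotkey.py can be sketched with the standard library alone. This is a minimal illustration, not the code from this PR: `cbreak`, `read_key`, and the `RuntimeError` are hypothetical stand-ins for TerminalKeys and CLIError, and the sketch is POSIX-only because it imports termios at module level.

```python
import contextlib
import os
import select
import termios
import tty


@contextlib.contextmanager
def cbreak(fd):
    """Put the terminal on `fd` into cbreak mode and restore it on exit.

    Hypothetical helper; the PR's TerminalKeys raises a CLIError instead.
    """
    if not os.isatty(fd):
        raise RuntimeError("needs a terminal")
    saved = termios.tcgetattr(fd)
    try:
        tty.setcbreak(fd)  # no line buffering, no echo; Ctrl-C still signals
        yield
    finally:
        # Restore the saved attributes even if the body raised.
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)


def read_key(fd, timeout=None):
    """Return one key as str, "" on EOF, or None if `timeout` expires.

    timeout=None blocks until a key arrives; timeout=0 is a non-blocking poll.
    """
    ready, _, _ = select.select([fd], [], [], timeout)
    if not ready:
        return None
    data = os.read(fd, 1)
    return data.decode(errors="replace") if data else ""
```

A caller would wrap the whole session in `with cbreak(sys.stdin.fileno()):` and use `read_key(fd, timeout=None)` while idle and `read_key(fd, timeout=0)` between audio chunks.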

Test coverage:

  • tests/test_sync_stt.py (222 lines): HTTP boundary tests covering request shape, multipart encoding, error normalization (auth, rate limits, audio validation), response parsing, and fallback error detail extraction.
  • tests/test_hotkey.py (119 lines): Terminal I/O tests using real pty pairs, verifying cbreak scoping, single-keypress reads, EOF handling, and clean error modes for non-TTY/non-POSIX platforms.
  • tests/test_dictate_exec.py (312 lines): Session behavior tests with fully injected boundaries (scripted keys, canned PCM, mocked HTTP). Covers hotkey toggles, recording stops, duration caps, language/prompt handling, JSON mode, and Ctrl-C cleanup.
  • tests/test_dictate_command.py (84 lines): CLI surface tests verifying argv → options mapping and the terminal-validation-before-credentials ordering.

Integration:

  • Updated aai_cli/environments.py to add sync_base URL per environment (production and sandbox).
  • Registered dictate command in aai_cli/main.py under the "Run AssemblyAI" workflow group.
  • Updated help snapshot tests and tests/_snapshot_surface.py to include the new command.
  • Updated .importlinter to allow aai_cli.dictate_exec as a source module.

Notable Implementation Details

  • Boundary injection for testability: The session is driven by constructing DictateOptions and injecting three boundaries (TerminalKeys, MicrophoneSource, sync_stt.transcribe_pcm), so no test needs a real terminal, microphone, or network.
  • Responsive recording: In-recording key polls use a zero timeout (non-blocking) so audio chunks never stall behind keyboard reads; idle waits use a timeout of None (blocking).
  • Validation ordering: Terminal validation (TerminalKeys entry) happens before credential resolution, so piped stdin surfaces as "needs a terminal" rather than triggering a login prompt.
  • Error normalization: All HTTP failures (auth, rate limits, audio validation) are normalized into the CLIError hierarchy with appropriate exit codes and suggestions.
  • Immutable options: DictateOptions is a frozen dataclass, preventing accidental mutation during the session.
  • Language handling: Comma-separated language codes are parsed into a list for code-switching audio; blank input is treated as unset.
  • Prompt + language warning: When both --prompt and --language are set, a warning notes that the server ignores language_code with a custom prompt.
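
The immutable-options, language-parsing, and prompt/language-warning points above can be illustrated together. This is a sketch under stated assumptions: `SketchOptions`, its field names, `parse_languages`, `option_warnings`, and the warning text are all hypothetical, and the real DictateOptions has more fields. The sketch stores languages as a tuple so the frozen instance cannot be mutated through the field, whereas the PR describes a list.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Optional


def parse_languages(raw: Optional[str]) -> Optional[tuple[str, ...]]:
    """Split a comma-separated value such as "en, es"; blank input is unset."""
    if raw is None:
        return None
    codes = tuple(part.strip() for part in raw.split(",") if part.strip())
    return codes or None


@dataclass(frozen=True)
class SketchOptions:
    """Hypothetical stand-in for DictateOptions (field names are assumptions)."""

    languages: Optional[tuple[str, ...]] = None
    prompt: Optional[str] = None


def option_warnings(opts: SketchOptions) -> list[str]:
    """Warn when both a prompt and a language are set (wording is illustrative)."""
    if opts.prompt and opts.languages:
        return ["the server ignores language_code when a custom prompt is set"]
    return []


opts = SketchOptions(languages=parse_languages("en, es"), prompt="Medical terms")
try:
    opts.prompt = "changed"  # frozen dataclass: assignment is rejected
except FrozenInstanceError:
    pass
```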

https://claude.ai/code/session_01FCXQLAyo8xpZiXrQ7hCMAf

A new run command for hands-free terminal dictation: press Enter (or
Space) to start recording the microphone, press again to stop, and the
utterance is POSTed to the Sync API (sync.assemblyai.com/transcribe,
X-AAI-Model: u3-sync-pro) which returns the transcript in the response
body — no polling. q/Esc/Ctrl-C ends the session.

- sync_stt.py: the Sync API HTTP boundary (httpx2, multipart raw-PCM +
  JSON config), normalizing 401/403 to auth_failure(), 429/503 to a
  retryable APIError, and error_code/message bodies into clean messages.
  Environment gains a sync_base host (prod + sandbox).
- hotkey.py: TerminalKeys reads single keypresses with cbreak scoped to
  a with-block (Ctrl-C still signals); clean not-a-tty / no-termios
  errors instead of tracebacks.
- dictate_exec.py: the options/run split. Capture is resampled to
  16 kHz PCM16; the key poll runs with zero timeout between ~100 ms mic
  chunks; recordings are capped at the API's 120 s limit and ones under
  its 80 ms floor are skipped with a warning instead of a server 400.
  --language (comma list for code-switching), --prompt, --word-boost,
  --device, --once, --max-seconds, --json (one NDJSON object per
  utterance).
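
Two of the rules in the bullets above reduce to simple arithmetic and a status table, sketched here. The assumptions are labeled: the audio is taken to be mono (the commit message only says 16 kHz PCM16), `clamp_recording` truncates an over-long buffer as a stand-in for however the session enforces the cap, and the string labels returned by `classify_status` are placeholders for the CLIError hierarchy rather than real class names.

```python
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # PCM16; mono is an assumption of this sketch
BYTES_PER_SECOND = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE  # 32,000

MIN_BYTES = BYTES_PER_SECOND * 80 // 1000  # 80 ms floor  -> 2,560 bytes
MAX_BYTES = BYTES_PER_SECOND * 120         # 120 s limit  -> 3,840,000 bytes


def clamp_recording(pcm: bytes):
    """Return (pcm_to_send, note); pcm_to_send is None when the clip is skipped."""
    if len(pcm) < MIN_BYTES:
        return None, "too_short"  # skip with a warning instead of a server 400
    if len(pcm) > MAX_BYTES:
        return pcm[:MAX_BYTES], "capped"
    return pcm, None


def classify_status(status: int) -> str:
    """Map an HTTP status to an error category (labels are placeholders)."""
    if status in (401, 403):
        return "auth_failure"
    if status in (429, 503):
        return "retryable"
    if status >= 400:
        return "api_error"
    return "ok"
```

Working in whole bytes rather than float seconds keeps the 80 ms boundary exact: a 2,560-byte buffer passes, a 2,558-byte buffer is skipped.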

Tests cover the HTTP boundary via MockTransport, the termios behavior
via a real pty pair, and the session loop via injected key/mic/HTTP
seams; the terminal requirement is validated before credentials.
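
The injected key/mic/HTTP seams make the session loop testable without hardware. The following is a simplified sketch of that idea, not the PR's run_dictate: `ScriptedKeys`, `canned_mic`, `run_session`, and the fake transcriber are hypothetical, any key stops a recording here, and Ctrl-C handling, --once, and the duration cap are omitted.

```python
from typing import Callable, Iterable, Optional

START = {"\n", " "}   # Enter or Space starts a recording
QUIT = {"q", "\x1b"}  # q or Esc ends the session


class ScriptedKeys:
    """Test double for the key boundary: replays a fixed script of reads."""

    def __init__(self, script):
        self._script = list(script)

    def read(self, timeout: Optional[float]) -> Optional[str]:
        # A scripted None models "no key pressed" during a zero-timeout poll.
        return self._script.pop(0) if self._script else None


def canned_mic():
    """Test double for the microphone: endless two-byte PCM chunks."""
    while True:
        yield b"\x00\x00"


def run_session(
    keys,
    mic_chunks: Callable[[], Iterable[bytes]],
    transcribe: Callable[[bytes], str],
) -> list[str]:
    transcripts = []
    while True:
        key = keys.read(timeout=None)  # idle: blocking wait
        if key is None or key in QUIT:
            return transcripts
        if key not in START:
            continue
        recorded = []
        for chunk in mic_chunks():
            recorded.append(chunk)
            if keys.read(timeout=0) is not None:  # poll between chunks
                break
        transcripts.append(transcribe(b"".join(recorded)))


keys = ScriptedKeys(["\n", None, None, " ", "q"])
result = run_session(keys, canned_mic, lambda pcm: f"{len(pcm)} bytes")
# result == ["6 bytes"]: three chunks were captured before Space stopped it
```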

https://claude.ai/code/session_01FCXQLAyo8xpZiXrQ7hCMAf
@alexkroman alexkroman enabled auto-merge June 12, 2026 21:41
claude and others added 2 commits June 12, 2026 21:58
Combines the new clip command from #129 with dictate: both registered in
the command order, help groups, import-linter contracts, and AGENTS.md
(seven options/run-split exec modules now); run-group help snapshot
regenerated.

Note: the full check.sh gate was deliberately skipped for this merge
commit at the operator's request; the default pytest suite passed on the
merged tree (2210 passed).

https://claude.ai/code/session_01FCXQLAyo8xpZiXrQ7hCMAf
@alexkroman alexkroman added this pull request to the merge queue Jun 12, 2026
Merged via the queue into main with commit a324f69 Jun 12, 2026
15 checks passed
@alexkroman alexkroman deleted the claude/adoring-cerf-ynr8gc branch June 12, 2026 22:07