Add assembly dictate: hotkey-driven push-to-talk transcription #131
Merged
A new run command for hands-free terminal dictation: press Enter (or Space) to start recording the microphone, press again to stop, and the utterance is POSTed to the Sync API (sync.assemblyai.com/transcribe, X-AAI-Model: u3-sync-pro), which returns the transcript in the response body, with no polling. q/Esc/Ctrl-C ends the session.

- sync_stt.py: the Sync API HTTP boundary (httpx2, multipart raw-PCM + JSON config), normalizing 401/403 to auth_failure(), 429/503 to a retryable APIError, and error_code/message bodies into clean messages. Environment gains a sync_base host (prod + sandbox).
- hotkey.py: TerminalKeys reads single keypresses with cbreak scoped to a with-block (Ctrl-C still signals); clean not-a-tty / no-termios errors instead of tracebacks.
- dictate_exec.py: the options/run split. Capture is resampled to 16 kHz PCM16; the key poll runs with zero timeout between ~100 ms mic chunks; recordings are capped at the API's 120 s limit, and ones under its 80 ms floor are skipped with a warning instead of a server 400. Flags: --language (comma list for code-switching), --prompt, --word-boost, --device, --once, --max-seconds, --json (one NDJSON object per utterance).

Tests cover the HTTP boundary via MockTransport, the termios behavior via a real pty pair, and the session loop via injected key/mic/HTTP seams; the terminal requirement is validated before credentials.

https://claude.ai/code/session_01FCXQLAyo8xpZiXrQ7hCMAf
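The duration limits in the commit message above come down to byte arithmetic on 16 kHz PCM16 audio. The following is a minimal sketch of that check, not the PR's actual code: the names `pcm_seconds` and `prepare_utterance` are hypothetical, mono audio is assumed, and the cap is modeled as truncation even though the real command may simply stop recording at the limit.

```python
from typing import Optional

SAMPLE_RATE_HZ = 16_000   # capture rate stated in the commit message
BYTES_PER_SAMPLE = 2      # PCM16; mono is an assumption of this sketch
BYTES_PER_SECOND = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE

MIN_BYTES = BYTES_PER_SECOND * 80 // 1000   # 80 ms floor  -> 2,560 bytes
MAX_BYTES = BYTES_PER_SECOND * 120          # 120 s cap    -> 3,840,000 bytes


def pcm_seconds(pcm: bytes) -> float:
    """Duration of a raw PCM16 mono buffer in seconds."""
    return len(pcm) / BYTES_PER_SECOND


def prepare_utterance(pcm: bytes) -> Optional[bytes]:
    """Hypothetical helper: return None for clips under the floor
    (the caller would warn and skip), else the clip cut to the cap."""
    if len(pcm) < MIN_BYTES:
        return None
    return pcm[:MAX_BYTES]


too_short = prepare_utterance(b"\x00" * (MIN_BYTES - 2))
just_enough = prepare_utterance(b"\x00" * MIN_BYTES)
too_long = prepare_utterance(b"\x00" * (MAX_BYTES + 10))
```

Comparing byte counts rather than floating-point seconds keeps the boundary cases exact: a clip of exactly 80 ms is sent, and one sample less is skipped locally instead of producing a server 400.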
Combines the new clip command from #129 with dictate: both registered in the command order, help groups, import-linter contracts, and AGENTS.md (seven options/run-split exec modules now); run-group help snapshot regenerated. Note: the full check.sh gate was deliberately skipped for this merge commit at the operator's request; the default pytest suite passed on the merged tree (2210 passed). https://claude.ai/code/session_01FCXQLAyo8xpZiXrQ7hCMAf
Implements the `assembly dictate` command for real-time, hotkey-driven transcription using the AssemblyAI Sync STT API. This is a new interactive command that lets users press Enter to record audio and get instant transcripts back without polling.

Key Changes
New modules:
- `aai_cli/sync_stt.py`: HTTP boundary for the Sync STT API (sync.assemblyai.com). Handles multipart PCM + JSON config uploads, normalizes errors into the CLIError hierarchy, and parses transcript responses. Stays Rich-free per the import-linter contract.
- `aai_cli/hotkey.py`: Single-keypress input via termios cbreak mode. Scopes terminal state changes to a `with` block, handles POSIX-only constraints gracefully (raises a clean CLIError on Windows), and supports both blocking reads and zero-timeout polls for responsive recording.
- `aai_cli/dictate_exec.py`: Session orchestration for the dictate command. Implements the push-to-talk loop: wait for hotkey → record mic → transcribe → repeat. Supports language/prompt/word-boost flags, duration caps, and both human-readable and JSON output modes.
- `aai_cli/commands/dictate.py`: Typer CLI surface. Maps argv flags to `DictateOptions` and delegates to `run_dictate`.

Test coverage:
- `tests/test_sync_stt.py` (222 lines): HTTP boundary tests covering request shape, multipart encoding, error normalization (auth, rate limits, audio validation), response parsing, and fallback error detail extraction.
- `tests/test_hotkey.py` (119 lines): Terminal I/O tests using real pty pairs, verifying cbreak scoping, single-keypress reads, EOF handling, and clean error modes for non-TTY/non-POSIX platforms.
- `tests/test_dictate_exec.py` (312 lines): Session behavior tests with fully injected boundaries (scripted keys, canned PCM, mocked HTTP). Covers hotkey toggles, recording stops, duration caps, language/prompt handling, JSON mode, and Ctrl-C cleanup.
- `tests/test_dictate_command.py` (84 lines): CLI surface tests verifying argv → options mapping and the terminal-validation-before-credentials ordering.

Integration:
- `aai_cli/environments.py`: adds a `sync_base` URL per environment (production and sandbox).
- `aai_cli/main.py`: registers the `dictate` command under the "Run AssemblyAI" workflow group.
- `tests/_snapshot_surface.py`: includes the new command.
- `.importlinter`: allows `aai_cli.dictate_exec` as a source module.

Notable Implementation Details
- […] `DictateOptions` and injecting three boundaries (`TerminalKeys`, `MicrophoneSource`, `sync_stt.transcribe_pcm`), so no test needs a real terminal, microphone, or network.
- […] `None` (blocking).
- `DictateOptions` is a frozen dataclass, preventing accidental mutation during the session.
- When both `--prompt` and `--language` are set, a warning notes that the server ignores language_code with a custom prompt.

https://claude.ai/code/session_01FCXQLAyo8xpZiXrQ7hCMAf
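The injected-boundary design described above can be illustrated with a small sketch. This is an assumption-laden outline, not the code from this PR: `run_session`, `scripted_keys`, `fake_mic`, and `fake_transcribe` are hypothetical names, the `DictateOptions` fields shown are invented for illustration, and the key reader is assumed to take a timeout where `None` blocks and `0` polls.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Callable, Iterator, List, Optional

TOGGLE_KEYS = ("\n", " ")   # Enter or Space starts/stops a recording
QUIT_KEYS = ("q", "\x1b")   # q or Esc ends the session


@dataclass(frozen=True)
class DictateOptions:
    # Illustrative fields only; the real dataclass is not shown in the PR text.
    language: Optional[str] = None
    once: bool = False


def run_session(
    options: DictateOptions,
    read_key: Callable[[Optional[float]], Optional[str]],
    open_mic: Callable[[], Iterator[bytes]],
    transcribe: Callable[[bytes, DictateOptions], str],
) -> List[str]:
    """Push-to-talk loop: wait for hotkey, record, transcribe, repeat."""
    transcripts: List[str] = []
    while True:
        key = read_key(None)                 # blocking wait while idle
        if key is None or key in QUIT_KEYS:  # None models EOF here
            return transcripts
        if key not in TOGGLE_KEYS:
            continue
        recorded = bytearray()
        for chunk in open_mic():
            recorded.extend(chunk)
            if read_key(0) in TOGGLE_KEYS:   # zero-timeout poll between chunks
                break
        transcripts.append(transcribe(bytes(recorded), options))
        if options.once:
            return transcripts


# Test doubles standing in for the terminal, the microphone, and the HTTP call.
def scripted_keys(keys):
    it = iter(keys)
    return lambda timeout: next(it, None)


def fake_mic() -> Iterator[bytes]:
    while True:
        yield b"\x00" * 3200                 # ~100 ms of 16 kHz PCM16 silence


def fake_transcribe(pcm: bytes, options: DictateOptions) -> str:
    return f"{len(pcm)} bytes"


options = DictateOptions()
# Enter (start), no key after chunk 1, Enter after chunk 2 (stop), then q.
transcripts = run_session(
    options, scripted_keys(["\n", None, "\n", "q"]), fake_mic, fake_transcribe
)
print(transcripts)  # ['6400 bytes']
```

Because the loop only sees three callables, a test can script the whole session deterministically, and the frozen dataclass means an attempt such as `options.once = True` raises `FrozenInstanceError` rather than silently changing behavior mid-session.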