AssemblyAI
diff --git a/README.md b/README.md
Lines changed: 1 addition & 1 deletion
diff --git a/aai_cli/AGENTS.md b/aai_cli/AGENTS.md
Lines changed: 2 additions & 2 deletions
diff --git a/aai_cli/commands/dictate/__init__.py b/aai_cli/commands/dictate/__init__.py
Lines changed: 13 additions & 7 deletions
diff --git a/aai_cli/commands/dictate/_exec.py b/aai_cli/commands/dictate/_exec.py
Lines changed: 52 additions & 52 deletions
@@ -46,7 +46,7 @@ That's it. Run `assembly onboard` for a guided tour, or see [Installation](#-ins
 | :--- | :--- |
 | `assembly transcribe` | Transcribe files, URLs, YouTube/podcast pages, podcast RSS feeds, directories, globs, or bucket storage (`s3://`, `gs://`, `az://`) — with speaker labels, PII redaction, summarization, SRT/VTT captions, and resumable batch runs |
 | `assembly stream` | Real-time transcription from your microphone, a file, or a URL — on macOS it can capture system audio too |
-| `assembly dictate` | Push-to-talk dictation: recording starts immediately, press Enter for instant text (Sync STT API, up to 120 s per utterance) |
+| `assembly dictate` | Signal-driven dictation: records immediately, send SIGTERM for instant text — scriptable from hotkey tools like Hammerspoon (Sync STT API, up to 120 s per utterance) |
 | `assembly agent` | Full-duplex spoken conversation with a voice agent, right in your terminal |
 | `assembly agent-cascade` | Same live conversation, but wired client-side from Streaming STT + the LLM Gateway + streaming TTS, like the `agent-cascade` starter (sandbox-only) |
 | `assembly speak` | Synthesize text to speech over the streaming-TTS WebSocket (sandbox-only) |
 
@@ -32,7 +32,7 @@ between layers is enforced — higher may import lower, never the reverse:
   `help_text`, `typer_patches`, `update_check`.
 - **`core/`** — the Rich-free library layer: `client`, `config`,
   `config_builder`, `keyring_store`, `environments`, `env`, `errors`, `llm`,
-  `telemetry`, `debuglog`, `remotefs`, `sync_stt`, `hotkey`, `ws`, `youtube`,
+  `telemetry`, `debuglog`, `remotefs`, `sync_stt`, `signals`, `ws`, `youtube`,
   `wer`, `argscan`, `jsonshape`, `timeparse`, `microphone`, `procs`, `stdio`,
   `choices`, `locking`, `config_lock`. Contract 4 also forbids `rich` here, so
   "no Rich below the UI layer" is structural.
@@ -149,7 +149,7 @@ heavily-reworked commands with long bodies; small commands keep the inline
 ### Feature subsystems
 
 - **`streaming/`** + `client.stream_audio` — v3 realtime API. Event callbacks run on the SDK reader thread and guard against `BrokenPipeError` (`stdio.silence_stdout()`) so a closed pipe never dumps a thread traceback.
-- **`core/sync_stt.py`** + **`core/hotkey.py`** + `commands/dictate/` — `assembly dictate`: push-to-talk dictation over the **Sync STT API** (`Environment.sync_base`, one POST `/transcribe` per utterance with the required `X-AAI-Model: u3-sync-pro` header; 80 ms–120 s of PCM/WAV). `hotkey.TerminalKeys` scopes stdin into cbreak (Ctrl-C still signals) and reads single keypresses; `dictate_exec._record` polls it with a zero timeout between ~100 ms mic chunks. All three boundaries (keys, mic, HTTP) are injectable, so the suite never needs a real terminal — `tests/test_hotkey.py` drives a pty pair for the termios behavior.
+- **`core/sync_stt.py`** + **`core/signals.py`** + `commands/dictate/` — `assembly dictate`: headless dictation over the **Sync STT API** (`Environment.sync_base`, one POST `/transcribe` per utterance with the required `X-AAI-Model: u3-sync-pro` header; 80 ms–120 s of PCM/WAV). It needs no terminal: recording starts immediately, and `dictate_exec._record` polls the stop latch from `signals.stop_on_terminate` between ~100 ms mic chunks. A SIGTERM sets the latch and finishes the utterance (clean exit 0), so a hotkey tool like Hammerspoon can launch it as a background task and `kill -TERM`/`task:terminate()` to transcribe. SIGINT (Ctrl-C) still cancels (exit 130). All three boundaries (the stop latch, mic, HTTP) are injectable, so the suite never needs a real signal or microphone (`tests/test_dictate_exec.py` scripts the SIGTERM latch). Contrast `signals.terminate_as_interrupt` (used by `stream`/`agent`/`speak`), which routes SIGTERM into the *cancel* path instead.
 - **`agent/`** — full-duplex voice agent (mic in, TTS out via `voices.py`).
 - **`agent_cascade/`** + `commands/agent_cascade/` — `assembly agent-cascade`: the same live terminal conversation as `assembly agent`, but **client-orchestrated** — `engine.run_cascade` wires Streaming STT → the LLM Gateway → streaming TTS itself instead of talking to the Voice Agent endpoint, mirroring what the `agent-cascade` `assembly init` template does server-side. **Sandbox-only** (streaming TTS has no prod host; guarded via `tts.session.require_available`). Reuses the agent slice's `DuplexAudio`/`AgentRenderer` and `core.client.stream_audio`/`core.llm.complete`/`tts.session.synthesize`; the three network legs are injected through `engine.CascadeDeps` (the `tts/session.py` seam) so the cascade — greeting, per-sentence TTS, barge-in, history window — is unit-tested against fakes with no sockets/mic/speaker.
 - **`tts/`** + `commands/speak.py` — `assembly speak` synthesizes text to speech over the sandbox streaming-TTS WebSocket (`streaming-tts.sandbox000.…`). **Sandbox-only:** `session.is_available()` is false in production (empty `Environment.streaming_tts_host`), so the command exits 2 with a `--sandbox` hint. `session.synthesize` drives a Begin→Generate→Flush→Audio→Terminate protocol with an injectable `connect` for hermetic tests (mirrors `agent/session.py`); `audio.py` plays the PCM (default) or writes a WAV (`--out`).
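The diff does not include `core/signals.py`, so the body of `stop_on_terminate` is not visible here. As a rough sketch of what such a SIGTERM latch could look like, assuming it is a context manager that yields a zero-argument callable (the name `stop_on_terminate_sketch` and everything inside it are hypothetical, not the project's code):

```python
import signal
from collections.abc import Callable, Iterator
from contextlib import contextmanager


@contextmanager
def stop_on_terminate_sketch() -> Iterator[Callable[[], bool]]:
    """Hypothetical stand-in for signals.stop_on_terminate (real body not shown)."""
    stopped = False

    def _on_sigterm(signum: int, frame: object) -> None:
        # Only record the request; the recording loop polls the latch.
        nonlocal stopped
        stopped = True

    # signal.signal() may only be called from the main thread.
    previous = signal.signal(signal.SIGTERM, _on_sigterm)
    try:
        yield lambda: stopped
    finally:
        # Restore the earlier handler (None means it was not set from Python).
        signal.signal(
            signal.SIGTERM, previous if previous is not None else signal.SIG_DFL
        )
```

Because the handler only flips a flag, the process is not torn down on SIGTERM; the caller decides when to stop, which is what would let dictate finish the utterance and exit 0.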
 
@@ -22,7 +22,11 @@
     rich_help_panel=help_panels.TRANSCRIPTION,
     epilog=examples_epilog(
         [
-            ("Dictate one utterance: recording starts, Enter transcribes it", "assembly dictate"),
+            ("Record until SIGTERM, then print the transcript", "assembly dictate"),
+            (
+                "Stop the recording and transcribe (e.g. from a hotkey tool)",
+                "kill -TERM $(pgrep -f 'assembly dictate')",
+            ),
             (
                 "Pipe the utterance into another command",
                 'assembly dictate | assembly llm "write a conventional commit"',
@@ -75,13 +79,15 @@ def dictate(
         help="Output mode: text (the bare transcript per utterance, pipe-friendly) or json",
     ),
 ) -> None:
-    """Push-to-talk dictation: record the mic, get the transcript back
+    """Signal-driven dictation: record the mic, get the transcript back
 
-    Recording starts immediately; press Enter (or Space) to stop and the
-    utterance is sent to the AssemblyAI Sync API — the transcript prints right
-    away (no polling) and dictate exits, so it flows straight to the next
-    command in a pipe. The recording can be up to 120 seconds long. Press
-    Ctrl-C to cancel without transcribing.
+    Recording starts immediately and runs headless — no terminal needed — so a
+    hotkey tool like Hammerspoon can launch it as a background task and send
+    SIGTERM (kill -TERM, task:terminate()) to stop. On SIGTERM the utterance is
+    sent to the AssemblyAI Sync API, the transcript prints right away (no
+    polling), and dictate exits, so it flows straight to the next command in a
+    pipe. The recording can be up to 120 seconds long. Ctrl-C (SIGINT) cancels
+    without transcribing.
     """
     opts = dictate_exec.DictateOptions(
         language=language,
 
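For controllers other than Hammerspoon, the launch-then-SIGTERM flow described in the docstring can be sketched in Python. This is an illustration under assumptions: `dictate_once` is a hypothetical helper, the fixed delay stands in for "until the hotkey is released", and `Popen.terminate()` sends SIGTERM only on POSIX.

```python
import subprocess
import time


def dictate_once(argv: list[str], record_seconds: float) -> tuple[int, str]:
    """Launch a recorder, SIGTERM it after a delay, return (exit code, stdout)."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)
    try:
        time.sleep(record_seconds)  # stand-in for the user holding a hotkey
        proc.terminate()  # POSIX: SIGTERM, which dictate treats as "finish"
        transcript, _ = proc.communicate(timeout=30)
    except BaseException:
        # Do not leave a recorder running if anything goes wrong here.
        proc.kill()
        proc.wait()
        raise
    return proc.returncode, transcript.strip()


# Example call (requires the CLI to be installed and authenticated):
#   code, text = dictate_once(["assembly", "dictate"], record_seconds=3.0)
```

Per the docstring above, a finished utterance should come back with exit code 0 and the transcript on stdout, while a cancelled run exits 130.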
@@ -1,25 +1,29 @@
 """Run logic for `assembly dictate`: the options/run split (see AGENTS.md).
 
-Push-to-talk dictation over the Sync STT API: recording starts immediately,
-runs until a hotkey is pressed (or the duration cap), then the utterance is
-POSTed to the Sync API, the transcript is printed, and dictate exits. The
-command module (aai_cli/commands/dictate/__init__.py) only parses argv into a
-``DictateOptions``; tests drive the session by constructing options directly and
-injecting the key/mic/HTTP boundaries, with no CliRunner argv round-trip and no
-real terminal.
+Headless dictation over the Sync STT API: recording starts immediately and runs
+until SIGTERM is delivered (or the duration cap), then the utterance is POSTed to
+the Sync API, the transcript is printed, and dictate exits 0. There is no terminal
+interaction — a controller like Hammerspoon launches `assembly dictate` as a
+background task and sends SIGTERM (``task:terminate()`` / ``kill -TERM``) to mean
+"I'm done dictating", so the transcript flows straight to the next command in a
+pipe. SIGINT (Ctrl-C) cancels without transcribing (exit 130). The command module
+(aai_cli/commands/dictate/__init__.py) only parses argv into a ``DictateOptions``;
+tests drive the session by constructing options directly and injecting the
+stop-signal/mic/HTTP boundaries, with no real signals, microphone, or network.
 """
 
 from __future__ import annotations
 
+from collections.abc import Callable
 from dataclasses import dataclass
 
 import typer
 
 from aai_cli.app.context import AppState
 from aai_cli.core import choices, errors, sync_stt
 from aai_cli.core.config_builder import split_csv
-from aai_cli.core.hotkey import CTRL_C, CTRL_D, ESC, TerminalKeys
 from aai_cli.core.microphone import MicrophoneSource
+from aai_cli.core.signals import stop_on_terminate
 from aai_cli.streaming.validate import resolve_output_modes
 from aai_cli.ui import output
 
@@ -28,10 +32,6 @@
 TARGET_RATE = 16000
 _BYTES_PER_SECOND = TARGET_RATE * 2  # PCM16 mono
 
-# Enter or Space stops the (auto-started) recording; q / Esc / Ctrl-D also stop
-# it (Ctrl-C cancels — cbreak mode keeps SIGINT delivery).
-STOP_KEYS = frozenset({"\r", "\n", " ", "q", "Q", ESC, CTRL_C, CTRL_D})
-
 
 @dataclass(frozen=True)
 class DictateOptions:
@@ -52,7 +52,7 @@ class DictateOptions:
 
 
 def _note(message: str, *, json_mode: bool, quiet: bool) -> None:
-    """A muted stderr hint guiding the interactive session; silent under --json
+    """A muted stderr hint naming how to finish the recording; silent under --json
     (stderr must stay machine-readable) and --quiet."""
     if json_mode or quiet:
         return
@@ -68,11 +68,14 @@ def _languages(language: str | None) -> str | list[str] | None:
     return codes[0] if len(codes) == 1 else codes
 
 
-def _record(keys: TerminalKeys, mic: MicrophoneSource, *, max_seconds: float) -> bytes:
-    """Capture PCM until a hotkey is pressed again or the duration cap is hit.
+def _record(
+    stop_requested: Callable[[], bool], mic: MicrophoneSource, *, max_seconds: float
+) -> bytes:
+    """Capture PCM until SIGTERM is delivered (``stop_requested`` flips True) or the
+    duration cap is hit.
 
-    The key poll runs between ~100 ms mic chunks with a zero timeout, so the mic
-    read loop is never blocked waiting on the keyboard.
+    The stop poll runs between ~100 ms mic chunks, so a SIGTERM is honored within one
+    chunk without blocking the mic read loop.
     """
     pcm = bytearray()
     frames = iter(mic)
@@ -81,8 +84,7 @@ def _record(keys: TerminalKeys, mic: MicrophoneSource, *, max_seconds: float) ->
             pcm += chunk
             if len(pcm) >= int(max_seconds * _BYTES_PER_SECOND):
                 break
-            # None (no key pending) is simply not in the set.
-            if keys.read(0) in STOP_KEYS:
+            if stop_requested():
                 break
     finally:
         # MicrophoneSource yields from a generator whose cleanup releases the
@@ -122,8 +124,8 @@ def _transcribe_utterance(
 ) -> None:
     """Send one recorded utterance to the Sync API and print the transcript.
 
-    A recording below the API's 80 ms floor (a double-tapped hotkey) is skipped
-    with a warning rather than bounced off the server as a 400.
+    A recording below the API's 80 ms floor (an instant SIGTERM) is skipped with a
+    warning rather than bounced off the server as a 400.
     """
     if len(pcm) < sync_stt.MIN_AUDIO_MS * _BYTES_PER_SECOND // 1000:
         output.emit_warning(
@@ -144,7 +146,7 @@ def _transcribe_utterance(
 
 
 def _capture_and_transcribe(
-    keys: TerminalKeys,
+    stop_requested: Callable[[], bool],
     api_key: str,
     opts: DictateOptions,
     state: AppState,
@@ -156,10 +158,12 @@ def _capture_and_transcribe(
         target_rate=TARGET_RATE,
         device=opts.device,
         on_open=lambda: _note(
-            "● Recording — press Enter to stop.", json_mode=json_mode, quiet=state.quiet
+            "● Recording — send SIGTERM to transcribe (Ctrl-C cancels).",
+            json_mode=json_mode,
+            quiet=state.quiet,
         ),
     )
-    pcm = _record(keys, mic, max_seconds=opts.max_seconds)
+    pcm = _record(stop_requested, mic, max_seconds=opts.max_seconds)
     _transcribe_utterance(api_key, pcm, opts, state, json_mode=json_mode)
 
 
@@ -170,34 +174,30 @@ def run_dictate(opts: DictateOptions, state: AppState, *, json_mode: bool) -> No
     # dictate has no live panel, so the text_mode half is unused — plain
     # transcript text is already the non-JSON default in `_emit`.
     _, json_mode = resolve_output_modes(opts.output_field, json_mode=json_mode)
+    # Resolve credentials before recording: don't capture audio we can't transcribe.
+    api_key = state.resolve_api_key()
+    if opts.prompt and opts.language:
+        # The server ignores language_code whenever a custom prompt is set;
+        # never drop a requested flag silently (mirrors the speak warnings).
+        output.emit_warning(
+            "--language is ignored when --prompt is set; state the language inside the prompt.",
+            json_mode=json_mode,
+        )
+    if opts.once and not state.quiet:
+        # Deprecation trap, not removal: --once still parses so old scripts don't
+        # break, but recording one utterance and exiting is now the default, so the
+        # flag does nothing — say so once (mirrors `login`).
+        output.emit_warning(
+            "--once is now the default and can be omitted.",
+            json_mode=json_mode,
+        )
     try:
-        # Entering TerminalKeys validates the terminal (a usage precondition)
-        # before credentials, so a piped stdin reads as "needs a terminal" — not
-        # as a login prompt.
-        with TerminalKeys() as keys:
-            api_key = state.resolve_api_key()
-            if opts.prompt and opts.language:
-                # The server ignores language_code whenever a custom prompt is set;
-                # never drop a requested flag silently (mirrors the speak warnings).
-                output.emit_warning(
-                    "--language is ignored when --prompt is set; "
-                    "state the language inside the prompt.",
-                    json_mode=json_mode,
-                )
-            if opts.once and not state.quiet:
-                # Deprecation trap, not removal: --once still parses so old scripts
-                # don't break, but recording one utterance and exiting is now the
-                # default, so the flag does nothing — say so once (mirrors `login`).
-                output.emit_warning(
-                    "--once is now the default and can be omitted.",
-                    json_mode=json_mode,
-                )
-            # Recording auto-starts and exits after one utterance: a single
-            # keystroke stops the capture, which also closes a piped stdout so
-            # `assembly dictate | assembly llm …` unblocks the downstream command.
-            _capture_and_transcribe(keys, api_key, opts, state, json_mode=json_mode)
+        # Recording auto-starts and exits after one utterance: SIGTERM stops the
+        # capture, the transcript prints, and dictate exits, which closes a piped
+        # stdout so `assembly dictate | assembly llm …` unblocks the downstream command.
+        with stop_on_terminate() as stop_requested:
+            _capture_and_transcribe(stop_requested, api_key, opts, state, json_mode=json_mode)
     except KeyboardInterrupt:
-        # Ctrl-C cancels dictation, so it exits 130 (cancel) — distinct from `q`, which
-        # ends the session normally (exit 0). The with-block above already restored the
-        # terminal on the way out.
+        # Ctrl-C / SIGINT cancels dictation, so it exits 130 (cancel) — distinct from
+        # SIGTERM, which finishes the utterance normally (exit 0).
         raise typer.Exit(code=errors.CANCELLED_EXIT_CODE) from None
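The injectable stop latch is what makes the record loop testable without signals. The sketch below mirrors the loop from `_record` in simplified form (it omits the generator cleanup in the `finally` block) and drives it with a fake microphone and a scripted latch. `record`, `latch_after`, and `endless_mic` are hypothetical names for illustration, and the 80 ms floor is taken from the docstring rather than from `sync_stt.MIN_AUDIO_MS` itself.

```python
from collections.abc import Callable, Iterable, Iterator

TARGET_RATE = 16000
BYTES_PER_SECOND = TARGET_RATE * 2  # PCM16 mono, as in the diff
CHUNK = b"\x00" * (BYTES_PER_SECOND // 10)  # one ~100 ms chunk = 3200 bytes
MIN_AUDIO_BYTES = 80 * BYTES_PER_SECOND // 1000  # 80 ms floor = 2560 bytes


def record(
    stop_requested: Callable[[], bool], frames: Iterable[bytes], *, max_seconds: float
) -> bytes:
    """Append chunks until the duration cap is reached or the latch reports True."""
    pcm = bytearray()
    for chunk in frames:
        pcm += chunk
        if len(pcm) >= int(max_seconds * BYTES_PER_SECOND):
            break
        if stop_requested():  # polled once per chunk, never blocks
            break
    return bytes(pcm)


def latch_after(polls: int) -> Callable[[], bool]:
    """Scripted latch: reports True from the given poll onward."""
    remaining = polls

    def stop_requested() -> bool:
        nonlocal remaining
        remaining -= 1
        return remaining <= 0

    return stop_requested


def endless_mic() -> Iterator[bytes]:
    """Fake microphone that yields silence forever."""
    while True:
        yield CHUNK


stopped = record(latch_after(3), endless_mic(), max_seconds=120)
capped = record(lambda: False, endless_mic(), max_seconds=0.25)
print(len(stopped), len(capped))
```

Since the cap is checked only after a whole chunk is appended, a recording can overshoot `max_seconds` by up to one chunk, and a single ~100 ms chunk (3200 bytes) already clears the 2560-byte floor.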