You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
dictate: make the default headless and signal-driven
Recording now starts immediately and runs without a terminal, finishing on
SIGTERM (clean exit 0) so a hotkey tool like Hammerspoon can launch `assembly
dictate` as a background task and `kill -TERM` / `task:terminate()` to
transcribe. SIGINT (Ctrl-C) still cancels (exit 130).
The press-Enter/keypress terminal mode and its `core/hotkey.py` (TerminalKeys)
backend are removed; dictate now polls a new `signals.stop_on_terminate` latch
between mic chunks. Contrast `signals.terminate_as_interrupt` (stream/agent/
speak), which routes SIGTERM into the cancel path instead.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JrDRFsdAYyXwWSudM2da8g
|`assembly stream`| Real-time transcription from your microphone, a file, or a URL — on macOS it can capture system audio too |
49
-
|`assembly dictate`|Push-to-talk dictation: recording starts immediately, press Enter for instant text (Sync STT API, up to 120 s per utterance) |
49
+
|`assembly dictate`|Signal-driven dictation: records immediately, send SIGTERM for instant text — scriptable from hotkey tools like Hammerspoon (Sync STT API, up to 120 s per utterance) |
50
50
|`assembly agent`| Full-duplex spoken conversation with a voice agent, right in your terminal |
51
51
|`assembly agent-cascade`| Same live conversation, but wired client-side from Streaming STT + the LLM Gateway + streaming TTS, like the `agent-cascade` starter (sandbox-only) |
52
52
|`assembly speak`| Synthesize text to speech over the streaming-TTS WebSocket (sandbox-only) |
`choices`, `locking`, `config_lock`. Contract 4 also forbids `rich` here, so
38
38
"no Rich below the UI layer" is structural.
@@ -149,7 +149,7 @@ heavily-reworked commands with long bodies; small commands keep the inline
149
149
### Feature subsystems
150
150
151
151
-**`streaming/`** + `client.stream_audio` — v3 realtime API. Event callbacks run on the SDK reader thread and guard against `BrokenPipeError` (`stdio.silence_stdout()`) so a closed pipe never dumps a thread traceback.
152
-
-**`core/sync_stt.py`** + **`core/hotkey.py`** + `commands/dictate/` — `assembly dictate`: push-to-talk dictation over the **Sync STT API** (`Environment.sync_base`, one POST `/transcribe` per utterance with the required `X-AAI-Model: u3-sync-pro` header; 80 ms–120 s of PCM/WAV). `hotkey.TerminalKeys` scopes stdin into cbreak (Ctrl-C still signals) and reads single keypresses; `dictate_exec._record` polls it with a zero timeout between ~100 ms mic chunks. All three boundaries (keys, mic, HTTP) are injectable, so the suite never needs a real terminal — `tests/test_hotkey.py`drives a pty pair for the termios behavior.
152
+
-**`core/sync_stt.py`** + **`core/signals.py`** + `commands/dictate/` — `assembly dictate`: headless dictation over the **Sync STT API** (`Environment.sync_base`, one POST `/transcribe` per utterance with the required `X-AAI-Model: u3-sync-pro` header; 80 ms–120 s of PCM/WAV). It needs no terminal: recording starts immediately and `dictate_exec._record` polls `signals.stop_on_terminate` between ~100 ms mic chunks for a SIGTERM, which finishes the utterance (clean exit 0) — so a hotkey tool like Hammerspoon can launch it as a background task and `kill -TERM`/`task:terminate()` to transcribe. SIGINT (Ctrl-C) still cancels (exit 130). Both boundaries (the stop latch, mic, HTTP) are injectable, so the suite never needs a real signal or microphone (`tests/test_dictate_exec.py`scripts the SIGTERM latch). Contrast `signals.terminate_as_interrupt` (used by `stream`/`agent`/`speak`), which routes SIGTERM into the *cancel* path instead.
153
153
-**`agent/`** — full-duplex voice agent (mic in, TTS out via `voices.py`).
154
154
-**`agent_cascade/`** + `commands/agent_cascade/` — `assembly agent-cascade`: the same live terminal conversation as `assembly agent`, but **client-orchestrated** — `engine.run_cascade` wires Streaming STT → the LLM Gateway → streaming TTS itself instead of talking to the Voice Agent endpoint, mirroring what the `agent-cascade``assembly init` template does server-side. **Sandbox-only** (streaming TTS has no prod host; guarded via `tts.session.require_available`). Reuses the agent slice's `DuplexAudio`/`AgentRenderer` and `core.client.stream_audio`/`core.llm.complete`/`tts.session.synthesize`; the three network legs are injected through `engine.CascadeDeps` (the `tts/session.py` seam) so the cascade — greeting, per-sentence TTS, barge-in, history window — is unit-tested against fakes with no sockets/mic/speaker.
155
155
-**`tts/`** + `commands/speak.py` — `assembly speak` synthesizes text to speech over the sandbox streaming-TTS WebSocket (`streaming-tts.sandbox000.…`). **Sandbox-only:**`session.is_available()` is false in production (empty `Environment.streaming_tts_host`), so the command exits 2 with a `--sandbox` hint. `session.synthesize` drives a Begin→Generate→Flush→Audio→Terminate protocol with an injectable `connect` for hermetic tests (mirrors `agent/session.py`); `audio.py` plays the PCM (default) or writes a WAV (`--out`).
0 commit comments