assembly code/live: voice-interrupt UX, modal dismissal, concise speech, gemini live default by alexkroman · Pull Request #252 · AssemblyAI/cli

alexkroman · 2026-06-18T23:31:50Z

Follow-up UX polish on top of #251 (now merged).

- Interrupt the readback → resume listening. Escape/Ctrl-C while the voice is speaking now stops the talking and goes back to listening (you can talk over the reply) instead of pausing to text mode. Interrupting while listening still pauses to the text prompt. Ctrl-C only arms the double-press quit when it paused to text.
- Escape/Ctrl-C dismiss the modals. The approval modal declines the tool; the ask modal returns an empty answer.
- Concise, speech-ready replies. The assembly code system prompt now tells the model its prose is read aloud: keep it to a sentence or two of plain spoken language, with code in fenced blocks (the readback skips them).
- assembly live defaults to gemini-2.5-flash-lite (low latency for spoken turns); assembly code stays gpt-5.1. Verified the gateway accepts it; --help snapshot updated.
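
The description says the readback skips fenced code blocks. The PR page does not show how, so the following is only a minimal sketch of one way to do it; `speakable_text` and the regex are hypothetical names, not the CLI's actual code.

```python
import re

FENCE = "`" * 3  # the three-backtick fence marker, built here to keep this example readable

# An opening fence (with optional language tag), the block body, and the closing fence.
_FENCED_BLOCK = re.compile(re.escape(FENCE) + r"[^\n]*\n.*?" + re.escape(FENCE), re.DOTALL)


def speakable_text(reply: str) -> str:
    """Return only the prose of a reply: fenced code is dropped, whitespace collapsed."""
    prose = _FENCED_BLOCK.sub(" ", reply)
    return " ".join(prose.split())
```

With this shape, the text handed to speech synthesis contains only the sentence or two of prose the system prompt asks for, while the transcript can still show the full reply.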

./scripts/check.sh → All checks passed (100% patch coverage, mutation gate, build+twine).

🤖 Generated with Claude Code

…ch, gemini live default

- Interrupting the readback (Escape/Ctrl-C while the voice is speaking) now stops the talking and resumes listening instead of pausing to text mode; interrupting while listening still pauses to the text prompt. Ctrl-C only arms the double-press quit when it paused to text, not when it resumed listening.
- Escape/Ctrl-C dismiss the approval modal (declining the tool) and the ask modal (empty answer).
- The assembly code system prompt now steers the model to concise, speech-ready prose (read aloud), with code kept in fenced blocks the readback skips.
- assembly live defaults to gemini-2.5-flash-lite (low latency for spoken turns); assembly code stays gpt-5.1. Verified the gateway accepts it; --help snapshot updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…itions

assembly live (and assembly code) bind tools whose JSON-Schema `parameters` carry `$schema`/`additionalProperties`/`title`. OpenAI ignores them, but Gemini's function_declarations return a 400 on them ("Unknown name \"$schema\""), so every tool-bound turn failed: the brain graph raised a non-CLIError, the reply worker died silently, and the live agent never responded.

_GatewayChatOpenAI now strips those keys (recursively) from each tool's parameter schema in the outgoing request, so a tool-bound request works on every gateway-routed model. Verified end-to-end: the brain now replies on gemini-2.5-flash-lite. This is what makes the gemini live default usable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
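
The recursive stripping described in this commit could look roughly like the sketch below. It is not the PR's `_strip_schema_keys` (only its signature appears on this page). The function name, the key set, and the special handling of `properties` are assumptions made for illustration.

```python
# Keys named in the commit message; the real helper's set may differ.
UNSUPPORTED_KEYS = ("$schema", "additionalProperties", "title")


def strip_unsupported_keys(node: object, *, in_properties: bool = False) -> None:
    """Remove unsupported keywords from a JSON-Schema-shaped structure, in place.

    `in_properties` is True when `node` is the value of a "properties" keyword,
    i.e. a map of property names to schemas. Its keys are names chosen by the tool
    author (a property may legitimately be called "title"), so they are kept.
    """
    if isinstance(node, dict):
        if not in_properties:
            for key in UNSUPPORTED_KEYS:
                node.pop(key, None)
        for key, child in node.items():
            child_is_property_map = key == "properties" and not in_properties
            strip_unsupported_keys(child, in_properties=child_is_property_map)
    elif isinstance(node, list):
        for child in node:
            strip_unsupported_keys(child)
```

Mutating in place matches the `-> None` signature shown in the review diff. Whether the real helper distinguishes property names from keywords is not visible here.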

aikido-pr-checks · 2026-06-18T23:50:56Z

+            _strip_schema_keys(function.get("parameters"))
+
+
+def _strip_schema_keys(node: object) -> None:


Function _strip_schema_keys recursively traverses schema nodes without depth limits or visited tracking; add a max-depth parameter or convert to an iterative traversal to avoid unbounded recursion.

Details

✨ AI Reasoning
A new recursive routine was introduced to walk JSON-Schema-shaped structures and remove keys. It unconditionally recurses into dict/list children (_strip_schema_keys calls itself for each child) with no depth counter, visited set, or maximum depth. Malicious or very deeply nested input could trigger deep recursion and stack overflow. This risk is directly introduced by the added sanitization helpers.

🔧 How do I fix it?
Add depth limiting via counter parameters that are checked and enforced, or replace with iterative approaches using explicit loops or stack data structures. For graphs, combine depth limiting with visited set tracking.
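
A minimal sketch of the iterative alternative suggested here, using an explicit stack plus a visited set. The function name and key set are hypothetical. For brevity it does not special-case property names under `properties`.

```python
UNSUPPORTED_KEYS = ("$schema", "additionalProperties", "title")  # assumed key set


def strip_keys_iterative(root: object) -> None:
    """Strip unsupported keys in place without recursion.

    The explicit stack removes any dependence on the interpreter's recursion limit,
    and the visited set (by object identity) makes self-referencing input terminate.
    """
    stack = [root]
    seen: set[int] = set()
    while stack:
        node = stack.pop()
        if not isinstance(node, (dict, list)) or id(node) in seen:
            continue
        seen.add(id(node))
        if isinstance(node, dict):
            for key in UNSUPPORTED_KEYS:
                node.pop(key, None)
            stack.extend(node.values())
        else:
            stack.extend(node)
```

Schemas parsed from JSON cannot contain cycles, so the visited set mainly guards against programmatically built structures. The stack alone addresses the depth concern.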

_{Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.}
_{Reply @AikidoSec ignore: [REASON] to ignore this issue.}
_{More info}

…definitions

Expanding the earlier $schema/additionalProperties/title fix: the default MCP tools carry more validation keywords that Gemini's function_declarations reject (exclusiveMinimum/Maximum, multipleOf, patternProperties, …), each 400-ing a tool-bound turn. Strip the full validation/metadata keyword set (structural keys kept). Verified end-to-end: the live brain replies on gemini-2.5-flash-lite with all 28 default MCP tools loaded.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ort warnings

- Give AssistantMessage a top margin in the live TUI so the greeting is separated from the splash and each reply is separated from the preceding user turn (scoped to the live app's CSS, so `assembly code` is unaffected).
- Suppress firecrawl-py's pydantic "Field name 'json'/'schema' shadows an attribute" UserWarnings at the runtime import site (pytest already filters them via pyproject); they otherwise leak into the user's terminal whenever a FIRECRAWL_API_KEY is set.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
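
Suppressing a warning at the import site, as the second item describes, can be done with the standard `warnings` module. The sketch below is an assumption about the shape, not the PR's code: `import_quietly` is a hypothetical helper, and the message pattern is inferred from the warning text quoted in the commit.

```python
import importlib
import warnings
from types import ModuleType


def import_quietly(module_name: str) -> ModuleType:
    """Import a module while hiding pydantic's field-shadowing UserWarnings.

    The filter is installed inside catch_warnings(), so it applies only while
    the import runs and the previous filters are restored afterwards.
    """
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            message=r"Field name .* shadows an attribute",  # matched at the start of the text
            category=UserWarning,
        )
        return importlib.import_module(module_name)
```

Scoping the filter to the import keeps unrelated UserWarnings visible elsewhere in the process.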

When the brain graph failed mid-turn (a gateway 4xx/5xx, a tool raising, a recursion limit), it raised a non-CLIError, _generate_reply only caught CLIError, and the reply worker died on a daemon thread. The agent therefore announced an action ("I'll search…") and then never came back, with no clue why.

brain._run_graph now converts any graph exception into a CLIError (re-raising CLIErrors unchanged), and the cascade shows it in the transcript ("(error: …)") and records it, instead of swallowing it. The user sees *why* a turn produced no answer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
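
The conversion described in this commit follows a common pattern, sketched below. `CLIError` here is a stand-in class and `run_graph` a hypothetical wrapper; only the message text is taken from the review diff on this page.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class CLIError(Exception):
    """Stand-in for the CLI's user-facing error type (the real class is not shown here)."""

    def __init__(self, message: str) -> None:
        super().__init__(message)
        self.message = message


def run_graph(invoke: Callable[[], T]) -> T:
    """Run one agent turn; any non-CLIError failure becomes a CLIError."""
    try:
        return invoke()
    except CLIError:
        raise  # already user-facing, pass through unchanged
    except Exception as exc:
        # `from exc` keeps the original traceback available for debugging.
        raise CLIError(f"the agent couldn't complete the turn: {exc}") from exc
```

Because the caller only catches `CLIError`, funnelling every failure through one type means the reply worker can always record and display the error rather than dying on its thread.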

aikido-pr-checks · 2026-06-19T00:29:12Z

+        # a langgraph recursion limit. Convert it to a CLIError so the cascade records and
+        # *surfaces* it (the engine shows it in the transcript) instead of the reply worker
+        # dying silently and the user getting no answer with no clue why.
+        raise CLIError(


Embedding the raw exception (f"the agent couldn't complete the turn: {exc}") may expose user/tool data. Redact or sanitize exception text before including it in CLIError messages.

Details

✨ AI Reasoning
The code now catches all Exceptions from the agent graph and raises a CLIError whose message embeds the original exception's string representation. That original exception may include user-controlled data, tool outputs, or other sensitive content. The CLIError is then recorded and shown to the user/UI by other parts of the cascade, so this change increases the risk of leaking unsanitized user input or external payloads to logs/terminal.

🔧 How do I fix it?
Keep sensitive data such as emails, passwords, and tokens out of logs. When logging values tied to a user, prefer a safe identifier like a user ID over the raw input, and strip line breaks from any user-provided text you do log.
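
One possible response to this comment is to sanitize the exception text before it reaches the `CLIError` message. The sketch below is illustrative only: `safe_error_text`, its patterns, and the length limit are assumptions, and pattern-based redaction is a heuristic that will not catch every secret.

```python
import re

# Heuristic patterns (assumed for this sketch): key=value style credentials and email addresses.
_SECRET = re.compile(r"(?i)\b(api[_-]?key|token|password|authorization)\b(\s*[=:]\s*)\S+")
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def safe_error_text(exc: BaseException, limit: int = 200) -> str:
    """Return a one-line, redacted, length-capped description of an exception."""
    text = " ".join(str(exc).split())  # collapse newlines and runs of whitespace
    text = _SECRET.sub(r"\1\2[redacted]", text)
    text = _EMAIL.sub("[redacted-email]", text)
    if len(text) > limit:
        text = text[: limit - 3] + "..."
    return f"{type(exc).__name__}: {text}"
```

A stricter option is to show only the exception type in the transcript and keep the full text in a debug log.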

aikido-pr-checks · 2026-06-19T00:29:12Z

+            # brain._run_graph). Show it in the transcript so the turn doesn't just vanish —
+            # the user sees *why* there was no answer instead of silence.
            self._record_error(exc)
+            self.renderer.reply_started()


Rendering CLIError.message directly to the transcript may leak user/tool data. Sanitize or replace the message with a generic error before showing.

Details

✨ AI Reasoning
In the reply worker's except CLIError handler the code now writes the CLIError.message into the agent transcript via renderer.agent_transcript(f"(error: {exc.message})"). That message originates from exceptions converted earlier (which can contain untrusted user or external content). Displaying it verbatim to the UI increases risk of leaking sensitive or malicious content.

🔧 How do I fix it?
Keep sensitive data such as emails, passwords, and tokens out of logs. When logging values tied to a user, prefer a safe identifier like a user ID over the raw input, and strip line breaks from any user-provided text you do log.

…laude-haiku-4-5

- A second Ctrl-C now always quits, even mid-readback: the quit-pending check moved ahead of stopping voice, so a spoken turn can't trap you. The first Ctrl-C (and Escape) still stops the readback and resumes listening; the second Ctrl-C exits. _stop_voice_activity returns None now (its result is no longer branched on).
- assembly live defaults to claude-haiku-4-5-20251001 (low latency for spoken turns); assembly code stays gpt-5.1. Config test + --help snapshot updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
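
The key ordering in the first item (check quit-pending before touching the voice) can be modelled as a small state machine. This is one reading of the commit message, not the TUI's code: `LiveKeys`, its fields, and the returned action strings are hypothetical, and resetting `quit_pending` (for example on the next user turn) is left out.

```python
from dataclasses import dataclass


@dataclass
class LiveKeys:
    speaking: bool = False      # True while the readback is playing
    quit_pending: bool = False  # armed by a first Ctrl-C

    def on_ctrl_c(self) -> str:
        # The quit check comes first, so a second Ctrl-C exits even mid-readback.
        if self.quit_pending:
            return "quit"
        self.quit_pending = True
        return self._stop_voice_activity()

    def on_escape(self) -> str:
        # Escape interrupts but never arms or triggers the quit.
        return self._stop_voice_activity()

    def _stop_voice_activity(self) -> str:
        if self.speaking:
            self.speaking = False
            return "resume_listening"
        return "pause_to_text"
```

If the voice were stopped before the quit check, a new spoken turn starting between the two presses could swallow the second Ctrl-C. Checking the flag first avoids that.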

… can't hang startup

Each MCP server was loaded with an unbounded asyncio.run(get_tools()); a slow/hung server (npx/uvx cold-start, an unreachable host) blocked `assembly live` startup indefinitely, and a Ctrl-C in that window triggered langchain-mcp-adapters' cancel-time crash. Wrap the fetch in asyncio.wait_for(timeout=15s): a server that won't list its tools in time is cancelled and skipped (_safe_load turns the TimeoutError into []).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
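
A minimal sketch of the bounded fetch described in this commit, assuming `get_tools` is a zero-argument coroutine function. `safe_load` below is a simplified stand-in for the PR's `_safe_load`, whose real signature is not shown on this page.

```python
import asyncio
from typing import Awaitable, Callable, List


def safe_load(get_tools: Callable[[], Awaitable[List[str]]], timeout: float = 15.0) -> List[str]:
    """Fetch one server's tools, giving up (and returning []) after `timeout` seconds."""

    async def bounded() -> List[str]:
        # wait_for cancels the inner coroutine when the timeout expires.
        return await asyncio.wait_for(get_tools(), timeout=timeout)

    try:
        return asyncio.run(bounded())
    except asyncio.TimeoutError:
        return []  # a slow or hung server is skipped rather than blocking startup
```

Usage: `safe_load(server.get_tools)` for each configured server, where `server` is whatever object exposes the coroutine. Startup time is then bounded by roughly `timeout` per server.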

…pause speech

Two focused changes to the `assembly live` voice agent (still deepagents-based):

Slim the toolset to just Firecrawl web search. A low-latency spoken turn does best with one obvious tool rather than a large menu it has to choose among. The big toolset (URL fetch, docs MCP, and a curated 5-server default MCP set) made the model narrate "I'll search…" without ever calling anything, and bloated every request with tool schemas. build_live_tools now returns only the web-search tool (when FIRECRAWL_API_KEY is set), and no MCP servers load by default (--mcp-config stays as a strictly opt-in power-user knob; default_servers is removed). The prompt's capability builder is trimmed to match.

Wire Escape/Ctrl-C to pause speech and return to listening. A new CascadeSession.interrupt_reply signals the in-flight reply to stop (sets the stop flag + flushes audio) WITHOUT joining the worker, because a UI-thread join would deadlock against the worker's call_from_thread render hops. run_cascade gains an on_session hook so the live TUI captures the session and binds Escape (interrupt) and Ctrl-C (interrupt while speaking, else quit); Ctrl-Q always quits as the guaranteed escape hatch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
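
The "signal without joining" idea behind `interrupt_reply` can be sketched with a `threading.Event`. Everything except the method name `interrupt_reply` is hypothetical here: `ReplySession` is not `CascadeSession`, and audio flushing is reduced to a comment.

```python
import threading
import time
from typing import List, Optional


class ReplySession:
    def __init__(self) -> None:
        self._stop = threading.Event()
        self.worker: Optional[threading.Thread] = None
        self.spoken: List[str] = []

    def start_reply(self, chunks: List[str]) -> None:
        self._stop.clear()
        self.worker = threading.Thread(target=self._speak, args=(chunks,), daemon=True)
        self.worker.start()

    def _speak(self, chunks: List[str]) -> None:
        for chunk in chunks:
            if self._stop.is_set():  # checked between chunks, so the worker exits promptly
                return
            self.spoken.append(chunk)
            time.sleep(0.01)  # stands in for playing one chunk of audio

    def interrupt_reply(self) -> None:
        """Ask the in-flight reply to stop and return immediately.

        There is deliberately no join() here: if the caller is the UI thread and the
        worker is waiting on that same thread to render, joining would deadlock.
        """
        self._stop.set()
        # A real session would also flush any queued audio at this point.
```

The worker finishes on its own shortly after the flag is set. Only code that is not on the UI thread (shutdown, tests) should wait for it.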

A spoken turn that paused to use a tool (web search) sat silent on "thinking…", reading as a hang. The brain now feeds an on_tool sink a short, speakable label ("Searching the web") as each tool call lands: build_completer's complete_reply takes an optional on_tool, and the graph is streamed, rather than invoke-d, whenever a sink is wired (not just under -v), so calls surface live.

The cascade engine passes the renderer's tool_call as that sink, so every front-end shows it: the live TUI drops a dim inline "Searching the web…" note, the line renderer prints it (stderr in piped text mode), and --json emits a new additive tool.use event. The Renderer protocol gains tool_call.

Also extracts the shared cascade test fakes into tests/_cascade_fakes.py so the engine/command/TUI suites share one set of doubles and stay under the 500-line gate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
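
The optional `on_tool` sink can be illustrated with a toy event stream. This is a sketch under stated assumptions: the `(kind, payload)` event tuples, the `TOOL_LABELS` mapping, and the tool name `web_search` are invented for the example, and the real graph streaming API is not modelled.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical mapping from tool name to a short, speakable label.
TOOL_LABELS = {"web_search": "Searching the web"}

Event = Tuple[str, str]  # (kind, payload): kind is "tool_call" or "text"


def complete_reply(events: Iterable[Event], on_tool: Optional[Callable[[str], None]] = None) -> str:
    """Consume streamed events, reporting tool calls to `on_tool` as they arrive."""
    parts = []
    for kind, payload in events:
        if kind == "tool_call":
            if on_tool is not None:
                on_tool(TOOL_LABELS.get(payload, f"Using {payload}"))
        elif kind == "text":
            parts.append(payload)
    return "".join(parts)
```

Passing a renderer method as the sink (for example `on_tool=renderer.tool_call`) lets each front-end decide how to display the label, which is the design the commit describes.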

A Ctrl-C during the voice TUI's setup (opening the mic, building the deepagents graph, loading --mcp-config servers) lands before Textual captures the keyboard, so it surfaced as a raw KeyboardInterrupt (and, mid asyncio.run/threading teardown, a noisy traceback). The line-renderer path already mapped this to a clean exit 130; the TUI dispatch did not. Extract a _launch_tui helper that wraps _run_live_tui and maps a setup-time KeyboardInterrupt to typer.Exit(130), matching the assembly code TUI. (In-session Ctrl-C is already a Textual binding, so it never reaches the graph as an exception.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
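
The mapping from a setup-time KeyboardInterrupt to exit status 130 can be sketched as follows. To stay dependency-free this version raises `SystemExit(130)` where the commit uses `typer.Exit(130)`. `launch_tui` and its `run_tui` argument are hypothetical stand-ins for `_launch_tui` and `_run_live_tui`.

```python
from typing import Callable


def launch_tui(run_tui: Callable[[], None]) -> None:
    """Run the TUI, turning a Ctrl-C during setup into a clean exit with status 130."""
    try:
        run_tui()
    except KeyboardInterrupt:
        # 130 = 128 + SIGINT(2), the conventional shell status for an interrupted process.
        # `from None` suppresses the chained traceback so nothing noisy is printed.
        raise SystemExit(130) from None
```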

alexkroman-assembly and others added 2 commits June 18, 2026 16:24

aikido-pr-checks Bot reviewed Jun 18, 2026

alexkroman-assembly and others added 3 commits June 18, 2026 17:04

aikido-pr-checks Bot reviewed Jun 19, 2026

alexkroman-assembly and others added 5 commits June 18, 2026 17:46

alexkroman added this pull request to the merge queue Jun 19, 2026

Merged via the queue into main with commit 8bdf6b7 Jun 19, 2026
20 checks passed

alexkroman deleted the assembly-voice-ux branch June 19, 2026 02:13

alexkroman merged 10 commits into main from assembly-voice-ux
