Add voice mode to `assembly code` command by alexkroman · Pull Request #234 · AssemblyAI/cli

alexkroman · 2026-06-18T00:39:12Z

Adds a voice-first interface to the `assembly code` command that lets users speak their requests and hear replies read back aloud. Readback requires streaming TTS, which is only available in sandbox environments.

Summary

The `assembly code` command now defaults to voice mode in interactive terminals. Users speak their coding requests into the microphone (streaming STT) and, in the sandbox only, hear replies read back aloud via streaming TTS. The implementation degrades gracefully: if no microphone is available, it falls back to typed input; if streaming TTS isn't available (as in production), replies are displayed as text only.

Key Changes

- New aai_cli/code_agent/voice.py module: implements VoiceSession with injectable dependencies for microphone input, streaming STT, streaming TTS, and audio playback. Protocols define the streaming interfaces, so the loop is fully unit-testable with fakes (no real mic, speaker, or socket needed).
- Voice mode in _exec.py:
  - Added a --voice/--no-voice flag to CodeOptions (defaults to True)
  - New _run_voice() function wires up the voice session, event sink, and read-line handler
  - _voice_sink() renders all events and reads assistant text aloud via TTS
  - _voice_read_line() captures spoken turns and falls back to typed input if the microphone fails (the fallback is latched after the first failure, so the mic isn't retried)
  - _announce_voice() prints a one-time notice explaining whether readback is available
  - Updated run_code() dispatch logic: voice mode runs first (if enabled and the terminal is interactive), then the TUI, then the REPL
- Comprehensive test coverage (tests/test_code_voice.py plus updates to tests/test_code_command.py):
  - Voice session tests with a fake mic, stream function, synth function, and player
  - Tests for listen/speak behavior, turn finalization, gating, and readback availability
  - Integration tests for voice-mode dispatch, fallback to typed input on mic errors, and ask-handler wiring
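The run_code() dispatch order can be summarized as a small decision function. This is a sketch, not the PR's code: Frontend and select_frontend are hypothetical names, and it assumes that the TUI, like voice mode, is only chosen when attached to a TTY (the commit message pairs the headless REPL with piped/CI use).

```python
from enum import Enum


class Frontend(Enum):
    """The three front-ends `assembly code` chooses between (names are illustrative)."""

    VOICE = "voice"
    TUI = "tui"
    REPL = "repl"


def select_frontend(voice: bool, interactive: bool) -> Frontend:
    """Hypothetical mirror of the run_code() dispatch order.

    voice:       value of the --voice/--no-voice flag (defaults to True)
    interactive: whether the CLI is attached to a TTY
    """
    if not interactive:
        # Piped input or CI keeps the existing headless REPL behavior.
        return Frontend.REPL
    if voice:
        # Default in a terminal: the new voice loop.
        return Frontend.VOICE
    # --no-voice in a terminal selects the keyboard TUI.
    return Frontend.TUI
```

Keeping the TTY check first means a piped or CI invocation never tries to open a microphone, regardless of the --voice default.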

Implementation Details

- Streaming STT uses the u3-rt-pro model (the same as assembly stream and assembly agent-cascade) with format_turns=True for punctuated, cased output.
- Streaming TTS synthesizes at 24 kHz (the player's native rate).
- Microphone gating: the mic stream is shut down the instant a turn finalizes, ensuring exactly one utterance per listen() call.
- Error handling: audio-device errors (mic_missing, mic_error, audio_input_error) trigger a one-time fallback to input(); other errors are re-raised.
- Readback availability is determined by tts_session.is_available() at session build time.
- All voice I/O is dependency-injected, so tests drive the loop with lightweight fakes.
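A sketch of the gating idea and the fake-driven testing style, under simplifying assumptions: Microphone, StreamFn, FakeMic and fake_stream are hypothetical, and the real stream_fn (which the review hunks show also taking api_key and params=_stt_params(mic.sample_rate)) is reduced to an audio iterable plus an on_turn callback. The generator checks for a finalized turn before pulling each chunk, so no further audio is read once on_turn fires.

```python
from typing import Callable, Iterable, Iterator, List, Protocol


class Microphone(Protocol):
    """Hypothetical mic interface: yields raw PCM chunks."""

    def chunks(self) -> Iterator[bytes]: ...


# Hypothetical STT leg: consumes audio and calls on_turn for each finalized turn.
StreamFn = Callable[[Iterable[bytes], Callable[[str], None]], None]


def listen(mic: Microphone, stream_fn: StreamFn) -> str:
    """Capture exactly one finalized turn, then stop pulling from the mic."""
    turns: List[str] = []
    source = mic.chunks()

    def gated() -> Iterator[bytes]:
        # Check before reading each chunk, so nothing is read after the turn ends.
        while not turns:
            chunk = next(source, None)
            if chunk is None:  # the mic stream ended on its own
                return
            yield chunk

    stream_fn(gated(), turns.append)
    return turns[0] if turns else ""


class FakeMic:
    """Test double that counts how many chunks were read."""

    def __init__(self, n_chunks: int) -> None:
        self.n_chunks = n_chunks
        self.read = 0

    def chunks(self) -> Iterator[bytes]:
        for _ in range(self.n_chunks):
            self.read += 1
            yield b"\x00" * 320


def fake_stream(audio: Iterable[bytes], on_turn: Callable[[str], None]) -> None:
    """Pretend the STT service finalizes a turn after the third chunk."""
    for i, _chunk in enumerate(audio, start=1):
        if i == 3:
            on_turn("Add a docstring to main.")
```

Because both the microphone and the STT leg are injected, the test can assert not only the returned transcript but also that the mic stopped being read as soon as the turn finalized.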

https://claude.ai/code/session_013tckfky3TVuNtHgKENWpoS

In an interactive terminal, `assembly code` now defaults to a voice loop: speak your request (microphone -> Streaming STT) and hear the agent's replies read back over streaming TTS. This is the new front-end, picked ahead of the keyboard TUI and the headless REPL.

- New code_agent/voice.py: a VoiceSession with injectable STT/TTS legs. listen() captures one spoken turn (gating the mic shut the instant a turn finalizes) and returns the finalized transcript; speak() reads each assistant reply back and is a no-op where readback isn't available.
- _exec.run_code now selects voice (default, TTY) -> TUI (--no-voice) -> headless REPL (piped/CI). The voice path runs the Rich REPL with a voice read_line plus a reply-speaking sink, and degrades to typed input on the first microphone-open failure.
- Readback needs streaming TTS, so it engages only in the sandbox (tts.session.is_available); microphone input works everywhere, and a non-TTY (pipe/CI) keeps the existing headless behavior.
- Added the --voice/--no-voice flag (default on) plus help/docs, and regenerated the `run`-group help snapshot.

Both network legs are injected, so tests/test_code_voice.py and the new _exec voice-helper tests drive it with fakes: no mic, speaker, or socket.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_013tckfky3TVuNtHgKENWpoS
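The latched fallback to typed input can be sketched as a closure over a small state dict, matching the state["typed"] = True pattern visible in the review hunk below. AudioError, make_read_line and read_typed are hypothetical names; the real error type and how it exposes its error code may differ.

```python
from typing import Callable

# Device error codes that trigger the typed-input fallback (from the PR text).
MIC_ERROR_CODES = frozenset({"mic_missing", "mic_error", "audio_input_error"})


class AudioError(Exception):
    """Hypothetical stand-in for the CLI's audio-device error type."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


def make_read_line(
    listen: Callable[[], str], read_typed: Callable[[], str]
) -> Callable[[], str]:
    """Return a read_line that prefers voice but latches to typed input."""
    state = {"typed": False}

    def read_line() -> str:
        if state["typed"]:
            return read_typed()  # the mic already failed once; never retry it
        try:
            return listen()
        except AudioError as exc:
            if exc.code not in MIC_ERROR_CODES:
                raise  # not a device problem: surface it unchanged
            state["typed"] = True  # latch the fallback for all later turns
            return read_typed()

    return read_line
```

Latching avoids re-probing a missing or broken device on every turn, which would otherwise repeat the same error before each prompt.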

aikido-pr-checks · 2026-06-18T00:39:39Z

+            state["typed"] = True
+            return _read_line()
+        if line:
+            renderer.notice(f"Heard: {line}")


Renderer prints raw speech transcript (variable 'line') via renderer.notice(f"Heard: {line}"). Avoid logging unsanitized user transcripts; mask, sanitize, or omit echoing sensitive speech data.

Details

✨ AI Reasoning
The change added a code path that directly prints user-spoken transcripts back to the terminal via renderer.notice with the raw variable 'line'. This outputs unsanitized, user-controlled text (potentially personal data, secrets, or malicious payloads) to logs/console. This is a new exposure introduced by the PR and can leak sensitive information entered by the user; it is not sanitized, masked, or otherwise protected.

🔧 How do I fix it?
Keep sensitive data such as emails, passwords, and tokens out of logs. When logging values tied to a user, prefer a safe identifier like a user ID over the raw input, and strip line breaks from any user-provided text you do log.
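One way to act on the "strip line breaks" advice is to normalize the transcript before echoing it, e.g. renderer.notice(f"Heard: {sanitize_transcript(line)}"). This is a hypothetical helper, not code from the PR; sanitize_transcript and its 200-character cap are assumptions. It removes control characters (including line breaks and the ESC byte that starts terminal escape sequences) but does not detect or mask secrets.

```python
import re

# C0 control characters plus DEL: covers CR/LF (forged extra lines) and
# ESC (terminal escape sequences).
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def sanitize_transcript(line: str, max_len: int = 200) -> str:
    """Make a spoken transcript safe to echo on a single terminal line."""
    cleaned = _CONTROL_CHARS.sub(" ", line)
    cleaned = " ".join(cleaned.split())  # collapse runs of whitespace
    if len(cleaned) > max_len:
        cleaned = cleaned[: max_len - 3] + "..."  # hypothetical display cap
    return cleaned
```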


+                    return
+                yield chunk
+
+        self.stream_fn(self.api_key, gated(), params=_stt_params(mic.sample_rate), on_turn=on_turn)


+            return
+        config = SpeakConfig(text=text, sample_rate=_TTS_SAMPLE_RATE)
+        with self.player_factory() as player:
+            self.synth_fn(self.api_key, config, on_audio=player.feed)
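The speak() hunk above opens a player per reply and streams synthesized audio straight into player.feed. A simplified, self-contained version with a fake player follows. SpeakConfig here is a reconstruction (only its text and sample_rate fields appear in the hunk), the api_key argument is dropped, and the guard before the early return is not visible in the hunk, so the readback_available/empty-text check is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, ContextManager, Protocol

_TTS_SAMPLE_RATE = 24_000  # the player's native rate, per the PR description


@dataclass
class SpeakConfig:
    """Hypothetical reconstruction of the config passed to the synth leg."""

    text: str
    sample_rate: int


class Player(Protocol):
    def feed(self, pcm: bytes) -> None: ...


# Hypothetical TTS leg: streams audio chunks to on_audio as they arrive.
SynthFn = Callable[[SpeakConfig, Callable[[bytes], None]], None]


def speak(
    text: str,
    readback_available: bool,
    synth_fn: SynthFn,
    player_factory: Callable[[], ContextManager[Player]],
) -> None:
    """Read a reply aloud; a no-op when readback isn't available."""
    if not readback_available or not text.strip():
        return
    config = SpeakConfig(text=text, sample_rate=_TTS_SAMPLE_RATE)
    # The player is opened per reply and closed even if synthesis raises.
    with player_factory() as player:
        synth_fn(config, player.feed)


class FakePlayer:
    """Test double that records fed audio and whether it was closed."""

    def __init__(self) -> None:
        self.fed: list = []
        self.closed = False

    def __enter__(self) -> "FakePlayer":
        return self

    def __exit__(self, *exc_info) -> bool:
        self.closed = True
        return False

    def feed(self, pcm: bytes) -> None:
        self.fed.append(pcm)
```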


aikido-pr-checks Bot reviewed Jun 18, 2026


github-code-quality Bot found potential problems Jun 18, 2026


Merge branch 'main' into claude/peaceful-galileo-cltkj2 (3a8e50b)

alexkroman enabled auto-merge June 18, 2026 01:06

alexkroman added this pull request to the merge queue Jun 18, 2026

Merged via the queue into main with commit c01c828 Jun 18, 2026
19 checks passed

alexkroman deleted the claude/peaceful-galileo-cltkj2 branch June 18, 2026 01:14


alexkroman merged 2 commits into main from claude/peaceful-galileo-cltkj2
