Add voice mode to assembly code command #234

Merged
alexkroman merged 2 commits into main from claude/peaceful-galileo-cltkj2
Jun 18, 2026

@alexkroman
Collaborator

Adds a voice-first interface to the `assembly code` command: users speak their requests and hear the replies read aloud. Readback requires streaming TTS, which is available only in sandbox environments.

Summary

The assembly code command now defaults to voice mode in interactive terminals. Users can speak their coding requests via microphone (streaming STT) and receive replies read back aloud via streaming TTS (sandbox only). The implementation gracefully degrades: if no microphone is available, it falls back to typed input; if streaming TTS isn't available (production), replies display as text.
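The front-end selection described above can be sketched roughly as follows. This is a minimal illustration, not the code from `_exec.py`: `pick_frontend` and the one-field `CodeOptions` stand-in are hypothetical names, and only the `voice` flag and the voice, TUI, REPL ordering come from this PR.

```python
from dataclasses import dataclass


@dataclass
class CodeOptions:
    # Hypothetical stand-in for the real options object. Only `voice`
    # (the --voice/--no-voice flag, default True) is named in the PR.
    voice: bool = True


def pick_frontend(opts: CodeOptions, interactive: bool) -> str:
    """Return which front-end would run: 'voice', 'tui', or 'repl'."""
    if not interactive:
        # Piped input or CI keeps the existing headless behavior.
        return "repl"
    if opts.voice:
        # Default in an interactive terminal.
        return "voice"
    # --no-voice in an interactive terminal selects the keyboard TUI.
    return "tui"
```

In a real CLI the `interactive` argument would presumably come from a TTY check such as `sys.stdin.isatty()`; that detail is an assumption, since the PR does not show how interactivity is detected.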

Key Changes

  • New aai_cli/code_agent/voice.py module: Implements VoiceSession with injectable dependencies for microphone input, streaming STT, streaming TTS, and audio playback. Protocols define the streaming interfaces so the loop is fully unit-testable with fakes (no actual mic/speaker/socket needed).

  • Voice mode in _exec.py:

    • Added --voice/--no-voice flag to CodeOptions (defaults to True)
    • New _run_voice() function wires the voice session, event sink, and read-line handler
    • _voice_sink() renders all events and reads assistant text aloud via TTS
    • _voice_read_line() captures spoken turns, with graceful fallback to typed input if the microphone fails (latched after first failure so the mic isn't retried)
    • _announce_voice() prints a one-time notice explaining whether readback is available
    • Updated run_code() dispatch logic: voice mode runs first (if enabled and interactive), then TUI, then REPL
  • Comprehensive test coverage (tests/test_code_voice.py + updates to tests/test_code_command.py):

    • Voice session tests with fake mic, stream function, synth function, and player
    • Tests for listen/speak behavior, turn finalization, gating, and readback availability
    • Integration tests for voice mode dispatch, fallback to typed input on mic errors, and ask-handler wiring
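To illustrate the dependency-injection approach listed above, here is a small sketch of a session whose microphone and STT leg are supplied by the caller, so a test can pass fakes. The names `Microphone`, `MiniVoiceSession`, `FakeMic`, and `fake_stream` are hypothetical and the signatures are simplified; the real `VoiceSession` in `voice.py` may look quite different.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Protocol


class Microphone(Protocol):
    """Structural interface: anything iterable over audio chunks qualifies."""

    sample_rate: int

    def __iter__(self) -> Iterator[bytes]: ...


# Hypothetical STT leg: consumes audio chunks, reports finalized text.
StreamFn = Callable[[Iterable[bytes], Callable[[str], None]], None]


@dataclass
class MiniVoiceSession:
    mic_factory: Callable[[], Microphone]
    stream_fn: StreamFn

    def listen(self) -> str:
        heard: list[str] = []
        self.stream_fn(self.mic_factory(), heard.append)
        # Return the last transcript reported, or "" if nothing was heard.
        return heard[-1] if heard else ""


class FakeMic:
    """Test double: yields two canned chunks, touches no audio device."""

    sample_rate = 16000

    def __iter__(self) -> Iterator[bytes]:
        return iter([b"\x00\x01", b"\x02\x03"])


def fake_stream(chunks: Iterable[bytes], on_turn: Callable[[str], None]) -> None:
    # Test double for the network leg: no socket, just counts bytes.
    total = sum(len(chunk) for chunk in chunks)
    on_turn(f"heard {total} bytes")


session = MiniVoiceSession(mic_factory=FakeMic, stream_fn=fake_stream)
```

Because `Microphone` is a `typing.Protocol`, `FakeMic` does not need to inherit from anything; it only has to have the same shape, which keeps the fakes small.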

Implementation Details

  • Streaming STT uses the u3-rt-pro model (same as assembly stream and assembly agent-cascade) with format_turns=True for punctuated, cased output
  • Streaming TTS synthesizes at 24 kHz (the player's native rate)
  • Microphone gating: The mic stream is shut down the instant a turn finalizes, ensuring exactly one utterance per listen() call
  • Error handling: Audio device errors (mic_missing, mic_error, audio_input_error) trigger a one-time fallback to input(); other errors re-raise
  • Readback availability is determined by tts_session.is_available() at session build time
  • All voice I/O is dependency-injected so tests drive the loop with lightweight fakes
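The microphone gating described above can be sketched with a generator that stops forwarding audio once a turn has finalized. This is an assumption-laden illustration: `listen_once`, the two-argument `on_turn(text, end_of_turn)` callback, and `fake_stream` are hypothetical, and the real code presumably also closes the audio device rather than only ending the generator.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical callback shape: (transcript text, whether the turn is final).
OnTurn = Callable[[str, bool], None]


def listen_once(mic: Iterable[bytes],
                stream_fn: Callable[[Iterator[bytes], OnTurn], None]) -> str:
    """Forward mic audio to stream_fn until the first finalized turn."""
    done = False
    transcript = ""

    def on_turn(text: str, end_of_turn: bool) -> None:
        nonlocal done, transcript
        if end_of_turn:
            transcript = text
            done = True

    def gated() -> Iterator[bytes]:
        for chunk in mic:
            if done:
                # Gate shut: nothing after the finalized turn is forwarded.
                return
            yield chunk

    stream_fn(gated(), on_turn)
    return transcript


consumed: list[bytes] = []


def fake_stream(chunks: Iterator[bytes], on_turn: OnTurn) -> None:
    # Fake STT leg: finalizes a turn after the second chunk, then keeps reading.
    for index, chunk in enumerate(chunks):
        consumed.append(chunk)
        if index == 1:
            on_turn("hello world", True)


result = listen_once([b"a", b"b", b"c", b"d"], fake_stream)
```

Here the fake keeps pulling from the generator after the turn finalizes, but the gate ends the stream, so only the first two chunks are ever delivered and `listen_once` returns a single utterance.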

https://claude.ai/code/session_013tckfky3TVuNtHgKENWpoS

In an interactive terminal, `assembly code` now defaults to a voice loop:
speak your request (microphone -> Streaming STT) and hear the agent's
replies read back over streaming TTS. This is the new front-end picked
ahead of the keyboard TUI and the headless REPL.

- New code_agent/voice.py: a VoiceSession with injectable STT/TTS legs.
  listen() captures one spoken turn (gating the mic shut the instant a
  turn finalizes) and returns the finalized transcript; speak() reads
  each assistant reply back, a no-op where readback isn't available.
- _exec.run_code now selects voice (default, TTY) -> TUI (--no-voice)
  -> headless REPL (piped/CI). The voice path runs the Rich REPL with a
  voice read_line + a reply-speaking sink, and degrades to typed input
  on the first microphone-open failure.
- Readback needs streaming TTS, so it engages only in the sandbox
  (tts.session.is_available); microphone input works everywhere, and a
  non-TTY (pipe/CI) keeps the existing headless behavior.
- Added the --voice/--no-voice flag (default on) plus help/docs, and
  regenerated the `run`-group help snapshot.

Both network legs are injected, so tests/test_code_voice.py and the new
_exec voice-helper tests drive it with fakes — no mic, speaker, or socket.
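The latched fallback to typed input mentioned in the list above might look something like this sketch. `AudioDeviceError` and `make_read_line` are hypothetical names; the three error codes and the `state["typed"]` latch are taken from this PR, but how the real errors carry their codes is an assumption.

```python
from typing import Callable


class AudioDeviceError(Exception):
    """Hypothetical error type carrying a short error code."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


# Codes the PR description lists as triggering the typed-input fallback.
_FALLBACK_CODES = {"mic_missing", "mic_error", "audio_input_error"}


def make_read_line(listen: Callable[[], str],
                   typed_input: Callable[[], str]) -> Callable[[], str]:
    state = {"typed": False}

    def read_line() -> str:
        if state["typed"]:
            # Latched: the microphone is not retried after the first failure.
            return typed_input()
        try:
            return listen()
        except AudioDeviceError as exc:
            if exc.code not in _FALLBACK_CODES:
                raise  # Other errors propagate unchanged.
            state["typed"] = True
            return typed_input()

    return read_line
```

A caller would pass the session's `listen` method and something like the built-in `input` as `typed_input`; in tests both can be plain functions.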

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_013tckfky3TVuNtHgKENWpoS
            state["typed"] = True
            return _read_line()
        if line:
            renderer.notice(f"Heard: {line}")

Renderer prints raw speech transcript (variable 'line') via renderer.notice(f"Heard: {line}"). Avoid logging unsanitized user transcripts; mask, sanitize, or omit echoing sensitive speech data.

✨ AI Reasoning
The change added a code path that directly prints user-spoken transcripts back to the terminal via renderer.notice with the raw variable 'line'. This outputs unsanitized, user-controlled text (potentially personal data, secrets, or malicious payloads) to logs/console. This is a new exposure introduced by the PR and can leak sensitive information entered by the user; it is not sanitized, masked, or otherwise protected.

🔧 How do I fix it?
Keep sensitive data such as emails, passwords, and tokens out of logs. When logging values tied to a user, prefer a safe identifier like a user ID over the raw input, and strip line breaks from any user-provided text you do log.
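One way to act on the "strip line breaks" part of this advice is sketched below. The helper name `sanitize_for_display` and the 200-character cap are hypothetical, and the sketch only neutralizes control characters and bounds the length; it does not detect or mask secrets.

```python
import unicodedata


def sanitize_for_display(text: str, max_len: int = 200) -> str:
    """Replace control/format characters with spaces and cap the length."""
    # Unicode categories starting with "C" cover control characters such as
    # newlines and ESC, which could otherwise inject terminal sequences.
    cleaned = "".join(
        " " if unicodedata.category(ch).startswith("C") else ch for ch in text
    )
    # Collapse runs of whitespace left behind by the replacements.
    cleaned = " ".join(cleaned.split())
    if len(cleaned) > max_len:
        cleaned = cleaned[: max_len - 1] + "…"
    return cleaned
```

Applied to the flagged line, the call would become something like `renderer.notice(f"Heard: {sanitize_for_display(line)}")`, assuming echoing the transcript is still wanted at all.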


                    return
                yield chunk

        self.stream_fn(self.api_key, gated(), params=_stt_params(mic.sample_rate), on_turn=on_turn)

            return
        config = SpeakConfig(text=text, sample_rate=_TTS_SAMPLE_RATE)
        with self.player_factory() as player:
            self.synth_fn(self.api_key, config, on_audio=player.feed)
@alexkroman alexkroman enabled auto-merge June 18, 2026 01:06
@alexkroman alexkroman added this pull request to the merge queue Jun 18, 2026
Merged via the queue into main with commit c01c828 Jun 18, 2026
19 checks passed
@alexkroman alexkroman deleted the claude/peaceful-galileo-cltkj2 branch June 18, 2026 01:14