Add voice mode to assembly code command #234
In an interactive terminal, `assembly code` now defaults to a voice loop: speak your request (microphone -> Streaming STT) and hear the agent's replies read back over streaming TTS. This is the new front-end picked ahead of the keyboard TUI and the headless REPL.

- New code_agent/voice.py: a VoiceSession with injectable STT/TTS legs. listen() captures one spoken turn (gating the mic shut the instant a turn finalizes) and returns the finalized transcript; speak() reads each assistant reply back, a no-op where readback isn't available.
- _exec.run_code now selects voice (default, TTY) -> TUI (--no-voice) -> headless REPL (piped/CI). The voice path runs the Rich REPL with a voice read_line + a reply-speaking sink, and degrades to typed input on the first microphone-open failure.
- Readback needs streaming TTS, so it engages only in the sandbox (tts.session.is_available); microphone input works everywhere, and a non-TTY (pipe/CI) keeps the existing headless behavior.
- Added the --voice/--no-voice flag (default on) plus help/docs, and regenerated the `run`-group help snapshot.

Both network legs are injected, so tests/test_code_voice.py and the new _exec voice-helper tests drive it with fakes: no mic, speaker, or socket.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_013tckfky3TVuNtHgKENWpoS
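For readers skimming the description, the front-end selection order could be sketched roughly as below. This is only an illustration of the stated precedence (voice on a TTY by default, TUI with --no-voice, headless REPL when piped); `Frontend` and `pick_frontend` are hypothetical names, not the helpers in `_exec.py`.

```python
from enum import Enum


class Frontend(Enum):
    VOICE = "voice"
    TUI = "tui"
    HEADLESS = "headless"


def pick_frontend(voice: bool, is_tty: bool) -> Frontend:
    """Hypothetical helper mirroring the selection order in the description."""
    if not is_tty:
        # Piped input or CI: keep the existing headless REPL.
        return Frontend.HEADLESS
    # Interactive terminal: voice unless --no-voice was passed.
    return Frontend.VOICE if voice else Frontend.TUI
```

In this sketch the TTY check comes first, so passing --voice in a pipe still yields the headless path, which matches the statement that a non-TTY keeps the existing behavior.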
state["typed"] = True
return _read_line()
if line:
    renderer.notice(f"Heard: {line}")
Renderer prints raw speech transcript (variable 'line') via renderer.notice(f"Heard: {line}"). Avoid logging unsanitized user transcripts; mask, sanitize, or omit echoing sensitive speech data.
✨ AI Reasoning
The change added a code path that directly prints user-spoken transcripts back to the terminal via renderer.notice with the raw variable 'line'. This outputs unsanitized, user-controlled text (potentially personal data, secrets, or malicious payloads) to logs/console. This is a new exposure introduced by the PR and can leak sensitive information entered by the user; it is not sanitized, masked, or otherwise protected.
🔧 How do I fix it?
Keep sensitive data such as emails, passwords, and tokens out of logs. When logging values tied to a user, prefer a safe identifier like a user ID over the raw input, and strip line breaks from any user-provided text you do log.
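One possible shape for that fix, as a minimal sketch: strip control characters and line breaks from the transcript and cap its length before it is echoed. `safe_echo` and `MAX_ECHO_CHARS` are hypothetical names and the 200-character cap is an assumption. The sketch does not detect or mask secrets; it only neutralizes line breaks and escape sequences.

```python
import unicodedata

MAX_ECHO_CHARS = 200  # assumed cap, not taken from the PR


def safe_echo(transcript: str, limit: int = MAX_ECHO_CHARS) -> str:
    """Return a single-line, length-capped copy of a transcript for display."""
    # Replace every control/format character (Unicode category C*) with a
    # space. This covers \n, \r, and ESC, so terminal escape sequences
    # lose their leading byte.
    cleaned = "".join(
        " " if unicodedata.category(ch).startswith("C") else ch
        for ch in transcript
    )
    # Collapse runs of whitespace left behind by the replacement.
    cleaned = " ".join(cleaned.split())
    if len(cleaned) > limit:
        cleaned = cleaned[: limit - 1] + "…"
    return cleaned
```

The call site would then read something like `renderer.notice(f"Heard: {safe_echo(line)}")`, assuming `renderer.notice` accepts a plain string as in the diff.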
return
yield chunk

self.stream_fn(self.api_key, gated(), params=_stt_params(mic.sample_rate), on_turn=on_turn)
return
config = SpeakConfig(text=text, sample_rate=_TTS_SAMPLE_RATE)
with self.player_factory() as player:
    self.synth_fn(self.api_key, config, on_audio=player.feed)
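The `return` / `yield chunk` pair above appears to belong to the generator that gates the microphone shut once a turn finalizes. A minimal sketch of that idea, assuming a `threading.Event` as the gate (the PR's actual mechanism is not visible in this hunk, and `gate_chunks` is a hypothetical name):

```python
import threading
from collections.abc import Iterable, Iterator


def gate_chunks(chunks: Iterable[bytes], closed: threading.Event) -> Iterator[bytes]:
    """Yield audio chunks until `closed` is set, then stop."""
    for chunk in chunks:
        if closed.is_set():
            # The turn finalized: stop forwarding audio to the STT stream.
            return
        yield chunk


# Usage with a fake microphone (a plain list of byte chunks).
closed = threading.Event()
sent = []
for chunk in gate_chunks([b"a", b"b", b"c"], closed):
    sent.append(chunk)
    if chunk == b"b":
        closed.set()  # stand-in for the on_turn callback firing
```

Because the gate is checked before each yield, the chunk that arrives after the turn finalizes is never forwarded, so `sent` ends up holding only the first two chunks.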
Adds a new voice-first interface to the `assembly code` command that lets users speak their requests and hear replies read back aloud (in sandbox environments with streaming TTS).

Summary
The `assembly code` command now defaults to voice mode in interactive terminals. Users can speak their coding requests via microphone (streaming STT) and receive replies read back aloud via streaming TTS (sandbox only). The implementation gracefully degrades: if no microphone is available, it falls back to typed input; if streaming TTS isn't available (production), replies display as text.

Key Changes
- New `aai_cli/code_agent/voice.py` module: Implements `VoiceSession` with injectable dependencies for microphone input, streaming STT, streaming TTS, and audio playback. Protocols define the streaming interfaces so the loop is fully unit-testable with fakes (no actual mic/speaker/socket needed).
- Voice mode in `_exec.py`:
  - `--voice/--no-voice` flag to `CodeOptions` (defaults to `True`)
  - `_run_voice()` function wires the voice session, event sink, and read-line handler
  - `_voice_sink()` renders all events and reads assistant text aloud via TTS
  - `_voice_read_line()` captures spoken turns, with graceful fallback to typed input if the microphone fails (latched after first failure so the mic isn't retried)
  - `_announce_voice()` prints a one-time notice explaining whether readback is available
  - `run_code()` dispatch logic: voice mode runs first (if enabled and interactive), then TUI, then REPL
- Comprehensive test coverage (`tests/test_code_voice.py` + updates to `tests/test_code_command.py`)

Implementation Details
- `u3-rt-pro` model (same as `assembly stream` and `assembly agent-cascade`) with `format_turns=True` for punctuated, cased output
- … `listen()` call
- … (`mic_missing`, `mic_error`, `audio_input_error`) trigger a one-time fallback to `input()`; other errors re-raise
- … `tts_session.is_available()` at session build time

https://claude.ai/code/session_013tckfky3TVuNtHgKENWpoS
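The latched fallback described for `_voice_read_line()` can be illustrated with a small closure driven by fakes. This is a sketch under assumptions: `MicUnavailable` stands in for the real microphone error types, and `make_read_line` is a hypothetical factory, not the function in `_exec.py`.

```python
from collections.abc import Callable


class MicUnavailable(Exception):
    """Hypothetical stand-in for the microphone-open errors."""


def make_read_line(
    listen: Callable[[], str], typed: Callable[[], str]
) -> Callable[[], str]:
    state = {"typed": False}  # latch: once True, the mic is never retried

    def read_line() -> str:
        if state["typed"]:
            return typed()
        try:
            return listen()
        except MicUnavailable:
            state["typed"] = True
            return typed()

    return read_line


# Usage with fakes: no microphone or terminal involved.
calls = {"listen": 0}


def broken_listen() -> str:
    calls["listen"] += 1
    raise MicUnavailable("no input device")


read_line = make_read_line(broken_listen, lambda: "typed text")
first = read_line()
second = read_line()
```

Both calls return the typed text, and `calls["listen"]` stays at 1, which shows the second call skipped the microphone entirely. Any exception other than `MicUnavailable` propagates, matching the note that other errors re-raise.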