Add voice interruption support via cancel() method #248

Merged
alexkroman merged 1 commit into main from claude/funny-shannon-1a4lir on Jun 18, 2026
@alexkroman

Collaborator

Summary

Adds the ability to interrupt in-flight voice listening and readback operations via a new cancel() method on VoiceSession. This enables Ctrl-C and Escape keys to stop active voice activity (listening or text-to-speech playback) promptly, improving responsiveness in voice mode.

Key Changes

  • VoiceSession.cancel(): New method that sets a thread-safe flag to signal interruption. Each listen() and speak() call clears the flag on entry, so stale cancellations don't preempt subsequent operations.

  • Listening interruption: The listen() method now checks the cancel flag between microphone chunks via a gated() iterator. When cancelled mid-capture, the stream stops immediately and returns None instead of waiting for a finalized turn.

  • Readback interruption: The speak() method wraps the TTS feed callback to check the cancel flag before each audio chunk. When cancelled, it raises an internal _ReadbackInterrupted sentinel that aborts the player (discarding buffered audio) and stops synthesis cleanly.

  • TUI integration:

    • New _stop_voice_activity() method in CodeAgentApp that cancels active voice, pauses the session, and restores the text prompt.
    • action_interrupt() (Escape) now stops active voice before falling back to turn cancellation.
    • action_quit_or_interrupt() (Ctrl-C) now stops active voice and arms the quit hint for a second press, matching the double-press quit behavior.
  • Voice protocol: Added cancel() to the VoiceUI protocol so all voice implementations (real and test) provide the method.

  • Test coverage: Added comprehensive tests for:

    • Listening interrupted mid-capture
    • Readback interrupted mid-playback
    • Stale cancellations cleared on entry
    • Ctrl-C double-press quit flow with voice interruption
    • Escape key voice interruption without arming quit
    • Inactive voice (no-op behavior)
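
The cancel-flag behavior described in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: `SketchVoiceSession`, the `chunks` argument, and the `is_final` callback are hypothetical stand-ins. Only the names `cancel()`, `listen()`, and `gated()`, the flag being cleared on entry, and the `None` return on cancellation come from the description.

```python
import threading
from typing import Callable, Iterable, Iterator, Optional


class SketchVoiceSession:
    """Hypothetical stand-in for VoiceSession; only the cancel flag is modelled."""

    def __init__(self) -> None:
        self._cancel = threading.Event()

    def cancel(self) -> None:
        # Safe to call from another thread (for example, the TUI thread).
        self._cancel.set()

    def listen(
        self,
        chunks: Iterable[bytes],
        is_final: Callable[[bytes], bool],
    ) -> Optional[bytes]:
        # Clear on entry so a stale cancel() cannot preempt this call.
        self._cancel.clear()

        def gated() -> Iterator[bytes]:
            # Check the flag between microphone chunks.
            for chunk in chunks:
                if self._cancel.is_set():
                    return
                yield chunk

        captured = b""
        for chunk in gated():
            captured += chunk
            if is_final(chunk):
                return captured  # a finalized turn (assumed shape)
        return None  # cancelled, or the stream ended, before a final turn
```

A `cancel()` issued before `listen()` starts has no effect on it, while a `cancel()` issued between two chunks ends the capture before the next chunk is consumed.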

Implementation Details

  • Uses threading.Event for thread-safe cancellation signaling between the TUI thread and daemon voice legs.
  • The _ReadbackInterrupted sentinel subclasses CLIError so streaming TTS passes it through unchanged, allowing speak() to catch and suppress it cleanly.
  • Voice interruption takes priority over pending quit hints: if voice is active, Ctrl-C interrupts it rather than confirming a quit.
  • Paused voice sessions are considered inactive, so interruption is a no-op and defers to the quit path.
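
The readback path described above can be sketched in the same spirit. The `CLIError` class here is a local stand-in for the project's own base class, and `FakePlayer`, the free function `speak()`, and its boolean return value are hypothetical. What the description does confirm is that the feed callback is wrapped, that the sentinel subclasses `CLIError`, and that `speak()` catches it and aborts the player.

```python
import threading
from typing import Callable, List


class CLIError(Exception):
    """Stand-in for the project's CLIError base class (assumed shape)."""


class _ReadbackInterrupted(CLIError):
    """Sentinel raised from the feed callback once cancel() has been called."""


class FakePlayer:
    """Hypothetical player that records what it was asked to do."""

    def __init__(self) -> None:
        self.played: List[bytes] = []
        self.aborted = False

    def feed(self, pcm: bytes, sample_rate: int) -> None:
        self.played.append(pcm)

    def abort(self) -> None:
        # Discard buffered audio instead of draining it.
        self.played.clear()
        self.aborted = True


def speak(synth: Callable[..., None], player: FakePlayer,
          cancel: threading.Event) -> bool:
    """Return True if readback completed, False if it was interrupted."""
    cancel.clear()  # a stale cancellation must not affect this readback

    def feed(pcm: bytes, sample_rate: int) -> None:
        # Check the flag before each audio chunk reaches the player.
        if cancel.is_set():
            raise _ReadbackInterrupted()
        player.feed(pcm, sample_rate)

    try:
        synth(on_audio=feed)
    except _ReadbackInterrupted:
        player.abort()
        return False
    return True
```

Raising from inside the callback is what lets the interruption unwind through the synthesis call without that call needing to know about cancellation.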

https://claude.ai/code/session_01Hqg3RHzMaHjG15pkBXP9WB

In voice mode the agent spends most of its time listening or reading a
reply back aloud — neither is a "running turn", so Ctrl-C/Escape skipped
straight to the double-press quit hint and could not stop the in-flight
listen or readback. A single Ctrl-C therefore appeared to do nothing.

Add `VoiceSession.cancel()` so an in-flight `listen()`/`speak()` stops
within a chunk (the mic gate and a readback-feed sentinel both check it),
and in the TUI treat active voice as interruptible: the first Ctrl-C (or
Escape) stops the listen/readback and pauses voice so the session goes
idle (the text prompt returns, Ctrl-V to talk again); a second Ctrl-C
then confirms the quit. Text mode's deepagents-style double-press is
unchanged.
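
The key-handling order described in this commit message can be modelled as a small state machine. This is a simplified sketch, not the `CodeAgentApp` code: `SketchApp` and its boolean fields are hypothetical, and Ctrl-C handling of a running text-mode turn is deliberately left out because the message only says it is unchanged.

```python
from dataclasses import dataclass


@dataclass
class SketchApp:
    voice_active: bool = False  # listening or reading back, and not paused
    turn_running: bool = False
    quit_armed: bool = False
    exited: bool = False

    def _stop_voice_activity(self) -> bool:
        if not self.voice_active:
            return False  # inactive or paused voice: no-op
        # Stands in for cancel() plus pausing voice; the text prompt returns.
        self.voice_active = False
        return True

    def action_interrupt(self) -> None:
        """Escape: stop voice first, else fall back to turn cancellation."""
        if self._stop_voice_activity():
            return  # does not arm the quit hint
        self.turn_running = False

    def action_quit_or_interrupt(self) -> None:
        """Ctrl-C: active voice takes priority over a pending quit hint."""
        if self._stop_voice_activity():
            self.quit_armed = True
            return
        if self.quit_armed:
            self.exited = True
        else:
            self.quit_armed = True
```

In this model the first Ctrl-C during a listen or readback stops the voice activity and arms the quit hint, and only the second one exits.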

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Hqg3RHzMaHjG15pkBXP9WB
@alexkroman alexkroman enabled auto-merge June 18, 2026 19:32
_abort_readback()
player.feed(pcm, sample_rate)

self.synth_fn(self.api_key, config, on_audio=feed)

Collaborator Author

False positive — not actioning this. tts.session.synthesize is synthesize(api_key, config, *, connect=None, on_warning=None, on_audio=None) (session.py:234-241), so self.synth_fn(self.api_key, config, on_audio=feed) matches its arity exactly. synth_fn is the SynthFn Protocol attribute that defaults to synthesize; CodeQL appears to be misresolving it to a 1-arg target. The suggested self.synth_fn(config) would drop api_key and on_audio, breaking authentication and incremental playback. This call shape (previously on_audio=player.feed) predates this PR and is unchanged in behavior — only the callback was wrapped to honor cancel().
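
To illustrate the arity argument, here is a minimal sketch built around the signature quoted in the comment. The stub body of `synthesize`, the `Speaker` class, and the return values are assumptions made only for this example; the parameter list and the call shape `self.synth_fn(self.api_key, config, on_audio=feed)` are taken from the comment.

```python
from typing import Any, Callable, Optional, Protocol


class SynthFn(Protocol):
    # Parameter list as quoted in the comment; the types are assumed.
    def __call__(
        self,
        api_key: str,
        config: Any,
        *,
        connect: Optional[Any] = None,
        on_warning: Optional[Callable[..., None]] = None,
        on_audio: Optional[Callable[[bytes, int], None]] = None,
    ) -> Any: ...


def synthesize(api_key, config, *, connect=None, on_warning=None, on_audio=None):
    # Stub body: emit one fake audio chunk so the callback path is exercised.
    if on_audio is not None:
        on_audio(b"\x00\x00", 16000)
    return api_key, config


class Speaker:
    """Hypothetical holder of the synth_fn attribute."""

    def __init__(self, api_key: str, synth_fn: SynthFn = synthesize) -> None:
        self.api_key = api_key
        self.synth_fn = synth_fn

    def speak(self, config: Any, feed: Callable[[bytes, int], None]) -> Any:
        # Two positional arguments plus a keyword-only callback.
        return self.synth_fn(self.api_key, config, on_audio=feed)
```

With this shape, calling `synthesize(config)` alone raises `TypeError` because the required `config` positional parameter is left unfilled.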


Generated by Claude Code

@alexkroman alexkroman added this pull request to the merge queue Jun 18, 2026
Merged via the queue into main with commit 2407433 Jun 18, 2026
20 checks passed
@alexkroman alexkroman deleted the claude/funny-shannon-1a4lir branch June 18, 2026 19:46