Skip to content

Implement Websocket Audio IO#189

Open
reisbauer03 wants to merge 22 commits intodnhkng:mainfrom
reisbauer03:pr-websocket-audio
Open

Implement Websocket Audio IO#189
reisbauer03 wants to merge 22 commits intodnhkng:mainfrom
reisbauer03:pr-websocket-audio

Conversation

@reisbauer03
Copy link
Copy Markdown
Contributor

@reisbauer03 reisbauer03 commented Mar 30, 2026

This PR introduces the previously unimplemented WebSocket audio input/output backend.

  • New Configuration – A top‑level audio_io_options key is added so you can supply backend‑specific settings. A reference example is now in glados_websocket_config.yaml.

  • Protocol Documentation – The implemented WebSocket protocol is documented in README_WEBSOCKET_PROTOCOL.md. I verified the AI‑generated description for correctness and made a few edits.

  • Multi‑Channel Support – The backend can handle multiple audio inputs and outputs. Each input/output can be assigned a room tag (a simple text label) for matching purposes.

  • Speaker Segregation – When the backend option segregate_speakers is enabled, a speaker will only play audio from an input that shares the same tag. This is useful for multi‑room setups, letting GLaDOS “follow” you as you move between rooms.

  • Testing – I used the example client tests/audio-websocket-both.html; all features work as expected.

Summary by CodeRabbit

  • New Features

    • WebSocket audio I/O: concurrent speaker/mic streams, time‑synced playback, VAD-driven automatic mic selection, optional room-based speaker segregation, selectable runtime audio backend with optional blocking playback, and injectable audio backend used across tools.
  • Documentation

    • Added README section and protocol guide with usage, configuration examples, and message formats.
  • Tests

    • New browser test pages: speaker, microphone, and combined demo.
  • Chores

    • Added websockets runtime dependency and an example runtime configuration (includes persona preprompt and initial announcement).

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 30, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8029b3ae-76fb-4ac5-bd78-3bf6cc7ca253

📥 Commits

Reviewing files that changed from the base of the PR and between 83f8258 and c64bafe.

📒 Files selected for processing (1)
  • src/glados/cli.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/glados/cli.py

📝 Walkthrough

Walkthrough

Adds a WebSocket-based audio I/O backend with /speaker and /microphone endpoints (VAD-driven mic arbitration, scheduled playback, optional speaker segregation); extends AudioProtocol and factory to accept backend options and a blocking play flag; wires config options through Glados; replaces direct sounddevice usage with the backend abstraction; adds docs, browser tests, and a websockets runtime dependency.

Changes

Cohort / File(s) Summary
Audio Protocol & Factory
src/glados/audio_io/__init__.py
Added wait: bool = False to AudioProtocol.start_speaking() and `backend_options: dict[str, Any]
Sounddevice Backend
src/glados/audio_io/sounddevice_io.py
start_speaking(...) now requires np.float32 audio, accepts wait to optionally block, and conditionally calls sd.wait() when wait=True.
WebSocket Audio Backend (new)
src/glados/audio_io/websocket_io.py
New WebsocketAudioIO with AudioData and MicState dataclasses; runs an internal websockets server exposing /speaker and /microphone; implements playback scheduling/sync, VAD-driven mic ownership, speaker segregation, interruption handling, and a sample queue.
CLI & Tools Integration
src/glados/cli.py, src/glados/tools/slow_clap.py
Removed direct sounddevice playback; initialize audio backend via get_audio_system(..., backend_options=...); use audio_system.start_speaking(..., wait=True); slow_clap reads float32 audio and delegates playback to injected backend with broader exception handling.
Core Config & Engine
src/glados/core/engine.py, configs/glados_websocket_config.yaml
Added audio_io_options to GladosConfig and pass-through to get_audio_system(); added example websocket config (server/port, sync delay, mic silence thresholds, MCP servers list, personality preprompt).
Project Dependency
pyproject.toml
Added runtime dependency websockets>=16.0.
Docs & Protocol Spec
docs/audio_websocket.md, README.md
Added protocol documentation and README section describing /speaker and /microphone formats, VAD ownership rules, room segregation behavior, and links to tests/config.
Browser Test Clients
tests/audio-websocket-both.html, tests/audio-websocket-speaker.html, tests/audio-websocket-mic.html
Added three HTML test pages: combined view, speaker client with sync ping/pong and scheduled playback, and mic client using an inline AudioWorklet to capture Float32 PCM and send room-tagged frames.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Browser as Browser Speaker
  participant Server as WebsocketAudioIO
  participant AudioCtx as Browser AudioContext
  rect rgba(200,220,255,0.5)
    Browser->>Server: CONNECT /speaker\nsend "room:<tag>"
    Browser->>Server: "sync_ping"
    Server->>Browser: "sync_pong:<ts>"
    Server->>Browser: "sampleRate:<hz>"
    Server->>Browser: "time:<play_time>"
    Server->>Browser: <Float32 PCM bytes>
    Browser->>AudioCtx: schedule playback at (time + offset)
    AudioCtx-->>Browser: playback finished
    Browser->>Server: "played"
  end
Loading
sequenceDiagram
  autonumber
  participant MicBrowser as Browser Microphone
  participant Server as WebsocketAudioIO
  participant VAD as VAD Engine
  participant Queue as Sample Queue
  rect rgba(220,255,200,0.5)
    MicBrowser->>Server: CONNECT /microphone\nsend "room:<tag>"
    Server->>MicBrowser: "sampleRate:<hz>"
    MicBrowser->>Server: <Float32 PCM chunk>
    Server->>VAD: run VAD on chunk
    VAD-->>Server: vad_confidence
    Server->>Server: arbitrate MicState (owner/silence)
    alt client owns mic
      Server->>Queue: enqueue (samples, vad_confidence)
    else not owner
      Server--xMicBrowser: drop samples
    end
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • Refactor #142: Audio IO refactor — modifies src/glados/audio_io/__init__.py and related audio_io components; strongly related to the protocol/factory changes and websocket backend integration.
  • Refactor #121: say() playback changes — adjusts CLI/tts playback flow; related to the refactor replacing direct sounddevice usage with an injectable backend.
  • Update to Silero VAD 5 #116: VAD API/behavior changes — touches VAD interfaces and processing used by the new websocket microphone server; likely overlaps with mic VAD usage.

Poem

🐰
I hopped through sockets, ears aflutter, bright,
Float32 streams beneath the moonlit byte.
Mic and speaker waltz with pings and time,
Rooms align and playbacks keep in rhyme. 🎧

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Implement Websocket Audio IO' directly and accurately describes the main change in the PR—adding a complete WebSocket-based audio I/O backend system.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (6)
src/glados/tools/slow_clap.py (1)

97-99: Broad exception catch is acceptable here given backend abstraction.

The static analysis flags Exception as too broad (BLE001), but this is reasonable since different audio backends (sounddevice, websocket) may raise different exception types. The error message and logging provide sufficient context for debugging.

If you want to be more explicit, you could consider logging the exception type:

Optional: More specific error logging
         except Exception as other_error:
-            error_msg = f"error: other (possibly audio device) - {other_error}"
+            error_msg = f"error: {type(other_error).__name__} (possibly audio device) - {other_error}"
             logger.error(f"SlowClap: {error_msg}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/glados/tools/slow_clap.py` around lines 97 - 99, The broad except
Exception catch in slow_clap.py is acceptable given multiple audio backends, but
improve diagnostics by including the exception type or full traceback in the
log; update the except block that builds error_msg and calls logger.error
(around the SlowClap handling) to either log the exception type (e.g., include
type(other_error).__name__ in the message) or call logger.exception so the
traceback is captured, keeping the existing message context ("SlowClap") intact.
README_WEBSOCKET_PROTOCOL.md (1)

121-153: Add language specifier to fenced code blocks for lint compliance.

The message flow example code blocks are missing language specifiers (markdownlint MD040). Using text makes the intent clear.

Proposed fix
-```
+```text
 Client connects to /speaker
 Client: room:Living Room
 ...

Apply similarly to the other two code blocks at lines 133 and 148.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README_WEBSOCKET_PROTOCOL.md` around lines 121 - 153, The three fenced code
blocks that show the message flows (the speaker example starting "Client
connects to /speaker", the Microphone Endpoint Flow starting "Client connects to
/microphone", and the sync ping/pong example starting "Client connects to
/speaker") need a language specifier for markdownlint compliance; update each
opening fence from ``` to ```text so the blocks read as ```text ... ``` to make
intent explicit and satisfy MD040.
tests/audio-websocket-speaker.html (1)

273-279: Optional: Consider adding a retry limit to sendPingInit.

The sendPingInit function has no maximum retry count. While the retry loop is bounded by the WebSocket lifecycle (it stops once connected or when the connection closes and triggers reconnect), adding an explicit limit would be more defensive.

♻️ Proposed fix
-    function sendPingInit(i) {
+    function sendPingInit(i, maxRetries = 20) {
+        if (i >= maxRetries) {
+            console.warn("Failed to sync clock: max retries exceeded");
+            return;
+        }
         if (ws.readyState === WebSocket.OPEN) {
             sendPing();
         } else {
             setTimeout(() => sendPingInit(i + 1), 500);
         }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/audio-websocket-speaker.html` around lines 273 - 279, The sendPingInit
loop (function sendPingInit) has no retry cap; add a maxRetries parameter and
counter (e.g., maxRetries = 10) and increment on each scheduled setTimeout; if
ws.readyState !== WebSocket.OPEN and retries >= maxRetries, stop retrying and
call a fallback (e.g., log via console.warn/processLogger and/or invoke the
existing reconnect logic) instead of scheduling another setTimeout; ensure
sendPingInit and any callers pass/initialize the retry count and reference ws
and sendPing names unchanged.
tests/audio-websocket-mic.html (2)

190-193: Consider revoking the blob URL to prevent memory leak.

The blob URL created for the AudioWorklet module is never revoked. While minor for a test page, it's good practice to clean up.

♻️ Proposed fix
         const blob = new Blob([workletCode], {type: 'application/javascript'});
         const workletUrl = URL.createObjectURL(blob);
 
         await micAudioCtx.audioWorklet.addModule(workletUrl);
+        URL.revokeObjectURL(workletUrl);
         const workletNode = new AudioWorkletNode(micAudioCtx, "mic-processor");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/audio-websocket-mic.html` around lines 190 - 193, The Blob URL created
as workletUrl for the AudioWorklet module is never revoked; after awaiting
micAudioCtx.audioWorklet.addModule(workletUrl) revoke the object URL to avoid
leaks by calling URL.revokeObjectURL(workletUrl) (ideally in a finally block or
immediately after the await) so the temporary blob is released; locate the blob
and workletUrl variables and the addModule call to insert the revoke.

75-87: Duplicate click listeners may cause race condition.

There are two separate click event listeners on startBtn. Both will fire when the button is clicked:

  1. Lines 75-84: Sets flags and calls micConnect()
  2. Lines 85-87: Calls navigator.mediaDevices.getUserMedia()

This can cause issues because micStream might be set asynchronously by the second listener while startMic() (line 168) also attempts to get the stream if micStream is null. Consider consolidating into a single listener.

♻️ Proposed consolidation
-    startBtn.addEventListener('click', () => {
+    startBtn.addEventListener('click', async () => {
         userHasInteracted = true;
         startBtn.disabled = true;
         startBtn.innerText = "Audio Enabled";
 
+        // Get microphone stream first
+        micStream = await navigator.mediaDevices.getUserMedia({audio: {channelCount: 1}});
+
         // Start microphone
         if (!micWs || micWs.readyState === WebSocket.CLOSED) {
             micConnect();
         }
     });
-    startBtn.addEventListener('click', async () => {
-        micStream = await navigator.mediaDevices.getUserMedia({audio: {channelCount: 1}});
-    })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/audio-websocket-mic.html` around lines 75 - 87, The button has two
click listeners causing a race between setting flags/connect and acquiring
micStream; consolidate into a single startBtn click handler that (1) sets
userHasInteracted, disables the button and updates its text, (2) awaits
navigator.mediaDevices.getUserMedia({audio:{channelCount:1}}) and assigns
micStream, handling errors, and then (3) calls micConnect() (or startMic() if
that’s the intended flow) only after micStream is set; remove the duplicate
listener so micStream is not populated concurrently and ensure any fallback in
startMic() checks for null safely.
src/glados/audio_io/websocket_io.py (1)

46-47: Binding to all interfaces (0.0.0.0) is intentional but consider documenting security implications.

Static analysis (S104) flags binding to 0.0.0.0. This is intentional for a websocket server accepting remote connections, but consider adding a comment noting that this exposes the service on all network interfaces and should be firewalled in production if not intended.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/glados/audio_io/websocket_io.py` around lines 46 - 47, The SERVER
constant currently set to "0.0.0.0" exposes the websocket server on all network
interfaces; update the comment next to SERVER (and optionally mention PORT) to
document this security implication by noting that binding to 0.0.0.0 allows
remote connections from any interface and that this should be restricted,
firewalled, or changed to a specific interface in production if not intended;
include guidance to prefer a configurable value (env/config) and to ensure
appropriate firewalling or access controls.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/glados/audio_io/websocket_io.py`:
- Line 293: Replace the direct type comparison in the websocket message
conditional so it uses isinstance(ws_msg, str) instead of type(ws_msg) == str;
update the branch in the conditional that checks ws_msg (the line with "elif
type(ws_msg) == str and ws_msg.startswith('room')") so it reads an isinstance
check, ensuring subclasses of str are handled correctly while preserving the
existing startswith('room') logic and control flow in the message handler.
- Line 402: Replace the type equality check "type(msg) == str" with the Pythonic
isinstance check "isinstance(msg, str)" where the websocket message is examined
(the conditional that currently reads 'if type(msg) == str and
msg.startswith("room")'); update that conditional in the message-handling
function/method that uses the variable msg so it becomes "if isinstance(msg,
str) and msg.startswith('room')", ensuring correct behavior for subclasses of
str and clearer intent.
- Around line 36-41: The module/class docstring incorrectly mentions "using
sounddevice" even though this file implements audio I/O over WebSocket; update
the docstring in src/glados/audio_io/websocket_io.py (the module/class docstring
for WebSocketAudioIO/WebsocketAudioIO) to accurately describe that this is a
WebSocket-based Audio I/O implementation (handling sending/receiving audio
frames over WebSocket, VAD integration, and playback/capture orchestration) and
remove any references to sounddevice.
- Around line 343-351: The WAITING PHASE currently treats any
non-`sync_ping`/`room:*` message as an ACK; change the logic so playback is only
considered acknowledged when the received message equals the explicit "played"
ACK (as sent by tests). Locate the await websocket.recv() loop in
websocket_io.py (the WAITING PHASE) and update either the handle_default_msg
function or the loop's conditional to check message == "played" (or the exact
expected payload) before setting self._playback_was_interrupted = False and
breaking; ignore or continue on other unexpected/malformed messages.
- Around line 299-364: The handler currently uses shared flags (_is_playing,
_playback_finished_event, _playback_was_interrupted) so the first connection
that ACKs ("played") clears playback for everyone; change to per-connection ACK
tracking so completion is only signaled once all active speakers have
acknowledged. Add a per-connection identifier (e.g., websocket or speaker_id)
and maintain a set/dict like pending_play_acks keyed by that id; in
_server_speaker replace the code path that on ACK sets _is_playing=False /
_playback_finished_event.set() with code that marks that connection as ACKed
(via pending_play_acks.discard(id) or pending_play_acks[id]=True) and only when
the central playback manager (the code that sets _is_playing to True and sends
audio) observes pending_play_acks empty should it clear _is_playing, set
_playback_finished_event, and update _playback_was_interrupted; also ensure
per-connection resets/send "reset" are done only for that websocket and that the
segregated-room logic still checks room before waiting for its own ACK (see
_server_speaker, handle_default_msg, _audio_lock, and where
_is_playing/_playback_finished_event/_playback_was_interrupted are mutated).

In `@tests/audio-websocket-both.html`:
- Line 6: Update the <title> element that currently reads "GLaDOS Microphone" so
it accurately reflects that the page contains both audio components (speaker and
microphone) — e.g., change the title string to "GLaDOS Speaker & Microphone" or
"GLaDOS Audio — Speaker and Microphone"; locate the <title> tag containing
"GLaDOS Microphone" in tests/audio-websocket-both.html and replace its text
accordingly.

---

Nitpick comments:
In `@README_WEBSOCKET_PROTOCOL.md`:
- Around line 121-153: The three fenced code blocks that show the message flows
(the speaker example starting "Client connects to /speaker", the Microphone
Endpoint Flow starting "Client connects to /microphone", and the sync ping/pong
example starting "Client connects to /speaker") need a language specifier for
markdownlint compliance; update each opening fence from ``` to ```text so the
blocks read as ```text ... ``` to make intent explicit and satisfy MD040.

In `@src/glados/audio_io/websocket_io.py`:
- Around line 46-47: The SERVER constant currently set to "0.0.0.0" exposes the
websocket server on all network interfaces; update the comment next to SERVER
(and optionally mention PORT) to document this security implication by noting
that binding to 0.0.0.0 allows remote connections from any interface and that
this should be restricted, firewalled, or changed to a specific interface in
production if not intended; include guidance to prefer a configurable value
(env/config) and to ensure appropriate firewalling or access controls.

In `@src/glados/tools/slow_clap.py`:
- Around line 97-99: The broad except Exception catch in slow_clap.py is
acceptable given multiple audio backends, but improve diagnostics by including
the exception type or full traceback in the log; update the except block that
builds error_msg and calls logger.error (around the SlowClap handling) to either
log the exception type (e.g., include type(other_error).__name__ in the message)
or call logger.exception so the traceback is captured, keeping the existing
message context ("SlowClap") intact.

In `@tests/audio-websocket-mic.html`:
- Around line 190-193: The Blob URL created as workletUrl for the AudioWorklet
module is never revoked; after awaiting
micAudioCtx.audioWorklet.addModule(workletUrl) revoke the object URL to avoid
leaks by calling URL.revokeObjectURL(workletUrl) (ideally in a finally block or
immediately after the await) so the temporary blob is released; locate the blob
and workletUrl variables and the addModule call to insert the revoke.
- Around line 75-87: The button has two click listeners causing a race between
setting flags/connect and acquiring micStream; consolidate into a single
startBtn click handler that (1) sets userHasInteracted, disables the button and
updates its text, (2) awaits
navigator.mediaDevices.getUserMedia({audio:{channelCount:1}}) and assigns
micStream, handling errors, and then (3) calls micConnect() (or startMic() if
that’s the intended flow) only after micStream is set; remove the duplicate
listener so micStream is not populated concurrently and ensure any fallback in
startMic() checks for null safely.

In `@tests/audio-websocket-speaker.html`:
- Around line 273-279: The sendPingInit loop (function sendPingInit) has no
retry cap; add a maxRetries parameter and counter (e.g., maxRetries = 10) and
increment on each scheduled setTimeout; if ws.readyState !== WebSocket.OPEN and
retries >= maxRetries, stop retrying and call a fallback (e.g., log via
console.warn/processLogger and/or invoke the existing reconnect logic) instead
of scheduling another setTimeout; ensure sendPingInit and any callers
pass/initialize the retry count and reference ws and sendPing names unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d3275803-1a71-4565-8dee-5de8d8ae3f79

📥 Commits

Reviewing files that changed from the base of the PR and between 6cbcbbc and 421fc74.

📒 Files selected for processing (13)
  • README.md
  • README_WEBSOCKET_PROTOCOL.md
  • configs/glados_websocket_config.yaml
  • pyproject.toml
  • src/glados/audio_io/__init__.py
  • src/glados/audio_io/sounddevice_io.py
  • src/glados/audio_io/websocket_io.py
  • src/glados/cli.py
  • src/glados/core/engine.py
  • src/glados/tools/slow_clap.py
  • tests/audio-websocket-both.html
  • tests/audio-websocket-mic.html
  • tests/audio-websocket-speaker.html

Comment thread src/glados/audio_io/websocket_io.py Outdated
Comment thread src/glados/audio_io/websocket_io.py Outdated
Comment thread src/glados/audio_io/websocket_io.py Outdated
Comment thread src/glados/audio_io/websocket_io.py
Comment thread src/glados/audio_io/websocket_io.py Outdated
Comment thread tests/audio-websocket-both.html Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@README_WEBSOCKET_PROTOCOL.md`:
- Line 121: In README_WEBSOCKET_PROTOCOL.md there are three unlabeled fenced
code blocks (the blocks beginning with the lines "Client connects to /speaker"
and "Client connects to /microphone" and the repeated speaker block) that
trigger MD040; fix them by adding a language identifier (e.g., text) to each
opening fence so they read ```text instead of ```, ensuring all three
occurrences are updated consistently.
- Line 128: The example flow uses the generic label "Client: ACK" but the
protocol defines the ACK message payload as "played"; update the flow example so
the client step sends the explicit payload "played" (replace "Client: ACK" with
"Client: played") and ensure any surrounding text or examples refer to the ACK
message by its defined payload name "played" to avoid mismatch.

In `@src/glados/audio_io/websocket_io.py`:
- Line 61: The constructor docstring for WebsocketIO.__init__ currently
references "Initialize the sounddevice audio I/O", which is stale and
misleading; update the docstring to accurately describe that this backend is
websocket-based (e.g., "Initialize the websocket-based audio I/O" or similar) so
anyone reading the class WebsocketIO (and its __init__ method) understands the
actual transport; make the change only to the docstring text with no behavior
changes.
- Around line 300-303: The room-tag parsing can raise IndexError when ws_msg
lacks a colon; update the handler that processes ws_msg (the branch using
ws_msg.startswith("room") which assigns to nonlocal room) to validate the
payload before accessing split()[1] — either check that ":" is present and the
right-hand part is non-empty or use a safe split with maxsplit=1 and verify
length, and if malformed log/warn and ignore the message (return False) instead
of crashing.
- Line 53: The SERVER constant currently defaults to "0.0.0.0" exposing the
websocket to all interfaces; change the default value of SERVER in
websocket_io.py to the loopback address "127.0.0.1" so the service binds to
localhost by default, and ensure any code that reads/overrides SERVER (e.g.,
WebsocketServer initialization or config loader) still accepts explicit
overrides (env/config) for remote binding; update any related documentation or
config comments to state that remote exposure requires an explicit change.
- Line 397: The sent microphone sample-rate frame is missing the required
prefix; change the send so it follows the protocol format "sampleRate:<hz>"
instead of sending only the numeric value — update the websocket.send call that
currently uses self.SAMPLE_RATE to send f"sampleRate:{self.SAMPLE_RATE}"
(referencing the existing self.SAMPLE_RATE and the websocket.send invocation in
websocket_io.py).
- Around line 412-415: The code assumes every incoming websocket message (msg)
is raw PCM bytes and directly calls np.frombuffer(msg, dtype=np.float32), which
will raise on text/control frames; update the handling in the websocket receive
loop (references: self._is_listening, current_data, msg, np.frombuffer) to first
validate the payload is binary (bytes or bytearray) before decoding, e.g., skip
or log and ignore non-bytes messages, and wrap the np.frombuffer call in a
minimal try/except to catch/ignore decoding errors so the mic handling doesn't
crash.
- Around line 175-176: The current wait path uses
self._playback_finished_event.wait() with no timeout which can block forever;
change the waiting logic in the method that checks "if wait:" to call
self._playback_finished_event.wait(timeout=some_seconds) (or accept a
wait_timeout parameter) and then check the boolean return value; if it returns
False, log or raise a timeout/RuntimeError so callers don't hang. Update any
callers or signature as needed to expose a configurable timeout and reference
_playback_finished_event in the adjusted logic.
- Around line 337-342: The code currently holds self._audio_lock (a
threading.Lock) across await websocket.send() calls; change it to acquire the
lock only to synchronously snapshot the needed fields and a safe copy of the
buffer, then release the lock before any awaits. Concretely: inside the block
guarded by self._audio_lock read self._audio_data.play_time and .sample_rate
into local vars and make a thread-safe copy of the audio bytes (e.g., data_bytes
= self._audio_data.data.tobytes() or data_copy =
self._audio_data.data.copy().tobytes()), release the lock, then call await
websocket.send(...) three times using those locals (play_time, sample_rate,
data_bytes). Ensure you reference self._audio_lock, self._audio_data, and
websocket.send in the change and do not perform any awaits while holding the
lock.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3a17d1ea-8395-4712-b0e4-dbd51f022a86

📥 Commits

Reviewing files that changed from the base of the PR and between 421fc74 and 037a45f.

📒 Files selected for processing (3)
  • README_WEBSOCKET_PROTOCOL.md
  • src/glados/audio_io/websocket_io.py
  • tests/audio-websocket-both.html
✅ Files skipped from review due to trivial changes (1)
  • tests/audio-websocket-both.html

Comment thread README_WEBSOCKET_PROTOCOL.md Outdated
Comment thread README_WEBSOCKET_PROTOCOL.md Outdated
Comment thread src/glados/audio_io/websocket_io.py Outdated
Comment thread src/glados/audio_io/websocket_io.py Outdated
Comment thread src/glados/audio_io/websocket_io.py Outdated
Comment thread src/glados/audio_io/websocket_io.py Outdated
Comment thread src/glados/audio_io/websocket_io.py Outdated
Comment thread src/glados/audio_io/websocket_io.py Outdated
Comment thread src/glados/audio_io/websocket_io.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/audio-websocket-speaker.html`:
- Around line 99-107: When the websocket reconnects reapply the currently
selected room to the server so routing remains correct: in speakerWs.onopen (and
the reconnect path around the earlier 285-290 block) call the same routine that
the room-selection button uses to send the room tag (or directly send the room
message) — e.g., invoke the existing room-send handler/function with the current
selected room value so the server receives the room assignment on every fresh
connection.
- Around line 273-282: sendPingInit() currently retries forever for any non-OPEN
state; update it to stop retrying when the socket is CLOSING or CLOSED and/or
allow cancellation by storing the timer ID. Specifically, inside sendPingInit
(and where setTimeout is scheduled) check ws.readyState against
WebSocket.CLOSING and WebSocket.CLOSED and return early if either is true, or
assign the setTimeout return value to a variable (e.g., pingRetryTimer) and
clearTimeout(pingRetryTimer) from the ws.onclose handler to avoid orphaned
loops; ensure sendPingInit still calls sendPing() when ws.readyState ===
WebSocket.OPEN.
- Around line 127-135: The code currently closes and recreates speakerAudioCtx
on every packet; change the logic to only recreate when the incoming sample rate
differs from speakerCurrentSampleRate or when speakerAudioCtx is null/closed. In
the message handler that reads sampleRate and audio payload, compare the new
sample rate to speakerCurrentSampleRate, and if different (or if speakerAudioCtx
is falsy) then close existing speakerAudioCtx and create a new
window.AudioContext({ sampleRate: newRate }); otherwise reuse the existing
speakerAudioCtx; ensure you update speakerCurrentSampleRate when you create a
new context and honor userHasInteracted when deciding to create the context.
- Around line 139-142: The onended ACK uses mutable globals (speakerCurrentAudio
and speakerWs) so a delayed onended from a stopped or old source can send a
stale "played"; when scheduling playback (where you set speakerCurrentAudio and
install AudioBufferSourceNode.onended) capture the current source and current
WebSocket in local constants (e.g., scheduledSource and scheduledWs) and in the
onended handler only send speakerWs.send("played") if scheduledSource ===
speakerCurrentAudio and scheduledWs === speakerWs (or equivalently check a
captured connection id), thereby suppressing ACKs from stopped or replaced
sources; apply the same captured-check pattern to all places that attach onended
(and any delayed ACK logic around lines mentioned) so the ACK is bound to the
scheduled source+connection, not global mutable state.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5af4d73b-912f-425f-942c-553036f082ab

📥 Commits

Reviewing files that changed from the base of the PR and between 037a45f and d169799.

📒 Files selected for processing (7)
  • README_WEBSOCKET_PROTOCOL.md
  • configs/glados_websocket_config.yaml
  • src/glados/audio_io/__init__.py
  • src/glados/audio_io/websocket_io.py
  • tests/audio-websocket-both.html
  • tests/audio-websocket-mic.html
  • tests/audio-websocket-speaker.html
✅ Files skipped from review due to trivial changes (4)
  • tests/audio-websocket-both.html
  • README_WEBSOCKET_PROTOCOL.md
  • configs/glados_websocket_config.yaml
  • src/glados/audio_io/websocket_io.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/glados/audio_io/init.py
  • tests/audio-websocket-mic.html

Comment thread tests/audio-websocket-speaker.html
Comment thread tests/audio-websocket-speaker.html
Comment thread tests/audio-websocket-speaker.html
Comment thread tests/audio-websocket-speaker.html
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/audio-websocket-speaker.html (1)

142-144: ⚠️ Potential issue | 🔴 Critical

Suppress played for reset/cancelled sources.

Line 143 calls stop(), and AudioBufferSourceNode.onended still fires for that path. Because Lines 214-218 only capture the socket, a reset or replaced source can still send "played" on the same connection; src/glados/audio_io/websocket_io.py:356-373 treats that as playback completion and can advance the next playback early.

Suggested change
-                } else if (event.data === "reset" && speakerCurrentAudio) {
-                    speakerCurrentAudio.stop();
+                } else if (event.data === "reset" && speakerCurrentAudio) {
+                    const source = speakerCurrentAudio;
+                    speakerCurrentAudio = null;
+                    source.onended = null;
+                    source.stop();
                     speakerStatus("Stream reset");
                 }
@@
-        speakerCurrentAudio = speakerAudioCtx.createBufferSource();
-        speakerCurrentAudio.buffer = audioBuffer;
-        speakerCurrentAudio.connect(speakerAudioCtx.destination);
+        const source = speakerAudioCtx.createBufferSource();
+        source.buffer = audioBuffer;
+        source.connect(speakerAudioCtx.destination);
+        speakerCurrentAudio = source;
@@
-        speakerCurrentAudio.start(audioCtxPlayTime, playOffset);
+        source.start(audioCtxPlayTime, playOffset);
@@
-        let localWs = speakerWs;
-        // send ACK on end
-        speakerCurrentAudio.onended = () => {
-            if (localWs && localWs.readyState === WebSocket.OPEN) {
+        const localWs = speakerWs;
+        // send ACK on natural end only
+        source.onended = () => {
+            if (
+                source === speakerCurrentAudio &&
+                localWs === speakerWs &&
+                localWs.readyState === WebSocket.OPEN
+            ) {
+                speakerCurrentAudio = null;
                 localWs.send("played");
                 speakerStatus("Current audio played.");
             }
         };
Does `AudioBufferSourceNode.onended` fire when `stop()` is called, and is guarding the callback with source identity the recommended way to suppress stale completion ACKs when cancelling playback?

Also applies to: 192-219

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/audio-websocket-speaker.html` around lines 142 - 144, The onended
handler for speakerCurrentAudio still sends a "played" ACK even when stop() is
called or the source is replaced; fix by tracking source identity or a cancelled
flag so the onended callback only sends "played" for the active source: when
creating a new AudioBufferSourceNode assign a unique id or set
speakerCurrentAudio._id and store currentSpeakerId, clear or set a cancelled
flag before calling stop() on the old source (or remove its onended handler),
and have the onended callback check that the fired source's id matches
currentSpeakerId (or that cancelled is false) before invoking speakerStatus or
sending the "played" socket message.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/audio-websocket-speaker.html`:
- Around line 100-107: The onclose handler must fully tear down the audio graph
to avoid leaking AudioContexts and scheduled sources: when the socket closes,
explicitly stop and disconnect speakerCurrentAudio (if non-null) and set it to
null, and close speakerAudioCtx (if non-null) via its close() method and set it
to null; update the onclose handler that currently lives near the socket close
logic to perform these steps so resources are released before the next onopen
recreates speakerAudioCtx or speakerCurrentAudio.

---

Duplicate comments:
In `@tests/audio-websocket-speaker.html`:
- Around line 142-144: The onended handler for speakerCurrentAudio still sends a
"played" ACK even when stop() is called or the source is replaced; fix by
tracking source identity or a cancelled flag so the onended callback only sends
"played" for the active source: when creating a new AudioBufferSourceNode assign
a unique id or set speakerCurrentAudio._id and store currentSpeakerId, clear or
set a cancelled flag before calling stop() on the old source (or remove its
onended handler), and have the onended callback check that the fired source's id
matches currentSpeakerId (or that cancelled is false) before invoking
speakerStatus or sending the "played" socket message.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 87454683-bb72-4cdc-b8c8-809fdfb976cf

📥 Commits

Reviewing files that changed from the base of the PR and between d169799 and 92c6b0b.

📒 Files selected for processing (2)
  • tests/audio-websocket-mic.html
  • tests/audio-websocket-speaker.html
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/audio-websocket-mic.html

Comment thread tests/audio-websocket-speaker.html
@reisbauer03
Copy link
Copy Markdown
Contributor Author

@coderabbitai resume

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 30, 2026

✅ Actions performed

Reviews resumed.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
tests/audio-websocket-speaker.html (1)

142-145: ⚠️ Potential issue | 🔴 Critical

Suppress played ACKs for interrupted sources.

Line 143 still calls stop() without disabling the completion handler, and Lines 225-233 only bind the ACK to the socket. AudioBufferSourceNode.onended also fires for explicit stops, so a reset can still emit a stale "played" on the same connection. The server consumes bare "played" as completion in src/glados/audio_io/websocket_io.py:356-362, which can let an interrupted chunk advance the next playback.

Suggested change
-                } else if (event.data === "reset" && speakerCurrentAudio) {
-                    speakerCurrentAudio.stop();
+                } else if (event.data === "reset" && speakerCurrentAudio) {
+                    const interruptedSource = speakerCurrentAudio;
+                    speakerCurrentAudio = null;
+                    interruptedSource.onended = null;
+                    try {
+                        interruptedSource.stop();
+                    } catch (_err) {}
                     speakerStatus("Stream reset");
                 }
@@
-        speakerCurrentAudio = speakerAudioCtx.createBufferSource();
-        speakerCurrentAudio.buffer = audioBuffer;
-        speakerCurrentAudio.connect(speakerAudioCtx.destination);
+        const scheduledSource = speakerAudioCtx.createBufferSource();
+        scheduledSource.buffer = audioBuffer;
+        scheduledSource.connect(speakerAudioCtx.destination);
+        speakerCurrentAudio = scheduledSource;
@@
-        speakerCurrentAudio.start(audioCtxPlayTime, playOffset);
+        scheduledSource.start(audioCtxPlayTime, playOffset);
@@
-        let localWs = speakerWs;
-        // send ACK on end
-        speakerCurrentAudio.onended = () => {
-            if (localWs && localWs.readyState === WebSocket.OPEN) {
+        const localWs = speakerWs;
+        scheduledSource.onended = () => {
+            if (
+                scheduledSource === speakerCurrentAudio &&
+                localWs === speakerWs &&
+                localWs.readyState === WebSocket.OPEN
+            ) {
+                speakerCurrentAudio = null;
                 localWs.send("played");
                 speakerStatus("Current audio played.");
             }
         };

Also applies to: 205-233

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/audio-websocket-speaker.html` around lines 142 - 145, When handling the
"reset" event you must suppress the onended "played" ACK before calling
speakerCurrentAudio.stop(): set a short-lived flag (e.g. suppressPlayedAck =
true) or clear/unbind the onended handler attached to speakerCurrentAudio, then
call speakerCurrentAudio.stop() and update speakerStatus("Stream reset"); in the
onended handler (or where the socket "played" ACK is emitted/bound) check that
flag and only emit the "played" message when not suppressed, and finally clear
the flag after the reset path completes so future playbacks can ACK normally;
refer to speakerCurrentAudio.stop(), the AudioBufferSourceNode.onended handler,
and the socket ACK binding code that currently emits "played".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@tests/audio-websocket-speaker.html`:
- Around line 142-145: When handling the "reset" event you must suppress the
onended "played" ACK before calling speakerCurrentAudio.stop(): set a
short-lived flag (e.g. suppressPlayedAck = true) or clear/unbind the onended
handler attached to speakerCurrentAudio, then call speakerCurrentAudio.stop()
and update speakerStatus("Stream reset"); in the onended handler (or where the
socket "played" ACK is emitted/bound) check that flag and only emit the "played"
message when not suppressed, and finally clear the flag after the reset path
completes so future playbacks can ACK normally; refer to
speakerCurrentAudio.stop(), the AudioBufferSourceNode.onended handler, and the
socket ACK binding code that currently emits "played".

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3d381fcd-43c8-412a-b9e4-d9f361050a26

📥 Commits

Reviewing files that changed from the base of the PR and between 92c6b0b and 1111ce5.

📒 Files selected for processing (2)
  • tests/audio-websocket-mic.html
  • tests/audio-websocket-speaker.html
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/audio-websocket-mic.html

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/glados/audio_io/websocket_io.py`:
- Around line 128-129: The constructor currently spawns self._server_thread
(threading.Thread target=lambda s, p: asyncio.run(self._run_server(s, p))) and
returns immediately so failures inside websockets.serve in _run_server are lost;
change the flow so _run_server captures binding/startup errors and communicates
success or exception back to the creator (e.g., via a threading.Event plus a
shared exception container or concurrent.futures.Future) and have __init__ wait
for that signal and re-raise the exception to the caller if present; apply the
same pattern to the other daemon thread creation that uses asyncio.run at the
other site so server startup failures propagate instead of leaving the app
thinking the websocket server started.
- Around line 200-216: The timeout path currently uses elapsed time to compute
percentage_played so a wait timeout makes played_samples ≈ total and yields
100%; change the logic in the Server playback wait block so you compute
played_samples/percentage_played only for a successful completion or a real
interruption that set _playback_was_interrupted earlier, and if
_playback_finished_event.wait() returned False (not completed) then set
interrupted = True and force percentage_played = 0 (instead of deriving from
elapsed); adjust the code around _playback_finished_event.wait(),
_playback_was_interrupted, elapsed, played_samples and percentage_played to
perform the override after the completed/interrupted checks so a timed-out
(never ACKed) utterance is not treated as fully spoken.

In `@src/glados/tools/slow_clap.py`:
- Around line 41-43: The constructor eagerly calls get_audio_system() even when
tool_config supplies "audio_io", which can create the wrong backend; change the
initialization so you first read audio_io = tool_config.get("audio_io") (no
default), and only call get_audio_system() to assign self.audio_io if that value
is None/missing (e.g., if tool_config.get("audio_io") is falsy then set
self.audio_io = get_audio_system()); keep self.audio_path assignment as-is and
update references to audio_io/self.audio_io in the SlowClap constructor
accordingly.

In `@tests/audio-websocket-mic.html`:
- Around line 79-91: The two separate startBtn click handlers cause duplicate
getUserMedia calls and premature disabling; consolidate into one async click
handler that first requests
navigator.mediaDevices.getUserMedia({audio:{channelCount:1}}), assigns
micStream, sets userHasInteracted, updates the button text/state only after
permission succeeds, then calls micConnect(); remove the other click listener
and ensure startMic() (the function invoked on sampleRate handshake) checks if
micStream already exists and does not call getUserMedia() again (adjust the
getUserMedia call at the startMic() fallback at the existing code around the
current lines 169-171).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a699fc67-511a-4536-830f-a81e7c2121be

📥 Commits

Reviewing files that changed from the base of the PR and between 69f6138 and 6914cdc.

📒 Files selected for processing (5)
  • src/glados/audio_io/sounddevice_io.py
  • src/glados/audio_io/websocket_io.py
  • src/glados/tools/slow_clap.py
  • tests/audio-websocket-mic.html
  • tests/audio-websocket-speaker.html
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/audio-websocket-speaker.html

Comment thread src/glados/audio_io/websocket_io.py Outdated
Comment thread src/glados/audio_io/websocket_io.py Outdated
Comment thread src/glados/audio_io/websocket_io.py
Comment thread src/glados/tools/slow_clap.py Outdated
Comment on lines +79 to +91
startBtn.addEventListener('click', () => {
userHasInteracted = true;
startBtn.disabled = true;
startBtn.innerText = "Audio Enabled";

// Start microphone
if (!micWs || micWs.readyState === WebSocket.CLOSED) {
micConnect();
}
});
startBtn.addEventListener('click', async () => {
micStream = await navigator.mediaDevices.getUserMedia({audio: {channelCount: 1}});
})
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Collapse the start flow into one awaited click handler.

The first listener connects immediately while the second separately requests mic permission. If the sampleRate handshake arrives first, startMic() falls back to another getUserMedia() call at Lines 169-171, so one click can open two streams and later overwrite micStream. This also disables the button before permission succeeds, which leaves the page stuck after a denial.

🛠️ Proposed fix
-    startBtn.addEventListener('click', () => {
-        userHasInteracted = true;
-        startBtn.disabled = true;
-        startBtn.innerText = "Audio Enabled";
-
-        // Start microphone
-        if (!micWs || micWs.readyState === WebSocket.CLOSED) {
-            micConnect();
-        }
-    });
-    startBtn.addEventListener('click', async () => {
-        micStream = await navigator.mediaDevices.getUserMedia({audio: {channelCount: 1}});
-    })
+    startBtn.addEventListener('click', async () => {
+        if (userHasInteracted) return;
+        userHasInteracted = true;
+        startBtn.disabled = true;
+
+        try {
+            micStream = await navigator.mediaDevices.getUserMedia({audio: {channelCount: 1}});
+            startBtn.innerText = "Audio Enabled";
+
+            if (!micWs || micWs.readyState === WebSocket.CLOSED) {
+                micConnect();
+            }
+        } catch (err) {
+            userHasInteracted = false;
+            startBtn.disabled = false;
+            micStatus("Microphone permission denied");
+            console.error("Failed to access microphone:", err);
+        }
+    });

Also applies to: 169-171

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/audio-websocket-mic.html` around lines 79 - 91, The two separate
startBtn click handlers cause duplicate getUserMedia calls and premature
disabling; consolidate into one async click handler that first requests
navigator.mediaDevices.getUserMedia({audio:{channelCount:1}}), assigns
micStream, sets userHasInteracted, updates the button text/state only after
permission succeeds, then calls micConnect(); remove the other click listener
and ensure startMic() (the function invoked on sampleRate handshake) checks if
micStream already exists and does not call getUserMedia() again (adjust the
getUserMedia call at the startMic() fallback at the existing code around the
current lines 169-171).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a POC, im not gonna beautify this any further. It works well enough.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/glados/tools/slow_clap.py (1)

99-100: Broad exception catch is acceptable here for backend abstraction.

Static analysis flags BLE001 for catching Exception, but this is intentional—the tool must handle various audio backends that may raise different exception types (e.g., sounddevice.PortAudioError vs. WebSocket-related errors). The exception is properly logged and reported to the LLM rather than silently swallowed.

Consider documenting this intent with a brief comment if desired.

📝 Optional: Add clarifying comment
+        # Catch-all for audio backend errors (different backends raise different exceptions)
         except Exception as other_error:
             error_msg = f"error: other (possibly audio device) - {other_error}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/glados/tools/slow_clap.py` around lines 99 - 100, The broad except block
using "except Exception as other_error" is intentional to centralize diverse
backend/audio and websocket errors; add a brief explanatory comment above this
except (referencing the except Exception as other_error and the error_msg
assignment) describing why catching Exception is required for multiple audio
backends, and suppress the BLE001 lint warning with an inline noqa comment
(e.g., # noqa: BLE001) so static analysis won't flag it while preserving the
current logging/reporting behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/glados/audio_io/websocket_io.py`:
- Around line 272-273: The logging formatter string passed to
LogAdapter.setFormatter is using an incorrect "$(name)s" token; update the
format in the logging.Formatter call used with ws_log_handler (the LogAdapter
instance) to use the correct Python logging specifier "%(name)s" so the line
with ws_log_handler.setFormatter(logging.Formatter(...)) becomes a format string
using "%(name)s" instead of "$(name)s".

---

Nitpick comments:
In `@src/glados/tools/slow_clap.py`:
- Around line 99-100: The broad except block using "except Exception as
other_error" is intentional to centralize diverse backend/audio and websocket
errors; add a brief explanatory comment above this except (referencing the
except Exception as other_error and the error_msg assignment) describing why
catching Exception is required for multiple audio backends, and suppress the
BLE001 lint warning with an inline noqa comment (e.g., # noqa: BLE001) so static
analysis won't flag it while preserving the current logging/reporting behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9d549a09-064b-4b62-a602-fd31783129f5

📥 Commits

Reviewing files that changed from the base of the PR and between 6914cdc and 4950d2d.

📒 Files selected for processing (2)
  • src/glados/audio_io/websocket_io.py
  • src/glados/tools/slow_clap.py

Comment thread src/glados/audio_io/websocket_io.py Outdated
@reisbauer03
Copy link
Copy Markdown
Contributor Author

@coderabbitai resume

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 1, 2026

✅ Actions performed

Reviews resumed.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
src/glados/audio_io/websocket_io.py (2)

279-286: Variable server shadows function parameter.

Line 280 reassigns server (the websocket server object) over the server parameter (the address string). While functionally correct, this shadowing can cause confusion during maintenance.

♻️ Proposed fix to use distinct variable name
         try:
-            server = await websockets.serve(self._server_listen, host=server, port=port)
+            ws_server = await websockets.serve(self._server_listen, host=server, port=port)
             result_future.set_result(None)
         except OSError as ex:
             result_future.set_exception(ex)
             raise
 
-        await server.serve_forever()
+        await ws_server.serve_forever()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/glados/audio_io/websocket_io.py` around lines 279 - 286, The local
variable named server is shadowing the function parameter server; rename the
websocket server object to a distinct name (e.g., ws_server or websocket_server)
where websockets.serve is assigned and where serve_forever() is awaited. Update
the assignment line that now reads await websockets.serve(self._server_listen,
host=server, port=port) to store the return in ws_server, replace
result_future.set_result(None)/set_exception(ex) moves unchanged, and call await
ws_server.serve_forever() so the parameter server remains unshadowed; reference
the _server_listen coroutine and result_future handling when making the change.

434-442: Consider using a bytearray or collections.deque for streaming audio accumulation.

np.append(current_data, data) allocates a new array on each call, leading to O(n²) complexity over time as chunks accumulate. For streaming audio, this could become a bottleneck with high-frequency small chunks.

♻️ Alternative approach using bytearray
-        # currently stored samples
-        current_data = np.empty((0,), dtype=np.float32)
+        # buffer for accumulating samples
+        audio_buffer = bytearray()

Then in the loop:

-                data = np.frombuffer(msg, dtype=np.float32)
-                current_data = np.append(current_data, data)
+                audio_buffer.extend(msg)
 
                 # if enough current data is there, run it through the VAD
-                if len(current_data) >= vad_needed_samples:
+                vad_needed_bytes = vad_needed_samples * 4
+                if len(audio_buffer) >= vad_needed_bytes:
                     # get data for VAD
-                    vad_data = current_data[:vad_needed_samples]
+                    vad_data = np.frombuffer(audio_buffer[:vad_needed_bytes], dtype=np.float32)
                     # extra data stays for next VAD
-                    current_data = current_data[vad_needed_samples:]
+                    del audio_buffer[:vad_needed_bytes]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/glados/audio_io/websocket_io.py` around lines 434 - 442, The current loop
uses np.append(current_data, data) which repeatedly reallocates and is O(n²);
replace the accumulation strategy for current_data with a streaming buffer
(e.g., a bytearray or collections.deque) to avoid repeated numpy copies, append
raw bytes/chunks to that buffer in the message loop, and only convert or
construct a numpy array when you need vad_needed_samples for VAD (use
slicing/pop from the bytearray or popleft from deque to remove used samples);
update code references around current_data, np.frombuffer(msg,
dtype=np.float32), and vad_needed_samples to perform the conversion/consumption
step atomically so you no longer call np.append for each chunk.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/glados/audio_io/websocket_io.py`:
- Around line 432-435: The bytes-to-float conversion in the message handler can
raise ValueError when the incoming msg length isn't a multiple of 4; before
calling np.frombuffer in the block guarded by isinstance(msg, bytes) and
self._is_listening, check that len(msg) % 4 == 0 and only call
np.frombuffer(msg, dtype=np.float32) when that holds, otherwise log a warning
(use the class logger or processLogger) and skip/drop the malformed chunk (or
increment a counter) so mic handling doesn't crash; additionally wrap
np.frombuffer in a try/except ValueError to defensively catch any remaining
malformed buffers and handle them the same way, referencing the same variables
current_data, self._is_listening and the np.frombuffer call.
- Around line 104-105: The assignment using bool(val) in the
"segregate_speakers" case will treat any non-empty string like "false" as True;
update the handler in websocket_io.py (the "segregate_speakers" case that sets
self._segregate_speakers) to detect and normalize string booleans: if val is
already a bool use it, if it's a string compare lowercased values against
accepted true tokens ("true","1","yes","y") and false tokens
("false","0","no","n") and set the attribute accordingly (defaulting to False
for unrecognized values); this ensures quoted "false" becomes False while
preserving actual boolean inputs.

---

Nitpick comments:
In `@src/glados/audio_io/websocket_io.py`:
- Around line 279-286: The local variable named server is shadowing the function
parameter server; rename the websocket server object to a distinct name (e.g.,
ws_server or websocket_server) where websockets.serve is assigned and where
serve_forever() is awaited. Update the assignment line that now reads await
websockets.serve(self._server_listen, host=server, port=port) to store the
return in ws_server, replace result_future.set_result(None)/set_exception(ex)
moves unchanged, and call await ws_server.serve_forever() so the parameter
server remains unshadowed; reference the _server_listen coroutine and
result_future handling when making the change.
- Around line 434-442: The current loop uses np.append(current_data, data) which
repeatedly reallocates and is O(n²); replace the accumulation strategy for
current_data with a streaming buffer (e.g., a bytearray or collections.deque) to
avoid repeated numpy copies, append raw bytes/chunks to that buffer in the
message loop, and only convert or construct a numpy array when you need
vad_needed_samples for VAD (use slicing/pop from the bytearray or popleft from
deque to remove used samples); update code references around current_data,
np.frombuffer(msg, dtype=np.float32), and vad_needed_samples to perform the
conversion/consumption step atomically so you no longer call np.append for each
chunk.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 602449a6-3b32-49f1-86ab-1d7a96ea4ca8

📥 Commits

Reviewing files that changed from the base of the PR and between 4950d2d and 36147fb.

📒 Files selected for processing (1)
  • src/glados/audio_io/websocket_io.py

Comment thread src/glados/audio_io/websocket_io.py Outdated
Comment thread src/glados/audio_io/websocket_io.py
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
src/glados/audio_io/websocket_io.py (2)

169-172: ⚠️ Potential issue | 🟠 Major

Don't leave _is_playing latched when no speaker ever ACKs.

If no speaker is connected, Line 192 times out and returns, but _is_playing stays True. The next start_speaking() then hits the unbounded wait on Line 172 and can hang forever; src/glados/cli.py Lines 200-203 already call this path with wait=True. Handle the False return from both waits by forcing the track into a finished/interrupted state or by surfacing an error to the caller.

Also applies to: 190-192

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/glados/audio_io/websocket_io.py` around lines 169 - 172, When stopping
existing playback in start_speaking() you must handle the case where
_playback_finished_event.wait() (and the similar wait on lines ~190-192) returns
False due to a timeout so _is_playing does not remain latched; update the logic
in start_speaking() and stop_speaking() to check the boolean return of
_playback_finished_event.wait(), and if it returned False either (a) force the
track into a finished/interrupted state by clearing _is_playing and setting the
_playback_finished_event (or equivalent internal state) before proceeding, or
(b) raise/surface an error to the caller so the caller can handle the failed
stop; make this change around the uses of _is_playing, stop_speaking(),
_playback_finished_event.wait(), and start_speaking() so a timed-out wait cannot
leave _is_playing True.

133-140: ⚠️ Potential issue | 🟠 Major

Propagate every startup failure through startup_future, not just OSError.

Right now only the OSError path completes the future exceptionally. If _run_server() fails earlier in setup or raises a different exception type, Line 140 degrades the real startup error into a generic 10-second timeout.

💡 Suggested fix
     async def _run_server(self, server: str, port: int, result_future: concurrent.futures.Future) -> None:
         """Runs the websocket server.
@@
-        self._mic_state_lock = asyncio.Lock()
-
-        # re-route logging of websockets
-        class LogAdapter(logging.Handler):
-            def emit(self, record: logging.LogRecord) -> None:
-                msg = self.format(record)
-                level = record.levelname.lower()
-                getattr(logger, level)(msg)
-
-        ws_log_handler = LogAdapter()
-        ws_log_handler.setFormatter(logging.Formatter("[%(asctime)s] %(name)s %(message)s"))
-
-        ws_logger = logging.getLogger("websockets")
-        ws_logger.addHandler(ws_log_handler)
-        ws_logger.propagate = False
-
         try:
+            self._mic_state_lock = asyncio.Lock()
+
+            # re-route logging of websockets
+            class LogAdapter(logging.Handler):
+                def emit(self, record: logging.LogRecord) -> None:
+                    msg = self.format(record)
+                    level = record.levelname.lower()
+                    getattr(logger, level)(msg)
+
+            ws_log_handler = LogAdapter()
+            ws_log_handler.setFormatter(logging.Formatter("[%(asctime)s] %(name)s %(message)s"))
+
+            ws_logger = logging.getLogger("websockets")
+            ws_logger.addHandler(ws_log_handler)
+            ws_logger.propagate = False
+
             server = await websockets.serve(self._server_listen, host=server, port=port)
             result_future.set_result(None)
-        except OSError as ex:
-            result_future.set_exception(ex)
+        except Exception as ex:
+            if not result_future.done():
+                result_future.set_exception(ex)
             raise

Also applies to: 260-290

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/glados/audio_io/websocket_io.py` around lines 133 - 140, The
startup_future only gets set_exception for OSError, so any other exceptions from
_run_server or earlier setup are lost and manifest as a timeout; modify the
thread start wrapper (the lambda/target used for self._server_thread) to catch
all exceptions from asyncio.run(self._run_server(...)) and call
startup_future.set_exception(exc) for any Exception (not just OSError), ensuring
startup_future is also set_result on success; locate the threading.Thread
creation and the lambda/target that calls asyncio.run and add a broad
try/except/finally around the call to propagate all failures through
startup_future (also apply the same change to the other server-thread block
around the lines that correspond to the second server start).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/glados/audio_io/websocket_io.py`:
- Around line 177-181: The AudioData instance stored under self._audio_data
inside start_speaking() currently holds a reference to caller-owned audio_data
which can be mutated later; to fix this, make a deep copy of the PCM buffer
before constructing AudioData so the _server_speaker() thread serializes
immutable data; update the start_speaking() block that acquires self._audio_lock
to copy audio_data (e.g., via bytes(...) or an explicit buffer copy) and then
set self._audio_data = AudioData(copied_buffer, sample_rate, play_time,
uuid.uuid4()), leaving all other logic and symbols (_audio_lock, _audio_data,
AudioData, _server_speaker, start_speaking) unchanged.
- Around line 456-491: The bytes-handling branch only processes one
vad_needed_samples window per recv, so large incoming chunks leave additional
full VAD frames buffered and potentially lost on disconnect; inside the bytes
and self._is_listening branch (where current_data is appended) replace the
single-frame processing with a loop: while len(current_data) >=
vad_needed_samples, slice vad_data = current_data[:vad_needed_samples], advance
current_data, run vad_model on vad_data and perform the same _mic_state and
_sample_queue logic (including updating room, silence_chunks, and current_id)
for each drained frame so all complete VAD frames are enqueued before returning
from recv.
- Around line 412-416: The reset path can fail before
set_flags_once(current_track_id, True) is called (e.g., if await
websocket.send("reset") raises), leaving _playback_finished_event unset and
causing deadlocks; modify the reset branch in the function handling
_stop_playback so that websocket.send("reset") is wrapped in try/except/finally
(or try/finally) and ensure set_flags_once(current_track_id, True) is always
executed in the finally block (log any send errors but do not skip the
finalization), referencing the existing websocket.send("reset"), set_flags_once,
_stop_playback, _playback_finished_event and current_track_id symbols.

---

Duplicate comments:
In `@src/glados/audio_io/websocket_io.py`:
- Around line 169-172: When stopping existing playback in start_speaking() you
must handle the case where _playback_finished_event.wait() (and the similar wait
on lines ~190-192) returns False due to a timeout so _is_playing does not remain
latched; update the logic in start_speaking() and stop_speaking() to check the
boolean return of _playback_finished_event.wait(), and if it returned False
either (a) force the track into a finished/interrupted state by clearing
_is_playing and setting the _playback_finished_event (or equivalent internal
state) before proceeding, or (b) raise/surface an error to the caller so the
caller can handle the failed stop; make this change around the uses of
_is_playing, stop_speaking(), _playback_finished_event.wait(), and
start_speaking() so a timed-out wait cannot leave _is_playing True.
- Around line 133-140: The startup_future only gets set_exception for OSError,
so any other exceptions from _run_server or earlier setup are lost and manifest
as a timeout; modify the thread start wrapper (the lambda/target used for
self._server_thread) to catch all exceptions from
asyncio.run(self._run_server(...)) and call startup_future.set_exception(exc)
for any Exception (not just OSError), ensuring startup_future is also set_result
on success; locate the threading.Thread creation and the lambda/target that
calls asyncio.run and add a broad try/except/finally around the call to
propagate all failures through startup_future (also apply the same change to the
other server-thread block around the lines that correspond to the second server
start).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f0b9396a-5bcd-4696-90a2-376441107884

📥 Commits

Reviewing files that changed from the base of the PR and between b37c7ef and 57c5e13.

📒 Files selected for processing (1)
  • src/glados/audio_io/websocket_io.py

Comment thread src/glados/audio_io/websocket_io.py Outdated
Comment thread src/glados/audio_io/websocket_io.py Outdated
Comment thread src/glados/audio_io/websocket_io.py
Speaker: copy audio data into Lock instead of reference exception handling on reset
Microphone: always process as much audio as possible
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/glados/audio_io/websocket_io.py (1)

283-285: Variable shadowing: server parameter reassigned to websocket server object.

The server parameter (the host address string) is shadowed by the assignment to the websocket server object on line 284. This works but reduces readability.

♻️ Suggested fix
-        try:
-            server = await websockets.serve(self._server_listen, host=server, port=port)
+        try:
+            ws_server = await websockets.serve(self._server_listen, host=server, port=port)
             result_future.set_result(None)
         except OSError as ex:
             result_future.set_exception(ex)
             raise
 
-        await server.serve_forever()
+        await ws_server.serve_forever()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/glados/audio_io/websocket_io.py` around lines 283 - 285, The host
parameter named "server" is being shadowed when you assign the websockets.serve
return to it; rename the local variable to avoid confusion (e.g., ws_server or
server_obj) in the function/method containing the websockets.serve call (the
call referencing self._server_listen and result_future.set_result), update any
subsequent references that expect the websockets server object to use the new
name, and keep the original "server" parameter as the host string for clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/glados/audio_io/websocket_io.py`:
- Around line 169-173: The wait on self._playback_finished_event after calling
stop_speaking can block forever if no speakers are connected; update the logic
in the block that checks self._is_playing (the call site that currently calls
stop_speaking() and then self._playback_finished_event.wait()) to use a bounded
wait (e.g., self._playback_finished_event.wait(timeout=<reasonable-duration>))
and handle the timeout case (log a warning and proceed or forcibly clear state)
so start_speaking/stop_speaking and the _is_playing/_playback_finished_event
state cannot hang indefinitely.

---

Nitpick comments:
In `@src/glados/audio_io/websocket_io.py`:
- Around line 283-285: The host parameter named "server" is being shadowed when
you assign the websockets.serve return to it; rename the local variable to avoid
confusion (e.g., ws_server or server_obj) in the function/method containing the
websockets.serve call (the call referencing self._server_listen and
result_future.set_result), update any subsequent references that expect the
websockets server object to use the new name, and keep the original "server"
parameter as the host string for clarity.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c11ac4b6-e208-49c1-823e-bb00d4766f04

📥 Commits

Reviewing files that changed from the base of the PR and between 57c5e13 and 0e4a229.

📒 Files selected for processing (1)
  • src/glados/audio_io/websocket_io.py

Comment thread src/glados/audio_io/websocket_io.py
@dnhkng
Copy link
Copy Markdown
Owner

dnhkng commented Apr 3, 2026

@reisbauer03 This requires a more extensive review than I can do on a mobile phone while ok the beach on holiday; I will get to it when I am home late April. Same for the other PRs. Thanks for the work!

@reisbauer03
Copy link
Copy Markdown
Contributor Author

@coderabbitai resume

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 6, 2026

✅ Actions performed

Reviews resumed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants