Conversation
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 Walkthrough
Adds a WebSocket-based audio I/O backend with /speaker and /microphone endpoints (VAD-driven mic arbitration, scheduled playback, optional speaker segregation); extends AudioProtocol and factory to accept backend options and a blocking play flag; wires config options through Glados; replaces direct sounddevice usage with the backend abstraction; adds docs, browser tests, and a websockets runtime dependency.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Browser as Browser Speaker
    participant Server as WebsocketAudioIO
    participant AudioCtx as Browser AudioContext
    rect rgba(200,220,255,0.5)
    Browser->>Server: CONNECT /speaker\nsend "room:<tag>"
    Browser->>Server: "sync_ping"
    Server->>Browser: "sync_pong:<ts>"
    Server->>Browser: "sampleRate:<hz>"
    Server->>Browser: "time:<play_time>"
    Server->>Browser: <Float32 PCM bytes>
    Browser->>AudioCtx: schedule playback at (time + offset)
    AudioCtx-->>Browser: playback finished
    Browser->>Server: "played"
    end
```

```mermaid
sequenceDiagram
    autonumber
    participant MicBrowser as Browser Microphone
    participant Server as WebsocketAudioIO
    participant VAD as VAD Engine
    participant Queue as Sample Queue
    rect rgba(220,255,200,0.5)
    MicBrowser->>Server: CONNECT /microphone\nsend "room:<tag>"
    Server->>MicBrowser: "sampleRate:<hz>"
    MicBrowser->>Server: <Float32 PCM chunk>
    Server->>VAD: run VAD on chunk
    VAD-->>Server: vad_confidence
    Server->>Server: arbitrate MicState (owner/silence)
    alt client owns mic
        Server->>Queue: enqueue (samples, vad_confidence)
    else not owner
        Server--xMicBrowser: drop samples
    end
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 6
🧹 Nitpick comments (6)
src/glados/tools/slow_clap.py (1)
97-99: Broad exception catch is acceptable here given backend abstraction.

The static analysis flags `Exception` as too broad (BLE001), but this is reasonable since different audio backends (sounddevice, websocket) may raise different exception types. The error message and logging provide sufficient context for debugging. If you want to be more explicit, you could consider logging the exception type:
Optional: More specific error logging

```diff
 except Exception as other_error:
-    error_msg = f"error: other (possibly audio device) - {other_error}"
+    error_msg = f"error: {type(other_error).__name__} (possibly audio device) - {other_error}"
     logger.error(f"SlowClap: {error_msg}")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/glados/tools/slow_clap.py` around lines 97 - 99, The broad except Exception catch in slow_clap.py is acceptable given multiple audio backends, but improve diagnostics by including the exception type or full traceback in the log; update the except block that builds error_msg and calls logger.error (around the SlowClap handling) to either log the exception type (e.g., include type(other_error).__name__ in the message) or call logger.exception so the traceback is captured, keeping the existing message context ("SlowClap") intact.

README_WEBSOCKET_PROTOCOL.md (1)
121-153: Add language specifier to fenced code blocks for lint compliance.

The message flow example code blocks are missing language specifiers (markdownlint MD040). Using `text` makes the intent clear.

Proposed fix

````diff
-```
+```text
 Client connects to /speaker
 Client: room:Living Room
 ...
````

Apply similarly to the other two code blocks at lines 133 and 148.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README_WEBSOCKET_PROTOCOL.md` around lines 121 - 153, The three fenced code blocks that show the message flows (the speaker example starting "Client connects to /speaker", the Microphone Endpoint Flow starting "Client connects to /microphone", and the sync ping/pong example starting "Client connects to /speaker") need a language specifier for markdownlint compliance; update each opening fence from ``` to ```text so the blocks read as ```text ... ``` to make intent explicit and satisfy MD040.

tests/audio-websocket-speaker.html (1)
273-279: Optional: Consider adding a retry limit to `sendPingInit`.

The `sendPingInit` function has no maximum retry count. While the retry loop is bounded by the WebSocket lifecycle (it stops once connected or when the connection closes and triggers reconnect), adding an explicit limit would be more defensive.

♻️ Proposed fix

```diff
-function sendPingInit(i) {
+function sendPingInit(i, maxRetries = 20) {
+  if (i >= maxRetries) {
+    console.warn("Failed to sync clock: max retries exceeded");
+    return;
+  }
   if (ws.readyState === WebSocket.OPEN) {
     sendPing();
   } else {
     setTimeout(() => sendPingInit(i + 1), 500);
   }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/audio-websocket-speaker.html` around lines 273 - 279, The sendPingInit loop (function sendPingInit) has no retry cap; add a maxRetries parameter and counter (e.g., maxRetries = 10) and increment on each scheduled setTimeout; if ws.readyState !== WebSocket.OPEN and retries >= maxRetries, stop retrying and call a fallback (e.g., log via console.warn/processLogger and/or invoke the existing reconnect logic) instead of scheduling another setTimeout; ensure sendPingInit and any callers pass/initialize the retry count and reference ws and sendPing names unchanged.

tests/audio-websocket-mic.html (2)
190-193: Consider revoking the blob URL to prevent memory leak.

The blob URL created for the AudioWorklet module is never revoked. While minor for a test page, it's good practice to clean up.
♻️ Proposed fix
```diff
 const blob = new Blob([workletCode], {type: 'application/javascript'});
 const workletUrl = URL.createObjectURL(blob);
 await micAudioCtx.audioWorklet.addModule(workletUrl);
+URL.revokeObjectURL(workletUrl);
 const workletNode = new AudioWorkletNode(micAudioCtx, "mic-processor");
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/audio-websocket-mic.html` around lines 190 - 193, The Blob URL created as workletUrl for the AudioWorklet module is never revoked; after awaiting micAudioCtx.audioWorklet.addModule(workletUrl) revoke the object URL to avoid leaks by calling URL.revokeObjectURL(workletUrl) (ideally in a finally block or immediately after the await) so the temporary blob is released; locate the blob and workletUrl variables and the addModule call to insert the revoke.
75-87: Duplicate click listeners may cause race condition.

There are two separate `click` event listeners on `startBtn`. Both will fire when the button is clicked:

- Lines 75-84: Sets flags and calls `micConnect()`
- Lines 85-87: Calls `navigator.mediaDevices.getUserMedia()`

This can cause issues because `micStream` might be set asynchronously by the second listener while `startMic()` (line 168) also attempts to get the stream if `micStream` is null. Consider consolidating into a single listener.

♻️ Proposed consolidation

```diff
-startBtn.addEventListener('click', () => {
+startBtn.addEventListener('click', async () => {
   userHasInteracted = true;
   startBtn.disabled = true;
   startBtn.innerText = "Audio Enabled";
+  // Get microphone stream first
+  micStream = await navigator.mediaDevices.getUserMedia({audio: {channelCount: 1}});
+  // Start microphone
   if (!micWs || micWs.readyState === WebSocket.CLOSED) {
     micConnect();
   }
 });
-startBtn.addEventListener('click', async () => {
-  micStream = await navigator.mediaDevices.getUserMedia({audio: {channelCount: 1}});
-})
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/audio-websocket-mic.html` around lines 75 - 87, The button has two click listeners causing a race between setting flags/connect and acquiring micStream; consolidate into a single startBtn click handler that (1) sets userHasInteracted, disables the button and updates its text, (2) awaits navigator.mediaDevices.getUserMedia({audio:{channelCount:1}}) and assigns micStream, handling errors, and then (3) calls micConnect() (or startMic() if that’s the intended flow) only after micStream is set; remove the duplicate listener so micStream is not populated concurrently and ensure any fallback in startMic() checks for null safely.

src/glados/audio_io/websocket_io.py (1)
46-47: Binding to all interfaces (0.0.0.0) is intentional but consider documenting security implications.

Static analysis (S104) flags binding to `0.0.0.0`. This is intentional for a websocket server accepting remote connections, but consider adding a comment noting that this exposes the service on all network interfaces and should be firewalled in production if not intended.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/glados/audio_io/websocket_io.py` around lines 46 - 47, The SERVER constant currently set to "0.0.0.0" exposes the websocket server on all network interfaces; update the comment next to SERVER (and optionally mention PORT) to document this security implication by noting that binding to 0.0.0.0 allows remote connections from any interface and that this should be restricted, firewalled, or changed to a specific interface in production if not intended; include guidance to prefer a configurable value (env/config) and to ensure appropriate firewalling or access controls.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/glados/audio_io/websocket_io.py`:
- Line 293: Replace the direct type comparison in the websocket message
conditional so it uses isinstance(ws_msg, str) instead of type(ws_msg) == str;
update the branch in the conditional that checks ws_msg (the line with "elif
type(ws_msg) == str and ws_msg.startswith('room')") so it reads an isinstance
check, ensuring subclasses of str are handled correctly while preserving the
existing startswith('room') logic and control flow in the message handler.
- Line 402: Replace the type equality check "type(msg) == str" with the Pythonic
isinstance check "isinstance(msg, str)" where the websocket message is examined
(the conditional that currently reads 'if type(msg) == str and
msg.startswith("room")'); update that conditional in the message-handling
function/method that uses the variable msg so it becomes "if isinstance(msg,
str) and msg.startswith('room')", ensuring correct behavior for subclasses of
str and clearer intent.
- Around line 36-41: The module/class docstring incorrectly mentions "using
sounddevice" even though this file implements audio I/O over WebSocket; update
the docstring in src/glados/audio_io/websocket_io.py (the module/class docstring
for WebSocketAudioIO/WebsocketAudioIO) to accurately describe that this is a
WebSocket-based Audio I/O implementation (handling sending/receiving audio
frames over WebSocket, VAD integration, and playback/capture orchestration) and
remove any references to sounddevice.
- Around line 343-351: The WAITING PHASE currently treats any
non-`sync_ping`/`room:*` message as an ACK; change the logic so playback is only
considered acknowledged when the received message equals the explicit "played"
ACK (as sent by tests). Locate the await websocket.recv() loop in
websocket_io.py (the WAITING PHASE) and update either the handle_default_msg
function or the loop's conditional to check message == "played" (or the exact
expected payload) before setting self._playback_was_interrupted = False and
breaking; ignore or continue on other unexpected/malformed messages.
- Around line 299-364: The handler currently uses shared flags (_is_playing,
_playback_finished_event, _playback_was_interrupted) so the first connection
that ACKs ("played") clears playback for everyone; change to per-connection ACK
tracking so completion is only signaled once all active speakers have
acknowledged. Add a per-connection identifier (e.g., websocket or speaker_id)
and maintain a set/dict like pending_play_acks keyed by that id; in
_server_speaker replace the code path that on ACK sets _is_playing=False /
_playback_finished_event.set() with code that marks that connection as ACKed
(via pending_play_acks.discard(id) or pending_play_acks[id]=True) and only when
the central playback manager (the code that sets _is_playing to True and sends
audio) observes pending_play_acks empty should it clear _is_playing, set
_playback_finished_event, and update _playback_was_interrupted; also ensure
per-connection resets/send "reset" are done only for that websocket and that the
segregated-room logic still checks room before waiting for its own ACK (see
_server_speaker, handle_default_msg, _audio_lock, and where
_is_playing/_playback_finished_event/_playback_was_interrupted are mutated).
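The per-connection ACK change described in the comment above can be sketched as a small tracker. This is a hypothetical helper, not code from the PR: the class name `PlaybackAckTracker` and the idea of one id per speaker connection are illustrative, and `finished` stands in for the review's `_playback_finished_event`:

```python
import threading

class PlaybackAckTracker:
    """Hypothetical sketch: signal playback completion only after every
    active speaker connection has sent its "played" ACK."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: set = set()          # connection ids still owing an ACK
        self.finished = threading.Event()   # stands in for _playback_finished_event

    def start_playback(self, connection_ids) -> None:
        # Called by the central playback manager when a chunk is sent out.
        with self._lock:
            self._pending = set(connection_ids)
            self.finished.clear()
            if not self._pending:           # no speakers connected at all
                self.finished.set()

    def ack(self, connection_id) -> None:
        # Called from each connection handler when it receives "played".
        with self._lock:
            self._pending.discard(connection_id)
            if not self._pending:
                self.finished.set()
```

A connection that drops mid-playback would also need to be discarded from the pending set on disconnect, otherwise the event would never fire.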
In `@tests/audio-websocket-both.html`:
- Line 6: Update the <title> element that currently reads "GLaDOS Microphone" so
it accurately reflects that the page contains both audio components (speaker and
microphone) — e.g., change the title string to "GLaDOS Speaker & Microphone" or
"GLaDOS Audio — Speaker and Microphone"; locate the <title> tag containing
"GLaDOS Microphone" in tests/audio-websocket-both.html and replace its text
accordingly.
---
Nitpick comments:
In `@README_WEBSOCKET_PROTOCOL.md`:
- Around line 121-153: The three fenced code blocks that show the message flows
(the speaker example starting "Client connects to /speaker", the Microphone
Endpoint Flow starting "Client connects to /microphone", and the sync ping/pong
example starting "Client connects to /speaker") need a language specifier for
markdownlint compliance; update each opening fence from ``` to ```text so the
blocks read as ```text ... ``` to make intent explicit and satisfy MD040.
In `@src/glados/audio_io/websocket_io.py`:
- Around line 46-47: The SERVER constant currently set to "0.0.0.0" exposes the
websocket server on all network interfaces; update the comment next to SERVER
(and optionally mention PORT) to document this security implication by noting
that binding to 0.0.0.0 allows remote connections from any interface and that
this should be restricted, firewalled, or changed to a specific interface in
production if not intended; include guidance to prefer a configurable value
(env/config) and to ensure appropriate firewalling or access controls.
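The configurable-bind guidance in the item above could be wired up roughly as follows. This is a sketch under assumptions: the env var name `GLADOS_WS_HOST` and helper `resolve_bind_host` are invented for illustration, not defined by the project:

```python
import os

def resolve_bind_host(env_var: str = "GLADOS_WS_HOST", default: str = "127.0.0.1") -> str:
    """Hypothetical: bind to loopback unless explicitly overridden, so exposing
    the websocket server on all interfaces is an opt-in decision."""
    host = os.environ.get(env_var, "").strip()
    return host if host else default
```

The server would then pass `resolve_bind_host()` instead of a hard-coded `"0.0.0.0"`, and a deployment that genuinely wants remote access sets the variable (and firewalls accordingly).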
In `@src/glados/tools/slow_clap.py`:
- Around line 97-99: The broad except Exception catch in slow_clap.py is
acceptable given multiple audio backends, but improve diagnostics by including
the exception type or full traceback in the log; update the except block that
builds error_msg and calls logger.error (around the SlowClap handling) to either
log the exception type (e.g., include type(other_error).__name__ in the message)
or call logger.exception so the traceback is captured, keeping the existing
message context ("SlowClap") intact.
In `@tests/audio-websocket-mic.html`:
- Around line 190-193: The Blob URL created as workletUrl for the AudioWorklet
module is never revoked; after awaiting
micAudioCtx.audioWorklet.addModule(workletUrl) revoke the object URL to avoid
leaks by calling URL.revokeObjectURL(workletUrl) (ideally in a finally block or
immediately after the await) so the temporary blob is released; locate the blob
and workletUrl variables and the addModule call to insert the revoke.
- Around line 75-87: The button has two click listeners causing a race between
setting flags/connect and acquiring micStream; consolidate into a single
startBtn click handler that (1) sets userHasInteracted, disables the button and
updates its text, (2) awaits
navigator.mediaDevices.getUserMedia({audio:{channelCount:1}}) and assigns
micStream, handling errors, and then (3) calls micConnect() (or startMic() if
that’s the intended flow) only after micStream is set; remove the duplicate
listener so micStream is not populated concurrently and ensure any fallback in
startMic() checks for null safely.
In `@tests/audio-websocket-speaker.html`:
- Around line 273-279: The sendPingInit loop (function sendPingInit) has no
retry cap; add a maxRetries parameter and counter (e.g., maxRetries = 10) and
increment on each scheduled setTimeout; if ws.readyState !== WebSocket.OPEN and
retries >= maxRetries, stop retrying and call a fallback (e.g., log via
console.warn/processLogger and/or invoke the existing reconnect logic) instead
of scheduling another setTimeout; ensure sendPingInit and any callers
pass/initialize the retry count and reference ws and sendPing names unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d3275803-1a71-4565-8dee-5de8d8ae3f79
📒 Files selected for processing (13)
- README.md
- README_WEBSOCKET_PROTOCOL.md
- configs/glados_websocket_config.yaml
- pyproject.toml
- src/glados/audio_io/__init__.py
- src/glados/audio_io/sounddevice_io.py
- src/glados/audio_io/websocket_io.py
- src/glados/cli.py
- src/glados/core/engine.py
- src/glados/tools/slow_clap.py
- tests/audio-websocket-both.html
- tests/audio-websocket-mic.html
- tests/audio-websocket-speaker.html
Actionable comments posted: 9
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@README_WEBSOCKET_PROTOCOL.md`:
- Line 121: In README_WEBSOCKET_PROTOCOL.md there are three unlabeled fenced
code blocks (the blocks beginning with the lines "Client connects to /speaker"
and "Client connects to /microphone" and the repeated speaker block) that
trigger MD040; fix them by adding a language identifier (e.g., text) to each
opening fence so they read ```text instead of ```, ensuring all three
occurrences are updated consistently.
- Line 128: The example flow uses the generic label "Client: ACK" but the
protocol defines the ACK message payload as "played"; update the flow example so
the client step sends the explicit payload "played" (replace "Client: ACK" with
"Client: played") and ensure any surrounding text or examples refer to the ACK
message by its defined payload name "played" to avoid mismatch.
In `@src/glados/audio_io/websocket_io.py`:
- Line 61: The constructor docstring for WebsocketIO.__init__ currently
references "Initialize the sounddevice audio I/O", which is stale and
misleading; update the docstring to accurately describe that this backend is
websocket-based (e.g., "Initialize the websocket-based audio I/O" or similar) so
anyone reading the class WebsocketIO (and its __init__ method) understands the
actual transport; make the change only to the docstring text with no behavior
changes.
- Around line 300-303: The room-tag parsing can raise IndexError when ws_msg
lacks a colon; update the handler that processes ws_msg (the branch using
ws_msg.startswith("room") which assigns to nonlocal room) to validate the
payload before accessing split()[1] — either check that ":" is present and the
right-hand part is non-empty or use a safe split with maxsplit=1 and verify
length, and if malformed log/warn and ignore the message (return False) instead
of crashing.
- Line 53: The SERVER constant currently defaults to "0.0.0.0" exposing the
websocket to all interfaces; change the default value of SERVER in
websocket_io.py to the loopback address "127.0.0.1" so the service binds to
localhost by default, and ensure any code that reads/overrides SERVER (e.g.,
WebsocketServer initialization or config loader) still accepts explicit
overrides (env/config) for remote binding; update any related documentation or
config comments to state that remote exposure requires an explicit change.
- Line 397: The sent microphone sample-rate frame is missing the required
prefix; change the send so it follows the protocol format "sampleRate:<hz>"
instead of sending only the numeric value — update the websocket.send call that
currently uses self.SAMPLE_RATE to send f"sampleRate:{self.SAMPLE_RATE}"
(referencing the existing self.SAMPLE_RATE and the websocket.send invocation in
websocket_io.py).
- Around line 412-415: The code assumes every incoming websocket message (msg)
is raw PCM bytes and directly calls np.frombuffer(msg, dtype=np.float32), which
will raise on text/control frames; update the handling in the websocket receive
loop (references: self._is_listening, current_data, msg, np.frombuffer) to first
validate the payload is binary (bytes or bytearray) before decoding, e.g., skip
or log and ignore non-bytes messages, and wrap the np.frombuffer call in a
minimal try/except to catch/ignore decoding errors so the mic handling doesn't
crash.
- Around line 175-176: The current wait path uses
self._playback_finished_event.wait() with no timeout which can block forever;
change the waiting logic in the method that checks "if wait:" to call
self._playback_finished_event.wait(timeout=some_seconds) (or accept a
wait_timeout parameter) and then check the boolean return value; if it returns
False, log or raise a timeout/RuntimeError so callers don't hang. Update any
callers or signature as needed to expose a configurable timeout and reference
_playback_finished_event in the adjusted logic.
- Around line 337-342: The code currently holds self._audio_lock (a
threading.Lock) across await websocket.send() calls; change it to acquire the
lock only to synchronously snapshot the needed fields and a safe copy of the
buffer, then release the lock before any awaits. Concretely: inside the block
guarded by self._audio_lock read self._audio_data.play_time and .sample_rate
into local vars and make a thread-safe copy of the audio bytes (e.g., data_bytes
= self._audio_data.data.tobytes() or data_copy =
self._audio_data.data.copy().tobytes()), release the lock, then call await
websocket.send(...) three times using those locals (play_time, sample_rate,
data_bytes). Ensure you reference self._audio_lock, self._audio_data, and
websocket.send in the change and do not perform any awaits while holding the
lock.
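As a concrete illustration of the room-tag validation item above, a defensive parser might look like this; the helper name `parse_room_tag` is invented for the sketch:

```python
def parse_room_tag(msg):
    """Return the room name from a 'room:<tag>' frame, or None if malformed."""
    if not isinstance(msg, str) or not msg.startswith("room"):
        return None
    parts = msg.split(":", maxsplit=1)   # safe even when no colon is present
    if len(parts) != 2 or not parts[1].strip():
        return None                      # e.g. "room" or "room:" -- ignore, don't crash
    return parts[1].strip()
```

The handler would log and skip a `None` result instead of raising `IndexError` on a malformed frame.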
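The binary-frame guard suggested for the microphone loop above can be sketched with the stdlib `array` module standing in for `np.frombuffer` (the function name `decode_pcm_frame` is hypothetical):

```python
from array import array

def decode_pcm_frame(msg):
    """Decode a binary float32 PCM frame; return None for text/control frames."""
    if not isinstance(msg, (bytes, bytearray)):
        return None              # e.g. a stray "room:<tag>" text frame
    if len(msg) % 4 != 0:
        return None              # not a whole number of float32 samples
    try:
        # equivalent in spirit to np.frombuffer(msg, dtype=np.float32)
        return list(array("f", bytes(msg)))
    except ValueError:
        return None
```

Returning `None` lets the receive loop drop bad frames without crashing the mic handler.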
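The bounded-wait item above amounts to swapping a bare `event.wait()` for a timed wait whose return value is checked; the 30-second default below is an arbitrary placeholder, not a value from the PR:

```python
import threading

def wait_for_playback(finished: threading.Event, timeout: float = 30.0) -> bool:
    """Bounded replacement for a bare event.wait(): returns False on timeout
    instead of blocking forever, so the caller can log a warning or raise."""
    completed = finished.wait(timeout=timeout)
    if not completed:
        # the real backend would route this through its logger instead
        print(f"playback ACK not received within {timeout:.1f}s")
    return completed
```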
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3a17d1ea-8395-4712-b0e4-dbd51f022a86
📒 Files selected for processing (3)
- README_WEBSOCKET_PROTOCOL.md
- src/glados/audio_io/websocket_io.py
- tests/audio-websocket-both.html
✅ Files skipped from review due to trivial changes (1)
- tests/audio-websocket-both.html
… protocol and validation fixes (review by coderabbit)
037a45f to d169799
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/audio-websocket-speaker.html`:
- Around line 99-107: When the websocket reconnects reapply the currently
selected room to the server so routing remains correct: in speakerWs.onopen (and
the reconnect path around the earlier 285-290 block) call the same routine that
the room-selection button uses to send the room tag (or directly send the room
message) — e.g., invoke the existing room-send handler/function with the current
selected room value so the server receives the room assignment on every fresh
connection.
- Around line 273-282: sendPingInit() currently retries forever for any non-OPEN
state; update it to stop retrying when the socket is CLOSING or CLOSED and/or
allow cancellation by storing the timer ID. Specifically, inside sendPingInit
(and where setTimeout is scheduled) check ws.readyState against
WebSocket.CLOSING and WebSocket.CLOSED and return early if either is true, or
assign the setTimeout return value to a variable (e.g., pingRetryTimer) and
clearTimeout(pingRetryTimer) from the ws.onclose handler to avoid orphaned
loops; ensure sendPingInit still calls sendPing() when ws.readyState ===
WebSocket.OPEN.
- Around line 127-135: The code currently closes and recreates speakerAudioCtx
on every packet; change the logic to only recreate when the incoming sample rate
differs from speakerCurrentSampleRate or when speakerAudioCtx is null/closed. In
the message handler that reads sampleRate and audio payload, compare the new
sample rate to speakerCurrentSampleRate, and if different (or if speakerAudioCtx
is falsy) then close existing speakerAudioCtx and create a new
window.AudioContext({ sampleRate: newRate }); otherwise reuse the existing
speakerAudioCtx; ensure you update speakerCurrentSampleRate when you create a
new context and honor userHasInteracted when deciding to create the context.
- Around line 139-142: The onended ACK uses mutable globals (speakerCurrentAudio
and speakerWs) so a delayed onended from a stopped or old source can send a
stale "played"; when scheduling playback (where you set speakerCurrentAudio and
install AudioBufferSourceNode.onended) capture the current source and current
WebSocket in local constants (e.g., scheduledSource and scheduledWs) and in the
onended handler only send speakerWs.send("played") if scheduledSource ===
speakerCurrentAudio and scheduledWs === speakerWs (or equivalently check a
captured connection id), thereby suppressing ACKs from stopped or replaced
sources; apply the same captured-check pattern to all places that attach onended
(and any delayed ACK logic around lines mentioned) so the ACK is bound to the
scheduled source+connection, not global mutable state.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5af4d73b-912f-425f-942c-553036f082ab
📒 Files selected for processing (7)
- README_WEBSOCKET_PROTOCOL.md
- configs/glados_websocket_config.yaml
- src/glados/audio_io/__init__.py
- src/glados/audio_io/websocket_io.py
- tests/audio-websocket-both.html
- tests/audio-websocket-mic.html
- tests/audio-websocket-speaker.html
✅ Files skipped from review due to trivial changes (4)
- tests/audio-websocket-both.html
- README_WEBSOCKET_PROTOCOL.md
- configs/glados_websocket_config.yaml
- src/glados/audio_io/websocket_io.py
🚧 Files skipped from review as they are similar to previous changes (2)
- src/glados/audio_io/__init__.py
- tests/audio-websocket-mic.html
Actionable comments posted: 1
♻️ Duplicate comments (1)
tests/audio-websocket-speaker.html (1)
142-144: ⚠️ Potential issue | 🔴 Critical

Suppress `played` for reset/cancelled sources.

Line 143 calls `stop()`, and `AudioBufferSourceNode.onended` still fires for that path. Because Lines 214-218 only capture the socket, a reset or replaced source can still send `"played"` on the same connection; `src/glados/audio_io/websocket_io.py:356-373` treats that as playback completion and can advance the next playback early.

Suggested change

```diff
-} else if (event.data === "reset" && speakerCurrentAudio) {
-  speakerCurrentAudio.stop();
+} else if (event.data === "reset" && speakerCurrentAudio) {
+  const source = speakerCurrentAudio;
+  speakerCurrentAudio = null;
+  source.onended = null;
+  source.stop();
   speakerStatus("Stream reset");
 }
@@
-speakerCurrentAudio = speakerAudioCtx.createBufferSource();
-speakerCurrentAudio.buffer = audioBuffer;
-speakerCurrentAudio.connect(speakerAudioCtx.destination);
+const source = speakerAudioCtx.createBufferSource();
+source.buffer = audioBuffer;
+source.connect(speakerAudioCtx.destination);
+speakerCurrentAudio = source;
@@
-speakerCurrentAudio.start(audioCtxPlayTime, playOffset);
+source.start(audioCtxPlayTime, playOffset);
@@
-let localWs = speakerWs;
-// send ACK on end
-speakerCurrentAudio.onended = () => {
-  if (localWs && localWs.readyState === WebSocket.OPEN) {
+const localWs = speakerWs;
+// send ACK on natural end only
+source.onended = () => {
+  if (
+    source === speakerCurrentAudio &&
+    localWs === speakerWs &&
+    localWs.readyState === WebSocket.OPEN
+  ) {
+    speakerCurrentAudio = null;
     localWs.send("played");
     speakerStatus("Current audio played.");
   }
 };
```

Does `AudioBufferSourceNode.onended` fire when `stop()` is called, and is guarding the callback with source identity the recommended way to suppress stale completion ACKs when cancelling playback?

Also applies to: 192-219
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/audio-websocket-speaker.html` around lines 142 - 144, The onended handler for speakerCurrentAudio still sends a "played" ACK even when stop() is called or the source is replaced; fix by tracking source identity or a cancelled flag so the onended callback only sends "played" for the active source: when creating a new AudioBufferSourceNode assign a unique id or set speakerCurrentAudio._id and store currentSpeakerId, clear or set a cancelled flag before calling stop() on the old source (or remove its onended handler), and have the onended callback check that the fired source's id matches currentSpeakerId (or that cancelled is false) before invoking speakerStatus or sending the "played" socket message.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/audio-websocket-speaker.html`:
- Around line 100-107: The onclose handler must fully tear down the audio graph
to avoid leaking AudioContexts and scheduled sources: when the socket closes,
explicitly stop and disconnect speakerCurrentAudio (if non-null) and set it to
null, and close speakerAudioCtx (if non-null) via its close() method and set it
to null; update the onclose handler that currently lives near the socket close
logic to perform these steps so resources are released before the next onopen
recreates speakerAudioCtx or speakerCurrentAudio.
---
Duplicate comments:
In `@tests/audio-websocket-speaker.html`:
- Around line 142-144: The onended handler for speakerCurrentAudio still sends a
"played" ACK even when stop() is called or the source is replaced; fix by
tracking source identity or a cancelled flag so the onended callback only sends
"played" for the active source: when creating a new AudioBufferSourceNode assign
a unique id or set speakerCurrentAudio._id and store currentSpeakerId, clear or
set a cancelled flag before calling stop() on the old source (or remove its
onended handler), and have the onended callback check that the fired source's id
matches currentSpeakerId (or that cancelled is false) before invoking
speakerStatus or sending the "played" socket message.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 87454683-bb72-4cdc-b8c8-809fdfb976cf
📒 Files selected for processing (2)
tests/audio-websocket-mic.htmltests/audio-websocket-speaker.html
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/audio-websocket-mic.html
@coderabbitai resume
✅ Actions performed
Reviews resumed.
♻️ Duplicate comments (1)
tests/audio-websocket-speaker.html (1)
142-145: ⚠️ Potential issue | 🔴 Critical

Suppress `played` ACKs for interrupted sources.

Line 143 still calls `stop()` without disabling the completion handler, and Lines 225-233 only bind the ACK to the socket. `AudioBufferSourceNode.onended` also fires for explicit stops, so a `reset` can still emit a stale `"played"` on the same connection. The server consumes a bare `"played"` as completion in `src/glados/audio_io/websocket_io.py:356-362`, which can let an interrupted chunk advance the next playback.

Suggested change

```diff
-      } else if (event.data === "reset" && speakerCurrentAudio) {
-        speakerCurrentAudio.stop();
+      } else if (event.data === "reset" && speakerCurrentAudio) {
+        const interruptedSource = speakerCurrentAudio;
+        speakerCurrentAudio = null;
+        interruptedSource.onended = null;
+        try {
+          interruptedSource.stop();
+        } catch (_err) {}
         speakerStatus("Stream reset");
       }
@@
-      speakerCurrentAudio = speakerAudioCtx.createBufferSource();
-      speakerCurrentAudio.buffer = audioBuffer;
-      speakerCurrentAudio.connect(speakerAudioCtx.destination);
+      const scheduledSource = speakerAudioCtx.createBufferSource();
+      scheduledSource.buffer = audioBuffer;
+      scheduledSource.connect(speakerAudioCtx.destination);
+      speakerCurrentAudio = scheduledSource;
@@
-      speakerCurrentAudio.start(audioCtxPlayTime, playOffset);
+      scheduledSource.start(audioCtxPlayTime, playOffset);
@@
-      let localWs = speakerWs;
-      // send ACK on end
-      speakerCurrentAudio.onended = () => {
-        if (localWs && localWs.readyState === WebSocket.OPEN) {
+      const localWs = speakerWs;
+      scheduledSource.onended = () => {
+        if (
+          scheduledSource === speakerCurrentAudio &&
+          localWs === speakerWs &&
+          localWs.readyState === WebSocket.OPEN
+        ) {
+          speakerCurrentAudio = null;
           localWs.send("played");
           speakerStatus("Current audio played.");
         }
       };
```

Also applies to: 205-233
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/audio-websocket-speaker.html` around lines 142 - 145, When handling the "reset" event you must suppress the onended "played" ACK before calling speakerCurrentAudio.stop(): set a short-lived flag (e.g. suppressPlayedAck = true) or clear/unbind the onended handler attached to speakerCurrentAudio, then call speakerCurrentAudio.stop() and update speakerStatus("Stream reset"); in the onended handler (or where the socket "played" ACK is emitted/bound) check that flag and only emit the "played" message when not suppressed, and finally clear the flag after the reset path completes so future playbacks can ACK normally; refer to speakerCurrentAudio.stop(), the AudioBufferSourceNode.onended handler, and the socket ACK binding code that currently emits "played".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@tests/audio-websocket-speaker.html`:
- Around line 142-145: When handling the "reset" event you must suppress the
onended "played" ACK before calling speakerCurrentAudio.stop(): set a
short-lived flag (e.g. suppressPlayedAck = true) or clear/unbind the onended
handler attached to speakerCurrentAudio, then call speakerCurrentAudio.stop()
and update speakerStatus("Stream reset"); in the onended handler (or where the
socket "played" ACK is emitted/bound) check that flag and only emit the "played"
message when not suppressed, and finally clear the flag after the reset path
completes so future playbacks can ACK normally; refer to
speakerCurrentAudio.stop(), the AudioBufferSourceNode.onended handler, and the
socket ACK binding code that currently emits "played".
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3d381fcd-43c8-412a-b9e4-d9f361050a26
📒 Files selected for processing (2)
tests/audio-websocket-mic.html
tests/audio-websocket-speaker.html
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/audio-websocket-mic.html
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/glados/audio_io/websocket_io.py`:
- Around line 128-129: The constructor currently spawns self._server_thread
(threading.Thread target=lambda s, p: asyncio.run(self._run_server(s, p))) and
returns immediately so failures inside websockets.serve in _run_server are lost;
change the flow so _run_server captures binding/startup errors and communicates
success or exception back to the creator (e.g., via a threading.Event plus a
shared exception container or concurrent.futures.Future) and have __init__ wait
for that signal and re-raise the exception to the caller if present; apply the
same pattern to the other daemon thread creation that uses asyncio.run at the
other site so server startup failures propagate instead of leaving the app
thinking the websocket server started.
- Around line 200-216: The timeout path currently uses elapsed time to compute
percentage_played so a wait timeout makes played_samples ≈ total and yields
100%; change the logic in the Server playback wait block so you compute
played_samples/percentage_played only for a successful completion or a real
interruption that set _playback_was_interrupted earlier, and if
_playback_finished_event.wait() returned False (not completed) then set
interrupted = True and force percentage_played = 0 (instead of deriving from
elapsed); adjust the code around _playback_finished_event.wait(),
_playback_was_interrupted, elapsed, played_samples and percentage_played to
perform the override after the completed/interrupted checks so a timed-out
(never ACKed) utterance is not treated as fully spoken.
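The override described in that comment boils down to a small decision rule; a minimal sketch under assumed names (`playback_result` and its parameters are illustrative, not the backend's actual API):

```python
def playback_result(completed: bool, interrupted: bool, elapsed: float,
                    total_samples: int, sample_rate: int) -> tuple[bool, float]:
    """Return (interrupted, percentage_played) for one playback attempt.

    A wait that timed out (completed is False) is treated as an interruption
    with 0% played, instead of deriving a bogus ~100% from elapsed time.
    """
    if not completed:
        # never ACKed: nothing was provably spoken
        return True, 0.0
    if interrupted:
        played_samples = min(int(elapsed * sample_rate), total_samples)
        return True, 100.0 * played_samples / total_samples
    return False, 100.0
```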
In `@src/glados/tools/slow_clap.py`:
- Around line 41-43: The constructor eagerly calls get_audio_system() even when
tool_config supplies "audio_io", which can create the wrong backend; change the
initialization so you first read audio_io = tool_config.get("audio_io") (no
default), and only call get_audio_system() to assign self.audio_io if that value
is None/missing (e.g., if tool_config.get("audio_io") is falsy then set
self.audio_io = get_audio_system()); keep self.audio_path assignment as-is and
update references to audio_io/self.audio_io in the SlowClap constructor
accordingly.
In `@tests/audio-websocket-mic.html`:
- Around line 79-91: The two separate startBtn click handlers cause duplicate
getUserMedia calls and premature disabling; consolidate into one async click
handler that first requests
navigator.mediaDevices.getUserMedia({audio:{channelCount:1}}), assigns
micStream, sets userHasInteracted, updates the button text/state only after
permission succeeds, then calls micConnect(); remove the other click listener
and ensure startMic() (the function invoked on sampleRate handshake) checks if
micStream already exists and does not call getUserMedia() again (adjust the
getUserMedia call at the startMic() fallback at the existing code around the
current lines 169-171).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a699fc67-511a-4536-830f-a81e7c2121be
📒 Files selected for processing (5)
src/glados/audio_io/sounddevice_io.py
src/glados/audio_io/websocket_io.py
src/glados/tools/slow_clap.py
tests/audio-websocket-mic.html
tests/audio-websocket-speaker.html
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/audio-websocket-speaker.html
```js
startBtn.addEventListener('click', () => {
  userHasInteracted = true;
  startBtn.disabled = true;
  startBtn.innerText = "Audio Enabled";

  // Start microphone
  if (!micWs || micWs.readyState === WebSocket.CLOSED) {
    micConnect();
  }
});
startBtn.addEventListener('click', async () => {
  micStream = await navigator.mediaDevices.getUserMedia({audio: {channelCount: 1}});
})
```
Collapse the start flow into one awaited click handler.
The first listener connects immediately while the second separately requests mic permission. If the sampleRate handshake arrives first, startMic() falls back to another getUserMedia() call at Lines 169-171, so one click can open two streams and later overwrite micStream. This also disables the button before permission succeeds, which leaves the page stuck after a denial.
🛠️ Proposed fix
```diff
-    startBtn.addEventListener('click', () => {
-      userHasInteracted = true;
-      startBtn.disabled = true;
-      startBtn.innerText = "Audio Enabled";
-
-      // Start microphone
-      if (!micWs || micWs.readyState === WebSocket.CLOSED) {
-        micConnect();
-      }
-    });
-    startBtn.addEventListener('click', async () => {
-      micStream = await navigator.mediaDevices.getUserMedia({audio: {channelCount: 1}});
-    })
+    startBtn.addEventListener('click', async () => {
+      if (userHasInteracted) return;
+      userHasInteracted = true;
+      startBtn.disabled = true;
+
+      try {
+        micStream = await navigator.mediaDevices.getUserMedia({audio: {channelCount: 1}});
+        startBtn.innerText = "Audio Enabled";
+
+        if (!micWs || micWs.readyState === WebSocket.CLOSED) {
+          micConnect();
+        }
+      } catch (err) {
+        userHasInteracted = false;
+        startBtn.disabled = false;
+        micStatus("Microphone permission denied");
+        console.error("Failed to access microphone:", err);
+      }
+    });
```

Also applies to: 169-171
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/audio-websocket-mic.html` around lines 79 - 91, The two separate
startBtn click handlers cause duplicate getUserMedia calls and premature
disabling; consolidate into one async click handler that first requests
navigator.mediaDevices.getUserMedia({audio:{channelCount:1}}), assigns
micStream, sets userHasInteracted, updates the button text/state only after
permission succeeds, then calls micConnect(); remove the other click listener
and ensure startMic() (the function invoked on sampleRate handshake) checks if
micStream already exists and does not call getUserMedia() again (adjust the
getUserMedia call at the startMic() fallback at the existing code around the
current lines 169-171).
It's a POC, I'm not going to beautify this any further. It works well enough.
…gation, slow_clap: lazy default init
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/glados/tools/slow_clap.py (1)
99-100: Broad exception catch is acceptable here for backend abstraction.

Static analysis flags `BLE001` for catching `Exception`, but this is intentional: the tool must handle various audio backends that may raise different exception types (e.g., `sounddevice.PortAudioError` vs. WebSocket-related errors). The exception is properly logged and reported to the LLM rather than silently swallowed. Consider documenting this intent with a brief comment if desired.

📝 Optional: Add clarifying comment

```diff
+        # Catch-all for audio backend errors (different backends raise different exceptions)
         except Exception as other_error:
             error_msg = f"error: other (possibly audio device) - {other_error}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/glados/tools/slow_clap.py` around lines 99 - 100, The broad except block using "except Exception as other_error" is intentional to centralize diverse backend/audio and websocket errors; add a brief explanatory comment above this except (referencing the except Exception as other_error and the error_msg assignment) describing why catching Exception is required for multiple audio backends, and suppress the BLE001 lint warning with an inline noqa comment (e.g., # noqa: BLE001) so static analysis won't flag it while preserving the current logging/reporting behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/glados/audio_io/websocket_io.py`:
- Around line 272-273: The logging formatter string passed to
LogAdapter.setFormatter is using an incorrect "$(name)s" token; update the
format in the logging.Formatter call used with ws_log_handler (the LogAdapter
instance) to use the correct Python logging specifier "%(name)s" so the line
with ws_log_handler.setFormatter(logging.Formatter(...)) becomes a format string
using "%(name)s" instead of "$(name)s".
---
Nitpick comments:
In `@src/glados/tools/slow_clap.py`:
- Around line 99-100: The broad except block using "except Exception as
other_error" is intentional to centralize diverse backend/audio and websocket
errors; add a brief explanatory comment above this except (referencing the
except Exception as other_error and the error_msg assignment) describing why
catching Exception is required for multiple audio backends, and suppress the
BLE001 lint warning with an inline noqa comment (e.g., # noqa: BLE001) so static
analysis won't flag it while preserving the current logging/reporting behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9d549a09-064b-4b62-a602-fd31783129f5
📒 Files selected for processing (2)
src/glados/audio_io/websocket_io.py
src/glados/tools/slow_clap.py
@coderabbitai resume
✅ Actions performed: Reviews resumed.
Actionable comments posted: 2
🧹 Nitpick comments (2)
src/glados/audio_io/websocket_io.py (2)
279-286: Variable `server` shadows function parameter.

Line 280 reassigns `server` (the websocket server object) over the `server` parameter (the address string). While functionally correct, this shadowing can cause confusion during maintenance.

♻️ Proposed fix to use distinct variable name

```diff
 try:
-    server = await websockets.serve(self._server_listen, host=server, port=port)
+    ws_server = await websockets.serve(self._server_listen, host=server, port=port)
     result_future.set_result(None)
 except OSError as ex:
     result_future.set_exception(ex)
     raise
-await server.serve_forever()
+await ws_server.serve_forever()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/glados/audio_io/websocket_io.py` around lines 279 - 286, The local variable named server is shadowing the function parameter server; rename the websocket server object to a distinct name (e.g., ws_server or websocket_server) where websockets.serve is assigned and where serve_forever() is awaited. Update the assignment line that now reads await websockets.serve(self._server_listen, host=server, port=port) to store the return in ws_server, replace result_future.set_result(None)/set_exception(ex) moves unchanged, and call await ws_server.serve_forever() so the parameter server remains unshadowed; reference the _server_listen coroutine and result_future handling when making the change.
434-442: Consider using a `bytearray` or `collections.deque` for streaming audio accumulation.

`np.append(current_data, data)` allocates a new array on each call, leading to O(n²) complexity over time as chunks accumulate. For streaming audio, this could become a bottleneck with high-frequency small chunks.

♻️ Alternative approach using bytearray

```diff
-        # currently stored samples
-        current_data = np.empty((0,), dtype=np.float32)
+        # buffer for accumulating samples
+        audio_buffer = bytearray()
```

Then in the loop:

```diff
-                data = np.frombuffer(msg, dtype=np.float32)
-                current_data = np.append(current_data, data)
+                audio_buffer.extend(msg)

                 # if enough current data is there, run it through the VAD
-                if len(current_data) >= vad_needed_samples:
+                vad_needed_bytes = vad_needed_samples * 4
+                if len(audio_buffer) >= vad_needed_bytes:
                     # get data for VAD
-                    vad_data = current_data[:vad_needed_samples]
+                    vad_data = np.frombuffer(audio_buffer[:vad_needed_bytes], dtype=np.float32)
                     # extra data stays for next VAD
-                    current_data = current_data[vad_needed_samples:]
+                    del audio_buffer[:vad_needed_bytes]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/glados/audio_io/websocket_io.py` around lines 434 - 442, The current loop uses np.append(current_data, data) which repeatedly reallocates and is O(n²); replace the accumulation strategy for current_data with a streaming buffer (e.g., a bytearray or collections.deque) to avoid repeated numpy copies, append raw bytes/chunks to that buffer in the message loop, and only convert or construct a numpy array when you need vad_needed_samples for VAD (use slicing/pop from the bytearray or popleft from deque to remove used samples); update code references around current_data, np.frombuffer(msg, dtype=np.float32), and vad_needed_samples to perform the conversion/consumption step atomically so you no longer call np.append for each chunk.
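A dependency-free sketch of the suggested byte-buffer drain, using `struct` in place of `np.frombuffer` (frame size, byte width, and function name are assumptions, not the module's real values):

```python
import struct

VAD_FRAME_SAMPLES = 512   # samples per VAD window (assumed)
BYTES_PER_SAMPLE = 4      # float32 PCM


def drain_vad_frames(buffer: bytearray, chunk: bytes) -> list[tuple[float, ...]]:
    """Append a raw chunk and pop every complete VAD frame.

    Amortized O(n): the bytearray is extended in place and consumed from
    the front, instead of reallocating the whole array for each chunk.
    """
    buffer.extend(chunk)
    frames = []
    frame_bytes = VAD_FRAME_SAMPLES * BYTES_PER_SAMPLE
    while len(buffer) >= frame_bytes:
        raw = bytes(buffer[:frame_bytes])
        del buffer[:frame_bytes]
        frames.append(struct.unpack(f"<{VAD_FRAME_SAMPLES}f", raw))
    return frames
```

Draining in a `while` loop also addresses the later finding that only one VAD window was processed per received message.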
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/glados/audio_io/websocket_io.py`:
- Around line 432-435: The bytes-to-float conversion in the message handler can
raise ValueError when the incoming msg length isn't a multiple of 4; before
calling np.frombuffer in the block guarded by isinstance(msg, bytes) and
self._is_listening, check that len(msg) % 4 == 0 and only call
np.frombuffer(msg, dtype=np.float32) when that holds, otherwise log a warning
(use the class logger or processLogger) and skip/drop the malformed chunk (or
increment a counter) so mic handling doesn't crash; additionally wrap
np.frombuffer in a try/except ValueError to defensively catch any remaining
malformed buffers and handle them the same way, referencing the same variables
current_data, self._is_listening and the np.frombuffer call.
- Around line 104-105: The assignment using bool(val) in the
"segregate_speakers" case will treat any non-empty string like "false" as True;
update the handler in websocket_io.py (the "segregate_speakers" case that sets
self._segregate_speakers) to detect and normalize string booleans: if val is
already a bool use it, if it's a string compare lowercased values against
accepted true tokens ("true","1","yes","y") and false tokens
("false","0","no","n") and set the attribute accordingly (defaulting to False
for unrecognized values); this ensures quoted "false" becomes False while
preserving actual boolean inputs.
---
Nitpick comments:
In `@src/glados/audio_io/websocket_io.py`:
- Around line 279-286: The local variable named server is shadowing the function
parameter server; rename the websocket server object to a distinct name (e.g.,
ws_server or websocket_server) where websockets.serve is assigned and where
serve_forever() is awaited. Update the assignment line that now reads await
websockets.serve(self._server_listen, host=server, port=port) to store the
return in ws_server, replace result_future.set_result(None)/set_exception(ex)
moves unchanged, and call await ws_server.serve_forever() so the parameter
server remains unshadowed; reference the _server_listen coroutine and
result_future handling when making the change.
- Around line 434-442: The current loop uses np.append(current_data, data) which
repeatedly reallocates and is O(n²); replace the accumulation strategy for
current_data with a streaming buffer (e.g., a bytearray or collections.deque) to
avoid repeated numpy copies, append raw bytes/chunks to that buffer in the
message loop, and only convert or construct a numpy array when you need
vad_needed_samples for VAD (use slicing/pop from the bytearray or popleft from
deque to remove used samples); update code references around current_data,
np.frombuffer(msg, dtype=np.float32), and vad_needed_samples to perform the
conversion/consumption step atomically so you no longer call np.append for each
chunk.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 602449a6-3b32-49f1-86ab-1d7a96ea4ca8
📒 Files selected for processing (1)
src/glados/audio_io/websocket_io.py
…rs could set exit flags multiple times
Actionable comments posted: 3
♻️ Duplicate comments (2)
src/glados/audio_io/websocket_io.py (2)
169-172: ⚠️ Potential issue | 🟠 Major

Don't leave `_is_playing` latched when no speaker ever ACKs.

If no speaker is connected, Line 192 times out and returns, but `_is_playing` stays `True`. The next `start_speaking()` then hits the unbounded wait on Line 172 and can hang forever; `src/glados/cli.py` Lines 200-203 already call this path with `wait=True`. Handle the `False` return from both waits by forcing the track into a finished/interrupted state or by surfacing an error to the caller.

Also applies to: 190-192
_is_playingstaysTrue. The nextstart_speaking()then hits the unbounded wait on Line 172 and can hang forever;src/glados/cli.pyLines 200-203 already call this path withwait=True. Handle theFalsereturn from both waits by forcing the track into a finished/interrupted state or by surfacing an error to the caller.Also applies to: 190-192
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/glados/audio_io/websocket_io.py` around lines 169 - 172, When stopping existing playback in start_speaking() you must handle the case where _playback_finished_event.wait() (and the similar wait on lines ~190-192) returns False due to a timeout so _is_playing does not remain latched; update the logic in start_speaking() and stop_speaking() to check the boolean return of _playback_finished_event.wait(), and if it returned False either (a) force the track into a finished/interrupted state by clearing _is_playing and setting the _playback_finished_event (or equivalent internal state) before proceeding, or (b) raise/surface an error to the caller so the caller can handle the failed stop; make this change around the uses of _is_playing, stop_speaking(), _playback_finished_event.wait(), and start_speaking() so a timed-out wait cannot leave _is_playing True.
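The bounded-wait handling suggested above can be sketched as follows (names are illustrative; the real code uses `_playback_finished_event` and `_is_playing`):

```python
import threading


def wait_for_playback_finished(finished_event: threading.Event,
                               clear_playing, timeout: float = 5.0) -> bool:
    """Bounded wait on the playback-finished event.

    If the wait times out (no speaker ever ACKed), forcibly clear the
    playing flag so the next start_speaking() cannot block forever.
    Returns True only when playback really finished.
    """
    completed = finished_event.wait(timeout=timeout)
    if not completed:
        clear_playing()       # release the latched _is_playing state
        finished_event.set()  # unblock any other waiter
    return completed
```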
133-140: ⚠️ Potential issue | 🟠 Major

Propagate every startup failure through `startup_future`, not just `OSError`.

Right now only the `OSError` path completes the future exceptionally. If `_run_server()` fails earlier in setup or raises a different exception type, Line 140 degrades the real startup error into a generic 10-second timeout.

💡 Suggested fix

```diff
     async def _run_server(self, server: str, port: int, result_future: concurrent.futures.Future) -> None:
         """Runs the websocket server.
@@
-        self._mic_state_lock = asyncio.Lock()
-
-        # re-route logging of websockets
-        class LogAdapter(logging.Handler):
-            def emit(self, record: logging.LogRecord) -> None:
-                msg = self.format(record)
-                level = record.levelname.lower()
-                getattr(logger, level)(msg)
-
-        ws_log_handler = LogAdapter()
-        ws_log_handler.setFormatter(logging.Formatter("[%(asctime)s] %(name)s %(message)s"))
-
-        ws_logger = logging.getLogger("websockets")
-        ws_logger.addHandler(ws_log_handler)
-        ws_logger.propagate = False
-
         try:
+            self._mic_state_lock = asyncio.Lock()
+
+            # re-route logging of websockets
+            class LogAdapter(logging.Handler):
+                def emit(self, record: logging.LogRecord) -> None:
+                    msg = self.format(record)
+                    level = record.levelname.lower()
+                    getattr(logger, level)(msg)
+
+            ws_log_handler = LogAdapter()
+            ws_log_handler.setFormatter(logging.Formatter("[%(asctime)s] %(name)s %(message)s"))
+
+            ws_logger = logging.getLogger("websockets")
+            ws_logger.addHandler(ws_log_handler)
+            ws_logger.propagate = False
+
             server = await websockets.serve(self._server_listen, host=server, port=port)
             result_future.set_result(None)
-        except OSError as ex:
-            result_future.set_exception(ex)
+        except Exception as ex:
+            if not result_future.done():
+                result_future.set_exception(ex)
             raise
```

Also applies to: 260-290
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/glados/audio_io/websocket_io.py` around lines 133 - 140, The startup_future only gets set_exception for OSError, so any other exceptions from _run_server or earlier setup are lost and manifest as a timeout; modify the thread start wrapper (the lambda/target used for self._server_thread) to catch all exceptions from asyncio.run(self._run_server(...)) and call startup_future.set_exception(exc) for any Exception (not just OSError), ensuring startup_future is also set_result on success; locate the threading.Thread creation and the lambda/target that calls asyncio.run and add a broad try/except/finally around the call to propagate all failures through startup_future (also apply the same change to the other server-thread block around the lines that correspond to the second server start).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/glados/audio_io/websocket_io.py`:
- Around line 177-181: The AudioData instance stored under self._audio_data
inside start_speaking() currently holds a reference to caller-owned audio_data
which can be mutated later; to fix this, make a deep copy of the PCM buffer
before constructing AudioData so the _server_speaker() thread serializes
immutable data; update the start_speaking() block that acquires self._audio_lock
to copy audio_data (e.g., via bytes(...) or an explicit buffer copy) and then
set self._audio_data = AudioData(copied_buffer, sample_rate, play_time,
uuid.uuid4()), leaving all other logic and symbols (_audio_lock, _audio_data,
AudioData, _server_speaker, start_speaking) unchanged.
- Around line 456-491: The bytes-handling branch only processes one
vad_needed_samples window per recv, so large incoming chunks leave additional
full VAD frames buffered and potentially lost on disconnect; inside the bytes
and self._is_listening branch (where current_data is appended) replace the
single-frame processing with a loop: while len(current_data) >=
vad_needed_samples, slice vad_data = current_data[:vad_needed_samples], advance
current_data, run vad_model on vad_data and perform the same _mic_state and
_sample_queue logic (including updating room, silence_chunks, and current_id)
for each drained frame so all complete VAD frames are enqueued before returning
from recv.
- Around line 412-416: The reset path can fail before
set_flags_once(current_track_id, True) is called (e.g., if await
websocket.send("reset") raises), leaving _playback_finished_event unset and
causing deadlocks; modify the reset branch in the function handling
_stop_playback so that websocket.send("reset") is wrapped in try/except/finally
(or try/finally) and ensure set_flags_once(current_track_id, True) is always
executed in the finally block (log any send errors but do not skip the
finalization), referencing the existing websocket.send("reset"), set_flags_once,
_stop_playback, _playback_finished_event and current_track_id symbols.
---
Duplicate comments:
In `@src/glados/audio_io/websocket_io.py`:
- Around line 169-172: When stopping existing playback in start_speaking() you
must handle the case where _playback_finished_event.wait() (and the similar wait
on lines ~190-192) returns False due to a timeout so _is_playing does not remain
latched; update the logic in start_speaking() and stop_speaking() to check the
boolean return of _playback_finished_event.wait(), and if it returned False
either (a) force the track into a finished/interrupted state by clearing
_is_playing and setting the _playback_finished_event (or equivalent internal
state) before proceeding, or (b) raise/surface an error to the caller so the
caller can handle the failed stop; make this change around the uses of
_is_playing, stop_speaking(), _playback_finished_event.wait(), and
start_speaking() so a timed-out wait cannot leave _is_playing True.
- Around line 133-140: The startup_future only gets set_exception for OSError,
so any other exceptions from _run_server or earlier setup are lost and manifest
as a timeout; modify the thread start wrapper (the lambda/target used for
self._server_thread) to catch all exceptions from
asyncio.run(self._run_server(...)) and call startup_future.set_exception(exc)
for any Exception (not just OSError), ensuring startup_future is also set_result
on success; locate the threading.Thread creation and the lambda/target that
calls asyncio.run and add a broad try/except/finally around the call to
propagate all failures through startup_future (also apply the same change to the
other server-thread block around the lines that correspond to the second server
start).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f0b9396a-5bcd-4696-90a2-376441107884
📒 Files selected for processing (1)
src/glados/audio_io/websocket_io.py
Speaker: copy audio data into lock instead of reference; exception handling on reset
Microphone: always process as much audio as possible
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/glados/audio_io/websocket_io.py (1)
283-285: Variable shadowing: `server` parameter reassigned to websocket server object.

The `server` parameter (the host address string) is shadowed by the assignment to the websocket server object on line 284. This works but reduces readability.

♻️ Suggested fix

```diff
 try:
-    server = await websockets.serve(self._server_listen, host=server, port=port)
+    ws_server = await websockets.serve(self._server_listen, host=server, port=port)
     result_future.set_result(None)
 except OSError as ex:
     result_future.set_exception(ex)
     raise
-await server.serve_forever()
+await ws_server.serve_forever()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/glados/audio_io/websocket_io.py` around lines 283 - 285, The host parameter named "server" is being shadowed when you assign the websockets.serve return to it; rename the local variable to avoid confusion (e.g., ws_server or server_obj) in the function/method containing the websockets.serve call (the call referencing self._server_listen and result_future.set_result), update any subsequent references that expect the websockets server object to use the new name, and keep the original "server" parameter as the host string for clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/glados/audio_io/websocket_io.py`:
- Around line 169-173: The wait on self._playback_finished_event after calling
stop_speaking can block forever if no speakers are connected; update the logic
in the block that checks self._is_playing (the call site that currently calls
stop_speaking() and then self._playback_finished_event.wait()) to use a bounded
wait (e.g., self._playback_finished_event.wait(timeout=<reasonable-duration>))
and handle the timeout case (log a warning and proceed or forcibly clear state)
so start_speaking/stop_speaking and the _is_playing/_playback_finished_event
state cannot hang indefinitely.
---
Nitpick comments:
In `@src/glados/audio_io/websocket_io.py`:
- Around line 283-285: The host parameter named "server" is being shadowed when
you assign the websockets.serve return to it; rename the local variable to avoid
confusion (e.g., ws_server or server_obj) in the function/method containing the
websockets.serve call (the call referencing self._server_listen and
result_future.set_result), update any subsequent references that expect the
websockets server object to use the new name, and keep the original "server"
parameter as the host string for clarity.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c11ac4b6-e208-49c1-823e-bb00d4766f04
📒 Files selected for processing (1)
src/glados/audio_io/websocket_io.py
@reisbauer03 This requires a more extensive review than I can do on a mobile phone while on the beach on holiday; I will get to it when I am home late April. Same for the other PRs. Thanks for the work!
@coderabbitai resume
✅ Actions performed: Reviews resumed.
This PR introduces the previously unimplemented WebSocket audio input/output backend.
New Configuration – A top‑level `audio_io_options` key is added so you can supply backend‑specific settings. A reference example is now in `glados_websocket_config.yaml`.

Protocol Documentation – The implemented WebSocket protocol is documented in `README_WEBSOCKET_PROTOCOL.md`. I verified the AI‑generated description for correctness and made a few edits.

Multi‑Channel Support – The backend can handle multiple audio inputs and outputs. Each input/output can be assigned a `room` tag (a simple text label) for matching purposes.

Speaker Segregation – When the backend option `segregate_speakers` is enabled, a speaker will only play audio from an input that shares the same tag. This is useful for multi‑room setups, letting GLaDOS “follow” you as you move between rooms.

Testing – I used the example client `tests/audio-websocket-both.html`; all features work as expected.

Summary by CodeRabbit
New Features
Documentation
Tests
Chores
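The `segregate_speakers` matching rule described in the PR summary can be sketched as a single predicate (`speaker_accepts` and its parameters are illustrative, not the backend's actual API):

```python
def speaker_accepts(speaker_room: str, source_room, segregate_speakers: bool) -> bool:
    """Decide whether a speaker should play an audio chunk.

    With segregation off, every speaker plays everything; with it on,
    a speaker only plays audio originating from its own room tag.
    """
    if not segregate_speakers:
        return True
    return source_room is not None and speaker_room == source_room
```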