
Add a wakeword recognition module based on OpenWakeWord #199

Open
reisbauer03 wants to merge 7 commits into dnhkng:main from reisbauer03:pr-wakeword

Conversation


@reisbauer03 reisbauer03 commented Apr 12, 2026

This PR adds a new wake-word recognition module based on OpenWakeWord, because I'm definitely unable to pronounce "GLaDOS" in a way the ASR recognizes.

  • The functionality is a stripped-down version of the OpenWakeWord code. Originally I wanted to add it as a dependency, but the package conflicts with onnxruntime-gpu and always pulls in a dependency on tflite, so I took model.py and utils.py from their code and integrated them into this project. The original license is preserved, and the original authorship is clearly visible in the files taken from the OpenWakeWord project. The Apache 2.0 license is compatible with the MIT license used for GLaDOS.
  • New configuration parameter wake_word_model_path: if this is set to an OpenWakeWord .onnx model, recognized text is only passed to the LLM if it contains the wake word. This configuration option takes precedence over the old wake_word option (if set).
  • 2 new model files are automatically downloaded with glados download, I suspect you will want to change the URLs.
  • No performance impact if the option isn't enabled.
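The option precedence described above can be sketched as follows; `should_pass_to_llm` and `detect_with_model` are hypothetical names for illustration, not functions from the GLaDOS codebase:

```python
# Illustrative sketch of the gating precedence described above.
# `should_pass_to_llm` and `detect_with_model` are hypothetical names,
# not functions from the GLaDOS codebase.

def detect_with_model(audio, model_path):
    """Placeholder: a real implementation runs the OpenWakeWord ONNX model."""
    return True

def should_pass_to_llm(audio, transcript, wake_word_model_path=None, wake_word=None):
    """Decide whether a recognized utterance reaches the LLM."""
    if wake_word_model_path is not None:
        # wake_word_model_path takes precedence over the old wake_word option
        return detect_with_model(audio, wake_word_model_path)
    if wake_word is not None:
        # Fall back to matching the wake word in the ASR transcript
        return wake_word.lower() in transcript.lower()
    return True  # no wake word configured: pass everything through
```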

Summary by CodeRabbit

  • New Features

    • Optional ONNX wake-word model via new configuration field; wake word must be exactly one word.
    • Model-based wake-word detection with improved streaming feature extraction for more reliable, lower-latency gating (ASR text gating kept as fallback).
  • Chores

    • Added OpenWakeWord model artifacts to download/verification registry and included Apache 2.0 license.


coderabbitai Bot commented Apr 12, 2026

📝 Walkthrough

Adds OpenWakeWord ONNX wake‑word support: new config field and model artifacts, CLI registry entries, engine wiring to pass a wake‑word model path, SpeechListener gating via ONNX recognizer (with ASR/text fallback), and a new ONNX-based wake‑word implementation plus license.

Changes

  • Configuration (configs/glados_config.yaml): Added Glados.wake_word_model_path: null and clarified that Glados.wake_word must be exactly one word.
  • Model Registry / CLI (src/glados/cli.py): Added OpenWakeWord artifacts to MODEL_DETAILS: models/openwakeword/melspec.onnx and models/openwakeword/embedding.onnx (URLs + checksums) for download/validation.
  • Engine Integration (src/glados/core/engine.py): GladosConfig gains a `wake_word_model_path: Path | None` field that is passed through to the speech listener.
  • Speech Listener (src/glados/core/speech_listener.py): SpeechListener accepts wake_word_path and may construct an OpenWakeWord recognizer; _process_detected_audio concatenates samples and uses predict_multi_sample to gate ASR when a model is present, otherwise uses text-based wake-word detection; asr now takes a single audio array; reset() moved to a finally block.
  • OpenWakeWord public wrapper (src/glados/openwakeword/__init__.py): New Model wrapper: sample rate, default ONNX paths, runtime provider selection, and reset/predict/predict_multi_sample APIs.
  • OpenWakeWord core model (src/glados/openwakeword/model.py): New ONNX-based inference engine: session loading, feature integration, streaming/frame buffering, optional custom verifiers, and public predict/predict_clip APIs.
  • OpenWakeWord feature utils (src/glados/openwakeword/utils.py): New AudioFeatures class for melspec/embedding extraction (streaming + offline), ONNX session handling, multithreaded embedding, and a re_arg deprecation helper.
  • License (src/glados/openwakeword/LICENSE): Added the Apache-2.0 license text for the included OpenWakeWord code.
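The public wrapper's interface (sample rate, reset/predict/predict_multi_sample) can be illustrated with a stub. The 16 kHz rate, 1280-sample chunk size, and 0.5 default threshold are assumptions for the sketch, and `predict` here returns a dummy score instead of running ONNX inference:

```python
import numpy as np

class Model:
    """Interface sketch of the wrapper summarized above (stubbed inference)."""

    SAMPLE_RATE = 16_000  # assumption: OpenWakeWord models expect 16 kHz audio

    def __init__(self, model_path: str, threshold: float = 0.5):
        self.model_path = model_path
        self.threshold = threshold  # assumed default gating threshold

    def reset(self) -> None:
        """Clear any streaming state between utterances."""

    def predict(self, chunk: np.ndarray) -> float:
        # Stub: a real implementation runs the ONNX session on one chunk.
        return 0.0

    def predict_multi_sample(self, audio: np.ndarray) -> bool:
        """Split audio into fixed-size chunks and gate on the maximum score."""
        min_samples = 1280  # assumption: one 80 ms frame at 16 kHz
        score = 0.0
        for start in range(0, len(audio) - min_samples + 1, min_samples):
            score = max(score, self.predict(audio[start:start + min_samples]))
        return score >= self.threshold
```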

Sequence Diagram(s)

sequenceDiagram
    participant Audio as Audio Buffer
    participant SL as SpeechListener
    participant WWR as OpenWakeWord Model
    participant ASR as ASR Engine
    participant Cmd as Command Handler

    Audio->>SL: _process_detected_audio(samples)
    SL->>SL: concatenate samples -> audio

    alt wake_word_model_path configured
        SL->>WWR: predict_multi_sample(audio)
        WWR-->>SL: detected:boolean
        alt not detected
            SL->>SL: return early
        else detected
            SL->>ASR: asr(audio)
            ASR-->>SL: transcript
            SL->>Cmd: handle transcript
        end
    else no model configured
        SL->>ASR: asr(audio)
        ASR-->>SL: transcript
        SL->>SL: _wakeword_detected(transcript)?
        alt contains wake word
            SL->>Cmd: handle transcript
        else
            SL->>SL: return early
        end
    end

    SL->>SL: reset() (finally)
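The flow in the diagram can be sketched in Python; the function and stub names below are illustrative stand-ins, not the actual SpeechListener code:

```python
import numpy as np

class StubRecognizer:
    """Stand-in for the OpenWakeWord Model wrapper (illustrative only)."""
    def __init__(self, hit: bool):
        self.hit = hit
    def predict_multi_sample(self, audio):
        return self.hit

def process_detected_audio(samples, recognizer, asr, handle, reset,
                           wake_word=None):
    """Sketch of the flow in the diagram above (names are illustrative)."""
    try:
        audio = np.concatenate(samples)
        if recognizer is not None:
            # Model-based gating: run ASR only after the wake word is heard.
            if not recognizer.predict_multi_sample(audio):
                return
            handle(asr(audio))
        else:
            # Text fallback: transcribe first, then check the transcript.
            text = asr(audio)
            if wake_word is None or wake_word.lower() in text.lower():
                handle(text)
    finally:
        reset()  # reset() runs on every path, per the diagram
```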

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • Refactor #142 — Modifies engine and SpeechListener for wake_word_model_path wiring and related listener changes (strong overlap).
  • Refactor Project Installation #112 — Updates CLI model registry/download logic; related to adding OpenWakeWord artifacts in MODEL_DETAILS.

Poem

🐇 I tuned my ears to mel and frame,

ONNX hummed and called the name,
Buffers hopped and scores took flight,
I guard the gate both day and night,
A rabbit's ears, alert and bright.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 76.67%, which is below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately summarizes the main change: adding a wakeword recognition module based on OpenWakeWord, the primary objective of this PR.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (6)
src/glados/openwakeword/utils.py (2)

55-58: Use exception chaining for better debugging context.

When re-raising an exception with a different message, use raise ... from err to preserve the original traceback.

🧹 Suggested fix
         try:
             import onnxruntime as ort
-        except ImportError:
-            raise ValueError("Tried to import onnxruntime, but it was not found. Please install it using `pip install onnxruntime`")
+        except ImportError as err:
+            raise ValueError("Tried to import onnxruntime, but it was not found. Please install it using `pip install onnxruntime`") from err
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/glados/openwakeword/utils.py` around lines 55 - 58, The ImportError
handler in the import block for onnxruntime should preserve the original
traceback; update the except clause in utils.py to capture the exception (e.g.,
"except ImportError as err") and re-raise the ValueError using exception
chaining ("raise ValueError(...) from err") so the original ImportError context
is preserved when raising the new error.

42-42: Consider using explicit T | None type annotation.

The onnx_providers parameter uses an implicit Optional (defaulting to None). While this works correctly (line 69-70 handles the None case), modern Python style prefers explicit union types.

🧹 Suggested fix
-                 onnx_providers: list[str] = None
+                 onnx_providers: list[str] | None = None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/glados/openwakeword/utils.py` at line 42, The parameter onnx_providers
currently uses an implicit Optional by being annotated as "list[str] = None";
update its type annotation to an explicit union such as "list[str] | None" (or
"Optional[list[str]]" and import Optional from typing for <3.10 compatibility)
on the function/constructor that declares onnx_providers so the signature
clearly reflects None is allowed while keeping the existing None-handling logic
intact.
src/glados/openwakeword/model.py (2)

113-114: Use exception chaining for better debugging context.

🧹 Suggested fix
-        except ImportError:
-            raise ValueError("Tried to import onnxruntime, but it was not found. Please install it using `pip install onnxruntime`")
+        except ImportError as err:
+            raise ValueError("Tried to import onnxruntime, but it was not found. Please install it using `pip install onnxruntime`") from err
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/glados/openwakeword/model.py` around lines 113 - 114, The except
ImportError block currently raises a ValueError without chaining the original
ImportError; change the handler to capture the ImportError (e.g., except
ImportError as e) and re-raise the ValueError using exception chaining (raise
ValueError("Tried to import onnxruntime...") from e) so the original traceback
is preserved for debugging in the module-level import in
src/glados/openwakeword/model.py.

59-60: Consider replacing mutable default arguments with None.

Mutable default arguments ([], {}) are a known Python anti-pattern. While this code handles them correctly by not mutating the defaults, the pattern can lead to subtle bugs if the code is modified later.

🧹 Suggested fix for __init__
     def __init__(
             self,
             wakeword_models: List[str],
-            class_mapping_dicts: List[dict] = [],
-            custom_verifier_models: dict = {},
+            class_mapping_dicts: List[dict] | None = None,
+            custom_verifier_models: dict | None = None,
             custom_verifier_threshold: float = 0.1,
-            onnx_providers: list[str] = None,
+            onnx_providers: list[str] | None = None,
             **kwargs
             ):
+        if class_mapping_dicts is None:
+            class_mapping_dicts = []
+        if custom_verifier_models is None:
+            custom_verifier_models = {}
🧹 Suggested fix for predict
-    def predict(self, x: np.ndarray, patience: dict = {},
-                threshold: dict = {}, debounce_time: float = 0.0, timing: bool = False):
+    def predict(self, x: np.ndarray, patience: dict | None = None,
+                threshold: dict | None = None, debounce_time: float = 0.0, timing: bool = False):
         ...
+        if patience is None:
+            patience = {}
+        if threshold is None:
+            threshold = {}

Note: This requires updating the checks from if patience != {} to checking both conditions.

Also applies to: 176-177

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/glados/openwakeword/model.py` around lines 59 - 60, The constructor for
the model uses mutable default arguments (class_mapping_dicts: List[dict] = []
and custom_verifier_models: dict = {}) which is an anti-pattern; change those
defaults to None in __init__ and inside __init__ set self.class_mapping_dicts =
class_mapping_dicts or [] and self.custom_verifier_models =
custom_verifier_models or {} to preserve current behavior. Update any methods
that check those parameters (e.g., predict and any checks that currently use
equality like `if patience != {}`) to instead check for None or use truthiness
(e.g., `if custom_verifier_models is not None` or `if patience is not None`) so
logic remains correct after the default change; update references to
class_mapping_dicts and custom_verifier_models accordingly.
src/glados/cli.py (1)

133-143: Update the docstring to include the new OpenWakeWord models.

The docstring lists specific models but doesn't include the newly added melspec.onnx and embedding.onnx models. Based on learnings, this function should have comprehensive documentation including the full model list.

📝 Suggested docstring update
 async def download_models() -> int:
     """
     Main async controller for downloading all the specified models:
         - ASR model: nemo-parakeet_tdt_ctc_110m.onnx
         - VAD model: silero_vad_16k_op15.onnx
         - TTS model: glados.onnx
         - Phonemizer model: phomenizer_en.onnx
+        - OpenWakeWord models: melspec.onnx, embedding.onnx
 
     Returns:
         int: Exit code (0 for success, 1 for failure)
     """
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/glados/cli.py` around lines 133 - 143, Update the download_models()
docstring to reflect all models being downloaded: keep the existing ASR
(nemo-parakeet_tdt_ctc_110m.onnx), VAD (silero_vad_16k_op15.onnx), TTS
(glados.onnx), and phonemizer (phonemizer_en.onnx) entries and add the new
OpenWakeWord models melspec.onnx and embedding.onnx so the docstring lists the
complete set of artifacts the function downloads.
src/glados/openwakeword/__init__.py (1)

30-31: Remove extraneous f-string prefixes.

These f-strings contain no placeholders.

🧹 Proposed fix
-    DEFAULT_MELSPEC_MODEL_PATH = resource_path(f"models/openwakeword/melspec.onnx")  # Model path to melspectogram model
-    DEFAULT_EMBEDDING_MODEL_PATH = resource_path(f"models/openwakeword/embedding.onnx")  # Model path to embedding model
+    DEFAULT_MELSPEC_MODEL_PATH = resource_path("models/openwakeword/melspec.onnx")  # Model path to melspectogram model
+    DEFAULT_EMBEDDING_MODEL_PATH = resource_path("models/openwakeword/embedding.onnx")  # Model path to embedding model
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/glados/openwakeword/__init__.py` around lines 30 - 31, The constants
DEFAULT_MELSPEC_MODEL_PATH and DEFAULT_EMBEDDING_MODEL_PATH use unnecessary
f-strings around static literals when calling resource_path; remove the
extraneous f-string prefixes so both calls use plain string literals (e.g., call
resource_path("models/openwakeword/melspec.onnx") and
resource_path("models/openwakeword/embedding.onnx")) to avoid misleading
f-string usage while keeping the same resource_path invocation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/glados/openwakeword/__init__.py`:
- Around line 84-106: The bug is that the local variable threshold is
overwritten with a dict and then used in a numeric comparison, causing a
TypeError; fix by keeping the scalar threshold separate and build a dict only
for the call to self._model.predict (e.g., create threshold_dict =
{self.model_name: threshold} or keep threshold_scalar and assign threshold_dict
for the predict call), similarly do the same for patience if needed, then call
self._model.predict(..., threshold=threshold_dict) and compare the returned
array to the original scalar (threshold_scalar) when evaluating > threshold;
ensure you reference self.model_name, self._model.predict, and the local
threshold variable names consistently.

In `@src/glados/openwakeword/model.py`:
- Around line 140-143: The code currently uses pickle.load to deserialize
user-provided custom verifier model files (the custom_verifier_models load block
in model.py), which is unsafe; replace this direct pickle.load with a safer
approach: either (a) require and use a safe, explicit serialization format for
verifier models (e.g., JSON/YAML for simple configs or a validated model format)
or (b) if binary model files are required, perform strict validation before
loading (ensure the path is in an allowed directory, check file
signatures/whitelists, and load inside a sandboxed/isolated process) and log a
clear security warning whenever a custom verifier model is loaded. Update the
code that assigns self.custom_verifier_models[mdl_name] (and related docs) to
perform the validation and emit the warning via the existing logger instead of
calling pickle.load directly.

---

Nitpick comments:
In `@src/glados/cli.py`:
- Around line 133-143: Update the download_models() docstring to reflect all
models being downloaded: keep the existing ASR
(nemo-parakeet_tdt_ctc_110m.onnx), VAD (silero_vad_16k_op15.onnx), TTS
(glados.onnx), and phonemizer (phonemizer_en.onnx) entries and add the new
OpenWakeWord models melspec.onnx and embedding.onnx so the docstring lists the
complete set of artifacts the function downloads.

In `@src/glados/openwakeword/__init__.py`:
- Around line 30-31: The constants DEFAULT_MELSPEC_MODEL_PATH and
DEFAULT_EMBEDDING_MODEL_PATH use unnecessary f-strings around static literals
when calling resource_path; remove the extraneous f-string prefixes so both
calls use plain string literals (e.g., call
resource_path("models/openwakeword/melspec.onnx") and
resource_path("models/openwakeword/embedding.onnx")) to avoid misleading
f-string usage while keeping the same resource_path invocation.

In `@src/glados/openwakeword/model.py`:
- Around line 113-114: The except ImportError block currently raises a
ValueError without chaining the original ImportError; change the handler to
capture the ImportError (e.g., except ImportError as e) and re-raise the
ValueError using exception chaining (raise ValueError("Tried to import
onnxruntime...") from e) so the original traceback is preserved for debugging in
the module-level import in src/glados/openwakeword/model.py.
- Around line 59-60: The constructor for the model uses mutable default
arguments (class_mapping_dicts: List[dict] = [] and custom_verifier_models: dict
= {}) which is an anti-pattern; change those defaults to None in __init__ and
inside __init__ set self.class_mapping_dicts = class_mapping_dicts or [] and
self.custom_verifier_models = custom_verifier_models or {} to preserve current
behavior. Update any methods that check those parameters (e.g., predict and any
checks that currently use equality like `if patience != {}`) to instead check
for None or use truthiness (e.g., `if custom_verifier_models is not None` or `if
patience is not None`) so logic remains correct after the default change; update
references to class_mapping_dicts and custom_verifier_models accordingly.

In `@src/glados/openwakeword/utils.py`:
- Around line 55-58: The ImportError handler in the import block for onnxruntime
should preserve the original traceback; update the except clause in utils.py to
capture the exception (e.g., "except ImportError as err") and re-raise the
ValueError using exception chaining ("raise ValueError(...) from err") so the
original ImportError context is preserved when raising the new error.
- Line 42: The parameter onnx_providers currently uses an implicit Optional by
being annotated as "list[str] = None"; update its type annotation to an explicit
union such as "list[str] | None" (or "Optional[list[str]]" and import Optional
from typing for <3.10 compatibility) on the function/constructor that declares
onnx_providers so the signature clearly reflects None is allowed while keeping
the existing None-handling logic intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ddd6d56a-e1cd-4de8-9160-0ae62584959f

📥 Commits

Reviewing files that changed from the base of the PR and between 878b144 and edd3be4.

📒 Files selected for processing (8)
  • configs/glados_config.yaml
  • src/glados/cli.py
  • src/glados/core/engine.py
  • src/glados/core/speech_listener.py
  • src/glados/openwakeword/LICENSE
  • src/glados/openwakeword/__init__.py
  • src/glados/openwakeword/model.py
  • src/glados/openwakeword/utils.py

Comment thread src/glados/openwakeword/__init__.py Outdated
Comment on lines +140 to +143
# Load custom verifier models
if isinstance(custom_verifier_models, dict):
if custom_verifier_models.get(mdl_name, False):
self.custom_verifier_models[mdl_name] = pickle.load(open(custom_verifier_models[mdl_name], 'rb'))

@coderabbitai coderabbitai Bot Apr 12, 2026


⚠️ Potential issue | 🟡 Minor

Security: pickle.load can execute arbitrary code.

Loading pickle files from user-provided paths is a security risk. Pickle deserialization can execute arbitrary code, so malicious pickle files could compromise the system.

Since this is for custom verifier models (an optional advanced feature) and the paths are user-configured, the risk is somewhat mitigated. However, consider adding a warning in documentation or logging when custom verifier models are loaded.

🧰 Tools
🪛 Ruff (0.15.9)

[error] 143-143: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue

(S301)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/glados/openwakeword/model.py` around lines 140 - 143, The code currently
uses pickle.load to deserialize user-provided custom verifier model files (the
custom_verifier_models load block in model.py), which is unsafe; replace this
direct pickle.load with a safer approach: either (a) require and use a safe,
explicit serialization format for verifier models (e.g., JSON/YAML for simple
configs or a validated model format) or (b) if binary model files are required,
perform strict validation before loading (ensure the path is in an allowed
directory, check file signatures/whitelists, and load inside a
sandboxed/isolated process) and log a clear security warning whenever a custom
verifier model is loaded. Update the code that assigns
self.custom_verifier_models[mdl_name] (and related docs) to perform the
validation and emit the warning via the existing logger instead of calling
pickle.load directly.

reisbauer03 (Contributor, Author) replied:
This is a) not my code, but from the openWakeWord project and b) the functionality isn't exposed in the public API / user configuration. Won't fix.

coderabbitai Bot replied:
@reisbauer03, understood — since this is vendored code from the openWakeWord project and the custom_verifier_models path is not exposed through the public API or user configuration, the practical risk is minimal. Acknowledged, no action needed.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
src/glados/openwakeword/__init__.py (2)

96-101: Remove unreachable threshold is None branch.

After Line 84, threshold is always a float, so this conditional can be simplified to a single dict assignment.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/glados/openwakeword/__init__.py` around lines 96 - 101, Remove the dead
conditional that creates an empty dict when threshold is None and always build
threshold_param mapping directly; replace the if/else around threshold with a
single assignment that sets threshold_param = { self.model_name: threshold }
(update references to threshold_param and ensure this change is applied in the
initializer/function where threshold and self.model_name are used).

128-132: Avoid repeated array slicing in chunk loop.

x = x[self.MIN_SAMPLES:] reallocates each iteration and scales poorly on long audio. Iterate by index to keep this linear and allocation-light.

♻️ Proposed refactor
-        prediction = 0
-        while len(x) >= self.MIN_SAMPLES:
-            current = x[:self.MIN_SAMPLES]
-            x = x[self.MIN_SAMPLES:]
-            prediction = max(self._model.predict(current)[self.model_name], prediction)
+        prediction = 0.0
+        for start in range(0, len(x) - self.MIN_SAMPLES + 1, self.MIN_SAMPLES):
+            current = x[start:start + self.MIN_SAMPLES]
+            prediction = max(self._model.predict(current)[self.model_name], prediction)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/glados/openwakeword/__init__.py` around lines 128 - 132, The loop
repeatedly slices x causing repeated reallocations; change it to iterate by
index (e.g., use a start pointer or a for-range stepping by self.MIN_SAMPLES) so
you call self._model.predict on x[start:start+self.MIN_SAMPLES] without
reassigning x each iteration, updating prediction with
max(self._model.predict(...)[self.model_name], prediction); this keeps memory
linear and avoids creating new arrays each loop while preserving the same logic
in the block using prediction, self.MIN_SAMPLES, self._model.predict and
self.model_name.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/glados/openwakeword/__init__.py`:
- Around line 86-87: The float-to-int16 conversion uses x = (x *
32767).astype(np.int16) which can overflow if samples fall outside [-1.0,1.0];
before scaling, clamp the array with np.clip(x, -1.0, 1.0) to avoid wraparound
(replace occurrences of the conversion in this module — the lines where x is
scaled/converted to int16, and the other occurrence around line 126 — with
x_clipped = np.clip(x, -1.0, 1.0) and then scale: x = (x_clipped *
32767).astype(np.int16)).

---

Nitpick comments:
In `@src/glados/openwakeword/__init__.py`:
- Around line 96-101: Remove the dead conditional that creates an empty dict
when threshold is None and always build threshold_param mapping directly;
replace the if/else around threshold with a single assignment that sets
threshold_param = { self.model_name: threshold } (update references to
threshold_param and ensure this change is applied in the initializer/function
where threshold and self.model_name are used).
- Around line 128-132: The loop repeatedly slices x causing repeated
reallocations; change it to iterate by index (e.g., use a start pointer or a
for-range stepping by self.MIN_SAMPLES) so you call self._model.predict on
x[start:start+self.MIN_SAMPLES] without reassigning x each iteration, updating
prediction with max(self._model.predict(...)[self.model_name], prediction); this
keeps memory linear and avoids creating new arrays each loop while preserving
the same logic in the block using prediction, self.MIN_SAMPLES,
self._model.predict and self.model_name.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5a3896bb-60ae-4aa5-8e07-b0b4193da5e8

📥 Commits

Reviewing files that changed from the base of the PR and between edd3be4 and aa548ff.

📒 Files selected for processing (1)
  • src/glados/openwakeword/__init__.py

Comment thread src/glados/openwakeword/__init__.py Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/glados/openwakeword/__init__.py`:
- Line 115: The comparison uses strict greater-than which excludes scores equal
to the threshold; update the comparisons to inclusive (>=) so a prediction score
exactly equal to threshold counts as a hit. Replace occurrences of
"self._model.predict(...)[self.model_name] > threshold" with
"self._model.predict(...)[self.model_name] >= threshold" (the lines using
self._model.predict, self.model_name and threshold/threshold_param), and make
the same change at the second occurrence referenced (around the other occurrence
noted).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c532a268-51cf-41a0-bf30-650457ec02cb

📥 Commits

Reviewing files that changed from the base of the PR and between aa548ff and 7c9f991.

📒 Files selected for processing (1)
  • src/glados/openwakeword/__init__.py

Comment thread src/glados/openwakeword/__init__.py Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
src/glados/openwakeword/__init__.py (1)

137-141: Process the trailing partial frame instead of dropping it.

Line 137/Line 140 currently ignore len(x) % MIN_SAMPLES samples. That can miss detections near the end of the clip. Consider zero-padding and evaluating the remainder chunk too.

💡 Suggested approach
         count_samples = len(x) // self.MIN_SAMPLES
         prediction = 0
         for i in range(count_samples):
             current = x[i * self.MIN_SAMPLES: (i + 1) * self.MIN_SAMPLES]
             prediction = max(self._model.predict(current)[self.model_name], prediction)
+
+        remainder_start = count_samples * self.MIN_SAMPLES
+        if remainder_start < len(x):
+            tail = x[remainder_start:]
+            padded = np.pad(tail, (0, self.MIN_SAMPLES - len(tail)), mode="constant")
+            prediction = max(self._model.predict(padded)[self.model_name], prediction)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/glados/openwakeword/__init__.py` around lines 137 - 141, The loop drops
the trailing partial frame when len(x) % self.MIN_SAMPLES != 0 causing missed
detections; modify the logic in the block that computes count_samples/prediction
(using x, self.MIN_SAMPLES, self._model.predict, self.model_name) to include the
remainder by creating a final chunk of length self.MIN_SAMPLES padded with zeros
(or appropriate padding) from the tail of x, call self._model.predict on that
padded chunk and incorporate its score via the same max(...) logic so the last
partial frame is evaluated instead of ignored.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/glados/openwakeword/__init__.py`:
- Around line 96-97: The threshold assignment accepts any float and must be
validated: after computing threshold = self.DEFAULT_THRESHOLD if threshold is
None else float(threshold) (and in the other identical assignment later), check
for NaN (math.isnan) and ensure 0.0 <= threshold <= 1.0; if the value is out of
range or NaN, raise a ValueError with a clear message (or clamp if preferred)
before continuing to prediction/gating logic; reference DEFAULT_THRESHOLD and
the local threshold variable in the predict/is_wake_word methods where the
assignment occurs.


📥 Commits

Reviewing files that changed from the base of the PR and between 7c9f991 and 5d83c5e.

📒 Files selected for processing (1)
  • src/glados/openwakeword/__init__.py

Comment thread src/glados/openwakeword/__init__.py Outdated

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/glados/openwakeword/__init__.py (1)

124-133: Validate patience and debounce_time inputs explicitly.

Negative/zero patience and negative debounce_time currently pass through silently, which can lead to confusing gating behavior instead of a clear configuration error.

🛠️ Suggested guardrails
         if patience is None:
             patience = {}
         else:
+            if not isinstance(patience, int) or patience <= 0:
+                raise ValueError("patience must be a positive integer")
             patience = {
                 self.model_name: patience,
             }
 
         if debounce_time is None:
             debounce_time = 0.0
+        elif debounce_time < 0:
+            raise ValueError("debounce_time must be >= 0")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/glados/openwakeword/__init__.py` around lines 124 - 133, The constructor
currently allows non-positive patience and negative debounce_time; add explicit
validation in the __init__ around the patience and debounce_time handling: after
normalizing patience into either {} or {self.model_name: patience}, ensure any
provided patience value is numeric and > 0 (raise ValueError if <= 0 or
non-numeric), and for debounce_time ensure it is numeric and >= 0 (raise
ValueError if negative); reference the variables patience, debounce_time and
self.model_name in the error messages so callers can see which parameter failed.
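A standalone sketch of the requested validation; the helper name `normalize_gating_params` is invented for illustration, and in the real module this logic would live in `__init__`:

```python
def normalize_gating_params(model_name, patience=None, debounce_time=None):
    """Reject non-positive patience and negative debounce_time up front."""
    if patience is None:
        patience_map = {}
    else:
        if not isinstance(patience, int) or isinstance(patience, bool) or patience <= 0:
            raise ValueError(f"patience for {model_name!r} must be a positive integer, got {patience!r}")
        patience_map = {model_name: patience}

    if debounce_time is None:
        debounce_time = 0.0
    elif not isinstance(debounce_time, (int, float)) or debounce_time < 0:
        raise ValueError(f"debounce_time for {model_name!r} must be >= 0, got {debounce_time!r}")

    return patience_map, float(debounce_time)
```

Failing loudly here turns a silent mis-gating into an immediate configuration error.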
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/glados/openwakeword/__init__.py`:
- Around line 163-169: predict_multi_sample currently computes count_samples =
len(x) // self.MIN_SAMPLES and then uses a default prediction of 0 so an empty
input with threshold == 0.0 can incorrectly return True; modify
predict_multi_sample to explicitly return False when count_samples == 0 (i.e.,
not enough full chunks) before running the loop, keeping the rest of the logic
that slices chunks, calls self._model.predict(current)[self.model_name], and
compares the final prediction to threshold.
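A minimal sketch of the guard, using an illustrative frame size and scorer and assuming a `>=` threshold comparison (which is what makes the empty-input case misfire):

```python
MIN_SAMPLES = 1280  # illustrative frame size

class DummyModel:
    """Stand-in scorer: peak amplitude of the chunk."""
    def predict(self, chunk):
        return {"wakeword": max((abs(v) for v in chunk), default=0.0)}

def predict_multi_sample(model, x, model_name="wakeword", threshold=0.5):
    count_samples = len(x) // MIN_SAMPLES
    if count_samples == 0:
        # No full frame of audio: never report a detection,
        # even when threshold == 0.0.
        return False
    prediction = 0.0
    for i in range(count_samples):
        chunk = x[i * MIN_SAMPLES:(i + 1) * MIN_SAMPLES]
        prediction = max(model.predict(chunk)[model_name], prediction)
    return prediction >= threshold

print(predict_multi_sample(DummyModel(), [], threshold=0.0))  # → False
```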


📥 Commits

Reviewing files that changed from the base of the PR and between 5d83c5e and 23a2983.

📒 Files selected for processing (1)
  • src/glados/openwakeword/__init__.py

Comment thread src/glados/openwakeword/__init__.py
@reisbauer03
Contributor Author

@coderabbitai resume

@coderabbitai
Contributor

coderabbitai Bot commented Apr 12, 2026

✅ Actions performed

Reviews resumed.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
src/glados/openwakeword/__init__.py (1)

44-52: Consider simplifying provider selection logic.

The current flow modifies providers in-place then immediately reassigns it. A cleaner approach would check membership directly without mutation:

♻️ Suggested simplification
-        providers = ort.get_available_providers()
-        for excluded in ["TensorrtExecutionProvider", "CoreMLExecutionProvider"]:
-            if excluded in providers:
-                providers.remove(excluded)
-
-        if "CUDAExecutionProvider" in providers:
+        if "CUDAExecutionProvider" in ort.get_available_providers():
             providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
         else:
             providers = ["CPUExecutionProvider"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/glados/openwakeword/__init__.py` around lines 44 - 52, The code mutates
providers from ort.get_available_providers() then reassigns it immediately;
instead, avoid the in-place removal and directly base selection on membership:
call ort.get_available_providers() (keep that symbol), check if
"CUDAExecutionProvider" is in that returned list and set providers =
["CUDAExecutionProvider","CPUExecutionProvider"] else set providers =
["CPUExecutionProvider"]; remove the for loop that iterates over the excluded
list and any in-place .remove() calls so provider selection is clearer and
simpler.
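Expressed as a pure function (the name `select_providers` is illustrative; the real code would pass it `ort.get_available_providers()`), the simplified selection is:

```python
def select_providers(available):
    """Prefer CUDA with a CPU fallback; otherwise CPU only.
    TensorRT and CoreML are never selected, so no explicit
    exclusion pass is needed."""
    if "CUDAExecutionProvider" in available:
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]

print(select_providers(["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]))
# → ['CUDAExecutionProvider', 'CPUExecutionProvider']
```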
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/glados/openwakeword/__init__.py`:
- Around line 96-98: The threshold validation currently enforces 0.0 < threshold
<= 1.0 but raises a generic "threshold must be between 0.0 and 1.0" message
which is misleading; update the check or the message in the validation block
that inspects the threshold variable (in __init__.py) so they agree: either
allow 0.0 by changing the condition to 0.0 <= threshold <= 1.0, or keep the
strict lower bound and change the ValueError text to something precise like
"threshold must be > 0.0 and <= 1.0"; modify the ValueError raised in that
function accordingly so the condition and error message are consistent.


📥 Commits

Reviewing files that changed from the base of the PR and between 23a2983 and 46eff3f.

📒 Files selected for processing (1)
  • src/glados/openwakeword/__init__.py

Comment thread src/glados/openwakeword/__init__.py Outdated

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/glados/openwakeword/__init__.py`:
- Around line 156-167: predict_multi_sample() currently calls self.reset() at
the start which reinitializes the predictor and causes zeroed outputs for the
first frames on every call; remove the per-call reset in predict_multi_sample()
so the model warm-up only happens once (e.g., in __init__ or an explicit reset
path) and leave the rest of the method (threshold validation via
_validate_threshold, conversion via _np_f32_to_i16, chunking by MIN_SAMPLES and
calling _model.predict(...)[self.model_name]) unchanged; if a forced
reinitialization is needed, provide a separate public reset method or a boolean
parameter to predict_multi_sample to opt into resetting rather than invoking
self.reset() unconditionally.
- Around line 94-98: The threshold check currently compares the raw `threshold`
which can raise TypeError for non-numeric inputs; normalize `threshold` to a
float at the top of the method (before the 0.0 < threshold <= 1.0 check) and
handle conversion failures by raising ValueError with the same message. Locate
the method using cls.DEFAULT_THRESHOLD in __init__.py (the threshold handling
block) and change the logic to convert `threshold` to float first, then perform
the bounds check and return the float or raise ValueError("threshold must be 0.0
< threshold <= 1.0") on conversion or bounds failure.
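Both findings can be addressed with one normalization helper along these lines; `normalize_threshold` and the 0.5 default are illustrative, not the module's actual names:

```python
import math

DEFAULT_THRESHOLD = 0.5  # illustrative default

def normalize_threshold(threshold):
    """Coerce to float and enforce 0.0 < threshold <= 1.0 with one error message."""
    if threshold is None:
        return DEFAULT_THRESHOLD
    try:
        threshold = float(threshold)
    except (TypeError, ValueError):
        raise ValueError("threshold must be 0.0 < threshold <= 1.0") from None
    if math.isnan(threshold) or not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be 0.0 < threshold <= 1.0")
    return threshold

print(normalize_threshold("0.7"))  # → 0.7
```

Converting first and bounds-checking second keeps the condition and the error message in agreement.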

📥 Commits

Reviewing files that changed from the base of the PR and between 46eff3f and c13d77b.

📒 Files selected for processing (1)
  • src/glados/openwakeword/__init__.py

Comment thread src/glados/openwakeword/__init__.py
Comment thread src/glados/openwakeword/__init__.py
