Use encoder sliding_window in encode_step KV cache init #4

Open
artuskg wants to merge 1 commit into awni:main from artuskg:codex/awni-fix2-encoder-window

artuskg commented Feb 9, 2026

Bug fix: use encoder sliding_window for streaming encoder KV cache

Problem

VoxtralRealtime.encode_step() initialized each encoder KV cache with a large hardcoded size (100000) instead of the encoder sliding window from the model configuration.

Resolution

  • Initialize per-layer encoder caches with:
    • RotatingKVCache(int(self.encoder.sliding_window))

Why this helps

  • Restores the model/runtime contract: the streaming cache is sized to the sliding window the encoder is configured with.
  • Prevents avoidable memory and compute growth for long-running streams.
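
To illustrate the difference, here is a minimal, self-contained sketch that does not use MLX. `ToyRotatingCache`, `ToyEncoder`, `init_encoder_caches`, and `stream` are hypothetical stand-ins written for this example, not the project's classes. The layer count of 4 is arbitrary, and the window of 750 is the value quoted in the referencing commit below.

```python
from collections import deque
from dataclasses import dataclass


class ToyRotatingCache:
    """Hypothetical stand-in for a rotating KV cache: holds at most max_size entries."""

    def __init__(self, max_size: int):
        self.max_size = max_size
        # A bounded deque drops its oldest entry once max_size is reached.
        self._entries = deque(maxlen=max_size)

    def update(self, frame) -> None:
        self._entries.append(frame)

    def __len__(self) -> int:
        return len(self._entries)


@dataclass
class ToyEncoder:
    # Assumed values for illustration only.
    sliding_window: int = 750
    num_layers: int = 4


def init_encoder_caches(encoder, hardcoded_size=None):
    """Build one cache per layer, sized from the encoder config unless overridden."""
    size = hardcoded_size if hardcoded_size is not None else int(encoder.sliding_window)
    return [ToyRotatingCache(size) for _ in range(encoder.num_layers)]


def stream(caches, num_frames):
    """Feed num_frames placeholder frames through every layer cache; return cache lengths."""
    for t in range(num_frames):
        for cache in caches:
            cache.update(t)
    return [len(c) for c in caches]


encoder = ToyEncoder()
fixed = stream(init_encoder_caches(encoder), 2000)
old = stream(init_encoder_caches(encoder, hardcoded_size=100_000), 2000)
print(fixed)  # each layer is capped at the configured window
print(old)    # each layer still holds every frame seen so far
```

After 2000 frames the window-sized caches have rotated and stay at 750 entries per layer, while the caches sized at 100000 have not rotated and keep growing with the stream.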

Regression test

  • tests/test_bugfix_encoder_window_optional.py::test_encode_step_uses_encoder_sliding_window

How to run

VOXMLX_ENABLE_MLX_RUNTIME_TESTS=1 python3 -m unittest -v tests.test_bugfix_encoder_window_optional
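
The environment variable suggests the test is opt-in. The test file itself is not shown here, so the following is only a sketch of how such gating could look with `unittest.skipUnless`. `runtime_tests_enabled`, `encoder_cache_size`, and `EncoderWindowSketch` are hypothetical names, and the real test may be structured differently.

```python
import os
import unittest
from types import SimpleNamespace

ENV_FLAG = "VOXMLX_ENABLE_MLX_RUNTIME_TESTS"


def runtime_tests_enabled(environ=os.environ) -> bool:
    """Return True only when the opt-in flag is set to "1"."""
    return environ.get(ENV_FLAG) == "1"


def encoder_cache_size(encoder) -> int:
    """Hypothetical helper: the cache size the fix derives from the encoder config."""
    return int(encoder.sliding_window)


@unittest.skipUnless(runtime_tests_enabled(), f"set {ENV_FLAG}=1 to run")
class EncoderWindowSketch(unittest.TestCase):
    def test_cache_size_follows_config(self):
        # A stand-in encoder object; the real test would build the model instead.
        encoder = SimpleNamespace(sliding_window=750)
        self.assertEqual(encoder_cache_size(encoder), 750)
```

Without the flag, the test class is reported as skipped rather than failed, which keeps the default test run independent of the MLX runtime.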

bpcnpz added a commit to bpcnpz/voxmlx that referenced this pull request Mar 28, 2026
Two bugs caused the encoder to produce garbage embeddings once audio
exceeded the sliding window size:

1. Encoder KV cache was initialized with hardcoded 100_000 instead of
   the actual encoder.sliding_window (750). This meant the cache never
   rotated, growing unbounded until memory or attention degraded.

2. RotatingKVCache._update_concat trim calculation didn't account for
   the size of the appended keys, trimming one entry too few.

Fixes awni#7. Based on patches from PRs awni#3 and awni#4.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>