Fix KV cache concat trimming for multi-token updates #3

Open
artuskg wants to merge 1 commit into awni:main from artuskg:codex/awni-fix1-kvcache-trim


@artuskg artuskg commented Feb 9, 2026

Bug fix: KV cache concat trim for multi-token appends

Problem

RotatingKVCache._update_concat() trimmed as if each update appended exactly one token:

trim_size = self._idx - self.max_size + 1

When an update appends S > 1 tokens, this formula trims too little: the cache length can exceed max_size and remain above the cap on later updates.
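
As a rough illustration of the overshoot, the sketch below tracks only the cache length under the original formula. It is a simplified model, not the project's code: `old_concat_length` is a hypothetical helper, `cur` stands in for `self._idx`, and it assumes a non-positive trim size means nothing is trimmed.

```python
def old_concat_length(cur: int, add: int, max_size: int) -> int:
    """Cache length after one update under the original one-token trim formula."""
    # Original formula: sized for exactly one appended token.
    trim_size = max(0, cur - max_size + 1)
    return cur - trim_size + add


max_size = 4
length = 0
history = []
for add in (3, 3, 3):  # three updates of S = 3 tokens each
    length = old_concat_length(length, add, max_size)
    history.append(length)

print(history)  # [3, 6, 6]
```

With max_size = 4 and S = 3, the length settles at 6, which is max_size + S - 1 rather than max_size.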

Resolution

  • Compute trim from the current length (cur) plus the appended length (add):
    • trim_size = max(0, cur + add - self.max_size)
  • Handle oversized first update by trimming to the latest max_size positions.
  • Validate max_size > 0 in constructor.
  • Add debug assertions ensuring key/value cache length never exceeds cap.
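
The points above can be sketched with a plain Python list standing in for the key/value arrays. This is a minimal sketch under stated assumptions, not the patch itself: `BoundedConcatCache`, `items`, and `update_concat` are hypothetical names, and the real cache operates on MLX arrays along the sequence axis.

```python
class BoundedConcatCache:
    """Toy stand-in for a rotating cache that concatenates new entries."""

    def __init__(self, max_size: int):
        # Validate the cap up front, as the PR does in the constructor.
        if max_size <= 0:
            raise ValueError("max_size must be positive")
        self.max_size = max_size
        self.items: list = []

    def update_concat(self, new_items: list) -> list:
        cur, add = len(self.items), len(new_items)
        # Trim from current length + appended length, never negative.
        trim_size = max(0, cur + add - self.max_size)
        # Drop the oldest cached entries first (at most all of them).
        kept = self.items[min(trim_size, cur):]
        merged = kept + list(new_items)
        # Oversized update: keep only the latest max_size positions.
        merged = merged[-self.max_size:]
        # Debug assertion: the cache never exceeds its cap.
        assert len(merged) <= self.max_size
        self.items = merged
        return self.items


cache = BoundedConcatCache(max_size=4)
cache.update_concat([1, 2, 3])
print(cache.update_concat([4, 5, 6]))  # [3, 4, 5, 6]

fresh = BoundedConcatCache(max_size=4)
print(fresh.update_concat(list(range(10))))  # [6, 7, 8, 9]
```

The final slice handles the oversized first update, where there are no cached entries to trim and the appended block alone exceeds the cap.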

Why this helps

This restores bounded-window semantics: the cache holds at most max_size positions after every update, so memory use and attention context stay bounded even when updates append multiple tokens.

Regression tests

  • tests/test_bugfix_kvcache_optional.py::test_concat_append_respects_max_size
  • tests/test_bugfix_kvcache_optional.py::test_oversized_first_update_trims_to_max_size

How to run

VOXMLX_ENABLE_MLX_RUNTIME_TESTS=1 python3 -m unittest -v tests.test_bugfix_kvcache_optional

bpcnpz added a commit to bpcnpz/voxmlx that referenced this pull request Mar 28, 2026
Two bugs caused the encoder to produce garbage embeddings once audio
exceeded the sliding window size:

1. Encoder KV cache was initialized with hardcoded 100_000 instead of
   the actual encoder.sliding_window (750). This meant the cache never
   rotated, growing unbounded until memory or attention degraded.

2. RotatingKVCache._update_concat trim calculation didn't account for
   the size of the appended keys, trimming one entry too few.

Fixes awni#7. Based on patches from PRs awni#3 and awni#4.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>