Fix KV cache concat trimming for multi-token updates #3
Open
artuskg wants to merge 1 commit into
bpcnpz added a commit to bpcnpz/voxmlx that referenced this pull request on Mar 28, 2026
Two bugs caused the encoder to produce garbage embeddings once audio exceeded the sliding window size:

1. The encoder KV cache was initialized with a hardcoded 100_000 instead of the actual encoder.sliding_window (750). The cache therefore never rotated, growing unbounded until memory ran out or attention quality degraded.
2. RotatingKVCache._update_concat's trim calculation didn't account for the size of the appended keys, trimming one entry too few.

Fixes awni#7. Based on patches from PRs awni#3 and awni#4.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
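For reference, a minimal sketch of what the first fix amounts to; `Encoder`, `RotatingKVCache`, and `make_cache` here are hypothetical stand-ins reduced to the relevant fields, not voxmlx's actual API:

```python
# Illustrative sketch of the initialization fix; these types are
# stand-ins for the project's real classes, not its actual API.
from dataclasses import dataclass

@dataclass
class RotatingKVCache:
    max_size: int  # cache rotates once it holds this many positions

@dataclass
class Encoder:
    sliding_window: int = 750
    num_layers: int = 2

def make_cache(encoder: Encoder) -> list[RotatingKVCache]:
    # Before: max_size=100_000 was hardcoded, so with a 750-token
    # sliding window the cache never rotated and grew without bound.
    # After: size each layer's cache to the encoder's actual window.
    return [RotatingKVCache(max_size=encoder.sliding_window)
            for _ in range(encoder.num_layers)]

caches = make_cache(Encoder())
assert all(c.max_size == 750 for c in caches)
```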
Bug fix: KV cache concat trim for multi-token appends
Problem
`RotatingKVCache._update_concat()` trimmed as if each update appended exactly one token: for S > 1 appends, the cache length can exceed `max_size` and stay above the cap.

Resolution

- Compute `trim_size = max(0, cur + add - self.max_size)` so the cache is trimmed back to at most `max_size` positions.
- Assert `max_size > 0` in the constructor.

Why this helps
This restores bounded-window semantics and prevents memory/context drift when updates append multiple tokens.
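To make the resolution concrete, here is a minimal, self-contained sketch of the corrected trim logic; `MiniRotatingCache` is a hypothetical stand-in for the real class, and numpy arrays stand in for mlx arrays:

```python
# Minimal sketch of the fixed trim logic, not the project's actual class;
# numpy arrays stand in for mlx arrays, and all names are illustrative.
import numpy as np

class MiniRotatingCache:
    def __init__(self, max_size: int):
        assert max_size > 0, "max_size must be positive"
        self.max_size = max_size
        self.keys = None  # shape: (batch, heads, seq, head_dim)

    def update_concat(self, keys: np.ndarray) -> np.ndarray:
        cur = 0 if self.keys is None else self.keys.shape[2]
        add = keys.shape[2]
        # The buggy variant trimmed as if only one token were appended,
        # e.g. trim_size = max(0, cur + 1 - self.max_size), leaving the
        # cache add - 1 positions over the cap for multi-token appends.
        trim_size = max(0, cur + add - self.max_size)
        merged = keys if self.keys is None else np.concatenate([self.keys, keys], axis=2)
        if trim_size > 0:
            merged = merged[:, :, trim_size:, :]  # drop the oldest positions
        self.keys = merged
        return self.keys

cache = MiniRotatingCache(max_size=4)
cache.update_concat(np.zeros((1, 1, 3, 8)))        # cache holds 3 positions
out = cache.update_concat(np.zeros((1, 1, 3, 8)))  # 3 + 3 -> trimmed to 4
assert out.shape[2] == 4
```

The final assert exercises the same property as the regression tests below: after a multi-token append, the cache holds at most `max_size` positions.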
Regression tests
- `tests/test_bugfix_kvcache_optional.py::test_concat_append_respects_max_size`
- `tests/test_bugfix_kvcache_optional.py::test_oversized_first_update_trims_to_max_size`

How to run
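The node IDs above use pytest syntax, so assuming pytest is installed, both tests can be run with:

```
pytest tests/test_bugfix_kvcache_optional.py
```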