Add optional regression tests for 4 confirmed bug behaviors by artuskg · Pull Request #2 · awni/voxmlx

artuskg · 2026-02-09T18:09:11Z

Summary

This PR adds a small, MLX-backed optional regression test suite for four high-impact behavior bugs we observed in voxmlx.

The tests are intentionally synthetic and fast, and they are gated behind VOXMLX_ENABLE_MLX_RUNTIME_TESTS=1 so default CI remains lightweight.
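
A minimal sketch of how such an opt-in gate can look with `unittest`. Only the environment variable name comes from this PR; the helper `runtime_tests_enabled` and the class name are hypothetical and not taken from the actual test file.

```python
import os
import unittest


def runtime_tests_enabled(environ=os.environ):
    """Return True only when the opt-in flag is set to exactly "1"."""
    # The flag name is from the PR text; this helper itself is hypothetical.
    return environ.get("VOXMLX_ENABLE_MLX_RUNTIME_TESTS") == "1"


@unittest.skipUnless(
    runtime_tests_enabled(),
    "set VOXMLX_ENABLE_MLX_RUNTIME_TESTS=1 to run MLX runtime tests",
)
class OptionalRuntimeTests(unittest.TestCase):
    # Hypothetical placeholder; the real suite would import MLX lazily here.
    def test_placeholder(self):
        self.assertTrue(True)
```

With the flag unset, the whole class is reported as skipped, so default CI does not need MLX installed.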

Why this PR

We wanted a precise way to validate whether the following issues are real bugs (not just quality drift or perf noise), and to prevent regressions after fixes.

The 4 bugs (with test mapping)

1) `RotatingKVCache` concat path can exceed `max_size` for multi-token appends

Behavior: when a single update appends S > 1 tokens through the concat path, the cache length can exceed `max_size` and keep growing past the intended window.
Why it is a bug: windowed KV cache semantics are violated; memory use grows, and attention can see context outside the intended window.
Tests:
- test_rotating_kv_cache_concat_respects_max_size
- test_rotating_kv_cache_concat_keeps_expected_tail_for_multi_token_append
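
The invariant these two tests check can be sketched with plain Python lists standing in for the key/value arrays. This is an illustration of the expected behavior only, not the voxmlx implementation; `append_with_cap` is a hypothetical name, and the real cache would slice MLX arrays along the sequence axis.

```python
def append_with_cap(cache, new_tokens, max_size):
    """Concatenate new entries, then keep only the newest max_size entries."""
    combined = cache + new_tokens
    if max_size <= 0:
        return []
    # Keeping the tail preserves the most recent context for attention.
    return combined[-max_size:]


cache = [0, 1, 2]
cache = append_with_cap(cache, [3, 4, 5], max_size=4)  # multi-token append
assert len(cache) <= 4          # the cap is respected
assert cache == [2, 3, 4, 5]    # the expected tail is kept
```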

2) `encode_step()` cache window should come from encoder config (`sliding_window`), not a large hardcoded constant

Behavior: the cache is created with an oversized fixed window (historically 100000) instead of the window from the model config.
Why it is a bug: it ignores the model/runtime contract and can cause unnecessary memory and computation growth for long streams.
Test:
- test_encode_step_uses_encoder_sliding_window_for_cache_size
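
One way to test this without loading a model is to pass a stub config and record the size the cache is built with. The sketch below assumes only that the encoder config exposes `sliding_window`, as stated above; `StubEncoderConfig`, `RecordingCache`, and `make_encoder_cache` are hypothetical names, not voxmlx APIs.

```python
from dataclasses import dataclass


@dataclass
class StubEncoderConfig:
    # Hypothetical stand-in for the encoder config.
    sliding_window: int


class RecordingCache:
    """Stub cache that only remembers the size it was created with."""

    def __init__(self, max_size):
        self.max_size = max_size


def make_encoder_cache(config, cache_factory=RecordingCache):
    # Derive the window from the config rather than a hardcoded constant.
    return cache_factory(config.sliding_window)


cache = make_encoder_cache(StubEncoderConfig(sliding_window=750))
assert cache.max_size == 750
```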

3) Offline `encode()` trims from the head instead of the tail

Behavior: when the frame count is odd or otherwise not divisible by the downsample factor, the surplus frames are dropped from the start of the sequence.
Why it is a bug: dropping leading frames shifts alignment and makes offline behavior inconsistent with what the incremental path expects.
Test:
- test_encode_trims_trailing_frames_not_leading_frames
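
The expected trim direction can be shown on a list of frame indices. This is a sketch of the intended behavior under the assumption that frames are trimmed to a multiple of the downsample factor; `trim_to_multiple` is a hypothetical helper, not a function from voxmlx.

```python
def trim_to_multiple(frames, factor):
    """Drop surplus frames from the tail so len(frames) is a multiple of factor."""
    usable = len(frames) - (len(frames) % factor)
    return frames[:usable]


frames = [0, 1, 2, 3, 4]  # 5 frames, downsample factor 2
assert trim_to_multiple(frames, 2) == [0, 1, 2, 3]  # frame 0 is kept, frame 4 is dropped
```

A head trim would instead return `[1, 2, 3, 4]`, which shifts every remaining frame by one position.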

4) Offline `generate()` drops the final pending token when loop ends without EOS

Behavior: the decode loop uses a pending-token pattern but does not flush the last pending token when the loop terminates naturally.
Why it is a bug: the output is deterministically truncated by one token at end of audio whenever the loop ends without EOS.
Test:
- test_generate_flushes_final_pending_token
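
The pending-token pattern and the missing flush can be sketched on a plain token stream. This is a simplified illustration, not the voxmlx decode loop; `decode_tokens` and `eos_id` are hypothetical names.

```python
def decode_tokens(token_stream, eos_id):
    """Emit each token one step late, and flush the last one at loop end."""
    output = []
    pending = None
    for token in token_stream:
        if pending is not None:
            if pending == eos_id:
                return output
            output.append(pending)
        pending = token
    # The stream ended without EOS: flush the token that is still pending.
    if pending is not None and pending != eos_id:
        output.append(pending)
    return output


assert decode_tokens([5, 6, 7], eos_id=0) == [5, 6, 7]  # final token is not lost
assert decode_tokens([5, 6, 0], eos_id=0) == [5, 6]     # EOS itself is not emitted
```

Without the flush after the loop, the first call would return `[5, 6]`, which is the one-token truncation described above.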

Test file

tests/test_mlx_runtime_optional.py

How to run

VOXMLX_ENABLE_MLX_RUNTIME_TESTS=1 python3 -m unittest -v tests.test_mlx_runtime_optional

Notes

These are regression tests only; they do not enforce any specific implementation strategy.
They are designed to be stable and minimal, using synthetic tensors and stubs where possible.

artuskg · 2026-02-09T18:15:43Z

Superseded by four focused fix+test PRs: #3 (KV cache trim), #4 (encoder sliding window), #5 (offline encode trim direction), and #6 (generate final-token flush). Closing this broad test-only PR to keep review scope tight.

artuskg added 2 commits February 9, 2026 19:07

Add synthetic regression tests for key bug fixes

1f65730

Focus optional runtime tests on four confirmed bug regressions

2821e11

artuskg closed this Feb 9, 2026

artuskg wants to merge 2 commits into awni:main from artuskg:codex/awni-4bug-tests-pr
