Add optional regression tests for 4 confirmed bug behaviors #2

Closed
artuskg wants to merge 2 commits into awni:main from artuskg:codex/awni-4bug-tests-pr

artuskg commented Feb 9, 2026

Summary

This PR adds a small, MLX-backed optional regression test suite for four high-impact behavior bugs we observed in voxmlx.

The tests are intentionally synthetic and fast, and they are gated behind VOXMLX_ENABLE_MLX_RUNTIME_TESTS=1 so default CI remains lightweight.
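
The gating described above can be sketched with the standard `unittest.skipUnless` decorator. This is a minimal illustration, not the contents of the actual test file: only the environment variable name comes from this PR, while the helper `runtime_tests_enabled` and the class name are hypothetical.

```python
import os
import unittest

ENV_FLAG = "VOXMLX_ENABLE_MLX_RUNTIME_TESTS"  # variable name taken from this PR


def runtime_tests_enabled(env):
    """Return True only when the opt-in flag is set to exactly "1"."""
    return env.get(ENV_FLAG) == "1"


# Hypothetical test class: the whole class is skipped unless the flag is set,
# so a default CI run never touches the MLX runtime.
@unittest.skipUnless(runtime_tests_enabled(os.environ), f"set {ENV_FLAG}=1 to run")
class OptionalRuntimeTests(unittest.TestCase):
    def test_placeholder(self):
        self.assertTrue(True)
```

Evaluating the flag once, at class-decoration time, keeps the default test run cheap because no test body executes when the flag is absent.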

Why this PR

We wanted a precise way to validate whether the following issues are real bugs (not just quality drift or perf noise), and to prevent regressions after fixes.

The 4 bugs (with test mapping)

1) RotatingKVCache concat path can exceed max_size for multi-token appends

  • Behavior: when appending S > 1 tokens with concat updates, cache length can exceed its cap and keep growing above the intended window.
  • Why it is a bug: windowed KV cache semantics are violated; this affects memory and can change attention context unexpectedly.
  • Tests:
    • test_rotating_kv_cache_concat_respects_max_size
    • test_rotating_kv_cache_concat_keeps_expected_tail_for_multi_token_append
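
The invariant these two tests check can be illustrated with plain Python lists standing in for the sequence axis of the cache. This is a sketch under that assumption, not the voxmlx RotatingKVCache implementation; `append_with_window` is a hypothetical helper.

```python
def append_with_window(cache, new_tokens, max_size):
    """Concatenate new entries, then keep only the newest `max_size` of them."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    combined = cache + new_tokens
    # Trimming after every concat keeps the length capped even when S > 1.
    return combined[-max_size:]


cache = []
for chunk in ([0, 1, 2], [3, 4, 5], [6, 7]):  # multi-token appends
    cache = append_with_window(cache, chunk, max_size=4)

print(cache)  # [4, 5, 6, 7]
```

The length never exceeds `max_size`, and the surviving entries are always the most recent ones, which is the "expected tail" that the second test name refers to.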

2) encode_step() cache window should come from encoder config (sliding_window), not a large hardcoded constant

  • Behavior: the cache is created with an oversized fixed window (historically 100000) instead of the window from the model config.
  • Why it is a bug: it ignores the model/runtime contract and can cause unnecessary memory and computation growth on long streams.
  • Test:
    • test_encode_step_uses_encoder_sliding_window_for_cache_size
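
One way to check this kind of behavior without a real model is to swap in a stub cache that records the size it was constructed with. The sketch below assumes that approach; `EncoderConfig`, `RecordingCache`, and `make_encoder_cache` are hypothetical names, and the window value is arbitrary.

```python
from dataclasses import dataclass


@dataclass
class EncoderConfig:
    """Hypothetical stand-in for the encoder config; only the field used here."""
    sliding_window: int


class RecordingCache:
    """Stub cache that only remembers the max_size it was built with."""

    def __init__(self, max_size):
        self.max_size = max_size


def make_encoder_cache(config, cache_cls=RecordingCache):
    # The window comes from the config, not from a hardcoded constant.
    return cache_cls(max_size=config.sliding_window)


cache = make_encoder_cache(EncoderConfig(sliding_window=750))
print(cache.max_size)  # 750
```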

3) Offline encode() trims from the head instead of the tail

  • Behavior: when the frame count is odd or otherwise not divisible by the downsample factor, the leading frames are dropped instead of the trailing ones.
  • Why it is a bug: it shifts alignment and makes offline behavior inconsistent with what the incremental path expects.
  • Test:
    • test_encode_trims_trailing_frames_not_leading_frames
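
The difference between the two trim directions is easy to see on a small list of frame indices. This sketch uses a hypothetical helper `trim_to_multiple` and an arbitrary factor of 4; it does not reproduce the voxmlx encode() code.

```python
def trim_to_multiple(frames, factor):
    """Drop trailing frames so the count is a multiple of `factor`."""
    usable = len(frames) - len(frames) % factor
    return frames[:usable]


frames = list(range(7))  # 7 frames, factor 4 -> 3 frames must go

tail_trimmed = trim_to_multiple(frames, 4)
head_trimmed = frames[len(frames) % 4:]  # the direction described as the bug

print(tail_trimmed)  # [0, 1, 2, 3]
print(head_trimmed)  # [3, 4, 5, 6]
```

Trimming the tail keeps frame 0 at position 0, so the retained frames stay aligned with the start of the audio.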

4) Offline generate() drops the final pending token when loop ends without EOS

  • Behavior: the decode loop uses a pending-token pattern but does not flush the last pending token when the loop terminates naturally.
  • Why it is a bug: the output is deterministically truncated by one token at end of audio whenever the loop ends without EOS.
  • Test:
    • test_generate_flushes_final_pending_token
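
The pending-token pattern and the missing flush can be sketched with integer tokens and a toy step function. Everything here is illustrative: `generate_tokens`, `step_fn`, and the `flush_pending` switch are hypothetical and only mimic the loop shape described above.

```python
def generate_tokens(step_fn, first_token, max_steps, eos_id, flush_pending=True):
    """Emit tokens one step behind the model, as in a pending-token loop."""
    output = []
    pending = first_token
    for _ in range(max_steps):
        next_token = step_fn(pending)  # compute the next token first
        if pending == eos_id:
            return output  # EOS path: nothing is left to flush
        output.append(pending)
        pending = next_token
    # Natural termination: one computed token is still pending.
    if flush_pending and pending != eos_id:
        output.append(pending)
    return output


step = lambda t: t + 1  # toy "model": next token is previous + 1

print(generate_tokens(step, 1, 3, eos_id=99))                       # [1, 2, 3, 4]
print(generate_tokens(step, 1, 3, eos_id=99, flush_pending=False))  # [1, 2, 3]
```

With `flush_pending=False` the final token is lost on every run that ends without EOS, which matches the one-token truncation described in the bullets.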

Test file

  • tests/test_mlx_runtime_optional.py

How to run

VOXMLX_ENABLE_MLX_RUNTIME_TESTS=1 python3 -m unittest -v tests.test_mlx_runtime_optional

Notes

  • These are regression tests only; they do not enforce any specific implementation strategy.
  • They are designed to be stable and minimal, using synthetic tensors and stubs where possible.

artuskg (Author) commented Feb 9, 2026

Superseded by four focused fix+test PRs: #3 (KV cache trim), #4 (encoder sliding window), #5 (offline encode trim direction), and #6 (generate final-token flush). Closing this broad test-only PR to keep review scope tight.

@artuskg artuskg closed this Feb 9, 2026