Add optional regression tests for 4 confirmed bug behaviors #2
Closed
artuskg wants to merge 2 commits into
Summary
This PR adds a small, MLX-backed optional regression test suite for four high-impact behavior bugs we observed in voxmlx.
The tests are intentionally synthetic and fast, and they are gated behind VOXMLX_ENABLE_MLX_RUNTIME_TESTS=1 so default CI remains lightweight.
Why this PR
We wanted a precise way to validate whether the following issues are real bugs (not just quality drift or perf noise), and to prevent regressions after fixes.
The 4 bugs (with test mapping)
1) RotatingKVCache concat path can exceed max_size for multi-token appends
When appending S > 1 tokens with concat updates, the cache length can exceed its cap and keep growing above the intended window.
Tests: test_rotating_kv_cache_concat_respects_max_size, test_rotating_kv_cache_concat_keeps_expected_tail_for_multi_token_append
2) encode_step() cache window should come from encoder config (sliding_window), not a large hardcoded constant
The cache size is currently a hardcoded constant (100000) instead of the model-config window.
Test: test_encode_step_uses_encoder_sliding_window_for_cache_size
3) Offline encode() trims from the head instead of the tail
Test: test_encode_trims_trailing_frames_not_leading_frames
4) Offline generate() drops the final pending token when loop ends without EOS
Test: test_generate_flushes_final_pending_token
Test file
tests/test_mlx_runtime_optional.py
How to run
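The opt-in gate described in the Summary could be wired roughly as follows. This is a minimal sketch using only the standard library, not necessarily how the test file does it: the environment variable name comes from this PR, while `mlx_runtime_tests_enabled` and `OptionalRuntimeTests` are hypothetical names chosen for illustration.

```python
import os
import unittest

# Flag name taken from the PR summary; everything else here is illustrative.
ENV_FLAG = "VOXMLX_ENABLE_MLX_RUNTIME_TESTS"


def mlx_runtime_tests_enabled(env=None):
    """Return True only when the opt-in flag is set to exactly "1"."""
    env = os.environ if env is None else env
    return env.get(ENV_FLAG) == "1"


# Skip the whole class unless the flag is set, so default CI stays lightweight.
@unittest.skipUnless(mlx_runtime_tests_enabled(), f"set {ENV_FLAG}=1 to run")
class OptionalRuntimeTests(unittest.TestCase):  # hypothetical class name
    def test_placeholder(self):
        # A real test would import mlx lazily here and exercise voxmlx.
        self.assertTrue(True)
```

Assuming pytest is the runner, an invocation along the lines of `VOXMLX_ENABLE_MLX_RUNTIME_TESTS=1 python -m pytest tests/test_mlx_runtime_optional.py` would enable the suite. Without the variable, every test in the class is reported as skipped.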
Notes
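For readers unfamiliar with bugs 1 and 3, the failure modes can be illustrated with a toy model. This sketch uses plain Python lists instead of MLX arrays and does not reproduce the voxmlx code; all function names are hypothetical, and the "fixed" variants only show the behavior the tests are named after.

```python
def append_unbounded(cache, new_tokens):
    """Bug 1 pattern: plain concat with no trimming, so the cache keeps growing."""
    return cache + new_tokens


def append_windowed(cache, new_tokens, max_size):
    """Expected pattern: concat, then keep only the most recent max_size entries."""
    assert max_size > 0
    merged = cache + new_tokens
    return merged[-max_size:]


def trim_head(frames, keep):
    """Bug 3 pattern: drops leading frames and keeps the tail."""
    return frames[len(frames) - keep:]


def trim_tail(frames, keep):
    """Expected pattern: drops trailing frames and keeps the head."""
    return frames[:keep]


cache_buggy, cache_fixed = [], []
for start in range(0, 12, 3):  # four appends of S = 3 tokens each
    chunk = list(range(start, start + 3))
    cache_buggy = append_unbounded(cache_buggy, chunk)
    cache_fixed = append_windowed(cache_fixed, chunk, max_size=4)

print(len(cache_buggy))  # 12, well above the cap of 4
print(cache_fixed)       # [8, 9, 10, 11], the expected tail

frames = list(range(5))
print(trim_head(frames, 3))  # [2, 3, 4]
print(trim_tail(frames, 3))  # [0, 1, 2]
```

The windowed append checks both properties the two RotatingKVCache tests are named for: the length never exceeds max_size, and the retained entries are the most recent ones.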