dusterbloom: feat(cache): AnyCache::trim_by dispatcher for spec-decode rollback #14
Open
dusterbloom wants to merge 1 commit into main
Audit finding (2026-05-04): of the 3 commits originally planned for PR-3 (paged prefix cache, chunked prefill, trim_by), 2 are already on origin/main in superset form via panbanda's PR panbanda#74 squash and the 1514737 TurboQuant landing:

- paged_prefix_cache.rs: main has 1072 lines (vs our 855), with TqBlock + slice_axis1 + conv_pos additions on top of our work.
- chunked_prefill: main wires it into a 2453-line simple.rs (vs our 1508), with compute_prefill_chunk_size + forward_chunked + the early validation our version was missing.
- SteppingKeyValueCache::trim_by(usize): main has the per-layer trim with saturating_sub overflow guard (lib.rs:457).

The genuinely new piece is the AnyCache-level dispatcher: a single helper that walks every layer in either KV or Hybrid variant and trims the underlying SteppingKeyValueCache, while intentionally leaving LayerCache::Arrays (recurrent SSM state) untouched. Required by upcoming spec-decode verify-rollback paths in higgs-engine that operate on AnyCache rather than reaching into per-layer types directly.

Tests:

- any_cache_trim_by_kv_dispatches_to_each_layer: KV variant with mixed Some/None layers, no panic, all caches saturate to 0.
- any_cache_trim_by_hybrid_skips_arrays_layers: Hybrid with KV+Arrays mix, asserts Arrays.offset stays unchanged (recurrent state cannot trim by offset alone).

Suite: 332 passed, 0 failed, 24 ignored. Clippy + rustfmt clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Adds an AnyCache::trim_by(count: usize) helper that walks every layer in either KV or Hybrid variant and trims the underlying SteppingKeyValueCache, while intentionally leaving LayerCache::Arrays (recurrent SSM state) untouched.
Required by the upcoming spec-decode verify-rollback paths in higgs-engine (PR-4 territory): after a draft batch's verify rejects the last N tokens, the dispatcher rolls back the KV portion of the cache without reaching into per-layer types.
Audit context
PR-3 was originally planned to bundle 3 commits from feat/magic-canvas: paged prefix cache, chunked prefill, and trim_by. Audit vs current origin/main found the first two are already there in superset form:
- 826794b0: paged_prefix_cache.rs
- d24e4a92: chunked_prefill
- fb48230c: trim_by. main has SteppingKeyValueCache::trim_by(usize) (panbanda 1514737) but no AnyCache-level dispatcher. This PR ships only the missing piece.
PR-3 therefore shrinks to the single dispatcher (+ tests).
Behaviour
Hybrid's LayerCache::Arrays variant (qwen3-next SSM recurrent state) is intentionally skipped: its state cannot be trimmed by offset alone. This is documented in the doc comment.
Test plan
- cargo clippy --all-targets --all-features: clean
- cargo fmt --all -- --check: clean
- cargo test -p higgs-models --lib -- --test-threads=1: 332 passed, 0 failed, 24 ignored (+2 new)
New tests:
- tests::any_cache_trim_by_kv_dispatches_to_each_layer: KV variant with Some/None/Some layer mix, no panic, all saturate to 0.
- tests::any_cache_trim_by_hybrid_skips_arrays_layers: Hybrid with LayerCache::KV + LayerCache::Arrays, asserts Arrays.offset stays unchanged.
Notes
crates/higgs-models/src/lib.rs (+78 lines, all in impl AnyCache + test module).
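For reviewers who want the shape of the dispatcher without opening the diff, here is a minimal sketch of the behaviour described above. It is not the code in this PR. Only the names AnyCache, LayerCache, SteppingKeyValueCache, trim_by and the Arrays offset field come from the description. The struct layouts, the ArraysCache type name and the container types (Vec<Option<...>> for KV, Vec<LayerCache> for Hybrid) are assumptions made so the example is self-contained.

```rust
// Hypothetical stand-ins for the real higgs-models types. Field layouts and
// container shapes are assumptions for illustration, not the crate's API.

/// Simplified KV cache: tracks only how many token positions are stored.
#[derive(Debug)]
struct SteppingKeyValueCache {
    offset: usize,
}

impl SteppingKeyValueCache {
    /// Drop the last `count` positions. `saturating_sub` clamps at 0, so
    /// trimming more than is stored cannot wrap around or panic.
    fn trim_by(&mut self, count: usize) {
        self.offset = self.offset.saturating_sub(count);
    }
}

/// Simplified recurrent (SSM) state. `ArraysCache` is a made-up name.
#[derive(Debug)]
struct ArraysCache {
    offset: usize,
}

#[derive(Debug)]
enum LayerCache {
    KV(SteppingKeyValueCache),
    Arrays(ArraysCache),
}

#[derive(Debug)]
enum AnyCache {
    /// Assumed shape: one optional KV cache per layer.
    KV(Vec<Option<SteppingKeyValueCache>>),
    /// Assumed shape: one KV-or-recurrent cache per layer.
    Hybrid(Vec<LayerCache>),
}

impl AnyCache {
    /// Trim every KV-backed layer by `count`; leave recurrent layers alone.
    fn trim_by(&mut self, count: usize) {
        match self {
            AnyCache::KV(layers) => {
                // `flatten` skips the `None` layers.
                for kv in layers.iter_mut().flatten() {
                    kv.trim_by(count);
                }
            }
            AnyCache::Hybrid(layers) => {
                for layer in layers.iter_mut() {
                    match layer {
                        LayerCache::KV(kv) => kv.trim_by(count),
                        // Recurrent state cannot be rolled back by offset
                        // alone, so it is intentionally left untouched.
                        LayerCache::Arrays(_) => {}
                    }
                }
            }
        }
    }
}

fn main() {
    let mut cache = AnyCache::Hybrid(vec![
        LayerCache::KV(SteppingKeyValueCache { offset: 10 }),
        LayerCache::Arrays(ArraysCache { offset: 10 }),
    ]);
    // Suppose verification rejected the last 3 draft tokens.
    cache.trim_by(3);
    if let AnyCache::Hybrid(layers) = &cache {
        for layer in layers {
            match layer {
                LayerCache::KV(kv) => println!("KV offset: {}", kv.offset),
                LayerCache::Arrays(a) => println!("Arrays offset: {}", a.offset),
            }
        }
    }
}
```

In this sketch a rollback caller only needs the number of rejected tokens. Restoring the skipped recurrent layers needs some other mechanism and is out of scope here.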