dusterbloom: feat(cache): AnyCache::trim_by dispatcher for spec-decode rollback#14

Open
dusterbloom wants to merge 1 commit into main from dusterbloom/paged-prefix-cache

@dusterbloom
Owner

Summary

Adds an AnyCache::trim_by(count: usize) helper that walks every layer in either the KV or the Hybrid variant and trims the underlying SteppingKeyValueCache, while intentionally leaving LayerCache::Arrays (recurrent SSM state) untouched.

Required by the upcoming spec-decode verify-rollback paths in higgs-engine (PR-4 territory): when verification rejects the last N tokens of a draft batch, the caller rolls back the KV portion of the cache through this dispatcher instead of reaching into per-layer types.
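A minimal sketch of that rollback arithmetic, assuming greedy verification (accept the longest prefix on which the draft and target tokens agree, reject the rest). `KvCache`, `verify_and_rollback`, and the `u32` token type are hypothetical stand-ins, not higgs-engine APIs; the cache is reduced to a single offset that `trim_by` decrements with saturation.

```rust
/// Hypothetical stand-in for a KV cache: only the number of cached
/// positions (the write offset) matters for rollback.
struct KvCache {
    offset: usize,
}

impl KvCache {
    /// Drop the last `count` positions, clamping at 0 instead of underflowing.
    fn trim_by(&mut self, count: usize) {
        self.offset = self.offset.saturating_sub(count);
    }
}

/// Hypothetical greedy verify step: the verify pass has already written all
/// draft tokens into the cache. Accept the longest prefix where the draft
/// token equals the target model's token at the same position, then trim the
/// rejected tail. Returns the number of accepted draft tokens.
fn verify_and_rollback(cache: &mut KvCache, draft: &[u32], target: &[u32]) -> usize {
    let accepted = draft
        .iter()
        .zip(target)
        .take_while(|(d, t)| d == t)
        .count();
    let rejected = draft.len() - accepted;
    cache.trim_by(rejected);
    accepted
}

fn main() {
    // 10 prompt positions plus 4 draft tokens written during verify.
    let mut cache = KvCache { offset: 14 };
    let draft: [u32; 4] = [7, 8, 9, 10];
    let target: [u32; 4] = [7, 8, 42, 10];
    let accepted = verify_and_rollback(&mut cache, &draft, &target);
    println!("accepted={accepted} offset={}", cache.offset);
}
```

Here the mismatch at the third draft position accepts 2 tokens and trims the other 2, so the offset drops from 14 to 12. Saturating subtraction keeps an over-large trim from panicking on `usize` underflow.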

Audit context

PR-3 was originally planned to bundle 3 commits from feat/magic-canvas: paged prefix cache, chunked prefill, and trim_by. An audit against the current origin/main found the first two already there in superset form:

| Original commit | State on origin/main |
|---|---|
| 826794b0 paged_prefix_cache.rs | SUPERSET: 1072 lines on main vs our 855, with TqBlock + slice_axis1 + conv_pos additions on top |
| d24e4a92 chunked_prefill | SUPERSET: wired into a 2453-line simple.rs (vs 1508 in feat/magic-canvas), with the early-validation guard our version was missing |
| fb48230c trim_by | PARTIAL: main has SteppingKeyValueCache::trim_by(usize) (panbanda 1514737) but no AnyCache-level dispatcher. This PR ships only the missing piece. |

PR-3 therefore shrinks to the single dispatcher (+ tests).

Behaviour

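```rust
impl AnyCache {
    pub fn trim_by(&mut self, count: usize) {
        match self {
            // Pure-KV cache: `flatten` skips absent (`None`) layers and
            // trims each SteppingKeyValueCache.
            Self::KV(layers) => {
                for layer in layers.iter_mut().flatten() {
                    layer.trim_by(count);
                }
            }
            // Hybrid cache: trim only attention (KV) layers; LayerCache::Arrays
            // holds recurrent SSM state and is deliberately left untouched.
            Self::Hybrid(layers) => {
                for layer in layers.iter_mut().flatten() {
                    if let LayerCache::KV(kv) = layer {
                        kv.trim_by(count);
                    }
                }
            }
        }
    }
}
```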

Hybrid's LayerCache::Arrays variant (qwen3-next SSM recurrent state) is intentionally skipped: recurrent state folds every past token into a fixed-size summary, so it cannot be rolled back by trimming an offset alone. This is documented in the doc comment.
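To make the skip semantics concrete, here is a self-contained sketch that mirrors the dispatcher above against simplified stand-in types. Assumptions: the real SteppingKeyValueCache, LayerCache, and AnyCache carry far more state; here each layer is reduced to a single `offset`, and `ArraysState` is a hypothetical placeholder for the recurrent SSM state. The per-layer `trim_by` uses `saturating_sub`, in the spirit of the overflow guard main already has.

```rust
/// Simplified stand-in: only the cached-position offset is modelled.
struct SteppingKeyValueCache {
    offset: usize,
}

impl SteppingKeyValueCache {
    fn trim_by(&mut self, count: usize) {
        // Trimming past the start clamps to 0 instead of panicking.
        self.offset = self.offset.saturating_sub(count);
    }
}

/// Hypothetical placeholder for recurrent SSM state.
struct ArraysState {
    offset: usize,
}

enum LayerCache {
    KV(SteppingKeyValueCache),
    Arrays(ArraysState),
}

enum AnyCache {
    KV(Vec<Option<SteppingKeyValueCache>>),
    Hybrid(Vec<Option<LayerCache>>),
}

impl AnyCache {
    fn trim_by(&mut self, count: usize) {
        match self {
            Self::KV(layers) => {
                // `flatten` skips `None` and yields `&mut SteppingKeyValueCache`.
                for layer in layers.iter_mut().flatten() {
                    layer.trim_by(count);
                }
            }
            Self::Hybrid(layers) => {
                for layer in layers.iter_mut().flatten() {
                    // Only KV layers are trimmed; Arrays layers are left as-is.
                    if let LayerCache::KV(kv) = layer {
                        kv.trim_by(count);
                    }
                }
            }
        }
    }
}

fn main() {
    // Some/None/Some mix, trimmed past every offset: both layers saturate to 0.
    let mut kv = AnyCache::KV(vec![
        Some(SteppingKeyValueCache { offset: 3 }),
        None,
        Some(SteppingKeyValueCache { offset: 5 }),
    ]);
    kv.trim_by(10);
    if let AnyCache::KV(layers) = &kv {
        for layer in layers.iter().flatten() {
            println!("KV layer offset = {}", layer.offset);
        }
    }

    // Hybrid: the KV layer loses 4 positions, the Arrays layer keeps 12.
    let mut hybrid = AnyCache::Hybrid(vec![
        Some(LayerCache::KV(SteppingKeyValueCache { offset: 12 })),
        Some(LayerCache::Arrays(ArraysState { offset: 12 })),
        None,
    ]);
    hybrid.trim_by(4);
    if let AnyCache::Hybrid(layers) = &hybrid {
        for layer in layers.iter().flatten() {
            match layer {
                LayerCache::KV(kv) => println!("KV offset = {}", kv.offset),
                LayerCache::Arrays(a) => println!("Arrays offset = {}", a.offset),
            }
        }
    }
}
```

Keeping the match exhaustive over the AnyCache variants means a new cache variant would fail to compile here until its rollback behaviour is decided, rather than being silently skipped.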

Test plan

  • cargo clippy --all-targets --all-features: clean
  • cargo fmt --all -- --check: clean
  • cargo test -p higgs-models --lib -- --test-threads=1: 332 passed, 0 failed, 24 ignored (+2 new)

New tests:

  • tests::any_cache_trim_by_kv_dispatches_to_each_layer: KV variant with a Some/None/Some layer mix; trims without panicking, and every populated layer's offset saturates to 0.
  • tests::any_cache_trim_by_hybrid_skips_arrays_layers: Hybrid variant with LayerCache::KV + LayerCache::Arrays; asserts that Arrays.offset is unchanged.

Notes

  • Single-file change: crates/higgs-models/src/lib.rs (+78 lines, all in impl AnyCache + test module).
  • Foundation for the upcoming DraftModel-trait / spec-decode-wiring PR (PR-4).
  • Co-authored with Claude Opus 4.7

Audit finding (2026-05-04): of the 3 commits originally planned for PR-3
(paged prefix cache, chunked prefill, trim_by), 2 are already on
origin/main in superset form, via the squash merge of panbanda's PR
panbanda#74 and the 1514737 TurboQuant landing:

- paged_prefix_cache.rs: main has 1072 lines (vs our 855), with
  TqBlock + slice_axis1 + conv_pos additions on top of our work.
- chunked_prefill: main wires it into a 2453-line simple.rs (vs our
  1508), with compute_prefill_chunk_size + forward_chunked + the early
  validation our version was missing.
- SteppingKeyValueCache::trim_by(usize): main has the per-layer trim
  with saturating_sub overflow guard (lib.rs:457).

The genuinely-new piece is the AnyCache-level dispatcher: a single
helper that walks every layer in either the KV or the Hybrid variant and trims
the underlying SteppingKeyValueCache, while intentionally leaving
LayerCache::Arrays (recurrent SSM state) untouched. Required by
upcoming spec-decode verify-rollback paths in higgs-engine that operate
on AnyCache rather than reaching into per-layer types directly.

Tests:
- any_cache_trim_by_kv_dispatches_to_each_layer: KV variant with mixed
  Some/None layers, no panic, all caches saturate to 0.
- any_cache_trim_by_hybrid_skips_arrays_layers: Hybrid with KV+Arrays
  mix, asserts Arrays.offset stays unchanged (recurrent state cannot
  trim by offset alone).

Suite: 332 passed, 0 failed, 24 ignored. Clippy + rustfmt clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>