dusterbloom: feat(cache): AnyCache::trim_by dispatcher for spec-decode rollback by dusterbloom · Pull Request #14 · dusterbloom/higgs

dusterbloom · 2026-05-04T09:54:09Z

Summary

Adds an AnyCache::trim_by(count: usize) helper that walks every layer in either the KV or the Hybrid variant and trims the underlying SteppingKeyValueCache, while intentionally leaving LayerCache::Arrays (recurrent SSM state) untouched.

Required by the upcoming spec-decode verify-rollback paths in higgs-engine (PR-4 territory): after a draft batch's verify rejects the last N tokens, the dispatcher rolls back the KV portion of the cache without reaching into per-layer types.
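As a rough illustration of where `count` would come from, here is a minimal sketch of the rollback arithmetic, assuming a greedy acceptance rule (the longest prefix on which the draft and target tokens agree). The helper names `accepted_prefix_len` and `rollback_count` are hypothetical and not part of higgs-engine; the real verify path in PR-4 may use a different acceptance rule.

```rust
/// Hypothetical helper (not from the PR): length of the longest prefix on
/// which the drafted tokens and the target model's tokens agree.
fn accepted_prefix_len(draft: &[u32], target: &[u32]) -> usize {
    draft
        .iter()
        .zip(target.iter())
        .take_while(|(d, t)| d == t)
        .count()
}

/// Hypothetical helper (not from the PR): how many trailing cache positions
/// to discard once `accepted` of the `drafted` tokens have been kept.
/// Saturates at 0 so a miscount can never underflow.
fn rollback_count(drafted: usize, accepted: usize) -> usize {
    drafted.saturating_sub(accepted)
}
```

Under these assumptions, the engine would call something like `cache.trim_by(rollback_count(draft.len(), accepted))` after each verify step, without knowing whether the cache is the KV or the Hybrid variant.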

Audit context

PR-3 was originally planned to bundle 3 commits from feat/magic-canvas: paged prefix cache, chunked prefill, and trim_by. An audit against current origin/main found that the first two are already there in superset form:

| Original commit | State on origin/main |
| --- | --- |
| `826794b0` paged_prefix_cache.rs | SUPERSET: 1072 lines on main vs our 855, with TqBlock + slice_axis1 + conv_pos additions on top |
| `d24e4a92` chunked_prefill | SUPERSET: wired into a 2453-line simple.rs (vs 1508 in feat/magic-canvas) with the early-validation guard our version was missing |
| `fb48230c` trim_by | PARTIAL: main has `SteppingKeyValueCache::trim_by(usize)` (panbanda `1514737`) but no AnyCache-level dispatcher. This PR ships only the missing piece. |

PR-3 therefore shrinks to the single dispatcher (+ tests).

Behaviour

impl AnyCache {
    pub fn trim_by(&mut self, count: usize) {
        match self {
            Self::KV(layers) => {
                for layer in layers.iter_mut().flatten() {
                    layer.trim_by(count);
                }
            }
            Self::Hybrid(layers) => {
                for layer in layers.iter_mut().flatten() {
                    // Only KV layers are trimmed; LayerCache::Arrays
                    // (recurrent SSM state) is left untouched.
                    if let LayerCache::KV(kv) = layer {
                        kv.trim_by(count);
                    }
                }
            }
        }
    }
}

Hybrid's LayerCache::Arrays variant (qwen3-next SSM recurrent state) is intentionally skipped — its state cannot be trimmed by offset alone. Documented in the doc comment.
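The skip behaviour can be tried in isolation with stand-in types. The sketch below is a toy model, not the higgs-models code: the structs carry only an `offset` field (the real caches hold tensors), `ArraysCache` and `hybrid_offsets_after_trim` are hypothetical names, and the per-layer trim is assumed to use `saturating_sub` as the audit notes describe.

```rust
// Toy stand-ins: only `offset` is modelled here.
struct SteppingKeyValueCache {
    offset: usize,
}

impl SteppingKeyValueCache {
    fn trim_by(&mut self, count: usize) {
        // Assumed to saturate at 0, mirroring the guard mentioned for main.
        self.offset = self.offset.saturating_sub(count);
    }
}

// Hypothetical name for the recurrent-state payload of LayerCache::Arrays.
struct ArraysCache {
    offset: usize,
}

enum LayerCache {
    KV(SteppingKeyValueCache),
    Arrays(ArraysCache),
}

#[allow(dead_code)]
enum AnyCache {
    KV(Vec<Option<SteppingKeyValueCache>>),
    Hybrid(Vec<Option<LayerCache>>),
}

impl AnyCache {
    fn trim_by(&mut self, count: usize) {
        match self {
            Self::KV(layers) => {
                for layer in layers.iter_mut().flatten() {
                    layer.trim_by(count);
                }
            }
            Self::Hybrid(layers) => {
                for layer in layers.iter_mut().flatten() {
                    // Arrays layers fall through untouched.
                    if let LayerCache::KV(kv) = layer {
                        kv.trim_by(count);
                    }
                }
            }
        }
    }
}

/// Builds a KV / None / Arrays hybrid cache, trims it, and returns
/// (kv_offset, arrays_offset) afterwards.
fn hybrid_offsets_after_trim(start: usize, count: usize) -> (usize, usize) {
    let mut cache = AnyCache::Hybrid(vec![
        Some(LayerCache::KV(SteppingKeyValueCache { offset: start })),
        None,
        Some(LayerCache::Arrays(ArraysCache { offset: start })),
    ]);
    cache.trim_by(count);

    let (mut kv_offset, mut arrays_offset) = (0, 0);
    if let AnyCache::Hybrid(layers) = &cache {
        for layer in layers.iter().flatten() {
            match layer {
                LayerCache::KV(kv) => kv_offset = kv.offset,
                LayerCache::Arrays(a) => arrays_offset = a.offset,
            }
        }
    }
    (kv_offset, arrays_offset)
}
```

In this toy model, trimming 3 positions from a cache at offset 10 moves the KV layer to 7 and leaves the Arrays layer at 10, so a caller that rolls back a hybrid model still has to restore or recompute the recurrent state by some other means.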

Test plan

cargo clippy --all-targets --all-features — clean
cargo fmt --all -- --check — clean
cargo test -p higgs-models --lib -- --test-threads=1 — 332 passed, 0 failed, 24 ignored (+2 new)

New tests:

tests::any_cache_trim_by_kv_dispatches_to_each_layer — KV variant with Some/None/Some layer mix, no panic, all saturate to 0.
tests::any_cache_trim_by_hybrid_skips_arrays_layers — Hybrid with LayerCache::KV + LayerCache::Arrays, asserts Arrays.offset stays unchanged.
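For readers without the repository at hand, the first test's shape (a Some/None/Some layer mix that saturates to 0) can be approximated as follows. This is a sketch with a hypothetical `KvLayer` stand-in and helper names of my own choosing; it is not the test code from the PR.

```rust
// Hypothetical stand-in for a per-layer KV cache; only `offset` is modelled.
struct KvLayer {
    offset: usize,
}

impl KvLayer {
    fn trim_by(&mut self, count: usize) {
        // Over-trimming clamps to 0 instead of underflowing.
        self.offset = self.offset.saturating_sub(count);
    }
}

/// Trims every populated layer; `None` slots are skipped by `flatten()`.
fn trim_all(layers: &mut [Option<KvLayer>], count: usize) {
    for layer in layers.iter_mut().flatten() {
        layer.trim_by(count);
    }
}

/// Snapshot of per-layer offsets, keeping `None` for empty slots.
fn offsets(layers: &[Option<KvLayer>]) -> Vec<Option<usize>> {
    layers
        .iter()
        .map(|slot| slot.as_ref().map(|kv| kv.offset))
        .collect()
}
```

Trimming by a count larger than any layer's offset should leave every populated layer at 0 and every empty slot as `None`, with no panic.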

Notes

Single-file change: crates/higgs-models/src/lib.rs (+78 lines, all in impl AnyCache + test module).
Foundation for the upcoming DraftModel-trait / spec-decode-wiring PR (PR-4).
Co-authored with Claude Opus 4.7

Audit finding (2026-05-04): of the 3 commits originally planned for PR-3 (paged prefix cache, chunked prefill, trim_by), 2 are already on origin/main in superset form via panbanda's PR panbanda#74 squash and the 1514737 TurboQuant landing:

- paged_prefix_cache.rs: main has 1072 lines (vs our 855) with TqBlock + slice_axis1 + conv_pos additions on top of our work.
- chunked_prefill: main wires it into a 2453-line simple.rs (vs our 1508), with compute_prefill_chunk_size + forward_chunked + the early validation our version was missing.
- SteppingKeyValueCache::trim_by(usize): main has the per-layer trim with saturating_sub overflow guard (lib.rs:457).

The genuinely-new piece is the AnyCache-level dispatcher: a single helper that walks every layer in either KV or Hybrid variant and trims the underlying SteppingKeyValueCache, while intentionally leaving LayerCache::Arrays (recurrent SSM state) untouched. Required by upcoming spec-decode verify-rollback paths in higgs-engine that operate on AnyCache rather than reaching into per-layer types directly.

Tests:

- any_cache_trim_by_kv_dispatches_to_each_layer: KV variant with mixed Some/None layers, no panic, all caches saturate to 0.
- any_cache_trim_by_hybrid_skips_arrays_layers: Hybrid with KV+Arrays mix, asserts Arrays.offset stays unchanged (recurrent state cannot trim by offset alone).

Suite: 332 passed, 0 failed, 24 ignored. Clippy + rustfmt clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

This was referenced May 4, 2026

feat(cache): AnyCache::trim_by dispatcher for spec-decode rollback panbanda/higgs#143 (Open)
dusterbloom: feat(speculative): DraftModel trait + spec-decode primitives #15 (Open)
dusterbloom: feat(pld): Prompt Lookup Decoding drafter (1.84× headline) #16 (Open)

dusterbloom wants to merge 1 commit into main from dusterbloom/paged-prefix-cache