
dusterbloom: feat(pld): Prompt Lookup Decoding drafter (1.84× headline) #16

Open
dusterbloom wants to merge 8 commits into main from dusterbloom/pld-fsm-verify

Conversation

@dusterbloom
Owner

Summary

Adds pld::PldDraftModel — the DraftModel (PR-4) impl that proposes draft tokens by n-gram suffix matching against the running sequence. No model weights, no GPU calls.

Headline (validated on feat/magic-canvas)

Carnice-9B verbatim, max_tokens=384, T=0, 3-run median:

| Metric | Before    | After     | Speedup |
|--------|-----------|-----------|---------|
| Decode | 21.68 tps | 39.87 tps | 1.84×   |

Source: benchmarks/pld_carnice_20260426/results.json on feat/magic-canvas.

What ships in this PR

  • crates/higgs-engine/src/pld.rs (NEW, 292 lines incl. 14 unit tests)
  • pub mod pld; declaration in lib.rs

That's it. The implementation is fully standalone — no simple.rs changes, no CLI plumbing, no doctor wiring.

What's deferred to a follow-up

  • --pld, --pld-max-ngram, and --pld-min-ngram CLI flags
  • ModelConfig.pld / pld_max_ngram / pld_min_ngram fields
  • doctor::check_pld validation
  • FSM-aware verify (lets PLD compound with constrained decoding)
  • AR-spec K=2..3 + DFlash FSM gate lift

These all touch heavily evolved simple.rs / constrained.rs regions, and several need PR-6 (the DFlash baseline) or the deferred half of PR-4 (engine wiring with a concrete drafter). Splitting them out keeps this PR a clean, reviewable primitive.

Stacked

This PR is stacked on:

Merging those first reduces this diff. After all three merge, PldDraftModel is a usable DraftModel impl waiting for the engine wiring.

Senior-Rust hygiene

  • File-level #![allow(clippy::indexing_slicing)] is documented and scoped — PLD lookup is pure CPU-side n-gram matching with locally-checked bounds.
  • No unwrap_used / panic / shadow_* allows — code is otherwise workspace-clippy-clean.

Test plan

  • cargo clippy --all-targets --all-features — clean
  • cargo fmt --all -- --check — clean
  • cargo test -p higgs-engine --lib pld -- --test-threads=1 — 14 passed, 0 failed

Notes

  • Co-authored with Claude Opus 4.7 (originally Sonnet 4.6 on feat/magic-canvas)

dusterbloom and others added 7 commits May 5, 2026 11:44
Audit finding (2026-05-04): of the 3 commits originally planned for PR-3
(paged prefix cache, chunked prefill, trim_by), 2 are already on
origin/main in superset form via panbanda's PR panbanda#74 squash and the 1514737
TurboQuant landing:

- paged_prefix_cache.rs: main has 1072 lines (vs our 855), with
  TqBlock + slice_axis1 + conv_pos additions on top of our work.
- chunked_prefill: main wires it into a 2453-line simple.rs (vs our
  1508), with compute_prefill_chunk_size + forward_chunked + the early
  validation our version was missing.
- SteppingKeyValueCache::trim_by(usize): main has the per-layer trim
  with saturating_sub overflow guard (lib.rs:457).

The genuinely-new piece is the AnyCache-level dispatcher: a single
helper that walks every layer in either KV or Hybrid variant and trims
the underlying SteppingKeyValueCache, while intentionally leaving
LayerCache::Arrays (recurrent SSM state) untouched. Required by
upcoming spec-decode verify-rollback paths in higgs-engine that operate
on AnyCache rather than reaching into per-layer types directly.
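A minimal sketch of that dispatcher shape — the type and field names here are simplified stand-ins for the real higgs-engine definitions, which carry actual tensor state:

```rust
// Hedged sketch (hypothetical field layout): walk every layer, trim the
// KV caches, and intentionally skip recurrent Arrays (SSM) state.

struct SteppingKeyValueCache {
    offset: usize,
}

impl SteppingKeyValueCache {
    fn trim_by(&mut self, n: usize) {
        // saturating_sub guards against trimming past the sequence start.
        self.offset = self.offset.saturating_sub(n);
    }
}

enum LayerCache {
    Kv(Option<SteppingKeyValueCache>),
    // Recurrent SSM state cannot be rewound by an offset alone.
    Arrays { offset: usize },
}

struct AnyCache {
    layers: Vec<LayerCache>,
}

impl AnyCache {
    fn trim_by(&mut self, n: usize) {
        for layer in &mut self.layers {
            match layer {
                LayerCache::Kv(Some(kv)) => kv.trim_by(n),
                LayerCache::Kv(None) => {}    // unallocated layer: no-op
                LayerCache::Arrays { .. } => {} // intentionally untouched
            }
        }
    }
}
```

This mirrors the two test cases: mixed Some/None KV layers saturate to 0, while Arrays offsets stay unchanged.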

Tests:
- any_cache_trim_by_kv_dispatches_to_each_layer: KV variant with mixed
  Some/None layers, no panic, all caches saturate to 0.
- any_cache_trim_by_hybrid_skips_arrays_layers: Hybrid with KV+Arrays
  mix, asserts Arrays.offset stays unchanged (recurrent state cannot
  trim by offset alone).

Suite: 332 passed, 0 failed, 24 ignored. Clippy + rustfmt clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ports pure-logic speculative-decoding primitives from feat/ane-spec-decode
(efa05de + prefill method from 293793a) into higgs-engine:

- accept_prefix() — prefix-match acceptance logic
- speculative_step() — one draft→verify→accept cycle
- speculative_loop() — run until EOS/max_tokens
- DraftModel trait: prefill, draft, advance, rollback (all Send)

MlxDraftModel CoreML impl intentionally not ported; magic-canvas has
native AneBonsaiEngine / AneArDecodeEngine / finalize_ane_gdn_inline
paths instead. Bonsai shim lands separately.
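The trait surface above can be sketched as follows — signatures are assumed from the commit description, and the mock is illustrative, not the ported MockDraft:

```rust
// Hedged sketch of the DraftModel surface; the real trait in
// higgs-engine may thread token types and error handling differently.
trait DraftModel: Send {
    /// Ingest the prompt before the first draft call.
    fn prefill(&mut self, tokens: &[u32]);
    /// Propose up to `k` draft tokens; may return fewer (or none).
    fn draft(&mut self, k: usize) -> Vec<u32>;
    /// Commit accepted tokens into the drafter's running state.
    fn advance(&mut self, tokens: &[u32]);
    /// Discard the last `n` tokens after a verify rejection.
    fn rollback(&mut self, n: usize);
}

/// Illustrative mock: tracks the sequence and proposes constant tokens.
struct MockDraft {
    seq: Vec<u32>,
}

impl DraftModel for MockDraft {
    fn prefill(&mut self, tokens: &[u32]) {
        self.seq.extend_from_slice(tokens);
    }
    fn draft(&mut self, k: usize) -> Vec<u32> {
        vec![0; k]
    }
    fn advance(&mut self, tokens: &[u32]) {
        self.seq.extend_from_slice(tokens);
    }
    fn rollback(&mut self, n: usize) {
        let len = self.seq.len().saturating_sub(n);
        self.seq.truncate(len);
    }
}
```

One draft→verify→accept cycle then calls `draft`, verifies against the target model, `advance`s the accepted prefix, and `rollback`s the rejected tail.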

19 unit tests (accept_prefix 7, MockDraft 3, step 5, loop 4).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Ports config plumbing from feat/ane-spec-decode 8e3a5b8:

- ServeArgs gains --draft-model (Option<String>) and --num-draft (usize,
  default 8) CLI flags.
- ModelConfig gains draft_model + num_draft fields, threaded through
  build_simple_config, load_config_file, and ensure_auto_router_model.
- Test fixtures updated across config.rs, doctor.rs, tui/mod.rs.

No load-path wiring yet — that arrives with the simple.rs engine port.
MlxDraftModel from the source commit deliberately omitted; magic-canvas
uses AneBonsaiDraftModel and native paths instead.
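The config shape, as a hedged stand-in — the real ServeArgs is a clap parser and ModelConfig carries many more fields; only the two names from this commit are shown:

```rust
// Centralized default, so CLI, config file, and auto-router injection
// all agree on the same value.
fn default_num_draft() -> usize {
    8
}

// Stand-in for the real ModelConfig (which has ~13 fields).
#[derive(Debug, Clone)]
struct ModelConfig {
    draft_model: Option<String>,
    num_draft: usize,
}

impl Default for ModelConfig {
    fn default() -> Self {
        Self {
            draft_model: None,
            num_draft: default_num_draft(),
        }
    }
}
```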

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Ports doctor validation from feat/ane-spec-decode 2a7480d:

- check_draft_models() — resolves ModelConfig.draft_model path (FAIL if
  not found) and warns when a draft model is paired with batch=true
  (speculative decoding is SimpleEngine-only).
- Documents draft_model / num_draft in the daemon init template and
  README.md config reference.

3 new doctor tests covering: unresolvable draft path, batch=true warn,
no-draft-model short-circuit. Doctor suite now 29 tests (was 26).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Cherry-picked PR-4 commits had a file-level `#![allow(clippy::unwrap_used,
clippy::indexing_slicing, clippy::panic)]` on speculative.rs as a quick
fix to satisfy strict workspace deny lints. That's a workaround, not how
senior Rust code is written.

Refactor:

- `accept_prefix`: replace `for i in 0..k { target_ids[i] }` indexing with
  `target_ids.iter().zip(draft_ids.iter())`. Lock-step iteration means the
  types enforce the bounds — no slice indexing. The (K+1)-th token (the bonus
  when every draft matched) is appended via `target_ids.last()` after the loop.
- File-level allow: deleted. Library code is clippy-clean.
- Test module allow: scoped tightly to `#[cfg(test)] mod tests` only —
  `#[allow(clippy::panic, clippy::unwrap_used, clippy::indexing_slicing,
  clippy::unreachable)]`. Tests legitimately use these (failure → test
  fail, which is the intent).
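The refactored acceptance shape looks roughly like this — the signature is assumed, and the real function may also thread logits or cache state:

```rust
// Hedged sketch of prefix-match acceptance at T=0. `target_ids` holds
// the target model's K+1 outputs for K draft tokens. Lock-step zip
// replaces index-based iteration; no slice indexing needed.
fn accept_prefix(target_ids: &[u32], draft_ids: &[u32]) -> Vec<u32> {
    let mut accepted: Vec<u32> = target_ids
        .iter()
        .zip(draft_ids)
        .take_while(|(t, d)| t == d)
        .map(|(t, _)| *t)
        .collect();
    // The target's next token is always usable: the correction at the
    // first mismatch, or the bonus (K+1)-th token when everything matched.
    if let Some(&next) = target_ids.get(accepted.len()) {
        accepted.push(next);
    }
    accepted
}
```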

doctor.rs new tests: introduce `test_model_config(path)` helper instead
of 13-field literals duplicated 3×. DRY test setup with struct-update
syntax overrides for the field each test cares about.

Verify: `cargo clippy --all-targets --all-features` clean,
`cargo test -p higgs-engine --lib speculative -- --test-threads=1` 19/19,
`cargo test -p higgs --lib doctor -- --test-threads=1` 39/39.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds `pld::PldDraftModel` implementing the `DraftModel` trait (PR-4) by
n-gram suffix matching against the running sequence. No model weights,
no GPU calls — the lookup is microseconds; the verify step still runs
the target model on the proposed batch.

When no n-gram match is found, `draft()` returns an empty Vec; the verify
step then runs at length-1, equivalent to plain autoregressive decode.

Reference: Apoorv Saxena, "Prompt Lookup Decoding"
(github.com/apoorvumang/prompt-lookup-decoding).
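The lookup itself can be sketched in a few lines — shown here as a free function for illustration (the real impl lives behind the DraftModel trait, and the `min_ngram`/`max_ngram` names mirror the deferred CLI flags):

```rust
// Hedged sketch of prompt-lookup drafting: find the longest recent
// suffix of `seq` (between min_ngram and max_ngram tokens) that also
// occurs earlier in `seq`, and propose up to `k` tokens that followed
// that earlier occurrence.
fn pld_draft(seq: &[u32], min_ngram: usize, max_ngram: usize, k: usize) -> Vec<u32> {
    for n in (min_ngram..=max_ngram.min(seq.len())).rev() {
        let suffix = &seq[seq.len() - n..];
        // Scan earlier windows; the most recent match wins.
        for start in (0..seq.len() - n).rev() {
            if &seq[start..start + n] == suffix {
                let cont = start + n;
                let end = (cont + k).min(seq.len());
                return seq[cont..end].to_vec();
            }
        }
    }
    Vec::new() // no match: verify runs at length 1, i.e. plain AR decode
}
```

All slicing is locally bounds-checked (`n <= seq.len()`, `start + n <= seq.len()`), which is the property the file-level `#![allow(clippy::indexing_slicing)]` documents.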

Headline number from feat/magic-canvas branch validation
(`benchmarks/pld_carnice_20260426/results.json`):
  Carnice-9B verbatim, max_tokens=384, T=0:
    21.68 → 39.87 tps median (3-run) — **1.84× decode**

This PR ships the standalone PLD drafter primitive only. The follow-up
work (CLI flags --pld/--pld-max-ngram/--pld-min-ngram, ModelConfig
fields, doctor checks, FSM-aware verify) lives in a future PR once the
broader spec-decode wiring (PR-4 deferred half) lands.

Senior-Rust note: pld.rs has a documented `#![allow(clippy::indexing_slicing)]`
at file scope. The slicing is provably bounded by local length checks
(see comment at top of file). The other workspace-deny lints (unwrap_used,
shadow_*, etc.) are NOT blanket-allowed — the code is otherwise clippy-clean.

Verify: `cargo clippy --all-targets --all-features` clean, `cargo fmt`
clean, `cargo test -p higgs-engine --lib pld -- --test-threads=1`
14/14 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The PR panbanda#145 Lint job was blocked on three clippy errors in pld.rs:

  1. PldDraftModel::new is trivially const — mark it const fn
  2. _assert_send (and its caller _assert) used underscore prefix
     to dodge dead_code, but clippy's used_underscore_items now
     flags the call site. Replace the wrapper with a direct
     monomorphization at const-block scope:

         const _: () = {
             const fn check_send<T: DraftModel + Send>() {}
             check_send::<PldDraftModel>();
         };

     The call exercises the trait bound at compile time without
     leaving any uncalled fn around for dead_code to complain about.

Locally: cargo clippy --all-targets --all-features -D warnings clean,
14 pld tests pass, 98 higgs tests pass, fmt clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dusterbloom dusterbloom force-pushed the dusterbloom/pld-fsm-verify branch from 4fa52f4 to 6e5da16 on May 5, 2026 09:45
…nfig default

- README.md: resolve stale merge conflict markers from feef8e4 by taking
  HEAD; the daemon/Profiles/Features block is already covered on main via
  the "Core commands" section.
- pld.rs: add /// doc comments on PldDraftModel and ::new constructor.
- config.rs: replace hardcoded num_draft: 8 in auto-router model injection
  with default_num_draft() to centralize the default.

Validation: fmt clean, clippy clean (higgs + higgs-engine), 14/14 pld unit
tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>