ngram-mod: Reset i_last when low acceptance streak occurs by treo · Pull Request #22168 · ggml-org/llama.cpp

treo · 2026-04-20T13:05:51Z

Overview

By resetting i_last to zero, we will include the current context when rebuilding the speculative map.

The existing behavior would skip the current context, thereby often losing the benefit of speculative decoding during later parts of the generation.

The effect of this seems to depend on both the model and the speculation parameters.
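
To make the mechanism concrete, here is a minimal toy sketch of the idea, not the llama.cpp implementation. It assumes the speculative map is filled incrementally, starting at a cursor called `i_last`, and is cleared after a low acceptance streak. The class `ToyNgramSpeculator` and its methods are hypothetical names used only for illustration.

```python
class ToyNgramSpeculator:
    """Toy stand-in for an n-gram speculative map (hypothetical, not llama.cpp code)."""

    def __init__(self, n=2):
        self.n = n
        self.table = {}   # n-gram (tuple of tokens) -> token that followed it
        self.i_last = 0   # first context position that has not been indexed yet

    def update(self, tokens):
        # Index every n-gram starting at or after i_last that has a follower token.
        end = len(tokens) - self.n
        for i in range(self.i_last, end):
            self.table[tuple(tokens[i:i + self.n])] = tokens[i + self.n]
        self.i_last = max(self.i_last, end)

    def on_low_acceptance(self, reset_i_last):
        # Assumption: a low acceptance streak clears the whole map.
        self.table.clear()
        if reset_i_last:
            # The one-line change: rewind the cursor so the next update
            # re-indexes the context that is already present.
            self.i_last = 0


def entries_after_reset(reset_i_last):
    tokens = list("abcabcabc")
    spec = ToyNgramSpeculator(n=2)
    spec.update(tokens)                    # map is populated from the context
    spec.on_low_acceptance(reset_i_last)   # low acceptance streak clears it
    spec.update(tokens)                    # rebuild attempt on the same context
    return len(spec.table)


print("without i_last = 0:", entries_after_reset(False))
print("with i_last = 0:   ", entries_after_reset(True))
```

In this toy model, the map stays empty after the reset unless the cursor is rewound, because the cursor already points past the existing context. Rewinding costs one extra pass over the context, which matches the trade-off described in this PR.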

Benchmark:
vllm bench serve --model google/gemma-4-26B-A4B-it --host 127.0.0.1 --port 9876 --num-prompts 96 --dataset-name hf --dataset-path vdaita/edit_5k_char --backend openai-chat --endpoint '/v1/chat/completions' --max-concurrency 1

Gemma 4 26B-A4B:

Baseline: 97.57 t/s (peak 108)
without i_last = 0: 141.08 t/s (peak 1032)
with i_last = 0: 155.10 t/s (peak 1032)

Qwen 3.6 35B-A3B:

Baseline: 112.54 t/s (peak 127)
without i_last = 0: 148.56 t/s (peak 774)
with i_last = 0: 153.65 t/s (peak 768)

As we can see, the effect isn't huge, but at roughly 3.4% (Qwen) to 9.9% (Gemma) over the run without the reset, it is still measurable.
The price we pay for it is one line of code and a slightly longer speculative map repopulation whenever a low acceptance streak occurs.
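
The relative gain can be recomputed from the throughput figures listed above. The short sketch below only does that arithmetic; the helper name `gain_percent` is made up for this example.

```python
# Throughput in tokens/s, copied from the benchmark results above:
# (without i_last = 0, with i_last = 0)
results = {
    "Gemma 4 26B-A4B": (141.08, 155.10),
    "Qwen 3.6 35B-A3B": (148.56, 153.65),
}


def gain_percent(without_reset, with_reset):
    """Relative throughput gain of the reset over the run without it, in percent."""
    return (with_reset / without_reset - 1.0) * 100.0


for model, (without_reset, with_reset) in results.items():
    print(f"{model}: {gain_percent(without_reset, with_reset):.1f}%")
# Gemma 4 26B-A4B: 9.9%
# Qwen 3.6 35B-A3B: 3.4%
```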

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES. Used to help understand the ngram-mod flow.

Reset i_last when low acceptance streak occurs

5dc0dea

treo requested a review from a team as a code owner April 20, 2026 13:05

treo wants to merge 1 commit into ggml-org:master from treo:master