
server : do not cap slot context to training context (#22140)#22145

Open
jinweihan-ai wants to merge 1 commit into ggml-org:master from jinweihan-ai:server-no-cap-slot-ctx

Conversation


jinweihan-ai commented Apr 20, 2026

Summary

Fixes #22140.

server_context silently capped each slot's n_ctx to the model's training context. As a result, any user who extended the context via RoPE scaling (YaRN), the whole point of models like Qwen3, effectively had their --ctx-size ignored once the slot was created, even though the KV cache had already been sized for the full n_ctx_seq.

This PR drops the cap and keeps only the warning. llama_context itself already logs "n_ctx_seq (...) > n_ctx_train (...) -- possible training context overflow", so users still see the safety signal.

Before

llama_context: n_ctx_seq     = 4096
llama_kv_cache: size =    2.50 MiB (  4096 cells, ...)
srv    load_model: the slot context (4096) exceeds the training context of the model (2048) - capping
slot   load_model: id  0 | task -1 | new slot, n_ctx = 2048   ← halved

After

llama_context: n_ctx_seq     = 4096
llama_kv_cache: size =    2.50 MiB (  4096 cells, ...)
srv    load_model: the slot context (4096) exceeds the training context of the model (2048) - generation quality may degrade beyond the training context unless RoPE scaling is configured
slot   load_model: id  0 | task -1 | new slot, n_ctx = 4096  ← as requested

/props now reports default_generation_settings.n_ctx = 4096 (previously 2048).

Test plan

  • Reproduced the bug on master with stories260K.gguf (n_ctx_train = 2048) and -c 4096.
  • Verified the patched build preserves the user-requested n_ctx in both the slot init log and the /props endpoint.
  • /completion still returns correctly after the change (20 tokens, stop=true, coherent output).

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: Yes. This PR was produced in an AI-assisted workflow: an agent helped surface the candidate issue, drafted the patch, and wrote this description. The fix and reproduction were reviewed and verified locally before submitting (bug reproduced on master, fix built cleanly, slot n_ctx and /completion checked), with human review and validation in the loop.

The per-slot cap overrides the user-requested context size even when
it was explicitly extended via RoPE scaling (YaRN), which is the whole
point of YaRN-aware models such as Qwen3. The KV cache is already
allocated for the full n_ctx_seq, so capping slot.n_ctx only throws
away addressable cells that the user paid memory for.

llama_context already warns about "possible training context overflow"
when n_ctx_seq > n_ctx_train, so dropping the server-side cap keeps
the safety signal without silently ignoring --ctx-size.

Closes ggml-org#22140
@jinweihan-ai jinweihan-ai requested a review from a team as a code owner April 20, 2026 06:30

ggml-gh-bot commented Apr 20, 2026

Hi @jinweihan-ai, thanks for your contribution!

Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:

  • AI-generated content: This project does not accept PRs, descriptions or commit messages that are fully or predominantly AI-generated. If you have used AI to assist you in writing code, please make sure to disclose that explicitly.

Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.


Successfully merging this pull request may close these issues:

  • Eval bug: context length incorrectly capped in server for yarn extendable context models (#22140)
