Name and Version
latest codebase at b8850
Operating systems
Linux
GGML backends
CUDA
Hardware
9900k/4070
Models
Qwen 3 8B
Problem description & steps to reproduce
If YaRN is used to extend the context length (many Qwen models, such as Qwen3, use YaRN extension), the extension is blocked by this logic in server-context.cpp, which incorrectly caps the context length at the base training context:
int n_ctx_slot = llama_n_ctx_seq(ctx);
if (n_ctx_slot > n_ctx_train) {
    SRV_WRN("the slot context (%d) exceeds the training context of the model (%d) - capping\n", n_ctx_slot, n_ctx_train);
    n_ctx_slot = n_ctx_train;
}
Instead, only a warning should be emitted, as follows:
int n_ctx_slot = llama_n_ctx_seq(ctx);
if (n_ctx_slot > n_ctx_train) {
    SRV_WRN("the slot context (%d) exceeds the training context of the model (%d)\n", n_ctx_slot, n_ctx_train);
}
First Bad Commit
Unknown; whichever commit introduced the cap.
Relevant log output