Eval bug: context length incorrectly capped in server for yarn extendable context models #22140

@steampunque

Description

Name and Version

latest codebase at b8850

Operating systems

Linux

GGML backends

CUDA

Hardware

9900k/4070

Models

Qwen 3 8B

Problem description & steps to reproduce

If YaRN is used to extend the context length (many Qwen models, such as Qwen3, rely on YaRN extension), the extension is blocked by the following logic in server-context.cpp, which incorrectly caps the context length at the base training context:

    int n_ctx_slot = llama_n_ctx_seq(ctx);
    if (n_ctx_slot > n_ctx_train) {
        SRV_WRN("the slot context (%d) exceeds the training context of the model (%d) - capping\n", n_ctx_slot, n_ctx_train);
        n_ctx_slot = n_ctx_train;
    }
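For reference, an invocation like the following would exercise this path. The model path and context sizes are assumptions (Qwen3-style 32768 training context extended to 131072 via YaRN); the flags are standard llama-server options:

```shell
# Hypothetical reproduction: request a YaRN-extended context larger than
# the model's training context. With the capping logic above, the slot
# context is silently reduced back to the training context.
llama-server -m qwen3-8b.gguf \
    -c 131072 \
    --rope-scaling yarn \
    --yarn-orig-ctx 32768
```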

Instead, only a warning should be emitted, without capping:

    int n_ctx_slot = llama_n_ctx_seq(ctx);
    if (n_ctx_slot > n_ctx_train) {
        SRV_WRN("the slot context (%d) exceeds the training context of the model (%d)\n", n_ctx_slot, n_ctx_train);
    }

First Bad Commit

Unknown; whichever commit introduced the cap.
