Misc. bug: model parameter in preset config is not resolved from huggingface cache #22130

@fillg1

Description

Name and Version

ggml_cuda_init: found 1 ROCm devices (Total VRAM: 126976 MiB):
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32, VRAM: 126976 MiB
version: 8845 (037bfe3)
built with GNU 15.2.1 for Linux x86_64

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Command line

llama-server -kvu --no-mmap -ngl 999 -fa 1 --host 0.0.0.0 --models-preset config.ini

Problem description & steps to reproduce

I am using a model presets configuration file and want to have two different configurations for the same model, e.g.:

[qwen3.6-35b-a3b-coding]
model = unsloth/Qwen3.6-35B-A3B-GGUF:Q8_K_XL
image-min-tokens = 1024
temp = 0.6
top-p = 0.95
top-k = 20
min-p = 0.0
presence-penalty = 0.0
repeat-penalty = 1.0
c = 262144

[qwen3.6-35b-a3b-general]
model = unsloth/Qwen3.6-35B-A3B-GGUF:Q8_K_XL
alias = qwen3.6-35b-a3b
image-min-tokens = 1024
temp = 1.0
top-p = 0.95
top-k = 20
min-p = 0.0
presence-penalty = 1.5
repeat-penalty = 1.0
c = 262144

Usually my models live in the Hugging Face cache ~/.cache/huggingface/, so I can reference them by their Hugging Face path/id unsloth/Qwen3.6-35B-A3B-GGUF:Q8_K_XL. This works in the section name, but not for the model attribute as shown above.

I would expect the model attribute to be resolved either as a relative path to the model storage or relative to the Hugging Face cache, just as this other section from my config works:

[unsloth/Qwen3.5-35B-A3B-GGUF:Q8_K_XL]
alias = qwen3.5-35b-a3b
image-min-tokens = 1024
temp = 0.6
top-p = 0.95
top-k = 20
min-p = 0.0
presence-penalty = 0.0
repeat-penalty = 1.0
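For illustration, the resolution I would expect could be sketched like this (a hypothetical helper, not llama.cpp's actual code; the `models--{org}--{repo}` cache layout is the standard Hugging Face hub convention, and the `:QUANT` suffix handling is assumed from how the section-name form behaves):

```python
import os

def resolve_model_ref(ref: str, cache_dir: str = "~/.cache/huggingface/hub") -> str:
    """Hypothetical sketch: map an 'org/repo:QUANT' reference to the
    Hugging Face hub cache directory where the repo would be stored.
    Falls back to treating the ref as a plain file path if it exists."""
    if os.path.exists(ref):
        # plain relative/absolute path to a GGUF file
        return ref
    # split off the optional quantization tag after ':'
    repo_id, _, _quant = ref.partition(":")
    org, _, repo = repo_id.partition("/")
    # hub caches repositories under models--{org}--{repo}
    repo_dir = f"models--{org}--{repo}"
    return os.path.join(os.path.expanduser(cache_dir), repo_dir)

print(resolve_model_ref("unsloth/Qwen3.6-35B-A3B-GGUF:Q8_K_XL", cache_dir="/tmp/hub"))
```

i.e. the model attribute would first be checked as a path on disk, and otherwise interpreted as a cached Hugging Face repo, exactly as it already is when used as the section name.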

First Bad Commit

No response

Relevant log output

Logs
[58093] gguf_init_from_file: failed to open GGUF file 'unsloth/Qwen3.6-35B-A3B-GGUF:Q8_K_XL' (No such file or directory)
[58093] llama_model_load: error loading model: llama_model_loader: failed to load model from unsloth/Qwen3.6-35B-A3B-GGUF:Q8_K_XL
[58093] llama_model_load_from_file_impl: failed to load model
[58093] llama_params_fit: encountered an error while trying to fit params to free device memory: failed to load model
[58093] llama_params_fit: fitting params to free memory took 0.00 seconds
[58093] llama_model_load_from_file_impl: using device ROCm0 (Radeon 8060S Graphics) (0000:bd:00.0) - 30570 MiB free
[58093] gguf_init_from_file: failed to open GGUF file 'unsloth/Qwen3.6-35B-A3B-GGUF:Q8_K_XL' (No such file or directory)
[58093] llama_model_load: error loading model: llama_model_loader: failed to load model from unsloth/Qwen3.6-35B-A3B-GGUF:Q8_K_XL
[58093] llama_model_load_from_file_impl: failed to load model
[58093] common_init_from_params: failed to load model 'unsloth/Qwen3.6-35B-A3B-GGUF:Q8_K_XL'
[58093] srv    load_model: failed to load model, 'unsloth/Qwen3.6-35B-A3B-GGUF:Q8_K_XL'
[58093] srv    operator(): operator(): cleaning up before exit...
[58093] main: exiting due to model loading error
srv    operator(): instance name=qwen3.6-35b-a3b-general exited with status 1
srv    operator(): got exception: {"error":{"code":500,"message":"model name=qwen3.6-35b-a3b-general failed to load","type":"server_error"}}
