Name and Version
ggml_cuda_init: found 1 ROCm devices (Total VRAM: 126976 MiB):
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32, VRAM: 126976 MiB
version: 8845 (037bfe3)
built with GNU 15.2.1 for Linux x86_64
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
llama-server -kvu --no-mmap -ngl 999 -fa 1 --host 0.0.0.0 --models-preset config.ini
Problem description & steps to reproduce
I am using a model presets configuration file and want to have two different configurations for the same model, e.g.:
[qwen3.6-35b-a3b-coding]
model = unsloth/Qwen3.6-35B-A3B-GGUF:Q8_K_XL
image-min-tokens = 1024
temp = 0.6
top-p = 0.95
top-k = 20
min-p = 0.0
presence-penalty = 0.0
repeat-penalty = 1.0
c = 262144
[qwen3.6-35b-a3b-general]
model = unsloth/Qwen3.6-35B-A3B-GGUF:Q8_K_XL
alias = qwen3.6-35b-a3b
image-min-tokens = 1024
temp = 1.0
top-p = 0.95
top-k = 20
min-p = 0.0
presence-penalty = 1.5
repeat-penalty = 1.0
c = 262144
Usually my models stay in the Hugging Face cache (~/.cache/huggingface/), so I can reference them by their Hugging Face path or id, e.g. unsloth/Qwen3.6-35B-A3B-GGUF:Q8_K_XL. This works in the section name, but not for the model attribute as shown above.
I would expect the model attribute to be interpreted either as a path relative to the model storage or relative to the Hugging Face cache, since this other section from my config works:
[unsloth/Qwen3.5-35B-A3B-GGUF:Q8_K_XL]
alias = qwen3.5-35b-a3b
image-min-tokens = 1024
temp = 0.6
top-p = 0.95
top-k = 20
min-p = 0.0
presence-penalty = 0.0
repeat-penalty = 1.0
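For reference, a possible interim workaround sketch, assuming the GGUF has already been downloaded to the Hugging Face cache: point the model attribute at the absolute file path inside the cache instead of the repo id. The snapshot hash and filename below are illustrative placeholders, not verified values.

```ini
[qwen3.6-35b-a3b-coding]
; Assumption: an absolute path to the cached .gguf may load where the
; HF repo id does not. <hash> and the filename are illustrative only.
model = /home/user/.cache/huggingface/hub/models--unsloth--Qwen3.6-35B-A3B-GGUF/snapshots/<hash>/Qwen3.6-35B-A3B-Q8_K_XL.gguf
temp = 0.6
c = 262144
```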
First Bad Commit
No response
Relevant log output
Logs
[58093] gguf_init_from_file: failed to open GGUF file 'unsloth/Qwen3.6-35B-A3B-GGUF:Q8_K_XL' (No such file or directory)
[58093] llama_model_load: error loading model: llama_model_loader: failed to load model from unsloth/Qwen3.6-35B-A3B-GGUF:Q8_K_XL
[58093] llama_model_load_from_file_impl: failed to load model
[58093] llama_params_fit: encountered an error while trying to fit params to free device memory: failed to load model
[58093] llama_params_fit: fitting params to free memory took 0.00 seconds
[58093] llama_model_load_from_file_impl: using device ROCm0 (Radeon 8060S Graphics) (0000:bd:00.0) - 30570 MiB free
[58093] gguf_init_from_file: failed to open GGUF file 'unsloth/Qwen3.6-35B-A3B-GGUF:Q8_K_XL' (No such file or directory)
[58093] llama_model_load: error loading model: llama_model_loader: failed to load model from unsloth/Qwen3.6-35B-A3B-GGUF:Q8_K_XL
[58093] llama_model_load_from_file_impl: failed to load model
[58093] common_init_from_params: failed to load model 'unsloth/Qwen3.6-35B-A3B-GGUF:Q8_K_XL'
[58093] srv load_model: failed to load model, 'unsloth/Qwen3.6-35B-A3B-GGUF:Q8_K_XL'
[58093] srv operator(): operator(): cleaning up before exit...
[58093] main: exiting due to model loading error
srv operator(): instance name=qwen3.6-35b-a3b-general exited with status 1
srv operator(): got exception: {"error":{"code":500,"message":"model name=qwen3.6-35b-a3b-general failed to load","type":"server_error"}}