Name and Version
version: 9459 (07ac3ce)
built with Clang 21.1.8 for Linux x86_64
Thank you.
ovadmani
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
#!/bin/bash
# Define the exact absolute path based on your verified folder structure
BUILD_DIR="/home/ovadm/beellama.cpp/rocm-build"
# 1. Map both local build libraries and global ROCm targets cleanly
export LD_LIBRARY_PATH="$BUILD_DIR/bin:/opt/rocm/lib:/opt/rocm/lib64:/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH"
# 2. Hardware Architecture Overrides for Ryzen AI Max+ 395 (Strix Halo)
export HSA_OVERRIDE_GFX_VERSION=11.5.1
export HCC_AMDGPU_TARGET=gfx1151
export PYTORCH_ROCM_ARCH=gfx1151
export GGML_HIP_NO_PINNED=0
# 3. Unified Memory and Power Constraints Optimization
export AMD_DISABLE_GFXOFF=1
export HIP_FORCE_POINTER_MAPPING=1
echo "Starting BeeLlama Server with DFlash speculation on gfx1151..."
# 4. Invoke the binary using its absolute location path
ARGS=(
# --verbose
#-m /home/ovadm/models/bartowski--google_gemma-4-31B-it-GGUF/google_gemma-4-31B-it-IQ4_NL.gguf
# -m /home/ovadm/models/google--gemma-4-31B-it/gemma-4-31b-q6_k.gguf
-m /home/ovadm/models/google--gemma-4-31B-it/gemma-4-31b-mxfp6.gguf
--alias local_model
#--mmproj /home/ovadm/models/mmproj-BF16.gguf
-np 1 #second app
--kv-unified # not splitting context
--spec-type dflash
#--ctx-size-draft 8
--spec-draft-model /home/ovadm/models/Anbeeld--gemma-4-31B-it-DFlash-GGUF/gemma4-31b-it-dflash-Q8_0.gguf
#--spec-dflash-cross-ctx 1024
#--spec-draft-ngl 99 #draft engine on cpu
#--spec-draft-threads 32
#--spec-draft-threads-batch 32
-ngl 99 #brain on gpu
-t 32
--draft-max 8
--draft-min 0
#--no-spec-dm-adaptive
--spec-dflash-default
--spec-draft-p-min 0.25
--flash-attn on
--cache-type-k q8_0
--cache-type-v q8_0
-b 2048
-ub 2048
-c 65536
--port 9091
--no-mmap
--mlock
--metrics
--perf
--presence-penalty 0
--temp 0.3
--top-p 1.0
--top-k 20
--min-p 0
-to 1800
#--jinja
#--reasoning-format deepseek
#--chat-template-kwargs '{"enable_thinking":true}'
--webui-mcp-proxy
)
sudo pkill -9 llama-server
sleep 1
"$BUILD_DIR/bin/llama-server" "${ARGS[@]}" 2>&1 | tee >(stdbuf -oL socat - UDP4-SENDTO:127.0.0.1:5001) | grep --line-buffered -iE "error|failed|exception|critical|fatal"
**********************************************************************************
./gemma-31b-dflsh.sh
Starting BeeLlama Server with DFlash speculation on gfx1151...
gguf_init_from_file: failed to open GGUF file '/home/ovadm/models/google--gemma-4-31B-it/gemma-4-31b-mxfp6.gguf' (No such file or directory)
llama_model_load: error loading model: llama_model_loader: failed to load model from /home/ovadm/models/google--gemma-4-31B-it/gemma-4-31b-mxfp6.gguf
llama_model_load_from_file_impl: failed to load model
common_fit_params: encountered an error while trying to fit params to free device memory: failed to load model
gguf_init_from_file: failed to open GGUF file '/home/ovadm/models/google--gemma-4-31B-it/gemma-4-31b-mxfp6.gguf' (No such file or directory)
llama_model_load: error loading model: llama_model_loader: failed to load model from /home/ovadm/models/google--gemma-4-31B-it/gemma-4-31b-mxfp6.gguf
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/home/ovadm/models/google--gemma-4-31B-it/gemma-4-31b-mxfp6.gguf'
srv load_model: failed to load model, '/home/ovadm/models/google--gemma-4-31B-it/gemma-4-31b-mxfp6.gguf'
Problem description & steps to reproduce
Load any MXFP6-quantized model (MXFP4 is only usable with MoE models).
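Since the log above reports `No such file or directory` for the model path, it may be worth ruling out a path problem before attributing the failure to the MXFP6 format. Below is a minimal sketch of a pre-flight check, assuming bash. `check_model_file` is a hypothetical helper, not part of llama.cpp; it relies only on the fact that GGUF files begin with the four ASCII bytes `GGUF`.

```bash
#!/bin/bash
# Hypothetical helper (not part of llama.cpp).
# Returns 0 if the path is a readable regular file starting with the GGUF magic,
# 1 if it is missing, 2 if it is unreadable, 3 if the magic bytes do not match.
check_model_file() {
    local path="$1" magic
    if [[ ! -f "$path" ]]; then
        echo "model file not found: $path" >&2
        return 1
    fi
    if [[ ! -r "$path" ]]; then
        echo "model file not readable: $path" >&2
        return 2
    fi
    # GGUF files begin with the 4 ASCII bytes "GGUF"; strip NULs so bash does not warn.
    magic="$(head -c 4 "$path" | tr -d '\0')"
    if [[ "$magic" != "GGUF" ]]; then
        echo "not a GGUF file (bad magic): $path" >&2
        return 3
    fi
    return 0
}
```

In the launch script this could be called just before starting the server, for example `check_model_file /home/ovadm/models/google--gemma-4-31B-it/gemma-4-31b-mxfp6.gguf || exit 1`. If the check passes and the load still fails, the path is unlikely to be the cause.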
First Bad Commit
No response
Relevant log output
Logs