Name and Version
ggml_cuda_init: found 1 ROCm devices (Total VRAM: 118784 MiB):
Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32, VRAM: 118784 MiB
version: 9459 (07ac3ce)
built with Clang 21.1.8 for Linux x86_64
Thank you,
ovadmani
Operating systems
Linux
GGML backends
HIP
Hardware
Ryzen AI Max+ 395
Models
qwen3.5-122b-A10
Problem description & steps to reproduce
The problem occurs only with this specific version, with any model and any draft quant.
ARGS=(
#-m /home/ovadm/models/unsloth--Qwen3.5-122B-A10B-GGUF/MXFP4_MOE/Qwen3.5-122B-A10B-MXFP4_MOE-00001-of-00003.gguf
-m /home/ovadm/models/unsloth--Qwen3.5-122B-A10B-GGUF/UD-IQ4_NL/Qwen3.5-122B-A10B-UD-IQ4_NL-00001-of-00003.gguf
#-m /home/ovadm/models/unsloth--Qwen3.5-122B-A10B-GGUF/Q4_K_M/Qwen3.5-122B-A10B-Q4_K_M-00001-of-00003.gguf
--alias local_model
-np 1 #second app
--spec-draft-model /home/ovadm/models/z-lab--Qwen3.5-122B-A10B-DFlash/qwen3.5-dflash.v2.q5_k_m.gguf
--spec-type dflash
-ngl 99
First Bad Commit
Not identified yet. It works fine with version 9344 (75ae2a6).
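The range between the two builds named in this report could be narrowed down with `git bisect`. This is only a minimal sketch, assuming a local git checkout of the repository. The function name `bisect_dflash_regression` and its `repo_dir` argument are hypothetical, and the build and run steps are left out because they depend on the local setup.

```shell
# Hypothetical helper: start a bisect between the known-bad and known-good
# commits from this report. repo_dir is assumed to be a local git checkout.
bisect_dflash_regression() {
    repo_dir="$1"
    bad="${2:-07ac3ce}"    # version 9459, crashes with the dflash vocab error
    good="${3:-75ae2a6}"   # version 9344, reported as working
    # git bisect start takes the bad commit first, then the good one.
    git -C "$repo_dir" bisect start "$bad" "$good"
}

# After each checkout: rebuild, rerun the server with the same arguments,
# then mark the result so git picks the next commit to test:
#   git -C "$repo_dir" bisect good   # server starts normally
#   git -C "$repo_dir" bisect bad    # std::runtime_error is thrown
# When finished:
#   git -C "$repo_dir" bisect reset
```

Calling `bisect_dflash_regression /path/to/checkout` starts the bisect. git prints the first bad commit once the range is exhausted.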
Relevant log output
terminate called after throwing an instance of 'std::runtime_error'
what(): dflash: target and drafter vocab are incompatible; DFlash cannot retokenize draft outputs (target_vocab=248320 drafter_vocab=248320)