Name and Version
b8850 llama.cpp server
RX 7800 XT (latest driver)
R7 7700X
32 GB DDR5-6000 RAM
Operating systems
Windows
Which llama.cpp modules do you know to be affected?
llama-server
Command line
-m "unsloth\Qwen\Qwen3.6-35B-A3B-UD-Q6_K.gguf"
--flash-attn on
--ctx-size 120000
--fit on
--fit-target 384
--threads 16
--parallel 1
--no-mmap
--mlock
--cache-ram 2048
--ctx-checkpoints 12
--temp 0.75
--repeat-penalty 1.08
--repeat-last-n 384
--min-p 0.05
--top-k 30
--alias Qwen3.6-35B_L
--chat-template-kwargs "{\"preserve_thinking\":true}"
--reasoning on
Problem description & steps to reproduce
Something really odd happens with b8850 and this quant. When I load it with 100k context, VRAM usage is 15.5 GB and RAM usage is around 16 GB. But when I increase the context to 120k, all 32 GB of RAM fills up, and then the page file too; I even had a complete system freeze because of it. This never happened with any previous model or with this model on older builds.
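For scale: if host RAM grew linearly with context size, the jump from 100k to 120k should add only about 20%. A quick sanity check using the numbers above (observed values from this report; the linear model is just an assumption for comparison):

```python
# Naive linear extrapolation of host RAM vs. context size,
# using the figures observed in this report.
ctx_a, ram_a = 100_000, 16.0   # ~16 GB RAM observed at 100k context
ctx_b = 120_000                # context size that triggers the blow-up

expected = ram_a * ctx_b / ctx_a  # what linear scaling would predict
print(f"{expected:.1f} GB")       # ~19.2 GB expected; >32 GB + page file observed
```

So the observed usage is far beyond anything linear scaling would predict, which is why this looks like a bug rather than an expected memory cost.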
The crash code is 0xc0000005 (Windows STATUS_ACCESS_VIOLATION).
First Bad Commit
No response
Relevant log output
sched_reserve: reserving ...
sched_reserve: resolving fused Gated Delta Net support:
sched_reserve: fused Gated Delta Net (autoregressive) enabled
sched_reserve: fused Gated Delta Net (chunked) enabled
sched_reserve: ROCm0 compute buffer size = 659.77 MiB
sched_reserve: ROCm_Host compute buffer size = 242.52 MiB
sched_reserve: graph nodes = 3729
sched_reserve: graph splits = 74 (with bs=512), 52 (with bs=1)
sched_reserve: reserve took 17654.00 ms, sched copies = 1
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
[WARN] <Qwen3.6-35B_L> ExitError >> exit status 0xc0000005, exit code: 3221225477
[INFO] <Qwen3.6-35B_L> process exited but not StateStopping, current state: starting
[WARN] metrics skipped, HTTP status=502, path=/v1/chat/completions