I build llama.cpp on the ubuntu-24.04-riscv runners. The same workflow passed through late May, then started failing at the test stage in early June, with no change to the build flags or the binaries. 26 of 45 ctest cases die with ***Exception: Illegal (SIGILL), all of them ggml-linked.
ggml compiles its riscv64 CPU backend with a fixed -march string that includes the vector extension (V):
-march=rv64gcv_zfh_zvfh_zicbop_zihintpause -mabi=lp64d
That string is identical in the May runs that passed and the June runs that fail. The only thing that differs is which runner the job lands on.
I first assumed it was the half-float part (zfh/zvfh), so I ran a small probe to check instead of guessing. Six shards, each dumping the ISA string the kernel reports and running three tiny programs, each compiled to force instructions from one extension set: scalar half-float (rv64gc_zfh), base vector (rv64gcv), and vector half-float (rv64gcv_zfh_zvfh).
Probe run: https://github.com/gounthar/llama.cpp/actions/runs/27096601517
Same result on all six nodes (riscv-runner-5, 11, 12, 46, 50, 51):
isa : rv64imafdcsu
RESULT scalar_zfh: PASS
RESULT vector_v: SIGILL rc=132 (illegal insn) m=rv64gcv
RESULT vector_zvfh: SIGILL rc=132 (illegal insn) m=rv64gcv_zfh_zvfh
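For anyone reading the rc=132 above: a shell reports a process killed by a signal as 128 plus the signal number, and SIGILL is signal 4 on Linux, so 128 + 4 = 132. Here is a minimal sketch of that classification. The function name `classify_exit` is hypothetical and is not the code of my probe; it also accepts the negative form that Python's `subprocess` reports for signal deaths.

```python
import signal


def classify_exit(rc: int) -> str:
    """Map a probe's exit status to a short label.

    Shells report death-by-signal as 128 + signum; Python's subprocess
    reports it as -signum. Both forms are treated as the same outcome.
    """
    sigill = int(signal.SIGILL)  # 4 on Linux
    if rc == 0:
        return "PASS"
    if rc == 128 + sigill or rc == -sigill:
        return "SIGILL"
    return f"FAIL rc={rc}"


# The exit status seen in the probe output above.
print(classify_exit(132))
```

Any other non-zero status is kept as a plain failure, so a probe that crashes for an unrelated reason is not mistaken for a missing extension.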
So my half-float guess was wrong. It is the base V extension itself (RVV 1.0) that is missing, not a profile-level subset. The vector_v probe is plain rv64gcv, just GC plus V, no Zvfh or anything RVA23-specific, and it still traps. The advertised ISA is rv64imafdcsu with no v at all. Scalar runs fine, anything vector does not.
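The "no v at all" reading of the ISA string can be checked mechanically. The sketch below is an illustration, not part of my probe: the names `isa_from_cpuinfo`, `single_letter_extensions` and `has_vector` are hypothetical, and it assumes the `isa : ...` line format shown in the probe output. It also assumes that within the leading run of letters, `z`, `s` and `x` start multi-letter (or legacy mode) suffixes, so scanning stops there. That is a simplification of the RISC-V naming rules.

```python
import re
from typing import Optional


def isa_from_cpuinfo(text: str) -> Optional[str]:
    """Return the first 'isa' value from cpuinfo-style text, or None."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "isa":
            return value.strip()
    return None


def single_letter_extensions(isa: str) -> str:
    """Return the single-letter extensions of an ISA string, e.g. 'imafdc'."""
    m = re.match(r"rv(32|64|128)([a-z]*)", isa.strip().lower())
    if m is None:
        raise ValueError(f"not a RISC-V ISA string: {isa!r}")
    letters = m.group(2)  # stops at the first '_' or digit
    # Assumption: z/s/x begin multi-letter or legacy suffixes, so cut there.
    cut = re.search(r"[zsx]", letters)
    return letters[: cut.start()] if cut else letters


def has_vector(isa: str) -> bool:
    """True if the base V extension appears among the single letters."""
    return "v" in single_letter_extensions(isa)


sample = "processor : 0\nisa : rv64imafdcsu\n"
print(has_vector(isa_from_cpuinfo(sample)))   # string reported by the six nodes
print(has_vector("rv64gcv_zfh_zvfh"))         # march used by the vector probes
```

A check like this only reflects what the kernel advertises; the compiled probes remain the ground truth for what the hardware actually executes.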
That explains the llama.cpp failures: ggml's default riscv64 build has V on, so the first vector instruction in any ggml binary traps. The May runs passed because they landed on V-capable nodes, so it looks like this installation's pool lost its vector hardware (or gained scalar-only nodes that now dominate) around early June.
Failing llama.cpp runs for reference:
Upstream ggml-org/llama.cpp riscv CI is currently green, so its installation still has V somewhere.
Two things I am trying to figure out:
- Are these scalar-only nodes (riscv-runner-5, 11, 12, 46, 50, 51) expected in the ubuntu-24.04-riscv pool, or did the V-capable hardware drop out?
- If the fleet is mixed scalar and vector, is there a way to target V-capable nodes for a job, or to keep an installation homogeneous?
Related to #20 (vector availability in general). This one is concrete: the nodes serving me right now have no V (RVV 1.0), and it used to be there.
I can run any other probe you would find useful.