feat(llama-cpp): expose split_mode option for multi-GPU placement by mudler · Pull Request #9560 · mudler/LocalAI

mudler · 2026-04-25T10:22:53Z

Adds `split_mode` (alias `sm`) to the llama.cpp backend options allowlist, accepting `none`, `layer`, `row`, or `tensor`.

The `tensor` value targets the experimental, backend-agnostic tensor parallelism from ggml-org/llama.cpp#19378 and requires:
- a llama.cpp build that includes that PR,
- FlashAttention enabled,
- KV-cache quantization disabled, and
- a manually set context size.

Assisted-by: Claude:claude-opus-4-7
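The sketch below shows one way to read `split_mode` or its alias `sm` from a list of backend option strings, and to check the `tensor` prerequisites listed above. It is a hedged illustration, not LocalAI's implementation. Several parts are assumptions:

- the function names and the `modelSettings` struct,
- the `key:value` option format, and
- the choice to reject unknown values with an error.

The llama.cpp build requirement (ggml-org/llama.cpp#19378) cannot be verified from configuration, so the sketch leaves it out.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Hypothetical names throughout: this is an illustration, not LocalAI's code.

// validSplitModes holds the four values the PR says split_mode accepts.
var validSplitModes = map[string]bool{"none": true, "layer": true, "row": true, "tensor": true}

// splitModeFromOptions scans option strings, assumed to be "key:value",
// and returns the mode set via split_mode or its alias sm.
// It returns "" when neither key is present; the last occurrence wins.
func splitModeFromOptions(opts []string) (string, error) {
	mode := ""
	for _, opt := range opts {
		key, value, found := strings.Cut(opt, ":")
		if !found {
			continue // not a key:value option; ignored in this sketch
		}
		key = strings.TrimSpace(key)
		if key != "split_mode" && key != "sm" {
			continue
		}
		value = strings.ToLower(strings.TrimSpace(value))
		if !validSplitModes[value] {
			return "", fmt.Errorf("invalid split_mode %q: expected none|layer|row|tensor", value)
		}
		mode = value
	}
	return mode, nil
}

// modelSettings is a hypothetical stand-in for the model settings the PR
// lists as prerequisites for tensor mode.
type modelSettings struct {
	FlashAttention   bool
	KVCacheQuantized bool
	ContextSize      int // 0 means "not set manually"
}

// checkTensorPrereqs returns every unmet prerequisite for split_mode=tensor
// joined into one error, or nil if all are satisfied.
func checkTensorPrereqs(s modelSettings) error {
	var errs []error
	if !s.FlashAttention {
		errs = append(errs, errors.New("tensor split mode requires FlashAttention to be enabled"))
	}
	if s.KVCacheQuantized {
		errs = append(errs, errors.New("tensor split mode requires KV-cache quantization to be disabled"))
	}
	if s.ContextSize <= 0 {
		errs = append(errs, errors.New("tensor split mode requires a manually set context size"))
	}
	return errors.Join(errs...) // nil when errs is empty
}
```

For example, `splitModeFromOptions([]string{"sm:tensor"})` yields `"tensor"`, after which `checkTensorPrereqs` can be run against the model's settings. Collecting the failures with `errors.Join` (Go 1.20+) reports every missing prerequisite at once rather than stopping at the first.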

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

mudler merged commit 21eace4 into master Apr 25, 2026
52 of 53 checks passed

mudler deleted the feat/llama-cpp-split-mode branch April 25, 2026 12:02


mudler merged 1 commit into master from feat/llama-cpp-split-mode
