
feat(llama-cpp): expose split_mode option for multi-GPU placement #9560

Merged
mudler merged 1 commit into master from feat/llama-cpp-split-mode on Apr 25, 2026
Conversation

@mudler (Owner) commented Apr 25, 2026

Adds split_mode (alias sm) to the llama.cpp backend options allowlist, accepting none|layer|row|tensor. The tensor value targets the experimental backend-agnostic tensor parallelism from ggml-org/llama.cpp#19378 and requires a llama.cpp build that includes that PR, FlashAttention enabled, KV-cache quantization disabled, and a manually set context size.

Assisted-by: Claude:claude-opus-4-7
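As a rough sketch of how the new option might be used in a LocalAI model definition (the model name, file layout, and field names other than `split_mode` are illustrative assumptions, not taken from this PR):

```yaml
# Hypothetical model config; names and values are placeholders.
name: my-llama
backend: llama-cpp
context_size: 8192        # tensor mode requires a manually set context size
flash_attention: true     # tensor mode requires FlashAttention enabled
options:
  - split_mode:tensor     # accepts none|layer|row|tensor (alias: sm)
```

With `split_mode:tensor`, the underlying llama.cpp build must also include ggml-org/llama.cpp#19378 and KV-cache quantization must be left disabled.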

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
@mudler mudler merged commit 21eace4 into master Apr 25, 2026
52 of 53 checks passed
@mudler mudler deleted the feat/llama-cpp-split-mode branch April 25, 2026 12:02
