fix(metal): add turbo2/3/4 types to FLASH_ATTN_EXT and CPY support ch… by tomwolfe · Pull Request #86 · TheTom/llama-cpp-turboquant

tomwolfe · 2026-04-17T19:12:17Z

Fix: Enable GPU offloading for TurboQuant KV cache types in Metal backend

Issue: GGML_OP_FLASH_ATTN_EXT and GGML_OP_CPY fell back to the CPU when the KV cache used a TurboQuant type, because the Metal backend's supports_op check did not list those types for these operations.
Solution: Updated ggml/src/ggml-metal/ggml-metal-device.m so that supports_op explicitly accepts the following types for both operations:
- GGML_TYPE_TURBO2_0
- GGML_TYPE_TURBO3_0
- GGML_TYPE_TURBO4_0
Result: Both operations are now offloaded to the GPU for these types instead of falling back to the CPU.
Verification: The project builds successfully with these changes.
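
For readers unfamiliar with how a supports_op-style gate works, here is a minimal, self-contained sketch in C. It does not reproduce the code in ggml-metal-device.m: the enum names, the `sketch_supports_op` function, and the `SK_TYPE_F16` entry are hypothetical stand-ins chosen for illustration. The idea is that a backend reports an operation as supported only when the tensor type appears in an allowlist, and anything not listed is left for the CPU.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the real ggml enums (names are assumptions). */
enum sketch_type {
    SK_TYPE_F16,       /* assumed example of an already-supported type */
    SK_TYPE_TURBO2_0,
    SK_TYPE_TURBO3_0,
    SK_TYPE_TURBO4_0,
    SK_TYPE_UNKNOWN    /* any type the backend does not handle */
};

enum sketch_op {
    SK_OP_FLASH_ATTN_EXT,
    SK_OP_CPY,
    SK_OP_OTHER
};

/* True for the three TurboQuant KV cache types named in this PR. */
static bool sketch_is_turbo(enum sketch_type t) {
    return t == SK_TYPE_TURBO2_0 || t == SK_TYPE_TURBO3_0 || t == SK_TYPE_TURBO4_0;
}

/* Allowlist gate: returning false means the op is not offloaded
 * and would fall back to the CPU. */
static bool sketch_supports_op(enum sketch_op op, enum sketch_type kv_type) {
    switch (op) {
        case SK_OP_FLASH_ATTN_EXT:
        case SK_OP_CPY:
            /* Without the sketch_is_turbo() term, turbo types are rejected here. */
            return kv_type == SK_TYPE_F16 || sketch_is_turbo(kv_type);
        default:
            return false;
    }
}
```

In this sketch, dropping the `sketch_is_turbo()` term models the behavior described under "Issue": the gate rejects the turbo types and the operation is not offloaded.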

fix(metal): add turbo2/3/4 types to FLASH_ATTN_EXT and CPY support ch…

93bf21d

github-actions bot added the ggml and Apple Metal labels on Apr 17, 2026
