
feat(ggml): add Q3_K and Q5_K dequantization (types 11 and 13) #103

Open

mvkorobkov wants to merge 2 commits into chrishayuk:main from mvkorobkov:feat/q3k-q5k-dequant

Conversation

@mvkorobkov

What

Implements scalar dequantization for two missing K-quant formats:

  Type   ID   Block size   Elements/block
  Q3_K   11   110 bytes    256
  Q5_K   13   176 bytes    256

Both tensor_data_size() and dequantize() in mod.rs are wired up for the new type ids; see the dispatch sketch below.
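
A minimal sketch of that wiring, assuming the dispatch is a plain match on the raw GGUF type id. Only tensor_data_size, the two block-size constant names, and the 110/176-byte values come from this PR; the surrounding signature and error type are illustrative:

```rust
const QK_K: usize = 256;             // elements per K-quant super-block
const Q3_K_BLOCK_BYTES: usize = 110; // hmask(32) + qs(64) + scales(12) + d(2)
const Q5_K_BLOCK_BYTES: usize = 176; // d(2) + dmin(2) + scales(12) + qh(32) + qs(128)

// Illustrative signature: byte size of a tensor's quantized data, given its
// GGUF type id and element count.
fn tensor_data_size(type_id: u32, n_elements: usize) -> Result<usize, String> {
    let blocks = n_elements / QK_K;
    match type_id {
        11 => Ok(blocks * Q3_K_BLOCK_BYTES), // Q3_K
        13 => Ok(blocks * Q5_K_BLOCK_BYTES), // Q5_K
        other => Err(format!("unsupported type id {other}")), // other arms elided
    }
}
```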

Why

Without these, larql convert gguf-to-vindex fails with unsupported type id 11/13 on any model that uses Q3_K or Q5_K tensors. This includes:

  • DeepSeek-R1-0528-Qwen3-8B-Q3_K_L — 145 Q3_K tensors + 108 Q5_K tensors + 1 Q6_K
  • DeepSeek-V4-Flash-Q3_K_M (multi-shard, separate PR) — same types
  • Any model quantised with llama.cpp Q3_K_S / Q3_K_M / Q3_K_L / Q5_K_S / Q5_K_M

Implementation

q3_k.rs: dequantize_q3_k()

  • Block layout: hmask[0..32] · qs[32..96] · scales[96..108] · d[108..110]
  • unpack_q3k_scales(): 12 bytes → 16 six-bit signed values using the kmask1=0x03030303 / kmask2=0x0F0F0F0F shuffle from dequantize_row_q3_K in llama.cpp (sketched after this list)
  • Two-half loop (128 + 128 elements); the m bitmask walks through hmask, and a clear bit subtracts 4 from the 2-bit quant value
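
A minimal sketch of the scale unpack, following the kmask1/kmask2 shuffle in llama.cpp's dequantize_row_q3_K; the exact signature in q3_k.rs may differ:

```rust
// Unpack 12 packed bytes into 16 signed 6-bit sub-block scales.
fn unpack_q3k_scales(scales: &[u8; 12]) -> [i8; 16] {
    const KMASK1: u32 = 0x0303_0303;
    const KMASK2: u32 = 0x0f0f_0f0f;

    let a0 = u32::from_le_bytes(scales[0..4].try_into().unwrap());
    let a1 = u32::from_le_bytes(scales[4..8].try_into().unwrap());
    let tmp = u32::from_le_bytes(scales[8..12].try_into().unwrap());

    // The low 4 bits of each scale sit in the first 8 bytes; the high 2 bits
    // are packed four-per-byte into the last 4 bytes and spliced back on here.
    let aux = [
        (a0 & KMASK2) | ((tmp & KMASK1) << 4),
        (a1 & KMASK2) | (((tmp >> 2) & KMASK1) << 4),
        ((a0 >> 4) & KMASK2) | (((tmp >> 4) & KMASK1) << 4),
        ((a1 >> 4) & KMASK2) | (((tmp >> 6) & KMASK1) << 4),
    ];

    let mut out = [0i8; 16];
    for (i, b) in aux.iter().flat_map(|w| w.to_le_bytes()).enumerate() {
        out[i] = (b as i8) - 32; // centre the 6-bit range at 32, per llama.cpp
    }
    out
}
```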

q5_k.rs: dequantize_q5_k()

  • Block layout: d[0..2] · dmin[2..4] · scales[4..16] · qh[16..48] · qs[48..176]
  • Reuses pub(super) unpack_q4k_scales() from q4_k.rs (same 12-byte scale format as Q4_K)
  • u1/u2 bitmask pair walks through qh; set bit → add 16 to the 4-bit nibble
  • 4 iterations × 64 elements, matching dequantize_row_q5_K in llama.cpp (see the loop sketch after this list)
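
A sketch of that loop, mirroring dequantize_row_q5_K in llama.cpp. The free-standing signature (pre-converted f16 fields, pre-unpacked scale/min pairs, fixed-size slices) is illustrative, not the PR's actual interface:

```rust
fn dequantize_q5_k_block(
    d: f32,             // super-block scale (from the f16 d field)
    dmin: f32,          // super-block min (from the f16 dmin field)
    sc: &[(u8, u8); 8], // per-sub-block (scale, min), 6 bits each
    qh: &[u8; 32],      // high bits: one bit per element, plane selected by u1/u2
    qs: &[u8; 128],     // low nibbles, two elements per byte
    out: &mut [f32; 256],
) {
    let (mut u1, mut u2) = (1u8, 2u8); // bit pair selecting this pass's plane in qh
    let mut y = 0;
    for j in 0..4 {
        let (d1, m1) = (d * sc[2 * j].0 as f32, dmin * sc[2 * j].1 as f32);
        let (d2, m2) = (d * sc[2 * j + 1].0 as f32, dmin * sc[2 * j + 1].1 as f32);
        let ql = &qs[32 * j..32 * j + 32];
        for l in 0..32 {
            // set bit in qh → add 16 on top of the low nibble
            let hi: u8 = if qh[l] & u1 != 0 { 16 } else { 0 };
            out[y] = d1 * ((ql[l] & 0x0F) + hi) as f32 - m1;
            y += 1;
        }
        for l in 0..32 {
            let hi: u8 = if qh[l] & u2 != 0 { 16 } else { 0 };
            out[y] = d2 * ((ql[l] >> 4) + hi) as f32 - m2;
            y += 1;
        }
        u1 <<= 2;
        u2 <<= 2;
    }
}
```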

q4_k.rs: unpack_q4k_scales visibility changed from fn to pub(super) fn so Q5_K can share it without duplication. A sketch of the shared helper follows.
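
For reference, a sketch of what the shared helper computes, assuming it follows llama.cpp's get_scale_min_k4 packing (eight 6-bit scale/min pairs in 12 bytes); the exact signature in q4_k.rs may differ:

```rust
// Unpack the 12-byte Q4_K/Q5_K scale block into 8 (scale, min) pairs.
pub(super) fn unpack_q4k_scales(q: &[u8; 12]) -> [(u8, u8); 8] {
    let mut out = [(0u8, 0u8); 8];
    for j in 0..8 {
        out[j] = if j < 4 {
            // First four pairs: low 6 bits of bytes 0..4 (scales) and 4..8 (mins).
            (q[j] & 63, q[j + 4] & 63)
        } else {
            // Last four pairs: low/high nibbles of bytes 8..12, plus the spare
            // top two bits of the first eight bytes as the high bits.
            (
                (q[j + 4] & 0x0F) | ((q[j - 4] >> 6) << 4),
                (q[j + 4] >> 4) | ((q[j] >> 6) << 4),
            )
        };
    }
    out
}
```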

Testing

Unit tests in each module (the q3_k zero-scale case is sketched after this list):

  • q3_k: zero-scale all-zero output, hmask-clear subtracts 4, wrong-size error
  • q5_k: zero-scale all-zero output, high-bit adds 16, wrong-size error
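
The zero-scale case is cheap to state because an all-zero block has d = 0.0, so every output must be exactly zero regardless of the quant bits. A sketch, assuming dequantize_q3_k takes a byte slice and returns a Result over the decoded values (the signature is not taken from the PR):

```rust
#[test]
fn q3k_zero_scale_gives_all_zero_output() {
    let block = [0u8; 110]; // one Q3_K block: d = 0.0, all scales/quants zero
    let out = dequantize_q3_k(&block).expect("110 bytes is a valid block size");
    assert_eq!(out.len(), 256); // one super-block decodes to 256 elements
    assert!(out.iter().all(|&v| v == 0.0));
}
```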

End-to-end: larql convert gguf-to-vindex on DeepSeek-R1-0528-Qwen3-8B-Q3_K_L.gguf completes through dequantization without errors (145 Q3_K + 108 Q5_K tensors dequantized cleanly).

All 82 existing larql-models tests continue to pass.

Mykhailo Korobkov added 2 commits May 15, 2026 11:57
Implements scalar dequantize for Q3_K (110 B/block) and Q5_K (176 B/block)
so that DeepSeek-R1-0528-Qwen3-8B-Q3_K_L and similar models can be converted
via larql gguf-to-vindex.

- q3_k.rs: unpack_q3k_scales (kmask1/kmask2 per llama.cpp), two-half-block
  loop with m-bitmask for high bits, signed-scale centred at 32.
- q5_k.rs: reuses pub(super) unpack_q4k_scales from q4_k; u1/u2 mask walk
  for high bits, 4 iterations of 64 elements each.
- mod.rs: Q3_K_BLOCK_BYTES=110, Q5_K_BLOCK_BYTES=176, dispatch in
  tensor_data_size() and dequantize().
- q4_k.rs: unpack_q4k_scales promoted to pub(super) for Q5_K reuse.