Grid and Larql-Compute-Metal by chrishayuk · Pull Request #104 · chrishayuk/larql

chrishayuk · 2026-05-16T22:59:14Z

No description provided.

These names lied about behavior: `is_q4k_family` returns true for Q4_K, Q4_KF, *and* Q6_K; `uses_q4k` / `ffn_uses_q4k` flag the same mixed family; `q4_index` is just a `VectorIndex` holding kquant weights. `COMP_ATTN_Q4K`'s value was already `"attn_kquant"`; the constant name was the only thing still saying q4k.

No on-disk or wire-format change in this commit. Filename constants (`INTERLEAVED_Q4K_BIN`, `ATTN_WEIGHTS_Q4K_BIN`, ...) keep their q4k names here because their *values* are real on-disk filenames; those need dual-read for back-compat and land in a later commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
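As an illustration of the naming problem described above, here is a minimal sketch of a family predicate whose name matches what it accepts. The `QuantFormat` enum, its `F16` variant, and the name `is_kquant_family` are assumptions made for this example; the commit message does not state the new names or the real type the predicate operates on.

```rust
/// Hypothetical stand-in for the quantization formats named in the commit
/// message. The real type in the repository may look different.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QuantFormat {
    Q4K,
    Q4KF,
    Q6K,
    /// Assumed non-kquant format, included only for contrast.
    F16,
}

/// Returns true for every K-quant format (Q4_K, Q4_KF and Q6_K).
/// The name is an assumption; it is chosen so that it no longer implies
/// "Q4_K only" while the body accepts Q6_K as well.
pub fn is_kquant_family(fmt: QuantFormat) -> bool {
    matches!(fmt, QuantFormat::Q4K | QuantFormat::Q4KF | QuantFormat::Q6K)
}
```

With a q4k-prefixed name, a caller could reasonably assume Q6_K was excluded; the broader name makes the mixed family explicit at the call site.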

Touches the Rust-level public surface that mixed Q4_K / Q6_K behind q4k-prefixed names. No on-disk file rename, no JSON wire format change, no CLI-flag rename; those land in the next commit with dual-read for back-compat.

Renames:
- module `format/weights/write_q4k/` → `write_kquant/`
- `Q4kWriteOptions` → `KquantWriteOptions`; `down_q4k: bool` → `down_proj: DownProjFormat::{Q6K, Q4K}` (default unchanged: Q6K)
- `write_model_weights_q4k` → `write_model_weights_kquant` (and `_with_opts`, `write_attn_weights_q4k`, `write_interleaved_ffn_q4k`, `write_per_layer_moe_q4k`, `write_lm_head_q4k`)
- `load_model_weights_q4k` / `_shard` → `_kquant`
- `load_lm_head_q4` / `has_lm_head_q4` / `synthesize_lm_head_q4` / `set_lm_head_q4_mmap` / `set_lm_head_q4_synth` / `lm_head_q4_view` → `_kquant` cluster; storage field `lm_head_q4` → `lm_head_kquant`
- `prefill_q4` / `prefill_q4_with_head_replacement` / `prefill_q4_prompt` / `prefill_q4k_moe` / `prefill_q4k_cpu` / `prefill_q4k_cpu_fallback` / `full_pipeline_q4_capture_pre_wo` → `prefill_kquant*` / `full_pipeline_kquant_capture_pre_wo`
- `STAGE_MODEL_WEIGHTS_Q4K` (value `"model_weights_kquant"`) and `COMP_FFN_Q4K` (value `"ffn_kquant"`)

CLI: `--down-q4k` user-facing flag preserved; translates to the new `DownProjFormat` enum at struct construction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
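The `down_q4k: bool` → `DownProjFormat` change and the preserved `--down-q4k` flag can be sketched roughly as follows. Only the names `KquantWriteOptions`, `down_proj`, `DownProjFormat`, and its `Q6K` / `Q4K` variants come from the commit message; the constructor `from_down_q4k_flag`, the derives, and the absence of other fields are assumptions for illustration, and the real struct presumably carries more options.

```rust
/// Format used for the down projection. Q6K is the default, matching the
/// commit message ("default unchanged: Q6K").
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum DownProjFormat {
    #[default]
    Q6K,
    Q4K,
}

/// Sketch of the renamed options struct. Any fields other than
/// `down_proj` are omitted here because the source does not list them.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct KquantWriteOptions {
    pub down_proj: DownProjFormat,
}

impl KquantWriteOptions {
    /// Hypothetical helper: translates the legacy `--down-q4k` boolean
    /// flag into the enum at struct construction time.
    pub fn from_down_q4k_flag(down_q4k: bool) -> Self {
        let down_proj = if down_q4k {
            DownProjFormat::Q4K
        } else {
            DownProjFormat::Q6K
        };
        Self { down_proj }
    }
}
```

Keeping the boolean at the CLI boundary and converting it once means the rest of the code only sees the enum, so adding a further down-projection format later would not require another flag-shaped field.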

chrishayuk and others added 30 commits May 14, 2026 01:06

working on server grid

1611bb0

working on grid

c14886a

working on test coverage

b5d04b9

working on performance

9d3e4ca

PERFORMANCE

90f61bc

working kv refactor

e36c00d

kv unification

fc2906a

working on kv unification

61f0f27

mega split for larql-compute-metal

f23b08e

mega split for larql-compute-metal

c467213

working on kv

f638894

working on kv cache

eaec33f

coverage of metal now over 90%

151360f

models coverage

b1d22bd

cleaned up larql-kv

a0f6ff6

clean up of larql-compute samples

50f9866

Merge remote-tracking branch 'origin/main' into grid-server

0bbd813

tests

6ece3f3

fixed issues in ci build

bd4ac02

working on quality fixes

cfc44cd

improving test coverage

62c55fd

improving coverage

20f2332

working on the pr failures

0758890

granite and coverage

14bba27

clearing ci issues

14ca3bf

improving coverage

b4062d6

working on coverage and kv engines

bdc6b6e

working on coverage still

0736a38

docs update

f9e9dda

chrishayuk and others added 2 commits May 17, 2026 16:45

working on quant

eea64f2

chrishayuk wants to merge 32 commits into main from grid-server

chrishayuk commented May 16, 2026
