Grid and Larql-Compute-Metal#104

Open
chrishayuk wants to merge 32 commits into main from grid-server

@chrishayuk
Owner

No description provided.

chrishayuk and others added 30 commits May 14, 2026 01:06
These names lied about behavior: `is_q4k_family` returns true for
Q4_K, Q4_KF, *and* Q6_K; `uses_q4k` / `ffn_uses_q4k` flag the same
mixed family; `q4_index` is just a `VectorIndex` holding kquant
weights. `COMP_ATTN_Q4K`'s value was already `"attn_kquant"` — the
constant name was the only thing still saying q4k.
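
A minimal sketch of the naming problem described above, not the repository's actual code: the `QuantFormat` enum, its `F16` variant, and the name `is_kquant_family` are all assumptions made for illustration. Only the set of formats the predicate accepts (Q4_K, Q4_KF, Q6_K) comes from the commit message.

```rust
/// Hypothetical format tag; the real crate's type and variants may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QuantFormat {
    Q4K,
    Q4KF,
    Q6K,
    /// Assumed non-kquant variant, included only so the predicate has a negative case.
    F16,
}

impl QuantFormat {
    /// Name chosen for this sketch. It reflects what the old `is_q4k_family`
    /// actually tested: membership in the mixed Q4_K / Q4_KF / Q6_K family.
    pub fn is_kquant_family(self) -> bool {
        matches!(self, QuantFormat::Q4K | QuantFormat::Q4KF | QuantFormat::Q6K)
    }
}
```

With a name like this, a `true` result for `Q6K` no longer contradicts the identifier.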

No on-disk or wire-format change in this commit. Filename constants
(`INTERLEAVED_Q4K_BIN`, `ATTN_WEIGHTS_Q4K_BIN`, ...) keep their q4k
names here because their *values* are real on-disk filenames; those
need dual-read for back-compat and land in a later commit.
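
The dual-read itself lands in a later commit, so the following is only a sketch of one common way to do it, assuming "dual-read" means "prefer the new filename, fall back to the legacy one". The function name `resolve_dual_read` and both filenames in the usage note are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: return the first of `new_name` / `legacy_name`
/// that exists as a regular file inside `dir`, preferring the new name.
pub fn resolve_dual_read(dir: &Path, new_name: &str, legacy_name: &str) -> Option<PathBuf> {
    [new_name, legacy_name]
        .iter()
        .map(|name| dir.join(name))
        .find(|path| path.is_file())
}
```

A loader could call `resolve_dual_read(dir, "weights_new.bin", "weights_legacy.bin")` (placeholder names) and report a missing-file error only when both lookups fail, so directories written before the rename keep loading.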

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
chrishayuk and others added 2 commits May 17, 2026 16:45
Touches the Rust-level public surface that mixed Q4_K / Q6_K behind
q4k-prefixed names. No on-disk file rename, no JSON wire format
change, no CLI-flag rename — those land in the next commit with
dual-read for back-compat.

Renames:
- module `format/weights/write_q4k/` → `write_kquant/`
- `Q4kWriteOptions` → `KquantWriteOptions`; `down_q4k: bool` →
  `down_proj: DownProjFormat::{Q6K, Q4K}` (default unchanged: Q6K)
- `write_model_weights_q4k` → `write_model_weights_kquant` (and
  `_with_opts`, `write_attn_weights_q4k`, `write_interleaved_ffn_q4k`,
  `write_per_layer_moe_q4k`, `write_lm_head_q4k`)
- `load_model_weights_q4k` / `_shard` → `_kquant`
- `load_lm_head_q4` / `has_lm_head_q4` / `synthesize_lm_head_q4` /
  `set_lm_head_q4_mmap` / `set_lm_head_q4_synth` / `lm_head_q4_view`
  → `_kquant` cluster; storage field `lm_head_q4` → `lm_head_kquant`
- `prefill_q4` / `prefill_q4_with_head_replacement` /
  `prefill_q4_prompt` / `prefill_q4k_moe` / `prefill_q4k_cpu` /
  `prefill_q4k_cpu_fallback` / `full_pipeline_q4_capture_pre_wo`
  → `prefill_kquant*` / `full_pipeline_kquant_capture_pre_wo`
- `STAGE_MODEL_WEIGHTS_Q4K` (value `"model_weights_kquant"`) and
  `COMP_FFN_Q4K` (value `"ffn_kquant"`)

CLI: `--down-q4k` user-facing flag preserved; translates to the new
`DownProjFormat` enum at struct construction.
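
The flag-to-enum translation described above can be sketched as follows. `DownProjFormat`, its `Q6K` / `Q4K` variants, the Q6K default, and the `down_proj` field come from the commit message; the trimmed-down struct and the `from_down_q4k_flag` constructor are assumptions for illustration, and the real `KquantWriteOptions` likely carries more fields.

```rust
/// Storage format for the down projection (variant names from the commit message).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum DownProjFormat {
    /// Default, matching the old `down_q4k: false` behavior.
    #[default]
    Q6K,
    Q4K,
}

/// Trimmed-down stand-in for `KquantWriteOptions`; only `down_proj` is shown.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct KquantWriteOptions {
    pub down_proj: DownProjFormat,
}

impl KquantWriteOptions {
    /// Hypothetical constructor: map the preserved `--down-q4k` boolean
    /// onto the enum when the options struct is built.
    pub fn from_down_q4k_flag(down_q4k: bool) -> Self {
        let down_proj = if down_q4k {
            DownProjFormat::Q4K
        } else {
            DownProjFormat::Q6K
        };
        Self { down_proj }
    }
}
```

Keeping the boolean at the CLI boundary and converting once at construction means existing scripts keep working while internal code matches on the enum instead of a q4k-named flag.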

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>