dusterbloom: feat(qwen3_next): mixed-bit Qwen3.5 GDN BA loading fallback #19
Open
dusterbloom wants to merge 11 commits into main from
Conversation
fix(deps): update rust crate toml to v1
…l-action-digest chore(deps): update taiki-e/install-action digest to cca35ed
…file chore(deps): update rust crate tokio to v1.52.2
…-lockfile chore(deps): update rust crate tower-http to v0.6.9
…anbanda#143) Adds AnyCache::trim_by to roll back KV layers for speculative decode while leaving hybrid Arrays state untouched.

CI: https://github.com/panbanda/higgs/actions/runs/25312580791
Adds a fallback path for loading Qwen3.5 models with mixed-bit GDN
projection weights (some layers q4, some q8 — common in unsloth's
dynamic-quant variants). The default fused-projection loader fuses
`in_proj_a` + `in_proj_b` into a single matmul; mixed-bit weights
have incompatible shapes and the fusion fails.
Behaviour:
1. Detect via `is_mixed_bit_gdn_ba_fusion_error` — matches a
`ModelError::ShapeMismatch` whose message contains both
`in_proj_ba` and `requires separate GDN projections`.
2. On detection, retry the load with
`args.use_separate_gdn_projections = true`, taking the
`load_qwen3_5_moe_weights_direct` path. Forward dispatches go from
2 to 4 GDN ops per layer — slightly slower but correct.
3. Forced separate (via `args.use_separate_gdn_projections` config or
`HIGGS_SEPARATE_GDN_PROJ` env var) skips the fused attempt
entirely.
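The detect-then-retry flow above can be sketched as follows. `ModelError`, `LoadArgs`, and the two loader stubs are simplified stand-ins for the real higgs-models types; only the control flow and the error predicate are illustrated, not the actual weight loading.

```rust
#[derive(Debug)]
enum ModelError {
    ShapeMismatch(String),
    Other(String),
}

/// Matches only the diagnostic error emitted by the fused loader,
/// not arbitrary shape mismatches from elsewhere in the load.
fn is_mixed_bit_gdn_ba_fusion_error(err: &ModelError) -> bool {
    match err {
        ModelError::ShapeMismatch(msg) => {
            msg.contains("in_proj_ba") && msg.contains("requires separate GDN projections")
        }
        ModelError::Other(_) => false,
    }
}

struct LoadArgs {
    use_separate_gdn_projections: bool,
}

/// Stub for the fused path: with mixed-bit weights it fails with the
/// diagnostic ShapeMismatch rather than panicking on the concat.
fn load_fused(_args: &LoadArgs) -> Result<(), ModelError> {
    Err(ModelError::ShapeMismatch(
        "in_proj_ba has mixed-bit weights; requires separate GDN projections".to_string(),
    ))
}

/// Stub for the separate (4-dispatch) path, which always succeeds here.
fn load_separate(_args: &LoadArgs) -> Result<(), ModelError> {
    Ok(())
}

/// Unified fallback: forced separate skips the fused attempt entirely;
/// otherwise try fused and retry separate only on the specific error.
fn load_with_gdn_fallback(mut args: LoadArgs) -> Result<(), ModelError> {
    if args.use_separate_gdn_projections {
        return load_separate(&args);
    }
    match load_fused(&args) {
        Ok(v) => Ok(v),
        Err(e) if is_mixed_bit_gdn_ba_fusion_error(&e) => {
            args.use_separate_gdn_projections = true;
            load_separate(&args)
        }
        Err(e) => Err(e),
    }
}
```

Keeping the predicate narrow (both substrings must match) is what prevents the retry from masking unrelated shape errors.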
Also adds:
* `qwen3_5_quantization_config` — parses `{group_size, bits}` from
the per-layer `quantization` map in `config.json`.
* `qwen3_5_mixed_ba_quantization_layers` — scans for the layers
where `in_proj_a` and `in_proj_b` differ in bits or group_size.
* `can_concatenate_axis0` — guard used inside
`load_qwen3_5_moe_weights_fused` to emit the diagnostic
`ShapeMismatch` error rather than panicking on the concat.
* `load_qwen3_5_model_with_gdn_fallback` — private helper called by
both `load_qwen3_5_model` (dense) and `load_qwen3_5_moe_model`
(MoE), unifying the fallback path.
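A minimal sketch of two of the helpers above, assuming a simplified `QuantConfig` struct and a plain layer map in place of the real `config.json` parsing:

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for the per-layer `{group_size, bits}` entry
/// parsed by qwen3_5_quantization_config.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct QuantConfig {
    group_size: u32,
    bits: u32,
}

/// Layers where in_proj_a and in_proj_b disagree in bits or group_size
/// (the scan performed by qwen3_5_mixed_ba_quantization_layers).
fn mixed_ba_layers(
    a: &BTreeMap<usize, QuantConfig>,
    b: &BTreeMap<usize, QuantConfig>,
) -> Vec<usize> {
    a.iter()
        .filter_map(|(layer, qa)| match b.get(layer) {
            Some(qb) if qa != qb => Some(*layer),
            _ => None,
        })
        .collect()
}

/// Axis-0 concatenation is valid only when ranks match and every
/// trailing dimension agrees; the fused loader checks this before
/// the concat so it can emit a diagnostic error instead of panicking.
fn can_concatenate_axis0(shape_a: &[usize], shape_b: &[usize]) -> bool {
    shape_a.len() == shape_b.len()
        && !shape_a.is_empty()
        && shape_a[1..] == shape_b[1..]
}
```

For example, a q4 `in_proj_a` and q8 `in_proj_b` at the same layer produce packed tensors whose trailing dimensions differ, so `can_concatenate_axis0` rejects the fusion and the diagnostic path fires.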
Adaptations from feat/magic-canvas → origin/main:
* The dense `load_qwen3_5_model` previously only honoured the env
var; now it honours `args.use_separate_gdn_projections` too,
matching the MoE path. Strict improvement: the config flag is set
only by the env var or by mixed-bit detection.
* No `unwrap()`, no `as` casts (use `i32::try_from`); match arms
enumerate variants. No file-level allows added.
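As a small illustration of the no-`as`-casts rule (the helper name and error type here are hypothetical, not from the crate):

```rust
/// Convert a dimension with i32::try_from, surfacing overflow as an
/// error instead of silently truncating the way an `as` cast would.
fn dim_to_i32(dim: usize) -> Result<i32, String> {
    i32::try_from(dim).map_err(|_| format!("dimension {dim} does not fit in i32"))
}
```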
Verification on origin/main (rustc 1.95.0):
* `cargo check -p higgs-models` — clean
* `cargo clippy --all-targets --all-features -- -D warnings` — clean
* `cargo fmt --check` — clean
* `cargo test -p higgs-models --lib` — 333/333 pass (3 new)
Source: feat/magic-canvas commit `061e500c`. Direct cherry-pick had 5
conflict regions because origin/main has evolved the load functions
independently; this is a manual surgical port that preserves
origin/main's structure while adding the fallback behaviour.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed 358ac62 to 45ece14 (Compare)
Force-pushed 45ece14 to dd9cf7a (Compare)
Summary

Adds a load-time fallback for Qwen3.5 models with mixed-bit GDN BA projections (some layers q4, some q8 — common in unsloth dynamic-quant variants). The default fused loader concatenates `in_proj_a` + `in_proj_b` into a single matmul; mixed-bit weights have incompatible shapes and fusion fails. This PR detects the specific failure and retries with separate (4-dispatch) GDN projections.

This is PR-1c of the magic-canvas split — a deferred follow-up from PR-1's audit (commit `061e500c`).

Behaviour

What's added

* `is_mixed_bit_gdn_ba_fusion_error(err: &ModelError) -> bool` — pattern-matches the diagnostic shape-mismatch error.
* `qwen3_5_quantization_config` — parses `{group_size, bits}` from the per-layer `quantization` map.
* `qwen3_5_mixed_ba_quantization_layers` — scans for layers where `in_proj_a` and `in_proj_b` differ in bits or group_size.
* `can_concatenate_axis0` — guard used inside `load_qwen3_5_moe_weights_fused` to emit the diagnostic error rather than panic on the concat.
* `load_qwen3_5_model_with_gdn_fallback` — private helper called by both dense and MoE load paths, unifies the fallback.

Adaptations from feat/magic-canvas

* `load_qwen3_5_model` previously only honoured the env var; now it honours `args.use_separate_gdn_projections` too, matching the MoE path. Strict improvement.
* `061e500c` had 5 conflict regions; this is a manual surgical port that preserves origin/main's structure.

Hygiene

* No `unwrap()` on `Result`/`Option`.
* No `as` casts in shape-arithmetic paths (use `i32::try_from`).

Test plan

* `cargo check -p higgs-models` — clean
* `cargo clippy --all-targets --all-features -- -D warnings` — clean (rustc 1.95.0, matches CI)
* `cargo fmt --check` — clean
* `cargo test -p higgs-models --lib` — 333/333 pass (3 new)
* Load a mixed-bit model (`Brooooooklyn/Qwen3.5-27B-unsloth-mlx`) and confirm the warn-then-retry path triggers, the model loads successfully, and generation produces coherent tokens.

🤖 Generated with Claude Code