
dusterbloom: feat(qwen3_next): mixed-bit Qwen3.5 GDN BA loading fallback #19

Open
dusterbloom wants to merge 11 commits into main from dusterbloom/qwen3-mixed-bit-gdn

Conversation

@dusterbloom
Owner

Summary

Adds a load-time fallback for Qwen3.5 models with mixed-bit GDN BA projections (some layers q4, some q8 — common in unsloth dynamic-quant variants). The default fused loader concatenates in_proj_a + in_proj_b into a single matmul; mixed-bit weights have incompatible shapes and fusion fails. This PR detects the specific failure and retries with separate (4-dispatch) GDN projections.

This is PR-1c of the magic-canvas split — a deferred follow-up from PR-1's audit (commit 061e500c).

Behaviour

load_qwen3_5_model / load_qwen3_5_moe_model
└─ args.use_separate_gdn_projections || HIGGS_SEPARATE_GDN_PROJ env?
    ├─ yes → direct load (separate, 4 dispatches)
    └─ no  → try fused load (2 dispatches)
              └─ on ShapeMismatch matching "in_proj_ba" +
                 "requires separate GDN projections":
                 retry with use_separate = true
                 → direct load (separate, mixed-bit fallback)
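The decision flow above can be sketched as a small Rust function. This is a hypothetical simplification for illustration; the real loader functions in higgs-models take full model arguments, and `choose_load_path` is not a name from this PR.

```rust
/// Which loading path the Qwen3.5 GDN loader ends up taking.
/// Hypothetical sketch of the flow described in the diagram above.
#[derive(Debug, PartialEq)]
enum LoadPath {
    SeparateForced,   // config flag or env var set: skip the fused attempt
    Fused,            // fused 2-dispatch path succeeded
    SeparateFallback, // fused failed with the mixed-bit diagnostic; retried
}

fn choose_load_path(force_separate: bool, fused_fails_mixed_bit: bool) -> LoadPath {
    if force_separate {
        // Forced separate skips the fused attempt entirely.
        return LoadPath::SeparateForced;
    }
    if fused_fails_mixed_bit {
        // Warn-then-retry: reload with use_separate_gdn_projections = true.
        return LoadPath::SeparateFallback;
    }
    LoadPath::Fused
}
```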

What's added

  • is_mixed_bit_gdn_ba_fusion_error(err: &ModelError) -> bool — pattern-matches the diagnostic shape-mismatch error
  • qwen3_5_quantization_config — parses {group_size, bits} from the per-layer quantization map
  • qwen3_5_mixed_ba_quantization_layers — scans for layers where in_proj_a and in_proj_b differ in bits or group_size
  • can_concatenate_axis0 — guard used inside load_qwen3_5_moe_weights_fused to emit the diagnostic error rather than panic on the concat
  • load_qwen3_5_model_with_gdn_fallback — private helper called by both dense and MoE load paths, unifies the fallback
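The error-detection helper can be pictured as a match over the error enum. The `ModelError` shape below is an assumed stand-in (the real enum in higgs-models has more variants and possibly structured fields); only the string-matching logic mirrors the PR description.

```rust
// Assumed minimal stand-in for the crate's real error enum.
#[derive(Debug)]
enum ModelError {
    ShapeMismatch { message: String },
    Io { message: String },
}

/// Returns true only for the specific diagnostic ShapeMismatch emitted by
/// the fused loader; every other error propagates unchanged.
fn is_mixed_bit_gdn_ba_fusion_error(err: &ModelError) -> bool {
    match err {
        ModelError::ShapeMismatch { message } => {
            message.contains("in_proj_ba")
                && message.contains("requires separate GDN projections")
        }
        // Match arms enumerate variants rather than using a catch-all.
        ModelError::Io { .. } => false,
    }
}
```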

Adaptations from feat/magic-canvas

  • The dense load_qwen3_5_model previously only honoured the env var; now it honours args.use_separate_gdn_projections too, matching the MoE path. This is a strict improvement: the config flag is set only by the env var or by mixed-bit detection.
  • Direct cherry-pick of 061e500c had 5 conflict regions; this is a manual surgical port that preserves origin/main's structure.

Hygiene

  • No unwrap() on Result/Option
  • No as casts in shape-arithmetic paths (use i32::try_from)
  • Match arms enumerate variants
  • No file-level blanket allows added
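The "no `as` casts in shape-arithmetic paths" rule can be illustrated with a checked conversion. This is an illustrative sketch, not code from the PR; `checked_dim` is a made-up name.

```rust
/// Illustrative shape arithmetic using i32::try_from instead of `as`,
/// so an out-of-range dimension becomes an error rather than silently
/// wrapping or truncating.
fn checked_dim(rows: usize, cols: usize) -> Result<i32, String> {
    let n = rows
        .checked_mul(cols)
        .ok_or_else(|| "dimension product overflows usize".to_string())?;
    i32::try_from(n).map_err(|_| format!("dimension {n} does not fit in i32"))
}
```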

Test plan

  • cargo check -p higgs-models — clean
  • cargo clippy --all-targets --all-features -- -D warnings — clean (rustc 1.95.0, matches CI)
  • cargo fmt --check — clean
  • cargo test -p higgs-models --lib — 333/333 pass (3 new)
  • End-to-end: load an unsloth dynamic-quant Qwen3.5 (e.g. Brooooooklyn/Qwen3.5-27B-unsloth-mlx) and confirm the warn-then-retry path triggers, model loads successfully, generation produces coherent tokens.

🤖 Generated with Claude Code

renovate Bot and others added 10 commits April 29, 2026 03:05
fix(deps): update rust crate toml to v1

chore(deps): update taiki-e/install-action digest to cca35ed

chore(deps): update rust crate tokio to v1.52.2

chore(deps): update rust crate tower-http to v0.6.9

Adds AnyCache::trim_by to roll back KV layers for speculative decode while leaving hybrid Arrays state untouched.

CI: https://github.com/panbanda/higgs/actions/runs/25312580791
Adds a fallback path for loading Qwen3.5 models with mixed-bit GDN
projection weights (some layers q4, some q8 — common in unsloth's
dynamic-quant variants). The default fused-projection loader fuses
`in_proj_a` + `in_proj_b` into a single matmul; mixed-bit weights
have incompatible shapes and the fusion fails.

Behaviour:

  1. Detect via `is_mixed_bit_gdn_ba_fusion_error` — matches a
     `ModelError::ShapeMismatch` whose message contains both
     `in_proj_ba` and `requires separate GDN projections`.

  2. On detection, retry the load with
     `args.use_separate_gdn_projections = true`, taking the
     `load_qwen3_5_moe_weights_direct` path. Forward dispatches go from
     2 to 4 GDN ops per layer — slightly slower but correct.

  3. Forced separate (via `args.use_separate_gdn_projections` config or
     `HIGGS_SEPARATE_GDN_PROJ` env var) skips the fused attempt
     entirely.
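The try-then-retry shape described in steps 1–3 can be sketched as a generic wrapper. This is an assumed simplification: the real `load_qwen3_5_model_with_gdn_fallback` works with the crate's model and error types, and it checks the full diagnostic via `is_mixed_bit_gdn_ba_fusion_error` rather than the single substring check used here.

```rust
/// Hypothetical sketch of the unified fallback helper. `load` receives
/// `use_separate_gdn_projections` as its bool argument.
fn load_with_gdn_fallback<T>(
    mut load: impl FnMut(bool) -> Result<T, String>,
    force_separate: bool,
) -> Result<T, String> {
    if force_separate {
        // Config flag or env var set: go straight to separate projections.
        return load(true);
    }
    match load(false) {
        Ok(model) => Ok(model),
        // Simplified stand-in for is_mixed_bit_gdn_ba_fusion_error.
        Err(e) if e.contains("in_proj_ba") => load(true),
        Err(e) => Err(e),
    }
}
```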

Also adds:

  * `qwen3_5_quantization_config` — parses `{group_size, bits}` from
    the per-layer `quantization` map in `config.json`.
  * `qwen3_5_mixed_ba_quantization_layers` — scans for the layers
    where `in_proj_a` and `in_proj_b` differ in bits or group_size.
  * `can_concatenate_axis0` — guard used inside
    `load_qwen3_5_moe_weights_fused` to emit the diagnostic
    `ShapeMismatch` error rather than panicking on the concat.
  * `load_qwen3_5_model_with_gdn_fallback` — private helper called by
    both `load_qwen3_5_model` (dense) and `load_qwen3_5_moe_model`
    (MoE), unifying the fallback path.
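The per-layer scan can be pictured as a comparison of two quantization maps. The types below are assumptions for illustration (the real code parses `config.json` and its names may differ); only the comparison rule, bits or group_size differing between `in_proj_a` and `in_proj_b`, comes from this PR.

```rust
use std::collections::BTreeMap;

/// Assumed per-tensor quantization parameters parsed from config.json.
#[derive(Debug, Clone, Copy, PartialEq)]
struct QuantConfig {
    group_size: u32,
    bits: u32,
}

/// Returns the layer indices where in_proj_a and in_proj_b disagree in
/// bits or group_size — the layers that defeat fused concatenation.
fn mixed_ba_layers(
    a: &BTreeMap<usize, QuantConfig>,
    b: &BTreeMap<usize, QuantConfig>,
) -> Vec<usize> {
    a.iter()
        .filter_map(|(layer, qa)| match b.get(layer) {
            Some(qb) if qa != qb => Some(*layer),
            _ => None,
        })
        .collect()
}
```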

Adaptations from feat/magic-canvas → origin/main:

  * The dense `load_qwen3_5_model` previously only honoured the env
    var; now it honours `args.use_separate_gdn_projections` too,
    matching the MoE path. Strict improvement: the config flag is set
    only by the env var or by mixed-bit detection.

  * No `unwrap()`, no `as` casts (use `i32::try_from`); match arms
    enumerate variants. No file-level allows added.

Verification on origin/main (rustc 1.95.0):

  * `cargo check -p higgs-models` — clean
  * `cargo clippy --all-targets --all-features -- -D warnings` — clean
  * `cargo fmt --check` — clean
  * `cargo test -p higgs-models --lib` — 333/333 pass (3 new)

Source: feat/magic-canvas commit `061e500c`. Direct cherry-pick had 5
conflict regions because origin/main has evolved the load functions
independently; this is a manual surgical port that preserves
origin/main's structure while adding the fallback behaviour.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@panbanda panbanda force-pushed the dusterbloom/qwen3-mixed-bit-gdn branch 2 times, most recently from 358ac62 to 45ece14 Compare May 6, 2026 12:30
@panbanda panbanda force-pushed the dusterbloom/qwen3-mixed-bit-gdn branch from 45ece14 to dd9cf7a Compare May 6, 2026 12:32