fix(tune): skip empty-buffer LoRA layers in save_peft_safetensors (GDN slots)#279

Merged: ohdearquant merged 1 commit into main from fix/lora-save-skip-empty-gdn-buffers on Jun 24, 2026

@ohdearquant
Owner

Bug

save_peft_safetensors aborts with InvalidTensorView(F32, [rank, d_in], 0) when an adapter contains a LoRA layer whose A/B factor buffers are empty. This happens for GDN-attention slots when only the GQA layers are trained: those slots leave their q_proj/v_proj factors empty, but the save loop still pushed a zero-length byte buffer paired with the non-zero shape [rank, d_in] (A) or [d_out, rank] (B). TensorView::new rejects that pairing because the buffer length does not match the shape. The error aborts the whole save, so a real LoRA adapter trained on a GDN model (e.g. Qwen3.5-0.8B, with 18 GDN and 6 GQA layers) cannot be persisted at all.
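
The rejection follows from a simple invariant: a tensor view's byte length must equal the element count implied by its shape times the dtype width. The sketch below re-implements that check in plain Rust to show why an empty buffer with shape [rank, d_in] fails. expected_byte_len, InvalidView, and check_f32_view are hypothetical helpers, not the safetensors crate's API, and the 4-byte element width assumes F32 as in the reported error.

```rust
/// Hypothetical stand-in for the invariant TensorView::new enforces:
/// the buffer must hold exactly product(shape) elements of `elem_size` bytes.
/// Returns None if the product overflows usize.
fn expected_byte_len(shape: &[usize], elem_size: usize) -> Option<usize> {
    shape
        .iter()
        .try_fold(elem_size, |acc, &dim| acc.checked_mul(dim))
}

/// Hypothetical error mirroring the payload of the reported failure:
/// the requested shape and the actual byte length of the buffer.
#[derive(Debug, PartialEq)]
struct InvalidView {
    shape: Vec<usize>,
    len: usize,
}

/// Hypothetical validation for an F32 view (4 bytes per element).
fn check_f32_view(shape: &[usize], data: &[u8]) -> Result<(), InvalidView> {
    const F32_BYTES: usize = 4;
    match expected_byte_len(shape, F32_BYTES) {
        Some(n) if n == data.len() => Ok(()),
        // Length mismatch (e.g. 0 bytes for [8, 1024]) or shape-product overflow.
        _ => Err(InvalidView { shape: shape.to_vec(), len: data.len() }),
    }
}
```

For the shape in the regression test, [8, 1024] needs 8 × 1024 × 4 = 32,768 bytes, so an empty buffer fails with len 0, matching the InvalidTensorView(F32, [8, 1024], 0) seen in the bug.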

Fix

Skip any layer whose A or B factor buffer is empty rather than emit a zero-byte tensor with a non-zero shape. The trained layers round-trip intact; the untrained GDN slots are dropped from the saved adapter.

```rust
for ((layer_idx, module), layer) in &adapter.layers {
    // A LoRA layer with an empty factor buffer means "this module was not trained"
    // (e.g. GDN-attention slots leave q_proj/v_proj empty). Skip it rather than
    // emit an InvalidTensorView for a zero-byte tensor with non-zero shape.
    if layer.a.is_empty() || layer.b.is_empty() {
        continue;
    }
    // ...
}
```
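
A self-contained sketch of the same guard, assuming a simplified data model: LoraLayer, Entry, collect_tensors, f32s_to_le_bytes, and the layers.{idx}.{module}.lora_A / lora_B names are hypothetical stand-ins, not lattice-tune's actual types or the PEFT key format. It gathers (name, shape, bytes) triples and drops any layer with an empty A or B buffer, so every emitted shape agrees with its byte length.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified LoRA layer with row-major F32 factors.
/// A has shape [rank, d_in]; B has shape [d_out, rank].
struct LoraLayer {
    rank: usize,
    d_in: usize,
    d_out: usize,
    a: Vec<f32>,
    b: Vec<f32>,
}

/// One tensor ready for serialization: (name, shape, little-endian bytes).
type Entry = (String, Vec<usize>, Vec<u8>);

fn f32s_to_le_bytes(xs: &[f32]) -> Vec<u8> {
    xs.iter().flat_map(|x| x.to_le_bytes()).collect()
}

/// Collects tensors to write, skipping layers whose A or B buffer is empty
/// (untrained slots) instead of pairing zero bytes with a non-zero shape.
fn collect_tensors(layers: &BTreeMap<(usize, String), LoraLayer>) -> Vec<Entry> {
    let mut out = Vec::new();
    for ((layer_idx, module), layer) in layers {
        if layer.a.is_empty() || layer.b.is_empty() {
            continue; // untrained slot: nothing to persist
        }
        out.push((
            format!("layers.{layer_idx}.{module}.lora_A"),
            vec![layer.rank, layer.d_in],
            f32s_to_le_bytes(&layer.a),
        ));
        out.push((
            format!("layers.{layer_idx}.{module}.lora_B"),
            vec![layer.d_out, layer.rank],
            f32s_to_le_bytes(&layer.b),
        ));
    }
    out
}
```

With one trained layer and one empty slot, only the trained layer's two tensors come out, with shapes [rank, d_in] and [d_out, rank]. Skipping the whole layer when either factor is empty, rather than saving the non-empty half, fits the fact that a LoRA update needs both A and B to be meaningful.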

Provenance & why a fresh PR

This is the safetensors.rs library slice of the GDN-save fix originally developed on feat/lora-backward-training (#193). That branch predates #191, #212/#217, and #261 and has diverged substantially from main (60 commits behind, 20 add/add conflicts), so it cannot be cleanly rebased without a scope decision. Applying just this guard onto current main preserves fixes that the branch's own safetensors.rs would otherwise regress, including the #212/#217 shape/length validation and the #261 alpha-from-header behavior.

Verification

  • Regression proof (TDD): new test_save_skips_empty_buffer_layers builds an adapter with one real GQA layer + one empty GDN-slot layer. With the guard disabled the test fails with exactly InvalidTensorView(F32, [8, 1024], 0); with the guard restored it passes (real layer present + bit-exact, empty layer absent).
  • cargo test -p lattice-tune --lib --features safetensors: 25/25 green, including the preserved test_alpha_metadata_round_trips, test_rejects_shape_product_overflow, and test_transpose_rejects_shape_element_mismatch.
  • cargo clippy -p lattice-tune --features safetensors --lib: clean under -D warnings.
  • Diff is purely additive: +82 lines, one file.
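
The regression test's two checks (the empty slot is absent, the real layer is bit-exact) can be approximated without the safetensors crate. In this sketch save, load, and bit_exact are hypothetical helpers, an in-memory HashMap stands in for the .safetensors file, and the guard is simplified to a per-tensor emptiness check rather than the per-layer A/B check. Bit-exactness is compared via f32::to_bits, because == treats 0.0 and -0.0 as equal and never matches NaN.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory "file": tensor name -> raw little-endian F32 bytes.
/// Empty tensors are skipped (per-tensor simplification of the layer guard).
fn save(tensors: &[(&str, Vec<f32>)]) -> HashMap<String, Vec<u8>> {
    tensors
        .iter()
        .filter(|entry| !entry.1.is_empty())
        .map(|(name, data)| {
            let bytes: Vec<u8> = data.iter().flat_map(|x| x.to_le_bytes()).collect();
            (name.to_string(), bytes)
        })
        .collect()
}

/// Decodes a saved tensor back into f32 values, or None if it was not saved.
fn load(saved: &HashMap<String, Vec<u8>>, name: &str) -> Option<Vec<f32>> {
    saved.get(name).map(|bytes| {
        bytes
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect::<Vec<f32>>()
    })
}

/// Bit-exact comparison of IEEE-754 bit patterns (distinguishes -0.0 from 0.0).
fn bit_exact(a: &[f32], b: &[f32]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x.to_bits() == y.to_bits())
}
```

Saving one populated tensor and one empty slot yields a single entry; loading the populated tensor returns values whose bit patterns match the originals, and looking up the empty slot returns None.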

🤖 Generated with Claude Code


GDN-attention slots leave q_proj/v_proj LoRA factors empty when only the GQA
layers are trained. save_peft_safetensors pushed those empty byte buffers with
non-zero shapes [rank, d_in] / [d_out, rank], so TensorView construction failed
with InvalidTensorView(F32, [rank, d_in], 0) and the whole save aborted.

Skip any layer whose A or B factor buffer is empty rather than emit a zero-byte
tensor with a non-zero shape. The real (trained) layers round-trip intact; the
untrained GDN slots are dropped from the saved adapter.

Sliced from the feat/lora-backward-training branch (#193) onto current main, so
it preserves the #212/#217 shape/length validation and the #261 alpha-from-header
behavior that the branch predates (the branch's own safetensors.rs would regress
both).

Regression test test_save_skips_empty_buffer_layers reproduces the failure with
the guard disabled (InvalidTensorView) and passes with it restored. All 25
safetensors tests green under --features safetensors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ohdearquant merged commit ba6c949 into main on Jun 24, 2026
11 checks passed
ohdearquant deleted the fix/lora-save-skip-empty-gdn-buffers branch on June 24, 2026 at 07:55