fix(tune): skip empty-buffer LoRA layers in save_peft_safetensors (GDN slots) #279
Merged
…N slots)

GDN-attention slots leave q_proj/v_proj LoRA factors empty when only the GQA layers are trained. save_peft_safetensors pushed those empty byte buffers with non-zero shapes [rank, d_in] / [d_out, rank], so TensorView construction failed with InvalidTensorView(F32, [rank, d_in], 0) and the whole save aborted.

Skip any layer whose A or B factor buffer is empty rather than emit a zero-byte tensor with a non-zero shape. The real (trained) layers round-trip intact; the untrained GDN slots are dropped from the saved adapter.

Sliced from the feat/lora-backward-training branch (#193) onto current main, so it preserves the #212/#217 shape/length validation and the #261 alpha-from-header behavior that the branch predates (the branch's own safetensors.rs would regress both).

Regression test test_save_skips_empty_buffer_layers reproduces the failure with the guard disabled (InvalidTensorView) and passes with it restored. All 25 safetensors tests green under --features safetensors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bug
save_peft_safetensors aborts with InvalidTensorView(F32, [rank, d_in], 0) when an adapter contains a LoRA layer with empty A/B factor buffers. This happens for GDN-attention slots when only the GQA layers are trained: those slots leave q_proj/v_proj factors empty, but the save loop still pushed a zero-byte buffer paired with the non-zero shape [rank, d_in] / [d_out, rank], which TensorView::new rejects. The whole save then fails, so a real LoRA adapter trained on a GDN model (e.g. Qwen3.5-0.8B, 18 GDN + 6 GQA layers) cannot be persisted at all.
Fix
Skip any layer whose A or B factor buffer is empty rather than emit a zero-byte tensor with a non-zero shape. The trained layers round-trip intact; the untrained GDN slots are dropped from the saved adapter.
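As a rough illustration of the guard, here is a minimal std-only sketch. The LoraLayer struct, its field names, and the helpers check_f32_view and saveable_layers are hypothetical stand-ins, not the actual lattice-tune or safetensors types; check_f32_view only models the kind of shape/byte-length check that makes a zero-byte buffer with a non-zero shape invalid.

```rust
/// Hypothetical stand-in for one LoRA layer; all field names are assumptions.
#[derive(Debug, Clone)]
struct LoraLayer {
    name: String,
    rank: usize,
    d_in: usize,
    d_out: usize,
    a: Vec<f32>, // factor A, logical shape [rank, d_in]
    b: Vec<f32>, // factor B, logical shape [d_out, rank]
}

/// Models a tensor-view length check (not the real safetensors API):
/// an F32 tensor of the given shape must carry product(shape) * 4 bytes.
fn check_f32_view(shape: &[usize], byte_len: usize) -> Result<(), String> {
    let elems = shape
        .iter()
        .try_fold(1usize, |acc, &d| acc.checked_mul(d))
        .ok_or_else(|| "shape product overflows usize".to_string())?;
    let expected = elems
        .checked_mul(std::mem::size_of::<f32>())
        .ok_or_else(|| "byte length overflows usize".to_string())?;
    if expected == byte_len {
        Ok(())
    } else {
        Err(format!(
            "invalid view: shape {:?} needs {} bytes, got {}",
            shape, expected, byte_len
        ))
    }
}

/// The guard: keep only layers whose A and B factors both hold data.
/// Layers with an empty factor (untrained slots) are skipped, not emitted.
fn saveable_layers(layers: &[LoraLayer]) -> Vec<&LoraLayer> {
    layers
        .iter()
        .filter(|l| !l.a.is_empty() && !l.b.is_empty())
        .collect()
}
```

In this sketch, an untrained layer with rank 2 and d_in 4 but an empty `a` fails check_f32_view (32 bytes expected, 0 supplied), which mirrors the abort described under Bug. Filtering with saveable_layers before building any views means such a layer never reaches the check.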
Provenance & why a fresh PR
This is the safetensors.rs library slice of the GDN-save fix originally developed on feat/lora-backward-training (#193). That branch predates #191 / #212/#217 / #261 and has substantially diverged from main (60 commits behind, 20 add/add conflicts), so it can't be cleanly rebased without a scope decision. Applying just this guard onto current main preserves the fixes the branch would otherwise regress:
- load_peft_safetensors shape/length validation from #212/#217 (branch removed it)
- alpha-from-header behavior from #261 (… alpha = rank)
Verification
- test_save_skips_empty_buffer_layers builds an adapter with one real GQA layer + one empty GDN-slot layer. With the guard disabled the test fails with exactly InvalidTensorView(F32, [8, 1024], 0); with the guard restored it passes (real layer present + bit-exact, empty layer absent).
- cargo test -p lattice-tune --lib --features safetensors: 25/25 green, including the preserved test_alpha_metadata_round_trips, test_rejects_shape_product_overflow, test_transpose_rejects_shape_element_mismatch.
- cargo clippy -p lattice-tune --features safetensors --lib: clean (-D warnings).

🤖 Generated with Claude Code
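The "bit-exact" check on the surviving layer can be pictured with the std-only sketch below. It is not the PR's test code: the helper names are hypothetical, and it assumes F32 data is stored as little-endian bytes, which is how the safetensors format lays out tensor data as far as I know.

```rust
/// Serialize f32 values as little-endian bytes (assumed on-disk layout).
fn f32s_to_le_bytes(values: &[f32]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Decode little-endian bytes back into f32 values.
/// Returns None when the byte length is not a multiple of 4.
fn le_bytes_to_f32s(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}

/// Bit-exact comparison: compares raw bit patterns rather than using `==`,
/// so -0.0 and 0.0 are treated as different values.
fn bit_exact(a: &[f32], b: &[f32]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x.to_bits() == y.to_bits())
}
```

Comparing with to_bits instead of `==` is the stricter choice for a round-trip test: `-0.0 == 0.0` is true under float equality, so a sign flip in a saved weight would otherwise go unnoticed.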