
feat(tune): surface-B GDN LoRA weight gradients + train_grad_full fields#202

Draft
ohdearquant wants to merge 3 commits into main from pr/eng-6-lora-trainer

@ohdearquant ohdearquant commented Jun 22, 2026

What

Surface-B: LoRA weight gradients through GDN (gated delta-net) layers, plus the train_grad_full field fixes and a forward-parity test.

Why

Extends the micro-LoRA backward pass to compute weight gradients for GDN-layer LoRA adapters (previously only the attention/FFN surfaces had them). The new path is Option-gated: when GDN LoRA grads are not requested (the None path), behavior is byte-identical to before.
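As a rough illustration of what an Option-gated LoRA weight-gradient step can look like, here is a minimal sketch for a single token and a single projection of the form y = W x + scale * B (A x). The types `LoraView` and `LoraGrads` and the function `lora_weight_grads` are hypothetical names made up for this sketch, not the ones in gdn_backward.rs, and the real GDN path first has to propagate the upstream gradient through the gated delta recurrence, which is not shown here.

```rust
/// Hypothetical container for the LoRA weight gradients of one projection.
#[derive(Debug, Clone, PartialEq)]
pub struct LoraGrads {
    pub d_a: Vec<f32>, // row-major [rank, d_in]
    pub d_b: Vec<f32>, // row-major [d_out, rank]
}

/// Hypothetical view of one bound adapter: y = W x + scale * B (A x).
pub struct LoraView<'a> {
    pub a: &'a [f32], // row-major [rank, d_in]
    pub b: &'a [f32], // row-major [d_out, rank]
    pub rank: usize,
    pub scale: f32,
}

/// Returns None (and does no extra work) when no adapter is bound,
/// otherwise the weight gradients for A and B given the input `x`
/// and the upstream gradient `dy` with respect to the projection output.
pub fn lora_weight_grads(
    lora: Option<&LoraView<'_>>,
    x: &[f32],
    dy: &[f32],
) -> Option<LoraGrads> {
    let lora = lora?; // None path: nothing changes for callers without LoRA
    let (r, d_in, d_out) = (lora.rank, x.len(), dy.len());

    // h = A x, the rank-sized intermediate activation.
    let h: Vec<f32> = (0..r)
        .map(|i| (0..d_in).map(|j| lora.a[i * d_in + j] * x[j]).sum::<f32>())
        .collect();
    // g = B^T dy, the gradient flowing back into h (before scaling).
    let g: Vec<f32> = (0..r)
        .map(|i| (0..d_out).map(|o| lora.b[o * r + i] * dy[o]).sum::<f32>())
        .collect();

    let mut d_a = vec![0.0f32; r * d_in];
    let mut d_b = vec![0.0f32; d_out * r];
    for i in 0..r {
        for j in 0..d_in {
            d_a[i * d_in + j] = lora.scale * g[i] * x[j]; // dA = scale * g x^T
        }
    }
    for o in 0..d_out {
        for i in 0..r {
            d_b[o * r + i] = lora.scale * dy[o] * h[i]; // dB = scale * dy h^T
        }
    }
    Some(LoraGrads { d_a, d_b })
}
```

Passing `None` returns `None` immediately, which is the property that keeps the no-LoRA path free of extra computation in this sketch.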

Files

  • crates/inference/src/attention/gdn_backward.rs (+978/-29, cfg-gated under train-backward)
  • crates/tune/src/bin/train_grad_full.rs (+592/-132, includes the E0063 GDN-fields fix)
  • crates/tune/src/bin/train_grad_layer23.rs
  • crates/inference/src/backward/tape.rs, crates/inference/src/backward/ops.rs
  • crates/inference/examples/diff_gdn_layer.rs (required-features = train-backward,f16)
  • crates/inference/tests/lora_forward_parity_test.rs (+917, new; compiles under default features via internal cfg guards, CI-safe)

Verification

cargo build --release -p lattice-tune --bin train_grad_full --features train-backward builds clean. The E0063 error in train_grad_full (missing GDN gradient fields) was the second of the two regressions that motivated the PR-G gate. The build was green in the integrated-tree gate.

⚠️ Gradcheck PENDING — do not merge yet

The numerical gradient check for the GDN weight-grad surface has not been run (tracked: khive task fe42740c). The forward path and the build are verified; backward correctness is unverified. This PR is open so the structure can be reviewed and the code lives on a branch, but the gradcheck must pass before merge. The non-gradient parts are ready for review; the GDN-grad merge is held on that run.
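For reviewers unfamiliar with the term, a gradcheck compares an analytic gradient against central finite differences of the loss. The sketch below is a generic version of that idea, not the gradcheck written in this PR; the function name `gradcheck`, the f64 precision, and the relative-error metric are all assumptions chosen for illustration.

```rust
/// Returns the worst relative error between `analytic` and a central
/// finite-difference estimate of d(loss)/d(params[i]).
/// Hypothetical helper: the name, signature and error metric are illustrative.
pub fn gradcheck<F>(loss: F, params: &[f64], analytic: &[f64], eps: f64) -> f64
where
    F: Fn(&[f64]) -> f64,
{
    assert_eq!(params.len(), analytic.len());
    let mut p = params.to_vec();
    let mut worst = 0.0f64;
    for i in 0..p.len() {
        let orig = p[i];
        // Perturb one parameter at a time in both directions.
        p[i] = orig + eps;
        let up = loss(&p);
        p[i] = orig - eps;
        let down = loss(&p);
        p[i] = orig; // restore before moving to the next parameter
        let numeric = (up - down) / (2.0 * eps);
        // Relative error, guarded against division by zero.
        let denom = numeric.abs().max(analytic[i].abs()).max(1e-12);
        worst = worst.max((numeric - analytic[i]).abs() / denom);
    }
    worst
}
```

A run would typically be treated as passing when the returned value is below a small tolerance. Running the check in f64 even when the model uses f32 or f16 is a common choice, because finite differences lose precision quickly at lower bit widths.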

Bench

Backward/training code is gated behind train-backward and sits off the default decode path, so this change is bench-neutral by construction. make bench-compare's comparator errored while assembling the delta (known two-worktree fragility; base benches ran clean).

Series

Part of the PR #193 engine-slice (finest split). All engine code lands on main; the macOS app surfaces a subset (Models + Chat) for v0.0.1.

…parity test

GDN-layer LoRA weight-gradient surface (gdn_backward) plus the full-model
trainer path (train_grad_full, train_grad_layer23) and a LoRA forward parity
test. Option-gated: the None path is byte-identical to prior behavior.

UNVERIFIED: single/multi-head gradcheck is written but not yet executed on a
real model. The surface-B weight grads must pass gradcheck before this merges.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions Bot commented Jun 22, 2026

E2E Parity Report

PASS: all 3 prompts match within first 3 tokens

| Prompt | Agreement | First Diff | HF tok/s | Lattice tok/s | Verdict |
|---|---|---|---|---|---|
| The capital of France is | 3/15 | pos 3 | 0.4 | 2.0 | PASS |
| In the year 2024, artificial intelligence | 10/15 | pos 9 | 0.4 | 1.9 | PASS |
| `def fibonacci(n): if n <= 1: return n return` | 15/15 | none | 0.3 | 1.7 | PASS |
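One plausible reading of the Agreement and First Diff columns is sketched below: agreement counts the positions where both token sequences hold the same token, and first diff is the zero-based index of the first mismatch. This is an assumption inferred from the numbers in the table, not taken from the report generator; `parity` and `passes` are hypothetical names.

```rust
/// Compares two generated token sequences position by position.
/// Returns (number of matching positions, zero-based index of the first
/// mismatch or None if the compared prefix is identical).
/// Assumed semantics: the actual report script may compute these differently.
pub fn parity(reference: &[u32], candidate: &[u32]) -> (usize, Option<usize>) {
    let agreement = reference
        .iter()
        .zip(candidate)
        .filter(|(a, b)| a == b)
        .count();
    let first_diff = reference.iter().zip(candidate).position(|(a, b)| a != b);
    (agreement, first_diff)
}

/// PASS rule as worded in the report: the outputs match within the first
/// `min_prefix` tokens, i.e. there is no mismatch before that index.
pub fn passes(first_diff: Option<usize>, min_prefix: usize) -> bool {
    first_diff.map_or(true, |pos| pos >= min_prefix)
}
```

Under this reading, the first row (3/15, pos 3) passes because the first three tokens agree before the outputs diverge.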

The capital of France is

  • HF: Paris.
    The capital of France is Paris.
    The capital of France
  • Lattice: Paris.
    A: Yes, the capital of France is Paris.

In the year 2024, artificial intelligence

  • HF: (AI) has become a significant part of the global economy. It is
  • Lattice: (AI) has become a significant part of our daily lives. From personal

def fibonacci(n): if n <= 1: return n return

  • HF: fibonacci(n-1) + fibonacci(n-2)

print(fib

  • Lattice: fibonacci(n-1) + fibonacci(n-2)

print(fib

ohdearquant and others added 2 commits June 22, 2026 14:59
- gdn_forward_save: unconditionally reset saved LoRA state (rank, scale,
  h_* caches, weight matrices) before the bound-only populate, so a reused
  GdnSaved from a prior LoRA call cannot leak stale state into a no-LoRA
  forward (gdn_backward gates its LoRA-grad path on saved.lora_rank)
- train_grad_full --save: branch on slot_kinds; GDN slots now save their
  five real modules (in_proj_qkv/z/b/a, out_proj) with loader-matched names
  and shapes instead of empty q_proj/v_proj; target_modules reflects the
  modules actually present (GQA / GDN / mixed)
- clippy under --features train-backward (not linted by default CI): factor
  the 10-slice lora_bound tuple into a LoraBound type alias; drop the unused
  GdnGrads import and the dead LoraParams::zeros alias

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
train_grad_full --save now emits all five GDN LoRA modules, but
validate_against only recognized in_proj_qkv/in_proj_z/out_proj, so a
saved full-GDN adapter failed validation before it could load. The
forward pass already applies in_proj_b/in_proj_a LoRA (gdn_fused.rs)
with d_out = linear_num_key_heads, and safetensors parsing + set_lora
accept them; the only gap was these two match arms. Add them so the
validation contract matches the forward and the trainer's saved output.

Adds a regression test covering all five GDN LoRA modules with
config-derived dims, so the contract can't silently drift again.
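The shape of the validation gap can be sketched as a match over module names. The five module names and the rule that in_proj_b and in_proj_a use d_out = linear_num_key_heads come from the commit message above; the struct `GdnDimsSketch`, its other fields, and the functions `expected_d_out` and `validate_module` are hypothetical placeholders, not the signature of validate_against.

```rust
/// Hypothetical config-derived dimensions. Only `linear_num_key_heads` is
/// named in the commit message; the other fields are placeholders.
pub struct GdnDimsSketch {
    pub linear_num_key_heads: usize,
    pub qkv_out: usize,     // placeholder for the in_proj_qkv output dim
    pub z_out: usize,       // placeholder for the in_proj_z output dim
    pub hidden_size: usize, // placeholder for the out_proj output dim
}

/// Expected output dimension for each recognized GDN LoRA module.
pub fn expected_d_out(module: &str, dims: &GdnDimsSketch) -> Result<usize, String> {
    match module {
        "in_proj_qkv" => Ok(dims.qkv_out),
        "in_proj_z" => Ok(dims.z_out),
        // The two arms that were missing before this commit.
        "in_proj_b" | "in_proj_a" => Ok(dims.linear_num_key_heads),
        "out_proj" => Ok(dims.hidden_size),
        other => Err(format!("unrecognized GDN LoRA module: {other}")),
    }
}

/// Rejects unknown modules and modules whose saved shape disagrees with the config.
pub fn validate_module(module: &str, d_out: usize, dims: &GdnDimsSketch) -> Result<(), String> {
    let want = expected_d_out(module, dims)?;
    if d_out == want {
        Ok(())
    } else {
        Err(format!("{module}: d_out {d_out} does not match expected {want}"))
    }
}
```

A regression test in this style would loop over all five names with dims derived from one config, so that adding a module to the saver without adding a match arm fails immediately.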

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>