capture: true block-output at the post_block hook #176

Draft
maxsloef-goodfire wants to merge 2 commits into RhizoNymph:feat/integration from maxsloef-goodfire:post-block-semantics

@maxsloef-goodfire

PR 2 of 3 (stacked on #175 — view the top commit only: 45 files, +153/−44).

Introduces apply_block_steering() and converts the 43 deferred-residual model files to it. In those architectures (llama/qwen/deepseek-style), each branch-add is folded into the next layer's fused add+norm, so the old hook captured a residual that did not yet include the MLP output. That capture was byte-identical to what post_attn already captures (measured cosine similarity ~0.4–0.97 vs the true hidden states). The new helper captures residual + hidden_states, the true block output (hs[L+1]), validated vs HF transformers at mean cosine similarity 0.99995 (layers 0–34) on qwen3-8b.

  • The sum is computed only when a capture manager is installed on the rank — a static property, so torch.compile constant-folds the branch and non-capture servers pay nothing.
  • Steering math is unchanged (delta still rides residual into the next fused add).
  • The 43 model-file changes are a mechanical one-line call swap + import each (script-verified: no other hunks). The 30 materialized-stream models (opt/gpt_neox/gemma4/…) already hooked the materialized stream and are untouched here.
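
For reviewers who want the shape of the helper without opening the diff, here is a minimal sketch of the behavior described above. It is not the code in this PR: the function name, signature, `CaptureManager` class, and the order of capture relative to steering are all assumptions made for illustration. It is written against anything that supports `+`, so it runs with plain floats; in the model files the arguments would be tensors.

```python
from __future__ import annotations

from typing import Any, Optional


class CaptureManager:
    """Hypothetical stand-in for the per-rank capture manager."""

    def __init__(self) -> None:
        self.captured: dict[tuple[int, str], Any] = {}

    def record(self, layer: int, hook: str, value: Any) -> None:
        self.captured[(layer, hook)] = value


def apply_block_steering_sketch(
    hidden_states: Any,
    residual: Any,
    layer: int,
    capture_manager: Optional[CaptureManager] = None,
    steering_delta: Any = None,
) -> tuple[Any, Any]:
    # The sum is only formed when a capture manager is installed, so a
    # server without capture never pays for the extra add.
    if capture_manager is not None:
        capture_manager.record(layer, "post_block", residual + hidden_states)
    # Steering is applied to `residual` only; the delta reaches the stream
    # when the next layer's fused add+norm folds everything together.
    # (Capturing before steering is an assumption of this sketch.)
    if steering_delta is not None:
        residual = residual + steering_delta
    return hidden_states, residual
```

With `capture_manager=None` and no delta the function returns its inputs unchanged, which is the "non-capture servers pay nothing" case.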

The capture-path fixes and features (capture_wait, v2-runner guard, global filesystem capture, fsync docs) are stacked on top in #174.

🤖 Generated with Claude Code

…ange)

Token-level rename across code, tests, docs, and examples. The hook fires
after the mlp() call in program order, but in deferred-residual
architectures the captured/steered `residual` does not yet include the
MLP contribution (the add happens in the NEXT layer's fused add+norm) --
so 'post_mlp' misdescribes the dataflow. 'post_block' names the position
in the layer, not a dataflow claim.
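
A toy illustration of that dataflow, using scalars in place of tensors. Everything here is a simplification for the sake of the example: the norm is the identity, and `attn`/`mlp` are made-up scalings rather than real sublayers.

```python
from typing import Optional, Tuple


def fused_add_norm(x: float, residual: Optional[float]) -> Tuple[float, float]:
    """Toy fused add+norm: fold x into the residual, then 'normalize'.

    The norm is the identity here (an assumption to keep the numbers readable).
    """
    residual = x if residual is None else x + residual
    return residual, residual


def attn(x: float) -> float:
    return 0.5 * x  # stand-in for the attention branch


def mlp(x: float) -> float:
    return 0.25 * x  # stand-in for the MLP branch


def deferred_layer(hidden: float, residual: Optional[float]):
    # Input norm: folds the previous layer's pending branch output into residual.
    hidden, residual = fused_add_norm(hidden, residual)
    hidden = attn(hidden)
    # Post-attention norm: residual now includes the attention output.
    hidden, residual = fused_add_norm(hidden, residual)
    post_attn_residual = residual
    hidden = mlp(hidden)
    # Hook position after mlp(): the MLP output lives in `hidden` and has
    # NOT been added to `residual` yet.
    return hidden, residual, post_attn_residual


hidden, residual, post_attn_residual = deferred_layer(1.0, None)
true_block_output = residual + hidden  # what the next layer's add will produce
print(residual == post_attn_residual)  # True: residual unchanged since post_attn
print(true_block_output)               # 1.875
```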

No functional change: same tensors captured/steered as before. The
semantic correction (capturing the true block output, residual + hidden)
is stacked on top of this PR. Clients sending 'post_mlp' in capture specs
must switch to 'post_block' (pre-release branch; no alias kept).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Introduce apply_block_steering(): captures residual + hidden_states (the
true block output, what HF exposes as hidden_states[L+1]) instead of the
bare pre-MLP-add residual, which was byte-identical to post_attn's
capture in deferred-residual architectures. The sum is computed only when
a capture manager is installed on the rank (static for the process
lifetime, so torch.compile traces it as a constant branch); steering
still applies to `residual` -- identical propagation to the old
behavior.

The 43 deferred-residual model files are a mechanical one-line call-site
conversion + import each (verified: no other changes). The 30
materialized-stream models (opt/gpt_neox/gemma4/...) already hooked the
materialized stream and need no change.

Validated on qwen3-8b vs HF transformers: post_block[L] == hs[L+1] at
mean cos 0.99995 (layers 0-34; HF's final tuple entry is post-final-norm
so the last layer is not directly comparable).
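
The comparison can be sketched roughly as follows. This is not the validation script used for the numbers above; the function names and the list-of-vectors layout are assumptions, and real activations would be tensors flattened per layer. The index shift (post_block[L] against hs[L+1]) and the skipped final layer follow the description above.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_block_cosine(
    post_block: List[Sequence[float]],
    hf_hidden_states: List[Sequence[float]],
) -> float:
    """Mean cosine between post_block[L] and hf_hidden_states[L + 1].

    hf_hidden_states is expected to hold one more entry than post_block
    (index 0 is the embedding output). The last layer is skipped because
    HF's final entry is taken after the final norm.
    """
    n_layers = len(post_block)
    sims = [
        cosine(post_block[layer], hf_hidden_states[layer + 1])
        for layer in range(n_layers - 1)
    ]
    return sum(sims) / len(sims)
```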

Stacked on the mechanical post_mlp->post_block rename.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>