
feat(capture+steering): DeepSeek-V4 mHC capture and steering targets #181

Open
RhizoNymph wants to merge 9 commits into feat/integration from feat/mhc-capture

RhizoNymph (Owner) commented Jun 17, 2026

Adds DeepSeek-V4 manifold-hyperconnection (mHC) activations as both capture targets and steering targets. The capture side landed via #177; the steering side via #183 (both now merged into feat/mhc-capture), so this PR integrates both into feat/integration (alongside the prefix-cache capture work already there).

mHC threads hc_mult parallel residual streams through each layer and mixes them with manifold-constrained (Sinkhorn) coefficients. The same hook points are exposed for capture (read) and steering (write); the string names are shared so one identifier captures and steers the same tensor.
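
As a rough illustration of what "Sinkhorn coefficients" means here, the sketch below runs generic Sinkhorn-Knopp normalization on a small matrix and uses the result to mix the streams of one token. It is not the V4 kernel: the function names, the iteration count, and the mixing direction (`mix @ streams`) are assumptions made for the example.

```python
import math
from typing import List

Matrix = List[List[float]]


def sinkhorn(logits: Matrix, iters: int = 50) -> Matrix:
    """Alternately normalize rows and columns of exp(logits).

    Generic Sinkhorn-Knopp; the result is approximately doubly stochastic.
    """
    m = [[math.exp(v) for v in row] for row in logits]
    size = len(m)
    for _ in range(iters):
        # Row step: scale every row so it sums to 1.
        row_sums = [sum(row) for row in m]
        m = [[m[i][j] / row_sums[i] for j in range(size)] for i in range(size)]
        # Column step: scale every column so it sums to 1.
        col_sums = [sum(m[i][j] for i in range(size)) for j in range(size)]
        m = [[m[i][j] / col_sums[j] for j in range(size)] for i in range(size)]
    return m


def mix_streams(mix: Matrix, streams: Matrix) -> Matrix:
    """For one token: out[i] = sum_j mix[i][j] * streams[j]."""
    hidden = len(streams[0])
    return [
        [sum(mix[i][j] * streams[j][k] for j in range(len(streams)))
         for k in range(hidden)]
        for i in range(len(mix))
    ]


logits = [[0.0, 1.0], [2.0, 0.5]]              # (hc_mult, hc_mult), hc_mult = 2
mix = sinkhorn(logits)
streams = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]   # (hc_mult, hidden) for one token
mixed = mix_streams(mix, streams)
```

Because the columns of `mix` sum to 1, the sum over streams is the same before and after mixing in this sketch.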

Capture: hook points

New mhc_* hooks (DeepSeek-V4 only; rejected on other models):

| hook | fires at | shape | dtype |
| --- | --- | --- | --- |
| mhc_streams_pre_attn | multi-stream residual entering the layer | (n, hc_mult, hidden) | bf16 |
| mhc_streams_pre_mlp | multi-stream residual before the FFN fold | (n, hc_mult, hidden) | bf16 |
| mhc_streams_final | final pre-hc_head streams at the model tail (last PP rank) | (n, hc_mult, hidden) | bf16 |
| mhc_attn_post_mix | attention sublayer post-mix gates | (n, hc_mult) | fp32 |
| mhc_ffn_post_mix | FFN sublayer post-mix gates | (n, hc_mult) | fp32 |
| mhc_attn_res_mix | attention Sinkhorn stream-mixing matrix | (n, hc_mult, hc_mult) | fp32 |
| mhc_ffn_res_mix | FFN Sinkhorn stream-mixing matrix | (n, hc_mult, hc_mult) | fp32 |

Reused standard hooks for V4's single-stream (n, hidden) bf16 tensors: pre_attn (pre-mixed attn input), post_attn (attn output), mlp_in (pre-mixed FFN input), mlp_out (FFN output). V4 has no single-stream post_mlp — the end-of-layer residual is multi-stream, captured via mhc_streams_*.

mhc_streams_final is a model-level hook (fires once at the tail, not per layer); its layer selector is ignored and normalized to the last layer, so callers just write {"mhc_streams_final": "all"}.
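
A minimal sketch of how the per-model check and the tail normalization could fit together. The function name `normalize_capture_request`, the `MODEL_LEVEL_HOOKS` set, and the list-of-ints selector form are hypothetical; only the `{"hook": "all"}` request shape comes from the description above.

```python
from typing import Dict, Iterable, List, Set, Union

LayerSelector = Union[str, Iterable[int]]

# Hypothetical: hooks that fire once at the model tail, not per layer.
MODEL_LEVEL_HOOKS = {"mhc_streams_final"}


def normalize_capture_request(
    request: Dict[str, LayerSelector],
    valid_hooks: Set[str],
    num_layers: int,
) -> Dict[str, List[int]]:
    """Validate hooks against one model and resolve layer selectors."""
    normalized: Dict[str, List[int]] = {}
    for hook, selector in request.items():
        if hook not in valid_hooks:
            raise ValueError(f"hook {hook!r} is not tapped by this model")
        if hook in MODEL_LEVEL_HOOKS:
            # The selector is ignored and pinned to the last layer.
            normalized[hook] = [num_layers - 1]
        elif selector == "all":
            normalized[hook] = list(range(num_layers))
        elif isinstance(selector, str):
            raise ValueError(f"unknown layer selector {selector!r}")
        else:
            layers = sorted(set(selector))
            if any(not 0 <= layer < num_layers for layer in layers):
                raise ValueError(f"layer out of range for hook {hook!r}")
            normalized[hook] = layers
    return normalized


v4_hooks = {"pre_attn", "mhc_streams_pre_attn", "mhc_streams_final"}
resolved = normalize_capture_request(
    {"mhc_streams_final": "all", "pre_attn": [2, 0]}, v4_hooks, num_layers=4
)
```

With these inputs, `mhc_streams_final` resolves to the last layer only, and the same request against a model whose hook set lacks `mhc_*` names raises `ValueError`.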

Capture: changes required

The framework previously assumed every captured row was (hidden_size,) in the model dtype. mHC breaks both (wider streams, fp32 coefficients), so:

  1. Hook vocabulary — 7 mhc_* names added to HookName and the mirrored _HOOK_NAME_TO_ID table (new ids appended; existing ids unchanged so compiled graphs stay valid).
  2. Per-hook schema — new HookSchema(width, dtype, logical_shape) + build_hook_schema(hidden, dtype, hc_mult), replacing the single hidden_size/model_dtype assumption. Sourced from hf_config.hc_mult; non-mHC models get only the standard wired hooks. Carried on CaptureContext.hook_schema and built centrally in build_capture_context (admission) + the model runner.
  3. Manager — global persistent buffers and per-key scratch dtype are sized from the schema (so fp32 coefficients are not downcast to bf16, and wide stream buffers allocate at hc_mult*hidden). Chunk metadata now carries row_shape and per-row positions.
  4. Validation — hook validity is now per-model (driven by the schema) instead of a global allow-list, so mhc_* / mlp_in / mlp_out are accepted only where the model actually taps them. Model-level hooks normalize their layer selector to the tail.
  5. Filesystem consumer + reader — sidecars gain row_shape (reshape flat (n, width) back to e.g. (n, hc_mult, hidden)), per-entry dtype (so one packed file can mix bf16 streams + fp32 coefficients), and per-row positions + a latest_per_position() reader helper (for speculative-decode dedup). All fields are additive/optional — existing captures round-trip unchanged.
  6. Model taps: maybe_capture_residual taps added to the V4 decoder layer (nvidia + amd, fused-CUDA + native paths) and the model tail, plus layer_idx via extract_layer_index. Gated so they constant-fold out of the compiled graph when capture is disabled.
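
The per-hook schema from item 2 can be pictured roughly as follows. The names `HookSchema` and `build_hook_schema` and the three fields come from this PR, but the body is an illustrative guess, dtypes are plain strings instead of framework dtypes, and the standard hook set (`pre_attn`, `post_attn`, `post_mlp`) is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class HookSchema:
    width: int                      # flattened row width, in elements
    dtype: str                      # storage dtype of one captured row
    logical_shape: Tuple[int, ...]  # per-row shape restored by readers


def build_hook_schema(hidden: int, dtype: str, hc_mult: int = 0) -> Dict[str, HookSchema]:
    """Return the hooks a model exposes, with per-hook row layout."""
    single = HookSchema(hidden, dtype, (hidden,))
    if hc_mult <= 0:
        # Assumed standard set for non-mHC models.
        return {name: single for name in ("pre_attn", "post_attn", "post_mlp")}

    # V4 has no single-stream post_mlp; it taps mlp_in / mlp_out instead.
    schema = {name: single for name in ("pre_attn", "post_attn", "mlp_in", "mlp_out")}
    streams = HookSchema(hc_mult * hidden, dtype, (hc_mult, hidden))
    for name in ("mhc_streams_pre_attn", "mhc_streams_pre_mlp", "mhc_streams_final"):
        schema[name] = streams
    # Coefficients stay fp32 regardless of the model dtype.
    gates = HookSchema(hc_mult, "float32", (hc_mult,))
    res_mix = HookSchema(hc_mult * hc_mult, "float32", (hc_mult, hc_mult))
    for sub in ("attn", "ffn"):
        schema[f"mhc_{sub}_post_mix"] = gates
        schema[f"mhc_{sub}_res_mix"] = res_mix
    return schema


v4_schema = build_hook_schema(hidden=8, dtype="bfloat16", hc_mult=4)
dense_schema = build_hook_schema(hidden=8, dtype="bfloat16")
```

Keeping `width` separate from `logical_shape` lets buffers stay flat `(n, width)` while readers recover the multi-stream layout.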

Steering at mHC hooks

Steering applies vectors at the same mHC hook points (the write/apply path), distinct from capturing them. The steering tables were hard-wired to one hidden_size width in the model dtype; this generalizes them to per-hook width — the steering analog of the capture HookSchema.

  1. New steering hook points on SteeringHookPoint: mlp_in, mlp_out (single-stream) and mhc_streams_pre_attn / mhc_streams_pre_mlp / mhc_streams_final (multi-stream).
  2. Model-selective, per-width tables: register_steering_buffers takes a hook_widths map and defaults to exactly the three standard single-stream hooks at hidden_size (every existing model untouched). DeepSeek-V4 registers single-stream hooks at hidden and multi-stream residual hooks at hc_mult*hidden; mhc_streams_final only on the last layer.
  3. Multi-stream apply: apply_layer_steering_streams flattens (n, hc_mult, hidden) → 2-D, runs the existing apply_steering gather/add (the kernel's hidden dim is a runtime arg), and reshapes back. The per-token row index, row layout, and any_active short-circuit are all width-agnostic.
  4. Manager: populate_steering_tables groups active tables by (width, dtype) so mixed widths coexist on one layer (a single group in the non-mHC case); the zero sentinel row is cached per width. The kernel warmup covers each distinct width.
  5. V4 wiring — the decoder layer routes the steerable tensors (multi-stream residual + single-stream sublayer inputs) through the steering helpers in both forward paths via _steer_and_capture_mhc; the fp32 mixing coefficients stay capture-only (routing weights, no steering semantics).
  6. Wire format unchanged — a multi-stream steering vector is just the per-stream vectors concatenated to hc_mult*hidden (per-stream granularity); the packed blob and SamplingParams list-of-floats need no change. Coefficient (Sinkhorn-routing) steering is intentionally out of scope.
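
The flatten, gather/add, reshape sequence from item 3 and the concatenated wire format from item 6 can be sketched with plain lists. The function names match the PR, but the bodies are stand-ins for the real kernel, and the table layout (row 0 as the zero sentinel) is an assumption made for the example.

```python
from typing import List

Row = List[float]


def apply_steering(x2d: List[Row], table: List[Row], row_index: List[int]) -> List[Row]:
    """Stand-in for the gather/add kernel: x[t] += table[row_index[t]]."""
    return [
        [a + b for a, b in zip(row, table[idx])]
        for row, idx in zip(x2d, row_index)
    ]


def apply_layer_steering_streams(
    streams: List[List[Row]], table: List[Row], row_index: List[int]
) -> List[List[Row]]:
    """Steer (n, hc_mult, hidden) by viewing it as (n, hc_mult * hidden)."""
    if not streams:
        return []
    hc_mult, hidden = len(streams[0]), len(streams[0][0])
    flat = [[v for stream in token for v in stream] for token in streams]
    steered = apply_steering(flat, table, row_index)
    return [
        [row[s * hidden:(s + 1) * hidden] for s in range(hc_mult)]
        for row in steered
    ]


# Wire format: per-stream vectors concatenated to hc_mult * hidden.
per_stream = [[0.0, 0.0], [10.0, 10.0]]   # steer only stream 1
vector = [v for stream in per_stream for v in stream]

table = [[0.0] * len(vector), vector]     # row 0: zero sentinel (assumed layout)
streams = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]  # n=2, hc_mult=2, hidden=2
out = apply_layer_steering_streams(streams, table, row_index=[0, 1])
```

Token 0 points at the sentinel row and is unchanged; token 1 has only its second stream shifted, which is the per-stream granularity described in item 6.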

Backward compatibility: purely additive. Existing capture requests, CLI flags, and .bin payloads are unchanged; standard-model capture and steering hook validation are identical to before (steering still registers exactly the three standard hooks by default); the new sidecar fields are optional and read gracefully whether present or absent.
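
To make "read gracefully whether present or absent" concrete, here is a hedged sketch of a reader that treats row_shape and positions as optional. The sidecar is modeled as a plain dict and `load_entry` / `reshape_row` are hypothetical helpers; `latest_per_position` is named in this PR, but its signature and "last write wins" behaviour are assumed here.

```python
import math
from typing import Any, Dict, List, Optional, Sequence, Tuple


def reshape_row(flat: Sequence[float], shape: Tuple[int, ...]) -> List[Any]:
    """Reshape one flat row into nested lists of the given shape."""
    if math.prod(shape) != len(flat):
        raise ValueError(f"row of length {len(flat)} does not fit shape {shape}")
    if len(shape) <= 1:
        return list(flat)
    step = len(flat) // shape[0]
    return [reshape_row(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


def load_entry(
    flat_rows: List[Sequence[float]], sidecar: Dict[str, Any]
) -> Tuple[List[Any], Optional[List[int]]]:
    """Apply optional sidecar fields; older sidecars simply lack them."""
    row_shape = sidecar.get("row_shape")
    if row_shape:
        rows = [reshape_row(row, tuple(row_shape)) for row in flat_rows]
    else:
        rows = [list(row) for row in flat_rows]
    return rows, sidecar.get("positions")


def latest_per_position(rows: List[Any], positions: List[int]) -> Dict[int, Any]:
    """Keep the last row written for each position (assumed semantics)."""
    latest: Dict[int, Any] = {}
    for row, pos in zip(rows, positions):
        latest[pos] = row
    return latest


flat_rows = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
rows, positions = load_entry(flat_rows, {"row_shape": [2, 2], "positions": [0, 1, 1]})
deduped = latest_per_position(rows, positions)
old_rows, old_positions = load_entry(flat_rows, {})   # sidecar without new fields
```

In the example, position 1 was written twice (as can happen when speculative tokens are re-computed), and only the later row is kept.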

Validation

  • CPU: full capture and steering suites green on the merged tree (mHC tests + the prefix-cache/admission tests).
  • GPU: validated end-to-end on DeepSeek-V4-Flash-NVFP4 / 4× B300 (CUDA 13.1), both eager and CUDA-graph:
    • Capture: every hook has the correct shape, dtype, and positions on eager; the global persistent-buffer path bitwise-matches eager under CUDA graphs (details in the validation comment on #177, "feat(capture): support DeepSeek-V4 mHC activations as capture targets").
    • Steering: a zero vector is a bit-identical no-op, while magnitude vectors at mhc_streams_pre_attn and a single-stream-targeted vector at mhc_streams_pre_mlp change the output (per-stream granularity confirmed); the per-width kernel warmup runs for both table widths, and CUDA graphs capture cleanly with steering enabled.

@RhizoNymph RhizoNymph changed the title feat(capture): DeepSeek-V4 mHC activation capture targets feat(capture+steering): DeepSeek-V4 mHC capture and steering targets Jun 18, 2026