feat(capture+steering): DeepSeek-V4 mHC capture and steering targets by RhizoNymph · Pull Request #181 · RhizoNymph/vllm

RhizoNymph · 2026-06-17T23:11:03Z

Adds DeepSeek-V4 manifold-hyperconnection (mHC) activations as both capture targets and steering targets. The capture side landed via #177; the steering side via #183 (both now merged into feat/mhc-capture), so this PR integrates both into feat/integration (alongside the prefix-cache capture work already there).

mHC threads hc_mult parallel residual streams through each layer and mixes them with manifold-constrained (Sinkhorn) coefficients. The same hook points are exposed for capture (read) and steering (write); the string names are shared so one identifier captures and steers the same tensor.
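For readers unfamiliar with the mixing step, here is a minimal pure-Python sketch of Sinkhorn-style normalization. It illustrates the general technique and is not DeepSeek-V4's implementation. The exp-of-logits parameterization, the fixed iteration count, and the `out[i] = sum_j M[i][j] * streams[j]` mixing orientation are all assumptions. Per token, a matrix like `M` corresponds to one `(hc_mult, hc_mult)` slice of what the `mhc_*_res_mix` hooks capture.

```python
import math
from typing import List

Matrix = List[List[float]]


def sinkhorn(logits: Matrix, n_iters: int = 50) -> Matrix:
    """Approximately map a square matrix of logits to a doubly stochastic
    matrix (every row and column sums to 1) via Sinkhorn-Knopp iteration.

    Assumptions for illustration: entries are made positive with exp(), and
    a fixed iteration count is used instead of a convergence check.
    """
    m = [[math.exp(x) for x in row] for row in logits]
    k = len(m)
    for _ in range(n_iters):
        # Normalize each row to sum to 1.
        row_normed = []
        for row in m:
            s = sum(row)
            row_normed.append([x / s for x in row])
        m = row_normed
        # Normalize each column to sum to 1.
        col_sums = [sum(m[i][j] for i in range(k)) for j in range(k)]
        m = [[m[i][j] / col_sums[j] for j in range(k)] for i in range(k)]
    return m


def mix_streams(res_mix: Matrix, streams: List[List[float]]) -> List[List[float]]:
    """Mix one token's hc_mult streams: out[i] = sum_j res_mix[i][j] * streams[j].

    Treating rows as output streams is an assumption of this sketch.
    """
    hidden = len(streams[0])
    return [
        [sum(res_mix[i][j] * streams[j][d] for j in range(len(streams))) for d in range(hidden)]
        for i in range(len(res_mix))
    ]
```

Because every column of the normalized matrix sums to 1, this mix preserves the per-dimension sum taken over the stream axis.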

Capture: hook points

New mhc_* hooks (DeepSeek-V4 only; rejected on other models):

| hook | fires at | shape | dtype |
|---|---|---|---|
| `mhc_streams_pre_attn` | multi-stream residual entering the layer | `(n, hc_mult, hidden)` | bf16 |
| `mhc_streams_pre_mlp` | multi-stream residual before the FFN fold | `(n, hc_mult, hidden)` | bf16 |
| `mhc_streams_final` | final pre-`hc_head` streams at the model tail (last PP rank) | `(n, hc_mult, hidden)` | bf16 |
| `mhc_attn_post_mix` | attention sublayer post-mix gates | `(n, hc_mult)` | fp32 |
| `mhc_ffn_post_mix` | FFN sublayer post-mix gates | `(n, hc_mult)` | fp32 |
| `mhc_attn_res_mix` | attention Sinkhorn stream-mixing matrix | `(n, hc_mult, hc_mult)` | fp32 |
| `mhc_ffn_res_mix` | FFN Sinkhorn stream-mixing matrix | `(n, hc_mult, hc_mult)` | fp32 |

V4 reuses the standard hooks for its single-stream `(n, hidden)` bf16 tensors: `pre_attn` (pre-mixed attention input), `post_attn` (attention output), `mlp_in` (pre-mixed FFN input), and `mlp_out` (FFN output). V4 has no single-stream `post_mlp`: the end-of-layer residual is multi-stream and is captured via `mhc_streams_*`.

mhc_streams_final is a model-level hook (fires once at the tail, not per layer); its layer selector is ignored and normalized to the last layer, so callers just write {"mhc_streams_final": "all"}.
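The selector normalization for model-level hooks can be sketched as below. `normalize_layer_selectors`, `MODEL_LEVEL_HOOKS`, and the accepted selector forms (`"all"` or an iterable of layer indices) are hypothetical illustrations, not the PR's actual API.

```python
from typing import Dict, Iterable, List, Union

# Hypothetical: hooks that fire once at the model tail rather than per layer.
MODEL_LEVEL_HOOKS = frozenset({"mhc_streams_final"})

Selector = Union[str, Iterable[int]]


def normalize_layer_selectors(
    request: Dict[str, Selector], num_layers: int
) -> Dict[str, List[int]]:
    """Resolve each hook's layer selector to a sorted list of layer indices."""
    last = num_layers - 1
    resolved: Dict[str, List[int]] = {}
    for hook, selector in request.items():
        if hook in MODEL_LEVEL_HOOKS:
            # The selector is ignored: a model-level hook only exists at the tail.
            resolved[hook] = [last]
        elif selector == "all":
            resolved[hook] = list(range(num_layers))
        elif isinstance(selector, str):
            raise ValueError(f"unknown selector {selector!r} for hook {hook!r}")
        else:
            layers = sorted({int(i) for i in selector})
            out_of_range = [i for i in layers if not 0 <= i < num_layers]
            if out_of_range:
                raise ValueError(f"layers {out_of_range} out of range for hook {hook!r}")
            resolved[hook] = layers
    return resolved
```

With this shape, `{"mhc_streams_final": "all"}` and `{"mhc_streams_final": [0]}` both resolve to the last layer only.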

Capture: changes required

The framework previously assumed every captured row was (hidden_size,) in the model dtype. mHC breaks both (wider streams, fp32 coefficients), so:

- Hook vocabulary: seven `mhc_*` names are added to `HookName` and the mirrored `_HOOK_NAME_TO_ID` table. New ids are appended and existing ids are unchanged, so compiled graphs stay valid.
- Per-hook schema: a new `HookSchema(width, dtype, logical_shape)` plus `build_hook_schema(hidden, dtype, hc_mult)` replaces the single `hidden_size`/`model_dtype` assumption. `hc_mult` is read from `hf_config.hc_mult`; non-mHC models get only the standard hooks they wire. The schema is carried on `CaptureContext.hook_schema` and built centrally in `build_capture_context` (at admission) and in the model runner.
- Manager: global persistent buffers and per-key scratch dtypes are sized from the schema, so fp32 coefficients are not downcast to bf16 and wide stream buffers are allocated at `hc_mult * hidden`. Chunk metadata now carries `row_shape` and per-row positions.
- Validation: hook validity is now per-model (driven by the schema) instead of a global allow-list, so `mhc_*`, `mlp_in`, and `mlp_out` are accepted only where the model actually taps them. Model-level hooks normalize their layer selector to the tail.
- Filesystem consumer + reader: sidecars gain `row_shape` (to reshape flat `(n, width)` rows back to, e.g., `(n, hc_mult, hidden)`), a per-entry `dtype` (so one packed file can mix bf16 streams and fp32 coefficients), and per-row positions plus a `latest_per_position()` reader helper (for speculative-decode dedup). All fields are additive and optional; existing captures round-trip unchanged.
- Model taps: `maybe_capture_residual` taps are added to the V4 decoder layer (nvidia + amd, fused-CUDA + native paths) and to the model tail, plus a `layer_idx` obtained via `extract_layer_index`. The taps are gated so they constant-fold out of the compiled graph when capture is disabled.
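A per-hook schema in the spirit of `HookSchema` / `build_hook_schema` can be sketched as follows. This is a hypothetical re-creation from the description above, not the PR's code. Dtypes are plain strings instead of torch dtypes. The `standard_hooks` parameter is added because the exact set of standard hooks each model wires is not listed here. `validate_hooks` is an illustrative helper.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class HookSchema:
    width: int                      # flat elements per captured row
    dtype: str                      # storage dtype, e.g. "bfloat16" or "float32"
    logical_shape: Tuple[int, ...]  # per-row shape, e.g. (hc_mult, hidden)

    def __post_init__(self) -> None:
        if math.prod(self.logical_shape) != self.width:
            raise ValueError("width must equal prod(logical_shape)")


def _row(shape: Tuple[int, ...], dtype: str) -> HookSchema:
    return HookSchema(math.prod(shape), dtype, shape)


def build_hook_schema(
    hidden: int,
    dtype: str,
    hc_mult: Optional[int],
    standard_hooks: Iterable[str],  # hypothetical extra parameter
) -> Dict[str, HookSchema]:
    """Map each hook this model taps to its row width, dtype, and logical shape."""
    # Single-stream hooks: one (hidden,) row per token in the model dtype.
    schema = {name: _row((hidden,), dtype) for name in standard_hooks}
    if hc_mult:  # mHC models only
        for name in ("mhc_streams_pre_attn", "mhc_streams_pre_mlp", "mhc_streams_final"):
            schema[name] = _row((hc_mult, hidden), dtype)
        # Mixing coefficients are stored as fp32 so they are never downcast.
        for name in ("mhc_attn_post_mix", "mhc_ffn_post_mix"):
            schema[name] = _row((hc_mult,), "float32")
        for name in ("mhc_attn_res_mix", "mhc_ffn_res_mix"):
            schema[name] = _row((hc_mult, hc_mult), "float32")
    return schema


def validate_hooks(requested: Iterable[str], schema: Dict[str, HookSchema]) -> None:
    """Per-model validation: reject any hook the model does not tap."""
    unsupported = sorted(set(requested) - set(schema))
    if unsupported:
        raise ValueError(f"hooks not supported by this model: {unsupported}")
```

Keeping widths and dtypes in one map lets buffer sizing and validation share a single source of truth. A model without `hc_mult` never sees an `mhc_*` entry, so those hooks are rejected automatically.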

Steering at mHC hooks

Steering applies vectors at the same mHC hook points (the write/apply path), distinct from capturing them. The steering tables were hard-wired to one hidden_size width in the model dtype; this generalizes them to per-hook width — the steering analog of the capture HookSchema.
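To make per-hook widths concrete, the sketch below allocates one zero-initialized table per (layer, hook) from a width map. `register_tables` and `v4_style_widths` are hypothetical stand-ins, not `register_steering_buffers` itself. The nested-list storage, the `max_vectors` capacity, and reserving row 0 as the zero sentinel are assumptions for illustration.

```python
from typing import Dict, List, Mapping, Tuple

# Hypothetical: hooks registered only on the last layer (the model tail).
TAIL_ONLY_HOOKS = frozenset({"mhc_streams_final"})


def register_tables(
    num_layers: int, hook_widths: Mapping[str, int], max_vectors: int
) -> Dict[Tuple[int, str], List[List[float]]]:
    """Allocate a table of (max_vectors + 1) rows per (layer, hook).

    Row 0 stays all-zero as a sentinel for tokens with no active vector
    (an assumption of this sketch); every row has the hook's own width.
    """
    tables: Dict[Tuple[int, str], List[List[float]]] = {}
    for layer in range(num_layers):
        for hook, width in hook_widths.items():
            if hook in TAIL_ONLY_HOOKS and layer != num_layers - 1:
                continue  # tail-only hook: skip every layer but the last
            tables[(layer, hook)] = [[0.0] * width for _ in range(max_vectors + 1)]
    return tables


def v4_style_widths(hidden: int, hc_mult: int) -> Dict[str, int]:
    """Hypothetical width map for the new hook points (standard hooks omitted)."""
    return {
        "mlp_in": hidden,
        "mlp_out": hidden,
        "mhc_streams_pre_attn": hc_mult * hidden,
        "mhc_streams_pre_mlp": hc_mult * hidden,
        "mhc_streams_final": hc_mult * hidden,
    }
```

A model that passes no multi-stream entries gets only `hidden`-wide tables, which is how the default path can stay identical to the single-width behavior.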

- New steering hook points on `SteeringHookPoint`: `mlp_in` and `mlp_out` (single-stream) and `mhc_streams_pre_attn` / `mhc_streams_pre_mlp` / `mhc_streams_final` (multi-stream).
- Model-selective, per-width tables: `register_steering_buffers` takes a `hook_widths` map and defaults to exactly the three standard single-stream hooks at `hidden_size`, so every existing model is untouched. DeepSeek-V4 registers single-stream hooks at `hidden` and multi-stream residual hooks at `hc_mult * hidden`; `mhc_streams_final` is registered only on the last layer.
- Multi-stream apply: `apply_layer_steering_streams` flattens `(n, hc_mult, hidden)` to 2-D, runs the existing `apply_steering` gather/add (the kernel's hidden dim is a runtime argument), and reshapes back. The per-token row index, row layout, and `any_active` short-circuit are all width-agnostic.
- Manager: `populate_steering_tables` groups active tables by `(width, dtype)` so mixed widths coexist on one layer (a single group in the non-mHC case); the zero sentinel row is cached per width. The kernel warmup covers each distinct width.
- V4 wiring: in both forward paths, the decoder layer routes the steerable tensors (the multi-stream residual and the single-stream sublayer inputs) through the steering helpers via `_steer_and_capture_mhc`. The fp32 mixing coefficients stay capture-only (they are routing weights with no steering semantics).
- Wire format unchanged: a multi-stream steering vector is simply the per-stream vectors concatenated to `hc_mult * hidden` (per-stream granularity), so the packed blob and the `SamplingParams` list-of-floats need no change. Coefficient (Sinkhorn-routing) steering is intentionally out of scope.
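The multi-stream apply path, the concatenated wire format, and the `(width, dtype)` grouping can be sketched in pure Python as below. `apply_rows`, `apply_streams`, `pack_stream_vector`, and `group_by_width_dtype` are hypothetical stand-ins for `apply_steering`, `apply_layer_steering_streams`, client-side packing, and the grouping in `populate_steering_tables`. Nested lists replace tensors, and the row-major stream order inside a flattened row is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Sequence, Tuple

Row = List[float]


def apply_rows(rows: List[Row], row_index: Sequence[int], table: List[Row]) -> List[Row]:
    """2-D gather/add: out[t] = rows[t] + table[row_index[t]].

    The width comes from the data, mirroring a kernel whose hidden dim is a
    runtime argument.
    """
    for idx, row in zip(row_index, rows):
        if len(table[idx]) != len(row):
            raise ValueError("steering vector width does not match row width")
    return [[x + v for x, v in zip(row, table[idx])] for row, idx in zip(rows, row_index)]


def apply_streams(
    streams: List[List[Row]], row_index: Sequence[int], table: List[Row]
) -> List[List[Row]]:
    """(n, hc_mult, hidden) -> (n, hc_mult*hidden) -> gather/add -> reshape back."""
    hc_mult, hidden = len(streams[0]), len(streams[0][0])
    flat = [[x for stream in token for x in stream] for token in streams]
    out = apply_rows(flat, row_index, table)
    return [[row[k * hidden:(k + 1) * hidden] for k in range(hc_mult)] for row in out]


def pack_stream_vector(per_stream: Sequence[Row]) -> Row:
    """Wire format: concatenate the per-stream vectors to hc_mult*hidden floats."""
    return [x for vec in per_stream for x in vec]


def group_by_width_dtype(
    active: Mapping[str, Tuple[int, str]]
) -> Dict[Tuple[int, str], List[str]]:
    """Group active hook tables so each (width, dtype) is handled together."""
    groups: Dict[Tuple[int, str], List[str]] = defaultdict(list)
    for hook, key in active.items():
        groups[key].append(hook)
    return dict(groups)
```

Flattening keeps the gather/add itself 2-D, so one routine serves both single-stream rows of width `hidden` and multi-stream rows of width `hc_mult * hidden`. A vector that is zero outside one stream's slice changes only that stream.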

Backward compatibility: purely additive. Existing capture requests, CLI flags, and .bin payloads are unchanged; standard-model capture and steering hook validation are identical to before (steering still registers exactly the three standard hooks by default); the new sidecar fields are optional and read gracefully whether present or absent.
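Reading the optional sidecar fields gracefully can be sketched like this. The dict-based sidecar layout, `read_entry`, and `reshape_row` are hypothetical. This `latest_per_position` only mirrors the idea behind the PR's `latest_per_position()` helper: keep the most recently written row per token position, as when a speculative-decode step re-captures a position. Its real signature and semantics may differ.

```python
import math
from typing import Any, Dict, List, Mapping, Optional, Sequence


def reshape_row(flat: Sequence[Any], row_shape: Sequence[int]) -> List[Any]:
    """Nest a flat row into row_shape, e.g. (hc_mult, hidden)."""
    if len(flat) != math.prod(row_shape):
        raise ValueError("row length does not match row_shape")
    if len(row_shape) <= 1:
        return list(flat)
    step = len(flat) // row_shape[0]
    return [reshape_row(flat[i * step:(i + 1) * step], row_shape[1:]) for i in range(row_shape[0])]


def read_entry(
    meta: Mapping[str, Any], flat_rows: Sequence[Sequence[Any]], default_dtype: str
) -> Dict[str, Any]:
    """Decode one entry; every new field is optional, so old sidecars still load."""
    row_shape: Optional[Sequence[int]] = meta.get("row_shape")
    rows = [reshape_row(r, row_shape) if row_shape else list(r) for r in flat_rows]
    return {
        "dtype": meta.get("dtype", default_dtype),  # per-entry dtype, else file default
        "rows": rows,
        "positions": meta.get("positions"),  # None when positions were not recorded
    }


def latest_per_position(rows: Sequence[Any], positions: Sequence[int]) -> Dict[int, Any]:
    """Keep only the last row written for each token position."""
    latest: Dict[int, Any] = {}
    for row, pos in zip(rows, positions):
        latest[pos] = row  # a later capture of the same position wins
    return dict(sorted(latest.items()))
```

An entry with no `row_shape`, `dtype`, or `positions` decodes to flat rows in the file's default dtype, which is the round-trip behavior older captures rely on.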

Validation

CPU: full capture and steering suites green on the merged tree (mHC tests + the prefix-cache/admission tests).
GPU: validated end-to-end on DeepSeek-V4-Flash-NVFP4 / 4× B300 (CUDA 13.1), both eager and CUDA-graph:
- Capture: every hook has the correct shape, dtype, and positions on eager; the global persistent-buffer path bitwise-matches eager under CUDA graphs (details in the validation comment on #177, "feat(capture): support DeepSeek-V4 mHC activations as capture targets").
- Steering: a zero vector is a bit-identical no-op, while magnitude vectors at `mhc_streams_pre_attn` and a single-stream-targeted vector at `mhc_streams_pre_mlp` change the output (confirming per-stream granularity). The per-width kernel warmup runs for both table widths, and CUDA graphs capture cleanly with steering enabled.


RhizoNymph and others added 6 commits June 13, 2026 01:14

- feat(capture): support DeepSeek-V4 mHC activations as capture targets (316232c)
- feat(capture): persist per-row token positions for spec-decode dedup (2ac8d8a)
- feat(capture): normalize mhc_streams_final layer selector to model tail (1705a73)
- docs(capture): document mHC hooks, per-hook schema, and spec-decode dedup (7111fbc)
- Merge pull request #177 from RhizoNymph/feat/capture-mhc-targets (d61fd0a): feat(capture): support DeepSeek-V4 mHC activations as capture targets
- Merge remote-tracking branch 'origin/feat/integration' into feat/mhc-capture (d2e5c0c). Conflicts: vllm/entrypoints/openai/chat_completion/serving.py, vllm/entrypoints/openai/completion/serving.py, vllm/v1/capture/__init__.py, vllm/v1/capture/manager.py

RhizoNymph mentioned this pull request Jun 18, 2026: feat(capture): DeepSeek-V4 mHC activation capture targets #182 (Merged)

RhizoNymph and others added 3 commits June 17, 2026 18:16

- feat(steering): support mHC multi-stream and sublayer steering for DeepSeek-V4 (6d8ff77)
- test(steering): make mHC streams tests robust to CUDA-only op dispatch (c058a1d)
- Merge pull request #183 from RhizoNymph/feat/mhc-steering (ad282af): feat(steering): mHC multi-stream + sublayer steering for DeepSeek-V4

RhizoNymph changed the title ~~feat(capture): DeepSeek-V4 mHC activation capture targets~~ feat(capture+steering): DeepSeek-V4 mHC capture and steering targets Jun 18, 2026

RhizoNymph wants to merge 9 commits into feat/integration from feat/mhc-capture
