feat(capture): DeepSeek-V4 mHC activation capture targets #182

Merged
RhizoNymph merged 1 commit into feat/capture-consumers from feat/capture-consumers-mhc-targets
Jun 18, 2026

@RhizoNymph (Owner)

Adds DeepSeek-V4 manifold-hyperconnection (mHC) activations as capture targets on feat/capture-consumers. Same capture-side mHC work as #181 (which targets feat/integration), ported here so the standalone capture-consumers branch supports capturing mHC hook points.

Hook points

| hook | fires at | shape | dtype |
|---|---|---|---|
| mhc_streams_pre_attn | multi-stream residual entering the layer | (n, hc_mult, hidden) | bf16 |
| mhc_streams_pre_mlp | multi-stream residual before the FFN fold | (n, hc_mult, hidden) | bf16 |
| mhc_streams_final | final pre-hc_head streams at the model tail | (n, hc_mult, hidden) | bf16 |
| mhc_attn_post_mix / mhc_ffn_post_mix | attn/FFN post-mix gates | (n, hc_mult) | fp32 |
| mhc_attn_res_mix / mhc_ffn_res_mix | attn/FFN Sinkhorn stream-mixing matrices | (n, hc_mult, hc_mult) | fp32 |

The standard pre_attn / post_attn / mlp_in / mlp_out hooks are reused for V4's single-stream sublayer inputs and outputs, shape (n, hidden), bf16. mhc_streams_final is a model-level hook: its layer selector is normalized to the tail layer.
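A minimal sketch of what selector normalization for a model-level hook could look like. The function name `normalize_layers`, the `MODEL_LEVEL_HOOKS` set, and the choice of `num_layers - 1` as the "tail layer" are illustrative assumptions, not the code in this PR.

```python
from typing import Iterable, List

# Hypothetical set of model-level hooks; the PR names only mhc_streams_final.
MODEL_LEVEL_HOOKS = frozenset({"mhc_streams_final"})


def normalize_layers(hook: str, layers: Iterable[int], num_layers: int) -> List[int]:
    """Return the layer indices a capture request resolves to for `hook`."""
    if hook in MODEL_LEVEL_HOOKS:
        # Assumption: a model-level hook fires once at the model tail, so any
        # requested selector collapses to the last layer index.
        return [num_layers - 1]
    resolved = sorted(set(layers))
    for layer in resolved:
        if not 0 <= layer < num_layers:
            raise ValueError(f"layer {layer} out of range for {num_layers} layers")
    return resolved
```

With this shape, a request for `mhc_streams_final` on layers `[0, 3]` of an 8-layer model would resolve to `[7]`, while per-layer hooks keep their deduplicated, sorted selector.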

Changes

  • Hook vocabulary (HookName + mirrored id table) extended with the 7 mhc_* names.
  • Per-hook HookSchema(width, dtype, logical_shape) + build_hook_schema(hidden, dtype, hc_mult), replacing the single hidden_size/model_dtype assumption; carried on CaptureContext.hook_schema and built in build_capture_context.
  • Manager sizes global buffers + per-key scratch dtype from the schema (fp32 coefficients not downcast); chunk metadata carries row_shape + per-row positions.
  • Per-model (schema-driven) hook validation; model-level hooks normalize their layer selector.
  • Filesystem consumer/reader: row_shape, per-entry dtype (mixed-dtype packed files), per-row positions + latest_per_position().
  • V4 decoder-layer + model-tail taps (nvidia + amd, fused-CUDA + native), gated to constant-fold out when capture is disabled.
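The per-hook schema described in the list above might be sketched as follows. `HookSchema` and `build_hook_schema` take their names and parameters from this PR, but the body is an assumption: in particular, treating `width` as the flattened element count per captured row, and using plain dtype strings, are illustrative choices rather than the merged implementation.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class HookSchema:
    width: int                      # assumed: flattened elements per captured row
    dtype: str                      # storage dtype name for this hook
    logical_shape: Tuple[int, ...]  # per-row shape before flattening


def build_hook_schema(hidden: int, dtype: str, hc_mult: int) -> Dict[str, HookSchema]:
    """Build one schema entry per hook instead of assuming (hidden, model dtype)."""
    schema: Dict[str, HookSchema] = {}
    # Standard single-stream hooks: one hidden-sized row in the model dtype.
    for name in ("pre_attn", "post_attn", "mlp_in", "mlp_out"):
        schema[name] = HookSchema(hidden, dtype, (hidden,))
    # Multi-stream residuals: hc_mult streams of hidden size, model dtype.
    for name in ("mhc_streams_pre_attn", "mhc_streams_pre_mlp", "mhc_streams_final"):
        schema[name] = HookSchema(hc_mult * hidden, dtype, (hc_mult, hidden))
    # Post-mix gates: one fp32 coefficient per stream, never downcast.
    for name in ("mhc_attn_post_mix", "mhc_ffn_post_mix"):
        schema[name] = HookSchema(hc_mult, "float32", (hc_mult,))
    # Stream-mixing matrices: hc_mult x hc_mult fp32 coefficients.
    for name in ("mhc_attn_res_mix", "mhc_ffn_res_mix"):
        schema[name] = HookSchema(hc_mult * hc_mult, "float32", (hc_mult, hc_mult))
    return schema
```

A manager could then size each buffer from `schema[hook].width` and pick the scratch dtype from `schema[hook].dtype`, which is what keeps the fp32 coefficients from being stored in bf16.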

Backward compatible: additive sidecar fields, unchanged request/CLI surface and .bin payload, standard-model validation unchanged.
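To illustrate how a reader might use the additive sidecar fields, here is a small sketch. The function name `latest_per_position` comes from this PR, but its semantics here (keep the last-written row for each token position) are an assumption, and `entry_nbytes` along with the `ITEMSIZE` table are hypothetical helpers for sizing one entry of a mixed-dtype packed file.

```python
import math
from typing import Dict, Sequence

# Bytes per element for the two dtypes named in the hook table.
ITEMSIZE = {"bfloat16": 2, "float32": 4}


def entry_nbytes(num_rows: int, row_shape: Sequence[int], dtype: str) -> int:
    """Payload size of one entry, derived from its own row_shape and dtype."""
    return num_rows * math.prod(row_shape) * ITEMSIZE[dtype]


def latest_per_position(positions: Sequence[int]) -> Dict[int, int]:
    """Map each token position to the index of the last row recorded for it."""
    latest: Dict[int, int] = {}
    for row_index, position in enumerate(positions):
        # Later rows overwrite earlier ones, so recomputed positions
        # resolve to their most recent capture.
        latest[position] = row_index
    return latest
```

Because each entry carries its own dtype and row shape, a bf16 stream entry and an fp32 coefficient entry can share one packed file without the reader assuming a single global width.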

Validation

Full capture suite green on this branch (335 passed, 1 skipped): the mHC tests plus the existing prefix-cache/admission tests. The same work was GPU-validated end-to-end on DeepSeek-V4-Flash-NVFP4 / 4× B300 (see #177/#181).

@RhizoNymph RhizoNymph merged commit 2c5aa94 into feat/capture-consumers Jun 18, 2026