feat(steering): dynamic steering — activation-conditioned steering#180

Draft
RhizoNymph wants to merge 47 commits into feat/integration from feat/dynamic-steering

RhizoNymph commented Jun 17, 2026

Owner

Draft. Ties activation capture to activation steering so activations decide when/how to steer. Three controller tiers (async → sync → in-graph), each configuring the one below. Design authority: docs/design/dynamic_steering.md (+ dynamic_steering_apc_notification.md, dynamic_steering_row_gating.md).

What's here

  • Phase 0 — async transport. In-process steering_action_queue (bounded, decode-tier-only validation), drained at the top of _update_steering_buffers.
  • Phase 1a — sync consumers + per-request actuation. execution="sync" consumer axis (every TP rank, 1-step latency); dynamic-override row pool (pure routing); observability + GET /v1/steering/dynamic; event-based on_step timing.
  • Phase 1b — gain primitives. Per-row strength scale (§5.3) + dedicated-gather dynamic additive tier (§5.4).
  • Phase 2 — in-graph monitor. A graph-safe monitor op computes a per-token gate sigmoid(sharpness·(residual·probe − threshold)) and modulates the §5.4 tier within the same forward pass. When gate_rows is set, the gate also modulates per-request rows (decode-only; prefill is protected via a decode mask).
  • APC correctness. Worker→scheduler effective-decode-steering-signature notification so decode KV produced under dynamic steering is not falsely reused — resolves the streaming-continuation prefix-cache hole.
  • Example controller emit_mode = scale | monitor.
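
As a rough illustration of the Phase 2 gate, the sketch below evaluates sigmoid(sharpness·(residual·probe − threshold)) for a single token in plain Python. The function name `monitor_gate` and the sample numbers are assumptions made for illustration; this is not the branch's graph-safe op.

```python
import math

def monitor_gate(residual, probe, sharpness, threshold):
    """Per-token gate: sigmoid(sharpness * (dot(residual, probe) - threshold))."""
    score = sum(r * p for r, p in zip(residual, probe))
    z = sharpness * (score - threshold)
    # Numerically stable sigmoid so saturated thresholds do not overflow.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical values: dot(residual, probe) = 0.5 + 0.0 + 1.0 = 1.5
residual = [0.5, -1.0, 2.0]
probe = [1.0, 0.0, 0.5]

gate_at_threshold = monitor_gate(residual, probe, sharpness=4.0, threshold=1.5)
gate_forced_on = monitor_gate(residual, probe, sharpness=1.0, threshold=-1e6)
gate_forced_off = monitor_gate(residual, probe, sharpness=1.0, threshold=1e6)
```

A score exactly at the threshold yields a gate of 0.5, and a saturated threshold of ±1e6 pins the gate fully off or fully on, which is the trick the gating tests described later rely on.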

Status / validation

GPU-validated on gemma4-31B: tp=1 (per-request actuation, tier, APC reuse), tp=2 cross-node (rank-replication + APC re-keying), pp=2, active in-graph monitor (tier + row gating), and row-gating kernel/op/cudagraph parity. Extensive CPU suites.

Notes for review

…d activation_reward_producer

Co-authored-by: Claude
@RhizoNymph

Owner Author

End-to-end verification summary

Everything on this branch has now been validated end to end on GPU (RTX 3090, gemma-4-31B-it-Q4_K_S GGUF, hidden 5376, 60 layers — gemma4 is the only architecture carrying both capture taps and steering hooks). Below is the full picture: component-level checks, engine-level e2e, parallelism, and the three previously-open consumer-loop gaps, all now closed.

Methodology

Steering is verified via logprobs / num_cached_tokens / direct state inspection, never raw output-token equality — output-token comparison is ambiguous on high-confidence prompts, and capture records the pre-steering residual so it can't witness steering. The per-request e2e tests use a within-run target-vs-control technique (steer one of two identical concurrent requests; compare against the in-batch control) because two identical greedy prompts in one batch are not bitwise identical deep in generation (batched reductions are position-dependent, diverging from FP noise ~token 22). NOISE_FLOOR=10 separates real (early) steering from that floor. All GPU tests force VLLM_USE_FLASHINFER_SAMPLER=0 and VLLM_WORKER_MULTIPROC_METHOD=spawn.
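
The target-vs-control comparison can be sketched roughly as below. This assumes the test reduces each request to a per-step sequence of comparable values (for example rounded logprobs) and looks for the first step where target and control disagree. The helper names and the inclusive bound are assumptions; only `NOISE_FLOOR=10` and the approximate step-22 noise divergence come from the description above.

```python
NOISE_FLOOR = 10  # value quoted in the methodology; how it is applied here is assumed

def first_diff(target, control):
    """Index of the first step where target and control differ, or None."""
    for step, (t, c) in enumerate(zip(target, control)):
        if t != c:
            return step
    return None

def is_real_steering(target, control, noise_floor=NOISE_FLOOR):
    """True when divergence starts after step 0 and no later than the floor."""
    step = first_diff(target, control)
    return step is not None and 1 <= step <= noise_floor

# Synthetic per-step values standing in for real engine output.
control = list(range(30))
steered = control[:2] + [v + 1000 for v in control[2:]]     # diverges at step 2
fp_noise = control[:22] + [v + 1000 for v in control[22:]]  # diverges at step 22

steered_first = first_diff(steered, control)
noise_first = first_diff(fp_noise, control)
```

With these synthetic sequences, the divergence at step 2 counts as steering while the divergence at step 22 falls past the floor and is treated as batched-reduction noise.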

Component / kernel (CPU + standalone GPU)

  • ~312 steering CPU tests pass; ruff clean.
  • Per-row scale kernel (out = hidden + table[r]·scale[r]): exact at 1.0 / 0.5 / 0.0; cudagraph capture.
  • Dynamic additive tier (§5.4, + dvec·token_scales[t]): per-token gate math, prefill-zero (decode-only), free gain changes, cudagraph capture.
  • In-graph monitor (steering_monitor): real Triton kernel matches fp32 eager across N=1..256 (bf16 max|Δ| ≤ 4e-5); a hand-built CUDA graph capturing monitor → apply_steering, replayed across steps with different gains, reproduces eager (rel ≤ 5e-3) — proving the in-place token_scales mutation is visible to the later steer op within the same graph, the per-step overwrite is the reset (no cross-step accumulation), and prefill stays tier-free.
  • Per-request row gating (steering_row_gate + steering_decode_mask, table[r]·scale[r]·row_gate[t]): kernel parity (bf16 ≤ 6e-3, fp32 ~1e-7), monitor gate_rows gates decode rows while prefill rows stay exactly 1.0 (cache safety), CUDA-graph replay, engine cudagraph boot with the 8-arg apply + 7-arg monitor ops.
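
For readers who want the arithmetic in one place, here is a pure-Python reference of the per-token update implied by the formulas in the list above. Combining the per-row term and the additive tier into a single function, along with the name `apply_steering_reference` and its argument layout, is an assumption for illustration; the real kernels are Triton/CUDA and may be organised differently.

```python
def apply_steering_reference(hidden, row_index, table, scale,
                             row_gate, dvec, token_scales):
    """out[t] = hidden[t] + table[r]*scale[r]*row_gate[t] + dvec*token_scales[t],
    where r = row_index[t]."""
    out = []
    for t, h in enumerate(hidden):
        r = row_index[t]
        row_gain = scale[r] * row_gate[t]
        out.append([
            h_i + table[r][i] * row_gain + dvec[i] * token_scales[t]
            for i, h_i in enumerate(h)
        ])
    return out

# Token 0 stands in for a prefill token (zero row, tier scale 0.0);
# token 1 is a decode token using row 1 at half strength.
hidden = [[1.0, 2.0], [3.0, 4.0]]
row_index = [0, 1]
table = [[0.0, 0.0], [10.0, -10.0]]
dvec = [1.0, 1.0]
token_scales = [0.0, 0.25]

half = apply_steering_reference(hidden, row_index, table, [1.0, 0.5],
                                [1.0, 1.0], dvec, token_scales)
zeroed = apply_steering_reference(hidden, row_index, table, [1.0, 0.0],
                                  [1.0, 1.0], dvec, token_scales)
```

Setting a row's scale to 0.0 removes only that row's contribution; the additive tier on the same token is unaffected, and the token with a zero tier scale passes through unchanged.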

Engine-level e2e (GPU)

  • One-step actuation latency + per-request targeting (test_dynamic_steering_e2e.py): an override emitted at step N changes the target's output starting at N+1, never token 0; the in-batch control is untouched. first_diff=2.
  • In-graph monitor, active path (engaged vs disengaged by threshold sign only): engaged diverges at token 1, disengaged == unsteered baseline — the monitor gate provably controls the tier through the full controller → SteeringMonitorUpdate → manager → kernel path.
  • APC steering-aware prefix caching (test_apc_steering_e2e.py): a continuation of a dynamically steered request reuses 0 of its prior decode KV (override-keyed blocks not falsely reused), while a continuation of an unsteered request reuses normally (64/75) — the worker→scheduler effective-decode-signature notification is correct.
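
The APC behaviour can be pictured with a toy prefix cache whose block keys fold in a steering signature. Everything here is a hypothetical illustration (`block_key`, the signature values, the set-based cache); it is not vLLM's actual block-hashing scheme, only a sketch of why re-keyed decode blocks miss on a later lookup.

```python
import hashlib

def block_key(parent_key, token_ids, steering_signature):
    """Toy chained block key that includes the effective steering signature."""
    payload = repr((parent_key, tuple(token_ids), steering_signature)).encode()
    return hashlib.sha256(payload).hexdigest()

UNSTEERED = 0          # assumed signature for "no decode steering"
OVERRIDE_SIG = 0xABC   # assumed signature reported for a dynamic override

tokens = [11, 12, 13, 14]
cache = set()

# Decode KV written while the override was active is stored under its signature.
cache.add(block_key(None, tokens, OVERRIDE_SIG))

# A continuation looks the same tokens up with the unsteered signature.
steered_hit = block_key(None, tokens, UNSTEERED) in cache

# Decode KV from an unsteered request is stored and found under the same key.
other_tokens = [21, 22, 23, 24]
cache.add(block_key(None, other_tokens, UNSTEERED))
unsteered_hit = block_key(None, other_tokens, UNSTEERED) in cache
```

In this toy model the steered continuation misses and the unsteered one hits, mirroring the 0-versus-64 reuse counts reported above.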

Parallelism

  • tp=1: all of the above.
  • tp=2 cross-node (Ray, NCCL over bond0): rank-replication smoke (identical steering tables / coherent steered output across ranks) and APC re-keying (steered continuation cached=0, unsteered=64) — the rank-0-canonical notification holds under TP.
  • pp=2 cross-node: engine + per-request decode steering on both pipeline stages.

Three previously-open consumer-loop gaps — now closed (GPU)

  1. gate_rows end-to-end (test_steering_gating_e2e.py::test_row_gate_*): a consumer installs an override row + a saturated-threshold (±1e6) gate_rows=True monitor. Gate ON → target's per-request row applied (early divergence in [1,10]); gate OFF → row suppressed (target tracks the control past the noise floor). Confirms the monitor gates the per-request row term, not just the tier, through a real consumer loop.
  2. req_id-keyed scale end-to-end (::test_req_id_scale_*): override + SteeringScaleUpdate(req_id=, scale=0) emitted in the same step (override first so the runner resolves req_id → dyn_id in the in-order apply). scale=0 suppresses exactly the target's row; the unscaled override diverges early.
  3. Async transport end-to-end (test_async_steering_e2e.py, AsyncTierExample): a global-tier SteeringVectorUpdate submitted through the action queue from on_capture, exercising queue → drain → _apply_steering_actions.
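
The ordering requirement in gap 2 (override first, then the req_id-keyed scale) can be sketched as a small in-order apply loop. `SteeringScaleUpdate` with `req_id` and `scale` is named in the description; `OverrideInstall`, `apply_actions`, the dyn_id allocation, and the decision to skip unresolved updates are all assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class OverrideInstall:       # hypothetical stand-in for the override action
    req_id: str

@dataclass
class SteeringScaleUpdate:   # name from the PR; this definition is a sketch
    req_id: str
    scale: float

def apply_actions(actions):
    """Apply actions in emission order; return (row scales by dyn_id, skipped)."""
    req_to_dyn = {}
    row_scale = {}
    skipped = []
    for action in actions:
        if isinstance(action, OverrideInstall):
            dyn_id = len(req_to_dyn)          # assumed allocation policy
            req_to_dyn[action.req_id] = dyn_id
            row_scale[dyn_id] = 1.0
        elif isinstance(action, SteeringScaleUpdate):
            dyn_id = req_to_dyn.get(action.req_id)
            if dyn_id is None:
                skipped.append(action)        # req_id not resolvable yet
            else:
                row_scale[dyn_id] = action.scale
    return row_scale, skipped

in_order, in_order_skipped = apply_actions(
    [OverrideInstall("req-a"), SteeringScaleUpdate("req-a", 0.0)])
reversed_order, reversed_skipped = apply_actions(
    [SteeringScaleUpdate("req-a", 0.0), OverrideInstall("req-a")])
```

Emitting the override first lets the scale update resolve its row and zero it; in the reversed order the scale update has nothing to resolve against, and the row keeps its default strength.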

Behavioral finding from gap 3: on_capture fires when a request finalizes (after its output is emitted), so an async update can never steer its own request — it steers a subsequent one. The first version of this test (single-request cross-load) correctly failed (base == steered), surfacing exactly this; the test was rewritten to repeat the prompt in one engine (gen[0] baseline, gen[1..] steered). The AsyncTierExample docstring + README were corrected accordingly (they had implied in-request "1–3 step latency"). For same-request exactly-one-step latency, use a sync on_step consumer.
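
The timing described above can be reproduced with a toy model: a bounded queue drained at the top of each step, and an `on_capture` callback that fires only after a request has emitted all of its output. The queue size, function names, and the string standing in for a steering vector are assumptions; only the queue → drain ordering and the finalize-time callback come from the finding.

```python
import queue

action_queue = queue.Queue(maxsize=8)   # bounded; the size is an assumption
state = {"global_vector": None}

def drain_actions():
    """Apply every queued update; called at the top of each step."""
    while True:
        try:
            state["global_vector"] = action_queue.get_nowait()
        except queue.Empty:
            return

def on_capture(prompt):
    """Fires when a request finalizes, i.e. after its output was emitted."""
    try:
        action_queue.put_nowait("steering-vector")
    except queue.Full:
        pass  # a bounded queue drops or rejects when full (policy assumed)

def run_request(prompt, steps=3):
    seen = []
    for _ in range(steps):
        drain_actions()
        seen.append(state["global_vector"])   # what this step was steered with
    on_capture(prompt)
    return seen

gens = [run_request("same prompt") for _ in range(3)]
```

Here `gens[0]` sees no steering on any step, while `gens[1]` and later requests are steered from their first step, which matches the rewritten test layout (gen[0] baseline, gen[1..] steered).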

Pre-existing bug found + fixed during validation

A decode-only static per-request steering request (prefill_hash==0, decode_hash!=0) was silently dropped at all parallelisms. The bug predates this branch: the nothing-active short-circuit returned before the prefill→decode transition that registers the decode config. Fixed via a batch_has_per_request_steering guard plus a regression test. Ported to the base branches (PR #178 for feat/integration, PR #179 for feat/steering).
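
A minimal sketch of the failure mode and the guard, under stated assumptions: the `Req` fields, `update_steering`, and the `use_guard` switch are hypothetical and exist only to contrast the two behaviours. Only the name `batch_has_per_request_steering` and the shape of the bug come from the description above.

```python
from dataclasses import dataclass

@dataclass
class Req:                      # hypothetical request record
    prefill_hash: int
    decode_hash: int
    entering_decode: bool

def update_steering(batch, active_configs, use_guard):
    """Register decode configs at the prefill->decode transition."""
    batch_has_per_request_steering = any(
        r.prefill_hash or r.decode_hash for r in batch)
    nothing_active = not active_configs
    # Old behaviour: bail out whenever nothing is active yet.
    # Guarded behaviour: only bail out if the batch carries no steering either.
    if nothing_active and not (use_guard and batch_has_per_request_steering):
        return active_configs
    for r in batch:
        if r.entering_decode and r.decode_hash:
            active_configs.add(r.decode_hash)
    return active_configs

batch = [Req(prefill_hash=0, decode_hash=42, entering_decode=True)]
without_guard = update_steering(batch, set(), use_guard=False)
with_guard = update_steering(batch, set(), use_guard=True)
```

Without the guard, the decode-only request never registers its config because the early return fires first; with the guard, the transition runs and the config is registered.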

Out of scope (not covered here)

  • model_runner_v2 steering integration — upstream dev-flag-gated.
  • The base/prefill allow_cache_unsafe_phases escape hatch — deliberate, cache-unsafe, no example (caller owns invalidation).

RhizoNymph changed the title from "feat(steering): dynamic steering — activation-conditioned steering (Phases 0–2 + APC notification)" to "feat(steering): dynamic steering — activation-conditioned steering" on Jun 18, 2026
…utating, same-hook) to avoid cudagraph FULL-graph downgrade
…-registers externally-plugged ops e.g. gguf, hitting torch's hard duplicate-registration error)