feat(steering): dynamic steering — activation-conditioned steering#180
feat(steering): dynamic steering — activation-conditioned steering#180RhizoNymph wants to merge 47 commits into
Conversation
…d decode steering)
…tier, packed banks)
…n with per-request actuation
…d activation_reward_producer Co-authored-by: Claude
…ed FP nondeterminism
…th operator decode steering
…l term), replace populate-folding
…cuit (Phase 2 M2)
…r-gain + in-graph probe)
…tive short-circuit
… gate_rows; row gating M2)
…; clarify async finalize timing
End-to-end verification summaryEverything on this branch has now been validated end to end on GPU (RTX 3090, MethodologySteering is verified via logprobs / Component / kernel (CPU + standalone GPU)
Engine-level e2e (GPU)
Parallelism
Three previously-open consumer-loop gaps — now closed (GPU)
Behavioral finding from gap 3: Pre-existing bug found + fixed during validationA decode-only static per-request steering request ( Out of scope (not covered here)
|
…no-sync on_step can't crash the engine
…utating, same-hook) to avoid cudagraph FULL-graph downgrade
…-registers externally-plugged ops e.g. gguf, hitting torch's hard duplicate-registration error)
Draft. Ties activation capture to activation steering so activations decide when/how to steer. Three controller tiers (async → sync → in-graph), each configuring the one below. Design authority:
docs/design/dynamic_steering.md(+dynamic_steering_apc_notification.md,dynamic_steering_row_gating.md).What's here
steering_action_queue(bounded, decode-tier-only validation), drained at the top of_update_steering_buffers.execution="sync"consumer axis (every TP rank, 1-step latency); dynamic-override row pool (pure routing); observability +GET /v1/steering/dynamic; event-basedon_steptiming.sigmoid(sharpness·(residual·probe − threshold))and modulates the §5.4 tier same-forward; and per-request rows (decode-only, prefill protected via a decode mask) whengate_rowsis set.emit_mode = scale | monitor.Status / validation
GPU-validated on gemma4-31B: tp=1 (per-request actuation, tier, APC reuse), tp=2 cross-node (rank-replication + APC re-keying), pp=2, active in-graph monitor (tier + row gating), and row-gating kernel/op/cudagraph parity. Extensive CPU suites.
Notes for review
model_runner_v2integration (upstream dev-flag-gated).