diff --git a/docs/design/capture_consumers.md b/docs/design/capture_consumers.md
index 463f0ecf6651..b8290ae09a34 100644
--- a/docs/design/capture_consumers.md
+++ b/docs/design/capture_consumers.md
@@ -117,7 +117,7 @@ VllmInternalRequestId = NewType("VllmInternalRequestId", str)
 CaptureKey = tuple[VllmInternalRequestId, int, str]
 # (request id, layer index, hook name)
 
-HookName = Literal["pre_attn", "post_attn", "post_mlp", "mlp_in", "mlp_out"]
+HookName = Literal["pre_attn", "post_attn", "post_block", "mlp_in", "mlp_out"]
 PositionSelector = (
     Literal["last_prompt", "all_prompt", "all_generated", "all"]
     | list[int]
@@ -678,7 +678,7 @@ Writer details (`writer.py`):
 - TP / PP / EP / DP are all accepted for the replicated residual hooks
   (no parallel-size rejection). See
   [Capture Consumers under Parallelism](capture_parallelism.md).
-- Every hook name is in `{pre_attn, post_attn, post_mlp, mlp_in,
+- Every hook name is in `{pre_attn, post_attn, post_block, mlp_in,
   mlp_out}`.
 - Every resolved layer is in `[0, num_hidden_layers)`, the **global**
   layer count (admission validates the full layer space; the runner then
diff --git a/docs/design/capture_parallelism.md b/docs/design/capture_parallelism.md
index 40a9d3ae8faa..4965e6a8129c 100644
--- a/docs/design/capture_parallelism.md
+++ b/docs/design/capture_parallelism.md
@@ -18,7 +18,7 @@ guide see [Capture Consumers](../features/capture_consumers.md).
 ## TL;DR
 
 - **The capturable hooks are replicated.** The three hooks that fire
-  today — `pre_attn`, `post_attn`, `post_mlp` — read the residual
+  today — `pre_attn`, `post_attn`, `post_block` — read the residual
   stream *after* the TP all-reduce and the MoE combine, so the tensor
   is full `hidden_size`, **byte-identical on every TP and every EP
   rank**. For these hooks, TP/EP support is a *rank gate*, not a
@@ -52,10 +52,10 @@ downstream of the reducing collectives:
   (`vllm/model_executor/layers/linear.py:1558-1559`), so attention- and
   MLP-output projections produce full `hidden_size` on every TP rank.
 - MoE paths all-gather/all-reduce before the residual add
-  (`vllm/model_executor/models/deepseek_v2.py:384`), so `post_mlp` on an
+  (`vllm/model_executor/models/deepseek_v2.py:384`), so `post_block` on an
   EP rank also sees the full residual.
 
-Hence `pre_attn` / `post_attn` / `post_mlp` are `[num_rows,
+Hence `pre_attn` / `post_attn` / `post_block` are `[num_rows,
 hidden_size]` and identical across the TP×EP plane of a PP stage.
 
 What is **genuinely sharded** (and not captured today):
@@ -166,7 +166,7 @@ declared `location` can select between them.
   filesystem consumer writes to a path keyed by **global** layer index
   + request_id on a **shared** mount.
 - The on-disk layout merges naturally: stage 0 writes
-  `…/req/12_post_mlp.bin`, stage 1 writes `…/req/40_post_mlp.bin`, no
+  `…/req/12_post_block.bin`, stage 1 writes `…/req/40_post_block.bin`, no
   collision. The `packed`/`sharded` layouts (one file per request / per
   tag) cannot merge by global layer index alone, so under PP each stage
   writes its **own** file keyed by stage rank
diff --git a/docs/features/capture_consumers.md b/docs/features/capture_consumers.md
index 4d1a2a2ac59f..85184c8dec31 100644
--- a/docs/features/capture_consumers.md
+++ b/docs/features/capture_consumers.md
@@ -290,7 +290,7 @@ llm = LLM(
     model="meta-llama/Llama-3-8B",
     capture_consumers=[
         {"name": "filesystem", "params": {"root": "/tmp/captures"}},
-        {"name": "logging", "params": {"hooks": {"post_mlp": [0]}}},
+        {"name": "logging", "params": {"hooks": {"post_block": [0]}}},
     ],
 )
 ```
@@ -323,7 +323,7 @@ sampling_params = SamplingParams(
         "filesystem": FilesystemCaptureRequest(
             request_id="probe_0001",
             tag="mnist-probe-v1",
-            hooks={"post_mlp": [12]},
+            hooks={"post_block": [12]},
             positions="last_prompt",
         ),
     },
@@ -355,7 +355,7 @@ response = httpx.post(
                 "filesystem": {
                     "request_id": "probe_train_0001",
                     "tag": "capital-probe",
-                    "hooks": {"post_mlp": [12, 16, 20, 24]},
+                    "hooks": {"post_block": [12, 16, 20, 24]},
                     "positions": "last_prompt",
                     "layout": "packed",
                 },
@@ -392,7 +392,7 @@ sampling_params = SamplingParams(
         "filesystem": FilesystemCaptureRequest(
             request_id="req1",
             tag="demo",
-            hooks={"post_mlp": [0]},
+            hooks={"post_block": [0]},
             positions="last_prompt",
         ),
     },
@@ -428,7 +428,7 @@ response body as `capture_results`, mirroring the structure above.
 ## Parallelism
 
 Capturing the residual-stream hooks (`pre_attn`, `post_attn`,
-`post_mlp`) is supported under **tensor, pipeline, expert, and data
+`post_block`) is supported under **tensor, pipeline, expert, and data
 parallelism** for worker-location consumers — including the built-in
 `filesystem` consumer. How it works:
 
diff --git a/docs/features/steering.md b/docs/features/steering.md
index 729ed36f4d87..ae4003f631d8 100644
--- a/docs/features/steering.md
+++ b/docs/features/steering.md
@@ -50,7 +50,7 @@ Also supported:
 - Global steering through HTTP endpoints
 - Per-request steering through `SamplingParams`
 - Three additive tiers (base / prefill-specific / decode-specific)
-- Three hook points: `pre_attn`, `post_attn`, `post_mlp`
+- Three hook points: `pre_attn`, `post_attn`, `post_block`
 - Phase-aware scheduler admission for per-request steering
 - Prefix-cache separation for different prefill steering configs
 - Continuous batching
@@ -109,7 +109,7 @@ activation that is discarded immediately afterward.
 | --- | --- |
 | `pre_attn` | Residual stream before attention |
 | `post_attn` | Residual stream after attention |
-| `post_mlp` | Residual stream after MLP |
+| `post_block` | Residual stream after the MLP, at the end of the decoder block |
 
 For supported models, these hooks are wired directly into each decoder
 layer's forward path. Unused hook points are zero-valued no-ops.
@@ -157,7 +157,7 @@ curl -X POST http://localhost:8000/v1/steering/set \
   -H "Content-Type: application/json" \
   -d '{
     "vectors": {
-      "post_mlp": {
+      "post_block": {
         "15": {"vector": [0.1, 0.2], "scale": 2.0}
       }
     },
@@ -209,7 +209,7 @@ packed_hook = {
 
 requests.post(
     "http://localhost:8000/v1/steering/set",
-    json={"vectors": {"post_mlp": packed_hook}},
+    json={"vectors": {"post_block": packed_hook}},
 )
 ```
 
@@ -248,7 +248,7 @@ params = SamplingParams(
     max_tokens=64,
     temperature=0.0,
     steering_vectors={
-        "post_mlp": {
+        "post_block": {
             15: {"vector": [0.1, 0.2], "scale": 2.0},
         },
     },
@@ -288,7 +288,7 @@ vec = np.random.standard_normal(2560).astype(np.float16)
 stacked = np.stack([vec], axis=0)  # (num_layers, hidden_size)
 
 base = {
-    "post_mlp": {
+    "post_block": {
         "dtype": str(stacked.dtype),  # "float16" | "float32" | "float64"
         "shape": list(stacked.shape),
         "layer_indices": [15],
@@ -335,7 +335,7 @@ The JSON file uses the same three-tier format as the global steering API:
 ```json
 {
   "vectors": {
-    "post_mlp": {
+    "post_block": {
       "15": [0.1, 0.2, 0.3],
       "20": {"vector": [0.4, 0.5, 0.6], "scale": 2.0}
     }
@@ -357,7 +357,7 @@ startup cost:
 ```json
 {
   "vectors": {
-    "post_mlp": {
+    "post_block": {
       "dtype": "float32",
       "shape": [2, 2560],
       "layer_indices": [15, 20],
@@ -379,7 +379,7 @@ curl -X POST http://localhost:8000/v1/steering/modules/register \
   -d '{
     "name": "creativity",
     "vectors": {
-      "post_mlp": {"15": [0.1, 0.2, 0.3]}
+      "post_block": {"15": [0.1, 0.2, 0.3]}
     }
   }'
 
@@ -412,7 +412,7 @@ requests.post(
     json={
         "name": "creativity",
         "vectors": {
-            "post_mlp": {
+            "post_block": {
                 "dtype": str(stacked.dtype),
                 "shape": list(stacked.shape),
                 "layer_indices": [15],
@@ -454,7 +454,7 @@ response = client.chat.completions.create(
     extra_body={
         "steering_name": "creativity",
         "steering_vectors": {
-            "post_mlp": {15: [0.05, 0.1, 0.15]},
+            "post_block": {15: [0.05, 0.1, 0.15]},
         },
     },
 )
@@ -546,7 +546,7 @@ Returns per-layer hook-point availability aggregated across TP × PP ranks:
 
 ```bash
 curl http://localhost:8000/v1/steering/layers
-# {"layers": {"0": {"hook_points": ["post_mlp"]}, "1": {"hook_points": ["post_mlp", "pre_attn"]}, ...}}
+# {"layers": {"0": {"hook_points": ["post_block"]}, "1": {"hook_points": ["post_block", "pre_attn"]}, ...}}
 ```
 
 Useful to confirm which layers of the loaded model are steerable before
diff --git a/examples/capture_consumers/activation_reward_producer/README.md b/examples/capture_consumers/activation_reward_producer/README.md
index 093c94454933..723c9487f7c3 100644
--- a/examples/capture_consumers/activation_reward_producer/README.md
+++ b/examples/capture_consumers/activation_reward_producer/README.md
@@ -48,7 +48,7 @@ llm = LLM(
             "name": "activation_reward",
             "params": {
                 "layer": 12,
-                "hook": "post_mlp",
+                "hook": "post_block",
                 "vector_path": "/models/happy/sadness.pt",
                 "position_slice": {"start": 10, "end": None, "stride": 1},
                 "scale": 5.0,
@@ -64,7 +64,7 @@ llm = LLM(
 
 ```bash
 vllm serve meta-llama/Llama-3-8B \
-    --capture-consumers activation_reward:layer=12,hook=post_mlp,vector_path=/models/happy/sadness.pt,scale=5.0,nonlinearity=tanh
+    --capture-consumers activation_reward:layer=12,hook=post_block,vector_path=/models/happy/sadness.pt,scale=5.0,nonlinearity=tanh
 ```
 
 ### Parameters
@@ -72,7 +72,7 @@ vllm serve meta-llama/Llama-3-8B \
 | Field | Type | Default | Purpose |
 | --- | --- | --- | --- |
 | `layer` | `int` | required | Layer index to capture at. |
-| `hook` | `str` | required | One of `pre_attn`, `post_attn`, `post_mlp`, `mlp_in`, `mlp_out`. |
+| `hook` | `str` | required | One of `pre_attn`, `post_attn`, `post_block`, `mlp_in`, `mlp_out`. |
 | `vector_path` | `str` | required | Path to a `.pt` file holding a 1-D tensor of shape `(hidden_size,)`. L2-normalized at load. |
 | `position_slice` | `dict` | `{start: 10, end: null, stride: 1}` | Applied to the `all_generated` span before mean-pooling. |
 | `scale` | `float` | `1.0` | Multiplicative factor on the raw cosine. |
@@ -120,7 +120,7 @@ llm = LLM(
             "name": "activation_reward",
             "instance_name": "sadness_reward",
             "params": {
-                "layer": 12, "hook": "post_mlp",
+                "layer": 12, "hook": "post_block",
                 "vector_path": "/models/happy/sadness.pt",
             },
         },
diff --git a/examples/capture_consumers/activation_reward_producer/activation_reward_producer/__init__.py b/examples/capture_consumers/activation_reward_producer/activation_reward_producer/__init__.py
index 5bc33c61c6da..79dcf74d4b28 100644
--- a/examples/capture_consumers/activation_reward_producer/activation_reward_producer/__init__.py
+++ b/examples/capture_consumers/activation_reward_producer/activation_reward_producer/__init__.py
@@ -35,7 +35,7 @@
     from vllm.v1.capture.types import CaptureContext
 
 
-_HOOK_NAMES = frozenset({"pre_attn", "post_attn", "post_mlp", "mlp_in", "mlp_out"})
+_HOOK_NAMES = frozenset({"pre_attn", "post_attn", "post_block", "mlp_in", "mlp_out"})
 _NONLIN = {
     "tanh": math.tanh,
     "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
diff --git a/examples/capture_consumers/activation_reward_producer/test.py b/examples/capture_consumers/activation_reward_producer/test.py
index 98bbe4b0c7d7..2568a96be4dc 100644
--- a/examples/capture_consumers/activation_reward_producer/test.py
+++ b/examples/capture_consumers/activation_reward_producer/test.py
@@ -63,7 +63,7 @@ def test_payload_shape_and_lifecycle(tmp: Path) -> None:
         _mock_config(),
         {
             "layer": 12,
-            "hook": "post_mlp",
+            "hook": "post_block",
             "vector_path": str(vec_path),
             "position_slice": {"start": 2, "end": None, "stride": 1},
             "scale": 5.0,
@@ -73,11 +73,11 @@ def test_payload_shape_and_lifecycle(tmp: Path) -> None:
 
     # Validator returns the pinned spec.
     spec = producer.validate_client_spec({}, _ctx())
-    assert spec.hooks == {"post_mlp": [12]}
+    assert spec.hooks == {"post_block": [12]}
     assert spec.positions == "all_generated"
 
     # Two chunks across two steps; total 6 rows; slice starts at 2.
-    key = (VllmInternalRequestId("req-1"), 12, "post_mlp")
+    key = (VllmInternalRequestId("req-1"), 12, "post_block")
     chunk_a = CaptureChunk(
         key=key,
         tensor=torch.randn(3, HIDDEN),
@@ -127,12 +127,12 @@ def test_empty_window_payload(tmp: Path) -> None:
         _mock_config(),
         {
             "layer": 0,
-            "hook": "post_mlp",
+            "hook": "post_block",
             "vector_path": str(vec_path),
             "position_slice": {"start": 100, "end": None, "stride": 1},
         },
     )
-    key = (VllmInternalRequestId("short"), 0, "post_mlp")
+    key = (VllmInternalRequestId("short"), 0, "post_block")
     producer.submit_chunk(
         CaptureChunk(
             key=key,
@@ -156,9 +156,9 @@ def test_no_chunks_partial_error(tmp: Path) -> None:
 
     producer = ActivationRewardProducer(
         _mock_config(),
-        {"layer": 0, "hook": "post_mlp", "vector_path": str(vec_path)},
+        {"layer": 0, "hook": "post_block", "vector_path": str(vec_path)},
     )
-    key = (VllmInternalRequestId("ghost"), 0, "post_mlp")
+    key = (VllmInternalRequestId("ghost"), 0, "post_block")
     producer.submit_finalize(CaptureFinalize(key=key))
     result = producer.get_result(key)
     assert result.status == "partial_error"
@@ -172,7 +172,7 @@ def test_non_empty_client_spec_rejected(tmp: Path) -> None:
 
     producer = ActivationRewardProducer(
         _mock_config(),
-        {"layer": 0, "hook": "post_mlp", "vector_path": str(vec_path)},
+        {"layer": 0, "hook": "post_block", "vector_path": str(vec_path)},
     )
     try:
         producer.validate_client_spec({"layer": 99}, _ctx())
@@ -189,7 +189,7 @@ def test_tp_pp_rejected(tmp: Path) -> None:
 
     producer = ActivationRewardProducer(
         _mock_config(),
-        {"layer": 0, "hook": "post_mlp", "vector_path": str(vec_path)},
+        {"layer": 0, "hook": "post_block", "vector_path": str(vec_path)},
     )
     try:
         producer.validate_client_spec({}, _ctx(tp=2))
@@ -207,7 +207,7 @@ def test_bad_layer_rejected(tmp: Path) -> None:
     try:
         ActivationRewardProducer(
             _mock_config(),
-            {"layer": NUM_LAYERS + 5, "hook": "post_mlp", "vector_path": str(vec_path)},
+            {"layer": NUM_LAYERS + 5, "hook": "post_block", "vector_path": str(vec_path)},
         )
     except ValueError as e:
         assert "out of range" in str(e)
@@ -223,7 +223,7 @@ def test_vector_hidden_size_mismatch(tmp: Path) -> None:
     try:
         ActivationRewardProducer(
             _mock_config(),
-            {"layer": 0, "hook": "post_mlp", "vector_path": str(vec_path)},
+            {"layer": 0, "hook": "post_block", "vector_path": str(vec_path)},
         )
     except ValueError as e:
         assert "hidden_size" in str(e)
diff --git a/examples/capture_consumers/minimal_plugin/my_plugin/__init__.py b/examples/capture_consumers/minimal_plugin/my_plugin/__init__.py
index 65b02837ae9d..250a0a4a9d13 100644
--- a/examples/capture_consumers/minimal_plugin/my_plugin/__init__.py
+++ b/examples/capture_consumers/minimal_plugin/my_plugin/__init__.py
@@ -28,7 +28,7 @@ def __init__(self, vllm_config: Any, params: dict[str, Any]) -> None:
 
     def global_capture_spec(self) -> CaptureSpec:
         return CaptureSpec(
-            hooks={"post_mlp": self._layers},
+            hooks={"post_block": self._layers},
             positions="last_prompt",
         )
 
diff --git a/examples/capture_consumers/plugin_authoring.md b/examples/capture_consumers/plugin_authoring.md
index 42ddff282624..1603cbd80f8a 100644
--- a/examples/capture_consumers/plugin_authoring.md
+++ b/examples/capture_consumers/plugin_authoring.md
@@ -40,7 +40,7 @@ class MyConsumer(CaptureConsumer):
 
     def global_capture_spec(self) -> CaptureSpec:
         return CaptureSpec(
-            hooks={"post_mlp": self._layers},
+            hooks={"post_block": self._layers},
             positions="last_prompt",
         )
 
@@ -86,7 +86,7 @@ pattern — the consumer always needs the same data.
 ```python
 def global_capture_spec(self) -> CaptureSpec:
     return CaptureSpec(
-        hooks={"post_mlp": [0, 15, 31]},
+        hooks={"post_block": [0, 15, 31]},
         positions="last_prompt",
     )
 ```
@@ -104,7 +104,7 @@ class FlexConsumer(CaptureConsumer):
     reads_client_spec = True
 
     def validate_client_spec(self, raw_spec, ctx):
-        hooks = raw_spec.get("hooks", {"post_mlp": list(range(ctx.num_hidden_layers))})
+        hooks = raw_spec.get("hooks", {"post_block": list(range(ctx.num_hidden_layers))})
         positions = raw_spec.get("positions", "all_prompt")
         return CaptureSpec(hooks=hooks, positions=positions)
 ```
@@ -197,7 +197,7 @@ import torch
 consumer = MyConsumer(MagicMock(), {"layers": [0]})
 adapter = _BatchedAdapter(consumer)
 
-key = (VllmInternalRequestId("test-req"), 0, "post_mlp")
+key = (VllmInternalRequestId("test-req"), 0, "post_block")
 
 adapter.submit_chunk(CaptureChunk(
     key=key,
@@ -247,7 +247,7 @@ class SumConsumer(CaptureConsumer):
 
     def global_capture_spec(self) -> CaptureSpec:
         return CaptureSpec(
-            hooks={"post_mlp": self._layers},
+            hooks={"post_block": self._layers},
             positions="last_prompt",
         )
 
diff --git a/examples/online_serving/openai_steering_client.py b/examples/online_serving/openai_steering_client.py
index bdb30fabc54f..99a575a33c0a 100644
--- a/examples/online_serving/openai_steering_client.py
+++ b/examples/online_serving/openai_steering_client.py
@@ -80,7 +80,7 @@ def main() -> None:
 
     # Base steering applied to both prefill and decode.
     base = {
-        "post_mlp": pack_hook(
+        "post_block": pack_hook(
             {15: rng.standard_normal(HIDDEN_SIZE).astype(PACK_DTYPE)},
             # Per-layer scales: the server multiplies row-by-row without
             # re-encoding the bytes, so the same vector can be reused at
diff --git a/tests/compile/passes/ir/test_inplace_functionalization.py b/tests/compile/passes/ir/test_inplace_functionalization.py
index 1e8d5662162f..f19a00b56a1f 100644
--- a/tests/compile/passes/ir/test_inplace_functionalization.py
+++ b/tests/compile/passes/ir/test_inplace_functionalization.py
@@ -369,7 +369,7 @@ def __init__(self, hidden_size=32, intermediate_size=128):
         )
 
         # Post-MLP norm
-        self.post_mlp_norm = nn.Parameter(torch.ones(hidden_size, dtype=torch.bfloat16))
+        self.post_block_norm = nn.Parameter(torch.ones(hidden_size, dtype=torch.bfloat16))
 
     def forward(self, x: torch.Tensor):
         # Attention block with residual
@@ -391,7 +391,7 @@ def forward(self, x: torch.Tensor):
 
         # Fused add + norm (maybe_inplace: residual1 is donated)
         normed2, residual2 = ops.fused_add_rms_norm.maybe_inplace(
-            mlp_out, residual1, self.post_mlp_norm, 1e-5
+            mlp_out, residual1, self.post_block_norm, 1e-5
         )
 
         return normed2, residual2
diff --git a/tests/entrypoints/openai/test_capture_protocol.py b/tests/entrypoints/openai/test_capture_protocol.py
index 03a823c0a8c7..9102c14f1475 100644
--- a/tests/entrypoints/openai/test_capture_protocol.py
+++ b/tests/entrypoints/openai/test_capture_protocol.py
@@ -61,7 +61,7 @@ def test_capture_request_field_accepted(self) -> None:
                 "filesystem": {
                     "request_id": "r1",
                     "tag": "t1",
-                    "hooks": {"post_mlp": [0]},
+                    "hooks": {"post_block": [0]},
                     "positions": "last_prompt",
                 },
             },
@@ -226,13 +226,13 @@ def test_populated_dict_builds_response_models(self) -> None:
         final = _FakeFinal(
             capture_results={
                 "fs": CaptureResult(
-                    key=("r1", 0, "post_mlp"),
+                    key=("r1", 0, "post_block"),
                     status="ok",
                     error=None,
                     payload=["/tmp/a.bin", "/tmp/a.json"],
                 ),
                 "log": CaptureResult(
-                    key=("r1", 0, "post_mlp"),
+                    key=("r1", 0, "post_block"),
                     status="partial_error",
                     error="dropped",
                     payload=None,
@@ -280,7 +280,7 @@ class _FakeConsumerAccepts(CaptureConsumer):
 
     def validate_client_spec(self, raw_spec, ctx):  # type: ignore[override]
         # Returns a valid CaptureSpec derived from the raw payload.
-        return CaptureSpec(hooks={"post_mlp": [0]}, positions="last_prompt")
+        return CaptureSpec(hooks={"post_block": [0]}, positions="last_prompt")
 
     def on_capture(self, key, tensor, sidecar):  # pragma: no cover - unused
         pass
@@ -414,7 +414,7 @@ def test_happy_path_mutates_sampling_params_to_spec(self, monkeypatch) -> None:
         stub._capture_consumers = {"filesystem": consumer}
 
         sp = SamplingParams(
-            capture={"filesystem": {"tag": "t", "hooks": {"post_mlp": [0]}}}
+            capture={"filesystem": {"tag": "t", "hooks": {"post_block": [0]}}}
         )
 
         result = admit(
@@ -460,7 +460,7 @@ class _Capturing(CaptureConsumer):
 
             def validate_client_spec(self, raw_spec, ctx):  # type: ignore[override]
                 received.append(ctx)
-                return CaptureSpec(hooks={"post_mlp": [0]}, positions="last_prompt")
+                return CaptureSpec(hooks={"post_block": [0]}, positions="last_prompt")
 
             def on_capture(self, key, tensor, sidecar):  # pragma: no cover
                 pass
diff --git a/tests/entrypoints/openai/test_steering_modules.py b/tests/entrypoints/openai/test_steering_modules.py
index fa50dafd009c..ea1900ae2337 100644
--- a/tests/entrypoints/openai/test_steering_modules.py
+++ b/tests/entrypoints/openai/test_steering_modules.py
@@ -41,12 +41,12 @@ def test_both_empty_returns_none(self):
 
     def test_first_none_second_has_data(self):
         spec: SteeringVectorSpec = {
-            "post_mlp": {14: [1.0, 2.0, 3.0]},
+            "post_block": {14: [1.0, 2.0, 3.0]},
         }
         result = merge_steering_specs(None, spec)
         assert result is not None
         # Values should be pre-scaled (scale=1.0 for bare list)
-        assert result["post_mlp"][14].tolist() == [1.0, 2.0, 3.0]
+        assert result["post_block"][14].tolist() == [1.0, 2.0, 3.0]
 
     def test_first_has_data_second_none(self):
         spec: SteeringVectorSpec = {
@@ -57,70 +57,70 @@ def test_first_has_data_second_none(self):
         assert result["pre_attn"][5].tolist() == [0.5, 0.6]
 
     def test_non_overlapping_hooks_both_preserved(self):
-        a: SteeringVectorSpec = {"post_mlp": {14: [1.0, 2.0]}}
+        a: SteeringVectorSpec = {"post_block": {14: [1.0, 2.0]}}
         b: SteeringVectorSpec = {"pre_attn": {10: [3.0, 4.0]}}
         result = merge_steering_specs(a, b)
         assert result is not None
-        assert result["post_mlp"][14].tolist() == [1.0, 2.0]
+        assert result["post_block"][14].tolist() == [1.0, 2.0]
         assert result["pre_attn"][10].tolist() == [3.0, 4.0]
 
     def test_non_overlapping_layers_same_hook(self):
-        a: SteeringVectorSpec = {"post_mlp": {14: [1.0, 2.0]}}
-        b: SteeringVectorSpec = {"post_mlp": {15: [3.0, 4.0]}}
+        a: SteeringVectorSpec = {"post_block": {14: [1.0, 2.0]}}
+        b: SteeringVectorSpec = {"post_block": {15: [3.0, 4.0]}}
         result = merge_steering_specs(a, b)
         assert result is not None
-        assert result["post_mlp"][14].tolist() == [1.0, 2.0]
-        assert result["post_mlp"][15].tolist() == [3.0, 4.0]
+        assert result["post_block"][14].tolist() == [1.0, 2.0]
+        assert result["post_block"][15].tolist() == [3.0, 4.0]
 
     def test_overlapping_hook_layer_added(self):
-        a: SteeringVectorSpec = {"post_mlp": {14: [1.0, 2.0, 3.0]}}
-        b: SteeringVectorSpec = {"post_mlp": {14: [0.5, 0.5, 0.5]}}
+        a: SteeringVectorSpec = {"post_block": {14: [1.0, 2.0, 3.0]}}
+        b: SteeringVectorSpec = {"post_block": {14: [0.5, 0.5, 0.5]}}
         result = merge_steering_specs(a, b)
         assert result is not None
-        assert result["post_mlp"][14].tolist() == [1.5, 2.5, 3.5]
+        assert result["post_block"][14].tolist() == [1.5, 2.5, 3.5]
 
     def test_overlapping_with_scaled_entries(self):
         a: SteeringVectorSpec = {
-            "post_mlp": {
+            "post_block": {
                 14: {"vector": [1.0, 2.0], "scale": 2.0},
             }
         }
         b: SteeringVectorSpec = {
-            "post_mlp": {
+            "post_block": {
                 14: {"vector": [3.0, 4.0], "scale": 0.5},
             }
         }
         result = merge_steering_specs(a, b)
         assert result is not None
         # a scaled: [2.0, 4.0], b scaled: [1.5, 2.0], sum: [3.5, 6.0]
-        assert result["post_mlp"][14].tolist() == [3.5, 6.0]
+        assert result["post_block"][14].tolist() == [3.5, 6.0]
 
     def test_one_scaled_one_bare(self):
         a: SteeringVectorSpec = {
-            "post_mlp": {
+            "post_block": {
                 14: {"vector": [1.0, 2.0], "scale": 3.0},
             }
         }
         b: SteeringVectorSpec = {
-            "post_mlp": {
+            "post_block": {
                 14: [0.5, 0.5],
             }
         }
         result = merge_steering_specs(a, b)
         assert result is not None
         # a scaled: [3.0, 6.0], b scaled: [0.5, 0.5], sum: [3.5, 6.5]
-        assert result["post_mlp"][14].tolist() == [3.5, 6.5]
+        assert result["post_block"][14].tolist() == [3.5, 6.5]
 
     def test_passthrough_entry_is_prescaled(self):
         """Non-overlapping scaled entry should still be pre-scaled."""
         spec: SteeringVectorSpec = {
-            "post_mlp": {
+            "post_block": {
                 14: {"vector": [1.0, 2.0], "scale": 0.5},
             }
         }
         result = merge_steering_specs(spec, None)
         assert result is not None
-        assert result["post_mlp"][14].tolist() == [0.5, 1.0]
+        assert result["post_block"][14].tolist() == [0.5, 1.0]
 
 
 # ---------------------------------------------------------------------------
@@ -138,15 +138,15 @@ def test_empty_dict_returns_none(self):
         assert _convert_layer_keys({}, field_name="vectors") is None
 
     def test_converts_string_keys_to_int(self):
-        spec = {"post_mlp": {"14": [1.0, 2.0], "15": [3.0, 4.0]}}
+        spec = {"post_block": {"14": [1.0, 2.0], "15": [3.0, 4.0]}}
         result = _convert_layer_keys(spec, field_name="vectors")
         assert result is not None
-        assert 14 in result["post_mlp"]
-        assert 15 in result["post_mlp"]
-        assert result["post_mlp"][14] == [1.0, 2.0]
+        assert 14 in result["post_block"]
+        assert 15 in result["post_block"]
+        assert result["post_block"][14] == [1.0, 2.0]
 
     def test_rejects_non_dict_layers(self):
-        spec = {"post_mlp": "not_a_dict"}
+        spec = {"post_block": "not_a_dict"}
         with pytest.raises(ValueError, match="must be a JSON object mapping"):
             _convert_layer_keys(spec, field_name="vectors")
 
@@ -168,19 +168,19 @@ async def test_register_and_get(self):
         registry = SteeringModuleRegistry()
         await registry.register(
             name="test_mod",
-            vectors={"post_mlp": {14: [1.0, 2.0]}},
+            vectors={"post_block": {14: [1.0, 2.0]}},
         )
         module = registry.get("test_mod")
         assert module is not None
         assert module.name == "test_mod"
-        assert module.vectors == {"post_mlp": {14: [1.0, 2.0]}}
+        assert module.vectors == {"post_block": {14: [1.0, 2.0]}}
 
     @pytest.mark.asyncio
     async def test_register_overwrites_existing(self):
         registry = SteeringModuleRegistry()
         await registry.register(
             name="mod",
-            vectors={"post_mlp": {14: [1.0]}},
+            vectors={"post_block": {14: [1.0]}},
         )
         await registry.register(
             name="mod",
@@ -189,14 +189,14 @@ async def test_register_overwrites_existing(self):
         module = registry.get("mod")
         assert module is not None
         assert "pre_attn" in module.vectors
-        assert "post_mlp" not in module.vectors
+        assert "post_block" not in module.vectors
 
     @pytest.mark.asyncio
     async def test_unregister_existing_returns_true(self):
         registry = SteeringModuleRegistry()
         await registry.register(
             name="mod",
-            vectors={"post_mlp": {14: [1.0]}},
+            vectors={"post_block": {14: [1.0]}},
         )
         assert await registry.unregister("mod") is True
         assert registry.get("mod") is None
@@ -213,9 +213,9 @@ def test_get_nonexistent_returns_none(self):
     @pytest.mark.asyncio
     async def test_list_modules_sorted(self):
         registry = SteeringModuleRegistry()
-        await registry.register("charlie", vectors={"post_mlp": {0: [1.0]}})
-        await registry.register("alpha", vectors={"post_mlp": {0: [1.0]}})
-        await registry.register("bravo", vectors={"post_mlp": {0: [1.0]}})
+        await registry.register("charlie", vectors={"post_block": {0: [1.0]}})
+        await registry.register("alpha", vectors={"post_block": {0: [1.0]}})
+        await registry.register("bravo", vectors={"post_block": {0: [1.0]}})
         assert registry.list_modules() == ["alpha", "bravo", "charlie"]
 
     @pytest.mark.asyncio
@@ -244,7 +244,7 @@ async def test_register_unknown_layer_index_raises(self):
         with pytest.raises(ValueError, match="unknown layer index 99"):
             await registry.register(
                 name="bad_layer",
-                vectors={"post_mlp": {99: [1.0]}},
+                vectors={"post_block": {99: [1.0]}},
             )
 
     @pytest.mark.asyncio
@@ -253,7 +253,7 @@ async def test_register_malformed_entry_raises(self):
         with pytest.raises(TypeError):
             await registry.register(
                 name="bad_entry",
-                vectors={"post_mlp": {0: "not_a_list_or_dict"}},
+                vectors={"post_block": {0: "not_a_list_or_dict"}},
             )
 
     @pytest.mark.asyncio
@@ -264,7 +264,7 @@ async def test_register_invalid_vector_contents_raise(self):
             await registry.register(
                 name="bad_values",
                 vectors={
-                    "post_mlp": {
+                    "post_block": {
                         0: {
                             "vector": ["bad", 1.0],
                             "scale": 1.0,
@@ -277,7 +277,7 @@ async def test_register_invalid_vector_contents_raise(self):
             await registry.register(
                 name="bad_scale",
                 vectors={
-                    "post_mlp": {
+                    "post_block": {
                         0: {
                             "vector": [1.0, 2.0],
                             "scale": math.nan,
@@ -292,7 +292,7 @@ async def test_register_invalid_vector_contents_raise(self):
     async def test_load_from_file_valid_json(self):
         registry = SteeringModuleRegistry()
         data = {
-            "vectors": {"post_mlp": {"14": [0.1, 0.2, 0.3]}},
+            "vectors": {"post_block": {"14": [0.1, 0.2, 0.3]}},
             "prefill_vectors": {"pre_attn": {"5": [0.4, 0.5, 0.6]}},
             "decode_vectors": None,
         }
@@ -306,7 +306,7 @@ async def test_load_from_file_valid_json(self):
             assert module is not None
             assert module.name == "loaded"
             # Layer keys should be ints
-            assert 14 in module.vectors["post_mlp"]
+            assert 14 in module.vectors["post_block"]
             assert 5 in module.prefill_vectors["pre_attn"]
             assert module.decode_vectors is None
         finally:
@@ -323,7 +323,7 @@ async def test_load_from_file_converts_string_keys(self):
         registry = SteeringModuleRegistry()
         data = {
             "vectors": {
-                "post_mlp": {"0": [1.0], "99": [2.0]},
+                "post_block": {"0": [1.0], "99": [2.0]},
             },
         }
         with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as f:
@@ -334,10 +334,10 @@ async def test_load_from_file_converts_string_keys(self):
             await registry.load_from_file("conv_keys", tmp_path)
             module = registry.get("conv_keys")
             assert module is not None
-            assert 0 in module.vectors["post_mlp"]
-            assert 99 in module.vectors["post_mlp"]
+            assert 0 in module.vectors["post_block"]
+            assert 99 in module.vectors["post_block"]
             # String keys should NOT be present
-            assert "0" not in module.vectors["post_mlp"]
+            assert "0" not in module.vectors["post_block"]
         finally:
             os.unlink(tmp_path)
 
@@ -359,7 +359,7 @@ async def test_load_from_file_invalid_vector_contents_raise(self):
         registry = SteeringModuleRegistry()
         data = {
             "vectors": {
-                "post_mlp": {
+                "post_block": {
                     "14": {
                         "vector": [1.0, "bad"],
                         "scale": 1.0,
@@ -382,7 +382,7 @@ async def test_load_from_file_rejects_non_dict_hook_payload(self):
         registry = SteeringModuleRegistry()
         data = {
             "vectors": {
-                "post_mlp": [1.0, 2.0],
+                "post_block": [1.0, 2.0],
             },
         }
         with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as f:
@@ -415,7 +415,7 @@ async def test_load_from_file_packed_tier(self):
             "data": base64.b64encode(stacked.tobytes()).decode("ascii"),
         }
         data = {
-            "vectors": {"post_mlp": packed_hook},
+            "vectors": {"post_block": packed_hook},
         }
         with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as f:
             json.dump(data, f)
@@ -426,7 +426,7 @@ async def test_load_from_file_packed_tier(self):
             await registry.load_from_file("packed", tmp_path)
             module = registry.get("packed")
             assert module is not None
-            stored = module.vectors["post_mlp"][14]
+            stored = module.vectors["post_block"][14]
             assert isinstance(stored, list)
             assert [round(v, 5) for v in stored] == [
                 round(float(x), 5) for x in vec
@@ -451,7 +451,7 @@ async def test_load_from_file_packed_with_scales(self):
             "data": base64.b64encode(stacked.tobytes()).decode("ascii"),
             "scales": [3.0],
         }
-        data = {"vectors": {"post_mlp": packed_hook}}
+        data = {"vectors": {"post_block": packed_hook}}
         with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as f:
             json.dump(data, f)
             tmp_path = f.name
@@ -459,7 +459,7 @@ async def test_load_from_file_packed_with_scales(self):
         try:
             registry = SteeringModuleRegistry()
             await registry.load_from_file("packed_scaled", tmp_path)
-            stored = registry.get("packed_scaled").vectors["post_mlp"][14]
+            stored = registry.get("packed_scaled").vectors["post_block"][14]
             assert [round(v, 5) for v in stored] == [3.0, 6.0]
         finally:
             os.unlink(tmp_path)
@@ -478,7 +478,7 @@ async def test_load_from_file_packed_malformed_raises(self):
             "layer_indices": [14],
             "data": base64.b64encode(b"\x00" * 8).decode("ascii"),
         }
-        data = {"vectors": {"post_mlp": bad_hook}}
+        data = {"vectors": {"post_block": bad_hook}}
         with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as f:
             json.dump(data, f)
             tmp_path = f.name
@@ -502,7 +502,7 @@ async def test_init_app_state_only_sets_registry_when_steering_enabled():
     engine_client.renderer = MagicMock()
     engine_client.io_processor = MagicMock()
     engine_client.collective_rpc = AsyncMock(
-        return_value=[{0: ["post_mlp"], 1: ["post_mlp"]}]
+        return_value=[{0: ["post_block"], 1: ["post_block"]}]
     )
 
     args = Namespace(
@@ -613,14 +613,14 @@ async def test_known_name_no_inline(self):
         registry = SteeringModuleRegistry()
         await registry.register(
             "my_mod",
-            vectors={"post_mlp": {14: [1.0, 2.0]}},
+            vectors={"post_block": {14: [1.0, 2.0]}},
             prefill_vectors={"pre_attn": {5: [0.5, 0.6]}},
         )
         v, p, d, err = registry.resolve_for_request("my_mod", None, None, None)
         assert err is None
         # Vectors are pre-scaled (scale=1.0 bare lists)
         assert v is not None
-        assert v["post_mlp"][14].tolist() == [1.0, 2.0]
+        assert v["post_block"][14].tolist() == [1.0, 2.0]
         assert p is not None
         assert p["pre_attn"][5].tolist() == [0.5, 0.6]
         assert d is None
@@ -630,27 +630,27 @@ async def test_known_name_with_inline_merge(self):
         registry = SteeringModuleRegistry()
         await registry.register(
             "base",
-            vectors={"post_mlp": {14: [1.0, 2.0]}},
+            vectors={"post_block": {14: [1.0, 2.0]}},
         )
-        inline: SteeringVectorSpec = {"post_mlp": {14: [0.5, 0.5]}}
+        inline: SteeringVectorSpec = {"post_block": {14: [0.5, 0.5]}}
         v, p, d, err = registry.resolve_for_request("base", inline, None, None)
         assert err is None
         assert v is not None
-        assert v["post_mlp"][14].tolist() == [1.5, 2.5]
+        assert v["post_block"][14].tolist() == [1.5, 2.5]
 
     @pytest.mark.asyncio
     async def test_named_one_tier_inline_different_tier(self):
         registry = SteeringModuleRegistry()
         await registry.register(
             "named",
-            vectors={"post_mlp": {14: [1.0, 2.0]}},
+            vectors={"post_block": {14: [1.0, 2.0]}},
         )
         inline_prefill: SteeringVectorSpec = {"pre_attn": {5: [0.3, 0.4]}}
         v, p, d, err = registry.resolve_for_request("named", None, inline_prefill, None)
         assert err is None
         # Named vectors tier
         assert v is not None
-        assert v["post_mlp"][14].tolist() == [1.0, 2.0]
+        assert v["post_block"][14].tolist() == [1.0, 2.0]
         # Inline prefill tier
         assert p is not None
         assert p["pre_attn"][5].tolist() == [0.3, 0.4]
@@ -660,8 +660,8 @@ async def test_named_one_tier_inline_different_tier(self):
     @pytest.mark.asyncio
     async def test_error_message_lists_available_modules(self):
         registry = SteeringModuleRegistry()
-        await registry.register("a", vectors={"post_mlp": {0: [1.0]}})
-        await registry.register("b", vectors={"post_mlp": {0: [1.0]}})
+        await registry.register("a", vectors={"post_block": {0: [1.0]}})
+        await registry.register("b", vectors={"post_block": {0: [1.0]}})
         _, _, _, err = registry.resolve_for_request("missing", None, None, None)
         assert err is not None
         assert "['a', 'b']" in err
@@ -671,10 +671,10 @@ async def test_dimension_mismatch_returns_error(self):
         registry = SteeringModuleRegistry()
         await registry.register(
             "named",
-            vectors={"post_mlp": {14: [1.0, 2.0]}},
+            vectors={"post_block": {14: [1.0, 2.0]}},
         )
 
-        inline: SteeringVectorSpec = {"post_mlp": {14: [0.5]}}
+        inline: SteeringVectorSpec = {"post_block": {14: [0.5]}}
         v, p, d, err = registry.resolve_for_request("named", inline, None, None)
 
         assert v is None
diff --git a/tests/entrypoints/openai/test_steering_protocol.py b/tests/entrypoints/openai/test_steering_protocol.py
index d8f16889446b..6e43b6bc61fd 100644
--- a/tests/entrypoints/openai/test_steering_protocol.py
+++ b/tests/entrypoints/openai/test_steering_protocol.py
@@ -129,7 +129,7 @@ def test_legacy_scaled_dict_rejected(self):
         with pytest.raises(ValidationError):
             _make_chat(
                 steering_vectors={
-                    "post_mlp": {10: {"vector": [0.4, 0.5, 0.6], "scale": 2.0}}
+                    "post_block": {10: {"vector": [0.4, 0.5, 0.6], "scale": 2.0}}
                 }
             )
 
@@ -160,11 +160,11 @@ def test_to_sampling_params_none_when_absent(self):
 
     def test_per_row_scales_applied_at_unpack(self):
         packed = {
-            "post_mlp": _pack({10: [1.0] * _HIDDEN}, scales=[2.0]),
+            "post_block": _pack({10: [1.0] * _HIDDEN}, scales=[2.0]),
         }
         req = _make_chat(steering_vectors=packed)
         sp = req.to_sampling_params(max_tokens=100, default_sampling_params={})
-        assert sp.steering_vectors["post_mlp"][10].tolist() == pytest.approx(
+        assert sp.steering_vectors["post_block"][10].tolist() == pytest.approx(
             [2.0] * _HIDDEN
         )
 
@@ -213,7 +213,7 @@ def test_legacy_scaled_dict_rejected(self):
         with pytest.raises(ValidationError):
             _make_completion(
                 steering_vectors={
-                    "post_mlp": {10: {"vector": [0.4, 0.5, 0.6], "scale": 2.0}}
+                    "post_block": {10: {"vector": [0.4, 0.5, 0.6], "scale": 2.0}}
                 }
             )
 
@@ -243,11 +243,11 @@ def test_to_sampling_params_none_when_absent(self):
 
     def test_per_row_scales_applied_at_unpack(self):
         packed = {
-            "post_mlp": _pack({10: [1.0] * _HIDDEN}, scales=[2.0]),
+            "post_block": _pack({10: [1.0] * _HIDDEN}, scales=[2.0]),
         }
         req = _make_completion(steering_vectors=packed)
         sp = req.to_sampling_params(max_tokens=100)
-        assert sp.steering_vectors["post_mlp"][10].tolist() == pytest.approx(
+        assert sp.steering_vectors["post_block"][10].tolist() == pytest.approx(
             [2.0] * _HIDDEN
         )
 
diff --git a/tests/entrypoints/serve/steering/test_api_router.py b/tests/entrypoints/serve/steering/test_api_router.py
index 493324767bf7..214089c7c8b7 100644
--- a/tests/entrypoints/serve/steering/test_api_router.py
+++ b/tests/entrypoints/serve/steering/test_api_router.py
@@ -477,7 +477,7 @@ class TestNormalizeSpec:
     def test_normalize_spec_drops_empty_hook(self):
         """Hooks whose layer dict is empty are dropped from the result.
 
-        An input like ``{"post_mlp": {}}`` is functionally
+        An input like ``{"post_block": {}}`` is functionally
         equivalent to omitting the hook entirely: no layers and no
         vectors would be applied. Keeping the empty hook in the
         normalized spec would produce a truthy-but-empty entry that
diff --git a/tests/entrypoints/serve/steering/test_api_router_distributed.py b/tests/entrypoints/serve/steering/test_api_router_distributed.py
index a1e6179e3734..b16ded0c8585 100644
--- a/tests/entrypoints/serve/steering/test_api_router_distributed.py
+++ b/tests/entrypoints/serve/steering/test_api_router_distributed.py
@@ -143,7 +143,7 @@ class TestErrorConsolidation:
     def test_size_mismatch_single_400(self, client, engine):
         """SteeringVectorError from any rank → single 400 with clean message."""
         engine.collective_rpc.side_effect = SteeringVectorError(
-            "Rank 1: Layer 0 (post_mlp): expected vector of size 128, got 2"
+            "Rank 1: Layer 0 (post_block): expected vector of size 128, got 2"
         )
         resp = client.post("/v1/steering/set", json=_vecs({0: [1.0, 2.0]}))
         assert resp.status_code == 400
@@ -153,7 +153,7 @@ def test_size_mismatch_single_400(self, client, engine):
 
     def test_non_finite_single_400(self, client, engine):
         engine.collective_rpc.side_effect = RuntimeError(
-            "Rank 0: Layer 0 (post_mlp): steering vector contains "
+            "Rank 0: Layer 0 (post_block): steering vector contains "
             "non-finite values (NaN or Infinity)"
         )
         resp = client.post("/v1/steering/set", json=_vecs({0: [1.0, 2.0]}))
@@ -169,31 +169,31 @@ class TestDeepMergeStatus:
     def test_merges_disjoint(self):
         result = deep_merge_status(
             [
-                {0: {"post_mlp": {"norm": 1.0}}},
-                {5: {"post_mlp": {"norm": 2.5}}},
+                {0: {"post_block": {"norm": 1.0}}},
+                {5: {"post_block": {"norm": 2.5}}},
             ]
         )
         assert result == {
-            0: {"post_mlp": {"norm": 1.0}},
-            5: {"post_mlp": {"norm": 2.5}},
+            0: {"post_block": {"norm": 1.0}},
+            5: {"post_block": {"norm": 2.5}},
         }
 
     def test_merges_identical_tp_duplicates(self):
         """TP ranks report identical state — merge must not raise."""
         result = deep_merge_status(
             [
-                {0: {"post_mlp": {"norm": 1.0}}},
-                {0: {"post_mlp": {"norm": 1.0}}},
+                {0: {"post_block": {"norm": 1.0}}},
+                {0: {"post_block": {"norm": 1.0}}},
             ]
         )
-        assert result == {0: {"post_mlp": {"norm": 1.0}}}
+        assert result == {0: {"post_block": {"norm": 1.0}}}
 
     def test_raises_on_divergence(self):
         with pytest.raises(RuntimeError, match="divergence"):
             deep_merge_status(
                 [
-                    {0: {"post_mlp": {"norm": 1.0}}},
-                    {0: {"post_mlp": {"norm": 2.0}}},
+                    {0: {"post_block": {"norm": 1.0}}},
+                    {0: {"post_block": {"norm": 2.0}}},
                 ]
             )
 
@@ -206,8 +206,8 @@ def test_handles_empty_inputs(self):
 class TestGetSteeringDivergence:
     def test_divergence_surfaces_as_500(self, client, engine):
         engine.collective_rpc.return_value = [
-            {0: {"post_mlp": {"norm": 1.0}}},
-            {0: {"post_mlp": {"norm": 2.0}}},
+            {0: {"post_block": {"norm": 1.0}}},
+            {0: {"post_block": {"norm": 2.0}}},
         ]
         resp = client.get("/v1/steering")
         assert resp.status_code == 500
@@ -221,17 +221,17 @@ class TestGetSteeringLayers:
     def test_merges_hook_points_across_workers(self, client, engine):
         """PP-disjoint layers + TP-identical hooks are merged correctly."""
         engine.collective_rpc.return_value = [
-            {0: ["post_mlp"], 1: ["post_mlp", "pre_attn"]},
-            {0: ["post_mlp"], 1: ["post_mlp", "pre_attn"]},
-            {2: ["post_mlp"], 3: ["post_mlp"]},
-            {2: ["post_mlp"], 3: ["post_mlp"]},
+            {0: ["post_block"], 1: ["post_block", "pre_attn"]},
+            {0: ["post_block"], 1: ["post_block", "pre_attn"]},
+            {2: ["post_block"], 3: ["post_block"]},
+            {2: ["post_block"], 3: ["post_block"]},
         ]
         resp = client.get("/v1/steering/layers")
         assert resp.status_code == 200
         layers = resp.json()["layers"]
         assert set(layers.keys()) == {"0", "1", "2", "3"}
-        assert layers["1"]["hook_points"] == ["post_mlp", "pre_attn"]
-        assert layers["2"]["hook_points"] == ["post_mlp"]
+        assert layers["1"]["hook_points"] == ["post_block", "pre_attn"]
+        assert layers["2"]["hook_points"] == ["post_block"]
 
     def test_empty_worker_results(self, client, engine):
         engine.collective_rpc.return_value = [{}, {}]
diff --git a/tests/entrypoints/serve/steering/test_protocol.py b/tests/entrypoints/serve/steering/test_protocol.py
index 5f586246b2d7..1d98b098a94d 100644
--- a/tests/entrypoints/serve/steering/test_protocol.py
+++ b/tests/entrypoints/serve/steering/test_protocol.py
@@ -11,19 +11,19 @@ class TestSetSteeringRequest:
     """Validate SetSteeringRequest Pydantic model."""
 
     def test_basic_vectors(self):
-        req = SetSteeringRequest(vectors={"post_mlp": {0: [1.0, 2.0], 5: [3.0, 4.0]}})
+        req = SetSteeringRequest(vectors={"post_block": {0: [1.0, 2.0], 5: [3.0, 4.0]}})
         assert req.vectors is not None
-        assert req.vectors["post_mlp"][0] == [1.0, 2.0]
+        assert req.vectors["post_block"][0] == [1.0, 2.0]
         assert req.prefill_vectors is None
         assert req.decode_vectors is None
         assert req.replace is False
 
     def test_with_co_located_scale(self):
         req = SetSteeringRequest(
-            vectors={"post_mlp": {0: {"vector": [1.0, 2.0], "scale": 2.5}}},
+            vectors={"post_block": {0: {"vector": [1.0, 2.0], "scale": 2.5}}},
         )
         assert req.vectors is not None
-        entry = req.vectors["post_mlp"][0]
+        entry = req.vectors["post_block"][0]
         assert isinstance(entry, dict)
         assert entry["vector"] == [1.0, 2.0]
         assert entry["scale"] == 2.5
@@ -36,7 +36,7 @@ def test_replace_flag(self):
         assert req.replace is True
 
     def test_replace_defaults_false(self):
-        req = SetSteeringRequest(vectors={"post_mlp": {0: [1.0]}})
+        req = SetSteeringRequest(vectors={"post_block": {0: [1.0]}})
         assert req.replace is False
 
     def test_empty_vectors_allowed(self):
@@ -54,22 +54,22 @@ def test_all_fields_none_by_default(self):
     def test_string_keys_coerced_to_int(self):
         """JSON dict keys are strings; Pydantic should coerce to int."""
         req = SetSteeringRequest.model_validate(
-            {"vectors": {"post_mlp": {"0": [1.0, 2.0]}}}
+            {"vectors": {"post_block": {"0": [1.0, 2.0]}}}
         )
         assert req.vectors is not None
-        assert 0 in req.vectors["post_mlp"]
+        assert 0 in req.vectors["post_block"]
 
     def test_full_request(self):
         req = SetSteeringRequest(
             vectors={
                 "pre_attn": {0: [1.0, 0.5]},
-                "post_mlp": {3: [0.0, 1.0]},
+                "post_block": {3: [0.0, 1.0]},
             },
             prefill_vectors={
                 "pre_attn": {0: {"vector": [0.1, 0.2], "scale": 2.0}},
             },
             decode_vectors={
-                "post_mlp": {3: [0.5, 0.5]},
+                "post_block": {3: [0.5, 0.5]},
             },
             replace=True,
         )
@@ -84,7 +84,7 @@ def test_multiple_hook_points(self):
             vectors={
                 "pre_attn": {0: [1.0]},
                 "post_attn": {0: [2.0]},
-                "post_mlp": {0: [3.0]},
+                "post_block": {0: [3.0]},
             }
         )
         assert req.vectors is not None
@@ -106,7 +106,7 @@ def test_unknown_field_rejected(self):
 
         with pytest.raises(pydantic.ValidationError):
             SetSteeringRequest(
-                vectors={"post_mlp": {0: [1.0, 2.0]}},
+                vectors={"post_block": {0: [1.0, 2.0]}},
                 scales={0: 2.5},
             )
 
@@ -117,7 +117,7 @@ def test_unknown_field_via_model_validate(self):
         with pytest.raises(pydantic.ValidationError):
             SetSteeringRequest.model_validate(
                 {
-                    "vectors": {"post_mlp": {"0": [1.0]}},
+                    "vectors": {"post_block": {"0": [1.0]}},
                     "scales": {"0": 2.5},
                 }
             )
diff --git a/tests/entrypoints/serve/steering/test_worker_steering.py b/tests/entrypoints/serve/steering/test_worker_steering.py
index 89d652571b85..c71916e6c251 100644
--- a/tests/entrypoints/serve/steering/test_worker_steering.py
+++ b/tests/entrypoints/serve/steering/test_worker_steering.py
@@ -3,7 +3,7 @@
 """Unit tests for steering model-runner mixin methods using a mock model.
 
 All hook-point-aware tests use the default hook point
-(``post_mlp``) unless testing multi-hook-point behaviour.
+(``post_block``) unless testing multi-hook-point behaviour.
 
 Tests cover three-tier steering (base, prefill, decode) and co-located
 scale format (bare list vs dict with scale).
@@ -22,7 +22,7 @@
 from vllm.v1.worker.worker_base import WorkerBase
 
 # Shorthand for test readability
-_HP = DEFAULT_HOOK_POINT.value  # "post_mlp"
+_HP = DEFAULT_HOOK_POINT.value  # "post_block"
 
 
 class FakeDecoderLayer(nn.Module):
@@ -31,9 +31,9 @@ class FakeDecoderLayer(nn.Module):
     def __init__(self, layer_idx: int, hidden_size: int, max_steering_configs: int = 0):
         super().__init__()
         self.layer_idx = layer_idx
-        # Default hook point buffers (post_mlp) — table + index only
+        # Default hook point buffers (post_block) — table + index only
         self.register_buffer(
-            "steering_table_post_mlp",
+            "steering_table_post_block",
             torch.zeros(max_steering_configs + 2, hidden_size),
             persistent=False,
         )
diff --git a/tests/v1/capture/consumers/filesystem/test_coalescing.py b/tests/v1/capture/consumers/filesystem/test_coalescing.py
index 363269fe51f8..b8f93f2e05d8 100644
--- a/tests/v1/capture/consumers/filesystem/test_coalescing.py
+++ b/tests/v1/capture/consumers/filesystem/test_coalescing.py
@@ -52,7 +52,7 @@ def _drive(
     for r in range(num_requests):
         req = f"req_{r:04d}"
         steps = rng.randint(1, max_steps)
-        layer, hook = rng.randint(0, 5), "post_mlp"
+        layer, hook = rng.randint(0, 5), "post_block"
         d = root / req
         d.mkdir(parents=True, exist_ok=True)
         bp = d / f"{layer}_{hook}.bin"
diff --git a/tests/v1/capture/consumers/filesystem/test_consumer.py b/tests/v1/capture/consumers/filesystem/test_consumer.py
index e9099a6f8ce1..c224054b90ad 100644
--- a/tests/v1/capture/consumers/filesystem/test_consumer.py
+++ b/tests/v1/capture/consumers/filesystem/test_consumer.py
@@ -388,12 +388,12 @@ def test_parallel_sizes_accepted_for_residual_hooks(
             raw = FilesystemCaptureRequest(
                 request_id="par-req",
                 tag="par-tag",
-                hooks={"post_mlp": [0, 1, 2]},
+                hooks={"post_block": [0, 1, 2]},
                 positions="last_prompt",
             )
             spec = consumer.validate_client_spec(raw, ctx)
             assert isinstance(spec, CaptureSpec)
-            assert spec.hooks["post_mlp"] == [0, 1, 2]
+            assert spec.hooks["post_block"] == [0, 1, 2]
         finally:
             consumer.shutdown(timeout=5.0)
 
@@ -412,7 +412,7 @@ def test_layouts_accepted_under_pipeline_parallelism(self, layout: str) -> None:
             raw = FilesystemCaptureRequest(
                 request_id="pp-accepts",
                 tag="pp-accepts",
-                hooks={"post_mlp": [0, 1]},
+                hooks={"post_block": [0, 1]},
                 positions="last_prompt",
                 layout=layout,
             )
@@ -505,7 +505,7 @@ def test_ok_after_finalize(self, tmp_path: pathlib.Path) -> None:
         consumer = _make_consumer(tmp_path)
         try:
             request_id = VllmInternalRequestId("result-test")
-            key: CaptureKey = (request_id, 1, "post_mlp")
+            key: CaptureKey = (request_id, 1, "post_block")
 
             tensor = torch.randn(1, 4, dtype=torch.float32)
             consumer.submit_chunk(
@@ -548,7 +548,7 @@ def test_returns_ok_after_finalize(self, tmp_path: pathlib.Path) -> None:
         consumer = _make_consumer(tmp_path)
         try:
             request_id = VllmInternalRequestId("wait-ok")
-            key: CaptureKey = (request_id, 2, "post_mlp")
+            key: CaptureKey = (request_id, 2, "post_block")
 
             tensor = torch.randn(1, 4, dtype=torch.float32)
             consumer.submit_chunk(
diff --git a/tests/v1/capture/consumers/filesystem/test_packed.py b/tests/v1/capture/consumers/filesystem/test_packed.py
index e9f04aed81bb..b48c1cc5e3cb 100644
--- a/tests/v1/capture/consumers/filesystem/test_packed.py
+++ b/tests/v1/capture/consumers/filesystem/test_packed.py
@@ -100,10 +100,10 @@ def _write_packed(
 class TestReader:
     def test_per_file_round_trip(self, tmp_path: pathlib.Path) -> None:
         arr = np.arange(2 * 8, dtype=np.float32).reshape(2, 8)
-        _write_per_file(tmp_path / "req-1", 3, "post_mlp", arr, "float32")
-        entry = read_per_file(tmp_path / "req-1" / "3_post_mlp.bin")
+        _write_per_file(tmp_path / "req-1", 3, "post_block", arr, "float32")
+        entry = read_per_file(tmp_path / "req-1" / "3_post_block.bin")
         assert entry.layer == 3
-        assert entry.hook == "post_mlp"
+        assert entry.hook == "post_block"
         assert entry.dtype == "float32"
         np.testing.assert_array_equal(entry.array, arr)
 
@@ -111,44 +111,44 @@ def test_packed_round_trip(self, tmp_path: pathlib.Path) -> None:
         a = np.random.randn(4, 16).astype(np.float32)
         b = np.random.randn(1, 16).astype(np.float32)
         c = np.random.randn(7, 16).astype(np.float32)
-        tensors = [(0, "post_mlp", a), (5, "post_mlp", b), (5, "post_attn", c)]
+        tensors = [(0, "post_block", a), (5, "post_block", b), (5, "post_attn", c)]
         _write_packed(tmp_path / "req-2", tensors, "float32")
 
         got = read_packed(tmp_path / "req-2")
-        assert set(got) == {(0, "post_mlp"), (5, "post_mlp"), (5, "post_attn")}
-        np.testing.assert_array_equal(got[(0, "post_mlp")].array, a)
-        np.testing.assert_array_equal(got[(5, "post_mlp")].array, b)
+        assert set(got) == {(0, "post_block"), (5, "post_block"), (5, "post_attn")}
+        np.testing.assert_array_equal(got[(0, "post_block")].array, a)
+        np.testing.assert_array_equal(got[(5, "post_block")].array, b)
         np.testing.assert_array_equal(got[(5, "post_attn")].array, c)
 
     def test_packed_accepts_index_or_bin_or_dir(self, tmp_path: pathlib.Path) -> None:
         arr = np.random.randn(2, 4).astype(np.float32)
-        _write_packed(tmp_path / "r", [(1, "post_mlp", arr)], "float32")
+        _write_packed(tmp_path / "r", [(1, "post_block", arr)], "float32")
         for target in (
             tmp_path / "r",
             tmp_path / "r" / PACKED_INDEX_NAME,
             tmp_path / "r" / PACKED_BIN_NAME,
         ):
             got = read_packed(target)
-            np.testing.assert_array_equal(got[(1, "post_mlp")].array, arr)
+            np.testing.assert_array_equal(got[(1, "post_block")].array, arr)
 
     def test_read_request_autodetects_layout(self, tmp_path: pathlib.Path) -> None:
         # per_file dir
         pf = tmp_path / "pf"
-        _write_per_file(pf, 0, "post_mlp", np.ones((2, 4), np.float32), "float32")
-        _write_per_file(pf, 1, "post_mlp", np.zeros((3, 4), np.float32), "float32")
+        _write_per_file(pf, 0, "post_block", np.ones((2, 4), np.float32), "float32")
+        _write_per_file(pf, 1, "post_block", np.zeros((3, 4), np.float32), "float32")
         got_pf = read_request(pf)
-        assert set(got_pf) == {(0, "post_mlp"), (1, "post_mlp")}
+        assert set(got_pf) == {(0, "post_block"), (1, "post_block")}
         # packed dir
         pk = tmp_path / "pk"
-        _write_packed(pk, [(0, "post_mlp", np.ones((2, 4), np.float32))], "float32")
+        _write_packed(pk, [(0, "post_block", np.ones((2, 4), np.float32))], "float32")
         got_pk = read_request(pk)
-        assert set(got_pk) == {(0, "post_mlp")}
+        assert set(got_pk) == {(0, "post_block")}
 
     def test_bfloat16_returns_uint16(self, tmp_path: pathlib.Path) -> None:
         # bf16 is stored as raw uint16; the reader returns it as uint16.
         raw = np.array([1, 2, 3, 4], dtype=np.uint16).reshape(2, 2)
-        _write_per_file(tmp_path / "bf", 0, "post_mlp", raw, "bfloat16")
-        entry = read_per_file(tmp_path / "bf" / "0_post_mlp.bin")
+        _write_per_file(tmp_path / "bf", 0, "post_block", raw, "bfloat16")
+        entry = read_per_file(tmp_path / "bf" / "0_post_block.bin")
         assert entry.dtype == "bfloat16"
         assert entry.array.dtype == np.uint16
         np.testing.assert_array_equal(entry.array, raw)
@@ -156,7 +156,7 @@ def test_bfloat16_returns_uint16(self, tmp_path: pathlib.Path) -> None:
     def test_truncated_packed_raises(self, tmp_path: pathlib.Path) -> None:
         arr = np.random.randn(4, 8).astype(np.float32)
         d = tmp_path / "trunc"
-        _write_packed(d, [(0, "post_mlp", arr)], "float32")
+        _write_packed(d, [(0, "post_block", arr)], "float32")
         # Corrupt: truncate the bin so the entry's bytes are missing.
         (d / PACKED_BIN_NAME).write_bytes(b"\x00\x00")
         try:
@@ -276,70 +276,70 @@ def test_packed_round_trip(self, tmp_path: pathlib.Path) -> None:
         req = "req-pk"
         c = _consumer(tmp_path)
         try:
-            _register_packed(c, req, {"post_mlp": [0, 2]})
-            # (0,post_mlp) spans 2 steps; (2,post_mlp) one step. Submit
+            _register_packed(c, req, {"post_block": [0, 2]})
+            # (0,post_block) spans 2 steps; (2,post_block) one step. Submit
             # interleaved across keys to exercise per-chunk indexing.
             a0 = torch.randn(2, 8, dtype=torch.float32)
             a1 = torch.randn(3, 8, dtype=torch.float32)
             b0 = torch.randn(1, 8, dtype=torch.float32)
-            c.submit_chunk(_chunk(req, 0, "post_mlp", a0, 0))
-            c.submit_chunk(_chunk(req, 2, "post_mlp", b0, 0))
-            c.submit_chunk(_chunk(req, 0, "post_mlp", a1, 1))
+            c.submit_chunk(_chunk(req, 0, "post_block", a0, 0))
+            c.submit_chunk(_chunk(req, 2, "post_block", b0, 0))
+            c.submit_chunk(_chunk(req, 0, "post_block", a1, 1))
             for layer in (0, 2):
-                c.submit_finalize(_finalize(req, layer, "post_mlp"))
+                c.submit_finalize(_finalize(req, layer, "post_block"))
 
-            key0: CaptureKey = (VllmInternalRequestId(req), 0, "post_mlp")
+            key0: CaptureKey = (VllmInternalRequestId(req), 0, "post_block")
             assert _wait(c, key0).status == "ok"
 
             req_dir = tmp_path / "t" / req
             assert (req_dir / PACKED_BIN_NAME).exists()
             assert (req_dir / PACKED_INDEX_NAME).exists()
-            assert not list(req_dir.glob("*_post_mlp.bin")), "no per-file bins"
+            assert not list(req_dir.glob("*_post_block.bin")), "no per-file bins"
 
             got = read_request(req_dir)
-            assert set(got) == {(0, "post_mlp"), (2, "post_mlp")}
+            assert set(got) == {(0, "post_block"), (2, "post_block")}
             np.testing.assert_array_equal(
-                got[(0, "post_mlp")].array, torch.cat([a0, a1]).numpy()
+                got[(0, "post_block")].array, torch.cat([a0, a1]).numpy()
             )
-            np.testing.assert_array_equal(got[(2, "post_mlp")].array, b0.numpy())
+            np.testing.assert_array_equal(got[(2, "post_block")].array, b0.numpy())
         finally:
             c.shutdown(timeout=5.0)
 
     def test_submit_chunk_batch_round_trip(self, tmp_path: pathlib.Path) -> None:
         # Batched submit: a step's worth of (layer) chunks handed over in
         # one call must produce the same packed file as per-chunk submits.
-        # Two steps batched; (0,post_mlp) spans both, (2,post_mlp) only
+        # Two steps batched; (0,post_block) spans both, (2,post_block) only
         # step 0 — concatenation order must follow submission order.
         req = "req-batch"
         c = _consumer(tmp_path)
         try:
-            _register_packed(c, req, {"post_mlp": [0, 2]})
+            _register_packed(c, req, {"post_block": [0, 2]})
             a0 = torch.randn(2, 8, dtype=torch.float32)
             b0 = torch.randn(1, 8, dtype=torch.float32)
             a1 = torch.randn(3, 8, dtype=torch.float32)
             # step 0: both layers, in one batch
             c.submit_chunk_batch(
-                [_chunk(req, 0, "post_mlp", a0, 0), _chunk(req, 2, "post_mlp", b0, 0)]
+                [_chunk(req, 0, "post_block", a0, 0), _chunk(req, 2, "post_block", b0, 0)]
             )
             # step 1: only layer 0
-            c.submit_chunk_batch([_chunk(req, 0, "post_mlp", a1, 1)])
+            c.submit_chunk_batch([_chunk(req, 0, "post_block", a1, 1)])
             for layer in (0, 2):
-                c.submit_finalize(_finalize(req, layer, "post_mlp"))
+                c.submit_finalize(_finalize(req, layer, "post_block"))
 
-            key0: CaptureKey = (VllmInternalRequestId(req), 0, "post_mlp")
+            key0: CaptureKey = (VllmInternalRequestId(req), 0, "post_block")
             assert _wait(c, key0).status == "ok"
 
             req_dir = tmp_path / "t" / req
             # One packed file for the whole request, no per-file bins.
             assert (req_dir / PACKED_BIN_NAME).exists()
-            assert not list(req_dir.glob("*_post_mlp.bin"))
+            assert not list(req_dir.glob("*_post_block.bin"))
 
             got = read_request(req_dir)
-            assert set(got) == {(0, "post_mlp"), (2, "post_mlp")}
+            assert set(got) == {(0, "post_block"), (2, "post_block")}
             np.testing.assert_array_equal(
-                got[(0, "post_mlp")].array, torch.cat([a0, a1]).numpy()
+                got[(0, "post_block")].array, torch.cat([a0, a1]).numpy()
             )
-            np.testing.assert_array_equal(got[(2, "post_mlp")].array, b0.numpy())
+            np.testing.assert_array_equal(got[(2, "post_block")].array, b0.numpy())
         finally:
             c.shutdown(timeout=5.0)
 
@@ -352,9 +352,9 @@ def test_batch_matches_per_chunk_bytes(self, tmp_path: pathlib.Path) -> None:
         def run(req: str, batched: bool) -> bytes:
             c = _consumer(tmp_path)
             try:
-                _register_packed(c, req, {"post_mlp": layers})
+                _register_packed(c, req, {"post_block": layers})
                 step_chunks = [
-                    _chunk(req, layer, "post_mlp", tensors[layer], 0)
+                    _chunk(req, layer, "post_block", tensors[layer], 0)
                     for layer in layers
                 ]
                 if batched:
@@ -363,8 +363,8 @@ def run(req: str, batched: bool) -> bytes:
                     for ch in step_chunks:
                         c.submit_chunk(ch)
                 for layer in layers:
-                    c.submit_finalize(_finalize(req, layer, "post_mlp"))
-                key0: CaptureKey = (VllmInternalRequestId(req), 0, "post_mlp")
+                    c.submit_finalize(_finalize(req, layer, "post_block"))
+                key0: CaptureKey = (VllmInternalRequestId(req), 0, "post_block")
                 assert _wait(c, key0).status == "ok"
                 return (tmp_path / "t" / req / PACKED_BIN_NAME).read_bytes()
             finally:
@@ -378,19 +378,19 @@ def test_finalize_aggregation_waits_for_all_keys(
         req = "req-agg"
         c = _consumer(tmp_path)
         try:
-            _register_packed(c, req, {"post_mlp": [0, 1]})
-            c.submit_chunk(_chunk(req, 0, "post_mlp", torch.randn(2, 8), 0))
-            c.submit_chunk(_chunk(req, 1, "post_mlp", torch.randn(2, 8), 0))
+            _register_packed(c, req, {"post_block": [0, 1]})
+            c.submit_chunk(_chunk(req, 0, "post_block", torch.randn(2, 8), 0))
+            c.submit_chunk(_chunk(req, 1, "post_block", torch.randn(2, 8), 0))
             # Finalize only the first key — packed file must NOT publish.
-            c.submit_finalize(_finalize(req, 0, "post_mlp"))
+            c.submit_finalize(_finalize(req, 0, "post_block"))
             time.sleep(0.2)
             req_dir = tmp_path / "t" / req
             assert not (req_dir / PACKED_INDEX_NAME).exists(), (
                 "packed index published before all keys finalized"
             )
             # Finalize the last expected key — now it publishes.
-            c.submit_finalize(_finalize(req, 1, "post_mlp"))
-            key0: CaptureKey = (VllmInternalRequestId(req), 0, "post_mlp")
+            c.submit_finalize(_finalize(req, 1, "post_block"))
+            key0: CaptureKey = (VllmInternalRequestId(req), 0, "post_block")
             assert _wait(c, key0).status == "ok"
             assert (req_dir / PACKED_INDEX_NAME).exists()
         finally:
@@ -402,14 +402,14 @@ def test_zero_chunk_key(self, tmp_path: pathlib.Path) -> None:
         req = "req-zero"
         c = _consumer(tmp_path)
         try:
-            _register_packed(c, req, {"post_mlp": [0, 1]})
-            c.submit_chunk(_chunk(req, 0, "post_mlp", torch.randn(2, 8), 0))
+            _register_packed(c, req, {"post_block": [0, 1]})
+            c.submit_chunk(_chunk(req, 0, "post_block", torch.randn(2, 8), 0))
             for layer in (0, 1):  # key 1 had no chunk
-                c.submit_finalize(_finalize(req, layer, "post_mlp"))
-            key0: CaptureKey = (VllmInternalRequestId(req), 0, "post_mlp")
+                c.submit_finalize(_finalize(req, layer, "post_block"))
+            key0: CaptureKey = (VllmInternalRequestId(req), 0, "post_block")
             assert _wait(c, key0).status == "ok"
             got = read_request(tmp_path / "t" / req)
-            assert set(got) == {(0, "post_mlp")}
+            assert set(got) == {(0, "post_block")}
         finally:
             c.shutdown(timeout=5.0)
 
@@ -421,19 +421,19 @@ def test_per_file_default_unchanged(self, tmp_path: pathlib.Path) -> None:
             raw = FilesystemCaptureRequest(
                 request_id=req,
                 tag="t",
-                hooks={"post_mlp": [0]},
+                hooks={"post_block": [0]},
                 positions="last_prompt",
             )
             c.validate_client_spec(raw, _ctx(req))
             t0 = torch.randn(2, 8, dtype=torch.float32)
-            c.submit_chunk(_chunk(req, 0, "post_mlp", t0, 0))
-            c.submit_finalize(_finalize(req, 0, "post_mlp"))
-            key0: CaptureKey = (VllmInternalRequestId(req), 0, "post_mlp")
+            c.submit_chunk(_chunk(req, 0, "post_block", t0, 0))
+            c.submit_finalize(_finalize(req, 0, "post_block"))
+            key0: CaptureKey = (VllmInternalRequestId(req), 0, "post_block")
             assert _wait(c, key0).status == "ok"
             req_dir = tmp_path / "t" / req
-            assert (req_dir / "0_post_mlp.bin").exists()
+            assert (req_dir / "0_post_block.bin").exists()
             assert not (req_dir / PACKED_INDEX_NAME).exists()
-            entry = read_per_file(req_dir / "0_post_mlp.bin")
+            entry = read_per_file(req_dir / "0_post_block.bin")
             np.testing.assert_array_equal(entry.array, t0.numpy())
             assert entry.dtype == "float32"  # sidecar now self-describing
         finally:
@@ -451,21 +451,21 @@ def test_dict_spec_layout_packed(self, tmp_path: pathlib.Path) -> None:
                 {
                     "request_id": req,
                     "tag": "t",
-                    "hooks": {"post_mlp": [0, 1]},
+                    "hooks": {"post_block": [0, 1]},
                     "positions": "last_prompt",
                     "layout": "packed",
                 },
                 _ctx(req),
             )
             for layer in (0, 1):
-                c.submit_chunk(_chunk(req, layer, "post_mlp", torch.randn(2, 8), 0))
+                c.submit_chunk(_chunk(req, layer, "post_block", torch.randn(2, 8), 0))
             for layer in (0, 1):
-                c.submit_finalize(_finalize(req, layer, "post_mlp"))
-            key0: CaptureKey = (VllmInternalRequestId(req), 0, "post_mlp")
+                c.submit_finalize(_finalize(req, layer, "post_block"))
+            key0: CaptureKey = (VllmInternalRequestId(req), 0, "post_block")
             assert _wait(c, key0).status == "ok"
             req_dir = tmp_path / "t" / req
             assert (req_dir / PACKED_INDEX_NAME).exists()
-            assert set(read_request(req_dir)) == {(0, "post_mlp"), (1, "post_mlp")}
+            assert set(read_request(req_dir)) == {(0, "post_block"), (1, "post_block")}
         finally:
             c.shutdown(timeout=5.0)
 
@@ -478,17 +478,17 @@ def test_dict_spec_defaults_per_file(self, tmp_path: pathlib.Path) -> None:
                 {
                     "request_id": req,
                     "tag": "t",
-                    "hooks": {"post_mlp": [0]},
+                    "hooks": {"post_block": [0]},
                     "positions": "last_prompt",
                 },
                 _ctx(req),
             )
-            c.submit_chunk(_chunk(req, 0, "post_mlp", torch.randn(2, 8), 0))
-            c.submit_finalize(_finalize(req, 0, "post_mlp"))
-            key0: CaptureKey = (VllmInternalRequestId(req), 0, "post_mlp")
+            c.submit_chunk(_chunk(req, 0, "post_block", torch.randn(2, 8), 0))
+            c.submit_finalize(_finalize(req, 0, "post_block"))
+            key0: CaptureKey = (VllmInternalRequestId(req), 0, "post_block")
             assert _wait(c, key0).status == "ok"
             req_dir = tmp_path / "t" / req
-            assert (req_dir / "0_post_mlp.bin").exists()
+            assert (req_dir / "0_post_block.bin").exists()
             assert not (req_dir / PACKED_INDEX_NAME).exists()
         finally:
             c.shutdown(timeout=5.0)
@@ -507,7 +507,7 @@ def test_invalid_layout_rejected(self, tmp_path: pathlib.Path) -> None:
                     {
                         "request_id": req,
                         "tag": "t",
-                        "hooks": {"post_mlp": [0]},
+                        "hooks": {"post_block": [0]},
                         "positions": "last_prompt",
                         "layout": "bogus",
                     },
@@ -539,7 +539,7 @@ def test_two_stage_split_and_merge(self, tmp_path: pathlib.Path) -> None:
         # back into the request's full layer set.
         req = "req-pp"
         total = 4
-        hooks = {"post_mlp": [0, 1, 2, 3]}
+        hooks = {"post_block": [0, 1, 2, 3]}
         tensors = {layer: torch.randn(2, 8, dtype=torch.float32) for layer in range(4)}
         c0 = _pp_consumer(tmp_path, pp_size=2, pp_rank=0, total_layers=total)
         c1 = _pp_consumer(tmp_path, pp_size=2, pp_rank=1, total_layers=total)
@@ -548,16 +548,16 @@ def test_two_stage_split_and_merge(self, tmp_path: pathlib.Path) -> None:
             c1.validate_client_spec(_pp_raw(req, hooks), _ctx(req))
             # The manager only feeds each stage its owned layers.
             for layer in (0, 1):
-                c0.submit_chunk(_chunk(req, layer, "post_mlp", tensors[layer], 0))
+                c0.submit_chunk(_chunk(req, layer, "post_block", tensors[layer], 0))
             for layer in (0, 1):
-                c0.submit_finalize(_finalize(req, layer, "post_mlp"))
+                c0.submit_finalize(_finalize(req, layer, "post_block"))
             for layer in (2, 3):
-                c1.submit_chunk(_chunk(req, layer, "post_mlp", tensors[layer], 0))
+                c1.submit_chunk(_chunk(req, layer, "post_block", tensors[layer], 0))
             for layer in (2, 3):
-                c1.submit_finalize(_finalize(req, layer, "post_mlp"))
+                c1.submit_finalize(_finalize(req, layer, "post_block"))
 
-            assert _wait(c0, (VllmInternalRequestId(req), 0, "post_mlp")).status == "ok"
-            assert _wait(c1, (VllmInternalRequestId(req), 2, "post_mlp")).status == "ok"
+            assert _wait(c0, (VllmInternalRequestId(req), 0, "post_block")).status == "ok"
+            assert _wait(c1, (VllmInternalRequestId(req), 2, "post_block")).status == "ok"
 
             req_dir = tmp_path / "t" / req
             # Per-stage files exist; the pp-agnostic packed.json does not.
@@ -569,10 +569,10 @@ def test_two_stage_split_and_merge(self, tmp_path: pathlib.Path) -> None:
             assert not (req_dir / PACKED_BIN_NAME).exists()
 
             got = read_request(req_dir)
-            assert set(got) == {(layer, "post_mlp") for layer in range(4)}
+            assert set(got) == {(layer, "post_block") for layer in range(4)}
             for layer in range(4):
                 np.testing.assert_array_equal(
-                    got[(layer, "post_mlp")].array, tensors[layer].numpy()
+                    got[(layer, "post_block")].array, tensors[layer].numpy()
                 )
         finally:
             c0.shutdown(timeout=5.0)
@@ -588,7 +588,7 @@ def test_stage_owning_no_layers_writes_nothing(
         c1 = _pp_consumer(tmp_path, pp_size=2, pp_rank=1, total_layers=4)
         try:
             spec = c1.validate_client_spec(
-                _pp_raw(req, {"post_mlp": [0, 1]}), _ctx(req)
+                _pp_raw(req, {"post_block": [0, 1]}), _ctx(req)
             )
             assert isinstance(spec, CaptureSpec)
             # No accumulation state for a stage that owns none of the layers.
@@ -604,8 +604,8 @@ def test_expected_keys_filtered_to_local_slice(
         req = "req-pp-expected"
         c1 = _pp_consumer(tmp_path, pp_size=2, pp_rank=1, total_layers=4)
         try:
-            c1.validate_client_spec(_pp_raw(req, {"post_mlp": [0, 1, 2, 3]}), _ctx(req))
+            c1.validate_client_spec(_pp_raw(req, {"post_block": [0, 1, 2, 3]}), _ctx(req))
             state = c1._packed_states[req]
-            assert state.expected_keys == {(2, "post_mlp"), (3, "post_mlp")}
+            assert state.expected_keys == {(2, "post_block"), (3, "post_block")}
         finally:
             c1.shutdown(timeout=5.0)
diff --git a/tests/v1/capture/consumers/filesystem/test_sharded.py b/tests/v1/capture/consumers/filesystem/test_sharded.py
index c724a04b0b0d..ae0b548be513 100644
--- a/tests/v1/capture/consumers/filesystem/test_sharded.py
+++ b/tests/v1/capture/consumers/filesystem/test_sharded.py
@@ -86,17 +86,17 @@ def test_round_trip_multi_request(self, tmp_path: pathlib.Path) -> None:
             0,
             0,
             [
-                ("reqA", 0, "post_mlp", a),
-                ("reqB", 0, "post_mlp", b),
-                ("reqA", 1, "post_mlp", c),
+                ("reqA", 0, "post_block", a),
+                ("reqB", 0, "post_block", b),
+                ("reqA", 1, "post_block", c),
             ],
             "float32",
         )
         got = read_sharded(tag)
         assert set(got) == {"reqA", "reqB"}
-        np.testing.assert_array_equal(got["reqA"][(0, "post_mlp")].array, a)
-        np.testing.assert_array_equal(got["reqA"][(1, "post_mlp")].array, c)
-        np.testing.assert_array_equal(got["reqB"][(0, "post_mlp")].array, b)
+        np.testing.assert_array_equal(got["reqA"][(0, "post_block")].array, a)
+        np.testing.assert_array_equal(got["reqA"][(1, "post_block")].array, c)
+        np.testing.assert_array_equal(got["reqB"][(0, "post_block")].array, b)
 
     def test_request_spanning_two_shards(self, tmp_path: pathlib.Path) -> None:
         # reqA L0 has rows in seq 0 then seq 1 (sealed mid-request); reader
@@ -104,11 +104,11 @@ def test_request_spanning_two_shards(self, tmp_path: pathlib.Path) -> None:
         tag = tmp_path / "t"
         a0 = np.arange(2 * 8, dtype=np.float32).reshape(2, 8)
         a1 = (np.arange(3 * 8, dtype=np.float32) + 100).reshape(3, 8)
-        _write_shard(tag, 0, 0, [("reqA", 0, "post_mlp", a0)], "float32")
-        _write_shard(tag, 0, 1, [("reqA", 0, "post_mlp", a1)], "float32")
+        _write_shard(tag, 0, 0, [("reqA", 0, "post_block", a0)], "float32")
+        _write_shard(tag, 0, 1, [("reqA", 0, "post_block", a1)], "float32")
         got = read_sharded(tag)
         np.testing.assert_array_equal(
-            got["reqA"][(0, "post_mlp")].array, np.concatenate([a0, a1])
+            got["reqA"][(0, "post_block")].array, np.concatenate([a0, a1])
         )
 
 
@@ -182,7 +182,7 @@ def test_many_requests_one_shard_round_trip(self, tmp_path: pathlib.Path) -> Non
         expected: dict[str, dict[tuple[int, str], np.ndarray]] = {}
         try:
             for rid in reqs:
-                _register(c, rid, {"post_mlp": [0, 1]})
+                _register(c, rid, {"post_block": [0, 1]})
             # Interleave chunks across requests and layers; 2 steps each.
             tensors: dict = {}
             for step in range(2):
@@ -190,15 +190,15 @@ def test_many_requests_one_shard_round_trip(self, tmp_path: pathlib.Path) -> Non
                     for layer in (0, 1):
                         t = torch.randn(2, 8, dtype=torch.float32)
                         tensors.setdefault((rid, layer), []).append(t)
-                        c.submit_chunk(_chunk(rid, layer, "post_mlp", t, step))
+                        c.submit_chunk(_chunk(rid, layer, "post_block", t, step))
             for rid in reqs:
                 for layer in (0, 1):
-                    c.submit_finalize(_finalize(rid, layer, "post_mlp"))
-                    expected.setdefault(rid, {})[(layer, "post_mlp")] = torch.cat(
+                    c.submit_finalize(_finalize(rid, layer, "post_block"))
+                    expected.setdefault(rid, {})[(layer, "post_block")] = torch.cat(
                         tensors[(rid, layer)]
                     ).numpy()
             # results are ok before seal (data captured, readable after seal)
-            r = _wait(c, (VllmInternalRequestId("req0"), 0, "post_mlp"))
+            r = _wait(c, (VllmInternalRequestId("req0"), 0, "post_block"))
             assert r is not None and r.status == "ok"
             assert r.payload and all("shard-" in p for p in r.payload)
         finally:
@@ -209,8 +209,8 @@ def test_many_requests_one_shard_round_trip(self, tmp_path: pathlib.Path) -> Non
         for rid in reqs:
             for layer in (0, 1):
                 np.testing.assert_array_equal(
-                    got[rid][(layer, "post_mlp")].array,
-                    expected[rid][(layer, "post_mlp")],
+                    got[rid][(layer, "post_block")].array,
+                    expected[rid][(layer, "post_block")],
                 )
 
     def test_size_based_sealing_rotates(self, tmp_path: pathlib.Path) -> None:
@@ -218,14 +218,14 @@ def test_size_based_sealing_rotates(self, tmp_path: pathlib.Path) -> None:
         # Each row is 8*4=32 bytes; cap at 200 bytes -> seal every ~6 rows.
         c = _consumer(tmp_path, num_shards=1, shard_max_bytes=200)
         try:
-            _register(c, "r", {"post_mlp": [0]})
+            _register(c, "r", {"post_block": [0]})
             tensors = []
             for step in range(20):
                 t = torch.randn(1, 8, dtype=torch.float32)
                 tensors.append(t)
-                c.submit_chunk(_chunk("r", 0, "post_mlp", t, step))
-            c.submit_finalize(_finalize("r", 0, "post_mlp"))
-            assert _wait(c, (VllmInternalRequestId("r"), 0, "post_mlp")).status == "ok"
+                c.submit_chunk(_chunk("r", 0, "post_block", t, step))
+            c.submit_finalize(_finalize("r", 0, "post_block"))
+            assert _wait(c, (VllmInternalRequestId("r"), 0, "post_block")).status == "ok"
         finally:
             c.shutdown(timeout=5.0)
         tag = tmp_path / "t"
@@ -233,7 +233,7 @@ def test_size_based_sealing_rotates(self, tmp_path: pathlib.Path) -> None:
         assert len(shards) >= 2, f"expected rotation into multiple shards, got {shards}"
         got = read_sharded(tag)
         np.testing.assert_array_equal(
-            got["r"][(0, "post_mlp")].array, torch.cat(tensors).numpy()
+            got["r"][(0, "post_block")].array, torch.cat(tensors).numpy()
         )
 
 
@@ -280,7 +280,7 @@ def test_two_stage_shards_merge(self, tmp_path: pathlib.Path) -> None:
         # stage seals its own shard-pp{rank} files; read_sharded merges by
         # request across both, recovering the full layer set.
         req = "req"
-        hooks = {"post_mlp": [0, 1, 2, 3]}
+        hooks = {"post_block": [0, 1, 2, 3]}
         tensors = {layer: torch.randn(2, 8, dtype=torch.float32) for layer in range(4)}
         c0 = _pp_consumer(tmp_path, pp_rank=0, num_shards=1)
         c1 = _pp_consumer(tmp_path, pp_rank=1, num_shards=1)
@@ -288,11 +288,11 @@ def test_two_stage_shards_merge(self, tmp_path: pathlib.Path) -> None:
             _register(c0, req, hooks)
             _register(c1, req, hooks)
             for layer in (0, 1):
-                c0.submit_chunk(_chunk(req, layer, "post_mlp", tensors[layer], 0))
-                c0.submit_finalize(_finalize(req, layer, "post_mlp"))
+                c0.submit_chunk(_chunk(req, layer, "post_block", tensors[layer], 0))
+                c0.submit_finalize(_finalize(req, layer, "post_block"))
             for layer in (2, 3):
-                c1.submit_chunk(_chunk(req, layer, "post_mlp", tensors[layer], 0))
-                c1.submit_finalize(_finalize(req, layer, "post_mlp"))
+                c1.submit_chunk(_chunk(req, layer, "post_block", tensors[layer], 0))
+                c1.submit_finalize(_finalize(req, layer, "post_block"))
         finally:
             c0.shutdown(timeout=5.0)  # seal each stage's open shard
             c1.shutdown(timeout=5.0)
@@ -303,10 +303,10 @@ def test_two_stage_shards_merge(self, tmp_path: pathlib.Path) -> None:
         assert sorted(p.name for p in tag.glob("shard-pp01-*.bin"))
         got = read_sharded(tag)
         assert set(got) == {req}
-        assert set(got[req]) == {(layer, "post_mlp") for layer in range(4)}
+        assert set(got[req]) == {(layer, "post_block") for layer in range(4)}
         for layer in range(4):
             np.testing.assert_array_equal(
-                got[req][(layer, "post_mlp")].array, tensors[layer].numpy()
+                got[req][(layer, "post_block")].array, tensors[layer].numpy()
             )
 
     def test_stage_owning_no_layers_creates_no_state(
@@ -315,7 +315,7 @@ def test_stage_owning_no_layers_creates_no_state(
         req = "req-skip"
         c1 = _pp_consumer(tmp_path, pp_rank=1, num_shards=1)
         try:
-            _register(c1, req, {"post_mlp": [0, 1]})  # all on stage 0
+            _register(c1, req, {"post_block": [0, 1]})  # all on stage 0
             assert req not in c1._sharded_requests
         finally:
             c1.shutdown(timeout=5.0)
diff --git a/tests/v1/capture/consumers/test_logging.py b/tests/v1/capture/consumers/test_logging.py
index cae5c094657d..811ebab33eb5 100644
--- a/tests/v1/capture/consumers/test_logging.py
+++ b/tests/v1/capture/consumers/test_logging.py
@@ -26,7 +26,7 @@
 _LOGGER_NAME = "vllm.capture.logging"
 
 
-def _key(req_id: str = "req-1", layer: int = 0, hook: str = "post_mlp") -> CaptureKey:
+def _key(req_id: str = "req-1", layer: int = 0, hook: str = "post_block") -> CaptureKey:
     return (VllmInternalRequestId(req_id), layer, hook)
 
 
@@ -60,7 +60,7 @@ def test_construction():
     """LoggingConsumer constructs without error when given valid params."""
     consumer = LoggingConsumer(
         _MOCK_CONFIG,
-        {"hooks": {"post_mlp": [0]}, "positions": "last_prompt"},
+        {"hooks": {"post_block": [0]}, "positions": "last_prompt"},
     )
     assert consumer is not None
 
@@ -75,11 +75,11 @@ def test_global_capture_spec_returns_configured_spec():
     and positions."""
     consumer = LoggingConsumer(
         _MOCK_CONFIG,
-        {"hooks": {"post_mlp": [0, 1], "pre_attn": [2]}, "positions": "all"},
+        {"hooks": {"post_block": [0, 1], "pre_attn": [2]}, "positions": "all"},
     )
     spec = consumer.global_capture_spec()
     assert isinstance(spec, CaptureSpec)
-    assert spec.hooks == {"post_mlp": [0, 1], "pre_attn": [2]}
+    assert spec.hooks == {"post_block": [0, 1], "pre_attn": [2]}
     assert spec.positions == "all"
 
 
@@ -93,7 +93,7 @@ def test_on_capture_logs_key_rows_dtype(caplog: pytest.LogCaptureFixture):
     dtype."""
     consumer = LoggingConsumer(
         _MOCK_CONFIG,
-        {"hooks": {"post_mlp": [0]}},
+        {"hooks": {"post_block": [0]}},
     )
     key = _key()
     tensor = torch.randn(5, 16)
@@ -117,7 +117,7 @@ def test_custom_level_debug(caplog: pytest.LogCaptureFixture):
     """Construct with level='DEBUG', verify log message at DEBUG level."""
     consumer = LoggingConsumer(
         _MOCK_CONFIG,
-        {"hooks": {"post_mlp": [0]}, "level": "DEBUG"},
+        {"hooks": {"post_block": [0]}, "level": "DEBUG"},
     )
     key = _key()
     tensor = torch.randn(3, 8)
@@ -138,7 +138,7 @@ def test_default_positions():
     """Construct without positions param, verify default is 'last_prompt'."""
     consumer = LoggingConsumer(
         _MOCK_CONFIG,
-        {"hooks": {"post_mlp": [0]}},
+        {"hooks": {"post_block": [0]}},
     )
     spec = consumer.global_capture_spec()
     assert spec.positions == "last_prompt"
diff --git a/tests/v1/capture/test_consumer_base.py b/tests/v1/capture/test_consumer_base.py
index 63e447629fb4..4ae2cd5daf04 100644
--- a/tests/v1/capture/test_consumer_base.py
+++ b/tests/v1/capture/test_consumer_base.py
@@ -31,7 +31,7 @@
 # ---------------------------------------------------------------------------
 
 
-def _key(req_id: str = "req-1", layer: int = 3, hook: str = "post_mlp") -> CaptureKey:
+def _key(req_id: str = "req-1", layer: int = 3, hook: str = "post_block") -> CaptureKey:
     return (VllmInternalRequestId(req_id), layer, hook)
 
 
@@ -165,7 +165,7 @@ def test_multiple_keys_finalize_independently():
     adapter = _BatchedAdapter(consumer)
 
     key_a = _key("req-a", layer=1, hook="pre_attn")
-    key_b = _key("req-b", layer=5, hook="post_mlp")
+    key_b = _key("req-b", layer=5, hook="post_block")
 
     adapter.submit_chunk(
         CaptureChunk(
@@ -330,7 +330,7 @@ def __init__(self) -> None:
         self.sums: dict[CaptureKey, float] = {}
 
     def global_capture_spec(self) -> CaptureSpec | None:
-        return CaptureSpec(hooks={"post_mlp": [0]}, positions="last_prompt")
+        return CaptureSpec(hooks={"post_block": [0]}, positions="last_prompt")
 
     def on_capture(
         self,
@@ -347,7 +347,7 @@ def test_hello_world_consumer_through_batched_adapter():
 
     spec = consumer.global_capture_spec()
     assert spec is not None
-    assert spec.hooks == {"post_mlp": [0]}
+    assert spec.hooks == {"post_block": [0]}
     assert spec.positions == "last_prompt"
 
     key = _key()
diff --git a/tests/v1/capture/test_driver_bridge.py b/tests/v1/capture/test_driver_bridge.py
index 6b468a140ef6..7a15f08af024 100644
--- a/tests/v1/capture/test_driver_bridge.py
+++ b/tests/v1/capture/test_driver_bridge.py
@@ -40,7 +40,7 @@
 # ---------------------------------------------------------------------------
 
 
-def _key(req_id: str = "req-1", layer: int = 3, hook: str = "post_mlp") -> CaptureKey:
+def _key(req_id: str = "req-1", layer: int = 3, hook: str = "post_block") -> CaptureKey:
     return (VllmInternalRequestId(req_id), layer, hook)
 
 
@@ -158,7 +158,7 @@ def test_multiple_keys_finalize_independently():
     shim = _DriverQueueShim(event_q, result_q, timeout=5.0)
 
     key_a = _key("req-a", layer=1, hook="pre_attn")
-    key_b = _key("req-b", layer=5, hook="post_mlp")
+    key_b = _key("req-b", layer=5, hook="post_block")
 
     shim.submit_chunk(
         CaptureChunk(
diff --git a/tests/v1/capture/test_manager.py b/tests/v1/capture/test_manager.py
index 251c587058ef..5ab123c6ca69 100644
--- a/tests/v1/capture/test_manager.py
+++ b/tests/v1/capture/test_manager.py
@@ -60,7 +60,7 @@ def _make_manager(
     if specs is None:
         specs = (
             CaptureSpec(
-                hooks={"post_mlp": [0, 1]},
+                hooks={"post_block": [0, 1]},
                 positions="last_prompt",
             ),
         ) * len(sinks)
@@ -134,10 +134,10 @@ def test_register_build_dispatch_finalize(self):
         )
         plan = mgr.build_step_plan(view)
 
-        # The spec asks for post_mlp at layers [0, 1] and "last_prompt"
+        # The spec asks for post_block at layers [0, 1] and "last_prompt"
         # which is position 9 for a 10-token prompt.
-        assert (0, "post_mlp") in plan.gather_indices
-        assert (1, "post_mlp") in plan.gather_indices
+        assert (0, "post_block") in plan.gather_indices
+        assert (1, "post_block") in plan.gather_indices
         assert len(plan.entries) == 2  # one entry per layer
 
         for entry in plan.entries:
@@ -190,7 +190,7 @@ def _make_buffer_manager(
         sinks = (_make_sink(),)
     if specs is None:
         specs = (
-            CaptureSpec(hooks={"post_mlp": [0, 1]}, positions="last_prompt"),
+            CaptureSpec(hooks={"post_block": [0, 1]}, positions="last_prompt"),
         ) * len(sinks)
     mgr = CaptureManager(
         consumers=sinks,
@@ -206,7 +206,7 @@ def _make_buffer_manager(
 class TestGlobalSpecBufferPath:
     def test_buffers_allocated_for_global_keys(self):
         mgr, _ = _make_buffer_manager(max_num_tokens=16)
-        assert mgr._global_keys == frozenset({(0, "post_mlp"), (1, "post_mlp")})
+        assert mgr._global_keys == frozenset({(0, "post_block"), (1, "post_block")})
         for key in mgr._global_keys:
             buf = mgr._global_buffers[key]
             assert buf.shape == (16, HIDDEN_SIZE)
@@ -235,10 +235,10 @@ def test_build_step_plan_routes_global_keys_to_global_gather(self):
         plan = mgr.build_step_plan(view)
         # Global keys take the buffer path, not the dynamic in-hook gather.
         assert plan.gather_indices == {}
-        assert (0, "post_mlp") in plan.global_gather_indices
-        assert (1, "post_mlp") in plan.global_gather_indices
+        assert (0, "post_block") in plan.global_gather_indices
+        assert (1, "post_block") in plan.global_gather_indices
         # last_prompt of a 10-token prompt is absolute row 9.
-        assert plan.global_gather_indices[(0, "post_mlp")].tolist() == [9]
+        assert plan.global_gather_indices[(0, "post_block")].tolist() == [9]
         assert len(plan.entries) == 2
 
     def test_on_hook_copies_full_residual_into_buffer(self):
@@ -255,13 +255,13 @@ def test_on_hook_copies_full_residual_into_buffer(self):
         hidden = torch.arange(10 * HIDDEN_SIZE, dtype=MODEL_DTYPE).reshape(
             10, HIDDEN_SIZE
         )
-        mgr.on_hook(0, "post_mlp", hidden)
-        buf = mgr._global_buffers[(0, "post_mlp")]
+        mgr.on_hook(0, "post_block", hidden)
+        buf = mgr._global_buffers[(0, "post_block")]
         # The full residual is copied (fixed-shape, graph-safe), not gathered.
         torch.testing.assert_close(buf[:10], hidden)
         # on_hook must not populate scratch for global keys (host does that
         # post-forward in _materialize_global_keys).
-        assert (0, "post_mlp") not in mgr._step_plan.scratch_gpu
+        assert (0, "post_block") not in mgr._step_plan.scratch_gpu
 
     def test_materialize_dispatch_finalize_via_buffer(self):
         mgr, (sink,) = _make_buffer_manager(max_num_tokens=16)
@@ -279,8 +279,8 @@ def test_materialize_dispatch_finalize_via_buffer(self):
         hidden = torch.arange(10 * HIDDEN_SIZE, dtype=MODEL_DTYPE).reshape(
             10, HIDDEN_SIZE
         )
-        mgr.on_hook(0, "post_mlp", hidden)
-        mgr.on_hook(1, "post_mlp", hidden + 1000.0)
+        mgr.on_hook(0, "post_block", hidden)
+        mgr.on_hook(1, "post_block", hidden + 1000.0)
 
         mgr.dispatch_step_captures(plan)
         mgr._drain_dispatch_queue()
@@ -308,7 +308,7 @@ def test_global_and_client_keys_coexist(self):
         mgr, _ = _make_buffer_manager(
             sinks=(global_sink, client_sink),
             specs=(
-                CaptureSpec(hooks={"post_mlp": [0]}, positions="last_prompt"),
+                CaptureSpec(hooks={"post_block": [0]}, positions="last_prompt"),
                 None,  # consumer 1 has no global spec — client-driven
             ),
             max_num_tokens=16,
@@ -323,15 +323,15 @@ def test_global_and_client_keys_coexist(self):
         )
         plan = mgr.build_step_plan(view)
         # Global key on the buffer path; client key on the dynamic path.
-        assert (0, "post_mlp") in plan.global_gather_indices
+        assert (0, "post_block") in plan.global_gather_indices
         assert (2, "pre_attn") in plan.gather_indices
-        assert (0, "post_mlp") not in plan.gather_indices
+        assert (0, "post_block") not in plan.gather_indices
 
         hidden = torch.arange(10 * HIDDEN_SIZE, dtype=MODEL_DTYPE).reshape(
             10, HIDDEN_SIZE
         )
         # Global key: full-residual copy. Client key: dynamic gather (eager).
-        mgr.on_hook(0, "post_mlp", hidden)
+        mgr.on_hook(0, "post_block", hidden)
         mgr.on_hook(2, "pre_attn", hidden + 1000.0)
         # The client key's scratch was populated by the dynamic gather.
         assert (2, "pre_attn") in plan.scratch_gpu
@@ -353,7 +353,7 @@ def test_union_gather_both_dispatched(self):
         sink0 = _make_sink("sink0")
         sink1 = _make_sink("sink1")
         spec = CaptureSpec(
-            hooks={"post_mlp": [0]},
+            hooks={"post_block": [0]},
             positions="last_prompt",
         )
 
@@ -371,7 +371,7 @@ def test_union_gather_both_dispatched(self):
         )
         plan = mgr.build_step_plan(view)
 
-        # Union: only one entry for (layer=0, post_mlp, pos=9), but the
+        # Union: only one entry for (layer=0, post_block, pos=9), but the
         # consumer_mask should have bits 0 and 1 set.
         assert len(plan.entries) == 1
         entry = plan.entries[0]
@@ -388,8 +388,8 @@ def test_union_gather_both_dispatched(self):
     def test_different_layers_produce_separate_entries(self):
         sink0 = _make_sink("sink0")
         sink1 = _make_sink("sink1")
-        spec0 = CaptureSpec(hooks={"post_mlp": [0]}, positions="last_prompt")
-        spec1 = CaptureSpec(hooks={"post_mlp": [1]}, positions="last_prompt")
+        spec0 = CaptureSpec(hooks={"post_block": [0]}, positions="last_prompt")
+        spec1 = CaptureSpec(hooks={"post_block": [1]}, positions="last_prompt")
 
         mgr, _ = _make_manager(
             sinks=(sink0, sink1),
@@ -405,8 +405,8 @@ def test_different_layers_produce_separate_entries(self):
         )
         plan = mgr.build_step_plan(view)
 
-        # Two entries: (layer=0, post_mlp) for consumer 0,
-        # (layer=1, post_mlp) for consumer 1.
+        # Two entries: (layer=0, post_block) for consumer 0,
+        # (layer=1, post_block) for consumer 1.
         assert len(plan.entries) == 2
         masks = {e.layer: e.consumer_mask for e in plan.entries}
         assert masks[0] == 0b01  # only consumer 0
@@ -421,7 +421,7 @@ def test_different_layers_produce_separate_entries(self):
 class TestPerRequestClientSpec:
     def test_client_spec_overrides_global(self):
         sink = _make_sink()
-        global_spec = CaptureSpec(hooks={"post_mlp": [0]}, positions="last_prompt")
+        global_spec = CaptureSpec(hooks={"post_block": [0]}, positions="last_prompt")
         client_spec = CaptureSpec(hooks={"pre_attn": [2]}, positions="all_prompt")
         mgr, _ = _make_manager(sinks=(sink,), specs=(global_spec,))
 
@@ -438,20 +438,20 @@ def test_client_spec_overrides_global(self):
 
         # Should use client spec: pre_attn at layer 2, all_prompt = [0..4].
         assert (2, "pre_attn") in plan.gather_indices
-        assert (0, "post_mlp") not in plan.gather_indices
+        assert (0, "post_block") not in plan.gather_indices
         assert len(plan.entries) == 5
 
     def test_client_spec_for_specific_consumer_only(self):
         """Only consumer 1 gets a client spec; consumer 0 uses global."""
         sink0 = _make_sink("sink0")
         sink1 = _make_sink("sink1")
-        global0 = CaptureSpec(hooks={"post_mlp": [0]}, positions="last_prompt")
+        global0 = CaptureSpec(hooks={"post_block": [0]}, positions="last_prompt")
         mgr, _ = _make_manager(
             sinks=(sink0, sink1),
             specs=(global0, None),
         )
 
-        client1 = CaptureSpec(hooks={"post_mlp": [0]}, positions="all_prompt")
+        client1 = CaptureSpec(hooks={"post_block": [0]}, positions="all_prompt")
         mgr.register_request(
             "r1",
             client_specs={1: client1},
@@ -467,8 +467,8 @@ def test_client_spec_for_specific_consumer_only(self):
         plan = mgr.build_step_plan(view)
 
         # Consumer 0 wants position 4 (last_prompt), consumer 1 wants [0..4].
-        # The union at (layer=0, post_mlp) should be [0, 1, 2, 3, 4].
-        assert (0, "post_mlp") in plan.gather_indices
+        # The union at (layer=0, post_block) should be [0, 1, 2, 3, 4].
+        assert (0, "post_block") in plan.gather_indices
         assert len(plan.entries) == 5  # positions 0,1,2,3,4
 
         # Position 4 should have both consumers' bits.
@@ -493,7 +493,7 @@ def test_failing_submit_chunk_does_not_block_other_consumer(self):
         sink1 = _make_sink("sink1")
         sink0.submit_chunk.side_effect = RuntimeError("sink0 exploded")
 
-        spec = CaptureSpec(hooks={"post_mlp": [0]}, positions="last_prompt")
+        spec = CaptureSpec(hooks={"post_block": [0]}, positions="last_prompt")
         mgr, _ = _make_manager(
             sinks=(sink0, sink1),
             specs=(spec, spec),
@@ -522,7 +522,7 @@ def test_failing_submit_finalize_does_not_block_other_consumer(self):
         sink1 = _make_sink("sink1")
         sink0.submit_finalize.side_effect = RuntimeError("finalize boom")
 
-        spec = CaptureSpec(hooks={"post_mlp": [0]}, positions="last_prompt")
+        spec = CaptureSpec(hooks={"post_block": [0]}, positions="last_prompt")
         mgr, _ = _make_manager(
             sinks=(sink0, sink1),
             specs=(spec, spec),
@@ -568,10 +568,10 @@ class TestFinalizeResults:
     def test_returns_dict_keyed_by_consumer_index(self):
         sink0 = _make_sink("sink0")
         sink1 = _make_sink("sink1")
-        spec = CaptureSpec(hooks={"post_mlp": [0]}, positions="last_prompt")
+        spec = CaptureSpec(hooks={"post_block": [0]}, positions="last_prompt")
 
         # Make sink0 return a specific result.
-        expected_key = (VllmInternalRequestId("r1"), 0, "post_mlp")
+        expected_key = (VllmInternalRequestId("r1"), 0, "post_block")
         sink0.wait_for_result.return_value = CaptureResult(
             key=expected_key, status="ok", payload={"path": "/tmp/test"}
         )
@@ -600,11 +600,11 @@ def test_finalize_unknown_request_returns_empty(self):
     def test_finalize_aggregates_all_keys_and_preserves_payloads(self):
         sink = _make_sink("sink0")
         spec = CaptureSpec(
-            hooks={"post_mlp": [0, 1]},
+            hooks={"post_block": [0, 1]},
             positions="last_prompt",
         )
-        key0 = (VllmInternalRequestId("r1"), 0, "post_mlp")
-        key1 = (VllmInternalRequestId("r1"), 1, "post_mlp")
+        key0 = (VllmInternalRequestId("r1"), 0, "post_block")
+        key1 = (VllmInternalRequestId("r1"), 1, "post_block")
         payload0 = {"path": "/tmp/layer0"}
         payload1 = {"path": "/tmp/layer1"}
 
@@ -636,11 +636,11 @@ def _wait_for_result(key: CaptureKey, timeout: float) -> CaptureResult:
     def test_finalize_uses_worst_key_result(self):
         sink = _make_sink("sink0")
         spec = CaptureSpec(
-            hooks={"post_mlp": [0, 1]},
+            hooks={"post_block": [0, 1]},
             positions="last_prompt",
         )
-        key0 = (VllmInternalRequestId("r1"), 0, "post_mlp")
-        key1 = (VllmInternalRequestId("r1"), 1, "post_mlp")
+        key0 = (VllmInternalRequestId("r1"), 0, "post_block")
+        key1 = (VllmInternalRequestId("r1"), 1, "post_block")
 
         def _wait_for_result(key: CaptureKey, timeout: float) -> CaptureResult:
             if key == key0:
@@ -672,8 +672,8 @@ def _wait_for_result(key: CaptureKey, timeout: float) -> CaptureResult:
 
     def test_finalize_timeout_becomes_error(self):
         sink = _make_sink("sink0")
-        spec = CaptureSpec(hooks={"post_mlp": [0]}, positions="last_prompt")
-        key = (VllmInternalRequestId("r1"), 0, "post_mlp")
+        spec = CaptureSpec(hooks={"post_block": [0]}, positions="last_prompt")
+        key = (VllmInternalRequestId("r1"), 0, "post_block")
         sink.wait_for_result.return_value = None
 
         mgr, _ = _make_manager(sinks=(sink,), specs=(spec,))
@@ -688,9 +688,9 @@ def test_finalize_timeout_becomes_error(self):
 
 class TestAggregateCaptureResults:
     def test_prefers_error_over_partial_error_over_ok(self):
-        key_ok = (VllmInternalRequestId("r1"), 0, "post_mlp")
-        key_partial = (VllmInternalRequestId("r1"), 1, "post_mlp")
-        key_error = (VllmInternalRequestId("r1"), 2, "post_mlp")
+        key_ok = (VllmInternalRequestId("r1"), 0, "post_block")
+        key_partial = (VllmInternalRequestId("r1"), 1, "post_block")
+        key_error = (VllmInternalRequestId("r1"), 2, "post_block")
 
         result = _aggregate_capture_results(
             [
@@ -720,7 +720,7 @@ def test_prefers_error_over_partial_error_over_ok(self):
         }
 
     def test_single_result_preserves_payload_shape(self):
-        key = (VllmInternalRequestId("r1"), 0, "post_mlp")
+        key = (VllmInternalRequestId("r1"), 0, "post_block")
         payload = ["/tmp/capture.bin"]
 
         result = _aggregate_capture_results(
@@ -767,7 +767,7 @@ def test_finalize_after_unregister_returns_empty(self):
 
 class TestPositionExpansion:
     def test_last_prompt(self):
-        spec = CaptureSpec(hooks={"post_mlp": [0]}, positions="last_prompt")
+        spec = CaptureSpec(hooks={"post_block": [0]}, positions="last_prompt")
         mgr, _ = _make_manager(specs=(spec,))
         mgr.register_request("r1", client_specs=None, num_prompt_tokens=10)
 
@@ -782,7 +782,7 @@ def test_last_prompt(self):
         assert positions == [9]
 
     def test_all_prompt(self):
-        spec = CaptureSpec(hooks={"post_mlp": [0]}, positions="all_prompt")
+        spec = CaptureSpec(hooks={"post_block": [0]}, positions="all_prompt")
         mgr, _ = _make_manager(specs=(spec,))
         mgr.register_request("r1", client_specs=None, num_prompt_tokens=5)
 
@@ -797,7 +797,7 @@ def test_all_prompt(self):
         assert positions == [0, 1, 2, 3, 4]
 
     def test_all_generated(self):
-        spec = CaptureSpec(hooks={"post_mlp": [0]}, positions="all_generated")
+        spec = CaptureSpec(hooks={"post_block": [0]}, positions="all_generated")
         mgr, _ = _make_manager(specs=(spec,))
         mgr.register_request("r1", client_specs=None, num_prompt_tokens=5)
 
@@ -824,7 +824,7 @@ def test_all_generated(self):
         assert positions == [5]
 
     def test_all(self):
-        spec = CaptureSpec(hooks={"post_mlp": [0]}, positions="all")
+        spec = CaptureSpec(hooks={"post_block": [0]}, positions="all")
         mgr, _ = _make_manager(specs=(spec,))
         mgr.register_request("r1", client_specs=None, num_prompt_tokens=3)
 
@@ -851,7 +851,7 @@ def test_all(self):
         assert positions == [3]
 
     def test_explicit_list(self):
-        spec = CaptureSpec(hooks={"post_mlp": [0]}, positions=[2, 7])
+        spec = CaptureSpec(hooks={"post_block": [0]}, positions=[2, 7])
         mgr, _ = _make_manager(specs=(spec,))
         mgr.register_request("r1", client_specs=None, num_prompt_tokens=10)
 
@@ -874,7 +874,7 @@ def test_explicit_list(self):
 class TestStepWindowIntersection:
     def test_positions_outside_window_excluded(self):
         """Explicit list with some positions outside the current window."""
-        spec = CaptureSpec(hooks={"post_mlp": [0]}, positions=[0, 5, 9])
+        spec = CaptureSpec(hooks={"post_block": [0]}, positions=[0, 5, 9])
         mgr, _ = _make_manager(specs=(spec,))
         mgr.register_request("r1", client_specs=None, num_prompt_tokens=10)
 
@@ -891,7 +891,7 @@ def test_positions_outside_window_excluded(self):
 
     def test_all_prompt_only_captures_scheduled_window(self):
         """all_prompt is [0..9] but window [0, 3) only captures 0,1,2."""
-        spec = CaptureSpec(hooks={"post_mlp": [0]}, positions="all_prompt")
+        spec = CaptureSpec(hooks={"post_block": [0]}, positions="all_prompt")
         mgr, _ = _make_manager(specs=(spec,))
         mgr.register_request("r1", client_specs=None, num_prompt_tokens=10)
 
@@ -907,7 +907,7 @@ def test_all_prompt_only_captures_scheduled_window(self):
 
     def test_decode_step_window(self):
         """During decode, the window is [N, N+1) for one token."""
-        spec = CaptureSpec(hooks={"post_mlp": [0]}, positions="all")
+        spec = CaptureSpec(hooks={"post_block": [0]}, positions="all")
         mgr, _ = _make_manager(specs=(spec,))
         mgr.register_request("r1", client_specs=None, num_prompt_tokens=5)
 
@@ -1036,7 +1036,7 @@ def test_client_spec_out_of_range_raises(self):
             mgr.register_request(
                 "r1",
                 client_specs={
-                    99: CaptureSpec(hooks={"post_mlp": [0]}, positions="last_prompt")
+                    99: CaptureSpec(hooks={"post_block": [0]}, positions="last_prompt")
                 },
                 num_prompt_tokens=10,
             )
@@ -1047,7 +1047,7 @@ def test_layer_out_of_range_raises(self):
             mgr.register_request(
                 "r1",
                 client_specs={
-                    0: CaptureSpec(hooks={"post_mlp": [999]}, positions="last_prompt")
+                    0: CaptureSpec(hooks={"post_block": [999]}, positions="last_prompt")
                 },
                 num_prompt_tokens=10,
             )
@@ -1094,7 +1094,7 @@ def _captured_layers(mgr: CaptureManager) -> set[int]:
 
 class TestLocalLayerRangeFiltering:
     def test_first_stage_keeps_only_its_layers(self):
-        spec = CaptureSpec(hooks={"post_mlp": [2, 6]}, positions="last_prompt")
+        spec = CaptureSpec(hooks={"post_block": [2, 6]}, positions="last_prompt")
         mgr, sink = _make_pp_manager((0, 4), spec)
         assert _captured_layers(mgr) == {2}
         # Finalize touches only the in-range layer (layer 2), not layer 6.
@@ -1105,17 +1105,17 @@ def test_first_stage_keeps_only_its_layers(self):
         assert finalized_layers == {2}
 
     def test_second_stage_keeps_only_its_layers(self):
-        spec = CaptureSpec(hooks={"post_mlp": [2, 6]}, positions="last_prompt")
+        spec = CaptureSpec(hooks={"post_block": [2, 6]}, positions="last_prompt")
         mgr, _ = _make_pp_manager((4, 8), spec)
         assert _captured_layers(mgr) == {6}
 
     def test_none_range_keeps_all_layers(self):
-        spec = CaptureSpec(hooks={"post_mlp": [2, 6]}, positions="last_prompt")
+        spec = CaptureSpec(hooks={"post_block": [2, 6]}, positions="last_prompt")
         mgr, _ = _make_pp_manager(None, spec)
         assert _captured_layers(mgr) == {2, 6}
 
     def test_all_layers_out_of_local_range_inactive(self):
-        spec = CaptureSpec(hooks={"post_mlp": [6, 7]}, positions="last_prompt")
+        spec = CaptureSpec(hooks={"post_block": [6, 7]}, positions="last_prompt")
         mgr, _ = _make_pp_manager((0, 4), spec)
         mgr.register_request("r1", client_specs=None, num_prompt_tokens=10)
         # No requested layer lives on this stage → request not registered.
@@ -1125,7 +1125,7 @@ def test_all_layers_out_of_local_range_inactive(self):
     def test_out_of_global_range_still_raises_per_stage(self):
         # A genuinely out-of-range layer is rejected even though it is also
         # outside this stage's local slice.
-        spec = CaptureSpec(hooks={"post_mlp": [100]}, positions="last_prompt")
+        spec = CaptureSpec(hooks={"post_block": [100]}, positions="last_prompt")
         mgr, _ = _make_pp_manager((0, 4), spec)
         with pytest.raises(ValueError, match="out of range"):
             mgr.register_request("r1", client_specs=None, num_prompt_tokens=10)
@@ -1133,7 +1133,7 @@ def test_out_of_global_range_still_raises_per_stage(self):
     def test_partial_hook_layers_filtered(self):
         # Multiple hooks, each split across the stage boundary.
         spec = CaptureSpec(
-            hooks={"post_mlp": [1, 5], "post_attn": [3, 7]},
+            hooks={"post_block": [1, 5], "post_attn": [3, 7]},
             positions="last_prompt",
         )
         mgr, _ = _make_pp_manager((0, 4), spec)
@@ -1145,11 +1145,11 @@ def test_partial_hook_layers_filtered(self):
             num_scheduled_tokens=[10],
         )
         plan = mgr.build_step_plan(view)
-        assert set(plan.gather_indices) == {(1, "post_mlp"), (3, "post_attn")}
+        assert set(plan.gather_indices) == {(1, "post_block"), (3, "post_attn")}
 
     @pytest.mark.parametrize("bad_range", [(-1, 4), (4, 2), (0, 9)])
     def test_invalid_local_range_rejected(self, bad_range):
-        spec = CaptureSpec(hooks={"post_mlp": [0]}, positions="last_prompt")
+        spec = CaptureSpec(hooks={"post_block": [0]}, positions="last_prompt")
         with pytest.raises(ValueError, match="local_layer_range"):
             _make_pp_manager(bad_range, spec)
 
@@ -1161,7 +1161,7 @@ def test_invalid_local_range_rejected(self, bad_range):
 
 def _result(req: str, layer: int, status: str = "ok", payload=None) -> CaptureResult:
     return CaptureResult(
-        key=(VllmInternalRequestId(req), layer, "post_mlp"),
+        key=(VllmInternalRequestId(req), layer, "post_block"),
         status=status,
         payload=payload,
     )
@@ -1237,8 +1237,8 @@ def test_none_target_is_noop(self):
 class TestFinalizeAsync:
     def test_callback_receives_aggregated_results(self):
         sink = _make_sink("sink0")
-        spec = CaptureSpec(hooks={"post_mlp": [0]}, positions="last_prompt")
-        key = (VllmInternalRequestId("r1"), 0, "post_mlp")
+        spec = CaptureSpec(hooks={"post_block": [0]}, positions="last_prompt")
+        key = (VllmInternalRequestId("r1"), 0, "post_block")
         sink.wait_for_result.return_value = CaptureResult(
             key=key, status="ok", payload={"path": "/tmp/x"}
         )
@@ -1270,8 +1270,8 @@ def test_does_not_block_the_caller(self):
         # The caller (model-runner step thread) must return before the
         # blocking wait_for_result completes.
         sink = _make_sink("sink0")
-        spec = CaptureSpec(hooks={"post_mlp": [0]}, positions="last_prompt")
-        key = (VllmInternalRequestId("r1"), 0, "post_mlp")
+        spec = CaptureSpec(hooks={"post_block": [0]}, positions="last_prompt")
+        key = (VllmInternalRequestId("r1"), 0, "post_block")
 
         entered = threading.Event()
         release = threading.Event()
diff --git a/tests/v1/capture/test_multi_consumer_runner.py b/tests/v1/capture/test_multi_consumer_runner.py
index 63a1a29600d2..1429b21e5c29 100644
--- a/tests/v1/capture/test_multi_consumer_runner.py
+++ b/tests/v1/capture/test_multi_consumer_runner.py
@@ -82,7 +82,7 @@ def test_two_consumers_both_see_captures(tmp_path: pathlib.Path) -> None:
     # is what ``build_consumers`` does for ``CaptureConsumer`` subclasses.
     recording = _RecordingConsumer(
         _FakeVllmConfig(),
-        params={"hooks": {"post_mlp": [1]}, "positions": "last_prompt"},
+        params={"hooks": {"post_block": [1]}, "positions": "last_prompt"},
     )
     recording_sink = _BatchedAdapter(recording)
 
@@ -109,7 +109,7 @@ def test_two_consumers_both_see_captures(tmp_path: pathlib.Path) -> None:
 
     req_id = "req-multi-1"
     fs_client_spec = CaptureSpec(
-        hooks={"post_mlp": [1]},
+        hooks={"post_block": [1]},
         positions="last_prompt",
     )
 
@@ -136,7 +136,7 @@ def test_two_consumers_both_see_captures(tmp_path: pathlib.Path) -> None:
     )
     plan = mgr.build_step_plan(batch_view)
     hidden = torch.arange(32, dtype=torch.float32).reshape(4, 8)
-    mgr.on_hook(1, "post_mlp", hidden)
+    mgr.on_hook(1, "post_block", hidden)
     mgr.dispatch_step_captures(plan)
 
     # Finalize — indexed by consumer index.
@@ -152,12 +152,12 @@ def test_two_consumers_both_see_captures(tmp_path: pathlib.Path) -> None:
 
     # Give filesystem writer time to flush before asserting the on-disk
     # result status.
-    _wait_for_filesystem_result(fs_consumer, (req_id, 1, "post_mlp"))
+    _wait_for_filesystem_result(fs_consumer, (req_id, 1, "post_block"))
 
     # Recording consumer received the capture via ``on_capture``.
     assert len(recording.captured) == 1
     rec_key, rec_shape = recording.captured[0]
-    assert rec_key == (VllmInternalRequestId(req_id), 1, "post_mlp")
+    assert rec_key == (VllmInternalRequestId(req_id), 1, "post_block")
     # "last_prompt" at num_prompt_tokens=4 → one row, hidden_size=8.
     assert rec_shape == (1, 8)
 
@@ -174,7 +174,7 @@ class _FailingConsumer(CaptureConsumer):
 
         def global_capture_spec(self) -> CaptureSpec:
             return CaptureSpec(
-                hooks={"post_mlp": [0]},
+                hooks={"post_block": [0]},
                 positions="last_prompt",
             )
 
@@ -184,7 +184,7 @@ def on_capture(self, key, tensor, sidecar):
     failing = _FailingConsumer(_FakeVllmConfig(), params={})
     recording = _RecordingConsumer(
         _FakeVllmConfig(),
-        params={"hooks": {"post_mlp": [0]}, "positions": "last_prompt"},
+        params={"hooks": {"post_block": [0]}, "positions": "last_prompt"},
     )
     failing_sink = _BatchedAdapter(failing)
     recording_sink = _BatchedAdapter(recording)
@@ -216,7 +216,7 @@ def on_capture(self, key, tensor, sidecar):
     )
     plan = mgr.build_step_plan(batch_view)
     hidden = torch.zeros((2, 4), dtype=torch.float32)
-    mgr.on_hook(0, "post_mlp", hidden)
+    mgr.on_hook(0, "post_block", hidden)
     mgr.dispatch_step_captures(plan)
 
     indexed = mgr.finalize_request("req-isolated")
diff --git a/tests/v1/capture/test_plan.py b/tests/v1/capture/test_plan.py
index a2ed004f5cb4..7ecab1845c35 100644
--- a/tests/v1/capture/test_plan.py
+++ b/tests/v1/capture/test_plan.py
@@ -66,7 +66,7 @@ def test_single_consumer_mask(self):
         entry = CapturePositionEntry(
             request_id="r1",
             layer=0,
-            hook="post_mlp",
+            hook="post_block",
             logical_pos=9,
             scratch_row=0,
             step_index=0,
@@ -79,7 +79,7 @@ def test_multi_consumer_mask(self):
         entry = CapturePositionEntry(
             request_id="r1",
             layer=0,
-            hook="post_mlp",
+            hook="post_block",
             logical_pos=9,
             scratch_row=0,
             step_index=0,
@@ -111,7 +111,7 @@ def test_consumer_mask_zero_means_no_consumer(self):
         entry = CapturePositionEntry(
             request_id="r1",
             layer=0,
-            hook="post_mlp",
+            hook="post_block",
             logical_pos=0,
             scratch_row=0,
             step_index=0,
@@ -131,14 +131,14 @@ def test_gather_indices_dtype_and_shape(self):
         indices = torch.tensor([0, 3, 7], dtype=torch.int64)
         scratch = torch.empty((3, 16), dtype=torch.float32)
         plan = StepCapturePlan(
-            gather_indices={(0, "post_mlp"): indices},
-            scratch_gpu={(0, "post_mlp"): scratch},
-            scratch_dtype={(0, "post_mlp"): torch.float32},
+            gather_indices={(0, "post_block"): indices},
+            scratch_gpu={(0, "post_block"): scratch},
+            scratch_dtype={(0, "post_block"): torch.float32},
             entries=[],
         )
-        assert plan.gather_indices[(0, "post_mlp")].dtype == torch.int64
-        assert plan.gather_indices[(0, "post_mlp")].shape == (3,)
-        assert plan.scratch_gpu[(0, "post_mlp")].shape == (3, 16)
+        assert plan.gather_indices[(0, "post_block")].dtype == torch.int64
+        assert plan.gather_indices[(0, "post_block")].shape == (3,)
+        assert plan.scratch_gpu[(0, "post_block")].shape == (3, 16)
 
     def test_empty_plan(self):
         plan = StepCapturePlan(
@@ -155,23 +155,23 @@ def test_multiple_layer_hook_pairs(self):
         plan = StepCapturePlan(
             gather_indices={
                 (0, "pre_attn"): torch.tensor([0], dtype=torch.int64),
-                (0, "post_mlp"): torch.tensor([0, 1], dtype=torch.int64),
-                (1, "post_mlp"): torch.tensor([2], dtype=torch.int64),
+                (0, "post_block"): torch.tensor([0, 1], dtype=torch.int64),
+                (1, "post_block"): torch.tensor([2], dtype=torch.int64),
             },
             scratch_gpu={
                 (0, "pre_attn"): torch.empty((1, 8)),
-                (0, "post_mlp"): torch.empty((2, 8)),
-                (1, "post_mlp"): torch.empty((1, 8)),
+                (0, "post_block"): torch.empty((2, 8)),
+                (1, "post_block"): torch.empty((1, 8)),
             },
             scratch_dtype={
                 (0, "pre_attn"): torch.float32,
-                (0, "post_mlp"): torch.float32,
-                (1, "post_mlp"): torch.float32,
+                (0, "post_block"): torch.float32,
+                (1, "post_block"): torch.float32,
             },
             entries=[],
         )
         assert len(plan.gather_indices) == 3
-        assert plan.scratch_gpu[(0, "post_mlp")].shape[0] == 2
+        assert plan.scratch_gpu[(0, "post_block")].shape[0] == 2
 
     def test_request_errors_default_empty(self):
         plan = StepCapturePlan(
diff --git a/tests/v1/capture/test_runner_integration.py b/tests/v1/capture/test_runner_integration.py
index 4eecf133be96..3f4c657332ac 100644
--- a/tests/v1/capture/test_runner_integration.py
+++ b/tests/v1/capture/test_runner_integration.py
@@ -101,7 +101,7 @@ def test_filesystem_consumer_end_to_end_via_manager(tmp_path: pathlib.Path) -> N
     # ``_register_capture_request`` resolves via
     # ``validate_client_spec``.
     client_spec = CaptureSpec(
-        hooks={"post_mlp": [1]},
+        hooks={"post_block": [1]},
         positions="last_prompt",
     )
 
@@ -127,10 +127,10 @@ def test_filesystem_consumer_end_to_end_via_manager(tmp_path: pathlib.Path) -> N
     )
     plan = mgr.build_step_plan(batch_view)
 
-    # Simulate ``on_hook`` firing: for the single (layer=1, hook=post_mlp)
+    # Simulate ``on_hook`` firing: for the single (layer=1, hook=post_block)
     # key, populate the scratch with a known tensor.
     hidden = torch.arange(24, dtype=torch.float32).reshape(3, 8)
-    mgr.on_hook(1, "post_mlp", hidden)
+    mgr.on_hook(1, "post_block", hidden)
 
     # Drain.
     mgr.dispatch_step_captures(plan)
@@ -139,11 +139,11 @@ def test_filesystem_consumer_end_to_end_via_manager(tmp_path: pathlib.Path) -> N
     assert list(results.keys()) == [0]
 
     # Wait for the writer pool to flush.
-    _wait_for_status(consumer, (req_id, 1, "post_mlp"))
+    _wait_for_status(consumer, (req_id, 1, "post_block"))
     consumer.shutdown()
 
     # Verify the expected file exists under the consumer's layout.
-    bin_path = tmp_path / "default" / req_id / "1_post_mlp.bin"
+    bin_path = tmp_path / "default" / req_id / "1_post_block.bin"
     sidecar_path = bin_path.with_suffix(".json")
     assert bin_path.exists(), f"missing bin file {bin_path}"
     assert sidecar_path.exists(), f"missing sidecar {sidecar_path}"
@@ -152,7 +152,7 @@ def test_filesystem_consumer_end_to_end_via_manager(tmp_path: pathlib.Path) -> N
     sidecar = json.loads(sidecar_path.read_text())
     assert sidecar["request_id"] == req_id
     assert sidecar["layer"] == 1
-    assert sidecar["hook"] == "post_mlp"
+    assert sidecar["hook"] == "post_block"
 
 
 # ---------------------------------------------------------------------------
@@ -176,7 +176,7 @@ def test_manager_admission_error_yields_error_result() -> None:
 
     mgr = CaptureManager(
         consumers=(sink,),
-        consumer_specs=(CaptureSpec(hooks={"post_mlp": [0]}, positions="last_prompt"),),
+        consumer_specs=(CaptureSpec(hooks={"post_block": [0]}, positions="last_prompt"),),
         num_hidden_layers=2,
         hidden_size=4,
         model_dtype=torch.float32,
@@ -230,7 +230,7 @@ def test_filesystem_consumer_byte_for_byte_matches_writer(
     )
 
     tensor = torch.arange(16, dtype=torch.float32).reshape(2, 8)
-    key = (VllmInternalRequestId("req-gold"), 3, "post_mlp")
+    key = (VllmInternalRequestId("req-gold"), 3, "post_block")
 
     consumer.submit_chunk(
         CaptureChunk(
@@ -253,10 +253,10 @@ def test_filesystem_consumer_byte_for_byte_matches_writer(
             },
         )
     )
-    _wait_for_status(consumer, ("req-gold", 3, "post_mlp"))
+    _wait_for_status(consumer, ("req-gold", 3, "post_block"))
     consumer.shutdown()
 
-    consumer_bin = consumer_root / "gold" / "req-gold" / "3_post_mlp.bin"
+    consumer_bin = consumer_root / "gold" / "req-gold" / "3_post_block.bin"
     assert consumer_bin.exists()
     consumer_bytes = consumer_bin.read_bytes()
 
@@ -265,14 +265,14 @@ def test_filesystem_consumer_byte_for_byte_matches_writer(
     writer_root.mkdir()
     writer = ActivationWriter(writer_root, num_threads=1)
     try:
-        writer_bin = writer_root / "gold" / "req-gold" / "3_post_mlp.bin"
+        writer_bin = writer_root / "gold" / "req-gold" / "3_post_block.bin"
         writer_bin.parent.mkdir(parents=True, exist_ok=True)
         writer.submit(
             WriteTask(
                 path=writer_bin,
                 payload=bytes(tensor.numpy().tobytes()),
                 append=True,
-                key=("req-gold", 3, "post_mlp"),
+                key=("req-gold", 3, "post_block"),
             )
         )
         writer.submit(
@@ -285,14 +285,14 @@ def test_filesystem_consumer_byte_for_byte_matches_writer(
                     "shape": [2, 8],
                     "dtype": "float32",
                 },
-                key=("req-gold", 3, "post_mlp"),
+                key=("req-gold", 3, "post_block"),
             )
         )
 
         # Spin until writer finalizes.
         deadline = time.monotonic() + 5.0
         while time.monotonic() < deadline:
-            result = writer.get_result(("req-gold", 3, "post_mlp"))
+            result = writer.get_result(("req-gold", 3, "post_block"))
             if result is not None and result.status in ("ok", "error"):
                 break
             time.sleep(0.005)
@@ -396,14 +396,14 @@ def test_pipeline_parallel_two_stage_shared_fs(tmp_path: pathlib.Path) -> None:
     its ``CaptureManager`` is built with the *global* layer count and the
     stage's *local* ``[start, end)`` slice, and both write to the same
     root (the shared mount). A client spec spanning both stages
-    (``post_mlp`` at layers 1 and 3 of a 4-layer model) must land exactly
+    (``post_block`` at layers 1 and 3 of a 4-layer model) must land exactly
     one file per layer under its global-layer path, with each stage
     writing only the layers it owns — the Option-A merge the engine then
     unions at the result level.
     """
     GLOBAL = 4
     req_id = "req-pp"
-    client_spec = CaptureSpec(hooks={"post_mlp": [1, 3]}, positions="last_prompt")
+    client_spec = CaptureSpec(hooks={"post_block": [1, 3]}, positions="last_prompt")
 
     def _drive_stage(local_range: tuple[int, int], owned_layer: int) -> None:
         consumer = FilesystemConsumer(
@@ -438,15 +438,15 @@ def _drive_stage(local_range: tuple[int, int], owned_layer: int) -> None:
         )
         plan = mgr.build_step_plan(batch_view)
         # Only this stage's owned layer is planned.
-        assert set(plan.gather_indices) == {(owned_layer, "post_mlp")}
+        assert set(plan.gather_indices) == {(owned_layer, "post_block")}
 
         hidden = torch.arange(24, dtype=torch.float32).reshape(3, 8)
         # Firing the other stage's layer is a no-op on this manager.
-        mgr.on_hook(owned_layer, "post_mlp", hidden)
+        mgr.on_hook(owned_layer, "post_block", hidden)
         mgr.dispatch_step_captures(plan)
         results = mgr.finalize_request(req_id)
         assert list(results.keys()) == [0]
-        _wait_for_status(consumer, (req_id, owned_layer, "post_mlp"))
+        _wait_for_status(consumer, (req_id, owned_layer, "post_block"))
         consumer.shutdown()
 
     # Stage 0 owns global layers [0, 2) → captures layer 1.
@@ -457,4 +457,4 @@ def _drive_stage(local_range: tuple[int, int], owned_layer: int) -> None:
     req_dir = tmp_path / "default" / req_id
     written = sorted(p.name for p in req_dir.glob("*.bin"))
     # Exactly one file per requested layer, keyed by the GLOBAL layer index.
-    assert written == ["1_post_mlp.bin", "3_post_mlp.bin"]
+    assert written == ["1_post_block.bin", "3_post_block.bin"]
diff --git a/tests/v1/capture/test_sampling_params.py b/tests/v1/capture/test_sampling_params.py
index 6aed9057679e..9981d1e2b85e 100644
--- a/tests/v1/capture/test_sampling_params.py
+++ b/tests/v1/capture/test_sampling_params.py
@@ -37,7 +37,7 @@ def test_empty_dict_is_accepted(self) -> None:
 
     def test_dict_with_string_keys_is_accepted(self) -> None:
         spec = {
-            "filesystem": {"tag": "t", "hooks": {"post_mlp": [0]}},
+            "filesystem": {"tag": "t", "hooks": {"post_block": [0]}},
             "logging": {"level": "INFO"},
         }
         params = SamplingParams(capture=spec)
diff --git a/tests/v1/capture/test_step_gate.py b/tests/v1/capture/test_step_gate.py
index 17c2b45b221c..505c069b654f 100644
--- a/tests/v1/capture/test_step_gate.py
+++ b/tests/v1/capture/test_step_gate.py
@@ -78,7 +78,7 @@ def test_extract_selectors_none_and_empty():
 
 
 def test_extract_selectors_dict_spec():
-    raw = {"filesystem": {"hooks": {"post_mlp": "all"}, "positions": "last_prompt"}}
+    raw = {"filesystem": {"hooks": {"post_block": "all"}, "positions": "last_prompt"}}
     assert _extract_selectors(raw) == ["last_prompt"]
 
 
diff --git a/tests/v1/capture/test_types.py b/tests/v1/capture/test_types.py
index 7fd2e5a6b045..325dda5f0239 100644
--- a/tests/v1/capture/test_types.py
+++ b/tests/v1/capture/test_types.py
@@ -20,7 +20,7 @@
 )
 
 
-def _key(req_id: str = "req-1", layer: int = 3, hook: str = "post_mlp") -> CaptureKey:
+def _key(req_id: str = "req-1", layer: int = 3, hook: str = "post_block") -> CaptureKey:
     return (VllmInternalRequestId(req_id), layer, hook)
 
 
@@ -31,15 +31,15 @@ def test_capture_key_is_a_three_tuple():
     req_id, layer, hook = key
     assert req_id == "req-1"
     assert layer == 3
-    assert hook == "post_mlp"
+    assert hook == "post_block"
 
 
 def test_capture_spec_is_frozen():
     spec = CaptureSpec(
-        hooks={"post_mlp": [1, 2, 3]},
+        hooks={"post_block": [1, 2, 3]},
         positions="last_prompt",
     )
-    assert spec.hooks == {"post_mlp": [1, 2, 3]}
+    assert spec.hooks == {"post_block": [1, 2, 3]}
     assert spec.positions == "last_prompt"
 
     with pytest.raises(dataclasses.FrozenInstanceError):
diff --git a/tests/v1/core/test_steering_hash_determinism.py b/tests/v1/core/test_steering_hash_determinism.py
index 551677f203b0..07a7c77c8e8b 100644
--- a/tests/v1/core/test_steering_hash_determinism.py
+++ b/tests/v1/core/test_steering_hash_determinism.py
@@ -33,47 +33,47 @@ def test_empty_and_none_hash_zero(self):
         assert _hash({}, module_ref=None) == 0
 
     def test_identical_specs_hash_equal(self):
-        a = {"post_mlp": {0: [1.0, 2.0, 3.0]}}
-        b = {"post_mlp": {0: [1.0, 2.0, 3.0]}}
+        a = {"post_block": {0: [1.0, 2.0, 3.0]}}
+        b = {"post_block": {0: [1.0, 2.0, 3.0]}}
         assert _hash(a) == _hash(b)
 
     def test_dict_insertion_order_does_not_matter(self):
         a = {
-            "post_mlp": {0: [1.0, 2.0], 1: [3.0, 4.0]},
+            "post_block": {0: [1.0, 2.0], 1: [3.0, 4.0]},
             "pre_attn": {5: [5.0, 6.0]},
         }
         # Same data, different insertion orders.
         b: dict = {}
         b["pre_attn"] = {5: [5.0, 6.0]}
-        b["post_mlp"] = {}
-        b["post_mlp"][1] = [3.0, 4.0]
-        b["post_mlp"][0] = [1.0, 2.0]
+        b["post_block"] = {}
+        b["post_block"][1] = [3.0, 4.0]
+        b["post_block"][0] = [1.0, 2.0]
         assert _hash(a) == _hash(b)
 
     def test_different_vector_values_hash_different(self):
-        a = {"post_mlp": {0: [1.0, 2.0, 3.0]}}
-        b = {"post_mlp": {0: [1.0, 2.0, 3.1]}}
+        a = {"post_block": {0: [1.0, 2.0, 3.0]}}
+        b = {"post_block": {0: [1.0, 2.0, 3.1]}}
         assert _hash(a) != _hash(b)
 
     def test_different_layer_indices_hash_different(self):
-        a = {"post_mlp": {0: [1.0, 2.0, 3.0]}}
-        b = {"post_mlp": {1: [1.0, 2.0, 3.0]}}
+        a = {"post_block": {0: [1.0, 2.0, 3.0]}}
+        b = {"post_block": {1: [1.0, 2.0, 3.0]}}
         assert _hash(a) != _hash(b)
 
     def test_different_hook_points_hash_different(self):
-        a = {"post_mlp": {0: [1.0, 2.0, 3.0]}}
+        a = {"post_block": {0: [1.0, 2.0, 3.0]}}
         b = {"pre_attn": {0: [1.0, 2.0, 3.0]}}
         assert _hash(a) != _hash(b)
 
     def test_fits_in_int64(self):
-        a = {"post_mlp": {0: [1.0] * 1024}}
+        a = {"post_block": {0: [1.0] * 1024}}
         h = _hash(a)
         assert 0 <= h < 2**63, f"Hash {h} outside signed int64 range"
 
     def test_module_ref_changes_hash(self):
         """A module ref folds into the hash; same vectors + different
         ``(name, scale)`` tuples must produce different hashes."""
-        a = {"post_mlp": {0: [1.0, 2.0, 3.0]}}
+        a = {"post_block": {0: [1.0, 2.0, 3.0]}}
         h_no_ref = _hash(a)
         h_ref_foo = _hash(a, module_ref=("foo", 1.0))
         h_ref_bar = _hash(a, module_ref=("bar", 1.0))
@@ -94,7 +94,7 @@ def test_module_ref_default_matches_explicit_none(self):
         """``module_ref=None`` must reduce to the original inline-only hash
         bit-for-bit so existing prefix-cache reuse is preserved.
         """
-        a = {"post_mlp": {0: [1.0, 2.0, 3.0], 1: [4.0, 5.0, 6.0]}}
+        a = {"post_block": {0: [1.0, 2.0, 3.0], 1: [4.0, 5.0, 6.0]}}
         # Default arg.
         h_default = hash_steering_config(a)
         # Explicit None.
@@ -107,7 +107,7 @@ def test_module_ref_identical_specs_hash_equal(self):
         produce the same hash regardless of when (or whether) the
         worker-side registry has been populated.  The hash is a pure
         function of the reference, not the resolved vectors."""
-        inline = {"post_mlp": {14: [0.1, 0.2]}}
+        inline = {"post_block": {14: [0.1, 0.2]}}
         ref = ("foo", 1.0)
         first = _hash(inline, module_ref=ref)
         second = _hash(inline, module_ref=ref)
@@ -125,7 +125,7 @@ def test_across_processes(self):
         script = (
             "from vllm.config.steering_types import hash_steering_config; "
             "print(hash_steering_config("
-            "{'post_mlp': {0: [1.0, 2.0, 3.0], 1: [4.0, 5.0, 6.0]}}"
+            "{'post_block': {0: [1.0, 2.0, 3.0], 1: [4.0, 5.0, 6.0]}}"
             "))"
         )
         first = subprocess.check_output([sys.executable, "-c", script])
@@ -134,5 +134,5 @@ def test_across_processes(self):
             f"Hash differs across processes: {first!r} vs {second!r}"
         )
         # And matches the in-process hash.
-        in_process = _hash({"post_mlp": {0: [1.0, 2.0, 3.0], 1: [4.0, 5.0, 6.0]}})
+        in_process = _hash({"post_block": {0: [1.0, 2.0, 3.0], 1: [4.0, 5.0, 6.0]}})
         assert int(first.strip()) == in_process
diff --git a/tests/v1/executor/test_executor.py b/tests/v1/executor/test_executor.py
index 525dc2ee407a..59b556ba09e6 100644
--- a/tests/v1/executor/test_executor.py
+++ b/tests/v1/executor/test_executor.py
@@ -174,7 +174,7 @@ def _result(req, layer):
         from vllm.v1.capture.types import CaptureResult, VllmInternalRequestId
 
         return CaptureResult(
-            key=(VllmInternalRequestId(req), layer, "post_mlp"),
+            key=(VllmInternalRequestId(req), layer, "post_block"),
             status="ok",
         )
 
diff --git a/tests/v1/test_request_steering.py b/tests/v1/test_request_steering.py
index c0e1e15b197a..757257a8cc99 100644
--- a/tests/v1/test_request_steering.py
+++ b/tests/v1/test_request_steering.py
@@ -30,8 +30,8 @@
 # Helpers
 # ---------------------------------------------------------------------------
 
-STEERING_A = {"post_mlp": {0: [1.0, 2.0]}}
-STEERING_B = {"post_mlp": {0: [99.0, 100.0]}}
+STEERING_A = {"post_block": {0: [1.0, 2.0]}}
+STEERING_B = {"post_block": {0: [99.0, 100.0]}}
 
 init_none_hash(sha256_cbor)
 
diff --git a/tests/v1/test_steering_inline_packed.py b/tests/v1/test_steering_inline_packed.py
index 89a613e8e591..6e7aebaf2be2 100644
--- a/tests/v1/test_steering_inline_packed.py
+++ b/tests/v1/test_steering_inline_packed.py
@@ -40,37 +40,37 @@ def test_torch_dtype_mapping(self):
         assert _torch_dtype_to_pack_dtype(torch.bfloat16) == np.dtype(np.float32)
 
     def test_pack_steering_for_dtype_bare_list(self):
-        spec = {"post_mlp": {0: [1.0, 2.0, 3.0]}}
+        spec = {"post_block": {0: [1.0, 2.0, 3.0]}}
         out = pack_steering_for_dtype(spec, np.float32)
         assert out is not None
-        arr = out["post_mlp"][0]
+        arr = out["post_block"][0]
         assert arr.dtype == np.float32
         assert arr.tolist() == [1.0, 2.0, 3.0]
 
     def test_pack_steering_for_dtype_with_scale(self):
-        spec = {"post_mlp": {0: {"vector": [1.0, 2.0], "scale": 3.0}}}
+        spec = {"post_block": {0: {"vector": [1.0, 2.0], "scale": 3.0}}}
         out = pack_steering_for_dtype(spec, np.float32)
         assert out is not None
-        assert out["post_mlp"][0].tolist() == [3.0, 6.0]
+        assert out["post_block"][0].tolist() == [3.0, 6.0]
 
     def test_pack_effective_steering_resolves_then_casts(self):
-        base = {"post_mlp": {0: [1.0, 2.0]}}
-        prefill = {"post_mlp": {0: [10.0, 20.0]}}
+        base = {"post_block": {0: [1.0, 2.0]}}
+        prefill = {"post_block": {0: [10.0, 20.0]}}
         out = pack_effective_steering(base, prefill, np.float32)
         assert out is not None
         # 1.0+10.0=11.0, 2.0+20.0=22.0
-        assert out["post_mlp"][0].dtype == np.float32
-        assert out["post_mlp"][0].tolist() == [11.0, 22.0]
+        assert out["post_block"][0].dtype == np.float32
+        assert out["post_block"][0].tolist() == [11.0, 22.0]
 
     def test_pack_effective_steering_handles_none(self):
         assert pack_effective_steering(None, None, np.float32) is None
         assert pack_effective_steering({}, {}, np.float32) is None
 
     def test_pack_dtype_fp16_loses_some_precision_but_preserves_shape(self):
-        spec = {"post_mlp": {0: list(range(16))}}
+        spec = {"post_block": {0: list(range(16))}}
         out = pack_steering_for_dtype(spec, np.float16)
         assert out is not None
-        arr = out["post_mlp"][0]
+        arr = out["post_block"][0]
         assert arr.dtype == np.float16
         assert arr.shape == (16,)
         # fp16 represents small ints exactly.
@@ -99,7 +99,7 @@ def test_named_only_is_noop(self):
     def test_inline_packs_and_clears_originals(self):
         sp = SamplingParams(
             max_tokens=1,
-            steering_vectors={"post_mlp": {0: [1.0, 2.0]}},
+            steering_vectors={"post_block": {0: [1.0, 2.0]}},
         )
         maybe_pack_inline_steering_for_request(sp, torch.float32)
         assert sp.steering_vectors is None
@@ -108,11 +108,11 @@ def test_inline_packs_and_clears_originals(self):
         assert sp._effective_prefill_steering_packed is not None
         assert sp._effective_decode_steering_packed is not None
         # Both phases resolve to the same result when only base is set.
-        assert sp._effective_prefill_steering_packed["post_mlp"][0].tolist() == [
+        assert sp._effective_prefill_steering_packed["post_block"][0].tolist() == [
             1.0,
             2.0,
         ]
-        assert sp._effective_decode_steering_packed["post_mlp"][0].tolist() == [
+        assert sp._effective_decode_steering_packed["post_block"][0].tolist() == [
             1.0,
             2.0,
         ]
@@ -120,16 +120,16 @@ def test_inline_packs_and_clears_originals(self):
     def test_phase_specific_resolves_per_phase(self):
         sp = SamplingParams(
             max_tokens=1,
-            steering_vectors={"post_mlp": {0: [1.0, 2.0]}},
-            prefill_steering_vectors={"post_mlp": {0: [10.0, 20.0]}},
-            decode_steering_vectors={"post_mlp": {0: [100.0, 200.0]}},
+            steering_vectors={"post_block": {0: [1.0, 2.0]}},
+            prefill_steering_vectors={"post_block": {0: [10.0, 20.0]}},
+            decode_steering_vectors={"post_block": {0: [100.0, 200.0]}},
         )
         maybe_pack_inline_steering_for_request(sp, torch.float32)
-        assert sp._effective_prefill_steering_packed["post_mlp"][0].tolist() == [
+        assert sp._effective_prefill_steering_packed["post_block"][0].tolist() == [
             11.0,
             22.0,
         ]
-        assert sp._effective_decode_steering_packed["post_mlp"][0].tolist() == [
+        assert sp._effective_decode_steering_packed["post_block"][0].tolist() == [
             101.0,
             202.0,
         ]
@@ -137,7 +137,7 @@ def test_phase_specific_resolves_per_phase(self):
     def test_idempotent_when_already_packed(self):
         sp = SamplingParams(
             max_tokens=1,
-            steering_vectors={"post_mlp": {0: [1.0, 2.0]}},
+            steering_vectors={"post_block": {0: [1.0, 2.0]}},
         )
         maybe_pack_inline_steering_for_request(sp, torch.float32)
         first = sp._effective_prefill_steering_packed
@@ -148,12 +148,12 @@ def test_idempotent_when_already_packed(self):
     def test_effective_steering_returns_packed_after_pack(self):
         sp = SamplingParams(
             max_tokens=1,
-            steering_vectors={"post_mlp": {0: [1.0, 2.0]}},
+            steering_vectors={"post_block": {0: [1.0, 2.0]}},
         )
         maybe_pack_inline_steering_for_request(sp, torch.float32)
         # The cached_property fallback should now return packed values.
         assert sp.effective_prefill_steering is not None
-        assert sp.effective_prefill_steering["post_mlp"][0].tolist() == [1.0, 2.0]
+        assert sp.effective_prefill_steering["post_block"][0].tolist() == [1.0, 2.0]
 
 
 # ---------------------------------------------------------------------------
@@ -165,7 +165,7 @@ class TestHashDeterminism:
     def test_packed_request_hash_matches_unpacked(self):
         """A packed and unpacked submission of the same logical request
         must produce the same prefix-cache hash."""
-        vectors = {"post_mlp": {0: [1.0, 2.0, 3.0]}}
+        vectors = {"post_block": {0: [1.0, 2.0, 3.0]}}
         sp_unpacked = SamplingParams(max_tokens=1, steering_vectors=vectors)
         unpacked_hash = sp_unpacked.prefill_steering_config_hash
 
@@ -177,10 +177,10 @@ def test_packed_request_hash_matches_unpacked(self):
 
     def test_different_vectors_different_hash(self):
         sp_a = SamplingParams(
-            max_tokens=1, steering_vectors={"post_mlp": {0: [1.0, 2.0]}}
+            max_tokens=1, steering_vectors={"post_block": {0: [1.0, 2.0]}}
         )
         sp_b = SamplingParams(
-            max_tokens=1, steering_vectors={"post_mlp": {0: [1.0, 3.0]}}
+            max_tokens=1, steering_vectors={"post_block": {0: [1.0, 3.0]}}
         )
         maybe_pack_inline_steering_for_request(sp_a, torch.float32)
         maybe_pack_inline_steering_for_request(sp_b, torch.float32)
@@ -197,7 +197,7 @@ def test_packed_field_round_trips_through_msgspec(self):
         """Packed ndarrays survive msgspec encode/decode with dtype + values."""
         sp_in = SamplingParams(
             max_tokens=1,
-            steering_vectors={"post_mlp": {0: [1.0, 2.0, 3.0]}},
+            steering_vectors={"post_block": {0: [1.0, 2.0, 3.0]}},
         )
         maybe_pack_inline_steering_for_request(sp_in, torch.float32)
         assert sp_in._effective_prefill_steering_packed is not None
@@ -208,15 +208,15 @@ def test_packed_field_round_trips_through_msgspec(self):
         sp_out = dec.decode(bufs)
 
         assert sp_out._effective_prefill_steering_packed is not None
-        out_arr = sp_out._effective_prefill_steering_packed["post_mlp"][0]
-        in_arr = sp_in._effective_prefill_steering_packed["post_mlp"][0]
+        out_arr = sp_out._effective_prefill_steering_packed["post_block"][0]
+        in_arr = sp_in._effective_prefill_steering_packed["post_block"][0]
         assert isinstance(out_arr, np.ndarray)
         assert out_arr.dtype == in_arr.dtype
         assert np.array_equal(out_arr, in_arr)
 
     def test_packed_payload_smaller_than_unpacked(self):
         """Sanity: the packed wire form is smaller than the unpacked one."""
-        vectors = {"post_mlp": {i: [float(j) for j in range(2560)] for i in range(34)}}
+        vectors = {"post_block": {i: [float(j) for j in range(2560)] for i in range(34)}}
         sp_unpacked = SamplingParams(max_tokens=1, steering_vectors=vectors)
         sp_packed = SamplingParams(max_tokens=1, steering_vectors=vectors)
         maybe_pack_inline_steering_for_request(sp_packed, torch.float32)
diff --git a/tests/v1/test_steering_types.py b/tests/v1/test_steering_types.py
index ee7bdfedd312..e7ff5e1eaf36 100644
--- a/tests/v1/test_steering_types.py
+++ b/tests/v1/test_steering_types.py
@@ -370,10 +370,10 @@ def test_mismatched_base_prefill_raises(self):
         ):
             SamplingParams(
                 steering_vectors={
-                    "post_mlp": {15: [1.0, 2.0]},
+                    "post_block": {15: [1.0, 2.0]},
                 },
                 prefill_steering_vectors={
-                    "post_mlp": {15: [1.0]},
+                    "post_block": {15: [1.0]},
                 },
             )
 
@@ -386,10 +386,10 @@ def test_mismatched_base_decode_raises(self):
         ):
             SamplingParams(
                 steering_vectors={
-                    "post_mlp": {0: [1.0, 2.0, 3.0]},
+                    "post_block": {0: [1.0, 2.0, 3.0]},
                 },
                 decode_steering_vectors={
-                    "post_mlp": {0: [1.0, 2.0]},
+                    "post_block": {0: [1.0, 2.0]},
                 },
             )
 
@@ -399,10 +399,10 @@ def test_matching_dimensions_pass(self):
 
         params = SamplingParams(
             steering_vectors={
-                "post_mlp": {0: [1.0, 2.0]},
+                "post_block": {0: [1.0, 2.0]},
             },
             prefill_steering_vectors={
-                "post_mlp": {0: [3.0, 4.0]},
+                "post_block": {0: [3.0, 4.0]},
             },
         )
         assert params.steering_vectors is not None
@@ -415,10 +415,10 @@ def test_non_overlapping_different_dims_pass(self):
 
         params = SamplingParams(
             steering_vectors={
-                "post_mlp": {0: [1.0, 2.0]},
+                "post_block": {0: [1.0, 2.0]},
             },
             prefill_steering_vectors={
-                "post_mlp": {1: [1.0]},
+                "post_block": {1: [1.0]},
             },
         )
         assert params.steering_vectors is not None
@@ -434,10 +434,10 @@ def test_mismatched_prefill_decode_without_base_raises(self):
         ):
             SamplingParams(
                 prefill_steering_vectors={
-                    "post_mlp": {0: [1.0, 2.0]},
+                    "post_block": {0: [1.0, 2.0]},
                 },
                 decode_steering_vectors={
-                    "post_mlp": {0: [1.0]},
+                    "post_block": {0: [1.0]},
                 },
             )
 
@@ -448,10 +448,10 @@ def test_non_overlapping_prefill_decode_pass(self):
 
         params = SamplingParams(
             prefill_steering_vectors={
-                "post_mlp": {0: [1.0, 2.0]},
+                "post_block": {0: [1.0, 2.0]},
             },
             decode_steering_vectors={
-                "post_mlp": {1: [1.0]},
+                "post_block": {1: [1.0]},
             },
         )
         assert params.prefill_steering_vectors is not None
@@ -464,10 +464,10 @@ def test_matching_prefill_decode_without_base_pass(self):
 
         params = SamplingParams(
             prefill_steering_vectors={
-                "post_mlp": {0: [1.0, 2.0]},
+                "post_block": {0: [1.0, 2.0]},
             },
             decode_steering_vectors={
-                "post_mlp": {0: [3.0, 4.0]},
+                "post_block": {0: [3.0, 4.0]},
             },
         )
         assert params.prefill_steering_vectors is not None
@@ -483,12 +483,12 @@ def test_mismatched_prefill_decode_scaled_entry_raises(self):
         ):
             SamplingParams(
                 prefill_steering_vectors={
-                    "post_mlp": {
+                    "post_block": {
                         0: {"vector": [1.0, 2.0], "scale": 0.5},
                     },
                 },
                 decode_steering_vectors={
-                    "post_mlp": {0: [1.0]},
+                    "post_block": {0: [1.0]},
                 },
             )
 
@@ -509,7 +509,7 @@ def test_extra_key_in_steering_vectors_raises(self):
         with pytest.raises(ValueError, match="unexpected keys"):
             SamplingParams(
                 steering_vectors={
-                    "post_mlp": {
+                    "post_block": {
                         0: {"vector": [1.0, 2.0], "scale": 1.0, "typo": "bad"},
                     },
                 },
@@ -522,7 +522,7 @@ def test_extra_key_in_prefill_steering_vectors_raises(self):
         with pytest.raises(ValueError, match="unexpected keys"):
             SamplingParams(
                 prefill_steering_vectors={
-                    "post_mlp": {
+                    "post_block": {
                         0: {"vector": [1.0], "scale": 1.0, "extra": 42},
                     },
                 },
@@ -535,7 +535,7 @@ def test_extra_key_in_decode_steering_vectors_raises(self):
         with pytest.raises(ValueError, match="unexpected keys"):
             SamplingParams(
                 decode_steering_vectors={
-                    "post_mlp": {
+                    "post_block": {
                         0: {"vector": [1.0], "scale": 1.0, "foo": 1, "bar": 2},
                     },
                 },
diff --git a/tests/v1/worker/test_steering_manager.py b/tests/v1/worker/test_steering_manager.py
index d382314a12da..21dab67d5be8 100644
--- a/tests/v1/worker/test_steering_manager.py
+++ b/tests/v1/worker/test_steering_manager.py
@@ -23,8 +23,8 @@
 
 HIDDEN_SIZE = 8
 MAX_CONFIGS = 4
-_HP = DEFAULT_HOOK_POINT.value  # "post_mlp"
-_TABLE_ATTR = "steering_table_post_mlp"
+_HP = DEFAULT_HOOK_POINT.value  # "post_block"
+_TABLE_ATTR = "steering_table_post_block"
 
 
 # ---------------------------------------------------------------------------
diff --git a/tests/v1/worker/test_steering_manager_ownership.py b/tests/v1/worker/test_steering_manager_ownership.py
index e074d2c849b3..ea4393409e56 100644
--- a/tests/v1/worker/test_steering_manager_ownership.py
+++ b/tests/v1/worker/test_steering_manager_ownership.py
@@ -21,7 +21,7 @@
 
 HIDDEN_SIZE = 8
 MAX_CONFIGS = 4
-_HP = DEFAULT_HOOK_POINT.value  # "post_mlp"
+_HP = DEFAULT_HOOK_POINT.value  # "post_block"
 _TABLE_ATTR = HOOK_POINT_TABLE_ATTR[DEFAULT_HOOK_POINT]
 
 
diff --git a/tests/v1/worker/test_steering_named_resolve_cache.py b/tests/v1/worker/test_steering_named_resolve_cache.py
index b75e28220091..6d7ff3b34155 100644
--- a/tests/v1/worker/test_steering_named_resolve_cache.py
+++ b/tests/v1/worker/test_steering_named_resolve_cache.py
@@ -74,8 +74,8 @@ class TestNamedCacheFastPath:
     def test_scale_one_no_overrides_returns_cached(self):
         """scale=1.0 + no inline → cache hit; output equals slow-path output."""
         mixin = _StubMixin()
-        base = _spec("post_mlp", {0: [1.0, 2.0]})
-        prefill = _spec("post_mlp", {1: [3.0, 4.0]})
+        base = _spec("post_block", {0: [1.0, 2.0]})
+        prefill = _spec("post_block", {1: [3.0, 4.0]})
         mixin.register_steering_modules(
             {"m": {"vectors": base, "prefill_vectors": prefill}},
             replace=True,
@@ -89,9 +89,9 @@ def test_scale_one_no_overrides_returns_cached(self):
     def test_decode_phase_resolves_separately(self):
         """The cache holds (prefill, decode) — decode must use its slot."""
         mixin = _StubMixin()
-        base = _spec("post_mlp", {0: [1.0, 2.0]})
-        prefill = _spec("post_mlp", {0: [10.0, 20.0]})
-        decode = _spec("post_mlp", {0: [100.0, 200.0]})
+        base = _spec("post_block", {0: [1.0, 2.0]})
+        prefill = _spec("post_block", {0: [10.0, 20.0]})
+        decode = _spec("post_block", {0: [100.0, 200.0]})
         mixin.register_steering_modules(
             {
                 "m": {
@@ -106,25 +106,25 @@ def test_decode_phase_resolves_separately(self):
 
         fast_prefill = mixin._resolve_request_steering(sp, "prefill")
         fast_decode = mixin._resolve_request_steering(sp, "decode")
-        assert fast_prefill["post_mlp"][0].tolist() == [11.0, 22.0]
-        assert fast_decode["post_mlp"][0].tolist() == [101.0, 202.0]
+        assert fast_prefill["post_block"][0].tolist() == [11.0, 22.0]
+        assert fast_decode["post_block"][0].tolist() == [101.0, 202.0]
 
     def test_scaled_fast_path_multiplies_cached(self):
         """scale=0.5 + no inline → fast path returns cached * 0.5."""
         mixin = _StubMixin()
-        base = _spec("post_mlp", {0: [2.0, 4.0]})
+        base = _spec("post_block", {0: [2.0, 4.0]})
         mixin.register_steering_modules({"m": {"vectors": base}}, replace=True)
         sp = SamplingParams(steering_module_ref=("m", 0.5))
 
         fast = mixin._resolve_request_steering(sp, "prefill")
         # base resolved alone: [2.0, 4.0]; scaled by 0.5: [1.0, 2.0]
-        assert fast["post_mlp"][0].tolist() == [1.0, 2.0]
+        assert fast["post_block"][0].tolist() == [1.0, 2.0]
 
     def test_scaled_fast_path_matches_slow_path(self):
         """For scale!=1.0, fast and slow paths must agree numerically."""
         mixin = _StubMixin()
-        base = _spec("post_mlp", {0: [1.0, 2.0], 1: [3.0, 4.0]})
-        prefill = _spec("post_mlp", {0: [10.0, 20.0]})
+        base = _spec("post_block", {0: [1.0, 2.0], 1: [3.0, 4.0]})
+        prefill = _spec("post_block", {0: [10.0, 20.0]})
         mixin.register_steering_modules(
             {"m": {"vectors": base, "prefill_vectors": prefill}},
             replace=True,
@@ -149,14 +149,14 @@ def test_scaled_fast_path_matches_slow_path(self):
     def test_decode_only_module_returns_none_for_prefill(self):
         """If module has only decode_vectors and no base, prefill returns None."""
         mixin = _StubMixin()
-        decode = _spec("post_mlp", {0: [1.0, 2.0]})
+        decode = _spec("post_block", {0: [1.0, 2.0]})
         mixin.register_steering_modules({"m": {"decode_vectors": decode}}, replace=True)
         sp = SamplingParams(steering_module_ref=("m", 1.0))
 
         assert mixin._resolve_request_steering(sp, "prefill") is None
         decoded = mixin._resolve_request_steering(sp, "decode")
         assert decoded is not None
-        assert decoded["post_mlp"][0].tolist() == [1.0, 2.0]
+        assert decoded["post_block"][0].tolist() == [1.0, 2.0]
 
 
 # ---------------------------------------------------------------------------
@@ -168,9 +168,9 @@ class TestInlineOverrideFallback:
     def test_inline_base_falls_through(self):
         """Inline ``steering_vectors`` forces the merge path."""
         mixin = _StubMixin()
-        base = _spec("post_mlp", {0: [1.0, 2.0]})
+        base = _spec("post_block", {0: [1.0, 2.0]})
         mixin.register_steering_modules({"m": {"vectors": base}}, replace=True)
-        inline = _spec("post_mlp", {0: [10.0, 20.0]})
+        inline = _spec("post_block", {0: [10.0, 20.0]})
         sp = SamplingParams(
             steering_module_ref=("m", 1.0),
             steering_vectors=inline,
@@ -179,14 +179,14 @@ def test_inline_base_falls_through(self):
         result = mixin._resolve_request_steering(sp, "prefill")
         assert result is not None
         # base + inline = [1.0, 2.0] + [10.0, 20.0] = [11.0, 22.0]
-        assert result["post_mlp"][0].tolist() == [11.0, 22.0]
+        assert result["post_block"][0].tolist() == [11.0, 22.0]
 
     def test_inline_phase_falls_through(self):
         """Inline ``prefill_steering_vectors`` forces the merge path."""
         mixin = _StubMixin()
-        base = _spec("post_mlp", {0: [1.0, 2.0]})
+        base = _spec("post_block", {0: [1.0, 2.0]})
         mixin.register_steering_modules({"m": {"vectors": base}}, replace=True)
-        inline_prefill = _spec("post_mlp", {1: [5.0, 5.0]})
+        inline_prefill = _spec("post_block", {1: [5.0, 5.0]})
         sp = SamplingParams(
             steering_module_ref=("m", 1.0),
             prefill_steering_vectors=inline_prefill,
@@ -195,8 +195,8 @@ def test_inline_phase_falls_through(self):
         result = mixin._resolve_request_steering(sp, "prefill")
         assert result is not None
         # Layer 0 from base, layer 1 from inline_prefill.
-        assert result["post_mlp"][0].tolist() == [1.0, 2.0]
-        assert result["post_mlp"][1].tolist() == [5.0, 5.0]
+        assert result["post_block"][0].tolist() == [1.0, 2.0]
+        assert result["post_block"][1].tolist() == [5.0, 5.0]
 
 
 # ---------------------------------------------------------------------------
@@ -208,12 +208,12 @@ class TestCacheLifecycle:
     def test_register_replace_clears_cache(self):
         mixin = _StubMixin()
         mixin.register_steering_modules(
-            {"a": {"vectors": _spec("post_mlp", {0: [1.0]})}}, replace=True
+            {"a": {"vectors": _spec("post_block", {0: [1.0]})}}, replace=True
         )
         assert "a" in mixin._steering_module_resolved_cache
 
         mixin.register_steering_modules(
-            {"b": {"vectors": _spec("post_mlp", {0: [2.0]})}}, replace=True
+            {"b": {"vectors": _spec("post_block", {0: [2.0]})}}, replace=True
         )
         assert "a" not in mixin._steering_module_resolved_cache
         assert "b" in mixin._steering_module_resolved_cache
@@ -222,8 +222,8 @@ def test_unregister_drops_cache_entry(self):
         mixin = _StubMixin()
         mixin.register_steering_modules(
             {
-                "a": {"vectors": _spec("post_mlp", {0: [1.0]})},
-                "b": {"vectors": _spec("post_mlp", {0: [2.0]})},
+                "a": {"vectors": _spec("post_block", {0: [1.0]})},
+                "b": {"vectors": _spec("post_block", {0: [2.0]})},
             },
             replace=True,
         )
diff --git a/tests/v1/worker/test_steering_pre_materialize.py b/tests/v1/worker/test_steering_pre_materialize.py
index fcfb70a94b87..120f8d6add1c 100644
--- a/tests/v1/worker/test_steering_pre_materialize.py
+++ b/tests/v1/worker/test_steering_pre_materialize.py
@@ -55,8 +55,8 @@ def __init__(self, max_configs: int = 8):
 
 
 def _spec(layer_to_vec: dict[int, list[float]]) -> dict:
-    """Build a single-hook SteeringVectorSpec on hook ``post_mlp``."""
-    return {"post_mlp": dict(layer_to_vec)}
+    """Build a single-hook SteeringVectorSpec on hook ``post_block``."""
+    return {"post_block": dict(layer_to_vec)}
 
 
 def _module_payload(
@@ -390,7 +390,7 @@ def test_re_register_drops_stale_pin_then_pin_again(self):
         # Verify contents match the *new* spec by checking the manager
         # stored the new vector in its config_vectors map.
         stored = stub._steering_manager.config_vectors[(named_only_h, "prefill")]
-        layer0_t = stored["post_mlp"][0].squeeze(0)
+        layer0_t = stored["post_block"][0].squeeze(0)
         assert layer0_t.tolist() == [9.0, 9.0, 9.0, 9.0]
 
     def test_replace_true_drops_all_prior_pins(self):
diff --git a/vllm/entrypoints/openai/chat_completion/protocol.py b/vllm/entrypoints/openai/chat_completion/protocol.py
index 22f63ff6f715..d64353257a58 100644
--- a/vllm/entrypoints/openai/chat_completion/protocol.py
+++ b/vllm/entrypoints/openai/chat_completion/protocol.py
@@ -492,7 +492,7 @@ class ChatCompletionRequest(OpenAIBaseModel):
     steering_vectors: SteeringVectorSpecPacked | None = Field(
         default=None,
         description="Per-request activation steering vectors keyed by hook "
-        "point name (pre_attn, post_attn, post_mlp). Each hook carries one "
+        "point name (pre_attn, post_attn, post_block). Each hook carries one "
         "base64-encoded (num_layers, hidden_size) blob plus a sibling "
         "layer_indices list (and optional per-row scales).",
     )
diff --git a/vllm/entrypoints/openai/completion/protocol.py b/vllm/entrypoints/openai/completion/protocol.py
index 2561d4e05a6a..5daaf784717d 100644
--- a/vllm/entrypoints/openai/completion/protocol.py
+++ b/vllm/entrypoints/openai/completion/protocol.py
@@ -217,7 +217,7 @@ class CompletionRequest(OpenAIBaseModel):
     steering_vectors: SteeringVectorSpecPacked | None = Field(
         default=None,
         description="Per-request activation steering vectors keyed by hook "
-        "point name (pre_attn, post_attn, post_mlp). Each hook carries one "
+        "point name (pre_attn, post_attn, post_block). Each hook carries one "
         "base64-encoded (num_layers, hidden_size) blob plus a sibling "
         "layer_indices list (and optional per-row scales).",
     )
diff --git a/vllm/entrypoints/openai/steering/registry.py b/vllm/entrypoints/openai/steering/registry.py
index 874161ab8847..3de8cf7aec36 100644
--- a/vllm/entrypoints/openai/steering/registry.py
+++ b/vllm/entrypoints/openai/steering/registry.py
@@ -153,7 +153,7 @@ async def load_from_file(self, name: str, path: str) -> None:
 
         Each tier in the JSON file may be either the legacy shape::
 
-            {"vectors": {"post_mlp": {"14": [0.1, ...]}}}
+            {"vectors": {"post_block": {"14": [0.1, ...]}}}
 
         (string layer keys are converted to int) or the binary-wire
         ``SteeringVectorSpecPacked`` shape (base64-encoded ``data`` field
diff --git a/vllm/entrypoints/serve/steering/protocol.py b/vllm/entrypoints/serve/steering/protocol.py
index 18cac7cea69b..2d97d5c8a23d 100644
--- a/vllm/entrypoints/serve/steering/protocol.py
+++ b/vllm/entrypoints/serve/steering/protocol.py
@@ -23,7 +23,7 @@ class SetSteeringRequest(BaseModel):
         default=None,
         description="Base steering vectors applied to both prefill and "
         "decode phases. Keyed by hook point name (pre_attn, post_attn, "
-        "post_mlp). Each hook's value is either a legacy layer map "
+        "post_block). Each hook's value is either a legacy layer map "
         "({layer_idx: list[float] | {\"vector\": [...], \"scale\": float}}) "
         "or a binary-wire SteeringHookPacked blob (base64-encoded "
         "(num_layers, hidden_size) buffer + layer_indices + dtype/shape, "
diff --git a/vllm/model_executor/layers/activation_capture.py b/vllm/model_executor/layers/activation_capture.py
index 1838abb7e5c8..c35e9fd86f0d 100644
--- a/vllm/model_executor/layers/activation_capture.py
+++ b/vllm/model_executor/layers/activation_capture.py
@@ -45,7 +45,7 @@
 _HOOK_NAME_TO_ID: dict[str, int] = {
     "pre_attn": 0,
     "post_attn": 1,
-    "post_mlp": 2,
+    "post_block": 2,
     "mlp_in": 3,
     "mlp_out": 4,
 }
diff --git a/vllm/model_executor/layers/steering.py b/vllm/model_executor/layers/steering.py
index 6c65fe332cd1..bb547b35401d 100644
--- a/vllm/model_executor/layers/steering.py
+++ b/vllm/model_executor/layers/steering.py
@@ -36,7 +36,7 @@ class SteeringHookPoint(str, Enum):
     POST_ATTN = "post_attn"
     """Steer the residual skip tensor in the post-attention region."""
 
-    POST_MLP = "post_mlp"
+    POST_BLOCK = "post_block"
     """Steer the residual skip tensor in the post-MLP region."""
 
 
@@ -44,7 +44,7 @@ class SteeringHookPoint(str, Enum):
 HOOK_POINT_TABLE_ATTR: dict[SteeringHookPoint, str] = {
     SteeringHookPoint.PRE_ATTN: "steering_table_pre_attn",
     SteeringHookPoint.POST_ATTN: "steering_table_post_attn",
-    SteeringHookPoint.POST_MLP: "steering_table_post_mlp",
+    SteeringHookPoint.POST_BLOCK: "steering_table_post_block",
 }
 
 # Per-hook ``any-active`` flag attribute names. The flag is a single-element
@@ -59,7 +59,7 @@ class SteeringHookPoint(str, Enum):
 # Valid hook point string values for validation.
 VALID_HOOK_POINT_NAMES: frozenset[str] = frozenset(hp.value for hp in SteeringHookPoint)
 
-DEFAULT_HOOK_POINT = SteeringHookPoint.POST_MLP
+DEFAULT_HOOK_POINT = SteeringHookPoint.POST_BLOCK
 
 
 def register_steering_buffers(
diff --git a/vllm/model_executor/models/AXK1.py b/vllm/model_executor/models/AXK1.py
index f2fdc629f5a2..b3196a1d2438 100644
--- a/vllm/model_executor/models/AXK1.py
+++ b/vllm/model_executor/models/AXK1.py
@@ -649,7 +649,7 @@ def __init__(
         self.post_attention_layernorm = RMSNorm(
             config.hidden_size, eps=config.rms_norm_eps
         )
         self.post_mlp_layernorm = RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
         self.routed_scaling_factor = config.routed_scaling_factor
 
     def _is_layer_sparse(self) -> bool:
@@ -701,7 +701,7 @@ def forward(
         hidden_states = self.mlp(hidden_states)
 
         if self.is_layer_sparse:
             hidden_states = self.post_mlp_layernorm(hidden_states)
 
         if isinstance(self.mlp, AXK1MLP) and hidden_states.dtype == torch.float16:
             # Fix FP16 overflow
@@ -710,7 +710,7 @@ def forward(
             # The scaling of AXK1MOE output would be done in the forward
             # of AXK1MOE
             hidden_states *= 1.0 / self.routed_scaling_factor
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
 
         return hidden_states, residual
 
diff --git a/vllm/model_executor/models/afmoe.py b/vllm/model_executor/models/afmoe.py
index 2216e4948bd9..1aa111add2ae 100644
--- a/vllm/model_executor/models/afmoe.py
+++ b/vllm/model_executor/models/afmoe.py
@@ -339,7 +339,7 @@ def __init__(
             config.hidden_size, eps=config.rms_norm_eps
         )
         self.pre_mlp_layernorm = RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
         self.post_mlp_layernorm = RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
 
     def forward(
         self,
@@ -364,7 +364,7 @@ def forward(
             hidden_states, residual
         )
         hidden_states = self.mlp(hidden_states)
         hidden_states = self.post_mlp_layernorm(hidden_states)  # ffn norm b
 
         return hidden_states, residual
 
diff --git a/vllm/model_executor/models/apertus.py b/vllm/model_executor/models/apertus.py
index 234818b38307..c2bb8a9c7452 100644
--- a/vllm/model_executor/models/apertus.py
+++ b/vllm/model_executor/models/apertus.py
@@ -337,7 +337,7 @@ def forward(
         hidden_states, residual = self.feedforward_layernorm(hidden_states, residual)
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.mlp(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
 
diff --git a/vllm/model_executor/models/arcee.py b/vllm/model_executor/models/arcee.py
index cf1653db4eef..3d9dd49bb0e7 100644
--- a/vllm/model_executor/models/arcee.py
+++ b/vllm/model_executor/models/arcee.py
@@ -195,7 +195,7 @@ def forward(
         hidden_states, residual = self.post_attention_layernorm(hidden_states, residual)
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.mlp(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
 
diff --git a/vllm/model_executor/models/arctic.py b/vllm/model_executor/models/arctic.py
index ccdf6c862b1a..a9c6f6d2fe63 100644
--- a/vllm/model_executor/models/arctic.py
+++ b/vllm/model_executor/models/arctic.py
@@ -401,7 +401,7 @@ def forward(
             hidden_states = self.block_sparse_moe(hidden_states)
             hidden_states = residual_attn + hidden_states
         hidden_states = apply_layer_steering(
-            self, hidden_states, SteeringHookPoint.POST_MLP
+            self, hidden_states, SteeringHookPoint.POST_BLOCK
         )
         return hidden_states
 
diff --git a/vllm/model_executor/models/baichuan.py b/vllm/model_executor/models/baichuan.py
index 80a7c299f40e..e70e2a60cc1b 100644
--- a/vllm/model_executor/models/baichuan.py
+++ b/vllm/model_executor/models/baichuan.py
@@ -293,7 +293,7 @@ def forward(
         hidden_states, residual = self.post_attention_layernorm(hidden_states, residual)
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.mlp(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
 
diff --git a/vllm/model_executor/models/commandr.py b/vllm/model_executor/models/commandr.py
index c40d7d5fb439..1f8dfea5d7f5 100644
--- a/vllm/model_executor/models/commandr.py
+++ b/vllm/model_executor/models/commandr.py
@@ -297,7 +297,7 @@ def forward(
         )
         hidden_states = hidden_states + hidden_states_mlp
         hidden_states = apply_layer_steering(
-            self, hidden_states, SteeringHookPoint.POST_MLP
+            self, hidden_states, SteeringHookPoint.POST_BLOCK
         )
 
         return hidden_states, residual
diff --git a/vllm/model_executor/models/deepseek_v2.py b/vllm/model_executor/models/deepseek_v2.py
index cb3ac5c39cbb..1de61b805cc2 100644
--- a/vllm/model_executor/models/deepseek_v2.py
+++ b/vllm/model_executor/models/deepseek_v2.py
@@ -1214,7 +1214,7 @@ def forward(
             # The scaling of DeepseekV2MOE output would be done in the forward
             # of DeepseekV2MOE
             hidden_states *= 1.0 / self.routed_scaling_factor
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
 
         return hidden_states, residual
 
diff --git a/vllm/model_executor/models/dots1.py b/vllm/model_executor/models/dots1.py
index 16d6d7eee2d9..1ddbe318451e 100644
--- a/vllm/model_executor/models/dots1.py
+++ b/vllm/model_executor/models/dots1.py
@@ -356,7 +356,7 @@ def forward(
         hidden_states, residual = self.post_attention_layernorm(hidden_states, residual)
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.mlp(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
 
diff --git a/vllm/model_executor/models/ernie45_moe.py b/vllm/model_executor/models/ernie45_moe.py
index ec62e532eaa3..4e74d7b1efec 100644
--- a/vllm/model_executor/models/ernie45_moe.py
+++ b/vllm/model_executor/models/ernie45_moe.py
@@ -421,7 +421,7 @@ def forward(
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
 
         hidden_states = self.mlp(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
 
         return hidden_states, residual
 
diff --git a/vllm/model_executor/models/exaone.py b/vllm/model_executor/models/exaone.py
index c10a5a234361..07ed518fc7ca 100644
--- a/vllm/model_executor/models/exaone.py
+++ b/vllm/model_executor/models/exaone.py
@@ -314,7 +314,7 @@ def forward(
         hidden_states, residual = self.ln_2(hidden_states, residual)
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.mlp(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
 
diff --git a/vllm/model_executor/models/exaone4.py b/vllm/model_executor/models/exaone4.py
index c9db3f8bfd1a..fada21d69cbb 100644
--- a/vllm/model_executor/models/exaone4.py
+++ b/vllm/model_executor/models/exaone4.py
@@ -314,7 +314,7 @@ def forward(
         hidden_states = self.post_feedforward_layernorm(hidden_states)
         hidden_states = residual + hidden_states
         hidden_states = apply_layer_steering(
-            self, hidden_states, SteeringHookPoint.POST_MLP
+            self, hidden_states, SteeringHookPoint.POST_BLOCK
         )
 
         return hidden_states, residual
diff --git a/vllm/model_executor/models/exaone_moe.py b/vllm/model_executor/models/exaone_moe.py
index 5f47817fe155..56681f38c753 100644
--- a/vllm/model_executor/models/exaone_moe.py
+++ b/vllm/model_executor/models/exaone_moe.py
@@ -259,7 +259,7 @@ def forward(
         hidden_states, residual = self.post_attention_layernorm(hidden_states, residual)
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.mlp(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
 
         return hidden_states, residual
 
diff --git a/vllm/model_executor/models/falcon.py b/vllm/model_executor/models/falcon.py
index 974aaa2a18ba..4b76b1eea30a 100644
--- a/vllm/model_executor/models/falcon.py
+++ b/vllm/model_executor/models/falcon.py
@@ -384,7 +384,7 @@ def forward(
                 mlp_output += mlp_bias
 
         output = mlp_output + residual
-        output = apply_layer_steering(self, output, SteeringHookPoint.POST_MLP)
+        output = apply_layer_steering(self, output, SteeringHookPoint.POST_BLOCK)
         return output
 
 
diff --git a/vllm/model_executor/models/flex_olmo.py b/vllm/model_executor/models/flex_olmo.py
index 84b5a86708ef..4b04d593c159 100644
--- a/vllm/model_executor/models/flex_olmo.py
+++ b/vllm/model_executor/models/flex_olmo.py
@@ -167,7 +167,7 @@ def forward(
         hidden_states = self.post_feedforward_layernorm(hidden_states)
         hidden_states = residual + hidden_states
         hidden_states = apply_layer_steering(
-            self, hidden_states, SteeringHookPoint.POST_MLP
+            self, hidden_states, SteeringHookPoint.POST_BLOCK
         )
         return hidden_states, None
 
diff --git a/vllm/model_executor/models/gemma.py b/vllm/model_executor/models/gemma.py
index 16f1326d7250..e8385fcd6870 100644
--- a/vllm/model_executor/models/gemma.py
+++ b/vllm/model_executor/models/gemma.py
@@ -278,7 +278,7 @@ def forward(
         hidden_states, residual = self.post_attention_layernorm(hidden_states, residual)
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.mlp(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
 
diff --git a/vllm/model_executor/models/gemma2.py b/vllm/model_executor/models/gemma2.py
index 0abcf9e9c256..f57ff9a15499 100644
--- a/vllm/model_executor/models/gemma2.py
+++ b/vllm/model_executor/models/gemma2.py
@@ -268,7 +268,7 @@ def forward(
             hidden_states, residual
         )
         hidden_states = self.mlp(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         hidden_states = self.post_feedforward_layernorm(hidden_states)
         return hidden_states, residual
 
diff --git a/vllm/model_executor/models/gemma3.py b/vllm/model_executor/models/gemma3.py
index 74043f4adcbc..8a66d94e5104 100644
--- a/vllm/model_executor/models/gemma3.py
+++ b/vllm/model_executor/models/gemma3.py
@@ -330,7 +330,7 @@ def forward(
         )
         hidden_states = self.mlp(hidden_states)
 
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
 
         hidden_states = self.post_feedforward_layernorm(hidden_states)
 
diff --git a/vllm/model_executor/models/gemma3n.py b/vllm/model_executor/models/gemma3n.py
index 7235ca2a533d..4e613338c614 100644
--- a/vllm/model_executor/models/gemma3n.py
+++ b/vllm/model_executor/models/gemma3n.py
@@ -578,7 +578,7 @@ def forward(
         attn_ffw_norm = self.post_feedforward_layernorm(attn_ffw)
         attn_ffw_laurel_gated = attn_laurel + attn_ffw_norm
         attn_ffw_laurel_gated = apply_layer_steering(
-            self, attn_ffw_laurel_gated, SteeringHookPoint.POST_MLP
+            self, attn_ffw_laurel_gated, SteeringHookPoint.POST_BLOCK
         )
 
         # ActUp (connect).
diff --git a/vllm/model_executor/models/gemma4.py b/vllm/model_executor/models/gemma4.py
index b98024095e1b..8e73d689c898 100644
--- a/vllm/model_executor/models/gemma4.py
+++ b/vllm/model_executor/models/gemma4.py
@@ -769,9 +769,9 @@ def forward(
 
         hidden_states = self.post_feedforward_layernorm(hidden_states)
         hidden_states = hidden_states + residual
-        maybe_capture_residual(hidden_states, self.layer_idx, "post_mlp")
+        maybe_capture_residual(hidden_states, self.layer_idx, "post_block")
         hidden_states = apply_layer_steering(
-            self, hidden_states, SteeringHookPoint.POST_MLP
+            self, hidden_states, SteeringHookPoint.POST_BLOCK
         )
 
         # Apply PLE (Per-Layer Embedding) if configured
diff --git a/vllm/model_executor/models/glm4.py b/vllm/model_executor/models/glm4.py
index 24e7d00cd0ae..b2888f53daa6 100644
--- a/vllm/model_executor/models/glm4.py
+++ b/vllm/model_executor/models/glm4.py
@@ -209,7 +209,7 @@ def __init__(
         self.post_self_attn_layernorm = RMSNorm(
             config.hidden_size, eps=config.rms_norm_eps
         )
         self.post_mlp_layernorm = RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
 
     def forward(
         self,
@@ -235,8 +235,8 @@ def forward(
         hidden_states, residual = self.post_attention_layernorm(hidden_states, residual)
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.mlp(hidden_states)
         hidden_states = self.post_mlp_layernorm(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
 
         return hidden_states, residual
 
diff --git a/vllm/model_executor/models/glm4_moe.py b/vllm/model_executor/models/glm4_moe.py
index d1e8779b3664..9753fab7d00e 100644
--- a/vllm/model_executor/models/glm4_moe.py
+++ b/vllm/model_executor/models/glm4_moe.py
@@ -405,7 +405,7 @@ def forward(
         hidden_states, residual = self.post_attention_layernorm(hidden_states, residual)
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.mlp(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
 
diff --git a/vllm/model_executor/models/glm4_moe_lite.py b/vllm/model_executor/models/glm4_moe_lite.py
index 3cc07a8ed9bd..2c8899990a0e 100644
--- a/vllm/model_executor/models/glm4_moe_lite.py
+++ b/vllm/model_executor/models/glm4_moe_lite.py
@@ -219,7 +219,7 @@ def forward(
         hidden_states, residual = self.post_attention_layernorm(hidden_states, residual)
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.mlp(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
 
         return hidden_states, residual
 
diff --git a/vllm/model_executor/models/gpt_neox.py b/vllm/model_executor/models/gpt_neox.py
index 7522744d1600..0e0d522c2d52 100644
--- a/vllm/model_executor/models/gpt_neox.py
+++ b/vllm/model_executor/models/gpt_neox.py
@@ -212,7 +212,7 @@ def forward(
                 self, hidden_states, SteeringHookPoint.POST_ATTN
             )
             hidden_states = apply_layer_steering(
-                self, hidden_states, SteeringHookPoint.POST_MLP
+                self, hidden_states, SteeringHookPoint.POST_BLOCK
             )
         else:
             # pseudocode:
@@ -226,7 +226,7 @@ def forward(
             mlp_output = self.mlp(mlp_input)
             hidden_states = mlp_output + attn_output
             hidden_states = apply_layer_steering(
-                self, hidden_states, SteeringHookPoint.POST_MLP
+                self, hidden_states, SteeringHookPoint.POST_BLOCK
             )
         return hidden_states
 
diff --git a/vllm/model_executor/models/granite.py b/vllm/model_executor/models/granite.py
index 86a8e8465e4d..2d6d72c98223 100644
--- a/vllm/model_executor/models/granite.py
+++ b/vllm/model_executor/models/granite.py
@@ -274,7 +274,7 @@ def forward(
         hidden_states = self.mlp(hidden_states)
         hidden_states = residual + hidden_states * self.residual_multiplier
         hidden_states = apply_layer_steering(
-            self, hidden_states, SteeringHookPoint.POST_MLP
+            self, hidden_states, SteeringHookPoint.POST_BLOCK
         )
         return hidden_states
 
diff --git a/vllm/model_executor/models/granitemoe.py b/vllm/model_executor/models/granitemoe.py
index 5cb7133aec45..06fefc6ed29c 100644
--- a/vllm/model_executor/models/granitemoe.py
+++ b/vllm/model_executor/models/granitemoe.py
@@ -308,7 +308,7 @@ def forward(
         hidden_states = self.block_sparse_moe(hidden_states)
         hidden_states = residual + hidden_states * self.residual_multiplier
         hidden_states = apply_layer_steering(
-            self, hidden_states, SteeringHookPoint.POST_MLP
+            self, hidden_states, SteeringHookPoint.POST_BLOCK
         )
 
         return hidden_states
diff --git a/vllm/model_executor/models/granitemoeshared.py b/vllm/model_executor/models/granitemoeshared.py
index 5c4b60d8097c..3ac70b017837 100644
--- a/vllm/model_executor/models/granitemoeshared.py
+++ b/vllm/model_executor/models/granitemoeshared.py
@@ -172,7 +172,7 @@ def forward(
             del moe_hidden_states
         hidden_states = residual + hidden_states * self.residual_multiplier
         hidden_states = apply_layer_steering(
-            self, hidden_states, SteeringHookPoint.POST_MLP
+            self, hidden_states, SteeringHookPoint.POST_BLOCK
         )
 
         return hidden_states
diff --git a/vllm/model_executor/models/grok1.py b/vllm/model_executor/models/grok1.py
index 0d74632363ad..8d84a7c9f551 100644
--- a/vllm/model_executor/models/grok1.py
+++ b/vllm/model_executor/models/grok1.py
@@ -452,7 +452,7 @@ def forward(
         else:
             hidden_states = self.moe_block(hidden_states)
         hidden_states = self.post_moe_norm(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
 
         return hidden_states, residual
 
diff --git a/vllm/model_executor/models/hunyuan_v1.py b/vllm/model_executor/models/hunyuan_v1.py
index cb66dbe95c8d..038bef9cbdba 100644
--- a/vllm/model_executor/models/hunyuan_v1.py
+++ b/vllm/model_executor/models/hunyuan_v1.py
@@ -599,7 +599,7 @@ def forward(
         hidden_states, residual = self.post_attention_layernorm(hidden_states, residual)
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.mlp(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual, ori_kv_states
 
 
diff --git a/vllm/model_executor/models/hyperclovax.py b/vllm/model_executor/models/hyperclovax.py
index 7f54923505fa..46746ceca4f5 100644
--- a/vllm/model_executor/models/hyperclovax.py
+++ b/vllm/model_executor/models/hyperclovax.py
@@ -319,7 +319,7 @@ def forward(
         # The residual is added outside the layernorm function to apply muP.
         hidden_states = residual + hidden_states * self.residual_multiplier  # muP
         hidden_states = apply_layer_steering(
-            self, hidden_states, SteeringHookPoint.POST_MLP
+            self, hidden_states, SteeringHookPoint.POST_BLOCK
         )
 
         return hidden_states, residual
diff --git a/vllm/model_executor/models/internlm2.py b/vllm/model_executor/models/internlm2.py
index daf9a4863d22..d2948e4459c8 100644
--- a/vllm/model_executor/models/internlm2.py
+++ b/vllm/model_executor/models/internlm2.py
@@ -266,7 +266,7 @@ def forward(
         hidden_states, residual = self.ffn_norm(hidden_states, residual)
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.feed_forward(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
 
diff --git a/vllm/model_executor/models/internlm2_ve.py b/vllm/model_executor/models/internlm2_ve.py
index 49c027359d39..e4f4a431cb70 100644
--- a/vllm/model_executor/models/internlm2_ve.py
+++ b/vllm/model_executor/models/internlm2_ve.py
@@ -110,7 +110,7 @@ def forward(
                 ).flatten()
         else:
             hidden_states = self.feed_forward(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
 
diff --git a/vllm/model_executor/models/interns1_pro.py b/vllm/model_executor/models/interns1_pro.py
index fdfc9cf89b5c..ab225a5bda8a 100644
--- a/vllm/model_executor/models/interns1_pro.py
+++ b/vllm/model_executor/models/interns1_pro.py
@@ -476,7 +476,7 @@ def forward(
         hidden_states, residual = self.post_attention_layernorm(hidden_states, residual)
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.mlp(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
 
diff --git a/vllm/model_executor/models/iquest_loopcoder.py b/vllm/model_executor/models/iquest_loopcoder.py
index b06f1a70281e..5a1c9d2c20be 100644
--- a/vllm/model_executor/models/iquest_loopcoder.py
+++ b/vllm/model_executor/models/iquest_loopcoder.py
@@ -296,7 +296,7 @@ def forward(
         hidden_states = self.mlp(hidden_states)
         hidden_states = hidden_states + residual
         hidden_states = apply_layer_steering(
-            self, hidden_states, SteeringHookPoint.POST_MLP
+            self, hidden_states, SteeringHookPoint.POST_BLOCK
         )
 
         return hidden_states
diff --git a/vllm/model_executor/models/jais2.py b/vllm/model_executor/models/jais2.py
index 42a5fc8c0414..236a617e1966 100644
--- a/vllm/model_executor/models/jais2.py
+++ b/vllm/model_executor/models/jais2.py
@@ -304,7 +304,7 @@ def forward(
         )
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.mlp(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
     def get_quant_config(self, vllm_config: VllmConfig) -> QuantizationConfig | None:
diff --git a/vllm/model_executor/models/kimi_linear.py b/vllm/model_executor/models/kimi_linear.py
index 64f9c9935609..45f4f2647393 100644
--- a/vllm/model_executor/models/kimi_linear.py
+++ b/vllm/model_executor/models/kimi_linear.py
@@ -398,7 +398,7 @@ def forward(
         hidden_states, residual = self.post_attention_layernorm(hidden_states, residual)
         hidden_states = self.mlp(hidden_states)
 
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
 
diff --git a/vllm/model_executor/models/llama.py b/vllm/model_executor/models/llama.py
index 5ff962cab4c9..94d8206fe9f5 100644
--- a/vllm/model_executor/models/llama.py
+++ b/vllm/model_executor/models/llama.py
@@ -352,7 +352,7 @@ def forward(
         hidden_states, residual = self.post_attention_layernorm(hidden_states, residual)
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.mlp(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
     def get_quant_config(self, vllm_config: VllmConfig) -> QuantizationConfig | None:
diff --git a/vllm/model_executor/models/llama4.py b/vllm/model_executor/models/llama4.py
index 6560ae4e7cb2..8c33afc4b4e4 100644
--- a/vllm/model_executor/models/llama4.py
+++ b/vllm/model_executor/models/llama4.py
@@ -402,7 +402,7 @@ def forward(
         hidden_states, residual = self.post_attention_layernorm(hidden_states, residual)
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.feed_forward(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
 
diff --git a/vllm/model_executor/models/minicpm.py b/vllm/model_executor/models/minicpm.py
index 0572c2b15df6..bfeab6b78769 100644
--- a/vllm/model_executor/models/minicpm.py
+++ b/vllm/model_executor/models/minicpm.py
@@ -418,7 +418,7 @@ def forward(
             self.config.scale_depth / math.sqrt(self.config.num_hidden_layers)
         )
         hidden_states = apply_layer_steering(
-            self, hidden_states, SteeringHookPoint.POST_MLP
+            self, hidden_states, SteeringHookPoint.POST_BLOCK
         )
 
         return hidden_states, None
diff --git a/vllm/model_executor/models/minimax_m2.py b/vllm/model_executor/models/minimax_m2.py
index 54f6ae32d0a7..64a2eb097784 100644
--- a/vllm/model_executor/models/minimax_m2.py
+++ b/vllm/model_executor/models/minimax_m2.py
@@ -340,7 +340,7 @@ def forward(
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
 
         hidden_states = self.block_sparse_moe(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
 
         return hidden_states, residual
 
diff --git a/vllm/model_executor/models/minimax_text_01.py b/vllm/model_executor/models/minimax_text_01.py
index 1429e6f6c72c..602915879bb0 100644
--- a/vllm/model_executor/models/minimax_text_01.py
+++ b/vllm/model_executor/models/minimax_text_01.py
@@ -501,7 +501,7 @@ def forward(
 
         hidden_states = residual + hidden_states
         hidden_states = apply_layer_steering(
-            self, hidden_states, SteeringHookPoint.POST_MLP
+            self, hidden_states, SteeringHookPoint.POST_BLOCK
         )
 
         return hidden_states, None
diff --git a/vllm/model_executor/models/mistral.py b/vllm/model_executor/models/mistral.py
index 566e4d3c0159..387dcf49233c 100644
--- a/vllm/model_executor/models/mistral.py
+++ b/vllm/model_executor/models/mistral.py
@@ -206,7 +206,7 @@ def forward(
             hidden_states = hidden_states * (1 + self.ada_rms_norm_t_cond(t_cond))
 
         hidden_states = self.mlp(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
 
diff --git a/vllm/model_executor/models/mixtral.py b/vllm/model_executor/models/mixtral.py
index 7077ae2166e9..0d4d960d3e7d 100644
--- a/vllm/model_executor/models/mixtral.py
+++ b/vllm/model_executor/models/mixtral.py
@@ -313,7 +313,7 @@ def forward(
         hidden_states, residual = self.post_attention_layernorm(hidden_states, residual)
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.block_sparse_moe(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
 
diff --git a/vllm/model_executor/models/molmo.py b/vllm/model_executor/models/molmo.py
index f4ba85a90ebe..e3e298f99d1e 100644
--- a/vllm/model_executor/models/molmo.py
+++ b/vllm/model_executor/models/molmo.py
@@ -664,7 +664,7 @@ def forward(
         hidden_states, residual = self.post_attention_layernorm(hidden_states, residual)
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.mlp(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
 
@@ -694,7 +694,7 @@ def forward(
         hidden_states = self.post_attention_layernorm(hidden_states)
         hidden_states = hidden_states + residual
         hidden_states = apply_layer_steering(
-            self, hidden_states, SteeringHookPoint.POST_MLP
+            self, hidden_states, SteeringHookPoint.POST_BLOCK
         )
         residual = None
         return hidden_states, residual
diff --git a/vllm/model_executor/models/molmo2.py b/vllm/model_executor/models/molmo2.py
index 52214da9f831..3729221c2d73 100644
--- a/vllm/model_executor/models/molmo2.py
+++ b/vllm/model_executor/models/molmo2.py
@@ -1140,7 +1140,7 @@ def forward(
         hidden_states, residual = self.post_attention_layernorm(hidden_states, residual)
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.mlp(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
 
@@ -1172,7 +1172,7 @@ def forward(
         hidden_states = self.post_attention_layernorm(hidden_states)
         hidden_states = hidden_states + residual
         hidden_states = apply_layer_steering(
-            self, hidden_states, SteeringHookPoint.POST_MLP
+            self, hidden_states, SteeringHookPoint.POST_BLOCK
         )
         residual = None
         return hidden_states, residual
diff --git a/vllm/model_executor/models/nemotron.py b/vllm/model_executor/models/nemotron.py
index eea3702c450b..7fb440bbddea 100644
--- a/vllm/model_executor/models/nemotron.py
+++ b/vllm/model_executor/models/nemotron.py
@@ -312,7 +312,7 @@ def forward(
         hidden_states, residual = self.post_attention_layernorm(hidden_states, residual)
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.mlp(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
 
diff --git a/vllm/model_executor/models/nemotron_nas.py b/vllm/model_executor/models/nemotron_nas.py
index 2a72295f7704..a04707529673 100644
--- a/vllm/model_executor/models/nemotron_nas.py
+++ b/vllm/model_executor/models/nemotron_nas.py
@@ -243,7 +243,7 @@ def forward(
             )
             residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
             hidden_states = self.mlp(hidden_states)
-            residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+            residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
 
diff --git a/vllm/model_executor/models/olmo.py b/vllm/model_executor/models/olmo.py
index bc8a7655173a..8b1fab09ab8d 100644
--- a/vllm/model_executor/models/olmo.py
+++ b/vllm/model_executor/models/olmo.py
@@ -261,7 +261,7 @@ def forward(
         hidden_states = self.mlp(hidden_states)
         hidden_states = residual + hidden_states
         hidden_states = apply_layer_steering(
-            self, hidden_states, SteeringHookPoint.POST_MLP
+            self, hidden_states, SteeringHookPoint.POST_BLOCK
         )
         return hidden_states
 
diff --git a/vllm/model_executor/models/olmo2.py b/vllm/model_executor/models/olmo2.py
index 5b21a88baea5..40b92f351066 100644
--- a/vllm/model_executor/models/olmo2.py
+++ b/vllm/model_executor/models/olmo2.py
@@ -300,7 +300,7 @@ def forward(
         hidden_states = self.post_feedforward_layernorm(hidden_states)
         hidden_states = residual + hidden_states
         hidden_states = apply_layer_steering(
-            self, hidden_states, SteeringHookPoint.POST_MLP
+            self, hidden_states, SteeringHookPoint.POST_BLOCK
         )
         return hidden_states
 
diff --git a/vllm/model_executor/models/olmo_hybrid.py b/vllm/model_executor/models/olmo_hybrid.py
index 0847ac76e57c..65e106422980 100644
--- a/vllm/model_executor/models/olmo_hybrid.py
+++ b/vllm/model_executor/models/olmo_hybrid.py
@@ -839,7 +839,7 @@ def forward(
             hidden_states = self.mlp(hidden_states)
             hidden_states = residual + hidden_states
             hidden_states = apply_layer_steering(
-                self, hidden_states, SteeringHookPoint.POST_MLP
+                self, hidden_states, SteeringHookPoint.POST_BLOCK
             )
         else:
             residual = hidden_states
@@ -856,7 +856,7 @@ def forward(
             hidden_states = self.post_feedforward_layernorm(hidden_states)
             hidden_states = residual + hidden_states
             hidden_states = apply_layer_steering(
-                self, hidden_states, SteeringHookPoint.POST_MLP
+                self, hidden_states, SteeringHookPoint.POST_BLOCK
             )
         return hidden_states
 
diff --git a/vllm/model_executor/models/olmoe.py b/vllm/model_executor/models/olmoe.py
index fca2b878ab1d..f13747172764 100644
--- a/vllm/model_executor/models/olmoe.py
+++ b/vllm/model_executor/models/olmoe.py
@@ -287,7 +287,7 @@ def forward(
         hidden_states, residual = self.post_attention_layernorm(hidden_states, residual)
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.mlp(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
 
diff --git a/vllm/model_executor/models/openpangu.py b/vllm/model_executor/models/openpangu.py
index ab638971fa68..39f004405dc7 100644
--- a/vllm/model_executor/models/openpangu.py
+++ b/vllm/model_executor/models/openpangu.py
@@ -1015,8 +1015,8 @@ def forward(
             hidden_states *= 1.0 / self.routed_scaling_factor
 
         if self.sandwich_norm:
             hidden_states = self.post_mlp_layernorm(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
 
         return hidden_states, residual
 
diff --git a/vllm/model_executor/models/opt.py b/vllm/model_executor/models/opt.py
index 05c7d025fcbe..96263cb3f369 100644
--- a/vllm/model_executor/models/opt.py
+++ b/vllm/model_executor/models/opt.py
@@ -217,7 +217,7 @@ def forward(
         hidden_states, _ = self.fc2(hidden_states)
         hidden_states = residual + hidden_states
         hidden_states = apply_layer_steering(
-            self, hidden_states, SteeringHookPoint.POST_MLP
+            self, hidden_states, SteeringHookPoint.POST_BLOCK
         )
         # 350m applies layer norm AFTER attention
         if not self.do_layer_norm_before:
diff --git a/vllm/model_executor/models/orion.py b/vllm/model_executor/models/orion.py
index 9fdc26bf1181..992ac8bc2250 100644
--- a/vllm/model_executor/models/orion.py
+++ b/vllm/model_executor/models/orion.py
@@ -241,7 +241,7 @@ def forward(
         hidden_states = self.mlp(hidden_states)
         hidden_states = residual + hidden_states
         hidden_states = apply_layer_steering(
-            self, hidden_states, SteeringHookPoint.POST_MLP
+            self, hidden_states, SteeringHookPoint.POST_BLOCK
         )
         return hidden_states
 
diff --git a/vllm/model_executor/models/ouro.py b/vllm/model_executor/models/ouro.py
index d295647de6c1..5f0b0e1b210f 100644
--- a/vllm/model_executor/models/ouro.py
+++ b/vllm/model_executor/models/ouro.py
@@ -302,7 +302,7 @@ def forward(
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.mlp(hidden_states)
         hidden_states = self.post_attention_layernorm_2(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
 
         return hidden_states, residual
 
diff --git a/vllm/model_executor/models/persimmon.py b/vllm/model_executor/models/persimmon.py
index 5e3a557c365b..5e086b5d7d0e 100644
--- a/vllm/model_executor/models/persimmon.py
+++ b/vllm/model_executor/models/persimmon.py
@@ -259,7 +259,7 @@ def forward(
 
         hidden_states = hidden_states + residual
         hidden_states = apply_layer_steering(
-            self, hidden_states, SteeringHookPoint.POST_MLP
+            self, hidden_states, SteeringHookPoint.POST_BLOCK
         )
 
         outputs = hidden_states
diff --git a/vllm/model_executor/models/phi.py b/vllm/model_executor/models/phi.py
index f278f6f9bc04..e9e35a2152fa 100644
--- a/vllm/model_executor/models/phi.py
+++ b/vllm/model_executor/models/phi.py
@@ -227,7 +227,7 @@ def forward(
         )
         hidden_states = attn_hidden_states + feed_forward_hidden_states
         hidden_states = apply_layer_steering(
-            self, hidden_states, SteeringHookPoint.POST_MLP
+            self, hidden_states, SteeringHookPoint.POST_BLOCK
         )
         return hidden_states
 
diff --git a/vllm/model_executor/models/phimoe.py b/vllm/model_executor/models/phimoe.py
index 304cd92e0d69..04a58c15f3de 100644
--- a/vllm/model_executor/models/phimoe.py
+++ b/vllm/model_executor/models/phimoe.py
@@ -470,7 +470,7 @@ def forward(
 
         hidden_states = hidden_states + residual
         hidden_states = apply_layer_steering(
-            self, hidden_states, SteeringHookPoint.POST_MLP
+            self, hidden_states, SteeringHookPoint.POST_BLOCK
         )
         return hidden_states, residual
 
diff --git a/vllm/model_executor/models/plamo2.py b/vllm/model_executor/models/plamo2.py
index deab6ea524d7..8098ef02819a 100644
--- a/vllm/model_executor/models/plamo2.py
+++ b/vllm/model_executor/models/plamo2.py
@@ -729,9 +729,9 @@ def forward(
         # Fully Connected
         hidden_states, residual = self.pre_mlp_norm(hidden_states, residual)
         hidden_states = self.mlp(hidden_states)
         hidden_states = self.post_mlp_norm(hidden_states)
         hidden_states = apply_layer_steering(
-            self, hidden_states, SteeringHookPoint.POST_MLP
+            self, hidden_states, SteeringHookPoint.POST_BLOCK
         )
         return hidden_states, residual
 
diff --git a/vllm/model_executor/models/plamo3.py b/vllm/model_executor/models/plamo3.py
index 637f88cfd340..a7097b847159 100644
--- a/vllm/model_executor/models/plamo3.py
+++ b/vllm/model_executor/models/plamo3.py
@@ -305,9 +305,9 @@ def forward(
         # Fully Connected
         hidden_states, residual = self.pre_mlp_norm(hidden_states, residual)
         hidden_states = self.mlp(hidden_states)
         hidden_states = self.post_mlp_norm(hidden_states)
         hidden_states = apply_layer_steering(
-            self, hidden_states, SteeringHookPoint.POST_MLP
+            self, hidden_states, SteeringHookPoint.POST_BLOCK
         )
         return hidden_states, residual
 
diff --git a/vllm/model_executor/models/qwen2.py b/vllm/model_executor/models/qwen2.py
index 85b7dfc0bdc8..268fea5a6071 100644
--- a/vllm/model_executor/models/qwen2.py
+++ b/vllm/model_executor/models/qwen2.py
@@ -342,7 +342,7 @@ def forward(
         hidden_states, residual = self.post_attention_layernorm(hidden_states, residual)
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.mlp(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
 
diff --git a/vllm/model_executor/models/qwen2_moe.py b/vllm/model_executor/models/qwen2_moe.py
index 0699f0a71358..c05aaf90391a 100644
--- a/vllm/model_executor/models/qwen2_moe.py
+++ b/vllm/model_executor/models/qwen2_moe.py
@@ -375,7 +375,7 @@ def forward(
         hidden_states, residual = self.post_attention_layernorm(hidden_states, residual)
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.mlp(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
 
diff --git a/vllm/model_executor/models/qwen3.py b/vllm/model_executor/models/qwen3.py
index f023b0c8c75f..6e9980d5cf98 100644
--- a/vllm/model_executor/models/qwen3.py
+++ b/vllm/model_executor/models/qwen3.py
@@ -252,7 +252,7 @@ def forward(
         hidden_states, residual = self.post_attention_layernorm(hidden_states, residual)
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.mlp(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
 
diff --git a/vllm/model_executor/models/qwen3_moe.py b/vllm/model_executor/models/qwen3_moe.py
index 2af8af43044d..bbaf3493f084 100644
--- a/vllm/model_executor/models/qwen3_moe.py
+++ b/vllm/model_executor/models/qwen3_moe.py
@@ -455,7 +455,7 @@ def forward(
         hidden_states, residual = self.post_attention_layernorm(hidden_states, residual)
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.mlp(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
 
diff --git a/vllm/model_executor/models/qwen3_next.py b/vllm/model_executor/models/qwen3_next.py
index 7bd3323df58b..855eea6b33cb 100644
--- a/vllm/model_executor/models/qwen3_next.py
+++ b/vllm/model_executor/models/qwen3_next.py
@@ -468,7 +468,7 @@ def forward(
                     self.ffn_layer_scale.to(hidden_states.dtype) + 1
                 )
 
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
 
diff --git a/vllm/model_executor/models/seed_oss.py b/vllm/model_executor/models/seed_oss.py
index 5810f45e513c..94aaa4ee6ab3 100644
--- a/vllm/model_executor/models/seed_oss.py
+++ b/vllm/model_executor/models/seed_oss.py
@@ -276,7 +276,7 @@ def forward(
         hidden_states, residual = self.post_attention_layernorm(hidden_states, residual)
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.mlp(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
 
diff --git a/vllm/model_executor/models/solar.py b/vllm/model_executor/models/solar.py
index 1d1a6c53b5db..4c30baee06ee 100644
--- a/vllm/model_executor/models/solar.py
+++ b/vllm/model_executor/models/solar.py
@@ -271,7 +271,7 @@ def forward(
         hidden_states, residual = self.post_attention_layernorm(hidden_states, residual)
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.mlp(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
 
diff --git a/vllm/model_executor/models/stablelm.py b/vllm/model_executor/models/stablelm.py
index 0ccd2c98aa4b..dd89dbfe4813 100644
--- a/vllm/model_executor/models/stablelm.py
+++ b/vllm/model_executor/models/stablelm.py
@@ -236,7 +236,7 @@ def forward(
         hidden_states = self.mlp(hidden_states)
         hidden_states = residual + hidden_states
         hidden_states = apply_layer_steering(
-            self, hidden_states, SteeringHookPoint.POST_MLP
+            self, hidden_states, SteeringHookPoint.POST_BLOCK
         )
 
         return hidden_states, residual
diff --git a/vllm/model_executor/models/starcoder2.py b/vllm/model_executor/models/starcoder2.py
index 6d470eda0e51..44074d2b4ded 100644
--- a/vllm/model_executor/models/starcoder2.py
+++ b/vllm/model_executor/models/starcoder2.py
@@ -239,7 +239,7 @@ def forward(
         hidden_states = self.mlp(hidden_states)
         hidden_states = residual + hidden_states
         hidden_states = apply_layer_steering(
-            self, hidden_states, SteeringHookPoint.POST_MLP
+            self, hidden_states, SteeringHookPoint.POST_BLOCK
         )
 
         return hidden_states
diff --git a/vllm/model_executor/models/step1.py b/vllm/model_executor/models/step1.py
index eabc61cba139..5d70209d4865 100644
--- a/vllm/model_executor/models/step1.py
+++ b/vllm/model_executor/models/step1.py
@@ -266,7 +266,7 @@ def forward(
         hidden_states, residual = self.post_attention_layernorm(hidden_states, residual)
         residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_ATTN)
         hidden_states = self.mlp(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
         return hidden_states, residual
 
     def load_weights(self, weights: Iterable[tuple[str, torch.Tensor]]) -> set[str]:
diff --git a/vllm/model_executor/models/step3_text.py b/vllm/model_executor/models/step3_text.py
index 1a5d9b699dc2..342b38278f5c 100644
--- a/vllm/model_executor/models/step3_text.py
+++ b/vllm/model_executor/models/step3_text.py
@@ -328,7 +328,7 @@ def forward(
             hidden_states = share_output + moe_output
         else:
             hidden_states = self.mlp(hidden_states)
-        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_MLP)
+        residual = apply_layer_steering(self, residual, SteeringHookPoint.POST_BLOCK)
 
         return hidden_states, residual
 
diff --git a/vllm/model_executor/models/step3p5.py b/vllm/model_executor/models/step3p5.py
index 06fa322f6972..82fb8c695802 100644
--- a/vllm/model_executor/models/step3p5.py
+++ b/vllm/model_executor/models/step3p5.py
@@ -562,7 +562,7 @@ def forward(
             ffn_output = self.mlp(hidden_states)
         hidden_states = ffn_output + residual
         hidden_states = apply_layer_steering(
-            self, hidden_states, SteeringHookPoint.POST_MLP
+            self, hidden_states, SteeringHookPoint.POST_BLOCK
         )
         return hidden_states
 
diff --git a/vllm/sampling_params.py b/vllm/sampling_params.py
index 3f02d47ec315..c37c9afa1aac 100644
--- a/vllm/sampling_params.py
+++ b/vllm/sampling_params.py
@@ -342,7 +342,7 @@ class SamplingParams(
 
     steering_vectors: SteeringVectorSpec | None = None
     """Base steering vectors applied to both prefill and decode phases.
-    Keyed by hook point name (pre_attn, post_attn, post_mlp), then
+    Keyed by hook point name (pre_attn, post_attn, post_block), then
     layer index. Values are either bare
     ``list[float]`` (scale=1.0) or ``{"vector": [...], "scale": float}``."""
 
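Note on the `steering_vectors` docstring above: the two accepted value forms can be sketched as below. This is a minimal illustration, not the actual `SteeringVectorSpec` handling; `normalize_entry` is a hypothetical helper, and the use of integer layer keys and 3-element vectors is an assumption made only for the example.

```python
def normalize_entry(entry):
    """Return (vector, scale) for either accepted value form.

    Hypothetical helper for illustration only; not part of vLLM.
    """
    if isinstance(entry, dict):
        # Explicit form: {"vector": [...], "scale": float}
        return list(entry["vector"]), float(entry["scale"])
    # Bare list[float] form implies scale=1.0
    return list(entry), 1.0


# Keyed by hook point name, then layer index (integer keys assumed here).
steering_vectors = {
    "post_block": {
        12: [0.1, -0.2, 0.3],
        13: {"vector": [0.5, 0.0, -0.5], "scale": 2.0},
    },
    "pre_attn": {0: [0.0, 0.0, 1.0]},
}

normalized = {
    hook: {layer: normalize_entry(value) for layer, value in layers.items()}
    for hook, layers in steering_vectors.items()
}
```

After this rename, a spec keyed by the old name `post_mlp` would no longer match any of the hook point names listed in the docstring.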
diff --git a/vllm/v1/capture/consumers/filesystem/validation.py b/vllm/v1/capture/consumers/filesystem/validation.py
index 5285686bda15..60816c15c330 100644
--- a/vllm/v1/capture/consumers/filesystem/validation.py
+++ b/vllm/v1/capture/consumers/filesystem/validation.py
@@ -66,7 +66,7 @@
 # so admission rejects them until they are wired; re-add here once
 # implemented.
 _VALID_HOOK_NAMES: frozenset[str] = frozenset(
-    ("pre_attn", "post_attn", "post_mlp")
+    ("pre_attn", "post_attn", "post_block")
 )
 
 _VALID_POSITION_KINDS: frozenset[str] = frozenset(
@@ -375,7 +375,7 @@ def validate_filesystem_request(
     _structural_validate(raw)
 
     # 2. Parallelism. The residual hooks captured today (pre_attn /
-    # post_attn / post_mlp) read the residual stream after the
+    # post_attn / post_block) read the residual stream after the
     # tensor-parallel all-reduce / MoE combine, so it is replicated and
     # full-width across the TP and EP planes; data parallelism partitions
     # requests across independent engine cores. All four axes are
diff --git a/vllm/v1/capture/manager.py b/vllm/v1/capture/manager.py
index 6fe02d134593..a8f9656c6114 100644
--- a/vllm/v1/capture/manager.py
+++ b/vllm/v1/capture/manager.py
@@ -1461,7 +1461,7 @@ def _run_finalize(
                 dummy_key = (
                     VllmInternalRequestId(req_id),
                     0,
-                    "post_mlp",
+                    "post_block",
                 )
                 results[consumer_idx] = CaptureResult(
                     key=dummy_key,
diff --git a/vllm/v1/capture/plan.py b/vllm/v1/capture/plan.py
index 700d755ba570..9ca7164f44c2 100644
--- a/vllm/v1/capture/plan.py
+++ b/vllm/v1/capture/plan.py
@@ -62,7 +62,7 @@ class CapturePositionEntry:
     Attributes:
         request_id: The owning request's id.
         layer: Decoder-layer index.
-        hook: Hook-point name (e.g. ``"post_mlp"``).
+        hook: Hook-point name (e.g. ``"post_block"``).
         logical_pos: Absolute position in the request's token sequence.
         scratch_row: Index within the ``(layer, hook)``'s scratch tensor.
         step_index: Capture-step ordinal for this request.
diff --git a/vllm/v1/capture/types.py b/vllm/v1/capture/types.py
index 75313f703161..94c961f1edd7 100644
--- a/vllm/v1/capture/types.py
+++ b/vllm/v1/capture/types.py
@@ -42,7 +42,7 @@
 HookName = Literal[
     "pre_attn",
     "post_attn",
-    "post_mlp",
+    "post_block",
     "mlp_in",
     "mlp_out",
 ]
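Note on the `HookName` change above: the `Literal` declares five names, while `_VALID_HOOK_NAMES` in `validation.py` admits only the three residual hooks. A rough sketch of how the two sets relate after the rename follows; `check_hook` is a hypothetical helper written for this note, and the error messages are illustrative rather than taken from the code base.

```python
from typing import Literal, get_args

# Copied from the hunks in this diff.
HookName = Literal["pre_attn", "post_attn", "post_block", "mlp_in", "mlp_out"]
_VALID_HOOK_NAMES = frozenset(("pre_attn", "post_attn", "post_block"))


def check_hook(name: str) -> str:
    """Hypothetical admission check: declared name, then capturable name."""
    if name not in get_args(HookName):
        raise ValueError(f"unknown hook name: {name!r}")
    if name not in _VALID_HOOK_NAMES:
        raise ValueError(f"hook {name!r} is declared but not capturable yet")
    return name
```

Under this sketch `check_hook("post_block")` passes, `check_hook("mlp_in")` is rejected as not yet wired, and the old spelling `check_hook("post_mlp")` is rejected as unknown, which is the behavior existing clients should expect once the rename lands.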
diff --git a/vllm/v1/worker/gpu_model_runner.py b/vllm/v1/worker/gpu_model_runner.py
index d2f2d487e3ab..ef52aaa058c3 100644
--- a/vllm/v1/worker/gpu_model_runner.py
+++ b/vllm/v1/worker/gpu_model_runner.py
@@ -558,7 +558,7 @@ def __init__(
             self._capture_step_gate = CaptureStepGate()
 
             # Capturer-rank gate. The replicated residual hooks
-            # (pre_attn/post_attn/post_mlp) read the residual stream after
+            # (pre_attn/post_attn/post_block) read the residual stream after
             # the tensor-parallel all-reduce / MoE combine, so it is
             # byte-identical across the tensor-parallel group within each
             # (data-parallel, pipeline) cell. Exactly one rank — TP rank 0
diff --git a/vllm/v1/worker/steering_manager.py b/vllm/v1/worker/steering_manager.py
index 46539fb3f098..78dac194d2ea 100644
--- a/vllm/v1/worker/steering_manager.py
+++ b/vllm/v1/worker/steering_manager.py
@@ -305,7 +305,7 @@ def update_global_vectors(
         """Update cached global vector for a hook point and layer.
 
         Args:
-            hook_point: Hook point string (e.g. ``"post_mlp"``).
+            hook_point: Hook point string (e.g. ``"post_block"``).
             layer_idx: Layer index.
             vector: The global vector tensor.
             phase: ``"base"``, ``"prefill"``, or ``"decode"``.