Skip to content

feat(rust): steering, capture, and named steering modules in the rust frontend#187

Open
RhizoNymph wants to merge 3 commits into
feat/integrationfrom
feat/rust-steering
Open

feat(rust): steering, capture, and named steering modules in the rust frontend#187
RhizoNymph wants to merge 3 commits into
feat/integrationfrom
feat/rust-steering

Conversation

@RhizoNymph

@RhizoNymph RhizoNymph commented Jun 19, 2026

Copy link
Copy Markdown
Owner

What

Adds per-request activation steering and end-to-end activation capture to the Rust frontend, plus named steering modules (startup load + runtime management), across both inbound surfaces (OpenAI HTTP /v1/completions + /v1/chat/completions, and the served gRPC Generate service).

Per-request steering

  • New request fields: steering_vectors / prefill_steering_vectors / decode_steering_vectors (packed base64 wire format matching the Python OpenAI API), steering_name.
  • Packed format (dtype/shape/layer_indices/data/scales) decoded into the inline SteeringVectorSpec engine-core resolves (float32/float16/bfloat16/float64, per-row scales). Layer keys serialize as integer msgpack keys (Python dict[int, ...]). steering_name lowers to steering_module_ref = (name, 1.0).

Named steering modules

  • --steering-modules name=path loads module JSON at startup and broadcasts to the engine workers (register_steering_modules + pre_materialize_steering_module).
  • Runtime endpoints re-broadcast on change: GET / POST / DELETE /v1/steering/modules. Per-request steering_name validated up front (unknown → invalid_request / NotFound); mutations serialized by a lock.

Activation capture (end-to-end)

  • Request side: capture spec accepted on HTTP + gRPC, forwarded verbatim into SamplingParams.capture; engine-core's offline admission resolves prefix-cache flags.
  • Output side (the key fix): Python's EngineCoreOutput is an array_like tuple with capture_results at index 7; the Rust EngineCoreOutput predated that field, so capture-enabled outputs misaligned/failed to decode. Added a CaptureResult type and inserted capture_results at the correct tuple index, threaded it (alongside kv_transfer_params) through llmtextchat, and surfaced it on the non-streaming completion/chat responses and the gRPC FinishInfo — coercing each consumer payload to an object (mirror of Python's _capture_result_to_response_payload). Streaming responses omit capture_results, matching Python (capture still executes; consumers write out of band).
  • The python_compat.py fixture mirror was updated to include capture_results, keeping the Rust↔Python wire-format test faithful; a new wire test pins the index-7 position.

Why

The Rust frontend replaces vLLM's Python OpenAI entrypoint; these are the steering/capture surfaces clients rely on. Per-request resolution/admission stays in the engine-core; the frontend decodes the wire format, forwards, manages the named-module registry, and surfaces capture results.

Notes / deferred

  • Capture-spec validation parity: the Python entrypoint's synchronous _admit_capture spec validation is deferred to the engine-core here, so a malformed capture spec surfaces as an engine error rather than a 400 up front.

See rust/docs/features/steering_capture.md for the full data flow (incl. the capture return path), file map, and invariants.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant