[Feat] Support PD Disaggregation via CRD Operator #841
Vivo50E wants to merge 8 commits into `vllm-project:main` from
Conversation
Summary of Changes

Hello @Vivo50E, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the vLLM production stack by introducing native Kubernetes Custom Resource Definition (CRD) support for Prefill-Decode (PD) disaggregated serving. This change allows users to define a 2-Prefill, 2-Decode (2P2D) topology declaratively within a single VLLMRuntime resource, streamlining the deployment and management of complex, high-throughput model serving architectures. The operator now provisions and manages distinct prefill and decode components, facilitating efficient KV cache transfer and robust routing.
Code Review
This pull request introduces significant new functionality by adding first-class CRD support for Prefill-Decode (PD) disaggregation, which is a great step towards more declarative and Kubernetes-native management of vLLM deployments. The changes are extensive, touching Helm charts, the operator controller, CRD definitions, and the router application. The introduction of the TopologySpec in the VLLMRuntime CRD is a clean approach to managing the disaggregated setup, and the accompanying test updates are thorough.
My review has identified a few key areas for improvement. There is a critical issue regarding inconsistent HTTP client usage (aiohttp vs. httpx) in the router's service discovery, which will likely lead to runtime errors. Additionally, for production readiness, the operator image should be sourced from an official organization repository instead of a personal one. I've also noted a couple of potentially extraneous sample files that might need to be removed for clarity. Overall, this is a strong contribution, and addressing these points will further enhance its quality and robustness.
ruizhang0101 left a comment:
Hi, thanks for the contribution :)
I have several concerns about this PR:
- For CRD support, why are there Helm-related changes, such as the Helm GitHub workflow and Helm templates?
- Most of the code looks outdated; it effectively reverts some recent PRs. I would suggest a re-implementation based on the current version.
@ruizhang0101, thanks for reviewing. I drafted this PR a long time ago and haven't had much time to revisit it since then. I'll rebase my changes onto the latest version and address your comments.
Hi, thanks for the prompt update and all the contribution! This version looks much better. I left some comments; please let me know if there are any questions.
Also, one very IMPORTANT requirement for merging this PR: could you show that it works as expected by following the tutorial? It should also not break the original functionality.
Force-pushed from 9ec9991 to e6caf27
Hi, could you fix the CI issue? Let me know if you have any questions :)
Signed-off-by: Yiqi Xue <xuey666@gmail.com>
…t and improved error handling Signed-off-by: Yiqi Xue <xuey666@gmail.com>
…ency Signed-off-by: Yiqi Xue <xuey666@gmail.com>
…hunk processing Signed-off-by: Yiqi Xue <xuey666@gmail.com>
Signed-off-by: Yiqi Xue <xuey666@gmail.com>
Signed-off-by: Yiqi Xue <xuey666@gmail.com>
Signed-off-by: Yiqi Xue <xuey666@gmail.com>
Signed-off-by: Yiqi Xue <xuey666@gmail.com> Made-with: Cursor
Hi @ruizhang0101, I pushed a fix to ensure the NIXL path is only activated when nixl_proxy_host is configured. I've run the CI tests locally (static-discovery, k8s-discovery, helm chart Two-Pods-Minimal-Example) and they all pass. Also verified CRD-based 2P2D disaggregated prefill works end-to-end locally, with the router correctly splitting traffic to prefill/decode nodes and the ZMQ proxy running on the configured port.
@ruizhang0101 The CI failures (Secure-Minimal-Example, CRD-Validation, k8s-discovery-e2e-test) all show minikube host: Stopped at the very first step — this appears to be a runner environment issue rather than a code problem. Could you re-trigger the CI? Thanks |
@ruizhang0101 The CI checks have all passed now. Could you please take another look and help move this PR forward when you have a moment? Thanks a lot!
ruizhang0101 left a comment:
I don't think it is a good idea to make this breaking change to the spec schema for PD. It introduces a lot of confusion around the existing spec, and it also duplicates much of the reconcile logic. The Helm chart solution for PD is to assign a "role" to each model runtime and route based on that role. I would suggest the same pattern here: use different CRs to represent prefill and decode.
Also, this PR is too large. I would suggest separate PRs to address this problem:
- PR1: PD routing logic
- PR2: CRD deployment for PD

Aside from that, I have attached some comments. Please take a look when you have time.
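For illustration, the role-based pattern the reviewer describes might look like the sketch below. This is hypothetical: the apiVersion, field names, and label conventions here are assumptions for the sake of the example, not the actual VLLMRuntime schema in this repository.

```yaml
# Two separate CRs, one per role, instead of one CR with a topology block.
apiVersion: production-stack.vllm.ai/v1alpha1   # hypothetical group/version
kind: VLLMRuntime
metadata:
  name: llama-prefill
  labels:
    role: prefill          # router selects prefill pods by this label
spec:
  replicas: 2
---
apiVersion: production-stack.vllm.ai/v1alpha1
kind: VLLMRuntime
metadata:
  name: llama-decode
  labels:
    role: decode           # router selects decode pods by this label
spec:
  replicas: 2
```

With this split, the reconcile loop handles every VLLMRuntime identically, and only the router needs to be role-aware.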
```go
// Legacy fields (used when EnablePDDisaggregation=false)
// Model configuration
Model ModelSpec `json:"model,omitempty"`
```
For model, vllmConfig, and deployment config: they were required before, so could you also make them required here?
```python
logger.warning(f"ZMQ: unknown message format: {msg_dict}")
continue
req_id = msg.req_id
self._finished_reqs.add(req_id)
```
It would be better to have a TTL on this set, since the number of requests can be huge; otherwise this will cause an OOM in the router pod.
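A minimal sketch of the suggested TTL bookkeeping, using only the standard library (the name `TTLSet` is hypothetical, not from the PR): entries expire after a fixed window, so the set stays bounded regardless of request volume.

```python
import time
from collections import OrderedDict


class TTLSet:
    """Set-like container whose entries expire after `ttl` seconds,
    bounding memory for long-running router pods."""

    def __init__(self, ttl: float = 300.0):
        self.ttl = ttl
        # req_id -> insertion timestamp; oldest entries stay at the front
        self._items: "OrderedDict[str, float]" = OrderedDict()

    def _evict(self, now: float) -> None:
        # Pop from the front until the first unexpired entry.
        while self._items:
            req_id, ts = next(iter(self._items.items()))
            if now - ts < self.ttl:
                break
            self._items.popitem(last=False)

    def add(self, req_id: str) -> None:
        now = time.monotonic()
        self._evict(now)
        self._items[req_id] = now
        self._items.move_to_end(req_id)

    def __contains__(self, req_id: str) -> bool:
        self._evict(time.monotonic())
        return req_id in self._items
```

Replacing the plain `set` with something like this keeps the `req_id in finished_reqs` checks unchanged while capping memory.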
```go
}

// Check if PD disaggregation is enabled
log.Info("Checking PD disaggregation flag", "Name", vllmRuntime.Name, "EnablePDDisaggregation", vllmRuntime.Spec.EnablePDDisaggregation)
```
I think these PD logs should be at debug level instead of info.
```python
app.state.request_stats_monitor = get_request_stats_monitor()
app.state.router = get_routing_logic()
app.state.request_rewriter = get_request_rewriter()
app.state.args = args
```
It is not recommended to stash all the args on app state; consider using a dataclass or named fields.
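A sketch of the suggested approach, with hypothetical field names (the router's actual settings differ): a frozen dataclass makes the dependency on CLI args explicit and typo-safe, instead of passing the whole argparse namespace around.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RouterSettings:
    """Typed subset of the CLI args the request path actually needs,
    instead of stashing the whole argparse namespace on app.state."""
    nixl_proxy_host: Optional[str] = None  # hypothetical field names
    nixl_proxy_port: int = 7500
    request_timeout_s: float = 300.0


# During startup, build it once from the parsed args, e.g.:
# app.state.settings = RouterSettings(
#     nixl_proxy_host=args.nixl_proxy_host,
#     nixl_proxy_port=args.nixl_proxy_port,
# )
```

Handlers then read `app.state.settings.nixl_proxy_port` and get attribute errors at construction time rather than deep inside a request.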
```python
if self.event_loop_ready.is_set() and self.event_loop is not None:
    try:
        # Track all models we've ever seen
```
Please revert this since it is not relevant to this PR
```python
    Initialize aiohttp client sessions for prefill and decode endpoints.

    This must be called from an async context during app startup.
    """
    logger.info(
```
These should not be info logs
Hi @ruizhang0101, thanks for the feedback!
Summary
Add first-class CRD support for Prefill-Decode (PD) disaggregated serving in the production stack operator. Previously, PD disaggregation was only configurable via Helm values with manual multi-modelSpec setup. This PR enables a declarative xPyD topology (e.g., 2P2D) through a single VLLMRuntime resource, using NIXL for point-to-point GPU KV cache transfer.
Key Changes
Operator / CRD
- Adds `enablePDDisaggregation` and `topology` (prefill/decode) fields to the VLLMRuntime CRD

Router (NIXL-based disaggregated_prefill routing)
- Adds a new `route_disaggregated_prefill_nixl_request` alongside the existing `route_disaggregated_prefill_request` (LMCache shared storage mode). The NIXL path is automatically selected when the ZMQ proxy is active (`hasattr(app.state, 'zmq_proxy')`), preserving backward compatibility for Helm-based disagg deployments.
- `_prepare_nixl_prefill_request()` — tokenization (NIXL requires token IDs), `disagg_spec` construction with decode node IP, `kv_transfer_params` injection
- `_convert_completion_chunk_to_chat()` — converts `/v1/completions` SSE chunks to `chat.completion.chunk` format
- `_clean_completion_chunk()` — strips extra fields (`prompt_token_ids`, `token_ids`) from completion chunks
- ZMQ proxy (`zmq_proxy.py`) for KV transfer completion notifications
- `wait_decode_kv_ready()` with 10s timeout for recompute fallback
- `ProxyNotif` messages arrived but the router expected `NixlMsg` — added fallback dict decoding and removed a fatal `break`
- `NixlMsg` import: add a fallback chain for LMCache 0.3.13+ compatibility
- Fix: `StaticServiceDiscovery` used `httpx.AsyncClient` while the rest of the codebase uses `aiohttp.ClientSession`
- New `--nixl-peer-host/port`, `--nixl-proxy-host/port` CLI args
- Add `zmq`/`msgspec` deps to `pyproject.toml`

Docs
- `25-disagg-prefill-crd-enabled.md`

Component Diagram (2P2D)
```mermaid
graph TB
  Client([Client])
  subgraph "VLLMRouter CR"
    R[Router Pod]
    ZMQ[ZMQ PULL :7500]
  end
  subgraph "VLLMRuntime CR (enablePDDisaggregation: true)"
    subgraph "topology.prefill (replicas: 2)"
      P1["Prefill Pod 1<br/><i>vLLM + LMCache + NIXL</i><br/>kv_producer / sender"]
      P2["Prefill Pod 2<br/><i>vLLM + LMCache + NIXL</i><br/>kv_producer / sender"]
    end
    subgraph "topology.decode (replicas: 2)"
      D1["Decode Pod 1<br/><i>vLLM + LMCache + NIXL</i><br/>kv_consumer / receiver"]
      D2["Decode Pod 2<br/><i>vLLM + LMCache + NIXL</i><br/>kv_consumer / receiver"]
    end
  end
  Client -->|"① request"| R
  R -->|"② prefill"| P1 & P2
  P1 & P2 -.->|"③ NIXL KV transfer"| D1 & D2
  P1 & P2 -->|"④ ZMQ notify"| ZMQ
  R -->|"⑤ decode"| D1 & D2
  D1 & D2 -->|"⑥ tokens"| R
  R -->|"⑦ response"| Client
```

Relation to PR #669
The router-side NIXL KV transfer logic builds on #669 ([Feat][PD] latest PD support from LMCache with NIXL by @kobe0938). Components originating from #669:
- `src/vllm_router/services/request_service/request.py` — `route_disaggregated_prefill_nixl_request` flow
- `src/vllm_router/app.py` — ZMQ task lifecycle in FastAPI lifespan
- `src/vllm_router/parsers/parser.py` — NIXL CLI args
- `src/vllm_router/service_discovery.py` — prefill/decode client session management
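The SSE chunk conversion listed under Key Changes (`_convert_completion_chunk_to_chat()` plus `_clean_completion_chunk()`) can be sketched as below. This is a hypothetical simplification, not the PR's actual implementation; it only shows the shape of the rewrapping and which NIXL-internal fields get dropped.

```python
def convert_completion_chunk_to_chat(chunk: dict) -> dict:
    """Rewrap a /v1/completions SSE chunk as a chat.completion.chunk,
    dropping internal fields like prompt_token_ids / token_ids."""
    choices = []
    for choice in chunk.get("choices", []):
        choices.append({
            "index": choice.get("index", 0),
            # Completions use "text"; chat streaming uses a "delta" object.
            "delta": {"content": choice.get("text", "")},
            "finish_reason": choice.get("finish_reason"),
        })
    return {
        "id": chunk.get("id"),
        "object": "chat.completion.chunk",
        "created": chunk.get("created"),
        "model": chunk.get("model"),
        "choices": choices,
        # prompt_token_ids / token_ids intentionally not copied over
    }
```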
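The `wait_decode_kv_ready()` timeout-with-recompute-fallback pattern mentioned in Key Changes can be sketched with stdlib asyncio (the class name `KVReadyTracker` and the exact API are assumptions, not the PR's code):

```python
import asyncio


class KVReadyTracker:
    """Per-request events: the ZMQ listener calls notify() when a prefill
    node reports KV transfer completion; the routing path waits with a
    timeout and falls back to recompute on expiry."""

    def __init__(self) -> None:
        self._events: dict[str, asyncio.Event] = {}

    def _event(self, req_id: str) -> asyncio.Event:
        return self._events.setdefault(req_id, asyncio.Event())

    def notify(self, req_id: str) -> None:
        # Called from the ZMQ notification handler.
        self._event(req_id).set()

    async def wait_decode_kv_ready(self, req_id: str, timeout: float = 10.0) -> bool:
        try:
            await asyncio.wait_for(self._event(req_id).wait(), timeout)
            return True
        except asyncio.TimeoutError:
            return False  # caller falls back to recomputing the prefill
        finally:
            self._events.pop(req_id, None)  # avoid unbounded growth
```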
Test plan
- Operator unit tests (`go test --ginkgo.focus="VLLMRuntime"`)
- Local end-to-end 2P2D run with `lmcache/vllm-openai:latest` (vLLM 0.15.0, LMCache 0.3.13, NIXL 0.9.0) and model `meta-llama/Llama-3.2-3B-Instruct` — 4/4 tests pass
- Commits signed with `-s` when doing `git commit`
- PR title prefixes used: `[Bugfix]`, `[Feat]`, and `[CI]`

Detailed Checklist
Thank you for your contribution to production-stack! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.
PR Title and Classification
Please try to classify PRs for easy understanding of the type of changes. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:
- `[Bugfix]` for bug fixes.
- `[CI/Build]` for build or continuous integration improvements.
- `[Doc]` for documentation fixes and improvements.
- `[Feat]` for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
- `[Router]` for changes to the `vllm_router` (e.g., routing algorithm, router observability, etc.).
- `[Misc]` for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.
Code Quality
The PR needs to meet the following code quality standards:
- Use `pre-commit` to format your code. See `README.md` for installation.

DCO and Signed-off-by
When contributing changes to this project, you must agree to the DCO. Commits must include a `Signed-off-by:` header which certifies agreement with the terms of the DCO.

Using `-s` with `git commit` will automatically add this header.

What to Expect for the Reviews
We aim to address all PRs in a timely manner. If no one reviews your PR within 5 days, please @-mention one of YuhanLiu11, Shaoting-Feng, or ApostaC.