
[XPU] update dp rank w/o env-var isolation#8

Open: zhenwei-intel wants to merge 700 commits into main from fix_xpu_dp


zhenwei-intel (Owner) commented Apr 15, 2026

Purpose

Previously, the vLLM worker handled DP on XPU by calling set_device_control_env_var to set ZE_AFFINITY_MASK, so that each DP worker saw only a subset of the devices. In practice, this approach can introduce accuracy issues when DP is combined with EP (expert parallelism).

Example before this PR (ZE_AFFINITY_MASK='2,3'):

  • ZE_AFFINITY_MASK='2' for DP0, using torch.device("xpu:0")
  • ZE_AFFINITY_MASK='3' for DP1, also using torch.device("xpu:0")

This PR stops narrowing ZE_AFFINITY_MASK per DP worker and switches XPU worker device assignment to the same approach as the GPU worker, where the device index is computed as:

DP_LOCAL_RANK * TP_PP_WORLD_SIZE + TP_LOCAL_RANK

Example after this PR (ZE_AFFINITY_MASK='2,3'):

  • DP0, using torch.device("xpu:0")
  • DP1, using torch.device("xpu:1")
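The mapping above can be sketched in Python as follows. This is a minimal illustration, not the code in xpu_worker.py: parse_affinity_mask, dp_device_index and assign_devices are hypothetical helpers, and the mask parser assumes a plain comma-separated list of device ids.

```python
def parse_affinity_mask(mask: str) -> list[int]:
    """Return the physical device ids listed in a simple mask such as "2,3".

    Assumption: plain comma-separated integers only; Level Zero also accepts
    sub-device entries such as "0.1", which this sketch does not handle.
    """
    return [int(part) for part in mask.split(",") if part.strip()]


def dp_device_index(dp_local_rank: int, tp_pp_world_size: int, tp_local_rank: int) -> int:
    """Local torch device index: DP_LOCAL_RANK * TP_PP_WORLD_SIZE + TP_LOCAL_RANK."""
    return dp_local_rank * tp_pp_world_size + tp_local_rank


def assign_devices(
    mask: str, dp_size: int, tp_pp_world_size: int
) -> dict[tuple[int, int], tuple[str, int]]:
    """Map (dp_local_rank, tp_local_rank) -> (torch device string, physical id).

    Every worker sees the full mask (no per-DP env-var isolation), so the
    torch index itself differs between DP ranks, as with the GPU worker.
    """
    physical = parse_affinity_mask(mask)
    assignment = {}
    for dp in range(dp_size):
        for tp in range(tp_pp_world_size):
            idx = dp_device_index(dp, tp_pp_world_size, tp)
            if idx >= len(physical):
                raise ValueError(
                    f"device index {idx} is out of bounds (visible devices: {len(physical)})"
                )
            assignment[(dp, tp)] = (f"xpu:{idx}", physical[idx])
    return assignment


print(assign_devices("2,3", dp_size=2, tp_pp_world_size=1))
# {(0, 0): ('xpu:0', 2), (1, 0): ('xpu:1', 3)}
```

With the old approach each DP worker's mask held a single id, so both workers used xpu:0; here both workers see the full mask and the device index alone separates them.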

Test Plan

ZE_AFFINITY_MASK=4,5 vllm serve Qwen/Qwen3-30B-A3B --enforce-eager --tensor-parallel-size 1 --data-parallel-size 2 --quantization fp8 -ep --port 8222 --block-size 64 --gpu-memory-utilization 0.8 --max-model-len 8192

curl http://localhost:8222/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-30B-A3B", "prompt": "Red Hat is the best company in the world to work for because it works on open source software, which means that all the contributions are delivered to the community. As a result, when working on projects like vLLM we are able to meet many amazing people from various organizations like AMD, Google, NVIDIA, We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.--That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed, ", "max_tokens": 20}'

Output:

{"id":"cmpl-bfd8e7a7ecbe4688","object":"text_completion","created":1776220625,"model":"Qwen/Qwen3-30B-A3B","choices":[{"index":0,"text":"1776, Declaration of Independence, United States of America. It's a great place to","logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null,"prompt_logprobs":null,"prompt_token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":131,"total_tokens":151,"completion_tokens":20,"prompt_tokens_details":null},"kv_transfer_params":null}
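The completion request from the test plan can also be sent from Python using only the standard library. This is a sketch that assumes the server started by the vllm serve command is listening on localhost:8222; build_completion_request and send_completion are hypothetical helper names.

```python
import json
import urllib.request


def build_completion_request(
    prompt: str,
    base_url: str = "http://localhost:8222",
    model: str = "Qwen/Qwen3-30B-A3B",
    max_tokens: int = 20,
) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-compatible /v1/completions route."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_completion(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Usage against a running server:
#   resp = send_completion(build_completion_request("Red Hat is the best company ..."))
#   print(resp["choices"][0]["text"])
```

Keeping request construction separate from sending lets the payload be inspected without a running server.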

Test Result



gemini-code-assist (bot) left a comment


Code Review

This pull request introduces XPU support for device control and worker initialization in the V1 engine. It updates vllm/v1/engine/utils.py to bypass device environment isolation for XPU platforms and modifies vllm/v1/worker/xpu_worker.py to calculate the local rank offset in data-parallel scenarios. Feedback was provided regarding a potential bug where modifying self.local_rank might incorrectly affect the LOCAL_RANK environment variable, along with suggestions to remove redundant calls and improve assertion clarity.

Comment on lines +72 to +81
self.local_rank += dp_local_rank * tp_pp_world_size
assert self.local_rank < torch.accelerator.device_count(), (
f"DP adjusted local rank {self.local_rank} is out of bounds. "
)
visible_device_count = torch.accelerator.device_count()
assert parallel_config.local_world_size <= visible_device_count, (
f"local_world_size ({parallel_config.local_world_size}) must "
f"be less than or equal to the number of visible devices "
f"({visible_device_count})."
)


Priority: high

There are a few issues in this block:

  1. Potential Bug with LOCAL_RANK: Modifying self.local_rank to the absolute device index will cause the environment variable LOCAL_RANK to be set incorrectly at line 105 (os.environ["LOCAL_RANK"] = str(self.local_rank)). In a data-parallel setup, LOCAL_RANK is typically expected by distributed backends (like oneCCL) to be the rank relative to the local collective group (e.g., 0 to TP*PP-1), not the absolute device index on the node. Since environment variable isolation is being disabled for XPU, you should ensure that LOCAL_RANK remains the relative rank while only using the offset value for device selection and init_distributed_environment (to match CUDA behavior).
  2. Redundancy: torch.accelerator.device_count() is called twice; it can be called once and the result reused.
  3. Assertion Clarity: The assertion message for the rank check is missing the actual limit, which makes debugging harder.
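Following the review's first point, the sketch below keeps two values apart: the relative rank written to LOCAL_RANK and the absolute device index used for device selection. It also queries the device count once, per the second point. configure_worker_rank is a hypothetical helper, not a vLLM API, and the env argument stands in for os.environ so the sketch runs without an XPU.

```python
from collections.abc import MutableMapping


def configure_worker_rank(
    local_rank: int,
    dp_local_rank: int,
    tp_pp_world_size: int,
    visible_device_count: int,
    env: MutableMapping[str, str],
) -> int:
    """Hypothetical helper: return the device index while keeping LOCAL_RANK relative.

    local_rank is the rank inside the local TP*PP group (0..tp_pp_world_size-1).
    visible_device_count is queried once by the caller and reused here.
    env would be os.environ in a real worker.
    """
    # DP_LOCAL_RANK * TP_PP_WORLD_SIZE + TP_LOCAL_RANK, used only for device selection.
    device_index = dp_local_rank * tp_pp_world_size + local_rank
    if device_index >= visible_device_count:
        raise ValueError(
            f"DP adjusted local rank {device_index} is out of bounds "
            f"(visible devices: {visible_device_count})."
        )
    # LOCAL_RANK stays relative to the collective group, as the review suggests.
    env["LOCAL_RANK"] = str(local_rank)
    return device_index


env: dict = {}
print(configure_worker_rank(1, 1, 2, 4, env), env["LOCAL_RANK"])
# 3 1
```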
Suggested change
# DP_LOCAL_RANK * TP_PP_WORLD_SIZE + TP_LOCAL_RANK
self.local_rank += dp_local_rank * tp_pp_world_size
visible_device_count = torch.accelerator.device_count()
assert self.local_rank < visible_device_count, (
f"DP adjusted local rank {self.local_rank} is out of bounds "
f"(max {visible_device_count})."
)
assert parallel_config.local_world_size <= visible_device_count, (
f"local_world_size ({parallel_config.local_world_size}) must "
f"be less than or equal to the number of visible devices "
f"({visible_device_count})."
)

sniper35 and others added 28 commits May 1, 2026 10:06
…oadingManager key parameters (vllm-project#41361)

Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
…llm-project#40808)

Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>
…m-project#41050)

Signed-off-by: Juhi Mittal <juhim@nvidia.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Signed-off-by: Shiyang Chen <shiychen@nvidia.com>
…fusion ordering vllm-project#27893   (vllm-project#39505)

Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Signed-off-by: roG0d <baonudesifeizhai@gmail.com>
…ect#41199)

Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
…abeled child (vllm-project#40840)

When vLLM runs with Ray, the Prometheus metric `vllm:request_success{finished_reason=...}`
only ever increments the repetition bucket, regardless of the request's actual finish
reason; stop, length, abort, and error stay at zero. The root cause was that `labels()`
mutated the wrapped Ray metric's default tags in place and returned self, so every
`.labels(...)` call on a given wrapper returned the same object.

Co-authored-by: Marwan Sarieddine <sarieddine.marwan@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Marwan Sarieddine <sarieddine.marwan@gmail.com>
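The failure mode in this commit message can be reproduced with two hypothetical wrapper classes; neither is vLLM's or Ray's actual metrics code. When labels() updates shared tags and returns self, children created up front for each finish reason are all the same object, so every increment lands on whichever label was set last (repetition, in the creation order used here). Returning a new labeled child fixes it.

```python
from collections import Counter
from typing import Optional


class BuggyMetricWrapper:
    """Hypothetical wrapper whose labels() mutates shared tags and returns self."""

    def __init__(self, counts: Counter):
        self._counts = counts
        self._tags: dict = {}

    def labels(self, **tags: str) -> "BuggyMetricWrapper":
        self._tags.update(tags)  # mutates the shared default tags in place
        return self              # every call hands back the same object

    def inc(self, amount: int = 1) -> None:
        self._counts[tuple(sorted(self._tags.items()))] += amount


class FixedMetricWrapper:
    """Hypothetical fix: labels() returns a new labeled child."""

    def __init__(self, counts: Counter, tags: Optional[dict] = None):
        self._counts = counts
        self._tags = dict(tags or {})

    def labels(self, **tags: str) -> "FixedMetricWrapper":
        # The parent is untouched; the child carries its own copy of the tags.
        return FixedMetricWrapper(self._counts, {**self._tags, **tags})

    def inc(self, amount: int = 1) -> None:
        self._counts[tuple(sorted(self._tags.items()))] += amount


def record(wrapper_cls) -> Counter:
    """Create one child per finish reason up front, then count two requests."""
    counts: Counter = Counter()
    metric = wrapper_cls(counts)
    children = {r: metric.labels(finished_reason=r) for r in ("stop", "length", "repetition")}
    children["stop"].inc()
    children["length"].inc()
    return counts


print(record(BuggyMetricWrapper))  # both increments land on "repetition"
print(record(FixedMetricWrapper))  # one increment each for "stop" and "length"
```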
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: George-ao <yuyiao772@gmail.com>
This is the final PR in a series to enable HMA support for the
offloading connector. The connector advertises `SupportsHMA`
and is validated with unit tests and e2e tests.

Signed-off-by: Or Ozeri <oro@il.ibm.com>
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: ganyi <ygan@amd.com>
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: Rita Brugarolas Brufau <rita.brugarolasbrufau@amd.com>
Signed-off-by: junkang1991 <junkangchow@gmail.com>
Co-authored-by: Rita Brugarolas <Rita.BrugarolasBrufau@amd.com>
Co-authored-by: junkang1991 <junkangchow@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
…lm-project#32623)

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
…-V4 (vllm-project#41255)

Signed-off-by: Isotr0py <Isotr0py@outlook.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Frederic Odermatt <frederic.odermatt@44ai.ch>
Signed-off-by: John Calderon <jcalderon@nvidia.com>
…oject#41478)

Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
juhi10071998 and others added 30 commits May 9, 2026 21:15
…ivations) support (vllm-project#41769)

Signed-off-by: Juhi Mittal <juhim@nvidia.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
…atures without `KVCacheConfig` (vllm-project#39832)

The v0.12.0 release contained initial support for HMA in KV Connectors. As part
of these changes, a KVCacheConfig argument was added to KV connector
constructors. Backwards compatibility support for out-of-tree connectors was
included in this change, with a very prominent warning. See vllm-project#25712 and vllm-project#27887.

Since the warning has been in place for over 5 months, we can safely remove
this backward-compatibility support.

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: roG0d <baonudesifeizhai@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
…llm-project#41846)

Signed-off-by: Nave Assaf <nassaf@nvidia.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
vllm-project#33322)

Signed-off-by: Xingran Wang <wangxingran123456@outlook.com>
Signed-off-by: Hongjian Zhang <hirokenovo@gmail.com>
Co-authored-by: Hongjian Zhang <hirokenovo@gmail.com>
…0951)

Signed-off-by: Christian Van <cvan20191@gmail.com>
Co-authored-by: Christian Van <cvan20191@gmail.com>
…ject#39306)

Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Signed-off-by: Itay Etelis <etelis2019@gmail.com>
Signed-off-by: Itay Etelis <92247226+Etelis@users.noreply.github.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Itay Etelis <etelis2019@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
)

Signed-off-by: AbhiOnGithub <abhiOnGithub@users.noreply.github.com>
Co-authored-by: AbhiOnGithub <abhiOnGithub@users.noreply.github.com>
…oject#41266)

Signed-off-by: abdulrahman-cohere <abdulrahman.abdulrazzag@cohere.com>
Signed-off-by: <>
Co-authored-by: Cursor Agent <cursor-agent@cursor.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Ethan Feng <ethan.fengch@gmail.com>
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
…y OOM (vllm-project#38502)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
…-project#37912)

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <Isotr0py@outlook.com>
…ity (vllm-project#41932)

Signed-off-by: jmamou <jonathan.mamou@intel.com>
Signed-off-by: Jonathan Mamou <jonathan.mamou@intel.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
…e_store() (vllm-project#41366)

Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
…project#41617)

Signed-off-by: vensen <vensenmu@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: qizixi <22851944+zixi-qi@users.noreply.github.com>
…ble (vllm-project#41499)

Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>
…t#41979)

Signed-off-by: Jackmin801 <ongjackm@gmail.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Jackmin801 <ongjackm@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Robert Shaw <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>
Signed-off-by: Haoqi Wang <78337154+hqhq1025@users.noreply.github.com>
