[XPU] update dp rank w/o env-var isolation#8
Open
zhenwei-intel wants to merge 700 commits into
Code Review
This pull request introduces XPU support for device control and worker initialization in the V1 engine. It updates vllm/v1/engine/utils.py to bypass device environment isolation for XPU platforms and modifies vllm/v1/worker/xpu_worker.py to calculate the local rank offset in data-parallel scenarios. Feedback was provided regarding a potential bug where modifying self.local_rank might incorrectly affect the LOCAL_RANK environment variable, along with suggestions to remove redundant calls and improve assertion clarity.
Comment on lines +72 to +81
self.local_rank += dp_local_rank * tp_pp_world_size
assert self.local_rank < torch.accelerator.device_count(), (
    f"DP adjusted local rank {self.local_rank} is out of bounds. "
)
visible_device_count = torch.accelerator.device_count()
assert parallel_config.local_world_size <= visible_device_count, (
    f"local_world_size ({parallel_config.local_world_size}) must "
    f"be less than or equal to the number of visible devices "
    f"({visible_device_count})."
)
There are a few issues in this block:
- Potential Bug with LOCAL_RANK: Modifying self.local_rank to the absolute device index will cause the environment variable LOCAL_RANK to be set incorrectly at line 105 (os.environ["LOCAL_RANK"] = str(self.local_rank)). In a data-parallel setup, LOCAL_RANK is typically expected by distributed backends (like oneCCL) to be the rank relative to the local collective group (e.g., 0 to TP*PP-1), not the absolute device index on the node. Since environment variable isolation is being disabled for XPU, you should ensure that LOCAL_RANK remains the relative rank while only using the offset value for device selection and init_distributed_environment (to match CUDA behavior).
- Redundancy: torch.accelerator.device_count() is called twice.
- Assertion Clarity: The assertion message for the rank check is missing the actual limit, which makes debugging harder.
Suggested change
- self.local_rank += dp_local_rank * tp_pp_world_size
- assert self.local_rank < torch.accelerator.device_count(), (
-     f"DP adjusted local rank {self.local_rank} is out of bounds. "
- )
- visible_device_count = torch.accelerator.device_count()
- assert parallel_config.local_world_size <= visible_device_count, (
-     f"local_world_size ({parallel_config.local_world_size}) must "
-     f"be less than or equal to the number of visible devices "
-     f"({visible_device_count})."
- )
+ # DP_LOCAL_RANK * TP_PP_WORLD_SIZE + TP_LOCAL_RANK
+ self.local_rank += dp_local_rank * tp_pp_world_size
+ visible_device_count = torch.accelerator.device_count()
+ assert self.local_rank < visible_device_count, (
+     f"DP adjusted local rank {self.local_rank} is out of bounds "
+     f"(max {visible_device_count})."
+ )
+ assert parallel_config.local_world_size <= visible_device_count, (
+     f"local_world_size ({parallel_config.local_world_size}) must "
+     f"be less than or equal to the number of visible devices "
+     f"({visible_device_count})."
+ )
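The first point of the review (keep LOCAL_RANK group-relative and use the DP-adjusted value only for device selection) could be sketched roughly as below. This is a minimal illustration, not the code in vllm/v1/worker/xpu_worker.py: the names `RankAssignment`, `assign_ranks`, and `export_local_rank` are hypothetical, and whether a given backend really expects the relative rank is the reviewer's claim rather than something verified here.

```python
import os
from dataclasses import dataclass
from typing import MutableMapping, Optional


@dataclass(frozen=True)
class RankAssignment:
    """Hypothetical container that keeps the two notions of rank apart."""
    local_rank: int    # rank inside the local TP*PP group (0 .. TP*PP-1)
    device_index: int  # absolute index among the devices visible on the node


def assign_ranks(local_rank: int, dp_local_rank: int,
                 tp_pp_world_size: int,
                 visible_device_count: int) -> RankAssignment:
    """Compute the device index without overwriting the relative rank."""
    if not 0 <= local_rank < tp_pp_world_size:
        raise ValueError(
            f"local_rank {local_rank} must be in [0, {tp_pp_world_size}).")
    # DP_LOCAL_RANK * TP_PP_WORLD_SIZE + TP_LOCAL_RANK
    device_index = dp_local_rank * tp_pp_world_size + local_rank
    if device_index >= visible_device_count:
        raise ValueError(
            f"DP adjusted device index {device_index} is out of bounds "
            f"(visible devices: {visible_device_count}).")
    return RankAssignment(local_rank=local_rank, device_index=device_index)


def export_local_rank(assignment: RankAssignment,
                      env: Optional[MutableMapping[str, str]] = None) -> None:
    """Publish the group-relative rank, not the device index, as LOCAL_RANK."""
    target = os.environ if env is None else env
    target["LOCAL_RANK"] = str(assignment.local_rank)
```

With this split, a worker with relative rank 1 in the second local DP group (TP*PP = 2, four visible devices) would select device 3 while still exporting LOCAL_RANK=1.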
Purpose
The old vLLM worker handles DP on XPU by calling set_device_control_env_var to set ZE_AFFINITY_MASK, so each DP worker only sees a subset of the devices. In practice, this approach can introduce accuracy issues when DP is combined with EP.
Example (ZE_AFFINITY_MASK='2,3'):
This PR switches XPU worker device assignment to follow the same approach as the GPU worker, where the device is assigned using:
DP_LOCAL_RANK * TP_PP_WORLD_SIZE + TP_LOCAL_RANK
Example (ZE_AFFINITY_MASK='2,3'):
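To make the formula concrete, the sketch below enumerates the device index for every (DP local rank, TP local rank) pair on one node. The helper names `device_index` and `layout`, and the sizes used (2 local DP ranks, TP*PP world size of 2), are assumptions chosen for illustration and are not taken from the PR.

```python
from typing import Dict, Tuple


def device_index(dp_local_rank: int, tp_pp_world_size: int,
                 tp_local_rank: int) -> int:
    """DP_LOCAL_RANK * TP_PP_WORLD_SIZE + TP_LOCAL_RANK."""
    return dp_local_rank * tp_pp_world_size + tp_local_rank


def layout(local_dp_size: int,
           tp_pp_world_size: int) -> Dict[Tuple[int, int], int]:
    """Map each (dp_local_rank, tp_local_rank) pair to a device index."""
    return {
        (dp, tp): device_index(dp, tp_pp_world_size, tp)
        for dp in range(local_dp_size)
        for tp in range(tp_pp_world_size)
    }


if __name__ == "__main__":
    # Assumed sizes: 2 local DP ranks, each spanning 2 TP/PP workers.
    for (dp, tp), dev in layout(2, 2).items():
        print(f"dp_local_rank={dp} tp_local_rank={tp} -> device {dev}")
```

Each DP group ends up with a contiguous, non-overlapping block of device indices, so every worker can see all devices and still select a distinct one.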
Test Plan
output:
{"id":"cmpl-bfd8e7a7ecbe4688","object":"text_completion","created":1776220625,"model":"Qwen/Qwen3-30B-A3B","choices":[{"index":0,"text":"1776, Declaration of Independence, United States of America. It's a great place to","logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null,"prompt_logprobs":null,"prompt_token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":131,"total_tokens":151,"completion_tokens":20,"prompt_tokens_details":null},"kv_transfer_params":null}
Test Result