[XPU] update dp rank w/o env-var isolation by zhenwei-intel · Pull Request #8 · zhenwei-intel/vllm

zhenwei-intel · 2026-04-15T02:06:54Z

Purpose

The old vLLM worker handles DP on XPU by calling set_device_control_env_var to set ZE_AFFINITY_MASK, so that each DP worker sees only a subset of the devices. In practice, this approach can cause accuracy issues when DP is combined with EP.

Example (ZE_AFFINITY_MASK='2,3'):

ZE_AFFINITY_MASK='2' for DP0, using torch.device("xpu:0")
ZE_AFFINITY_MASK='3' for DP1, also using torch.device("xpu:0")

This PR switches XPU worker device assignment to the same approach as the GPU worker, where the device index is computed as:
DP_LOCAL_RANK * TP_PP_WORLD_SIZE + TP_LOCAL_RANK

Example (ZE_AFFINITY_MASK='2,3'):

DP0, using torch.device("xpu:0")
DP1, using torch.device("xpu:1")
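
The index formula above can be sketched in a few lines of Python. This is only an illustration, not the code from the PR: the helper names `device_index` and `visible_devices` are hypothetical, and the sketch assumes a plain comma-separated ZE_AFFINITY_MASK whose i-th entry becomes logical device i (real masks can also use sub-device syntax, which is ignored here).

```python
def device_index(dp_local_rank: int, tp_pp_world_size: int, tp_local_rank: int) -> int:
    """Logical device index: DP_LOCAL_RANK * TP_PP_WORLD_SIZE + TP_LOCAL_RANK."""
    return dp_local_rank * tp_pp_world_size + tp_local_rank


def visible_devices(mask: str) -> list[str]:
    """Split a simple comma-separated mask such as '2,3' (assumed format)."""
    return [entry for entry in mask.split(",") if entry]


# Example from the description: ZE_AFFINITY_MASK='2,3', TP*PP world size 1, DP size 2.
mask_entries = visible_devices("2,3")

# Logical index each DP worker would pass to torch.device(f"xpu:{index}").
logical = {dp: device_index(dp, tp_pp_world_size=1, tp_local_rank=0) for dp in range(2)}

# Mask entry behind each logical index (assumption: logical i is the i-th mask entry).
physical = {dp: mask_entries[idx] for dp, idx in logical.items()}

print(logical)
print(physical)
```

With the shared mask, DP0 and DP1 get distinct logical indices, whereas under per-worker masks both workers address "xpu:0".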

Test Plan

ZE_AFFINITY_MASK=4,5 vllm serve Qwen/Qwen3-30B-A3B --enforce-eager --tensor-parallel-size 1 --data-parallel-size 2 --quantization fp8 -ep --port 8222 --block-size 64 --gpu-memory-utilization 0.8 --max-model-len 8192

curl http://localhost:8222/v1/completions -H "Content-Type: application/json" -d '{"model": "Qwen/Qwen3-30B-A3B", "prompt": "Red Hat is the best company in the world to work for because it works on open source software, which means that all the contributions are delivered to the community. As a result, when working on projects like vLLM we are able to meet many amazing people from various organizations like AMD, Google, NVIDIA, We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.--That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed, ", "max_tokens": 20}'

output:

{"id":"cmpl-bfd8e7a7ecbe4688","object":"text_completion","created":1776220625,"model":"Qwen/Qwen3-30B-A3B","choices":[{"index":0,"text":"1776, Declaration of Independence, United States of America. It's a great place to","logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null,"prompt_logprobs":null,"prompt_token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":131,"total_tokens":151,"completion_tokens":20,"prompt_tokens_details":null},"kv_transfer_params":null}

Test Result

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

gemini-code-assist

Code Review

This pull request introduces XPU support for device control and worker initialization in the V1 engine. It updates vllm/v1/engine/utils.py to bypass device environment isolation for XPU platforms and modifies vllm/v1/worker/xpu_worker.py to calculate the local rank offset in data-parallel scenarios. Feedback was provided regarding a potential bug where modifying self.local_rank might incorrectly affect the LOCAL_RANK environment variable, along with suggestions to remove redundant calls and improve assertion clarity.

gemini-code-assist · 2026-04-15T02:08:52Z

+            self.local_rank += dp_local_rank * tp_pp_world_size
+            assert self.local_rank < torch.accelerator.device_count(), (
+                f"DP adjusted local rank {self.local_rank} is out of bounds. "
+            )
+            visible_device_count = torch.accelerator.device_count()
+            assert parallel_config.local_world_size <= visible_device_count, (
+                f"local_world_size ({parallel_config.local_world_size}) must "
+                f"be less than or equal to the number of visible devices "
+                f"({visible_device_count})."
+            )


There are a few issues in this block:

Potential Bug with LOCAL_RANK: Modifying self.local_rank to the absolute device index will cause the environment variable LOCAL_RANK to be set incorrectly at line 105 (os.environ["LOCAL_RANK"] = str(self.local_rank)). In a data-parallel setup, LOCAL_RANK is typically expected by distributed backends (like oneCCL) to be the rank relative to the local collective group (e.g., 0 to TP*PP-1), not the absolute device index on the node. Since environment variable isolation is being disabled for XPU, you should ensure that LOCAL_RANK remains the relative rank while only using the offset value for device selection and init_distributed_environment (to match CUDA behavior).

Redundancy: torch.accelerator.device_count() is called twice; the result can be stored once and reused.

Assertion Clarity: The assertion message for the rank check is missing the actual limit, which makes debugging harder.

Suggested change

-            self.local_rank += dp_local_rank * tp_pp_world_size
-            assert self.local_rank < torch.accelerator.device_count(), (
-                f"DP adjusted local rank {self.local_rank} is out of bounds. "
-            )
-            visible_device_count = torch.accelerator.device_count()
-            assert parallel_config.local_world_size <= visible_device_count, (
-                f"local_world_size ({parallel_config.local_world_size}) must "
-                f"be less than or equal to the number of visible devices "
-                f"({visible_device_count})."
-            )
+            # DP_LOCAL_RANK * TP_PP_WORLD_SIZE + TP_LOCAL_RANK
+            self.local_rank += dp_local_rank * tp_pp_world_size
+            visible_device_count = torch.accelerator.device_count()
+            assert self.local_rank < visible_device_count, (
+                f"DP adjusted local rank {self.local_rank} is out of bounds "
+                f"(max {visible_device_count})."
+            )
+            assert parallel_config.local_world_size <= visible_device_count, (
+                f"local_world_size ({parallel_config.local_world_size}) must "
+                f"be less than or equal to the number of visible devices "
+                f"({visible_device_count})."
+            )
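
The reviewer's first point, keeping the group-relative rank separate from the offset device index, could look roughly like the sketch below. It is a standalone illustration, not vLLM code: `RankPlacement` and `place_worker` are hypothetical names, and whether LOCAL_RANK must stay relative on XPU is the reviewer's claim rather than something verified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RankPlacement:
    local_rank: int    # rank within the TP*PP group (what LOCAL_RANK would carry)
    device_index: int  # index into the visible devices, offset by the DP rank


def place_worker(
    local_rank: int,
    dp_local_rank: int,
    tp_pp_world_size: int,
    visible_device_count: int,
) -> RankPlacement:
    """Compute the device index without overwriting the relative rank."""
    if not 0 <= local_rank < tp_pp_world_size:
        raise ValueError(
            f"local_rank {local_rank} must be in [0, {tp_pp_world_size})."
        )
    # DP_LOCAL_RANK * TP_PP_WORLD_SIZE + TP_LOCAL_RANK
    device_index = dp_local_rank * tp_pp_world_size + local_rank
    if device_index >= visible_device_count:
        raise ValueError(
            f"DP adjusted device index {device_index} is out of bounds "
            f"(visible devices: {visible_device_count})."
        )
    return RankPlacement(local_rank=local_rank, device_index=device_index)


# Two DP replicas with a TP*PP world size of 2 on a node with 4 visible devices.
placement = place_worker(
    local_rank=1, dp_local_rank=1, tp_pp_world_size=2, visible_device_count=4
)
print(placement)
```

Returning both values lets the caller export `placement.local_rank` to the environment and use `placement.device_index` only for device selection, which is the separation the review asks for.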


gemini-code-assist Bot reviewed Apr 15, 2026


zhenwei-intel force-pushed the fix_xpu_dp branch from 8188faa to e47557b Compare April 15, 2026 02:09

sniper35 and others added 28 commits May 1, 2026 10:06

[Model] Add Moondream3 model support(only query and caption skills) (v…

7198940

…llm-project#32325) Signed-off-by: Dong Wang <dongw2019@gmail.com>

[KV Offload] Use Collection instead of Sequence/Iterable for Offl…

415a879

…oadingManager key parameters (vllm-project#41361) Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>

[Bugfix] Disable FlashInfer CUTLASS MoE on SM110 (Jetson Thor AGX) (v…

b542bdf

…llm-project#40808) Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>

[Kernel][MoE] Support GELU on TRT-LLM NvFP4 fused MoE for Gemma4 (vll…

6b6ac6c

…m-project#41050) Signed-off-by: Juhi Mittal <juhim@nvidia.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

[kv_offload+HMA][12/N]: Scheduler-side support for sliding window gro…

941fb50

…ups (vllm-project#41228) Signed-off-by: Or Ozeri <oro@il.ibm.com>

Add nvfp4 kv cache support (vllm-project#40177)

947138b

Signed-off-by: Shiyang Chen <shiychen@nvidia.com>

[Bugfix] Pass reasoning parser kwargs to structured output (vllm-proj…

a076426

…ect#41199) Signed-off-by: Bugen Zhao <i@bugenzhao.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>

[ROCm][CI] Upgraded UCX and RIXL (vllm-project#41210)

32964e7

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

Refractor longcat loading to use AutoWeightsLoader (vllm-project#41448)

0dbaf9d

Signed-off-by: George-ao <yuyiao772@gmail.com>

[ROCm] Enable DBO (Dynamic Batch Optimization) on ROCm (vllm-project#…

7075df7

…34726) Signed-off-by: raviguptaamd <ravi.gupta@amd.com>

[kv_offload+HMA][13/N]: Enable HMA support (vllm-project#41445)

2fa1f8e

This is the final PR in a series to enables HMA support for the offloading connector. The connector advertises `SupportsHMA` and is validated with unit tests and e2e tests. Signed-off-by: Or Ozeri <oro@il.ibm.com>

[Kernel] Pack output and LSE in DCP A2A (vllm-project#41160)

4f7bde5

[Perf] Warmup forward_native sampler kernel (vllm-project#41375)

c3e6469

Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

[ROCm][Deepseek] dsv3.2 further optimization (vllm-project#41217)

bc635fa

Signed-off-by: ganyi <ygan@amd.com> Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com> Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>

[Eval][CI] Add basic mrcr eval to tests/evals/ (vllm-project#40164)

3ccc1ff

Signed-off-by: mgoin <mgoin64@gmail.com>

[Model Runner V2] Add logprob_token_ids support (vllm-project#40559)

5129579

Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>

[Perf] Intergrate Tile Kernels head_compute_mix_kernel for Deepseek…

a9484da

…-V4 (vllm-project#41255) Signed-off-by: Isotr0py <Isotr0py@outlook.com> Co-authored-by: Roger Wang <hey@rogerw.io>

[DSV4] Add knob to enable pre-attn gemm (vllm-project#41443)

bcf5cac

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

[Bugfix] Fix persistent_topk inter-CTA init race on RadixRowState (vl…

edd60ac

…lm-project#41444) Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

[Build] Make bundled DeepGEMM wheel portable across Python versions (v…

0c99629

…llm-project#41476) Signed-off-by: mgoin <mgoin64@gmail.com>

Re-enable allreduce rms fusion for DP / PP (vllm-project#41458)

5737770

Signed-off-by: Andy Lo <andy@mistral.ai>

[Fix] Sync gemma4 chat template from hf (vllm-project#39570)

c408fdd

Signed-off-by: Frederic Odermatt <frederic.odermatt@44ai.ch>

[MM][CG] Support ViT CG for Qwen2.5-VL (vllm-project#40830)

964a4bc

Signed-off-by: John Calderon <jcalderon@nvidia.com>

Limit concurrency on test_transcription_api_correctness.py (vllm-pr…

3e49479

…oject#41478) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>

juhi10071998 and others added 30 commits May 9, 2026 21:15

[Quantization] Add ModelOpt NVFP4 W4A16 (4-bit weights, fp16/bf16 act…

7a2b596

…ivations) support (vllm-project#41769) Signed-off-by: Juhi Mittal <juhim@nvidia.com>

[Refactor] Nixl util using lazy init (vllm-project#41392)

f80aa53

Signed-off-by: yewentao256 <zhyanwentao@126.com>

[Bugfix] Skip routed-experts hot path when disabled (vllm-project#42148)

006af4b

Add NVFP4 all-gather GEMM fusion for AsyncTP (vllm-project#41882)

bc5fdc1

Signed-off-by: roG0d <baonudesifeizhai@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

Fix: Nemotron 3 rescue whitespace-only final_content, not just None (v…

dcb3135

…llm-project#41846) Signed-off-by: Nave Assaf <nassaf@nvidia.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

[Bugfix] Fix SP pass for multimodal models and PP+SP residual handling (

0b272a6

vllm-project#33322) Signed-off-by: Xingran Wang <wangxingran123456@outlook.com> Signed-off-by: Hongjian Zhang <hirokenovo@gmail.com> Co-authored-by: Hongjian Zhang <hirokenovo@gmail.com>

[CI/Build] Use modelscope's international site for regression test (v…

1029e5e

…llm-project#42176) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Handle optional bool-or-string CLI args in get_kwargs (vllm-project#4…

0d382ec

…0951) Signed-off-by: Christian Van <cvan20191@gmail.com> Co-authored-by: Christian Van <cvan20191@gmail.com>

docs: clarify Gemma 4 assistant speculative decoding (vllm-project#42180

27d3bac

) Signed-off-by: AbhiOnGithub <abhiOnGithub@users.noreply.github.com> Co-authored-by: AbhiOnGithub <abhiOnGithub@users.noreply.github.com>

[Bugfix] Fix DeepSeek v4 topk numerical issue for unaligned max-model…

986edc8

…-len (vllm-project#42169)

[ROCm][CI] Stabilize ROCm shutdown and distributed compile CI (vllm-p…

fb1ac80

…roject#41573) Signed-off-by: Andreas Karatzas <akaratza@amd.com>

[Bugfix][KV Transfer][NIXL] Notify P node on pre-admission rejection …

3f5bd48

…to free stranded KV blocks (vllm-project#41269)

[ROCm][CI] Fix NIXL spec-decode acceptance startup and diagnostics (v…

f284012

…llm-project#41313) Signed-off-by: Andreas Karatzas <akaratza@amd.com>

Fix mypy failure on main (vllm-project#42197)

efd0e77

Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>

Add @zyongye to CODEOWNERS (vllm-project#42200)

301305c

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

[Docs] Fix broken local links (vllm-project#42160)

a2c9d54

Signed-off-by: Ethan Feng <ethan.fengch@gmail.com>

[CI] Trigger LoRA test when changing MoE code. (vllm-project#42196)

84f7a55

Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>

[ROCm] Cap Triton paged attention block size to fix ROCm shared memor…

0a309b5

…y OOM (vllm-project#38502) Signed-off-by: Andreas Karatzas <akaratza@amd.com>

[Bugfix] Fuse Qwen3.5 in_qkvz_proj forwarding with LoRA enabled (vllm…

48698b1

…-project#37912) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <Isotr0py@outlook.com>

[CPU] Fix spec decode kernel signatures for synthetic mode compatibil…

a54f0d1

…ity (vllm-project#41932) Signed-off-by: jmamou <jonathan.mamou@intel.com> Signed-off-by: Jonathan Mamou <jonathan.mamou@intel.com> Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>

[KV Offload] Pass ReqContext to touch(), complete_load(), and complet…

e175192

…e_store() (vllm-project#41366) Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>

[Bugfix][Mamba] IMA in causal_conv1d kernel for long sequences (vllm-…

215e2f7

…project#41617) Signed-off-by: vensen <vensenmu@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

[DSV4] Add PP support for deepseek-v4 (vllm-project#41694)

f396bee

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: qizixi <22851944+zixi-qi@users.noreply.github.com>

[Performance] Make safetensors checkpoint prefetch settings configura…

21943d4

…ble (vllm-project#41499) Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>

Fix Molmo2 image token metadata (vllm-project#42162)

879a8c3

Signed-off-by: Haoqi Wang <78337154+hqhq1025@users.noreply.github.com>

Merge branch 'main' into fix_xpu_dp

9062589


zhenwei-intel wants to merge 700 commits into main from fix_xpu_dp


20 participants
