Integrate upstream/main into feat/integration by RhizoNymph · Pull Request #186 · RhizoNymph/vllm

RhizoNymph · 2026-06-19T00:31:39Z

What

Merges ~846 commits from upstream/main (through a331589394, 2026-06-18) into the integration fork.

Why

Keeps the steering/capture fork current with upstream; the previous merge-base (2026-05-24) was ~3.5 weeks stale.

Conflict resolution

81 files were changed on both sides; 15 had merge conflicts. Notable resolutions:

LLM internals moved upstream: upstream extracted LLM's private methods into a new OfflineInferenceMixin (vllm/entrypoints/offline_utils.py). Ported the inline-steering hook (_maybe_pack_inline_steering and its _add_request call site) into the mixin and removed the now-duplicate block from llm.py; the constructor's capture-consumer logic auto-merged and is intact.
Scheduler: kept our _set_request_block_hash_steering_overrides and took upstream's new schedule(throttle_prefills=...) along with its current_step increment.
GGUF removed from tree: upstream extracted GGUF support into the external vllm-gguf-plugin. Accepted the deletions and repointed the extra-quant optional dependency at the RhizoNymph/vllm-gguf-plugin fork, which adds gemma4 support.
Removed models: accepted upstream's removal of dots1 and internlm2_ve (our changes to them were limited to hooks).
Remaining conflicts were additive (kept both sides).
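To illustrate why porting the hook into the mixin removes the need for a copy in llm.py, here is a minimal sketch. Only the names OfflineInferenceMixin, LLM, _add_request and _maybe_pack_inline_steering come from this PR; every signature, the dict-based params, the "steering"/"packed_steering" keys and the packing format are hypothetical simplifications, not the fork's actual code.

```python
from typing import Any


class OfflineInferenceMixin:
    """Hypothetical stand-in for the mixin in vllm/entrypoints/offline_utils.py."""

    queued: list

    def _maybe_pack_inline_steering(self, params: dict[str, Any]) -> dict[str, Any]:
        # Hypothetical packing: move an inline "steering" entry ({layer: vector})
        # into a "packed_steering" entry with string layer keys.
        params = dict(params)  # copy so the caller's dict is not mutated
        steering = params.pop("steering", None)
        if steering is not None:
            params["packed_steering"] = {str(k): list(v) for k, v in steering.items()}
        return params

    def _add_request(self, prompt: str, params: dict[str, Any]) -> str:
        # The ported call site: steering is packed once, here, before queueing.
        params = self._maybe_pack_inline_steering(params)
        request_id = str(len(self.queued))
        self.queued.append((request_id, prompt, params))
        return request_id


class LLM(OfflineInferenceMixin):
    # llm.py no longer carries its own copy of the hook; it inherits it.
    def __init__(self) -> None:
        self.queued = []
```

The design point is that every entrypoint inheriting _add_request from the mixin gets the steering packing for free, so keeping a second block in llm.py would only risk the two copies drifting apart.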

Runtime-validated on node0 (RTX 3090): the engine initializes and generates correctly with Qwen3-4B. The ~67 files that auto-merged without textual conflicts (incl. gpu_model_runner.py) build and run but still warrant deeper capture/steering validation.


…vllm-project#42331) `unregister_vllm_metrics()` currently uses a `"vllm" in collector._name` check to decide which collectors to remove from the Prometheus registry, so it also removes metrics registered by other subsystems or downstream extensions such as "vllm_omni:". Signed-off-by: vraiti <vraiti@redhat.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com>
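A quick way to see the over-matching described in that commit is to compare a substring test with a prefix test on plain metric names. This sketch works on strings rather than real Prometheus collectors; the "vllm_omni:stage_latency" name is made up, and the prefix rule is just one possible stricter predicate, not necessarily what vllm-project#42331 implements.

```python
def is_vllm_collector_substring(name: str) -> bool:
    # Behavior described in the commit: any name containing "vllm" matches.
    return "vllm" in name


def is_vllm_collector_prefix(name: str) -> bool:
    # Hypothetical stricter rule: only names in the "vllm:" namespace match.
    return name.startswith("vllm:")


names = ["vllm:num_requests_running", "vllm_omni:stage_latency", "process_cpu_seconds_total"]
print([n for n in names if is_vllm_collector_substring(n)])
# -> ['vllm:num_requests_running', 'vllm_omni:stage_latency']
print([n for n in names if is_vllm_collector_prefix(n)])
# -> ['vllm:num_requests_running']
```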


…llm-project#44592) Signed-off-by: Hsiao-Yuan Chen <hy.c@Hsiao-YuandeMacBook-Pro.local> Signed-off-by: littlecircle0730 <littlecircle0730@gmail.com> Signed-off-by: littlecircle0730 <43994952+littlecircle0730@users.noreply.github.com> Co-authored-by: Hsiao-Yuan Chen <hy.c@Hsiao-YuandeMacBook-Pro.local> Co-authored-by: Or Ozeri <or@ozery.com>


Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Martin Kukla <martin.kukla@cantab.net> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Dipika Sikka <dsikka@redhat.com> Co-authored-by: NickLucche <nlucches@redhat.com> Co-authored-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Co-authored-by: Alec Kohlhoff <134344302+aleckohlhoff@users.noreply.github.com> Co-authored-by: Porras Huang <20535584+porrashuang@users.noreply.github.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: scoootscooob <167050519+scoootscooob@users.noreply.github.com>


Integrates ~846 upstream commits. Resolves 15 conflicts (steering/capture hooks vs upstream refactors); ports inline-steering hook into the new OfflineInferenceMixin (offline_utils.py); accepts upstream removal of GGUF (now the external vllm-gguf-plugin) and of dots1/internlm2_ve models. Repoints extra-quant at the RhizoNymph/vllm-gguf-plugin fork (gemma4 support).

The vllm_c rms_norm/fused_add_rms_norm guards claimed support for weight=None, but torch.ops._C.rms_norm cannot take a None/undefined weight (it fails with 'Not yet supported ScalarType'). Weightless norms (e.g. Gemma4 v_norm, has_weight=False) now correctly fall back to the native implementation.
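The guard-and-fallback pattern behind that fix can be sketched without torch. Here plain Python lists stand in for tensors, and `kernel` stands in for a compiled op such as torch.ops._C.rms_norm; the function names `kernel_supports`, `rms_norm` and `rms_norm_native` are hypothetical and not vLLM's actual dispatch code.

```python
import math
from typing import Callable, Optional, Sequence

# Stand-in type for a compiled kernel such as torch.ops._C.rms_norm (illustrative only).
Kernel = Callable[[Sequence[float], Sequence[float], float], list]


def rms_norm_native(x: Sequence[float], weight: Optional[Sequence[float]], eps: float = 1e-6) -> list:
    # Reference RMSNorm: x / sqrt(mean(x^2) + eps), then an optional elementwise scale.
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    out = [v * inv_rms for v in x]
    if weight is not None:
        out = [o * w for o, w in zip(out, weight)]
    return out


def kernel_supports(weight: Optional[Sequence[float]]) -> bool:
    # Corrected guard: the kernel needs a real weight, so weight=None is not supported.
    return weight is not None


def rms_norm(x, weight, eps: float = 1e-6, kernel: Optional[Kernel] = None) -> list:
    # Dispatch to the kernel only when the guard passes; otherwise use the native path.
    if kernel is not None and kernel_supports(weight):
        return kernel(x, weight, eps)
    return rms_norm_native(x, weight, eps)
```

With the old, overly permissive guard, a weightless norm would have been routed to the kernel and failed; with the corrected guard it never reaches the kernel.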

The SetSteeringRequest.vectors field is intentionally dict[str, Any] (to admit the packed wire form), so the model does not coerce the inner layer keys; coerce_steering_spec does. The test now targets that coercion seam directly (it previously had no direct coverage) instead of the obsolete model-level behavior.
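As a rough illustration of testing the coercion seam directly, the sketch below assumes the unpacked form maps string layer keys (as JSON object keys always are) to lists of numbers. The body of `coerce_steering_spec` here is hypothetical; the fork's real function also has to handle the packed wire form, which is not modeled.

```python
from typing import Any


def coerce_steering_spec(vectors: dict[str, Any]) -> dict[int, list[float]]:
    # Hypothetical coercion: JSON object keys arrive as strings ("12"), so convert
    # each layer key to int and each vector entry to float.
    coerced: dict[int, list[float]] = {}
    for layer_key, vec in vectors.items():
        coerced[int(layer_key)] = [float(v) for v in vec]
    return coerced


def test_coerce_steering_spec_converts_layer_keys() -> None:
    # Exercise the seam directly rather than relying on request-model validation,
    # which deliberately leaves the inner keys untouched.
    spec = coerce_steering_spec({"12": [1, 2], "3": [0.5]})
    assert spec == {12: [1.0, 2.0], 3: [0.5]}
```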

A single third-party capture-consumer plugin that fails to import (e.g. one referencing a module not present in this build) previously crashed _load_entry_points and took down all capture admission. Such a plugin is now skipped with a warning so the other consumers keep working.
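The skip-with-warning behavior amounts to wrapping each entry point's `load()` individually instead of the whole loop. This is a minimal sketch; `load_capture_consumers` is a hypothetical name standing in for _load_entry_points, and it accepts any objects with `.name` and `.load()` so it can be exercised without installed plugins.

```python
import logging
from typing import Any, Iterable

logger = logging.getLogger(__name__)


def load_capture_consumers(eps: Iterable[Any]) -> dict[str, Any]:
    """Load each capture-consumer entry point, skipping any that fail to import."""
    loaded: dict[str, Any] = {}
    for ep in eps:
        try:
            loaded[ep.name] = ep.load()
        except Exception as exc:  # e.g. ModuleNotFoundError from a missing dependency
            # One broken plugin is logged and skipped instead of aborting the whole load.
            logger.warning("Skipping capture-consumer plugin %r: %s", ep.name, exc)
    return loaded
```

In a real loader, `eps` would come from `importlib.metadata.entry_points(group=...)` (Python 3.10+); the entry-point group the fork registers consumers under is not shown in this PR.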

ZJY0516 and others added 30 commits June 11, 2026 11:36

[Attention] add triton diff-kv backend for mimo (vllm-project#41797)

f81daf8

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

[Kernel][Helion][1/N] Add Helion kernel for per_token_group_fp8_quant (…

2ec6594

…vllm-project#36902) Signed-off-by: Sean Chen <seachen@redhat.com> Co-authored-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

[Bugfix] Restrict FlashInfer cuDNN FP8 ViT attention gate to Blackwel…

b814229

…l (SM 100) (vllm-project#45251) Signed-off-by: Wentian Byte <3400259131@qq.com>

[Rust Frontend] Support continuous_usage_stats stream option (vllm-pr…

3b03a2c

…oject#43965) Co-authored-by: Bugen Zhao <i@bugenzhao.com> Signed-off-by: RickyChen / 陳昭儒 <ricky.chen@infinirc.com> Signed-off-by: Bugen Zhao <i@bugenzhao.com>

[Bugfix] Fix Anthropic tool_use content handling dropping args (vllm-…

235b63c

…project#45287) Signed-off-by: Ben Browning <bbrownin@redhat.com>

[Model] Remove InternLMForCausalLM registry alias (vllm-project#45128)

c9340e6

Signed-off-by: Xianbao QIAN <xianbao.qian@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>

[Bug] Fix test flashmla for DSv4 (vllm-project#45052)

5a6c7b7

Signed-off-by: yewentao256 <zhyanwentao@126.com>

[Refactor] Chat Completions Harmony Refactor, non-streaming path. (vl…

f712fd0

…lm-project#45171) Signed-off-by: Yifan Zong <yzong@redhat.com>

[Bugfix][KVConnector][Mooncake] Close MooncakeDistributedStore on con…

8a91228

…nector teardown (vllm-project#45206) Signed-off-by: Dao Le <Dao007forever@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>

Make mistral_common optional by deferring MistralToolCall import (vll…

9bbf42b

…m-project#45305) Signed-off-by: Neil Schemenauer <nas@arctrix.com>

[Bugfix] Initialize missing attributes in mistral eagle (vllm-project…

6f573f4

…#45217) Signed-off-by: jpwang <jpwang@smail.nju.edu.cn> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

[Refactor] Chat Completions Streaming Harmony Refactor and Bugfixes (v…

e0871ad

…llm-project#45104) Signed-off-by: Yifan Zong <yzong@redhat.com>

[ROCm][DSv4][Perf] Flash-decode split-K decode attention kernel (vllm…

fcf5115

…-project#44899) Co-authored-by: vLLM Contributor <contributor@vllm.ai>

[Bugfix][Model] Pass revision by name in Run:ai and bitsandbytes inde…

c107683

…x downloads (vllm-project#45308) Signed-off-by: Ting Sun <suntcrick@gmail.com>

[CI][BugFix] Fix broken test_mamba_prefix_cache.py due to stale mock (

2263f8a

vllm-project#45345) Signed-off-by: Nick Hill <nickhill123@gmail.com>

[Bugfix] Fix --enable-prompt-tokens-details omitting zero cached toke…

42ae5e7

…ns (vllm-project#44383) Signed-off-by: Sasindharan Sankar <sasindharansankar@email.com> Co-authored-by: Sasindharan Sankar <sasindharansankar@email.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>

[ASR] Optimize CPU preproc to get 2.5x RTFx via multi-threading (vllm…

e0b9fb1

…-project#44612) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

[Bugfix] Mamba CPU Offloading (vllm-project#44599)

b927004

Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com> Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com>

[ASR] Add Long Audio benchmark and correctness test (vllm-project#44587)

226ba9f

Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>

[11a/n] Migrate Marlin kernels to torch stable ABI (vllm-project#45176)

7021be6

Signed-off-by: Chris Leonard <chleonar@redhat.com>

[NIXL] Per-region KV transfer classification for mixed full-attn + ML…

6fbfdd1

…A groups (vllm-project#44583)

[ROCm][CI] fix fp8 support for test_deepep_moe (vllm-project#45302)

1ce3cdc

Signed-off-by: Divakar Verma <divakar.verma@amd.com>

[MM][Perf][CG] Support ViT full cudagraphs for mllama4 (vllm-project#…

39dee11

…40660) Signed-off-by: allgather <all2allops@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>

[ROCm][gpt-oss] Pass GateMode.INTERLEAVE for MXFP4 W4A16 fused MoE (v…

fe04238

…llm-project#44893) Signed-off-by: Rohan Potdar <rohan.potdar@amd.com> Signed-off-by: Rohan138 <rohanpotdar138@gmail.com> Signed-off-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com>

[Bugfix] Fix Dockerfile dependency graph pre-commit error (vllm-proje…

a2c72d4

…ct#45374) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

[CPU] Support CPU W4A16 INT4 MoE (vllm-project#43409)

0cd9b7a

Signed-off-by: yuwenzho <yuwen.zhou@intel.com>

[Rust Frontend][Bugfix] Forward --shutdown-timeout and --disable-log-…

87b98d6

…stats to the managed Python engine (vllm-project#45300) Signed-off-by: Will Eaton <weaton@redhat.com>

danisereb and others added 30 commits June 17, 2026 15:24

[Bugfix] Pass TP group to FlashInfer all-reduce fusion (vllm-project#…

5e27b2b

…45917) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>

[Log] Update deepgemm log (vllm-project#45857)

9c7c74b

Signed-off-by: yewentao256 <zhyanwentao@126.com>

[DSV4 Perf] Optimize dsv4 cudagraph by reducing `eager_break_during_c…

2a47a9f

…apture`, 26.8% ~ 27.9% E2E TTFT improvement (vllm-project#45309) Signed-off-by: yewentao256 <zhyanwentao@126.com>

[feature] MiniMax-M3-MXFP4 support added (vllm-project#45896)

d112eb1

Signed-off-by: Qiang Li <qiang.li2@amd.com>

[Bugfix] MiniMax-M3 (AMD): add packed_modules_mapping and pass swiglu… (

091386a

vllm-project#45794) Signed-off-by: wangjiaxin99 <jiaxwang@amd.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com> Co-authored-by: Douglas Lehr <91553416+dllehr-amd@users.noreply.github.com>

[Refactor] Remove dead quantization code and tests (vllm-project#45454)

2659f60

Signed-off-by: yewentao256 <zhyanwentao@126.com>

[Bugfix][Gemma4] Render reasoning on assistant turns without tool_cal…

58b2e89

…ls (vllm-project#45867) Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com> Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>

[Bugfix][Model] Validate DefaultModelLoader / LoadConfig and fail wit…

9d4b87f

…h clear errors (vllm-project#45196) Signed-off-by: Ting Sun <suntcrick@gmail.com>

[BUG] fix hidden states nan for hybrid attention models (vllm-project…

5fd21eb

…#45849) Signed-off-by: shanjiaz <hezhao@redhat.com> Co-authored-by: shanjiaz <hezhao@redhat.com>

[Bugfix] Fix NixlConnector handshake block_len validation for GQA-rep…

0d339cf

…licated KV heads (vllm-project#45879) Signed-off-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: waynehacking8 <waynehacking8@gmail.com>

Revert "[DSV4 Perf] Optimize dsv4 cudagraph by reducing `eager_break_…

1797576

…during_capture`" (vllm-project#45309) (vllm-project#45972) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

[XPU][CI] add model runner v2 into CI (vllm-project#44650)

2959a92

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>

[CI/Build][Bugfix] Fix SD LoRA (vllm-project#45941)

ebbb2d5

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

[Bugfix] Complete one-shot fused all-reduce PDL at end to avoid NaN (v…

b409217

…llm-project#45448)

[Rust Frontend][Perf] O(n) argument scan in tool parser (vllm-project…

e1a5fc4

…#45826) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Signed-off-by: Bugen Zhao <i@bugenzhao.com>

[XPU] Fix FP8 block-scaled scheme selection on non-CUDA platforms (vl…

8dd8b6e

…lm-project#43958) Signed-off-by: Lai, Yejing <yejing.lai@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

[Rust Frontend] Validate tokenized bad_words vocabulary range (vllm-p…

731fb33

…roject#45876) Signed-off-by: reidliu41 <reid201711@gmail.com>

[CPUOffloading] Guard CPU eviction check (vllm-project#45757)

ed938ad

Signed-off-by: Varun Sundar Rabindranath <varun-sundar-rabindranath@h100-01.nemg-001.lab.rdu2.dc.redhat.com> Co-authored-by: Varun Sundar Rabindranath <varun-sundar-rabindranath@h100-01.nemg-001.lab.rdu2.dc.redhat.com>

[SimpleCPUOffloadConnector]: Add support for reset_cache() (vllm-proj…

d57888e

…ect#39726) Signed-off-by: Jonathan Chen <chenleejonathan@gmail.com> Signed-off-by: Jonathan <chenleejonathan@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

[Kernel] Add PDL support for DeepGEMM kernel (vllm-project#42996)

4403af8

Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>

[Fix][KV offload] Defer on_request_finished until in-flight transfe…

f428718

…rs drain (vllm-project#45823) Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>

[Refactor] Remove dead cutlass mxfp8 code (vllm-project#44681)

b4c80ec

Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Shengqi Chen <harry-chen@outlook.com>

[KV Offloading] Remove dummy worker-side stats from OffloadingConnect…

421c1ec

…or (vllm-project#45905) Signed-off-by: Alex <alex.tech.lab@outlook.com> Signed-off-by: AlexHuang <jihuihuang@alexai.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>

[Test][KV Connector] Add request_finished fence population tests for …

554352a

…offloading scheduler (vllm-project#45679) Signed-off-by: Alex <alex.tech.lab@outlook.com> Signed-off-by: AlexHuang <jihuihuang@future.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>

Revert "[Kernel] Add PDL support for DeepGEMM kernel" (vllm-project#4…

e945169

…5999)

[XPU] Update nixl to v0.10.1 in Dockerfile (vllm-project#40287)

a331589

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

RhizoNymph wants to merge 850 commits into feat/integration from chore/integrate-upstream


20 participants
