Integrate upstream/main into feat/integration #186
Open
RhizoNymph wants to merge 850 commits into
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
…vllm-project#42331) `unregister_vllm_metrics()` currently uses `"vllm" in collector._name` to decide which collectors to remove from the Prometheus registry, so it also removes metrics registered by other subsystems or downstream extensions such as "vllm_omni:". Signed-off-by: vraiti <vraiti@redhat.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com>
…vllm-project#36902) Signed-off-by: Sean Chen <seachen@redhat.com> Co-authored-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
…l (SM 100) (vllm-project#45251) Signed-off-by: Wentian Byte <3400259131@qq.com>
…oject#43965) Co-authored-by: Bugen Zhao <i@bugenzhao.com> Signed-off-by: RickyChen / 陳昭儒 <ricky.chen@infinirc.com> Signed-off-by: Bugen Zhao <i@bugenzhao.com>
…project#45287) Signed-off-by: Ben Browning <bbrownin@redhat.com>
Signed-off-by: Xianbao QIAN <xianbao.qian@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
…lm-project#45171) Signed-off-by: Yifan Zong <yzong@redhat.com>
…nector teardown (vllm-project#45206) Signed-off-by: Dao Le <Dao007forever@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>
…m-project#45305) Signed-off-by: Neil Schemenauer <nas@arctrix.com>
…#45217) Signed-off-by: jpwang <jpwang@smail.nju.edu.cn> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
…llm-project#45104) Signed-off-by: Yifan Zong <yzong@redhat.com>
…llm-project#44592) Signed-off-by: Hsiao-Yuan Chen <hy.c@Hsiao-YuandeMacBook-Pro.local> Signed-off-by: littlecircle0730 <littlecircle0730@gmail.com> Signed-off-by: littlecircle0730 <43994952+littlecircle0730@users.noreply.github.com> Co-authored-by: Hsiao-Yuan Chen <hy.c@Hsiao-YuandeMacBook-Pro.local> Co-authored-by: Or Ozeri <or@ozery.com>
…-project#44899) Co-authored-by: vLLM Contributor <contributor@vllm.ai>
…x downloads (vllm-project#45308) Signed-off-by: Ting Sun <suntcrick@gmail.com>
vllm-project#45345) Signed-off-by: Nick Hill <nickhill123@gmail.com>
…ns (vllm-project#44383) Signed-off-by: Sasindharan Sankar <sasindharansankar@email.com> Co-authored-by: Sasindharan Sankar <sasindharansankar@email.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
…-project#44612) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com> Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com>
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Signed-off-by: Chris Leonard <chleonar@redhat.com>
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Martin Kukla <martin.kukla@cantab.net> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Dipika Sikka <dsikka@redhat.com> Co-authored-by: NickLucche <nlucches@redhat.com> Co-authored-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Co-authored-by: Alec Kohlhoff <134344302+aleckohlhoff@users.noreply.github.com> Co-authored-by: Porras Huang <20535584+porrashuang@users.noreply.github.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: scoootscooob <167050519+scoootscooob@users.noreply.github.com>
…40660) Signed-off-by: allgather <all2allops@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
…llm-project#44893) Signed-off-by: Rohan Potdar <rohan.potdar@amd.com> Signed-off-by: Rohan138 <rohanpotdar138@gmail.com> Signed-off-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com>
…ct#45374) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
…stats to the managed Python engine (vllm-project#45300) Signed-off-by: Will Eaton <weaton@redhat.com>
…45917) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
…apture`, 26.8% ~ 27.9% E2E TTFT improvement (vllm-project#45309) Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Qiang Li <qiang.li2@amd.com>
vllm-project#45794) Signed-off-by: wangjiaxin99 <jiaxwang@amd.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com> Co-authored-by: Douglas Lehr <91553416+dllehr-amd@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
…ls (vllm-project#45867) Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com> Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
…h clear errors (vllm-project#45196) Signed-off-by: Ting Sun <suntcrick@gmail.com>
…#45849) Signed-off-by: shanjiaz <hezhao@redhat.com> Co-authored-by: shanjiaz <hezhao@redhat.com>
…licated KV heads (vllm-project#45879) Signed-off-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: waynehacking8 <waynehacking8@gmail.com>
…during_capture`" (vllm-project#45309) (vllm-project#45972) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
…#45826) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Signed-off-by: Bugen Zhao <i@bugenzhao.com>
…lm-project#43958) Signed-off-by: Lai, Yejing <yejing.lai@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
…roject#45876) Signed-off-by: reidliu41 <reid201711@gmail.com>
Signed-off-by: Varun Sundar Rabindranath <varun-sundar-rabindranath@h100-01.nemg-001.lab.rdu2.dc.redhat.com> Co-authored-by: Varun Sundar Rabindranath <varun-sundar-rabindranath@h100-01.nemg-001.lab.rdu2.dc.redhat.com>
…ect#39726) Signed-off-by: Jonathan Chen <chenleejonathan@gmail.com> Signed-off-by: Jonathan <chenleejonathan@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
…rs drain (vllm-project#45823) Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Shengqi Chen <harry-chen@outlook.com>
…or (vllm-project#45905) Signed-off-by: Alex <alex.tech.lab@outlook.com> Signed-off-by: AlexHuang <jihuihuang@alexai.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>
…offloading scheduler (vllm-project#45679) Signed-off-by: Alex <alex.tech.lab@outlook.com> Signed-off-by: AlexHuang <jihuihuang@future.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Integrates ~846 upstream commits. Resolves 15 conflicts (steering/capture hooks vs upstream refactors); ports inline-steering hook into the new OfflineInferenceMixin (offline_utils.py); accepts upstream removal of GGUF (now the external vllm-gguf-plugin) and of dots1/internlm2_ve models. Repoints extra-quant at the RhizoNymph/vllm-gguf-plugin fork (gemma4 support).
The vllm_c rms_norm/fused_add_rms_norm guards claimed support for weight=None, but torch.ops._C.rms_norm cannot take a None/undefined weight (fails with 'Not yet supported ScalarType'). Weightless norms (e.g. Gemma4 v_norm, has_weight=False) now correctly fall back to the native impl.
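The fallback described above can be sketched roughly as follows. This is a minimal, pure-Python illustration and not the fork's actual code: `native_rms_norm`, `fused_kernel_supported`, and the `fused_impl` parameter are hypothetical names, and plain lists stand in for tensors. The point is only that the guard must reject `weight=None` before the fused path is chosen.

```python
import math
from typing import Callable, Optional, Sequence


def native_rms_norm(
    x: Sequence[float],
    weight: Optional[Sequence[float]] = None,
    eps: float = 1e-6,
) -> list:
    """Reference RMS norm; weight=None means there is no learned scale."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    normed = [v / rms for v in x]
    if weight is None:
        return normed
    return [n * w for n, w in zip(normed, weight)]


def fused_kernel_supported(weight: Optional[Sequence[float]]) -> bool:
    # Hypothetical guard: the fused op is assumed to need a real weight,
    # so a weightless norm must not be routed to it.
    return weight is not None


def rms_norm(
    x: Sequence[float],
    weight: Optional[Sequence[float]] = None,
    eps: float = 1e-6,
    fused_impl: Optional[Callable] = None,
) -> list:
    """Use the fused implementation only when the guard allows it."""
    if fused_impl is not None and fused_kernel_supported(weight):
        return fused_impl(x, weight, eps)
    return native_rms_norm(x, weight, eps)
```

With this shape, a weightless norm (`weight=None`) never reaches the fused callable, while weighted norms still take the fast path.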
The SetSteeringRequest.vectors field is intentionally dict[str, Any] (to admit the packed wire form), so the model does not coerce inner layer keys; coerce_steering_spec does. Test the actual coercion seam (which had no direct coverage) instead of obsolete model-level behavior.
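As an illustration of testing a coercion seam directly, here is a small sketch. It assumes, without confirmation from the source, that "coercing inner layer keys" means turning string layer indices (as they arrive over JSON) into integers, and that non-dict values stand in for the packed wire form and pass through untouched. `coerce_layer_keys_sketch` is a hypothetical stand-in, not the real `coerce_steering_spec` signature.

```python
from typing import Any


def coerce_layer_keys_sketch(vectors: dict[str, Any]) -> dict[str, Any]:
    """Convert inner layer keys to int; leave non-dict values untouched.

    Assumption: the outer keys name hook points and the inner keys are
    layer indices that may arrive as strings from a JSON payload.
    """
    coerced: dict[str, Any] = {}
    for hook_name, per_layer in vectors.items():
        if isinstance(per_layer, dict):
            coerced[hook_name] = {
                int(layer): vec for layer, vec in per_layer.items()
            }
        else:
            # Stand-in for a packed wire form: passed through as-is.
            coerced[hook_name] = per_layer
    return coerced
```

A test at this seam would assert on the function's output directly, for example that `{"post_mlp": {"3": [0.1]}}` becomes `{"post_mlp": {3: [0.1]}}`, rather than on what the request model does with the field.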
A single third-party capture-consumer plugin that fails to import (e.g. one referencing a module not present in this build) previously crashed _load_entry_points and took down all capture admission. Skip it with a warning so other consumers keep working.
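The skip-with-warning behavior can be sketched as below. This is not the fork's `_load_entry_points`; `load_entry_points_tolerantly` and the group name `example.capture_consumers` are hypothetical, and catching `Exception` broadly is an assumption about how tolerant the loader should be. It uses only the standard-library `importlib.metadata.EntryPoint` API.

```python
import logging
from importlib.metadata import EntryPoint
from typing import Any, Iterable

logger = logging.getLogger(__name__)


def load_entry_points_tolerantly(
    entry_points: Iterable[EntryPoint],
) -> dict[str, Any]:
    """Load each entry point; skip any that fails instead of raising."""
    loaded: dict[str, Any] = {}
    for ep in entry_points:
        try:
            loaded[ep.name] = ep.load()
        except Exception as exc:  # e.g. ImportError from a missing module
            logger.warning(
                "Skipping capture consumer %r: failed to load (%s)",
                ep.name,
                exc,
            )
    return loaded
```

In real use the iterable would typically come from `importlib.metadata.entry_points(group=...)` (Python 3.10+); here entry points can also be constructed by hand, which keeps the loader easy to test with one good and one broken plugin.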
What
Merges ~846 commits from `upstream/main` (through `a331589394`, 2026-06-18) into the integration fork.
Why
Keeps the steering/capture fork current with upstream. Merge-base was ~3.5 weeks stale (2026-05-24).
Conflict resolution
81 files overlapped; 15 conflicted. Notable resolutions:
- `LLM` private methods were extracted into a new `OfflineInferenceMixin` (`vllm/entrypoints/offline_utils.py`). Ported the inline-steering hook (`_maybe_pack_inline_steering` + its `_add_request` call site) into the mixin; removed the now-duplicate block from `llm.py` (constructor capture-consumer logic auto-merged and is intact).
- … `_set_request_block_hash_steering_overrides`; took upstream's new `schedule(throttle_prefills=...)` + `current_step` increment.
- Upstream removed GGUF, which now lives in the external `vllm-gguf-plugin`. Accepted the deletions; repointed the `extra-quant` optional dependency at the `RhizoNymph/vllm-gguf-plugin` fork (adds gemma4 support).
- Accepted upstream's removal of `dots1` and `internlm2_ve` (our changes there were only hooks).
Runtime-validated on node0 (RTX 3090): engine initializes and generates correctly (Qwen3-4B). The ~67 textually auto-merged files (incl. `gpu_model_runner.py`) build and run but warrant deeper capture/steering validation.