
Update dependency vllm to v0.20.0 [SECURITY] #8

Open

renovate[bot] wants to merge 1 commit into main from renovate/pypi-vllm-vulnerability


Conversation

renovate bot (Contributor) commented on May 28, 2025

ℹ️ Note

This PR body was truncated due to platform limits.

This PR contains the following updates:

Package: vllm
Change: ==0.8.5 -> ==0.20.0
(Age and confidence badges omitted.)

vLLM has a Regular Expression Denial of Service (ReDoS, Exponential Complexity) Vulnerability in pythonic_tool_parser.py

CVE-2025-48887 / GHSA-w6q7-j642-7c25

More information

Details

Summary

A Regular Expression Denial of Service (ReDoS) vulnerability exists in the file vllm/entrypoints/openai/tool_parsers/pythonic_tool_parser.py of the vLLM project. The root cause is the use of a highly complex and nested regular expression for tool call detection, which can be exploited by an attacker to cause severe performance degradation or make the service unavailable.

Details

The following regular expression is used to match tool/function call patterns:

r"\[([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s)?\),\s*)*([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s*)?\)\s*)+\]"

This pattern contains multiple nested quantifiers (*, +), optional groups, and inner repetitions which make it vulnerable to catastrophic backtracking.

Attack Example:
A malicious input such as

[A(A=	)A(A=,		)A(A=,		)A(A=,		)... (repeated dozens of times) ...]

or

"[A(A=" + "\t)A(A=,\t" * repeat

can cause the regular expression engine to consume CPU exponentially with the input length, effectively freezing or crashing the server (DoS).

Proof of Concept:
A Python script demonstrates that matching such a crafted string with the above regex results in exponential time complexity. Even moderate input lengths can bring the system to a halt.

Length: 22, Time: 0.0000 seconds, Match: False
Length: 38, Time: 0.0010 seconds, Match: False
Length: 54, Time: 0.0250 seconds, Match: False
Length: 70, Time: 0.5185 seconds, Match: False
Length: 86, Time: 13.2703 seconds, Match: False
Length: 102, Time: 319.0717 seconds, Match: False
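
Below is a minimal sketch of a harness that produces this kind of measurement, assuming Python's re module and the pattern quoted above (this is not the original reporter's script, so exact lengths and timings will differ):

import re
import time

# The vulnerable tool-call pattern quoted above.
TOOL_CALL_REGEX = re.compile(
    r"\[([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s)?\),\s*)*"
    r"([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s*)?\)\s*)+\]"
)

for repeat in range(1, 8):  # larger values quickly become unbearably slow
    payload = "[A(A=" + "\t)A(A=,\t" * repeat  # crafted non-matching input
    start = time.perf_counter()
    match = TOOL_CALL_REGEX.match(payload)
    elapsed = time.perf_counter() - start
    print(f"Length: {len(payload)}, Time: {elapsed:.4f} seconds, Match: {match is not None}")
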
Impact
  • Denial of Service (DoS): An attacker can trigger a denial of service by sending specially crafted payloads to any API or interface that invokes this regex, causing excessive CPU usage and making the vLLM service unavailable.
  • Resource Exhaustion and Memory Retention: As this regex is invoked during function call parsing, the matching process may hold on to significant CPU and memory resources for extended periods (due to catastrophic backtracking). In the context of vLLM, this also means that the associated KV cache (used for model inference and typically stored in GPU memory) is not released in a timely manner. This can lead to GPU memory exhaustion, degraded throughput, and service instability.
  • Potential for Broader System Instability: Resource exhaustion from stuck or slow requests may cascade into broader system instability or service downtime if not mitigated.
Fix

Severity

  • CVSS Score: 6.5 / 10 (Medium)
  • Vector String: CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

References

This data is provided by the GitHub Advisory Database (CC-BY 4.0).


vLLM vulnerable to Regular Expression Denial of Service

GHSA-j828-28rj-hfhp

More information

Details

Summary

A recent review identified several regular expressions in the vllm codebase that are susceptible to Regular Expression Denial of Service (ReDoS) attacks. These patterns, if fed with crafted or malicious input, may cause severe performance degradation due to catastrophic backtracking.

1. vllm/lora/utils.py Line 173

https://github.com/vllm-project/vllm/blob/2858830c39da0ae153bc1328dbba7680f5fbebe1/vllm/lora/utils.py#L173
Risk Description:

  • The regex r"\((.*?)\)\$?$" matches content inside parentheses. If input such as ((((a|)+)+)+) is passed in, it can cause catastrophic backtracking, leading to a ReDoS vulnerability.
  • Using .*? (non-greedy match) inside group parentheses can be highly sensitive to input length and nesting complexity.

Remediation Suggestions:

  • Limit the input string length.
  • Use a non-recursive matching approach, or write a regex with stricter content constraints.
  • Consider using possessive quantifiers or atomic groups (supported natively in re only since Python 3.11; available via the third-party regex module on older versions), or split and process the input before regex matching.
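
As one illustration of the first two suggestions (not the actual vLLM patch; the function name and limit below are hypothetical), the lookup could cap the input length and use a pattern that forbids nesting inside the parentheses, which removes the backtracking blow-up:

import re

MAX_MODULE_NAME_LEN = 256  # hypothetical cap, not a vLLM constant

# [^()]* cannot match nested parentheses, so the engine never backtracks explosively.
_PAREN_SUFFIX = re.compile(r"\(([^()]*)\)\$?$")

def extract_parenthesized_suffix(name: str):
    if len(name) > MAX_MODULE_NAME_LEN:
        raise ValueError("module name too long")
    m = _PAREN_SUFFIX.search(name)
    return m.group(1) if m else None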

2. vllm/entrypoints/openai/tool_parsers/phi4mini_tool_parser.py Line 52

https://github.com/vllm-project/vllm/blob/2858830c39da0ae153bc1328dbba7680f5fbebe1/vllm/entrypoints/openai/tool_parsers/phi4mini_tool_parser.py#L52

Risk Description:

  • The regex r'functools\[(.*?)\]' uses .*? to match content inside brackets, together with re.DOTALL. If the input contains a large number of nested or crafted brackets, it can cause backtracking and ReDoS.

Remediation Suggestions:

  • Limit the length of model_output.
  • Use a stricter, non-greedy pattern (avoid matching across extraneous nesting).
  • Prefer re.finditer() and enforce a length constraint on each match.

3. vllm/entrypoints/openai/serving_chat.py Line 351

https://github.com/vllm-project/vllm/blob/2858830c39da0ae153bc1328dbba7680f5fbebe1/vllm/entrypoints/openai/serving_chat.py#L351

Risk Description:

  • The regex r'.*"parameters":\s*(.*)' can trigger backtracking if current_text is very long and contains repeated structures.
  • Especially when processing strings from unknown sources, .* matching any content is high risk.

Remediation Suggestions:

  • Use a more specific pattern (e.g., via JSON parsing).
  • Impose limits on current_text length.
  • Avoid using .* to capture large blocks of text; prefer structured parsing when possible.
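
A sketch of the structured-parsing alternative (illustrative only, not the actual vLLM code; it assumes current_text is complete JSON, whereas streaming partial output would need an incremental parser):

import json

MAX_TEXT_LEN = 65536  # hypothetical cap

def extract_parameters(current_text: str):
    r"""Pull the "parameters" object out of a tool-call blob by parsing it as JSON,
    instead of running r'.*"parameters":\s*(.*)' over attacker-controlled text."""
    if len(current_text) > MAX_TEXT_LEN:
        return None
    try:
        obj = json.loads(current_text)
    except json.JSONDecodeError:
        return None
    return obj.get("parameters") if isinstance(obj, dict) else None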

4. benchmarks/benchmark_serving_structured_output.py Line 650

https://github.com/vllm-project/vllm/blob/2858830c39da0ae153bc1328dbba7680f5fbebe1/benchmarks/benchmark_serving_structured_output.py#L650

Risk Description:

  • The regex r'\{.*\}' is used to extract JSON inside curly braces. If the actual string is very long with unbalanced braces, it can cause backtracking, leading to a ReDoS vulnerability.
  • Although this is used for benchmark correctness checking, it should still handle abnormal inputs carefully.

Remediation Suggestions:

  • Limit the length of actual.
  • Prefer stepwise search for { and } or use a robust JSON extraction tool.
  • Recommend first locating the range with simple string search, then applying regex.
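
A sketch of the suggested string-search approach (illustrative only; the length cap is an assumption):

import json

def extract_json_block(actual: str, max_len: int = 100_000):
    r"""Locate the outermost braces by simple string search and parse that slice,
    instead of running r'\{.*\}' over the whole string."""
    if len(actual) > max_len:
        return None
    start = actual.find("{")
    end = actual.rfind("}")
    if start == -1 or end == -1 or end <= start:
        return None
    try:
        return json.loads(actual[start:end + 1])
    except json.JSONDecodeError:
        return None
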
Fix

Severity

  • CVSS Score: 4.3 / 10 (Medium)
  • Vector String: CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:L

References

This data is provided by the GitHub Advisory Database (CC-BY 4.0).


Potential Timing Side-Channel Vulnerability in vLLM’s Chunk-Based Prefix Caching

CVE-2025-46570 / GHSA-4qjh-9fv9-r85r

More information

Details

This issue arises from the prefix caching mechanism, which may expose the system to a timing side-channel attack.

Description

When a new prompt is processed, if the PagedAttention mechanism finds a matching prefix chunk, the prefill process speeds up, which is reflected in the TTFT (Time to First Token). Our tests revealed that the timing differences caused by matching chunks are significant enough to be recognized and exploited.

For instance, if the victim has submitted a sensitive prompt or if a valuable system prompt has been cached, an attacker sharing the same backend could attempt to guess the victim's input. By measuring the TTFT based on prefix matches, the attacker could verify if their guess is correct, leading to potential leakage of private information.
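
A minimal sketch of how TTFT could be measured against an OpenAI-compatible streaming endpoint (illustrative; the URL, model name, and use of the requests package are assumptions):

import time

import requests

def measure_ttft(prompt: str, base_url: str = "http://localhost:8000") -> float:
    """Time from sending the request to receiving the first streamed chunk."""
    start = time.perf_counter()
    with requests.post(
        f"{base_url}/v1/completions",
        json={"model": "placeholder-model", "prompt": prompt,
              "max_tokens": 1, "stream": True},
        stream=True,
    ) as resp:
        for _ in resp.iter_lines():
            return time.perf_counter() - start  # first chunk seen
    return float("inf")

# A consistently lower TTFT for one candidate prefix than another suggests a prefix-cache hit.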

Unlike token-by-token sharing mechanisms, vLLM's chunk-based approach (PagedAttention) processes tokens in larger units (chunks). In our tests, with chunk_size=2, the timing differences became noticeable enough to allow attackers to infer whether portions of their input match the victim's prompt at the chunk level.

Environment
  • GPU: NVIDIA A100 (40G)
  • CUDA: 11.8
  • PyTorch: 2.3.1
  • OS: Ubuntu 18.04
  • vLLM: v0.5.1
    Configuration: We launched vLLM using the default settings and adjusted chunk_size=2 to evaluate the TTFT.
Leakage

We conducted our tests using LLaMA2-70B-GPTQ on a single device. We analyzed the timing differences when prompts shared prefixes of 2 chunks, and plotted the corresponding ROC curves. Our results suggest that timing differences can be reliably used to distinguish prefix matches, demonstrating a potential side-channel vulnerability.
[Figure: combined ROC curves for 2-chunk prefix matching]

Results

In our experiment, we analyzed the response time differences between cache hits and misses in vLLM's PagedAttention mechanism. Using ROC curve analysis to assess the distinguishability of these timing differences, we observed the following results:

  • With a 1-token prefix, the ROC curve yielded an AUC value of 0.571, indicating that even with a short prefix, an attacker can reasonably distinguish between cache hits and misses based on response times.
  • When the prefix length increases to 8 tokens, the AUC value rises significantly to 0.99, showing that the attacker can almost perfectly identify cache hits with a longer prefix.
Fixes

Severity

  • CVSS Score: 2.6 / 10 (Low)
  • Vector String: CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:L/I:N/A:N

References

This data is provided by the GitHub Advisory Database (CC-BY 4.0).


vLLM has a Weakness in MultiModalHasher Image Hashing Implementation

CVE-2025-46722 / GHSA-c65p-x677-fgj6

More information

Details

Summary

In the file vllm/multimodal/hasher.py, the MultiModalHasher class has a security and data integrity issue in its image hashing method. Currently, it serializes PIL.Image.Image objects using only obj.tobytes(), which returns only the raw pixel data, without including metadata such as the image’s shape (width, height, mode). As a result, two images of different sizes (e.g., 30x100 and 100x30) with the same pixel byte sequence could generate the same hash value. This may lead to hash collisions, incorrect cache hits, and even data leakage or security risks.

Details
  • Affected file: vllm/multimodal/hasher.py
  • Affected method: MultiModalHasher.serialize_item
    https://github.com/vllm-project/vllm/blob/9420a1fc30af1a632bbc2c66eb8668f3af41f026/vllm/multimodal/hasher.py#L34-L35
  • Current behavior: For Image.Image instances, only obj.tobytes() is used for hashing.
  • Problem description: obj.tobytes() does not include the image’s width, height, or mode metadata.
  • Impact: Two images with the same pixel byte sequence but different sizes could be regarded as the same image by the cache and hashing system, which may result in:
    • Incorrect cache hits, leading to abnormal responses
    • Deliberate construction of images with different meanings but the same hash value
Recommendation

In the serialize_item method, serialization of Image.Image objects should include not only pixel data, but also all critical metadata—such as dimensions (size), color mode (mode), format, and especially the info dictionary. The info dictionary is particularly important in palette-based images (e.g., mode 'P'), where the palette itself is stored in info. Ignoring info can result in hash collisions between visually distinct images with the same pixel bytes but different palettes or metadata. This can lead to incorrect cache hits or even data leakage.

Summary:
Serializing only the raw pixel data is insecure. Always include all image metadata (size, mode, format, info) in the hash calculation to prevent collisions, especially in cases like palette-based images.
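
A minimal sketch of metadata-inclusive serialization along these lines (illustrative; not vLLM's actual implementation):

import hashlib
import pickle

from PIL import Image

def hash_image(obj: Image.Image) -> str:
    """Hash the pixel bytes together with size, mode, format, and the info dict,
    so images that differ only in shape or palette no longer collide."""
    payload = pickle.dumps({
        "size": obj.size,
        "mode": obj.mode,
        "format": obj.format,
        "info": obj.info,
        "pixels": obj.tobytes(),
    })
    return hashlib.sha256(payload).hexdigest()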

Impact for other modalities
For other modalities: video is converted into a multi-dimensional NumPy array encoding its width, height, number of frames, and so on, and because only the raw byte sequence of that array is serialized, the shape information is likewise lost, so video is affected by the same issue.

For audio, the mono option of librosa.load is left at its default, so the loaded audio is automatically downmixed to a single channel and returned as a one-dimensional NumPy array; its structure is therefore fixed and not affected by this issue.

Fixes

Severity

  • CVSS Score: 4.2 / 10 (Medium)
  • Vector String: CVSS:3.1/AV:N/AC:H/PR:L/UI:N/S:U/C:L/I:N/A:L

References

This data is provided by the GitHub Advisory Database (CC-BY 4.0).


vLLM DOS: Remotely kill vllm over http with invalid JSON schema

CVE-2025-48942 / GHSA-6qc9-v4r8-22xg

More information

Details

Summary

Hitting the /v1/completions API with an invalid json_schema as a guided param will kill the vLLM server.

Details

The following API call
(venv) [derekh@ip-172-31-15-108 ]$ curl -s http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "meta-llama/Llama-3.2-3B-Instruct","prompt": "Name two great reasons to visit Sligo ", "max_tokens": 10, "temperature": 0.5, "guided_json":"{\"properties\":{\"reason\":{\"type\": \"stsring\"}}}"}'
will provoke an uncaught exception from xgrammar in
./lib64/python3.11/site-packages/xgrammar/compiler.py

Issue with more information: https://github.com/vllm-project/vllm/issues/17248

PoC

Make a call to vLLM with an invalid json_schema, e.g. {\"properties\":{\"reason\":{\"type\": \"stsring\"}}}

curl -s http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "meta-llama/Llama-3.2-3B-Instruct","prompt": "Name two great reasons to visit Sligo ", "max_tokens": 10, "temperature": 0.5, "guided_json":"{\"properties\":{\"reason\":{\"type\": \"stsring\"}}}"}'

Impact

vllm crashes

example traceback

ERROR 03-26 17:25:01 [core.py:340] EngineCore hit an exception: Traceback (most recent call last):
ERROR 03-26 17:25:01 [core.py:340]   File "/home/derekh/workarea/vllm/vllm/v1/engine/core.py", line 333, in run_engine_core
ERROR 03-26 17:25:01 [core.py:340]     engine_core.run_busy_loop()
ERROR 03-26 17:25:01 [core.py:340]   File "/home/derekh/workarea/vllm/vllm/v1/engine/core.py", line 367, in run_busy_loop
ERROR 03-26 17:25:01 [core.py:340]     outputs = step_fn()
ERROR 03-26 17:25:01 [core.py:340]               ^^^^^^^^^
ERROR 03-26 17:25:01 [core.py:340]   File "/home/derekh/workarea/vllm/vllm/v1/engine/core.py", line 181, in step
ERROR 03-26 17:25:01 [core.py:340]     scheduler_output = self.scheduler.schedule()
ERROR 03-26 17:25:01 [core.py:340]                        ^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-26 17:25:01 [core.py:340]   File "/home/derekh/workarea/vllm/vllm/v1/core/scheduler.py", line 257, in schedule
ERROR 03-26 17:25:01 [core.py:340]     if structured_output_req and structured_output_req.grammar:
ERROR 03-26 17:25:01 [core.py:340]                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-26 17:25:01 [core.py:340]   File "/home/derekh/workarea/vllm/vllm/v1/structured_output/request.py", line 41, in grammar
ERROR 03-26 17:25:01 [core.py:340]     completed = self._check_grammar_completion()
ERROR 03-26 17:25:01 [core.py:340]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-26 17:25:01 [core.py:340]   File "/home/derekh/workarea/vllm/vllm/v1/structured_output/request.py", line 29, in _check_grammar_completion
ERROR 03-26 17:25:01 [core.py:340]     self._grammar = self._grammar.result(timeout=0.0001)
ERROR 03-26 17:25:01 [core.py:340]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-26 17:25:01 [core.py:340]   File "/usr/lib64/python3.11/concurrent/futures/_base.py", line 456, in result
ERROR 03-26 17:25:01 [core.py:340]     return self.__get_result()
ERROR 03-26 17:25:01 [core.py:340]            ^^^^^^^^^^^^^^^^^^^
ERROR 03-26 17:25:01 [core.py:340]   File "/usr/lib64/python3.11/concurrent/futures/_base.py", line 401, in __get_result
ERROR 03-26 17:25:01 [core.py:340]     raise self._exception
ERROR 03-26 17:25:01 [core.py:340]   File "/usr/lib64/python3.11/concurrent/futures/thread.py", line 58, in run
ERROR 03-26 17:25:01 [core.py:340]     result = self.fn(*self.args, **self.kwargs)
ERROR 03-26 17:25:01 [core.py:340]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-26 17:25:01 [core.py:340]   File "/home/derekh/workarea/vllm/vllm/v1/structured_output/__init__.py", line 120, in _async_create_grammar
ERROR 03-26 17:25:01 [core.py:340]     ctx = self.compiler.compile_json_schema(grammar_spec,
ERROR 03-26 17:25:01 [core.py:340]           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-26 17:25:01 [core.py:340]   File "/home/derekh/workarea/vllm/venv/lib64/python3.11/site-packages/xgrammar/compiler.py", line 101, in compile_json_schema
ERROR 03-26 17:25:01 [core.py:340]     self._handle.compile_json_schema(
ERROR 03-26 17:25:01 [core.py:340] RuntimeError: [17:25:01] /project/cpp/json_schema_converter.cc:795: Check failed: (schema.is<picojson::object>()) is false: Schema should be an object or bool
ERROR 03-26 17:25:01 [core.py:340] 
ERROR 03-26 17:25:01 [core.py:340] 
CRITICAL 03-26 17:25:01 [core_client.py:269] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.
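
As an illustration of the failure mode, a server-side guard can sanity-check the guided_json value at request time and surface problems as a client error instead of letting the exception escape the engine loop (a sketch only, not the actual patch; it covers the specific check that fails in the traceback above, while invalid schema contents would still need to be caught around grammar compilation):

import json

def validate_guided_json(raw):
    """Reject malformed guided_json before it reaches the grammar compiler."""
    try:
        schema = json.loads(raw) if isinstance(raw, str) else raw
    except json.JSONDecodeError as e:
        raise ValueError(f"guided_json is not valid JSON: {e}") from e
    if not isinstance(schema, (dict, bool)):
        raise ValueError("guided_json must be a JSON object or a boolean")
    return schema
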
Fix

Severity

  • CVSS Score: 6.5 / 10 (Medium)
  • Vector String: CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

References

This data is provided by the GitHub Advisory Database (CC-BY 4.0).


vLLM allows clients to crash the openai server with invalid regex

CVE-2025-48943 / GHSA-9hcf-v7m4-6m2j

More information

Details

Impact

A denial of service bug caused the vLLM server to crash if an invalid regex was provided while using structured output. This vulnerability is similar to GHSA-6qc9-v4r8-22xg, but for regex instead of a JSON schema.

Issue with more details: https://github.com/vllm-project/vllm/issues/17313

Patches

Severity

  • CVSS Score: 6.5 / 10 (Medium)
  • Vector String: CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

References

This data is provided by the GitHub Advisory Database (CC-BY 4.0).


vLLM Tool Schema allows DoS via Malformed pattern and type Fields

CVE-2025-48944 / GHSA-vrq3-r879-7m65

More information

Details

Summary

The vLLM backend used with the /v1/chat/completions OpenAPI endpoint fails to validate unexpected or malformed input in the "pattern" and "type" fields when the tools functionality is invoked. These inputs are not validated before being compiled or parsed, causing a crash of the inference worker with a single request. The worker will remain down until it is restarted.

Details

The "type" field is expected to be one of: "string", "number", "object", "boolean", "array", or "null". Supplying any other value will cause the worker to crash with the following error:

RuntimeError: [11:03:34] /project/cpp/json_schema_converter.cc:637: Unsupported type "something_or_nothing"

The "pattern" field undergoes Jinja2 rendering (I think) prior to being passed unsafely into the native regex compiler without validation or escaping. This allows malformed expressions to reach the underlying C++ regex engine, resulting in fatal errors.

For example, the following inputs will crash the worker:

  • Unclosed {, [, or (
  • Closed but empty {} and []

Here are some of the runtime errors seen on the crash, depending on what gets injected:

RuntimeError: [12:05:04] /project/cpp/regex_converter.cc:73: Regex parsing error at position 4: The parenthesis is not closed.
RuntimeError: [10:52:27] /project/cpp/regex_converter.cc:73: Regex parsing error at position 2: Invalid repetition count.
RuntimeError: [12:07:18] /project/cpp/regex_converter.cc:73: Regex parsing error at position 6: Two consecutive repetition modifiers are not allowed.
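
A sketch of the kind of up-front validation that would reject such payloads before they reach the C++ converter (illustrative only; the allowed-type set is the one quoted above, and Python's re syntax is not identical to the converter's, so re.compile is only a coarse pre-check):

import re

ALLOWED_JSON_TYPES = {"string", "number", "object", "boolean", "array", "null"}

def validate_property_schema(prop: dict) -> None:
    """Coarsely validate a tool-parameter property before grammar compilation."""
    json_type = prop.get("type")
    if json_type is not None and json_type not in ALLOWED_JSON_TYPES:
        raise ValueError(f"unsupported JSON schema type: {json_type!r}")
    pattern = prop.get("pattern")
    if pattern is not None:
        try:
            re.compile(pattern)
        except re.error as e:
            raise ValueError(f"invalid regex pattern: {e}") from e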

PoC

Here is the POST request using the type field to crash the worker. Note that the type field is set to "something" rather than one of the expected types:
POST /v1/chat/completions HTTP/1.1
Host:
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:138.0) Gecko/20100101 Firefox/138.0
Accept: application/json
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer:
Content-Type: application/json
Content-Length: 579
Origin:
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
Priority: u=0
Te: trailers
Connection: keep-alive

{
"model": "mistral-nemo-instruct",
"messages": [{ "role": "user", "content": "crash via type" }],
"tools": [
{
"type": "function",
"function": {
"name": "crash01",
"parameters": {
"type": "object",
"properties": {
"a": {
"type": "something"
}
}
}
}
}
],
"tool_choice": {
"type": "function",
"function": {
"name": "crash01",
"arguments": { "a": "test" }
}
},
"stream": false,
"max_tokens": 1
}

Here is the POST request using the pattern field to crash the worker. Note that the pattern field is set to an RCE-style payload; it could just as well have been set to {{}}. I was not able to achieve RCE in my testing, but it does crash the worker.

POST /v1/chat/completions HTTP/1.1
Host:
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:138.0) Gecko/20100101 Firefox/138.0
Accept: application/json
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer:
Content-Type: application/json
Content-Length: 718
Origin:
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
Priority: u=0
Te: trailers
Connection: keep-alive

{
"model": "mistral-nemo-instruct",
"messages": [
{
"role": "user",
"content": "Crash via Pattern"
}
],
"tools": [
{
"type": "function",
"function": {
"name": "crash02",
"parameters": {
"type": "object",
"properties": {
"a": {
"type": "string",
"pattern": "{{ import('os').system('echo RCE_OK > /tmp/pwned') or 'SAFE' }}"
}
}
}
}
}
],
"tool_choice": {
"type": "function",
"function": {
"name": "crash02"
}
},
"stream": false,
"max_tokens": 32,
"temperature": 0.2,
"top_p": 1,
"n": 1
}

Impact

Backend workers can be crashed, causing anyone using the inference engine to receive 500 Internal Server Error responses on subsequent requests.

Fix

Severity

  • CVSS Score: 6.5 / 10 (Medium)
  • Vector String: CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

References

This data is provided by the GitHub Advisory Database (CC-BY 4.0).


vllm API endpoints vulnerable to Denial of Service Attacks

CVE-2025-48956 / GHSA-rxc4-3w6r-4v47

More information

Details

Summary

A Denial of Service (DoS) vulnerability can be triggered by sending a single HTTP GET request with an extremely large header to an HTTP endpoint. This results in server memory exhaustion, potentially leading to a crash or unresponsiveness. The attack does not require authentication, making it exploitable by any remote user.

Details

The vulnerability leverages the abuse of HTTP headers. By setting a header such as X-Forwarded-For to a very large value like ("A" * 5_800_000_000), the server's HTTP parser or application logic may attempt to load the entire request into memory, overwhelming system resources.

Impact

Type of vulnerability: Denial of Service (DoS). The attack requires no authentication, so any remote client able to reach an exposed vLLM HTTP endpoint is impacted.

Resolution

Upgrade to a version of vLLM that includes appropriate HTTP limits by default, or use a proxy in front of vLLM which provides protection against this issue.

Severity

  • CVSS Score: 7.5 / 10 (High)
  • Vector String: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H

References

This data is provided by the GitHub Advisory Database (CC-BY 4.0).


vLLM is vulnerable to timing attack at bearer auth

CVE-2025-59425 / GHSA-wr9h-g72x-mwhm

More information

Details

Summary

The API key support in vLLM performed validation using a method that was vulnerable to a timing attack. This could potentially allow an attacker to discover a valid API key using an approach more efficient than brute force.

Details

https://github.com/vllm-project/vllm/blob/4b946d693e0af15740e9ca9c0e059d5f333b1083/vllm/entrypoints/openai/api_server.py#L1270-L1274

API key validation used a string comparison whose running time grows with the number of leading characters of the provided key that are correct. Analyzing timing data across many attempts can allow an attacker to determine when they have found the next correct character in the key sequence.
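
The standard remediation is a constant-time comparison, for example via Python's hmac.compare_digest (a sketch of the idea, not necessarily the exact change made in vLLM):

import hmac

def api_key_is_valid(provided: str, expected: str) -> bool:
    """Compare keys in constant time so response latency does not leak how many
    leading characters of the guess are correct."""
    return hmac.compare_digest(provided.encode(), expected.encode())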

Impact

Deployments relying on vLLM's built-in API key validation are vulnerable to authentication bypass using this technique.

Severity

  • CVSS Score: 7.5 / 10 (High)
  • Vector String: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N

References

This data is provided by the GitHub Advisory Database (CC-BY 4.0).


vLLM: Resource-Exhaustion (DoS) through Malicious Jinja Template in OpenAI-Compatible Server

CVE-2025-61620 / GHSA-6fvq-23cw-5628

More information

Details

Summary

A resource-exhaustion (denial-of-service) vulnerability exists in multiple endpoints of the OpenAI-Compatible Server due to the ability to specify Jinja templates via the chat_template and chat_template_kwargs parameters. If an attacker can supply these parameters to the API, they can cause a service outage by exhausting CPU and/or memory resources.

Details

When using an LLM as a chat model, the conversation history must be rendered into a text input for the model. In Hugging Face transformers, this rendering is performed using a Jinja template. The OpenAI-Compatible Server launched by vllm serve exposes a chat_template parameter that lets users specify that template. In addition, the server accepts a chat_template_kwargs parameter to pass extra keyword arguments to the rendering function.

Because Jinja templates support programming-language-like constructs (loops, nested iterations, etc.), a crafted template can consume extremely large amounts of CPU and memory and thereby trigger a denial-of-service condition.

Importantly, simply forbidding the chat_template parameter does not fully mitigate the issue. The implementation constructs a dictionary of keyword arguments for apply_hf_chat_template and then updates that dictionary with the user-supplied chat_template_kwargs via dict.update. Since dict.update can overwrite existing keys, an attacker can place a chat_template key inside chat_template_kwargs to replace the template that will be used by apply_hf_chat_template.

# vllm/entrypoints/openai/serving_engine.py#L794-L816
_chat_template_kwargs: dict[str, Any] = dict(
    chat_template=chat_template,
    add_generation_prompt=add_generation_prompt,
    continue_final_message=continue_final_message,
    tools=tool_dicts,
    documents=documents,
)
_chat_template_kwargs.update(chat_template_kwargs or {})

request_prompt: Union[str, list[int]]
if isinstance(tokenizer, MistralTokenizer):
    ...
else:
    request_prompt = apply_hf_chat_template(
        tokenizer=tokenizer,
        conversation=conversation,
        model_config=model_config,
        **_chat_template_kwargs,
    )
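
The override described above can be reproduced with a plain dict (illustrative):

# Server-side defaults, including the operator-approved template.
kwargs = {"chat_template": "trusted template", "add_generation_prompt": True}

# Attacker-controlled chat_template_kwargs from the request body.
user_kwargs = {"chat_template": "{% for i in range(10**9) %}x{% endfor %}"}

kwargs.update(user_kwargs)      # dict.update overwrites existing keys
print(kwargs["chat_template"])  # the attacker's template is now the one that will be rendered
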
Impact

If an OpenAI-Compatible Server exposes endpoints that accept chat_template or chat_template_kwargs from untrusted clients, an attacker can submit a malicious Jinja template (directly or by overriding chat_template inside chat_template_kwargs) that consumes excessive CPU and/or memory. This can result in a resource-exhaustion denial-of-service that renders the server unresponsive to legitimate requests.

Fixes

Severity

  • CVSS Score: 6.5 / 10 (Medium)
  • Vector String: CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

References

This data is provided by the GitHub Advisory Database (CC-BY 4.0).


vLLM is vulnerable to Server-Side Request Forgery (SSRF) through MediaConnector class

CVE-2025-6242 / GHSA-3f6c-7fw2-ppm4

More information

Details

Summary

A Server-Side Request Forgery (SSRF) vulnerability exists in the MediaConnector class within the vLLM project's multimodal feature set. The load_from_url and load_from_url_async methods fetch and process media from user-provided URLs without adequate restrictions on the target hosts. This allows an attacker to coerce the vLLM server into making arbitrary requests to internal network resources.

This vulnerability is particularly critical in containerized environments like llm-d, where a compromised vLLM pod could be used to scan the internal network, interact with other pods, and potentially cause denial of service or access sensitive data. For example, an attacker could make the vLLM pod send malicious requests to an internal llm-d management endpoint, leading to system instability by falsely reporting metrics like the KV cache state.

Vulnerability Details

The core of the vulnerability lies in the MediaConnector.load_from_url method and its asynchronous counterpart. These methods accept a URL string to fetch media content (images, audio, video).

https://github.com/vllm-project/vllm/blob/119f683949dfed10df769fe63b2676d7f1eb644e/vllm/multimodal/utils.py#L97-L113

The function directly processes URLs with http, https, and file schemes. An attacker can supply a URL pointing to an internal IP address or a localhost endpoint. The vLLM server will then initiate a connection to this internal resource.

  • HTTP/HTTPS Scheme: An attacker can craft a request like {"image_url": "http://127.0.0.1:8080/internal_api"}. The vLLM server will send a GET request to this internal endpoint.
  • File Scheme: The _load_file_url method attempts to restrict file access to a subdirectory defined by --allowed-local-media-path. While this is a good security measure for local file access, it does not prevent network-based SSRF attacks.
Impact in llm-d Environments

The risk is significantly amplified in orchestrated environments such as llm-d, where multiple pods communicate over an internal network.

  1. Denial of Service (DoS): An attacker could target internal management endpoints of other services within the llm-d cluster. For instance, if a monitoring or metrics service is exposed internally, an attacker could send malformed requests to it. A specific example is an attacker causing the vLLM pod to call an internal API that reports a false KV cache utilization, potentially triggering incorrect scaling decisions or even a system shutdown.

  2. Internal Network Reconnaissance: Attackers can use the vulnerability to scan the internal network for open ports and services by providing URLs like http://10.0.0.X:PORT and observing the server's response time or error messages.

  3. Interaction with Internal Services: Any unsecured internal service becomes a potential target. This could include databases, internal APIs, or other model pods that might not have robust authentication, as they are not expected to be directly exposed.

Delegating this security responsibility to an upper-level orchestrator like llm-d is problematic. The orchestrator cannot easily distinguish between legitimate requests initiated by the vLLM engine for its own purposes and malicious requests originating from user input, thus complicating traffic filtering rules and increasing management overhead.

Fix

See the --allowed-media-domains option discussed here: https://docs.vllm.ai/en/latest/usage/security.html#4-restrict-domains-access-for-media-urls
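
The linked option restricts which hosts media may be fetched from; the underlying idea is an allow-list check on the URL before any request is made (a sketch of the concept, not vLLM's implementation; the domain list is hypothetical):

from urllib.parse import urlparse

ALLOWED_MEDIA_DOMAINS = {"images.example.com"}  # operator-configured allow-list

def check_media_url(url: str) -> None:
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unsupported URL scheme: {parsed.scheme!r}")
    if parsed.hostname not in ALLOWED_MEDIA_DOMAINS:
        raise ValueError(f"host {parsed.hostname!r} is not in the media allow-list")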

Severity

  • CVSS Score: 7.1 / 10 (High)
  • Vector String: CVSS:3.1/AV:N/AC:H/PR:L/UI:N/S:U/C:H/I:L/A:H

References

This data is provided by the GitHub Advisory Database (CC-BY 4.0).


vLLM vulnerable to DoS with incorrect shape of multimodal embedding inputs

CVE-2025-62372 / GHSA-pmqf-x6x8-p7qw

More information

Details

Summary

Users can crash the vLLM engine serving multimodal models by passing multimodal embedding inputs with correct ndim but incorrect shape (e.g. hidden dimension is wrong), regardless of whether the model is intended to support such inputs (as defined in the Supported Models page).

The issue has existed ever since support for image embedding inputs was added in #6613 (released in v0.5.5).

Details

Using image embeddings as an example:

  • For models that support image embedding inputs, the engine crashes when scattering the embeddings to inputs_embeds (mismatched shape)
  • For models that don't support image embedding inputs, the engine crashes when validating the inputs inside get_input_embeddings (validation fails).

This happens because we only validate the ndim of the tensor, but not its full shape, in the input processor (via MultiModalDataParser).
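
A sketch of the missing check, validating the full embedding shape rather than only ndim (illustrative; the function and parameter names are assumptions):

import torch

def validate_image_embeds(embeds: torch.Tensor, hidden_size: int) -> torch.Tensor:
    """Reject embeddings whose last dimension does not match the model's hidden size,
    instead of crashing later when they are scattered into inputs_embeds."""
    if embeds.ndim != 2:
        raise ValueError(f"expected a 2-D tensor, got ndim={embeds.ndim}")
    if embeds.shape[-1] != hidden_size:
        raise ValueError(
            f"embedding dim {embeds.shape[-1]} does not match hidden size {hidden_size}")
    return embeds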

Impact
  • Denial of service by crashing the engine
Mitigation
  • Use API key to limit access to trusted users.
  • Set --limit-mm-per-prompt to 0 for all non-text modalities to ban multimodal inputs, which includes multimodal embedding inputs. However, the model would then only accept text, defeating the purpose of using a multi-modal model.
Resolution

Severity

  • CVSS Score: 8.3 / 10 (High)
  • Vector String: CVSS:4.0/AV:N/AC:L/AT:N/PR:L/UI:N/VC:N/VI:N/VA:H/SC:N/SI:N/SA:H

References

This data is provided by the GitHub Advisory Database (CC-BY 4.0).


vLLM vulnerable to DoS via large Chat Completion or Tokenization requests with specially crafted chat_template_kwargs

CVE-2025-62426 / GHSA-69j4-grxj-j64p

More information

Details

Summary

The /v1/chat/completions and /tokenize endpoints allow a chat_template_kwargs request parameter that is used in the code before it is properly validated against the chat template. With the right chat_template_kwargs parameters, it is possible to block processing of the API server for long periods of time, delaying all other requests.

Details

In serving_engine.py, the chat_template_kwargs are unpacked into kwargs passed to chat_utils.py apply_hf_chat_template with no validation on the keys or values in that chat_template_kwargs dict. This means they can be used to override optional parameters in the apply_hf_chat_template method, such as tokenize, changing its default from False to True.

https://github.com/vllm-project/vllm/blob/2a6dc67eb520ddb9c4138d8b35ed6fe6226997fb/vllm/entrypoints/openai/serving_engine.py#L809-L814

https://github.com/vllm-project/vllm/blob/2a6dc67eb520ddb9c4138d8b35ed6fe6226997fb/vllm/entrypoints/chat_utils.py#L1602-L1610

Both serving_chat.py and serving_tokenization.py call into this _preprocess_chat method of serving_engine.py and they both pass in chat_template_kwargs.

So, a chat_template_kwargs like {"tokenize": True} makes tokenization happen as part of applying the chat template, even though that is not expected. Tokenization is a blocking operation, and with sufficiently large input can block the API server's event loop, which blocks handling of all other requests until this tokenization is complete.

This optional tokenize parameter to apply_hf_chat_template does not appear to be used, so one option would be to just hard-code that to always be False instead of allowing it to be optionally overridden by callers. A better option may be to not pass chat_template_kwargs as unpacked kwargs but instead as a dict, and only unpack them after the logic in apply_hf_chat_template that resolves the kwargs against the chat template.
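
One shape of the first option described above is to reject user-supplied keys that collide with server-controlled arguments before unpacking (a sketch of the idea, not necessarily what the upstream fix does):

RESERVED_CHAT_TEMPLATE_KEYS = {
    "chat_template", "add_generation_prompt", "continue_final_message",
    "tools", "documents", "tokenize",
}

def sanitize_chat_template_kwargs(user_kwargs: dict) -> dict:
    """Refuse chat_template_kwargs entries that would override reserved arguments."""
    bad = RESERVED_CHAT_TEMPLATE_KEYS & user_kwargs.keys()
    if bad:
        raise ValueError(f"chat_template_kwargs may not override: {sorted(bad)}")
    return user_kwargs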

Impact

Any authenticated user can cause a denial of service to a vLLM server with Chat Completion or Tokenize requests.

Fix

https://github.com/vllm-project/vllm/pull/27205

Severity

  • CVSS Score: 6.5 / 10 (Medium)
  • Vector String: CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

References

This data is provided by the GitHub Advisory Database (CC-BY 4.0).


vLLM vulnerable to remote code execution via transformers_utils/get_config

CVE-2025-66448 / GHSA-8fr4-5q9j-m8gm

More information

Details

Summary

vllm has a critical remote code execution vector in a config class named Nemotron_Nano_VL_Config. When vllm loads a model config that contains an auto_map entry, the config class resolves that mapping with get_class_from_dynamic_module(...) and immediately instantiates the returned class. This fetches and executes Python from the remote repository referenced in the auto_map string. Crucially, this happens even when the caller explicitly sets trust_remote_code=False in vllm.transformers_utils.config.get_config. In practice, an attacker can publish a benign-looking frontend repo whose config.json points via auto_map to a separate malicious backend repo; loading the frontend will silently run the backend’s code on the victim host.

Details

The vulnerable code resolves and instantiates classes from auto_map entries without checking whether those entries point to a different repo or whether remote code execution is allowed.

class Nemotron_Nano_VL_Config(PretrainedConfig):
    model_type = 'Llama_Nemotron_Nano_VL'

    def __init__(self, vision_config=None, **kwargs):  # vision_config comes from the loaded config (simplified excerpt)
        super().__init__(**kwargs)

        if vision_config is not None:
            assert "auto_map" in vision_config and "AutoConfig" in vision_config["auto_map"]
            # <-- vulnerable dynamic resolution + instantiation happens here
            vision_auto_config = get_class_from_dynamic_module(*vision_config["auto_map"]["AutoConfig"].split("--")[::-1])
            self.vision_config = vision_auto_config(**vision_config)
        else:
            self.vision_config = PretrainedConfig()

get_class_from_dynamic_module(...) is capable of fetching and importing code from the Hugging Face repo specified in the mapping. trust_remote_code is not enforced for this code path. As a result, a frontend repo can redirect the loader to any backend repo and cause code execution, bypassing the trust_remote_code guard.
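
A sketch of the kind of guard that would restore that boundary (illustrative only; not the actual patch):

from transformers.dynamic_module_utils import get_class_from_dynamic_module

def resolve_vision_auto_config(vision_config: dict, trust_remote_code: bool):
    """Refuse to resolve auto_map entries, which download and execute remote code,
    unless the caller explicitly opted in with trust_remote_code=True."""
    auto_map = vision_config.get("auto_map", {})
    if "AutoConfig" not in auto_map:
        return None
    if not trust_remote_code:
        raise ValueError(
            "Resolving vision_config.auto_map requires executing remote code; "
            "pass trust_remote_code=True to allow it.")
    return get_class_from_dynamic_module(*auto_map["AutoConfig"].split("--")[::-1])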

Impact

This is a critical vulnerability because it breaks the documented trust_remote_code safety boundary in a core model-loading utility. The vulnerable code lives in a common loading path, so any application, service, CI job, or developer machine that uses vllm’s transformer utilities to load configs can be affected. The attack requires only two repos and no user interaction beyond loading the frontend model. A successful exploit can execute arbitrary commands on the host.

Fixes

Severity

  • CVSS Score: 7.1 / 10 (High)
  • Vector String: CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:H/I:H/A:H

References

This data is provided by the GitHub Advisory Database (CC-BY 4.0).


Remote Code Execution Vulnerability in vLLM Multi-Node Cluster Configuration

CVE-2025-30165 / GHSA-9pcc-gvx5-r5wm

More information

Details

Affected Environments

Note that this issue only affects the V0 engine, which has been off by default since v0.8.0. Further, the issue only applies to a deployment using tensor parallelism across multiple hosts, which we do not expect to be a common deployment pattern.

Since V0 has been off by default since v0.8.0 and the fix is fairly invasive, we have decided not to fix this issue. Instead we recommend that users ensure their environment is on a secure network in case this pattern is in use.

The V1 engine is not affected by this issue.

Impact

In a multi-node vLLM deployment using the V0 engine, vLLM uses ZeroMQ for some multi-node communication purposes. The secondary vLLM hosts open a SUB ZeroMQ socket and connect to an XPUB socket on the primary vLLM host.

https://github.com/vllm-project/vllm/blob/c21b99b91241409c2fdf9f3f8c542e8748b317be/vllm/distributed/device_communicators/shm_broadcast.py#L295-L301

When data is received on

Note

PR body was truncated to here.

@renovate renovate Bot mentioned this pull request on Jun 17, 2025
@renovate renovate Bot force-pushed the renovate/pypi-vllm-vulnerability branch from dfe9935 to a8e0894 on August 21, 2025 19:05
@renovate renovate Bot changed the title from "Update dependency vllm to v0.9.0 [SECURITY]" to "Update dependency vllm to v0.10.1.1 [SECURITY]" on Aug 21, 2025
@renovate renovate Bot force-pushed the branch from a8e0894 to 73acea6 on October 7, 2025 18:39
@renovate renovate Bot changed the title to "Update dependency vllm to v0.11.0 [SECURITY]" on Oct 7, 2025
@renovate renovate Bot force-pushed the branch from 73acea6 to 9c185d9 on November 20, 2025 22:00
@renovate renovate Bot changed the title to "Update dependency vllm to v0.11.1 [SECURITY]" on Nov 20, 2025
@renovate renovate Bot force-pushed the branch from 9c185d9 to feac567 on January 13, 2026 21:46
@renovate renovate Bot changed the title to "Update dependency vllm to v0.12.0 [SECURITY]" on Jan 13, 2026
@renovate renovate Bot force-pushed the branch from feac567 to ef30e5a on January 28, 2026 17:57
@renovate renovate Bot changed the title to "Update dependency vllm to v0.14.1 [SECURITY]" on Jan 28, 2026
@renovate renovate Bot changed the title to "Update dependency vllm to v0.14.1 [SECURITY] - autoclosed" on Mar 27, 2026
@renovate renovate Bot closed this on Mar 27, 2026
@renovate renovate Bot deleted the renovate/pypi-vllm-vulnerability branch on March 27, 2026 01:34
@renovate renovate Bot changed the title back to "Update dependency vllm to v0.14.1 [SECURITY]" on Mar 30, 2026
@renovate renovate Bot reopened this on Mar 30, 2026
@renovate renovate Bot force-pushed the branch 3 times, most recently from 05ff46c to dc004ca on April 3, 2026 16:54
@renovate renovate Bot changed the title to "Update dependency vllm to v0.19.0 [SECURITY]" on Apr 3, 2026
@renovate renovate Bot force-pushed the branch from dc004ca to c637e8d on May 6, 2026 20:29
@renovate renovate Bot changed the title to "Update dependency vllm to v0.20.0 [SECURITY]" on May 7, 2026