[Feat][Router]: Add automatic retry with exponential backoff and jitter #939

Open
ikaadil wants to merge 53 commits into vllm-project:main from ikaadil:request-retry

Conversation

@ikaadil
Contributor

@ikaadil ikaadil commented May 1, 2026

Overview

Implements automatic retry for transient HTTP failures with configurable exponential backoff and jitter. This enhances the router's resilience by automatically retrying failed requests instead of immediately returning errors to clients.

Details: https://medium.com/@avnein4988/mitigating-the-thundering-herd-problem-exponential-backoff-with-jitter-b507cdf90d62

Changes

Core Implementation

  • RetryConfig dataclass: Configurable retry parameters with exponential backoff calculation
  • Retry logic: Integrated into request processing flow in route_general_request()
  • Retryable status detection: Function to identify transient failures

CLI Arguments

Added retry configuration options:

  • --retry-max-retries (default: 5)
  • --retry-initial-backoff-ms (default: 50)
  • --retry-max-backoff-ms (default: 30000)
  • --retry-backoff-multiplier (default: 1.5)
  • --retry-jitter-factor (default: 0.2)
  • --disable-retries: Disable all retries
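
A minimal sketch of how the flags above could be declared and checked with argparse. The flag names and defaults come from the list above; the helper names build_retry_parser and validate_retry_args, the int/float types, and the specific validation bounds are assumptions for illustration and may not match the PR's code.

```python
import argparse


def build_retry_parser() -> argparse.ArgumentParser:
    """Hypothetical parser holding only the retry flags listed in this PR."""
    parser = argparse.ArgumentParser(prog="vllm-router")
    parser.add_argument("--retry-max-retries", type=int, default=5)
    parser.add_argument("--retry-initial-backoff-ms", type=int, default=50)
    parser.add_argument("--retry-max-backoff-ms", type=int, default=30000)
    parser.add_argument("--retry-backoff-multiplier", type=float, default=1.5)
    parser.add_argument("--retry-jitter-factor", type=float, default=0.2)
    parser.add_argument("--disable-retries", action="store_true")
    return parser


def validate_retry_args(args: argparse.Namespace) -> None:
    """Assumed sanity checks; the PR's actual validation rules may differ."""
    if args.retry_max_retries < 0:
        raise ValueError("--retry-max-retries must be >= 0")
    if args.retry_initial_backoff_ms < 0:
        raise ValueError("--retry-initial-backoff-ms must be >= 0")
    if args.retry_max_backoff_ms < args.retry_initial_backoff_ms:
        raise ValueError("--retry-max-backoff-ms must be >= the initial backoff")
    if args.retry_backoff_multiplier < 1.0:
        raise ValueError("--retry-backoff-multiplier must be >= 1.0")
    if not 0.0 <= args.retry_jitter_factor <= 1.0:
        raise ValueError("--retry-jitter-factor must be in [0, 1]")
```

Rejecting a multiplier below 1.0 keeps the delay from shrinking between attempts, and bounding the jitter factor at 1.0 keeps the jittered delay non-negative.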

Key Features

Retryable Status Codes

Automatically retries on transient failures:

  • 408 - Request Timeout
  • 429 - Too Many Requests
  • 500 - Internal Server Error
  • 502 - Bad Gateway
  • 503 - Service Unavailable
  • 504 - Gateway Timeout
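
For illustration, the check above can be as small as a set lookup. The function name is_retryable_status appears in this PR's commit notes; the constant name and the signature below are assumptions.

```python
# Status codes listed above as transient failures.
RETRYABLE_STATUS_CODES = frozenset({408, 429, 500, 502, 503, 504})


def is_retryable_status(status_code: int) -> bool:
    """Return True when the status code is in the retryable set."""
    return status_code in RETRYABLE_STATUS_CODES
```

Client errors such as 400 or 404 stay out of the set, since resending the same request would be expected to fail the same way.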

Exponential Backoff with Jitter

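Randomized delays keep clients from retrying in lockstep (the thundering herd problem):
Formula: D = min(initial_backoff_ms × multiplier^attempt, max_backoff_ms)
With jitter: D' = D × (1 + U[-j, +j]), where j is the jitter factor and U[-j, +j] is a uniform random draw
Example with defaults (base delay D before jitter; with j = 0.2 each value varies by up to ±20%):

  • Attempt 0: ~50ms
  • Attempt 1: ~75ms
  • Attempt 2: ~112ms
  • Attempt 3: ~169ms
  • Attempt 4: ~253ms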
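
The formula can be sketched in Python as follows. RetryConfig is the dataclass named in this PR, but the field names (mirroring the CLI flags) and the method names base_delay_ms and delay_ms are assumptions, not the PR's actual API.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetryConfig:
    # Field names are hypothetical; defaults mirror the CLI defaults above.
    max_retries: int = 5
    initial_backoff_ms: float = 50.0
    max_backoff_ms: float = 30000.0
    backoff_multiplier: float = 1.5
    jitter_factor: float = 0.2

    def base_delay_ms(self, attempt: int) -> float:
        """Exponential delay for a zero-based attempt, capped at max_backoff_ms."""
        raw = self.initial_backoff_ms * self.backoff_multiplier ** attempt
        return min(raw, self.max_backoff_ms)

    def delay_ms(self, attempt: int, rng: Optional[random.Random] = None) -> float:
        """Base delay scaled by (1 + u), with u drawn uniformly from [-j, +j]."""
        source = rng if rng is not None else random
        u = source.uniform(-self.jitter_factor, self.jitter_factor)
        return self.base_delay_ms(attempt) * (1.0 + u)


cfg = RetryConfig()
print([cfg.base_delay_ms(a) for a in range(5)])
# → [50.0, 75.0, 112.5, 168.75, 253.125]
```

Passing a seeded random.Random as rng makes the jittered delays reproducible in tests.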

Backward Compatibility

  • Removes max_instance_failover_reroute_attempts in favor of retry_config
  • Falls back to single attempt when set to 0
  • No breaking changes to existing functionality

Usage

vllm-router --port 8000 \
    --service-discovery static \
    --static-backends "http://localhost:9001,http://localhost:9002" \
    --static-models "facebook/opt-125m,facebook/opt-125m" \
    --routing-logic roundrobin \
    --retry-max-retries 5 \
    --retry-initial-backoff-ms 100 \
    --retry-max-backoff-ms 60000 \
    --retry-backoff-multiplier 2.0 \
    --retry-jitter-factor 0.1

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a retry mechanism with exponential backoff and jitter for transient failures, aligning the router's behavior with the sglang model gateway. The changes include a new RetryConfig dataclass, CLI arguments for configuration, updated documentation, and logic in the request service to handle retryable HTTP status codes (408, 429, 500, 502, 503, 504). Feedback identifies a logic error where retries are effectively disabled by default due to the max_attempts calculation, the inclusion of an unused last_response variable, and a concern that blacklisting URLs for transient errors prevents retrying the same backend in single-node environments.
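
To make the single-node concern concrete, here is a rough sketch of a retry loop that prefers backends that have not failed yet but falls back to the full list once every backend has failed. It is not the PR's route_general_request(); the function send_with_retry, its parameters, and the send callback are hypothetical. Because the commit notes describe max_retries as total attempts, the sketch takes max_attempts directly.

```python
import asyncio
import random
from typing import Awaitable, Callable, Sequence, Set

RETRYABLE = frozenset({408, 429, 500, 502, 503, 504})


async def send_with_retry(
    backends: Sequence[str],
    send: Callable[[str], Awaitable[int]],
    max_attempts: int = 5,
    initial_backoff_ms: float = 50.0,
    max_backoff_ms: float = 30000.0,
    multiplier: float = 1.5,
    jitter: float = 0.2,
) -> int:
    """Return the first non-retryable status, or the last status seen."""
    if not backends:
        raise ValueError("at least one backend is required")
    attempts = max(1, max_attempts)  # 0 still means one attempt
    failed: Set[str] = set()
    status = 503
    for attempt in range(attempts):
        # Prefer backends without a transient failure; when all have failed
        # (for example a single-node deployment), retry the full list instead.
        candidates = [b for b in backends if b not in failed] or list(backends)
        url = candidates[attempt % len(candidates)]
        status = await send(url)
        if status not in RETRYABLE:
            return status
        failed.add(url)
        if attempt + 1 < attempts:
            base = min(initial_backoff_ms * multiplier ** attempt, max_backoff_ms)
            delay_ms = base * (1.0 + random.uniform(-jitter, jitter))
            await asyncio.sleep(delay_ms / 1000.0)
    return status
```

With one backend that keeps returning 503, this loop retries the same URL after each backoff rather than giving up once that URL has been excluded.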

Comment thread src/vllm_router/services/request_service/request.py Outdated
Comment thread src/vllm_router/services/request_service/request.py Outdated
Comment thread src/vllm_router/services/request_service/request.py Outdated
ikaadil and others added 28 commits May 2, 2026 10:07
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Increase timeout values in e2e test workflow

Signed-off-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…te attempts (vllm-project#839)

* Add max instance failover reroute attempts configuration

- Introduced a new command-line argument  to specify the number of reroute attempts for failed requests.
- Updated the routing logic to utilize this new configuration, allowing for better handling of request failures.
- Enhanced the request routing service to incorporate the maximum reroute attempts in its logic.

This change improves the robustness of the routing mechanism by allowing for configurable failover behavior.

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

* Add command-line argument for LMCache health check interval

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

* Refactor routing logic to directly set max instance failover reroute attempts

- Removed the set_max_instance_failover_reroute_attempts method and directly assigned the value to the router's attribute.
- Simplified the request routing logic by consolidating endpoint filtering and error handling, improving readability and maintainability.

This change enhances the clarity of the routing logic and streamlines the handling of reroute attempts for failed requests.

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

* Add unit tests for instance failover routing logic

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

* Trigger pipeline

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

* Refactor request routing to improve request tracking

- Moved the tracking of valid incoming requests to a more appropriate location in the routing logic.
- Simplified the retrieval of endpoint information by ensuring it is called only once, enhancing code clarity.

This change improves the maintainability of the request routing service.

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

* Add space

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

* Add the comments

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

* Add empty line

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

* Add comment

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

* Addressed the comment

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

* Fix the log

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

* Add comment

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

* Resolve conflict

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

---------

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Ifta khairul Alam Adil <25082512+ikaadil@users.noreply.github.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…m-project#847)

Signed-off-by: enneitex <etienne.divet@gmail.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…llm-project#760) (vllm-project#844)

Allow the router to be served under a subpath (e.g. /vllm) by passing
root_path through to uvicorn. Also adds Helm chart support via
routerSpec.rootPath.

Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…t to INFO. (vllm-project#846)

* Expose LMCache log level as configurable Helm value and default to INFO.

Signed-off-by: nargit <NargiT@users.noreply.github.com>

* Fix names

Signed-off-by: NargiT <NargiT@users.noreply.github.com>

* fix tests and code

Signed-off-by: NargiT <NargiT@users.noreply.github.com>

* fix test for ray-cluster

Signed-off-by: NargiT <NargiT@users.noreply.github.com>

* fix typo

Signed-off-by: NargiT <NargiT@users.noreply.github.com>

* fix yet another typo

Signed-off-by: NargiT <NargiT@users.noreply.github.com>

* update doc

Signed-off-by: NargiT <NargiT@users.noreply.github.com>

---------

Signed-off-by: nargit <NargiT@users.noreply.github.com>
Signed-off-by: NargiT <NargiT@users.noreply.github.com>
Co-authored-by: nargit <NargiT@users.noreply.github.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…ect#849)

* feat(router): add --log-format json option for structured logging

Add a JsonFormatter that outputs log records as JSON with timestamp,
level, logger, message, filename, and lineno fields. The new
--log-format flag (choices: text, json) controls the output format
for both the router loggers and uvicorn.

Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com>

* test: add tests for JsonFormatter and --log-format parser arg

Add TestJsonFormatter class covering JSON output validation, exception
inclusion/exclusion, format switching via set_log_format, and
init_logger format respect. Add parser tests verifying --log-format
defaults to text and accepts json. Update README logging options
documentation.

Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com>

* refactor: address review feedback for JsonFormatter

Instantiate formatter once outside the loop in set_log_format to avoid
redundant allocations. Add stack_info support and default=str fallback
to JsonFormatter for robustness. Add tests for stack_info inclusion and
non-serializable object handling.

Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com>

* style: fix black formatting in test files

Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com>

---------

Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com>
Signed-off-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
* [Router][Image Edit]: routing multi-part form requests

Signed-off-by: Nuno Ramos <nmiguel123@gmail.com>

* [Router][Refactor]: abstraction for proxying multipart form requests

Signed-off-by: Nuno Ramos <nmiguel123@gmail.com>

---------

Signed-off-by: Nuno Ramos <nmiguel123@gmail.com>
Co-authored-by: Nuno Ramos <nmiguel123@gmail.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
* feat(helm) add pdb and expose various options in the values. Add tests

Signed-off-by: enneitex <etienne.divet@gmail.com>

* feat(helm) update README and json schema with new fields

Signed-off-by: enneitex <etienne.divet@gmail.com>

---------

Signed-off-by: enneitex <etienne.divet@gmail.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…#834)

Includes:
- GPU Bare-Metal node orchestration
- Secure Traefik ingress + TLS Endpoints (cert-manager)
- Prometheus + Grafana monitoring
- Built-in vLLM production stack + Vllm inference dashboards
- Terraform + Helm integration

Signed-off-by: Kosseila (CloudThrill) <klouddude@gmail.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…roject#777)

* [Feat][Router] Add disaggregated prefill orchestrated routing

Implements support for disaggregated prefill as outlined in the 2025 Q1 roadmap.
This enables prefill/decode disaggregation with router-orchestrated KV cache transfer.

Closes vllm-project#26

Signed-off-by: Yahav <yahavb@amazon.com>

* [CI/Build] Lower Python version requirement to 3.10 for Neuron SDK compatibility

Signed-off-by: Yahav <yahavb@amazon.com>

* [Feat][Router] Address PR review feedback for disaggregated prefill orchestrated routing

- Remove dead code (handle_orchestrated_request method in routing_logic.py)
- Fix prefill request to use max_tokens=1 per proposal spec
- Use shared aiohttp client instead of creating new session per request
- Fix streaming to yield chunks immediately (true streaming)
- Remove redundant isinstance check for DisaggregatedPrefillOrchestratedRouter
- Use router's _find_endpoints method to avoid code duplication

Signed-off-by: Yahav <yahavb@amazon.com>

* fix: use kv_transfer_params instead of disagg_prefill_resp

- Add kv_transfer_params to prefill request to enable disaggregated mode
- Extract kv_transfer_params from prefill response and forward to decode
- Set remote_host to prefill endpoint for KV cache retrieval

Signed-off-by: Yahav <yahavb@amazon.com>

* docs: add example for disaggregated_prefill_orchestrated mode

- Add README with usage instructions and configuration notes
- Add sanitized Kubernetes manifests (router, prefill, decode)
- Include example curl command and expected router logs

Signed-off-by: Yahav <yahavb@amazon.com>

* style: fix black formatting

Signed-off-by: Yahav <yahavb@amazon.com>

* style: fix markdownlint errors in README.md

Signed-off-by: Yahav <yahavb@amazon.com>

* style: fix markdownlint errors in proposal doc

Signed-off-by: Yahav <yahavb@amazon.com>

* docs: clean up DisaggregatedPrefillOrchestratedRouter docstring

Signed-off-by: Yahav <yahavb@amazon.com>

* feat: return 503 with distinct codes for prefill/decode unavailability

- PREFILL_SERVICE_UNAVAILABLE: No prefill endpoints discovered
- DECODE_SERVICE_UNAVAILABLE: No decode endpoints discovered

This allows automated tests to distinguish transient startup issues from real bugs.

Signed-off-by: Yahav <yahavb@amazon.com>

* revert: restore requires-python = 3.12

Signed-off-by: Yahav <yahavb@amazon.com>

* fix: replace angle bracket placeholders with uppercase format

Angle brackets like <your-pvc-name> are interpreted as shell redirections
by shellcheck, causing CI failures. Use uppercase format instead:
YOUR-PVC-NAME, YOUR-MODEL-PATH, etc.

Signed-off-by: Yahav <yahavb@amazon.com>

* fix: remove trailing whitespace from YAML files

Signed-off-by: Yahav <yahavb@amazon.com>

---------

Signed-off-by: Yahav <yahavb@amazon.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
* bugfix: deprecate disable log request

Signed-off-by: Rui Zhang <zrfishnoodles@gmail.com>

* Update helm/README.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>

---------

Signed-off-by: Rui Zhang <zrfishnoodles@gmail.com>
Signed-off-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…#875)

* feat(helm): add configurable NodePort to router service

Add optional `routerSpec.nodePort` field that, when set alongside
`routerSpec.serviceType: NodePort`, pins the NodePort to a fixed value
instead of letting Kubernetes assign a random one on every helm upgrade.

Closes vllm-project#763

Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com>

* fix(helm): move nodePort schema to routerSpec and use truthiness check

- Move nodePort JSON schema property from servingEngineSpec to routerSpec
  where it belongs
- Replace hasKey check with truthiness check in service-router.yaml to
  correctly handle nodePort: null

Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com>

* docs(helm): document nodePort field in Router Configuration table

Add routerSpec.nodePort entry to the Helm README to document the
configurable NodePort introduced for the router service.

Signed-off-by: Keyu_Chen <54015474+km5ar@users.noreply.github.com>

---------

Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com>
Signed-off-by: Keyu_Chen <54015474+km5ar@users.noreply.github.com>
Co-authored-by: Keyu_Chen <54015474+km5ar@users.noreply.github.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…oning models emitting reasoning_content instead of content (vllm-project#873)

Signed-off-by: Kosseila (CloudThrill) <klouddude@gmail.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Rui Zhang <zrfishnoodles@gmail.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: enneitex <etienne.divet@gmail.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…ect#880)

* fix: Detect the media_type instead of hardcode to text/event-stream

Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com>

* test: Add test for audio/wav and text/event-stream

Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com>

* fix: Move media-type before header

Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com>

---------

Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…lm-project#889)

Signed-off-by: Nejc Habjan <nejc.habjan@siemens.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Isakgicu and others added 19 commits May 2, 2026 10:07
…#891)

Signed-off-by: Gheorghe Isac <gheorghe_isac@smart-x.net>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…outer CRD (vllm-project#881)

- Extend RoutingLogic enum with prefixaware and kvaware strategies
- Add LmcacheControllerPort field with maximum=65535 validation
- Pass --lmcache-controller-port flag when routingLogic is kvaware and port is set
- Add unit tests for routing logic and deployment update detection
- Update CRD YAML, sample manifest, Helm values and README

Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…-project#894)

Remove unnecessary secrets RBAC from vllmruntime and vllmrouter controllers
(they only use SecretKeyRef in pod specs, kubelet reads secrets).
Add secrets RBAC (get only) to loraadapter controller which actually reads
secrets via r.Get() for HuggingFace tokens and VLLM API keys.
Fixes vllm-project#871

Signed-off-by: EzgiTastan <gezgit.tech@gmail.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…vllm-project#895)

* [CI/Build] Add .dockerignore to exclude test files from Docker builds
Exclude src/tests/, src/examples/, and src/gateway_inference_extension/
from Docker build context to reduce image size.
Fixes vllm-project#615

Signed-off-by: EzgiTastan <gezgit.tech@gmail.com>

* Address review: exclude .dockerignore and Dockerfiles

Signed-off-by: EzgiTastan <gezgit.tech@gmail.com>

---------

Signed-off-by: EzgiTastan <gezgit.tech@gmail.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…vllm-project#898)

* Add generic cache-server resources support for InfiniBand/RDMA

Signed-off-by: happytreees <eevans@vultr.com>

* Update helm/README.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: happytreees <110687499+happytreees@users.noreply.github.com>

---------

Signed-off-by: happytreees <eevans@vultr.com>
Signed-off-by: happytreees <110687499+happytreees@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…ct#906)

Signed-off-by: Max Wittig <max.wittig@siemens.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…llm-project#908)

Signed-off-by: Can Sun <sucan@amazon.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…m-project#886)

* fix(helm) update sharedPvcStorage so it correctly create PVC and/or PV

Signed-off-by: enneitex <etienne.divet@gmail.com>

* fix(helm) various fix to align values, templates and README. update json schema

Signed-off-by: enneitex <etienne.divet@gmail.com>

* fix(helm) use correct type for lmcacheControllerPort

Signed-off-by: enneitex <etienne.divet@gmail.com>

* fix(helm) rayCluster: fix templating and add tests

Signed-off-by: enneitex <etienne.divet@gmail.com>

* fix(helm) deployment-router: simplify templating and add tests

Signed-off-by: enneitex <etienne.divet@gmail.com>

* feat(helm) improve schema, add tests and CI validation

Signed-off-by: enneitex <etienne.divet@gmail.com>

feat(helm) improve testing around rayCluster and engine deployment

Signed-off-by: enneitex <etienne.divet@gmail.com>

ci: retrigger

Signed-off-by: enneitex <etienne.divet@gmail.com>

fix(tutorials) rename ScaledObject file name

Signed-off-by: enneitex <etienne.divet@gmail.com>

fix pre-commit

Signed-off-by: enneitex <etienne.divet@gmail.com>

---------

Signed-off-by: enneitex <etienne.divet@gmail.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…nership conflict (vllm-project#907)

* fix: omit .spec.replicas when KEDA is enabled to prevent field ownership conflict

When KEDA manages a Deployment via ScaledObject, both KEDA and Helm claim
ownership of .spec.replicas, causing server-side apply conflicts on
helm upgrade. This follows the same pattern already used in
deployment-router.yaml (lines 11-13) where replicas is conditionally
omitted when autoscaling is enabled.

Signed-off-by: Luis Rivera-Wong <lriverawong@gmail.com>

* fix: use direct truthiness check for keda config instead of hasKey

Avoids nil pointer evaluation if keda is explicitly set to null in values.
More idiomatic and consistent with template patterns used elsewhere.

Signed-off-by: Luis Rivera-Wong <lriverawong@gmail.com>

---------

Signed-off-by: Luis Rivera-Wong <lriverawong@gmail.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…roject#903)

* init

Signed-off-by: aeon-x <talexcao@gmail.com>

* update proto

Signed-off-by: aeon-x <talexcao@gmail.com>

* fix review comments

Signed-off-by: aeon-x <talexcao@gmail.com>

* fix review comments for autoscaling

Signed-off-by: aeon-x <talexcao@gmail.com>

* fix lint

Signed-off-by: aeon-x <talexcao@gmail.com>

---------

Signed-off-by: aeon-x <talexcao@gmail.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
* feat(helm) add support for per-model tolerations

Signed-off-by: Alexander Sing <AlexanderSing@live.de>

* fix(helm) simplify if/else branching

Signed-off-by: Alexander Sing <AlexanderSing@live.de>

* fix(helm) fix indentations

Signed-off-by: Alexander Sing <AlexanderSing@live.de>

* fix(helm) add readme entry and also apply tolerations to ray-cluster.yaml

Signed-off-by: Alexander Sing <AlexanderSing@live.de>

* fix(helm) fix remaining indentation errors

Signed-off-by: Alexander Sing <AlexanderSing@live.de>

* fix(helm) fix generated schema

Signed-off-by: Alexander Sing <AlexanderSing@live.de>

---------

Signed-off-by: Alexander Sing <AlexanderSing@live.de>
Signed-off-by: Alexander Sing <56878419+AlexanderSing@users.noreply.github.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
* security: pin GitHub Actions to SHAs

Pin workflow actions to immutable commit SHAs and add Dependabot updates for github-actions. Refs vllm-project#904.

Signed-off-by: xiaotian-yu <xiaotian-yu@users.noreply.github.com>

* Specify a day for the weekly schedule of dependabot

Signed-off-by: xiaotian-yu <xiaotian-yu@users.noreply.github.com>

* ci: retrigger workflow

Signed-off-by: xiaotian-yu <xiaotian-yu@users.noreply.github.com>

---------

Signed-off-by: xiaotian-yu <xiaotian-yu@users.noreply.github.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Rui Zhang <zrfishnoodles@gmail.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
* feat: Implement openai provider

Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com>

* test: Update tests

Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com>

* test: Add unit test for external providers

Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com>

* fix: Increase timeout

Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com>

---------

Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: aeon-x <talexcao@gmail.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Yaoming Zhan <yzhan@Mac.attlocal.net>
Co-authored-by: Yaoming Zhan <yzhan@Mac.attlocal.net>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: enneitex <etienne.divet@gmail.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…m-project#916)

Closes vllm-project#915

Signed-off-by: Linus Schlumberger <linus.schlumberger@siemens.com>
Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Add configurable retry mechanism for transient failures:
- RetryConfig dataclass with exponential backoff calculation
- CLI arguments: --retry-max-retries, --retry-initial-backoff-ms,
  --retry-max-backoff-ms, --retry-backoff-multiplier,
  --retry-jitter-factor, --disable-retries
- Retryable status codes: 408, 429, 500, 502, 503, 504
- Remove max_instance_failover_reroute_attempts in favor of retry_config

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
ikaadil and others added 4 commits May 2, 2026 10:12
Signed-off-by: Ifta khairul Alam Adil <25082512+ikaadil@users.noreply.github.com>
- Fix Helm chart version regression (0.1.10 -> 0.1.19)
- Remove dead max_instance_failover_reroute_attempts parameter
- Clarify max_retries semantics in docs (total attempts, not retries)
- Add input validation for retry CLI arguments
- Fix thread safety in RoutingInterface singleton init
- Add super().__init__() to DisaggregatedPrefillOrchestratedRouter
- Add comment explaining HTTPException retry behavior
- Rename test to test_non_retryable_http_exception_not_retried
- Add test for retryable HTTPException (503)

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
… 0.1.10.

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Move media_type extraction after headers are received to properly
capture the Content-Type from backend responses. This fixes the
audio content type forwarding test.

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
ikaadil added 2 commits May 2, 2026 15:53
- Add tests for is_retryable_status function

- Fix retry logic to exclude backends with retryable errors

- Update validation message for retry_max_retries

- Fix type annotation in routing_logic.py

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
@ikaadil
Contributor Author

ikaadil commented May 4, 2026

@ruizhang0101 could you please review the MR? Thanks!

@ruizhang0101
Collaborator

@aeon-x Could you take a look at this?
