[Feat][Router]: Add automatic retry with exponential backoff and jitter by ikaadil · Pull Request #939 · vllm-project/production-stack

ikaadil · 2026-05-01T22:49:01Z

Overview

Implements automatic retry for transient HTTP failures with configurable exponential backoff and jitter. This enhances the router's resilience by automatically retrying failed requests instead of immediately returning errors to clients.

Details: https://medium.com/@avnein4988/mitigating-the-thundering-herd-problem-exponential-backoff-with-jitter-b507cdf90d62

Changes

Core Implementation

RetryConfig dataclass: Configurable retry parameters with exponential backoff calculation
Retry logic: Integrated into request processing flow in route_general_request()
Retryable status detection: is_retryable_status() identifies transient failures
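The following is a minimal sketch of how a retry loop like this could wrap the proxy call. It is not the PR's route_general_request() code. It assumes a hypothetical `send` coroutine that forwards the request to one backend URL and returns the HTTP status. The attempt budget counts total attempts, which matches the semantics a later commit settles on. Backends that already returned a retryable status are skipped. When every backend has failed, the loop falls back to the full list, so a single-node deployment can still retry its only backend. The function and parameter names are illustrative.

```python
import asyncio
import random
from typing import Awaitable, Callable, List, Sequence, Set

# Transient HTTP statuses that this PR treats as retryable.
RETRYABLE_STATUSES = frozenset({408, 429, 500, 502, 503, 504})


async def send_with_retries(
    backends: Sequence[str],
    send: Callable[[str], Awaitable[int]],  # hypothetical: forwards to one URL, returns status
    max_attempts: int = 5,                  # total attempts, including the first
    initial_backoff_ms: float = 50.0,
    max_backoff_ms: float = 30_000.0,
    multiplier: float = 1.5,
    jitter_factor: float = 0.2,
) -> int:
    if not backends:
        raise ValueError("no backends available")
    attempts = max(1, max_attempts)  # a budget of 0 still sends the request once
    excluded: Set[str] = set()
    status = 0
    for attempt in range(attempts):
        # Prefer backends that have not failed yet; if all of them have failed
        # (e.g. a single-node deployment), fall back to the full list.
        candidates: List[str] = [b for b in backends if b not in excluded] or list(backends)
        backend = random.choice(candidates)
        status = await send(backend)
        if status not in RETRYABLE_STATUSES:
            return status  # success or a non-retryable error: no retry
        excluded.add(backend)
        if attempt + 1 < attempts:
            # Exponential backoff capped at max_backoff_ms, then +/- jitter.
            base_ms = min(initial_backoff_ms * multiplier ** attempt, max_backoff_ms)
            delay_ms = base_ms * (1.0 + random.uniform(-jitter_factor, jitter_factor))
            await asyncio.sleep(delay_ms / 1000.0)
    return status  # budget exhausted: surface the last transient status
```

A real proxy must also stop retrying once it has started streaming a response body to the client, because bytes already sent cannot be replayed. This sketch only inspects status codes.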

CLI Arguments

Added retry configuration options:

--retry-max-retries (default: 5; total attempts, including the initial request)
--retry-initial-backoff-ms (default: 50)
--retry-max-backoff-ms (default: 30000)
--retry-backoff-multiplier (default: 1.5)
--retry-jitter-factor (default: 0.2)
--disable-retries: Disable all retries
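The flags above could be registered and validated with the standard argparse module, as in the sketch below. The flag names and defaults come from the list. The argument types and the validation rules are assumptions: a later commit says only that input validation was added, not what it checks.

```python
import argparse


def build_retry_parser() -> argparse.ArgumentParser:
    """Register the retry flags listed above (types are assumptions)."""
    parser = argparse.ArgumentParser(prog="vllm-router")
    parser.add_argument("--retry-max-retries", type=int, default=5)
    parser.add_argument("--retry-initial-backoff-ms", type=float, default=50.0)
    parser.add_argument("--retry-max-backoff-ms", type=float, default=30000.0)
    parser.add_argument("--retry-backoff-multiplier", type=float, default=1.5)
    parser.add_argument("--retry-jitter-factor", type=float, default=0.2)
    parser.add_argument("--disable-retries", action="store_true",
                        help="Disable all retries")
    return parser


def validate_retry_args(args: argparse.Namespace) -> None:
    """Reject inconsistent values; these particular rules are illustrative."""
    if args.retry_max_retries < 0:
        raise ValueError("--retry-max-retries must be >= 0")
    if args.retry_initial_backoff_ms < 0:
        raise ValueError("--retry-initial-backoff-ms must be >= 0")
    if args.retry_max_backoff_ms < args.retry_initial_backoff_ms:
        raise ValueError("--retry-max-backoff-ms must be >= --retry-initial-backoff-ms")
    if args.retry_backoff_multiplier < 1.0:
        raise ValueError("--retry-backoff-multiplier must be >= 1.0")
    if not 0.0 <= args.retry_jitter_factor <= 1.0:
        raise ValueError("--retry-jitter-factor must be in [0, 1]")
```

Capping the jitter factor at 1.0 keeps the jitter multiplier (1 + U[-j, +j]) non-negative, so a jittered delay can never be negative.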

Key Features

Retryable Status Codes

Automatically retries on transient failures:

408 - Request Timeout
429 - Too Many Requests
500 - Internal Server Error
502 - Bad Gateway
503 - Service Unavailable
504 - Gateway Timeout
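Here is a minimal sketch of the status check. The name is_retryable_status appears in a later commit message. The signature and the constant name are assumptions.

```python
# HTTP statuses this PR treats as transient (plain ints for cheap membership tests).
RETRYABLE_STATUS_CODES = frozenset({
    408,  # Request Timeout
    429,  # Too Many Requests
    500,  # Internal Server Error
    502,  # Bad Gateway
    503,  # Service Unavailable
    504,  # Gateway Timeout
})


def is_retryable_status(status_code: int) -> bool:
    """Return True when a response status indicates a transient failure."""
    return status_code in RETRYABLE_STATUS_CODES
```

Other 4xx codes and 501 Not Implemented are left out of the set. Resending an identical request would most likely fail the same way.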

Exponential Backoff with Jitter

Mitigates the thundering-herd problem by randomizing retry delays:
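Formula: delay = min(initial_backoff_ms × multiplier^attempt, max_backoff_ms), where attempt counts from 0
With jitter: jittered_delay = delay × (1 + U[-jitter_factor, +jitter_factor]), where U[a, b] is a uniformly distributed random value in [a, b]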
Example with defaults (nominal delays before jitter; the default jitter factor of 0.2 shifts each delay by up to ±20%):

Attempt 0: ~50ms
Attempt 1: ~75ms
Attempt 2: ~113ms
Attempt 3: ~169ms
Attempt 4: ~253ms
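The formula fits naturally in a small dataclass, sketched below. The class name matches the RetryConfig described above. The fields and method names are assumptions rather than the PR's actual code.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetryConfig:
    """Retry settings; defaults mirror the CLI defaults listed above."""

    max_retries: int = 5
    initial_backoff_ms: float = 50.0
    max_backoff_ms: float = 30_000.0
    backoff_multiplier: float = 1.5
    jitter_factor: float = 0.2

    def base_backoff_ms(self, attempt: int) -> float:
        """Nominal delay: min(initial_backoff_ms * multiplier**attempt, max_backoff_ms)."""
        return min(
            self.initial_backoff_ms * self.backoff_multiplier ** attempt,
            self.max_backoff_ms,
        )

    def backoff_ms(self, attempt: int, rng: Optional[random.Random] = None) -> float:
        """Jittered delay: nominal delay * (1 + U[-jitter_factor, +jitter_factor])."""
        uniform = rng.uniform if rng is not None else random.uniform
        return self.base_backoff_ms(attempt) * (
            1.0 + uniform(-self.jitter_factor, self.jitter_factor)
        )


if __name__ == "__main__":
    cfg = RetryConfig()
    print([cfg.base_backoff_ms(a) for a in range(5)])  # [50.0, 75.0, 112.5, 168.75, 253.125]
```

In this ordering, jitter is applied after the cap. A jittered delay can therefore exceed max_backoff_ms by up to jitter_factor (20% with the defaults). If the cap must be strict, clamp after applying jitter instead. Passing a seeded random.Random makes the delays reproducible in tests.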

Backward Compatibility

Replaces the existing max_instance_failover_reroute_attempts option with the new retry configuration
Falls back to a single attempt when set to 0
No breaking changes to existing functionality

Usage

vllm-router --port 8000 \
    --service-discovery static \
    --static-backends "http://localhost:9001,http://localhost:9002" \
    --static-models "facebook/opt-125m,facebook/opt-125m" \
    --routing-logic roundrobin \
    --retry-max-retries 5 \
    --retry-initial-backoff-ms 100 \
    --retry-max-backoff-ms 60000 \
    --retry-backoff-multiplier 2.0 \
    --retry-jitter-factor 0.1

gemini-code-assist

Code Review

This pull request introduces a retry mechanism with exponential backoff and jitter for transient failures, aligning the router's behavior with the sglang model gateway. The changes include a new RetryConfig dataclass, CLI arguments for configuration, updated documentation, and logic in the request service to handle retryable HTTP status codes (408, 429, 500, 502, 503, 504). Feedback identifies a logic error where retries are effectively disabled by default due to the max_attempts calculation, the inclusion of an unused last_response variable, and a concern that blacklisting URLs for transient errors prevents retrying the same backend in single-node environments.

…te attempts (vllm-project#839) * Add max instance failover reroute attempts configuration - Introduced a new command-line argument to specify the number of reroute attempts for failed requests. - Updated the routing logic to utilize this new configuration, allowing for better handling of request failures. - Enhanced the request routing service to incorporate the maximum reroute attempts in its logic. This change improves the robustness of the routing mechanism by allowing for configurable failover behavior. Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> * Add command-line argument for LMCache health check interval Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> * Refactor routing logic to directly set max instance failover reroute attempts - Removed the set_max_instance_failover_reroute_attempts method and directly assigned the value to the router's attribute. - Simplified the request routing logic by consolidating endpoint filtering and error handling, improving readability and maintainability. This change enhances the clarity of the routing logic and streamlines the handling of reroute attempts for failed requests. Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> * Add unit tests for instance failover routing logic Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> * Trigger pipeline Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> * Refactor request routing to improve request tracking - Moved the tracking of valid incoming requests to a more appropriate location in the routing logic. - Simplified the retrieval of endpoint information by ensuring it is called only once, enhancing code clarity. This change improves the maintainability of the request routing service. 
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> * Add space Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> * Add the comments Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> * Add empty line Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> * Add comment Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> * Addressed the comment Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> * Fix the log Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> * Add comment Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> * Resolve conflict Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> --------- Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> Signed-off-by: Ifta khairul Alam Adil <25082512+ikaadil@users.noreply.github.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

…llm-project#760) (vllm-project#844) Allow the router to be served under a subpath (e.g. /vllm) by passing root_path through to uvicorn. Also adds Helm chart support via routerSpec.rootPath. Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

…t to INFO. (vllm-project#846) * Expose LMCache log level as configurable Helm value and default to INFO. Signed-off-by: nargit <NargiT@users.noreply.github.com> * Fix names Signed-off-by: NargiT <NargiT@users.noreply.github.com> * fix tests and code Signed-off-by: NargiT <NargiT@users.noreply.github.com> * fix test for ray-cluster Signed-off-by: NargiT <NargiT@users.noreply.github.com> * fix typo Signed-off-by: NargiT <NargiT@users.noreply.github.com> * fix yet another typo Signed-off-by: NargiT <NargiT@users.noreply.github.com> * update doc Signed-off-by: NargiT <NargiT@users.noreply.github.com> --------- Signed-off-by: nargit <NargiT@users.noreply.github.com> Signed-off-by: NargiT <NargiT@users.noreply.github.com> Co-authored-by: nargit <NargiT@users.noreply.github.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

…ect#849) * feat(router): add --log-format json option for structured logging Add a JsonFormatter that outputs log records as JSON with timestamp, level, logger, message, filename, and lineno fields. The new --log-format flag (choices: text, json) controls the output format for both the router loggers and uvicorn. Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com> * test: add tests for JsonFormatter and --log-format parser arg Add TestJsonFormatter class covering JSON output validation, exception inclusion/exclusion, format switching via set_log_format, and init_logger format respect. Add parser tests verifying --log-format defaults to text and accepts json. Update README logging options documentation. Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com> * refactor: address review feedback for JsonFormatter Instantiate formatter once outside the loop in set_log_format to avoid redundant allocations. Add stack_info support and default=str fallback to JsonFormatter for robustness. Add tests for stack_info inclusion and non-serializable object handling. Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com> * style: fix black formatting in test files Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com> --------- Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com> Signed-off-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

* [Router][Image Edit]: routing multi-part form requests Signed-off-by: Nuno Ramos <nmiguel123@gmail.com> * [Router][Refactor]: abstraction for proxying multipart form requests Signed-off-by: Nuno Ramos <nmiguel123@gmail.com> --------- Signed-off-by: Nuno Ramos <nmiguel123@gmail.com> Co-authored-by: Nuno Ramos <nmiguel123@gmail.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

* feat(helm) add pdb and expose various options in the values. Add tests Signed-off-by: enneitex <etienne.divet@gmail.com> * feat(helm) update README and json schema with new fields Signed-off-by: enneitex <etienne.divet@gmail.com> --------- Signed-off-by: enneitex <etienne.divet@gmail.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

…#834) Includes: - GPU Bare-Metal node orchestration - Secure Traefik ingress + TLS Endpoints (cert-manager) - Prometheus + Grafana monitoring - Built-in vLLM production stack + Vllm inference dashboards - Terraform + Helm integration Signed-off-by: Kosseila (CloudThrill) <klouddude@gmail.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

…roject#777) * [Feat][Router] Add disaggregated prefill orchestrated routing Implements support for disaggregated prefill as outlined in the 2025 Q1 roadmap. This enables prefill/decode disaggregation with router-orchestrated KV cache transfer. Closes vllm-project#26 Signed-off-by: Yahav <yahavb@amazon.com> * [CI/Build] Lower Python version requirement to 3.10 for Neuron SDK compatibility Signed-off-by: Yahav <yahavb@amazon.com> * [Feat][Router] Address PR review feedback for disaggregated prefill orchestrated routing - Remove dead code (handle_orchestrated_request method in routing_logic.py) - Fix prefill request to use max_tokens=1 per proposal spec - Use shared aiohttp client instead of creating new session per request - Fix streaming to yield chunks immediately (true streaming) - Remove redundant isinstance check for DisaggregatedPrefillOrchestratedRouter - Use router's _find_endpoints method to avoid code duplication Signed-off-by: Yahav <yahavb@amazon.com> * fix: use kv_transfer_params instead of disagg_prefill_resp - Add kv_transfer_params to prefill request to enable disaggregated mode - Extract kv_transfer_params from prefill response and forward to decode - Set remote_host to prefill endpoint for KV cache retrieval Signed-off-by: Yahav <yahavb@amazon.com> * docs: add example for disaggregated_prefill_orchestrated mode - Add README with usage instructions and configuration notes - Add sanitized Kubernetes manifests (router, prefill, decode) - Include example curl command and expected router logs Signed-off-by: Yahav <yahavb@amazon.com> * style: fix black formatting Signed-off-by: Yahav <yahavb@amazon.com> * style: fix markdownlint errors in README.md Signed-off-by: Yahav <yahavb@amazon.com> * style: fix markdownlint errors in proposal doc Signed-off-by: Yahav <yahavb@amazon.com> * docs: clean up DisaggregatedPrefillOrchestratedRouter docstring Signed-off-by: Yahav <yahavb@amazon.com> * feat: return 503 with distinct codes for prefill/decode unavailability 
- PREFILL_SERVICE_UNAVAILABLE: No prefill endpoints discovered - DECODE_SERVICE_UNAVAILABLE: No decode endpoints discovered This allows automated tests to distinguish transient startup issues from real bugs. Signed-off-by: Yahav <yahavb@amazon.com> * revert: restore requires-python = 3.12 Signed-off-by: Yahav <yahavb@amazon.com> * fix: replace angle bracket placeholders with uppercase format Angle brackets like <your-pvc-name> are interpreted as shell redirections by shellcheck, causing CI failures. Use uppercase format instead: YOUR-PVC-NAME, YOUR-MODEL-PATH, etc. Signed-off-by: Yahav <yahavb@amazon.com> * fix: remove trailing whitespace from YAML files Signed-off-by: Yahav <yahavb@amazon.com> --------- Signed-off-by: Yahav <yahavb@amazon.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

* bugfix: deprecate disable log request Signed-off-by: Rui Zhang <zrfishnoodles@gmail.com> * Update helm/README.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> --------- Signed-off-by: Rui Zhang <zrfishnoodles@gmail.com> Signed-off-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

…#875) * feat(helm): add configurable NodePort to router service Add optional `routerSpec.nodePort` field that, when set alongside `routerSpec.serviceType: NodePort`, pins the NodePort to a fixed value instead of letting Kubernetes assign a random one on every helm upgrade. Closes vllm-project#763 Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com> * fix(helm): move nodePort schema to routerSpec and use truthiness check - Move nodePort JSON schema property from servingEngineSpec to routerSpec where it belongs - Replace hasKey check with truthiness check in service-router.yaml to correctly handle nodePort: null Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com> * docs(helm): document nodePort field in Router Configuration table Add routerSpec.nodePort entry to the Helm README to document the configurable NodePort introduced for the router service. Signed-off-by: Keyu_Chen <54015474+km5ar@users.noreply.github.com> --------- Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com> Signed-off-by: Keyu_Chen <54015474+km5ar@users.noreply.github.com> Co-authored-by: Keyu_Chen <54015474+km5ar@users.noreply.github.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

…ect#880) * fix: Detect the media_type instead of hardcode to text/event-stream Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com> * test: Add test for audio/wav and text/event-stream Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com> * fix: Move media-type before header Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com> --------- Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

…outer CRD (vllm-project#881) - Extend RoutingLogic enum with prefixaware and kvaware strategies - Add LmcacheControllerPort field with maximum=65535 validation - Pass --lmcache-controller-port flag when routingLogic is kvaware and port is set - Add unit tests for routing logic and deployment update detection - Update CRD YAML, sample manifest, Helm values and README Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

…-project#894) Remove unnecessary secrets RBAC from vllmruntime and vllmrouter controllers (they only use SecretKeyRef in pod specs, kubelet reads secrets). Add secrets RBAC (get only) to loraadapter controller which actually reads secrets via r.Get() for HuggingFace tokens and VLLM API keys. Fixes vllm-project#871 Signed-off-by: EzgiTastan <gezgit.tech@gmail.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

…vllm-project#895) * [CI/Build] Add .dockerignore to exclude test files from Docker builds Exclude src/tests/, src/examples/, and src/gateway_inference_extension/ from Docker build context to reduce image size. Fixes vllm-project#615 Signed-off-by: EzgiTastan <gezgit.tech@gmail.com> * Address review: exclude .dockerignore and Dockerfiles Signed-off-by: EzgiTastan <gezgit.tech@gmail.com> --------- Signed-off-by: EzgiTastan <gezgit.tech@gmail.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

…vllm-project#898) * Add generic cache-server resources support for InfiniBand/RDMA Signed-off-by: happytreees <eevans@vultr.com> * Update helm/README.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: happytreees <110687499+happytreees@users.noreply.github.com> --------- Signed-off-by: happytreees <eevans@vultr.com> Signed-off-by: happytreees <110687499+happytreees@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

…m-project#886) * fix(helm) update sharedPvcStorage so it correctly create PVC and/or PV Signed-off-by: enneitex <etienne.divet@gmail.com> * fix(helm) various fix to align values, templates and README. update json schema Signed-off-by: enneitex <etienne.divet@gmail.com> * fix(helm) use correct type for lmcacheControllerPort Signed-off-by: enneitex <etienne.divet@gmail.com> * fix(helm) rayCluster: fix templating and add tests Signed-off-by: enneitex <etienne.divet@gmail.com> * fix(helm) deployment-router: simplify templating and add tests Signed-off-by: enneitex <etienne.divet@gmail.com> * feat(helm) improve schema, add tests and CI validation Signed-off-by: enneitex <etienne.divet@gmail.com> feat(helm) improve testing around rayCluster and engine deployment Signed-off-by: enneitex <etienne.divet@gmail.com> ci: retrigger Signed-off-by: enneitex <etienne.divet@gmail.com> fix(tutorials) rename ScaledObject file name Signed-off-by: enneitex <etienne.divet@gmail.com> fix pre-commit Signed-off-by: enneitex <etienne.divet@gmail.com> --------- Signed-off-by: enneitex <etienne.divet@gmail.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

…nership conflict (vllm-project#907) * fix: omit .spec.replicas when KEDA is enabled to prevent field ownership conflict When KEDA manages a Deployment via ScaledObject, both KEDA and Helm claim ownership of .spec.replicas, causing server-side apply conflicts on helm upgrade. This follows the same pattern already used in deployment-router.yaml (lines 11-13) where replicas is conditionally omitted when autoscaling is enabled. Signed-off-by: Luis Rivera-Wong <lriverawong@gmail.com> * fix: use direct truthiness check for keda config instead of hasKey Avoids nil pointer evaluation if keda is explicitly set to null in values. More idiomatic and consistent with template patterns used elsewhere. Signed-off-by: Luis Rivera-Wong <lriverawong@gmail.com> --------- Signed-off-by: Luis Rivera-Wong <lriverawong@gmail.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

…roject#903) * init Signed-off-by: aeon-x <talexcao@gmail.com> * update proto Signed-off-by: aeon-x <talexcao@gmail.com> * fix review comments Signed-off-by: aeon-x <talexcao@gmail.com> * fix review comments for autoscaling Signed-off-by: aeon-x <talexcao@gmail.com> * fix lint Signed-off-by: aeon-x <talexcao@gmail.com> --------- Signed-off-by: aeon-x <talexcao@gmail.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

* feat(helm) add support for per-model tolerations Signed-off-by: Alexander Sing <AlexanderSing@live.de> * fix(helm) simplify if/else branching Signed-off-by: Alexander Sing <AlexanderSing@live.de> * fix(helm) fix indentations Signed-off-by: Alexander Sing <AlexanderSing@live.de> * fix(helm) add readme entry and also apply tolerations to ray-cluster.yaml Signed-off-by: Alexander Sing <AlexanderSing@live.de> * fix(helm) fix remaining indentation errors Signed-off-by: Alexander Sing <AlexanderSing@live.de> * fix(helm) fix generated schema Signed-off-by: Alexander Sing <AlexanderSing@live.de> --------- Signed-off-by: Alexander Sing <AlexanderSing@live.de> Signed-off-by: Alexander Sing <56878419+AlexanderSing@users.noreply.github.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

* security: pin GitHub Actions to SHAs Pin workflow actions to immutable commit SHAs and add Dependabot updates for github-actions. Refs vllm-project#904. Signed-off-by: xiaotian-yu <xiaotian-yu@users.noreply.github.com> * Specify a day for the weekly schedule of dependabot Signed-off-by: xiaotian-yu <xiaotian-yu@users.noreply.github.com> * ci: retrigger workflow Signed-off-by: xiaotian-yu <xiaotian-yu@users.noreply.github.com> --------- Signed-off-by: xiaotian-yu <xiaotian-yu@users.noreply.github.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

* feat: Implement openai provider Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com> * test: Update tests Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com> * test: Add unit test for external providers Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com> * fix: Increase timeout Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com> --------- Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Signed-off-by: Yaoming Zhan <yzhan@Mac.attlocal.net> Co-authored-by: Yaoming Zhan <yzhan@Mac.attlocal.net> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

…m-project#916) Closes vllm-project#915 Signed-off-by: Linus Schlumberger <linus.schlumberger@siemens.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Add configurable retry mechanism for transient failures: - RetryConfig dataclass with exponential backoff calculation - CLI arguments: --retry-max-retries, --retry-initial-backoff-ms, --retry-max-backoff-ms, --retry-backoff-multiplier, --retry-jitter-factor, --disable-retries - Retryable status codes: 408, 429, 500, 502, 503, 504 - Remove max_instance_failover_reroute_attempts in favor of retry_config Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

- Fix Helm chart version regression (0.1.10 -> 0.1.19) - Remove dead max_instance_failover_reroute_attempts parameter - Clarify max_retries semantics in docs (total attempts, not retries) - Add input validation for retry CLI arguments - Fix thread safety in RoutingInterface singleton init - Add super().__init__() to DisaggregatedPrefillOrchestratedRouter - Add comment explaining HTTPException retry behavior - Rename test to test_non_retryable_http_exception_not_retried - Add test for retryable HTTPException (503) Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

ikaadil · 2026-05-04T09:43:00Z

@ruizhang0101 could you please review the MR? Thanks!

ruizhang0101 · 2026-05-04T16:37:46Z

@aeon-x Could you take a look at this?

gemini-code-assist Bot reviewed May 1, 2026

Three comment threads on src/vllm_router/services/request_service/request.py (all marked Outdated)

ikaadil force-pushed the request-retry branch from bd6ced6 to 2190348 on May 2, 2026 08:03

ikaadil and others added 28 commits May 2, 2026 10:07

Bump Helm chart version to 0.1.10

1b544ca

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Increase timeout values in e2e test workflow (vllm-project#848)

71c5e17

Increase timeout values in e2e test workflow Signed-off-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

feat(helm) add support for extra manifests and annotation on pvc (vll…

091e76b

…m-project#847) Signed-off-by: enneitex <etienne.divet@gmail.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Update roadmap year from 2025 to 2026 (vllm-project#856)

2996d29

Signed-off-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Bump Helm chart version to 0.1.11

a454441

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Bump Helm chart version to 0.1.12

b61976a

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Bump Helm chart version to 0.1.13

6dddbcb

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Bump Helm chart version to 0.1.14

becb7d1

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Bump Helm chart version to 0.1.15

1ef3919

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Bump Helm chart version to 0.1.16

b353576

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Bump Helm chart version to 0.1.17

fb58295

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Bump Helm chart version to 0.1.18

bc543e9

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Bump Helm chart version to 0.1.19

38ecc59

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

fix(benchmark/multi-round-qa): fix TTFT NoneType crash caused by reas…

ca73ff4

…oning models emitting reasoning_content instead of content (vllm-project#873) Signed-off-by: Kosseila (CloudThrill) <klouddude@gmail.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

fix: fix cache server start command (vllm-project#872)

0f161ea

Signed-off-by: Rui Zhang <zrfishnoodles@gmail.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

feat(helm): refactor monitoring installation (vllm-project#860)

14d09cb

Signed-off-by: enneitex <etienne.divet@gmail.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

fix(service_discovery): correctly return 503 on missing endpoints (vl…

205dae3

…lm-project#889) Signed-off-by: Nejc Habjan <nejc.habjan@siemens.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Isakgicu and others added 19 commits May 2, 2026 10:07

bugfix: omit replicas field when autoscaling is enabled (vllm-project…

37587a7

…#891) Signed-off-by: Gheorghe Isac <gheorghe_isac@smart-x.net> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

feat(vllm-router): make healthcheck interval configurable (vllm-proje…

07ed572

…ct#906) Signed-off-by: Max Wittig <max.wittig@siemens.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

[Router] Add reply and heartbeat port options for KV-aware routing (v…

bc813f3

…llm-project#908) Signed-off-by: Can Sun <sucan@amazon.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

ci: remove local registry and add runner cleanup (vllm-project#922)

dbcdef5

Signed-off-by: Rui Zhang <zrfishnoodles@gmail.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

[Minor Improvements] Vllmruntime Autoscaling in Operator

878840c

Signed-off-by: aeon-x <talexcao@gmail.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

fix(helm) fix default values for cache deployment (vllm-project#917)

cc8d465

Signed-off-by: enneitex <etienne.divet@gmail.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

ikaadil force-pushed the request-retry branch from 2190348 to 5fe54db on May 2, 2026 08:07

ikaadil and others added 4 commits May 2, 2026 10:12

Merge branch 'main' into request-retry

fe5d61a

Signed-off-by: Ifta khairul Alam Adil <25082512+ikaadil@users.noreply.github.com>

Remove AGENTS.md file and downgrade Helm chart version from 0.1.19 to…

946eae0

… 0.1.10. Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

[Bugfix] Fix media_type extraction for streaming responses

1c7980e

Move media_type extraction after headers are received to properly capture the Content-Type from backend responses. This fixes the audio content type forwarding test. Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

ikaadil force-pushed the request-retry branch from a5bac11 to 1c7980e on May 2, 2026 09:54

ikaadil added 2 commits May 2, 2026 15:53

Fix code review issues

997d18a

- Add tests for is_retryable_status function - Fix retry logic to exclude backends with retryable errors - Update validation message for retry_max_retries - Fix type annotation in routing_logic.py Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Trigger pipeline

2f6a063

Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

ikaadil wants to merge 53 commits into vllm-project:main from ikaadil:request-retry

20 participants

Conversation

ikaadil commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Changes

Core Implementation

CLI Arguments

Key Features

Retryable Status Codes

Exponential Backoff with Jitter

Backward Compatibility

Usage

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ikaadil commented May 4, 2026

Uh oh!

ruizhang0101 commented May 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants

ikaadil commented May 1, 2026 •

edited

Loading