[Feat][Router]: Add automatic retry with exponential backoff and jitter #939
ikaadil wants to merge 53 commits into vllm-project:main
Code Review
This pull request introduces a retry mechanism with exponential backoff and jitter for transient failures, aligning the router's behavior with the sglang model gateway. The changes include a new RetryConfig dataclass, CLI arguments for configuration, updated documentation, and logic in the request service to handle retryable HTTP status codes (408, 429, 500, 502, 503, 504). Feedback identifies a logic error where retries are effectively disabled by default due to the max_attempts calculation, the inclusion of an unused last_response variable, and a concern that blacklisting URLs for transient errors prevents retrying the same backend in single-node environments.
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Increase timeout values in e2e test workflow Signed-off-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…te attempts (vllm-project#839) * Add max instance failover reroute attempts configuration - Introduced a new command-line argument to specify the number of reroute attempts for failed requests. - Updated the routing logic to utilize this new configuration, allowing for better handling of request failures. - Enhanced the request routing service to incorporate the maximum reroute attempts in its logic. This change improves the robustness of the routing mechanism by allowing for configurable failover behavior. Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> * Add command-line argument for LMCache health check interval Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> * Refactor routing logic to directly set max instance failover reroute attempts - Removed the set_max_instance_failover_reroute_attempts method and directly assigned the value to the router's attribute. - Simplified the request routing logic by consolidating endpoint filtering and error handling, improving readability and maintainability. This change enhances the clarity of the routing logic and streamlines the handling of reroute attempts for failed requests. Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> * Add unit tests for instance failover routing logic Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> * Trigger pipeline Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> * Refactor request routing to improve request tracking - Moved the tracking of valid incoming requests to a more appropriate location in the routing logic. - Simplified the retrieval of endpoint information by ensuring it is called only once, enhancing code clarity. This change improves the maintainability of the request routing service. 
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> * Add space Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> * Add the comments Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> * Add empty line Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> * Add comment Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> * Addressed the comment Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> * Fix the log Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> * Add comment Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> * Resolve conflict Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> --------- Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> Signed-off-by: Ifta khairul Alam Adil <25082512+ikaadil@users.noreply.github.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…m-project#847) Signed-off-by: enneitex <etienne.divet@gmail.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…llm-project#760) (vllm-project#844) Allow the router to be served under a subpath (e.g. /vllm) by passing root_path through to uvicorn. Also adds Helm chart support via routerSpec.rootPath. Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…t to INFO. (vllm-project#846) * Expose LMCache log level as configurable Helm value and default to INFO. Signed-off-by: nargit <NargiT@users.noreply.github.com> * Fix names Signed-off-by: NargiT <NargiT@users.noreply.github.com> * fix tests and code Signed-off-by: NargiT <NargiT@users.noreply.github.com> * fix test for ray-cluster Signed-off-by: NargiT <NargiT@users.noreply.github.com> * fix typo Signed-off-by: NargiT <NargiT@users.noreply.github.com> * fix yet another typo Signed-off-by: NargiT <NargiT@users.noreply.github.com> * update doc Signed-off-by: NargiT <NargiT@users.noreply.github.com> --------- Signed-off-by: nargit <NargiT@users.noreply.github.com> Signed-off-by: NargiT <NargiT@users.noreply.github.com> Co-authored-by: nargit <NargiT@users.noreply.github.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…ect#849) * feat(router): add --log-format json option for structured logging Add a JsonFormatter that outputs log records as JSON with timestamp, level, logger, message, filename, and lineno fields. The new --log-format flag (choices: text, json) controls the output format for both the router loggers and uvicorn. Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com> * test: add tests for JsonFormatter and --log-format parser arg Add TestJsonFormatter class covering JSON output validation, exception inclusion/exclusion, format switching via set_log_format, and init_logger format respect. Add parser tests verifying --log-format defaults to text and accepts json. Update README logging options documentation. Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com> * refactor: address review feedback for JsonFormatter Instantiate formatter once outside the loop in set_log_format to avoid redundant allocations. Add stack_info support and default=str fallback to JsonFormatter for robustness. Add tests for stack_info inclusion and non-serializable object handling. Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com> * style: fix black formatting in test files Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com> --------- Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com> Signed-off-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
* [Router][Image Edit]: routing multi-part form requests Signed-off-by: Nuno Ramos <nmiguel123@gmail.com> * [Router][Refactor]: abstraction for proxying multipart form requests Signed-off-by: Nuno Ramos <nmiguel123@gmail.com> --------- Signed-off-by: Nuno Ramos <nmiguel123@gmail.com> Co-authored-by: Nuno Ramos <nmiguel123@gmail.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
* feat(helm) add pdb and expose various options in the values. Add tests Signed-off-by: enneitex <etienne.divet@gmail.com> * feat(helm) update README and json schema with new fields Signed-off-by: enneitex <etienne.divet@gmail.com> --------- Signed-off-by: enneitex <etienne.divet@gmail.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…#834) Includes: - GPU Bare-Metal node orchestration - Secure Traefik ingress + TLS Endpoints (cert-manager) - Prometheus + Grafana monitoring - Built-in vLLM production stack + Vllm inference dashboards - Terraform + Helm integration Signed-off-by: Kosseila (CloudThrill) <klouddude@gmail.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…roject#777) * [Feat][Router] Add disaggregated prefill orchestrated routing Implements support for disaggregated prefill as outlined in the 2025 Q1 roadmap. This enables prefill/decode disaggregation with router-orchestrated KV cache transfer. Closes vllm-project#26 Signed-off-by: Yahav <yahavb@amazon.com> * [CI/Build] Lower Python version requirement to 3.10 for Neuron SDK compatibility Signed-off-by: Yahav <yahavb@amazon.com> * [Feat][Router] Address PR review feedback for disaggregated prefill orchestrated routing - Remove dead code (handle_orchestrated_request method in routing_logic.py) - Fix prefill request to use max_tokens=1 per proposal spec - Use shared aiohttp client instead of creating new session per request - Fix streaming to yield chunks immediately (true streaming) - Remove redundant isinstance check for DisaggregatedPrefillOrchestratedRouter - Use router's _find_endpoints method to avoid code duplication Signed-off-by: Yahav <yahavb@amazon.com> * fix: use kv_transfer_params instead of disagg_prefill_resp - Add kv_transfer_params to prefill request to enable disaggregated mode - Extract kv_transfer_params from prefill response and forward to decode - Set remote_host to prefill endpoint for KV cache retrieval Signed-off-by: Yahav <yahavb@amazon.com> * docs: add example for disaggregated_prefill_orchestrated mode - Add README with usage instructions and configuration notes - Add sanitized Kubernetes manifests (router, prefill, decode) - Include example curl command and expected router logs Signed-off-by: Yahav <yahavb@amazon.com> * style: fix black formatting Signed-off-by: Yahav <yahavb@amazon.com> * style: fix markdownlint errors in README.md Signed-off-by: Yahav <yahavb@amazon.com> * style: fix markdownlint errors in proposal doc Signed-off-by: Yahav <yahavb@amazon.com> * docs: clean up DisaggregatedPrefillOrchestratedRouter docstring Signed-off-by: Yahav <yahavb@amazon.com> * feat: return 503 with distinct codes for prefill/decode unavailability 
- PREFILL_SERVICE_UNAVAILABLE: No prefill endpoints discovered - DECODE_SERVICE_UNAVAILABLE: No decode endpoints discovered This allows automated tests to distinguish transient startup issues from real bugs. Signed-off-by: Yahav <yahavb@amazon.com> * revert: restore requires-python = 3.12 Signed-off-by: Yahav <yahavb@amazon.com> * fix: replace angle bracket placeholders with uppercase format Angle brackets like <your-pvc-name> are interpreted as shell redirections by shellcheck, causing CI failures. Use uppercase format instead: YOUR-PVC-NAME, YOUR-MODEL-PATH, etc. Signed-off-by: Yahav <yahavb@amazon.com> * fix: remove trailing whitespace from YAML files Signed-off-by: Yahav <yahavb@amazon.com> --------- Signed-off-by: Yahav <yahavb@amazon.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
* bugfix: deprecate disable log request Signed-off-by: Rui Zhang <zrfishnoodles@gmail.com> * Update helm/README.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> --------- Signed-off-by: Rui Zhang <zrfishnoodles@gmail.com> Signed-off-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…#875) * feat(helm): add configurable NodePort to router service Add optional `routerSpec.nodePort` field that, when set alongside `routerSpec.serviceType: NodePort`, pins the NodePort to a fixed value instead of letting Kubernetes assign a random one on every helm upgrade. Closes vllm-project#763 Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com> * fix(helm): move nodePort schema to routerSpec and use truthiness check - Move nodePort JSON schema property from servingEngineSpec to routerSpec where it belongs - Replace hasKey check with truthiness check in service-router.yaml to correctly handle nodePort: null Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com> * docs(helm): document nodePort field in Router Configuration table Add routerSpec.nodePort entry to the Helm README to document the configurable NodePort introduced for the router service. Signed-off-by: Keyu_Chen <54015474+km5ar@users.noreply.github.com> --------- Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com> Signed-off-by: Keyu_Chen <54015474+km5ar@users.noreply.github.com> Co-authored-by: Keyu_Chen <54015474+km5ar@users.noreply.github.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…oning models emitting reasoning_content instead of content (vllm-project#873) Signed-off-by: Kosseila (CloudThrill) <klouddude@gmail.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Rui Zhang <zrfishnoodles@gmail.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: enneitex <etienne.divet@gmail.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…ect#880) * fix: Detect the media_type instead of hardcode to text/event-stream Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com> * test: Add test for audio/wav and text/event-stream Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com> * fix: Move media-type before header Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com> --------- Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…lm-project#889) Signed-off-by: Nejc Habjan <nejc.habjan@siemens.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…#891) Signed-off-by: Gheorghe Isac <gheorghe_isac@smart-x.net> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…outer CRD (vllm-project#881) - Extend RoutingLogic enum with prefixaware and kvaware strategies - Add LmcacheControllerPort field with maximum=65535 validation - Pass --lmcache-controller-port flag when routingLogic is kvaware and port is set - Add unit tests for routing logic and deployment update detection - Update CRD YAML, sample manifest, Helm values and README Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…-project#894) Remove unnecessary secrets RBAC from vllmruntime and vllmrouter controllers (they only use SecretKeyRef in pod specs, kubelet reads secrets). Add secrets RBAC (get only) to loraadapter controller which actually reads secrets via r.Get() for HuggingFace tokens and VLLM API keys. Fixes vllm-project#871 Signed-off-by: EzgiTastan <gezgit.tech@gmail.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…vllm-project#895) * [CI/Build] Add .dockerignore to exclude test files from Docker builds Exclude src/tests/, src/examples/, and src/gateway_inference_extension/ from Docker build context to reduce image size. Fixes vllm-project#615 Signed-off-by: EzgiTastan <gezgit.tech@gmail.com> * Address review: exclude .dockerignore and Dockerfiles Signed-off-by: EzgiTastan <gezgit.tech@gmail.com> --------- Signed-off-by: EzgiTastan <gezgit.tech@gmail.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…vllm-project#898) * Add generic cache-server resources support for InfiniBand/RDMA Signed-off-by: happytreees <eevans@vultr.com> * Update helm/README.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: happytreees <110687499+happytreees@users.noreply.github.com> --------- Signed-off-by: happytreees <eevans@vultr.com> Signed-off-by: happytreees <110687499+happytreees@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…ct#906) Signed-off-by: Max Wittig <max.wittig@siemens.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…llm-project#908) Signed-off-by: Can Sun <sucan@amazon.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…m-project#886) * fix(helm) update sharedPvcStorage so it correctly create PVC and/or PV Signed-off-by: enneitex <etienne.divet@gmail.com> * fix(helm) various fix to align values, templates and README. update json schema Signed-off-by: enneitex <etienne.divet@gmail.com> * fix(helm) use correct type for lmcacheControllerPort Signed-off-by: enneitex <etienne.divet@gmail.com> * fix(helm) rayCluster: fix templating and add tests Signed-off-by: enneitex <etienne.divet@gmail.com> * fix(helm) deployment-router: simplify templating and add tests Signed-off-by: enneitex <etienne.divet@gmail.com> * feat(helm) improve schema, add tests and CI validation Signed-off-by: enneitex <etienne.divet@gmail.com> feat(helm) improve testing around rayCluster and engine deployment Signed-off-by: enneitex <etienne.divet@gmail.com> ci: retrigger Signed-off-by: enneitex <etienne.divet@gmail.com> fix(tutorials) rename ScaledObject file name Signed-off-by: enneitex <etienne.divet@gmail.com> fix pre-commit Signed-off-by: enneitex <etienne.divet@gmail.com> --------- Signed-off-by: enneitex <etienne.divet@gmail.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…nership conflict (vllm-project#907) * fix: omit .spec.replicas when KEDA is enabled to prevent field ownership conflict When KEDA manages a Deployment via ScaledObject, both KEDA and Helm claim ownership of .spec.replicas, causing server-side apply conflicts on helm upgrade. This follows the same pattern already used in deployment-router.yaml (lines 11-13) where replicas is conditionally omitted when autoscaling is enabled. Signed-off-by: Luis Rivera-Wong <lriverawong@gmail.com> * fix: use direct truthiness check for keda config instead of hasKey Avoids nil pointer evaluation if keda is explicitly set to null in values. More idiomatic and consistent with template patterns used elsewhere. Signed-off-by: Luis Rivera-Wong <lriverawong@gmail.com> --------- Signed-off-by: Luis Rivera-Wong <lriverawong@gmail.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…roject#903) * init Signed-off-by: aeon-x <talexcao@gmail.com> * update proto Signed-off-by: aeon-x <talexcao@gmail.com> * fix review comments Signed-off-by: aeon-x <talexcao@gmail.com> * fix review comments for autoscaling Signed-off-by: aeon-x <talexcao@gmail.com> * fix lint Signed-off-by: aeon-x <talexcao@gmail.com> --------- Signed-off-by: aeon-x <talexcao@gmail.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
* feat(helm) add support for per-model tolerations Signed-off-by: Alexander Sing <AlexanderSing@live.de> * fix(helm) simplify if/else branching Signed-off-by: Alexander Sing <AlexanderSing@live.de> * fix(helm) fix indentations Signed-off-by: Alexander Sing <AlexanderSing@live.de> * fix(helm) add readme entry and also apply tolerations to ray-cluster.yaml Signed-off-by: Alexander Sing <AlexanderSing@live.de> * fix(helm) fix remaining indentation errors Signed-off-by: Alexander Sing <AlexanderSing@live.de> * fix(helm) fix generated schema Signed-off-by: Alexander Sing <AlexanderSing@live.de> --------- Signed-off-by: Alexander Sing <AlexanderSing@live.de> Signed-off-by: Alexander Sing <56878419+AlexanderSing@users.noreply.github.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
* security: pin GitHub Actions to SHAs Pin workflow actions to immutable commit SHAs and add Dependabot updates for github-actions. Refs vllm-project#904. Signed-off-by: xiaotian-yu <xiaotian-yu@users.noreply.github.com> * Specify a day for the weekly schedule of dependabot Signed-off-by: xiaotian-yu <xiaotian-yu@users.noreply.github.com> * ci: retrigger workflow Signed-off-by: xiaotian-yu <xiaotian-yu@users.noreply.github.com> --------- Signed-off-by: xiaotian-yu <xiaotian-yu@users.noreply.github.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Rui Zhang <zrfishnoodles@gmail.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
* feat: Implement openai provider Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com> * test: Update tests Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com> * test: Add unit test for external providers Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com> * fix: Increase timeout Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com> --------- Signed-off-by: Shern Shiou Tan <shernshiou@gmail.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: aeon-x <talexcao@gmail.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Yaoming Zhan <yzhan@Mac.attlocal.net> Co-authored-by: Yaoming Zhan <yzhan@Mac.attlocal.net> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: enneitex <etienne.divet@gmail.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…m-project#916) Closes vllm-project#915 Signed-off-by: Linus Schlumberger <linus.schlumberger@siemens.com> Co-authored-by: Rui Zhang <51696593+ruizhang0101@users.noreply.github.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Add configurable retry mechanism for transient failures: - RetryConfig dataclass with exponential backoff calculation - CLI arguments: --retry-max-retries, --retry-initial-backoff-ms, --retry-max-backoff-ms, --retry-backoff-multiplier, --retry-jitter-factor, --disable-retries - Retryable status codes: 408, 429, 500, 502, 503, 504 - Remove max_instance_failover_reroute_attempts in favor of retry_config Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Ifta khairul Alam Adil <25082512+ikaadil@users.noreply.github.com>
- Fix Helm chart version regression (0.1.10 -> 0.1.19) - Remove dead max_instance_failover_reroute_attempts parameter - Clarify max_retries semantics in docs (total attempts, not retries) - Add input validation for retry CLI arguments - Fix thread safety in RoutingInterface singleton init - Add super().__init__() to DisaggregatedPrefillOrchestratedRouter - Add comment explaining HTTPException retry behavior - Rename test to test_non_retryable_http_exception_not_retried - Add test for retryable HTTPException (503) Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
… 0.1.10. Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Move media_type extraction after headers are received to properly capture the Content-Type from backend responses. This fixes the audio content type forwarding test. Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
- Add tests for is_retryable_status function - Fix retry logic to exclude backends with retryable errors - Update validation message for retry_max_retries - Fix type annotation in routing_logic.py Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
@ruizhang0101 could you please review the MR? Thanks!
@aeon-x Could you take a look at this?
Overview
Implements automatic retry for transient HTTP failures with configurable exponential backoff and jitter. This enhances the router's resilience by automatically retrying failed requests instead of immediately returning errors to clients.
Details: https://medium.com/@avnein4988/mitigating-the-thundering-herd-problem-exponential-backoff-with-jitter-b507cdf90d62
Changes
Core Implementation
- route_general_request()
CLI Arguments
Added retry configuration options:
- --retry-max-retries (default: 5)
- --retry-initial-backoff-ms (default: 50)
- --retry-max-backoff-ms (default: 30000)
- --retry-backoff-multiplier (default: 1.5)
- --retry-jitter-factor (default: 0.2)
- --disable-retries: Disable all retries
Key Features
Retryable Status Codes
Automatically retries on transient failures:
- 408 - Request Timeout
- 429 - Too Many Requests
- 500 - Internal Server Error
- 502 - Bad Gateway
- 503 - Service Unavailable
- 504 - Gateway Timeout
Exponential Backoff with Jitter
Prevents thundering herd through randomized delays:
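Formula:
delay = min(initial_backoff_ms × (multiplier ^ attempt), max_backoff_ms)
With jitter:
D' = D × (1 + U[-j, +j])
where D is the capped delay from the formula above, j is the jitter factor, and U[-j, +j] is a value drawn uniformly at random from that interval.
Example with defaults: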
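The delays for the default settings can be reproduced with a short sketch of the formula above. This is an illustration only, not the PR's code: the class below is a hypothetical stand-in for RetryConfig, its field and method names are assumptions, and it assumes that attempt is zero-indexed (the first retry waits initial_backoff_ms).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BackoffSketch:
    """Hypothetical stand-in for RetryConfig; names and indexing are assumed."""

    initial_backoff_ms: float = 50
    max_backoff_ms: float = 30000
    backoff_multiplier: float = 1.5
    jitter_factor: float = 0.2

    def base_delay_ms(self, attempt: int) -> float:
        # delay = min(initial_backoff_ms * multiplier ^ attempt, max_backoff_ms)
        grown = self.initial_backoff_ms * self.backoff_multiplier ** attempt
        return min(grown, self.max_backoff_ms)

    def jittered_delay_ms(self, attempt: int, rng=random) -> float:
        # D' = D * (1 + U[-j, +j])
        base = self.base_delay_ms(attempt)
        return base * (1 + rng.uniform(-self.jitter_factor, self.jitter_factor))


if __name__ == "__main__":
    cfg = BackoffSketch()
    for attempt in range(4):
        base = cfg.base_delay_ms(attempt)
        low = base * (1 - cfg.jitter_factor)
        high = base * (1 + cfg.jitter_factor)
        print(f"attempt {attempt}: base {base:.2f} ms, jitter range {low:.2f}-{high:.2f} ms")
```

Under these assumptions the base delay grows by a factor of 1.5 per attempt until it reaches the 30000 ms cap, and jitter spreads each delay by up to 20% in either direction so that clients that failed together do not retry together.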
Backward Compatibility
- max_instance_failover_reroute_attempts behavior
Usage
vllm-router --port 8000 \
  --service-discovery static \
  --static-backends "http://localhost:9001,http://localhost:9002" \
  --static-models "facebook/opt-125m,facebook/opt-125m" \
  --routing-logic roundrobin \
  --retry-max-retries 5 \
  --retry-initial-backoff-ms 100 \
  --retry-max-backoff-ms 60000 \
  --retry-backoff-multiplier 2.0 \
  --retry-jitter-factor 0.1
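To make the effect of these flags concrete, here is a rough sketch of a retry loop that uses the values from the command above. It is not the router's implementation: send_with_retry and its parameters are hypothetical, the body of is_retryable_status is assumed (only the function name and the status code list come from this PR), and the sketch treats the retry limit as a total number of attempts. It also leaves out backend selection and exclusion.

```python
import asyncio
import random

# Status codes listed as retryable in this PR.
RETRYABLE_STATUS_CODES = frozenset({408, 429, 500, 502, 503, 504})


def is_retryable_status(status: int) -> bool:
    """Assumed behavior: True when the status is a transient failure."""
    return status in RETRYABLE_STATUS_CODES


async def send_with_retry(
    send,                       # hypothetical: async callable returning an HTTP status
    max_attempts: int = 5,      # --retry-max-retries, read as total attempts
    initial_backoff_ms: float = 100,
    max_backoff_ms: float = 60000,
    multiplier: float = 2.0,
    jitter: float = 0.1,
    sleep=asyncio.sleep,        # injectable so tests need not wait
) -> int:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = 0
    for attempt in range(max_attempts):
        status = await send()
        if not is_retryable_status(status):
            return status
        if attempt + 1 < max_attempts:
            # Capped exponential delay, then symmetric jitter.
            delay_ms = min(initial_backoff_ms * multiplier ** attempt, max_backoff_ms)
            delay_ms *= 1 + random.uniform(-jitter, jitter)
            await sleep(delay_ms / 1000)
    # All attempts failed with retryable statuses; return the last one.
    return status
```

With these settings, a request that first sees 503 and then 429 would wait roughly 100 ms and then roughly 200 ms (each within 10%) before its third attempt.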