[Router] Add phase 1 router queue groundwork#905
[Router] Add phase 1 router queue groundwork#905ardecode wants to merge 4 commits intovllm-project:mainfrom
Conversation
Signed-off-by: ardecode <desaiarijit@gmail.com>
Signed-off-by: ardecode <desaiarijit@gmail.com>
There was a problem hiding this comment.
Code Review
This pull request introduces router-side request queuing for round-robin routing, including new CLI configuration options, validation logic, and a dual-interval metrics scraping mechanism. Feedback identifies critical thread-safety concerns in both the EngineStatsScraper and RoundRobinRouter singletons. Additionally, there are concerns regarding O(N^2) performance overhead during metrics updates and the potential for premature timeouts when using a single minimum timeout value for both full and admission-only scrapes.
Signed-off-by: ardecode <desaiarijit@gmail.com>
Signed-off-by: ardecode <desaiarijit@gmail.com>
|
@ruizhang0101 , ready for review! |
|
@aeon-x Could you take a look at this? |
| self.last_endpoints_id = endpoints_id | ||
| chosen = self.sorted_endpoints[self.req_id % len(self.sorted_endpoints)] | ||
| self.req_id += 1 | ||
| chosen = self.pick_admissible_endpoint(endpoints, lambda _: True) |
There was a problem hiding this comment.
If chosen is None, chosen.url will crash
| raise ValueError( | ||
| "Router queue is only supported with roundrobin routing in phase 1." | ||
| ) | ||
| if _get_numeric_arg("router_max_queued_requests", 256) <= 0: |
There was a problem hiding this comment.
only check these when enable_router_queue is true
| def pick_admissible_endpoint( | ||
| self, | ||
| endpoints: List[EndpointInfo], | ||
| is_admissible, |
There was a problem hiding this comment.
Please help add comment that is_admissible check is always skipped for now.
And in the future, we will also skip this check router queue is not enabled.
| def pick_admissible_endpoint( | ||
| self, | ||
| endpoints: List[EndpointInfo], | ||
| is_admissible, |
There was a problem hiding this comment.
Missing type hint for is_admissible
|
For getting num_requests_waiting from vllm, have you found any other solutions than scraping the full /metrics every admission interval? I am concerned that this design might not be scalable when we have to scrape many endpoints. |
Refs #855
Summary
This PR adds the Phase 1 groundwork for router-side request queueing.
It does not yet enable router-side request queueing in the request path. Instead, it lands the supporting pieces needed for follow-up PRs:
What is included
What is not included yet
Testing
uv run --group lint pre-commit run --all-filesuv run --group test pytest src/tests/test_parser.py -q