Skip to content

[Router] Add phase 1 router queue groundwork#905

Open
ardecode wants to merge 4 commits intovllm-project:mainfrom
ardecode:router-side-queue-phase1-infra
Open

[Router] Add phase 1 router queue groundwork#905
ardecode wants to merge 4 commits intovllm-project:mainfrom
ardecode:router-side-queue-phase1-infra

Conversation

@ardecode
Copy link
Copy Markdown
Contributor

@ardecode ardecode commented Apr 7, 2026

Refs #855

Summary

This PR adds the Phase 1 groundwork for router-side request queueing.

It does not yet enable router-side request queueing in the request path. Instead, it lands the supporting pieces needed for follow-up PRs:

  • router queue CLI/config validation
  • round-robin admissible endpoint selection helper
  • fast admission-oriented engine stats refresh support
  • app initialization plumbing for the admission scrape path

What is included

  • add router queue config flags and validation
  • add round-robin helper for selecting the next admissible endpoint
  • extend engine stats scraping with an admission-oriented fast refresh interval
  • add parser tests for the new validation

What is not included yet

  • no admission controller
  • no request queueing in route_general_request
  • no lease lifecycle or first-token release
  • no router queue metrics

Testing

  • uv run --group lint pre-commit run --all-files
  • uv run --group test pytest src/tests/test_parser.py -q

ardecode added 2 commits April 6, 2026 23:00
Signed-off-by: ardecode <desaiarijit@gmail.com>
Signed-off-by: ardecode <desaiarijit@gmail.com>
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces router-side request queuing for round-robin routing, including new CLI configuration options, validation logic, and a dual-interval metrics scraping mechanism. Feedback identifies critical thread-safety concerns in both the EngineStatsScraper and RoundRobinRouter singletons. Additionally, there are concerns regarding O(N^2) performance overhead during metrics updates and the potential for premature timeouts when using a single minimum timeout value for both full and admission-only scrapes.

Comment thread src/vllm_router/stats/engine_stats.py Outdated
Comment thread src/vllm_router/stats/engine_stats.py Outdated
Comment thread src/vllm_router/routers/routing_logic.py Outdated
ardecode added 2 commits April 6, 2026 23:19
Signed-off-by: ardecode <desaiarijit@gmail.com>
Signed-off-by: ardecode <desaiarijit@gmail.com>
@ardecode
Copy link
Copy Markdown
Contributor Author

@ruizhang0101 , ready for review!

@ruizhang0101
Copy link
Copy Markdown
Collaborator

@aeon-x Could you take a look at this?

self.last_endpoints_id = endpoints_id
chosen = self.sorted_endpoints[self.req_id % len(self.sorted_endpoints)]
self.req_id += 1
chosen = self.pick_admissible_endpoint(endpoints, lambda _: True)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If chosen is None, chosen.url will crash

raise ValueError(
"Router queue is only supported with roundrobin routing in phase 1."
)
if _get_numeric_arg("router_max_queued_requests", 256) <= 0:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only check these when enable_router_queue is true

def pick_admissible_endpoint(
self,
endpoints: List[EndpointInfo],
is_admissible,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please help add comment that is_admissible check is always skipped for now.

And in the future, we will also skip this check router queue is not enabled.

def pick_admissible_endpoint(
self,
endpoints: List[EndpointInfo],
is_admissible,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing type hint for is_admissible

@aeon-x
Copy link
Copy Markdown
Collaborator

aeon-x commented Apr 21, 2026

For getting num_requests_waiting from vllm, have you found any other solutions than scraping the full /metrics every admission interval?

I am concerned that this design might not be scalable when we have to scrape many endpoints.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants