Skip to content

refactor(miles): share router URL acquisition logic#10

Open
TianyeGGBond wants to merge 1 commit into
rlops:zhenyu/m11-mvp-testfrom
TianyeGGBond:zhenyu/dedupe-router-url-selection
Open

refactor(miles): share router URL acquisition logic#10
TianyeGGBond wants to merge 1 commit into
rlops:zhenyu/m11-mvp-testfrom
TianyeGGBond:zhenyu/dedupe-router-url-selection

Conversation

@TianyeGGBond
Copy link
Copy Markdown

@TianyeGGBond TianyeGGBond commented May 30, 2026

Context

MilesRouter._use_url() is the sync selector used by direct tests, while _use_url_async() is the production proxy selector. Both paths had the same two-line acquisition logic:

url = min(candidates, key=lambda u: self.worker_request_counts.get(u, 0))
self.worker_request_counts[url] += 1

That duplication is risky because the tested sync path and production async path can drift. The async path also has an important locking invariant: the in-flight count increment must remain inside _workers_changed's condition lock.

Change

  • Add _pick_least_loaded(candidates) to own the shared "select least-loaded URL + increment in-flight count" operation.
  • Use it from both _use_url() and _use_url_async().
  • Keep _use_url_async() calling the helper inside the existing condition lock, preserving the concurrency invariant.
  • Add a fast router test for _use_url_async() so the production selector path has direct coverage for the same least-load behavior already tested on _use_url().

Risk

Low. This is a mechanical refactor of identical code. It should not change Ray/runtime behavior because:

  • _candidate_set() is unchanged.
  • _use_url() still raises on empty candidates.
  • _use_url_async() still waits for a non-empty candidate set.
  • _use_url_async() still increments the request count while holding _workers_changed's lock.

The main risk would be accidentally moving the async increment outside the lock; this PR does not do that.

Validation

Passed syntax/whitespace checks:

python -m compileall miles/router/router.py tests/fast/router/test_router.py
git diff --check origin/zhenyu/m11-mvp-test..HEAD

Passed a focused local selector check with a temporary Ray import stub:

assert router._use_url() == "http://w2:8000"
assert router.worker_request_counts["http://w2:8000"] == 3
assert asyncio.run(router._use_url_async()) == "http://w2:8000"
assert router.worker_request_counts["http://w2:8000"] == 3

Local pytest limitation:

python -m pytest tests/fast/router/test_router.py::TestLoadBalancing -q

This Windows environment cannot collect the fast router test module without the full project dependencies installed. Collection is blocked by missing packages (ray first, then sglang after a temporary Ray stub). The new test is still included for CI/full dev environments where the fast router suite dependencies are present.

@TianyeGGBond TianyeGGBond force-pushed the zhenyu/dedupe-router-url-selection branch from 1463fef to 1e06196 Compare May 30, 2026 04:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant