Conversation
📝 WalkthroughWalkthroughAdds configurable evacuation shuffling to the Nova external scheduler API: a new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
internal/scheduling/nova/external_scheduler_api_test.go (1)
587-621: Consider adding verification that shuffling actually occurs.The current test assertions verify that:
- Length is preserved
- Original slice is not mutated
- Tail hosts remain unchanged
- All hosts are present
However, none of these assertions would fail if no shuffling occurred at all (e.g., if
GetIntent()returns an error). Consider adding a statistical check that runs the shuffle multiple times and verifies that the first k hosts can appear in different orders:♻️ Example assertion to verify shuffling occurs
// For evacuation cases with k > 1, verify shuffling can reorder hosts // Run multiple iterations and check that we see different orderings if tt.request == evacuateRequest && len(tt.hosts) > 1 && tt.k > 1 { seen := make(map[string]bool) for i := 0; i < 100; i++ { r := shuffleTopHostsForEvacuation(tt.request, tt.hosts, tt.k) key := strings.Join(r[:min(tt.k, len(r))], ",") seen[key] = true } if len(seen) == 1 { t.Errorf("expected shuffling to produce different orderings, but got only one") } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/nova/external_scheduler_api_test.go` around lines 587 - 621, The test currently checks invariants but doesn't assert that shuffleTopHostsForEvacuation actually reorders hosts; add a probabilistic check that for evacuation cases (when request == evacuateRequest) and k>1 you call shuffleTopHostsForEvacuation repeatedly (e.g., ~100 iterations), collect the ordering of the first min(k,len(result)) hosts into a set (use strings.Join to build a key) and assert the set size > 1 to fail if shuffling never changes the ordering; locate this check inside the t.Run for each tt and guard it so it only runs for evacuateRequest and tt.k>1 to avoid false positives.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@internal/scheduling/nova/external_scheduler_api_test.go`:
- Around line 587-621: The test currently checks invariants but doesn't assert
that shuffleTopHostsForEvacuation actually reorders hosts; add a probabilistic
check that for evacuation cases (when request == evacuateRequest) and k>1 you
call shuffleTopHostsForEvacuation repeatedly (e.g., ~100 iterations), collect
the ordering of the first min(k,len(result)) hosts into a set (use strings.Join
to build a key) and assert the set size > 1 to fail if shuffling never changes
the ordering; locate this check inside the t.Run for each tt and guard it so it
only runs for evacuateRequest and tt.k>1 to avoid false positives.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 12aa6816-615c-4565-a450-88345e87b91a
📒 Files selected for processing (3)
internal/scheduling/nova/external_scheduler_api.gointernal/scheduling/nova/external_scheduler_api_test.gointernal/scheduling/nova/integration_test.go
| // shuffleTopHostsForEvacuation randomly reorders the first k hosts if the request | ||
| // is an evacuation. This helps distribute evacuated VMs across multiple hosts | ||
| // rather than concentrating them on the single "best" host. | ||
| func shuffleTopHostsForEvacuation(request api.ExternalSchedulerRequest, hosts []string, k int) []string { | ||
| intent, err := request.GetIntent() | ||
| if err != nil || intent != api.EvacuateIntent { | ||
| return hosts | ||
| } | ||
| if k <= 0 { | ||
| k = 3 | ||
| } | ||
| n := min(k, len(hosts)) | ||
| if n <= 1 { | ||
| return hosts | ||
| } | ||
| result := make([]string, len(hosts)) | ||
| copy(result, hosts) | ||
| rand.Shuffle(n, func(i, j int) { | ||
| result[i], result[j] = result[j], result[i] | ||
| }) | ||
| slog.Info("shuffled top hosts for evacuation", "k", n, "hosts", result[:n]) | ||
| return result | ||
| } | ||
|
|
There was a problem hiding this comment.
We could also include a weigher that introduces fuzzyness and is only enabled for some scheduling kinds. What do you think about this approach? In that way, it would be transparent in the pipeline CRD why/why not a reservation was located there, and we don't break other weighers.
There was a problem hiding this comment.
Something like:
# ...
- name: randomize
multiplier: 0.25 # can be used to increase the randomness
params:
- {key: consideredIntents, stringListValue: ["evacuate"]}
description: |
This step adds random weights between [-1.0, 1.0] to the
hosts. Use the multiplier to adjust its strength.There was a problem hiding this comment.
^ didn't check the specific intent value, but the idea is clear
There was a problem hiding this comment.
I was thinking about something like that as well. I see 2 potential issues:
- I want random but not overall host (only reorder the top k hosts)
- I am not sure if this can overwrite weights alla [1,1,1,1,0,0,0,0]
Test Coverage ReportTest Coverage 📊: 68.4% |
There was a problem hiding this comment.
♻️ Duplicate comments (1)
internal/scheduling/nova/external_scheduler_api.go (1)
262-267:⚠️ Potential issue | 🟠 MajorThis can push a lower-ranked evacuation target ahead of the reservation-preferred host.
The pipeline has already produced an ordered result here, but this unconditional post-processing treats the whole top-K window as equally good for every evacuation. With the default
k=3, a failover-reserved host can lose first place to another candidate even when the weigher intentionally ranked it first; the integration helper now pinsEvacuationShuffleK: 1just to keep those expectations stable. If we need fuzziness, it should live in the pipeline (or be limited to truly tied hosts) so we don’t override the weighers’ semantics.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/nova/external_scheduler_api.go` around lines 262 - 267, The current unconditional shuffle in the Evacuate branch can reorder a pipeline-chosen preferred host; change it so evacuation fuzziness does not override the weigher: when intent == api.EvacuateIntent only call shuffleTopHosts(hosts, httpAPI.config.EvacuationShuffleK) if the top-K candidates are effectively tied (e.g. hosts[0].Score == hosts[i].Score for some i in 1..k-1 or other existing weight/tie field on the host candidate), and do not shuffle if the first host is uniquely highest-ranked or is the reservation-preferred host; locate the logic around requestData.GetIntent(), api.EvacuateIntent, shuffleTopHosts, and httpAPI.config.EvacuationShuffleK to implement the tie-detection guard and preserve the original ordering otherwise.
🧹 Nitpick comments (1)
internal/scheduling/nova/integration_test.go (1)
286-289: Make the shuffle config opt-in per integration test.Hardcoding
EvacuationShuffleK: 1here means every integration test bypasses the default shuffle path, so this helper can no longer cover the evacuation behavior end to end. Threading the API config throughNewIntegrationTestServer()would keep the order-sensitive cases deterministic while still allowing one integration case with shuffling enabled.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/nova/integration_test.go` around lines 286 - 289, The helper currently hardcodes EvacuationShuffleK: 1 in the httpAPI instantiation which forces all integration tests to bypass shuffling; change NewIntegrationTestServer() to accept an HTTPAPIConfig (or an optional pointer) and thread that config through to the httpAPI creation instead of using a fixed HTTPAPIConfig literal in integration_test.go, defaulting to EvacuationShuffleK: 1 only when no config is provided; update callers of NewIntegrationTestServer() so tests that need the default deterministic behavior omit the config while the one test that must exercise real shuffle passes a config with EvacuationShuffleK set to the desired shuffle value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@internal/scheduling/nova/external_scheduler_api.go`:
- Around line 262-267: The current unconditional shuffle in the Evacuate branch
can reorder a pipeline-chosen preferred host; change it so evacuation fuzziness
does not override the weigher: when intent == api.EvacuateIntent only call
shuffleTopHosts(hosts, httpAPI.config.EvacuationShuffleK) if the top-K
candidates are effectively tied (e.g. hosts[0].Score == hosts[i].Score for some
i in 1..k-1 or other existing weight/tie field on the host candidate), and do
not shuffle if the first host is uniquely highest-ranked or is the
reservation-preferred host; locate the logic around requestData.GetIntent(),
api.EvacuateIntent, shuffleTopHosts, and httpAPI.config.EvacuationShuffleK to
implement the tie-detection guard and preserve the original ordering otherwise.
---
Nitpick comments:
In `@internal/scheduling/nova/integration_test.go`:
- Around line 286-289: The helper currently hardcodes EvacuationShuffleK: 1 in
the httpAPI instantiation which forces all integration tests to bypass
shuffling; change NewIntegrationTestServer() to accept an HTTPAPIConfig (or an
optional pointer) and thread that config through to the httpAPI creation instead
of using a fixed HTTPAPIConfig literal in integration_test.go, defaulting to
EvacuationShuffleK: 1 only when no config is provided; update callers of
NewIntegrationTestServer() so tests that need the default deterministic behavior
omit the config while the one test that must exercise real shuffle passes a
config with EvacuationShuffleK set to the desired shuffle value.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7a032db1-caff-4489-ba54-560b771f9e67
📒 Files selected for processing (3)
internal/scheduling/nova/external_scheduler_api.gointernal/scheduling/nova/external_scheduler_api_test.gointernal/scheduling/nova/integration_test.go
✅ Files skipped from review due to trivial changes (1)
- internal/scheduling/nova/external_scheduler_api_test.go
No description provided.