- Only return executor IDs that have changed state.
- Propagate the shutdown token to allow graceful shutdown of the pending tasks.
This is more explicit.
…ontainers

Cordoned (SchedulingDisabled) executors had their containers counted as available capacity, causing under-reported deficits. Warm containers also sat stranded with no path to termination or replacement.

- Mark affected pools dirty on cordon so the buffer reconciler re-evaluates them
- Exclude cordoned-executor containers from pool_container_count, count_active_idle_containers, and count_pool_containers
- Add a Phase 0 drain in the buffer reconciler to terminate idle/warm containers on cordoned executors before creating replacements on healthy ones

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
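A minimal sketch of the two core ideas in this commit, assuming a flat list of containers and a set of cordoned executor IDs. The `Container` struct and the `count_active_idle`, `deficit`, and `phase0_drain` functions are hypothetical stand-ins for `count_active_idle_containers` and the reconciler's Phase 0 drain, not the scheduler's real code. Defining the deficit as "desired buffer size minus usable idle containers" is also an assumption made for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical container record; field names are illustrative only.
#[derive(Clone, Debug, PartialEq)]
struct Container {
    id: String,
    executor_id: String,
    pool: String,
    idle: bool, // idle or warm: not currently running any work
}

/// Counts idle containers in `pool` that can actually accept new work,
/// i.e. those NOT placed on a cordoned executor.
fn count_active_idle(containers: &[Container], pool: &str, cordoned: &HashSet<String>) -> usize {
    containers
        .iter()
        .filter(|c| c.pool == pool && c.idle && !cordoned.contains(&c.executor_id))
        .count()
}

/// Assumed deficit definition: desired buffer size minus usable idle capacity,
/// clamped at zero.
fn deficit(desired: usize, containers: &[Container], pool: &str, cordoned: &HashSet<String>) -> usize {
    desired.saturating_sub(count_active_idle(containers, pool, cordoned))
}

/// Hypothetical "Phase 0" drain: select idle/warm containers on cordoned
/// executors for termination before replacements are created elsewhere.
/// Busy containers are left alone so in-flight work can finish.
fn phase0_drain(containers: &[Container], cordoned: &HashSet<String>) -> Vec<String> {
    containers
        .iter()
        .filter(|c| c.idle && cordoned.contains(&c.executor_id))
        .map(|c| c.id.clone())
        .collect()
}
```

With one idle container on a healthy executor and one on a cordoned executor, the sketch reports a deficit of 2 for a desired buffer of 3 instead of 1, and the idle container on the cordoned executor becomes a drain candidate.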
Context
We want to be able to disable scheduling on specific executors. This is useful during rollout operations, where executors may stay alive longer than usual and we don't want the scheduler to place any work on them while they remain alive.
What
Cordon API
Adds a new internal API, `/internal/cordon-executors`, for cordoning executors. The API accepts a list of specific executors to cordon; if the list is empty, it cordons all executors currently present.
When the scheduler receives this request, it propagates the new (cordoned) state to every executor targeted by the request.
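A minimal sketch of the target selection and state change, assuming an in-memory map from executor ID to state. The `ExecutorState` enum and `cordon_executors` function are hypothetical; only the `SchedulingDisabled` state name comes from this PR. In line with the "only return executor IDs that have changed state" commit, the sketch returns just the executors whose state actually changed.

```rust
use std::collections::BTreeMap;

/// Hypothetical executor scheduling state. `SchedulingDisabled` is the
/// cordoned state named in this PR; `Running` is an assumed name.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ExecutorState {
    Running,
    SchedulingDisabled,
}

/// Cordons the requested executors and returns only the IDs whose state
/// actually changed. An empty `requested` slice means "cordon every executor".
fn cordon_executors(
    executors: &mut BTreeMap<String, ExecutorState>,
    requested: &[String],
) -> Vec<String> {
    // Resolve targets: explicit IDs, or all known executors if none were given.
    let targets: Vec<String> = if requested.is_empty() {
        executors.keys().cloned().collect()
    } else {
        requested.to_vec()
    };

    let mut changed = Vec::new();
    for id in targets {
        // Unknown IDs are silently skipped in this sketch (an assumption).
        if let Some(state) = executors.get_mut(&id) {
            // Already-cordoned executors are not reported as changed.
            if *state != ExecutorState::SchedulingDisabled {
                *state = ExecutorState::SchedulingDisabled;
                changed.push(id);
            }
        }
    }
    changed
}
```

Returning only changed IDs keeps repeated cordon calls idempotent: a second request for an already-cordoned executor reports nothing new to wait for.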
The handler then tries to wait until every targeted executor has received the state update, spawning multiple tasks that each wait for the state update on an executor.
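The waiting step could look roughly like the sketch below, which uses plain `std` threads instead of async tasks. Assumptions: each executor exposes a hypothetical channel that fires once it has observed its new state, the wait is bounded by a timeout, and an `AtomicBool` stands in for the shutdown token so pending waiters can exit during a graceful shutdown. `wait_for_acks` is an illustrative name, not the scheduler's API.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Spawns one waiter per executor and returns the IDs that acknowledged the
/// state update before `timeout` elapsed or `shutdown` was set.
/// `acks` pairs each executor ID with a hypothetical acknowledgement channel.
fn wait_for_acks(
    acks: Vec<(String, Receiver<()>)>,
    timeout: Duration,
    shutdown: Arc<AtomicBool>, // stand-in for the propagated shutdown token
) -> Vec<String> {
    let deadline = Instant::now() + timeout;
    let handles: Vec<_> = acks
        .into_iter()
        .map(|(id, rx)| {
            let shutdown = Arc::clone(&shutdown);
            thread::spawn(move || loop {
                // Exit early on shutdown so pending waiters don't block it.
                if shutdown.load(Ordering::SeqCst) {
                    return None;
                }
                let now = Instant::now();
                if now >= deadline {
                    return None;
                }
                // Wait in short slices so a shutdown request is noticed promptly.
                let slice = (deadline - now).min(Duration::from_millis(10));
                match rx.recv_timeout(slice) {
                    Ok(()) => return Some(id),
                    Err(RecvTimeoutError::Timeout) => continue,
                    Err(RecvTimeoutError::Disconnected) => return None,
                }
            })
        })
        .collect();

    // Best-effort: executors that never acknowledged are simply omitted.
    handles
        .into_iter()
        .filter_map(|h| h.join().ok().flatten())
        .collect()
}
```

Treating the wait as best-effort means a slow or unreachable executor delays the response by at most the timeout instead of failing the whole cordon request.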
Deficit API & warm container reallocation for cordoned executors
Before this change, cordoning an executor broke two things:
- Deficit API under-reported: `count_active_idle_containers` (function pools) and `count_pool_containers` (sandbox pools) counted containers on cordoned executors as available capacity. Since those containers can't accept new work, the reported deficit was too low.
- Warm containers were stranded: the `CordonExecutors` handler only set the executor state; it didn't mark affected pools dirty or terminate warm containers. They sat on the cordoned executor consuming resources but could not be used.

Changes:
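- Mark affected pools dirty on cordon so the buffer reconciler re-evaluates them
- Exclude containers on cordoned executors from `pool_container_count`, `count_active_idle_containers`, and `count_pool_containers`
- Add a Phase 0 drain to the buffer reconciler that terminates idle/warm containers on cordoned executors before creating replacements on healthy ones

Testing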
Contribution Checklist
- Run `just fmt` to format the code.