Executor scheduling disabled #2142

Open
calavera wants to merge 7 commits into main from executor_scheduling_disabled
calavera commented Feb 17, 2026

Context

We want to be able to disable scheduling on specific executors. This is useful during rollouts, where some executors may stay alive longer than usual and the scheduler should not place any work on them in the meantime.

What

Cordon API

Add a new internal API for cordoning executors at /internal/cordon-executors.
The API accepts a list of specific executors to cordon, or cordons all current executors if the list is empty.
When the scheduler receives this request, it propagates the new state to every executor in the request.
The request tries to wait until all executors have received the state update by spawning tasks that wait for the update to reach each executor.
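
A minimal sketch of that flow, written in Python with asyncio purely for illustration. The names `Executor`, `ExecutorState`, `cordon_executors`, `acked`, and `ack_timeout` are hypothetical and are not taken from this PR; the real handler and its request/response shapes may differ. It shows the empty-list-means-all rule, returning only the ids whose state changed, and a best-effort wait that also ends when a shutdown signal fires.

```python
import asyncio
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class ExecutorState(Enum):
    SCHEDULABLE = "schedulable"
    SCHEDULING_DISABLED = "scheduling_disabled"


@dataclass
class Executor:
    """Hypothetical in-memory view of an executor as the scheduler sees it."""
    executor_id: str
    state: ExecutorState = ExecutorState.SCHEDULABLE
    # Assumed to be set by the executor once it has received the new state.
    acked: asyncio.Event = field(default_factory=asyncio.Event)


async def cordon_executors(
    executors: Dict[str, Executor],
    requested_ids: List[str],
    shutdown: asyncio.Event,
    ack_timeout: float = 1.0,
) -> List[str]:
    """Cordon the requested executors (all of them if the list is empty).

    Returns only the ids whose state actually changed.
    """
    target_ids = requested_ids or list(executors)
    changed = []
    for executor_id in target_ids:
        executor = executors.get(executor_id)
        if executor is None or executor.state is ExecutorState.SCHEDULING_DISABLED:
            continue  # unknown or already cordoned: nothing to report
        executor.state = ExecutorState.SCHEDULING_DISABLED
        executor.acked.clear()
        changed.append(executor_id)
    if not changed:
        return changed

    # One waiter task per changed executor, plus one that fires on shutdown.
    ack_tasks = [asyncio.ensure_future(executors[i].acked.wait()) for i in changed]
    all_acked = asyncio.gather(*ack_tasks)
    stop = asyncio.ensure_future(shutdown.wait())
    # Best effort: stop waiting on full acknowledgement, shutdown, or timeout.
    await asyncio.wait(
        {all_acked, stop}, timeout=ack_timeout, return_when=asyncio.FIRST_COMPLETED
    )
    all_acked.cancel()  # no-op if every waiter already finished
    stop.cancel()
    return changed
```

Calling `cordon_executors(executors, [], shutdown)` cordons every known executor, and a second identical call returns an empty list because no state changes.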

Deficit API & warm container reallocation for cordoned executors

When an executor was cordoned, two things were broken:

  1. Deficit API under-reported: count_active_idle_containers (function pools) and count_pool_containers (sandbox pools) counted containers on cordoned executors as available capacity. Since those containers can't accept new work, the deficit was too low.

  2. Warm containers were stranded: The CordonExecutors handler only set executor state — it didn't mark affected pools dirty or terminate warm containers. They sat on the cordoned executor consuming resources but couldn't be used.
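
To make the under-reporting concrete, here is a small worked example in Python. It assumes the deficit is the desired idle buffer minus the idle containers counted as available, floored at zero; that formula, and the names `Container`, `count_idle`, and `deficit`, are illustrative assumptions rather than the code in this PR.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Container:
    """Hypothetical container record; field names are illustrative."""
    container_id: str
    pool_id: str
    executor_id: str
    idle: bool


def count_idle(
    containers: Iterable[Container],
    pool_id: str,
    cordoned: Set[str],
    exclude_cordoned: bool,
) -> int:
    """Count idle containers in a pool, optionally skipping cordoned executors."""
    return sum(
        1
        for c in containers
        if c.pool_id == pool_id
        and c.idle
        and not (exclude_cordoned and c.executor_id in cordoned)
    )


def deficit(desired_idle: int, available_idle: int) -> int:
    """Assumed formula: how many more idle containers the pool needs."""
    return max(0, desired_idle - available_idle)


containers = [
    Container("c1", "pool-a", "exec-1", idle=True),
    Container("c2", "pool-a", "exec-1", idle=True),
    Container("c3", "pool-a", "exec-2", idle=True),
]
cordoned = {"exec-1"}

# Counting cordoned containers as capacity hides the shortfall.
old_deficit = deficit(3, count_idle(containers, "pool-a", cordoned, exclude_cordoned=False))
# Excluding them exposes the two containers that can no longer take work.
new_deficit = deficit(3, count_idle(containers, "pool-a", cordoned, exclude_cordoned=True))
```

With a desired buffer of 3, the old count reports a deficit of 0 while the corrected count reports 2.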

Changes:

  • Mark affected pools dirty on cordon so the buffer reconciler re-evaluates them
  • Exclude cordoned-executor containers from pool_container_count, count_active_idle_containers, and count_pool_containers
  • Add Phase 0 drain in buffer reconciler to terminate idle/warm containers on cordoned executors, then let existing phases create replacements on healthy executors
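
The drain-then-replace ordering can be sketched as follows. This is a simplified Python illustration, not the reconciler in this PR: `Container`, `reconcile_pool`, and `target_idle` are hypothetical names, and leaving busy containers on cordoned executors untouched is an assumption based on the description only mentioning idle/warm containers.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Container:
    """Hypothetical container record; busy=False stands for idle/warm."""
    container_id: str
    executor_id: str
    busy: bool


def reconcile_pool(
    containers: List[Container],
    cordoned: Set[str],
    target_idle: int,
) -> Tuple[List[str], int]:
    """Return (container ids to terminate, number of replacements to create)."""
    # Phase 0: drain idle/warm containers that sit on cordoned executors.
    to_terminate = [
        c.container_id
        for c in containers
        if c.executor_id in cordoned and not c.busy
    ]
    # Later phases (simplified): only idle containers on healthy executors
    # count toward the buffer, so the shortfall is created elsewhere.
    healthy_idle = sum(
        1 for c in containers if c.executor_id not in cordoned and not c.busy
    )
    to_create = max(0, target_idle - healthy_idle)
    return to_terminate, to_create


pool = [
    Container("c1", "exec-1", busy=False),  # warm, on a cordoned executor
    Container("c2", "exec-1", busy=True),   # still running work
    Container("c3", "exec-2", busy=False),  # warm, on a healthy executor
]
terminated, replacements = reconcile_pool(pool, cordoned={"exec-1"}, target_idle=2)
```

Here only `c1` is drained, and one replacement is requested so the pool returns to two warm containers on healthy executors.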

Testing

  • Integration tests for the cordon-executors API
  • All 239 existing unit tests pass

Contribution Checklist

  • I ran just fmt to format the code.
  • All PR Checks are passing.

calavera and others added 7 commits February 26, 2026 15:39
- Only return executor ids that have changed state.
- Propagate the shutdown token to allow graceful shutdown of the pending tasks.
…ontainers

Cordoned (SchedulingDisabled) executors had containers counted as available
capacity, causing under-reported deficits. Warm containers also sat stranded
with no path to termination or replacement.

- Mark affected pools dirty on cordon so buffer reconciler re-evaluates them
- Exclude cordoned-executor containers from pool_container_count,
  count_active_idle_containers, and count_pool_containers
- Add Phase 0 drain in buffer reconciler to terminate idle/warm containers
  on cordoned executors before creating replacements on healthy ones

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
calavera force-pushed the executor_scheduling_disabled branch from 76d1a27 to eb22aec on February 26, 2026 23:42