Add respectNodePodLimits scheduler flag to enforce per-node pod capacity #4841

Open
dejanzele wants to merge 1 commit into armadaproject:master from dejanzele:respect-node-pod-limits

Conversation


@dejanzele dejanzele commented Apr 16, 2026

Summary

  • Adds scheduling.respectNodePodLimits feature flag (default false) that enables the scheduler to track pods as a resource and reject scheduling to nodes that have exhausted their pod limit (node.Status.Allocatable["pods"])
  • When enabled, the scheduler programmatically registers pods in supportedResourceTypes and indexedResources at startup, and injects pods: 1 into every job's internal resource requirements
  • The executor now always reports non-Armada pod count in NonArmadaAllocatedResources so the scheduler can subtract system/DaemonSet pods from available capacity
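The pods: 1 injection described above can be sketched as follows. This is a minimal illustration under stated assumptions: injectPodResource and its map type are hypothetical names, not the actual jobDb code (the PR's real logic lives in getResourceRequirements).

```go
package main

import "fmt"

// injectPodResource is a hypothetical sketch of the behaviour described
// above: when the respectNodePodLimits flag is on, a job's internal
// resource requirements gain a "pods: 1" entry, so each job consumes
// exactly one pod slot on the node it binds to.
func injectPodResource(requirements map[string]int64, respectNodePodLimits bool) map[string]int64 {
	if !respectNodePodLimits {
		return requirements
	}
	// Mutation is safe only because callers are assumed to pass a fresh
	// copy of the map (as the review notes the real code does).
	if _, ok := requirements["pods"]; !ok {
		requirements["pods"] = 1
	}
	return requirements
}

func main() {
	req := map[string]int64{"cpu": 2000, "memory": 4096}
	req = injectPodResource(req, true)
	fmt.Println(req["pods"]) // prints 1
}
```

With the flag off, the map is returned untouched, which matches the default-false behaviour of the feature flag.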

Fixes #4515

This PR builds on top of #4517; big thanks to @Sovietaced for the initial work.

Operator upgrade notes

  • The executor change is unconditional. After the executor upgrade, NonArmadaAllocatedResources gains a pods key in every report, regardless of whether any scheduler has the flag enabled. Dashboards, metrics, or custom consumers that iterate this map generically (e.g. summing over all keys) will start including pod counts. Audit Prometheus/Grafana panels before rollout.
  • Rollback is clean. Reverting the scheduler flag to false stops the scheduler from tracking pods; reverting the executor binary removes the pods key from its reports. Neither requires data migration.
  • Rolling upgrade order is flexible. Old scheduler + new executor is safe (scheduler's FromNodeProto silently drops unknown resources). New scheduler + old executor is safe (only non-Armada pod accounting is slightly pessimistic until executors are upgraded).

Known limitations

  • pods is not added to dominantResourceFairnessResourcesToConsider. On dense-pod nodes (e.g. GKE's 110-pod limit) a queue running many small pods can monopolize pod slots without a fair-share penalty. Deferred at reviewer request; a follow-up will address it if this becomes a problem in practice.


greptile-apps bot commented Apr 16, 2026

Greptile Summary

This PR introduces a respectNodePodLimits scheduler flag (default false) that enforces per-node pod capacity by registering pods as a tracked resource, injecting pods: 1 into every job's requirements at runtime, and having the executor unconditionally report non-Armada pod counts in NonArmadaAllocatedResources. The implementation is correct end-to-end: ApplyRespectNodePodLimits is called before ResourceListFactory construction in both schedulerapp.go and simulator.go; mutation of the requirements map in getResourceRequirements is safe (it is backed by a fresh copy from K8sResourceListToMap); Clone() propagates the flag; and the eviction/unbind round-trip correctly restores pod slots.

Confidence Score: 5/5

Safe to merge; no blocking issues found across all changed files.

All changes are well-structured and correct: the resource injection is idempotent, the factory/jobDb initialization order is right in every entrypoint, and the test suite covers flag-off, flag-on, resolution normalization, factory-lacks-pods, and the eviction round-trip.

No files require special attention.

Important Files Changed

  • internal/scheduler/configuration/configuration.go — Adds the RespectNodePodLimits flag and an ApplyRespectNodePodLimits helper that idempotently registers pods with resolution 1 in both SupportedResourceTypes and IndexedResources.
  • internal/scheduler/jobdb/jobdb.go — Adds a respectNodePodLimits field with a setter and a correct Clone() copy; getResourceRequirements safely injects pods: 1 into a new map copy from safeGetRequirements.
  • internal/executor/utilisation/cluster_utilisation.go — Injects pods: 1 into each non-Armada pod's resource request so the scheduler can subtract the system pod count from available capacity; the mutation is safe because TotalPodResourceRequest returns a new map.
  • internal/scheduler/schedulerapp.go — Calls ApplyRespectNodePodLimits before NewResourceListFactory and calls SetRespectNodePodLimits on the jobDb after creation; the ordering is correct.
  • internal/scheduler/nodedb/respect_node_pod_limits_test.go — New end-to-end test verifying that the eviction/unbind round-trip restores the pod slot and allows a follow-up job to bind.
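The idempotent registration described for configuration.go can be sketched roughly like this. The type and function names below are illustrative assumptions modeled on the review's description, not the real Armada API:

```go
package main

import "fmt"

// ResourceType is a stand-in for a scheduler resource config entry.
type ResourceType struct {
	Name       string
	Resolution int64
}

// applyRespectNodePodLimits registers "pods" with resolution 1, but only
// if it is not already present, so calling it twice is a no-op — the
// idempotence property the review highlights.
func applyRespectNodePodLimits(supported []ResourceType) []ResourceType {
	for _, rt := range supported {
		if rt.Name == "pods" {
			return supported // already registered; nothing to do
		}
	}
	return append(supported, ResourceType{Name: "pods", Resolution: 1})
}

func main() {
	types := []ResourceType{{Name: "cpu", Resolution: 1}}
	types = applyRespectNodePodLimits(types)
	types = applyRespectNodePodLimits(types) // second call adds no duplicate
	fmt.Println(len(types))                  // prints 2
}
```

The ordering constraint noted for schedulerapp.go follows from this: registration must run before NewResourceListFactory so the factory sees pods as a known resource.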

Sequence Diagram

sequenceDiagram
    participant Exec as Executor
    participant Sched as Scheduler
    participant NodeDB as NodeDb
    participant JobDB as JobDb
    Note over Sched: Startup
    Sched->>Sched: ApplyRespectNodePodLimits adds pods to factory config
    Sched->>JobDB: SetRespectNodePodLimits(true)
    Note over Exec: Per heartbeat
    Exec->>Sched: NodeInfo{TotalResources[pods]=110, NonArmadaAllocated[pods]=5}
    Note over Sched: Node ingestion
    Sched->>NodeDB: AllocatableByPriority[p][pods] = 105
    Note over Sched: Scheduling loop
    Sched->>JobDB: NewJob injects pods:1
    Sched->>NodeDB: BindJobToNode → pods -= 1
    Note over Sched: Eviction
    Sched->>NodeDB: UnbindJobFromNode → pods += 1
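The arithmetic in the diagram above amounts to a simple slot budget. A minimal sketch, with illustrative names (this is not the nodedb implementation):

```go
package main

import "fmt"

// allocatablePods mirrors the node-ingestion step in the diagram:
// available pod slots = node.Status.Allocatable["pods"] minus the
// non-Armada (system/DaemonSet) pods the executor reports.
func allocatablePods(total, nonArmada int64) int64 {
	return total - nonArmada
}

func main() {
	slots := allocatablePods(110, 5) // e.g. GKE's 110-pod limit, 5 system pods
	fmt.Println(slots)               // prints 105

	slots-- // BindJobToNode: the job consumes one slot
	slots++ // UnbindJobFromNode: eviction restores the slot
	fmt.Println(slots) // prints 105 again: the round-trip is lossless
}
```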

Reviews (7). Last reviewed commit: "Add respectNodePodLimits scheduler flag ..."

Comment thread internal/scheduler/jobdb/jobdb_test.go Outdated
Comment thread internal/executor/utilisation/cluster_utilisation.go Outdated
@dejanzele force-pushed the respect-node-pod-limits branch 7 times, most recently from e95dd38 to 4faa4a9 on April 17, 2026 at 13:52
Signed-off-by: Dejan Zele Pejchev <pejcev.dejan@gmail.com>
@dejanzele force-pushed the respect-node-pod-limits branch from 4faa4a9 to 57a9176 on April 17, 2026 at 13:54
@dejanzele
Member Author

@greptileai



Development

Successfully merging this pull request may close these issues.

Scheduler does not respect node pod limits
