KAI Scheduler is built around key concepts that work together to make scheduling decisions. This document explains these concepts for developers working with the scheduler.
The scheduler runs in periodic cycles, with the interval configurable via schedulePeriod. Each cycle takes a snapshot of the cluster state and makes scheduling decisions through a series of actions, following this flow:
```mermaid
flowchart LR
    Start([Cycle Start]) --> Cache[Cache Sync]
    Cache --> Snapshot[Take Snapshot]
    Snapshot --> Session[Open Session]
    Session --> Actions[Execute Actions]
    Actions --> Close[Close Session]
    Close --> End([Cycle End])
    style Start fill:#f5f5f5,stroke:#333
    style End fill:#f5f5f5,stroke:#333
    style Snapshot fill:#d4f1f9,stroke:#333
    style Session fill:#d5f5e3,stroke:#333
    style Actions fill:#fcf3cf,stroke:#333
```
- Cache Sync: Ensure all Kubernetes resource informers are up to date
- Snapshot: Capture a point-in-time view of the cluster state
- Session: Create the scheduling context from the snapshot data
- Actions: Execute scheduling actions in sequence (Allocate → Consolidate → Reclaim → Preempt → StaleGangEviction)
  - Each action processes jobs individually, creating a statement per job and then committing or discarding it
- Session Close: Clean up and prepare for the next cycle
The Cache serves as the scheduler's authoritative source of cluster state, built from Kubernetes API informers. Its responsibilities are:
- Data Collection: Aggregate information from multiple API resources
- State Maintenance: Track resource changes over time
- Snapshot Generation: Create consistent point-in-time views
- Change Propagation: Apply committed scheduling decisions back to the cluster
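
A minimal Go sketch of these responsibilities, assuming heavily simplified Pod and Node types that track milli-CPU only. OnPodEvent and OnNodeEvent stand in for informer event handlers, FreeMilliCPU shows aggregation across resource types, and the publish hook used by ApplyBind is a hypothetical placeholder for writing a decision back to the cluster (for example by creating a BindRequest). None of these names come from the KAI codebase.

```go
package main

import (
	"fmt"
	"sync"
)

// Pod and Node are hypothetical, heavily simplified Kubernetes objects.
type Pod struct {
	Name     string
	NodeName string // empty while the pod is pending
	MilliCPU int64
}

type Node struct {
	Name          string
	AllocMilliCPU int64
}

// Cache aggregates objects delivered by simulated informer event handlers.
type Cache struct {
	mu    sync.Mutex
	pods  map[string]Pod
	nodes map[string]Node
	// publish is a hypothetical hook for writing a decision to the cluster.
	publish func(pod, node string) error
}

func NewCache(publish func(pod, node string) error) *Cache {
	return &Cache{pods: map[string]Pod{}, nodes: map[string]Node{}, publish: publish}
}

// OnPodEvent and OnNodeEvent mimic informer add/update handlers (state maintenance).
func (c *Cache) OnPodEvent(p Pod) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.pods[p.Name] = p
}

func (c *Cache) OnNodeEvent(n Node) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.nodes[n.Name] = n
}

// FreeMilliCPU combines node and pod data into a derived per-node value.
func (c *Cache) FreeMilliCPU(node string) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	free := c.nodes[node].AllocMilliCPU
	for _, p := range c.pods {
		if p.NodeName == node {
			free -= p.MilliCPU
		}
	}
	return free
}

// ApplyBind publishes a committed decision, then records it locally so the
// next snapshot reflects it even before the informer reports the change.
func (c *Cache) ApplyBind(pod, node string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	p, ok := c.pods[pod]
	if !ok {
		return fmt.Errorf("unknown pod %q", pod)
	}
	if err := c.publish(pod, node); err != nil {
		return err // not published, so the cached state is left unchanged
	}
	p.NodeName = node
	c.pods[pod] = p
	return nil
}

func main() {
	var published []string
	c := NewCache(func(pod, node string) error {
		published = append(published, pod+"->"+node)
		return nil
	})
	c.OnNodeEvent(Node{Name: "node-a", AllocMilliCPU: 8000})
	c.OnPodEvent(Pod{Name: "train-0", MilliCPU: 3000})
	if err := c.ApplyBind("train-0", "node-a"); err != nil {
		panic(err)
	}
	fmt.Println(c.FreeMilliCPU("node-a"), published) // 5000 [train-0->node-a]
}
```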
A Snapshot captures the cluster state at the start of each scheduling cycle, including all the resources and state information needed for scheduling decisions: pods, nodes, queues, pod groups, bind requests, and other relevant Kubernetes objects.
For detailed information about snapshots and the snapshot plugin, see Snapshot Plugin.
Scheduling from a snapshot provides:
- Consistency: All scheduling decisions in a cycle are based on the same cluster state
- Performance: Avoids repeated API calls during scheduling
- Debugging: Provides a reproducible state for analysis
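
The consistency guarantee depends on the snapshot not sharing mutable data with the live cache. The Go sketch below assumes a hypothetical NodeInfo type whose Pods map would be shared by a shallow copy, so TakeSnapshot clones each node; NodeInfo, Snapshot, and TakeSnapshot are illustrative names only.

```go
package main

import "fmt"

// NodeInfo is a hypothetical node record; its Pods map makes a shallow copy unsafe.
type NodeInfo struct {
	Name string
	Pods map[string]int64 // pod name -> requested milli-CPU
}

// Clone returns a deep copy, so the snapshot does not share the Pods map with the cache.
func (n *NodeInfo) Clone() *NodeInfo {
	pods := make(map[string]int64, len(n.Pods))
	for k, v := range n.Pods {
		pods[k] = v
	}
	return &NodeInfo{Name: n.Name, Pods: pods}
}

// Snapshot is a hypothetical frozen view used for a whole scheduling cycle.
type Snapshot struct {
	Nodes map[string]*NodeInfo
}

// TakeSnapshot deep-copies every node from the live cache state.
func TakeSnapshot(live map[string]*NodeInfo) Snapshot {
	s := Snapshot{Nodes: make(map[string]*NodeInfo, len(live))}
	for name, n := range live {
		s.Nodes[name] = n.Clone()
	}
	return s
}

func main() {
	live := map[string]*NodeInfo{
		"node-a": {Name: "node-a", Pods: map[string]int64{"web-0": 500}},
	}
	snap := TakeSnapshot(live)

	// An informer event arrives mid-cycle and mutates the live state.
	live["node-a"].Pods["web-1"] = 500

	fmt.Println(len(snap.Nodes["node-a"].Pods), len(live["node-a"].Pods)) // 1 2
}
```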
PodGroups define gang scheduling requirements for workloads, specifying how multiple pods should be scheduled together.
PodGroups are automatically created by the pod-grouper component based on workload types and can specify minimum member requirements, queue assignments, and priority classes.
For detailed information about PodGroup creation and gang scheduling, see Pod Grouper.
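
To illustrate what a minimum member requirement means for placement, here is a minimal Go sketch of an all-or-nothing check. PodGroup, tryGang, and the first-fit placement over milli-CPU are assumptions for illustration; the real scheduler considers many more resources and constraints. This sketch also keeps any pods beyond MinMember that happen to fit.

```go
package main

import "fmt"

// PodGroup is a hypothetical, simplified gang definition.
type PodGroup struct {
	Name              string
	MinMember         int    // pods that must be placed together
	Queue             string // queue the gang is charged to
	PriorityClassName string
}

// tryGang places pods (milli-CPU requests) first-fit on nodes in the given
// order. It commits the placement to free only if at least MinMember pods fit;
// otherwise free is left untouched.
func tryGang(pg PodGroup, pods []int64, free map[string]int64, order []string) (map[int]string, bool) {
	trial := make(map[string]int64, len(free)) // work on a copy: all-or-nothing
	for n, v := range free {
		trial[n] = v
	}
	placed := map[int]string{} // pod index -> node
	for i, req := range pods {
		for _, n := range order {
			if trial[n] >= req {
				trial[n] -= req
				placed[i] = n
				break
			}
		}
	}
	if len(placed) < pg.MinMember {
		return nil, false
	}
	for n, v := range trial {
		free[n] = v
	}
	return placed, true
}

func main() {
	free := map[string]int64{"node-a": 4000, "node-b": 4000}
	order := []string{"node-a", "node-b"}
	pods := []int64{3000, 3000, 3000}

	pg := PodGroup{Name: "train", MinMember: 3, Queue: "team-a"}
	_, ok := tryGang(pg, pods, free, order)
	fmt.Println(ok, free["node-a"], free["node-b"]) // false 4000 4000

	pg.MinMember = 2
	placed, ok := tryGang(pg, pods, free, order)
	fmt.Println(ok, len(placed), free["node-a"], free["node-b"]) // true 2 1000 1000
}
```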
The scheduler implements a hierarchical queue system for resource management and fair sharing. Queues represent logical resource containers with quotas, priorities, and limits.
For detailed information, see Scheduling Queues and Fairness.
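
A minimal Go sketch of how a hierarchical queue might gate an allocation, assuming each queue has a deserved Quota and a hard Limit (negative meaning unlimited) on a single resource. The Queue type and the fits and allocate functions are hypothetical. In this sketch a request must respect the limit of every ancestor, while exceeding a queue's own quota is reported rather than rejected.

```go
package main

import "fmt"

// Queue is a hypothetical node in a queue hierarchy, tracking milli-CPU only.
type Queue struct {
	Name      string
	Parent    *Queue
	Quota     int64 // deserved share
	Limit     int64 // hard cap; a negative value means unlimited
	Allocated int64
}

// fits reports whether req can be added to q without exceeding the limit of
// q or of any ancestor, and whether the result stays within q's own quota.
func fits(q *Queue, req int64) (ok bool, withinQuota bool) {
	for cur := q; cur != nil; cur = cur.Parent {
		if cur.Limit >= 0 && cur.Allocated+req > cur.Limit {
			return false, false
		}
	}
	return true, q.Allocated+req <= q.Quota
}

// allocate charges req to q and to every ancestor.
func allocate(q *Queue, req int64) {
	for cur := q; cur != nil; cur = cur.Parent {
		cur.Allocated += req
	}
}

func main() {
	dept := &Queue{Name: "research", Quota: 8000, Limit: 10000}
	teamA := &Queue{Name: "team-a", Parent: dept, Quota: 4000, Limit: -1}
	teamB := &Queue{Name: "team-b", Parent: dept, Quota: 4000, Limit: -1}

	allocate(teamA, 6000)          // team-a is now 2000 over its quota
	fmt.Println(fits(teamB, 3000)) // true true
	fmt.Println(fits(teamB, 5000)) // false false: the parent limit would be exceeded
}
```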
A Session represents the scheduling context for a single cycle. It holds the snapshot data and the registered plugin callbacks, and provides the framework for scheduling operations. Its responsibilities are:
- State Management: Maintains a consistent view of the cluster during the cycle
- Plugin Coordination: Provides extension points for plugin callbacks
- Statement Factory: Creates Statement objects for actions to use
- Resource Accounting: Tracks resource allocations and usage
For detailed information about session implementation, lifecycle, and plugin integration, see Plugin Framework.
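
The Go sketch below shows three of these responsibilities in miniature: the session copies capacity from the snapshot (state management), exposes a hypothetical AddPredicateFn extension point (plugin coordination), and updates its own capacity on Allocate (resource accounting). Session, PredicateFn, OpenSession, and FeasibleNodes are assumed names, not KAI's API.

```go
package main

import (
	"fmt"
	"sort"
)

// PredicateFn is a hypothetical extension-point signature: may task run on node?
type PredicateFn func(task string, node string) bool

// Session is a hypothetical per-cycle context: snapshot-derived capacity,
// plugin callbacks, and running resource accounting.
type Session struct {
	idle       map[string]int64 // node -> free milli-CPU, copied from the snapshot
	predicates []PredicateFn
}

// OpenSession copies the snapshot's capacity so accounting never mutates it.
func OpenSession(snapshotIdle map[string]int64) *Session {
	idle := make(map[string]int64, len(snapshotIdle))
	for n, v := range snapshotIdle {
		idle[n] = v
	}
	return &Session{idle: idle}
}

// AddPredicateFn is the extension point a plugin would call.
func (s *Session) AddPredicateFn(fn PredicateFn) { s.predicates = append(s.predicates, fn) }

// FeasibleNodes returns, sorted by name, nodes with room for req that pass every predicate.
func (s *Session) FeasibleNodes(task string, req int64) []string {
	var out []string
	for n, free := range s.idle {
		if free < req {
			continue
		}
		ok := true
		for _, p := range s.predicates {
			if !p(task, n) {
				ok = false
				break
			}
		}
		if ok {
			out = append(out, n)
		}
	}
	sort.Strings(out) // map iteration order is random; sort for determinism
	return out
}

// Allocate records a decision in the session's accounting only.
func (s *Session) Allocate(node string, req int64) { s.idle[node] -= req }

func main() {
	ssn := OpenSession(map[string]int64{"gpu-1": 8000, "gpu-2": 8000, "cpu-1": 16000})
	ssn.AddPredicateFn(func(task, node string) bool { return node != "cpu-1" }) // e.g. a GPU-only rule
	fmt.Println(ssn.FeasibleNodes("train-0", 6000)) // [gpu-1 gpu-2]
	ssn.Allocate("gpu-1", 6000)
	fmt.Println(ssn.FeasibleNodes("train-1", 6000)) // [gpu-2]
}
```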
Actions are discrete scheduling operations executed in sequence during each cycle. Each action operates on the session's snapshot data and uses statements to ensure atomicity.
For detailed information about action types, execution order, and implementation details, see Action Framework.
The scheduler uses a plugin-based architecture that allows its functionality to be extended through extension points. Plugins register callbacks during the session lifecycle to influence scheduling behavior.
For detailed information about plugin development, extension points, and examples, see Plugin Framework.
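
A minimal Go sketch of this registration pattern, assuming a hypothetical Plugin interface with OnSessionOpen and OnSessionClose hooks and a single node-scoring extension point. binPackPlugin, AddNodeOrderFn, and the scoring rule are invented for illustration; the session sums the scores from every registered callback and picks the highest.

```go
package main

import "fmt"

// NodeOrderFn is a hypothetical scoring callback: a higher score is better.
type NodeOrderFn func(task, node string) float64

// Session is a hypothetical, minimal session exposing one extension point.
type Session struct {
	nodes      []string
	nodeOrders []NodeOrderFn
}

func (s *Session) AddNodeOrderFn(fn NodeOrderFn) { s.nodeOrders = append(s.nodeOrders, fn) }

// BestNode sums all registered scores per node and returns the top node;
// ties keep the earlier node.
func (s *Session) BestNode(task string) string {
	best, bestScore := "", 0.0
	for i, n := range s.nodes {
		score := 0.0
		for _, fn := range s.nodeOrders {
			score += fn(task, n)
		}
		if i == 0 || score > bestScore {
			best, bestScore = n, score
		}
	}
	return best
}

// Plugin is a hypothetical plugin interface with session lifecycle hooks.
type Plugin interface {
	Name() string
	OnSessionOpen(ssn *Session)
	OnSessionClose(ssn *Session)
}

// binPackPlugin prefers nodes with less free capacity (hypothetical scoring).
type binPackPlugin struct{ free map[string]float64 }

func (p binPackPlugin) Name() string { return "binpack" }
func (p binPackPlugin) OnSessionOpen(ssn *Session) {
	ssn.AddNodeOrderFn(func(task, node string) float64 { return -p.free[node] })
}
func (p binPackPlugin) OnSessionClose(ssn *Session) {} // nothing to clean up here

// openSession lets each plugin register its callbacks.
func openSession(nodes []string, plugins []Plugin) *Session {
	ssn := &Session{nodes: nodes}
	for _, p := range plugins {
		p.OnSessionOpen(ssn)
	}
	return ssn
}

func main() {
	plugins := []Plugin{binPackPlugin{free: map[string]float64{"node-a": 6, "node-b": 2}}}
	ssn := openSession([]string{"node-a", "node-b"}, plugins)
	fmt.Println(ssn.BestNode("task-0")) // node-b
	for _, p := range plugins {
		p.OnSessionClose(ssn)
	}
}
```

Keeping scoring behind callbacks means the session never needs to know which plugins are enabled; adding a policy is a matter of registering another function.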
Statements provide a transaction-like mechanism for scheduling operations: changes are grouped and either committed or rolled back as a unit. Actions use statements to keep their scheduling decisions atomic. Statements also simulate scheduling scenarios in memory, so potential changes can be evaluated before they are committed.
For detailed statement operations and usage patterns, see Action Framework - Statements.
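
One way to get this behavior is an undo log. The Go sketch below assumes that design: Allocate and Evict apply changes to an in-memory State immediately, so later steps in the same job see them, and record each operation so Discard can reverse them newest first while Commit hands them off for publishing. State, Statement, and op are hypothetical names, and Evict takes the task's request as a parameter only to keep the sketch short.

```go
package main

import "fmt"

// op is one recorded change plus the data needed to undo it.
type op struct {
	kind string // "allocate" or "evict"
	task string
	node string
	req  int64 // milli-CPU
}

// State is a hypothetical in-memory cluster view.
type State struct {
	Idle      map[string]int64  // node -> free milli-CPU
	Placement map[string]string // task -> node
}

// Statement applies changes to State immediately, so later steps see them,
// and logs each one so the group can be committed or undone as a unit.
type Statement struct {
	st  *State
	ops []op
}

func NewStatement(st *State) *Statement { return &Statement{st: st} }

func (s *Statement) Allocate(task, node string, req int64) error {
	if s.st.Idle[node] < req {
		return fmt.Errorf("task %s does not fit on %s", task, node)
	}
	s.st.Idle[node] -= req
	s.st.Placement[task] = node
	s.ops = append(s.ops, op{"allocate", task, node, req})
	return nil
}

// Evict frees req on the task's node.
func (s *Statement) Evict(task string, req int64) error {
	node, ok := s.st.Placement[task]
	if !ok {
		return fmt.Errorf("task %s is not placed", task)
	}
	s.st.Idle[node] += req
	delete(s.st.Placement, task)
	s.ops = append(s.ops, op{"evict", task, node, req})
	return nil
}

// Discard reverses every logged operation, newest first.
func (s *Statement) Discard() {
	for i := len(s.ops) - 1; i >= 0; i-- {
		o := s.ops[i]
		switch o.kind {
		case "allocate":
			s.st.Idle[o.node] += o.req
			delete(s.st.Placement, o.task)
		case "evict":
			s.st.Idle[o.node] -= o.req
			s.st.Placement[o.task] = o.node
		}
	}
	s.ops = nil
}

// Commit keeps the in-memory changes and returns the log for publishing.
func (s *Statement) Commit() []op {
	done := s.ops
	s.ops = nil
	return done
}

func must(err error) {
	if err != nil {
		panic(err)
	}
}

func main() {
	st := &State{Idle: map[string]int64{"node-a": 4000}, Placement: map[string]string{"old-0": "node-a"}}

	// Job 1: the second task does not fit, so the whole statement is discarded.
	stmt := NewStatement(st)
	must(stmt.Allocate("new-0", "node-a", 3000))
	if err := stmt.Allocate("new-1", "node-a", 3000); err != nil {
		stmt.Discard()
	}
	fmt.Println(st.Idle["node-a"], len(st.Placement)) // 4000 1

	// Job 2: evicting old-0 (2000 milli-CPU) frees room, so the statement commits.
	stmt = NewStatement(st)
	must(stmt.Evict("old-0", 2000))
	must(stmt.Allocate("big-0", "node-a", 6000))
	fmt.Println(len(stmt.Commit()), st.Idle["node-a"]) // 2 0
}
```

The two jobs in main mirror the per-job pattern described for actions: each job gets its own statement, which is committed only if the whole job works out.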
Scenarios represent hypothetical scheduling states used to evaluate potential decisions before committing them. They enable "what-if" modeling and validation of scheduling operations.
For detailed scenario implementation and validation mechanisms, see Action Framework - Scenarios.
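
A minimal Go sketch of a what-if evaluation, under the assumption that a scenario is a set of victims plus a pending job: scenarioFits simulates the evictions on a copy of free capacity and then tries to place the job, leaving its inputs untouched. Victim, scenarioFits, and shortestVictimPrefix are hypothetical. The search tries ever-longer prefixes of a preference-ordered candidate list, so it finds the shortest working prefix, not necessarily the globally smallest victim set.

```go
package main

import "fmt"

// Victim is a hypothetical running task that could be evicted.
type Victim struct {
	Name string
	Node string
	CPU  int64
}

// scenarioFits reports whether every pending task fits after evicting victims.
// It works on a copy of idle capacity, so evaluating it has no side effects.
func scenarioFits(idle map[string]int64, victims []Victim, pending []int64, nodes []string) bool {
	sim := make(map[string]int64, len(idle))
	for n, v := range idle {
		sim[n] = v
	}
	for _, v := range victims {
		sim[v.Node] += v.CPU
	}
	for _, req := range pending {
		placed := false
		for _, n := range nodes {
			if sim[n] >= req {
				sim[n] -= req
				placed = true
				break
			}
		}
		if !placed {
			return false
		}
	}
	return true
}

// shortestVictimPrefix grows the victim list one candidate at a time and
// returns the first scenario that fits.
func shortestVictimPrefix(idle map[string]int64, candidates []Victim, pending []int64, nodes []string) ([]Victim, bool) {
	for k := 0; k <= len(candidates); k++ {
		if scenarioFits(idle, candidates[:k], pending, nodes) {
			return candidates[:k], true
		}
	}
	return nil, false
}

func main() {
	idle := map[string]int64{"node-a": 1000, "node-b": 0}
	nodes := []string{"node-a", "node-b"}
	candidates := []Victim{ // preference order, e.g. lowest priority first
		{Name: "batch-7", Node: "node-a", CPU: 2000},
		{Name: "batch-3", Node: "node-b", CPU: 2000},
	}
	victims, ok := shortestVictimPrefix(idle, candidates, []int64{3000}, nodes)
	fmt.Println(ok, len(victims), victims[0].Name, idle["node-a"]) // true 1 batch-7 1000
}
```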
BindRequests facilitate communication between the scheduler and binder components. When the scheduler decides where a pod should run, it creates a BindRequest containing the pod, selected node, and resource allocation details.
The binder processes BindRequests asynchronously, handling the actual pod binding and any required resource setup, such as volume binding or dynamic resource allocation.
For detailed information about the binding process and BindRequest lifecycle, see Binder.
- Action Framework - Detailed action implementation
- Plugin Framework - Plugin development guide
- Binder - Pod binding process
- Pod Grouper - Gang scheduling implementation
- Snapshot Plugin - Snapshot capture and analysis tools
- Scheduling Queues - Queue configuration and management
- Fairness - Resource fairness and distribution algorithms