In Kubernetes, assigning different priorities to workloads supports efficient resource management, minimizes service disruption, and improves scaling. It lets critical applications receive the resources they need before less important ones, which helps maintain high availability and SLA compliance. KAI Scheduler schedules workloads according to their assigned priority. When sufficient resources aren't available for a workload, the scheduler can preempt lower-priority workloads to free up resources for higher-priority ones, so mission-critical services come first in resource allocation.
A KAI Scheduler deployment comes with several predefined priority classes:
- `train` (50): can be used for preemptible training workloads
- `build-preemptible` (75): can be used for preemptible build/interactive workloads
- `build` (100): can be used for build/interactive workloads (non-preemptible)
- `inference` (125): can be used for inference workloads (non-preemptible)
Some workloads have a predefined priority. A workload can also be given a custom priority in one of the following ways:
- By setting the `priorityClassName` label on the workload instance with the name of the desired priority class.
- By setting the `priorityClassName` label on the workload's pods with the name of the desired priority class.
- By setting `pod.Spec.PriorityClassName` on the workload's pods with the name of the desired priority class.
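As a rough sketch of the two pod-level options above, the manifests below assign the predefined `train` class first through the `priorityClassName` label and then through the pod spec field. The pod names, image, and command are placeholders, and the `schedulerName` value is an assumption about how the scheduler is registered in your cluster. Any queue assignment your deployment requires is omitted here.

```yaml
# Option 1: priority class given through the priorityClassName label.
apiVersion: v1
kind: Pod
metadata:
  name: train-label-example        # hypothetical name
  labels:
    priorityClassName: train       # predefined class with a value of 50
spec:
  schedulerName: kai-scheduler     # assumption: adjust to your deployment
  containers:
    - name: main
      image: ubuntu                # placeholder image
      command: ["sleep", "infinity"]
---
# Option 2: priority class given through the standard pod spec field.
apiVersion: v1
kind: Pod
metadata:
  name: train-spec-example         # hypothetical name
spec:
  schedulerName: kai-scheduler     # assumption: adjust to your deployment
  priorityClassName: train         # same class, set via pod.Spec.PriorityClassName
  containers:
    - name: main
      image: ubuntu                # placeholder image
      command: ["sleep", "infinity"]
```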
KAI Scheduler supports any PriorityClass deployed in the cluster. Workloads whose PriorityClass has a value of 100 or higher are considered non-preemptible. Non-preemptible workloads can only consume in-quota resources of their scheduling queue and cannot go over quota. Read about scheduling queues for more details.
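Because any PriorityClass in the cluster is supported, a custom class can be defined with the standard Kubernetes `scheduling.k8s.io/v1` API. The following is a minimal sketch. The name `batch-low` and its value are hypothetical, chosen below 100 so that workloads using it would fall on the preemptible side of the threshold described above.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch-low            # hypothetical name, not shipped with KAI Scheduler
value: 25                    # below 100, so workloads using it are preemptible
globalDefault: false         # do not apply this class to pods that name no class
description: "Hypothetical low-priority class for preemptible batch workloads."
```

A workload would then refer to this class by name, in the same way as the predefined classes.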
When no priority class is provided, KAI Scheduler uses a predefined default priority based on the workload type:
- Inference for K8s Deployment or Knative Service
- Build for Kubeflow Notebook
- Train as the general default

If pods from the same workload have different priorities, the workload's priority is derived from any of its pods.
Workload priorities serve three main purposes:
- The scheduler attempts to schedule higher priority workloads first.
- In case of insufficient cluster resources, lower-priority workloads can be evicted to make room for higher-priority workloads.
- Workloads with `build` or `inference` priorities are not preemptible, hence they can only run within queue quota boundaries.
To limit queue resources, use the following command:
kubectl apply -f example/limited-queue.yaml
It will create a test queue that has a limit of 1 GPU.
To submit a pod with train priority (with a value of 50), use the following command:
kubectl apply -f example/train-priority-pod.yaml
Once train-pod is running, submit a pod with build priority (with a value of 100) using the following command:
kubectl apply -f example/build-priority-pod.yaml
Since both pods request 1 GPU, which is the limit of the test queue, only one of the pods will be able to run.
The scheduler will preempt the lower-priority train-pod and schedule the higher-priority build-pod instead.