The KAI Scheduler provides a snapshot plugin and a snapshot tool for capturing and analyzing the state of the scheduler and cluster resources. This documentation covers both.
The snapshot plugin is a framework plugin that provides an HTTP endpoint to capture the current state of the scheduler and cluster resources.
- Captures scheduler configuration and parameters
- Collects the raw Kubernetes objects that the scheduler uses to perform its actions, including:
  - Pods
  - Nodes
  - Queues
  - PodGroups
  - BindRequests
  - PriorityClasses
  - ConfigMaps
  - PersistentVolumeClaims
  - CSIStorageCapacities
  - StorageClasses
  - CSIDrivers
  - ResourceClaims
  - ResourceSlices
  - DeviceClasses
The plugin registers an HTTP endpoint /get-snapshot that returns a ZIP file containing a JSON snapshot of the cluster state.
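For scripted capture, the same request can be made from Python. The following is a minimal sketch, assuming the endpoint has already been made reachable at localhost:8081 (for example through a port-forward); the function name fetch_snapshot and the default output path are illustrative and not part of KAI Scheduler. It also checks that the response really is a ZIP archive before saving it, which helps catch an error page being written to disk by mistake.

```python
import io
import urllib.request
import zipfile

# Assumption: the scheduler's HTTP port is reachable locally on 8081.
SNAPSHOT_URL = "http://localhost:8081/get-snapshot"


def fetch_snapshot(url: str = SNAPSHOT_URL, dest: str = "snapshot.zip",
                   timeout: float = 30.0) -> list[str]:
    """Download the snapshot archive, verify it is a ZIP, and save it to dest.

    Returns the names of the files found inside the archive.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = response.read()

    # Fail early if the body is not a ZIP archive (for example, an error page).
    if not zipfile.is_zipfile(io.BytesIO(payload)):
        raise ValueError(f"response from {url} is not a ZIP archive")

    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        names = archive.namelist()

    # Only write the file once the payload has been validated.
    with open(dest, "wb") as out:
        out.write(payload)
    return names
```

Calling fetch_snapshot() with the defaults writes the archive to the current directory and returns the list of archive members, which should contain snapshot.json.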
Example for a scheduler deployed in the kai namespace:
kubectl port-forward -n kai deployment/scheduler 8081 &
curl -vv "localhost:8081/get-snapshot" > snapshot.zip
./bin/snapshot-tool-amd64 --filename snapshot.zip --verbosity 8
The snapshot is returned as a ZIP file containing a single JSON file (snapshot.json) with the following structure:
{
  "config": {
    // Scheduler configuration
  },
  "schedulerParams": {
    // Scheduler parameters
  },
  "rawObjects": {
    // Raw Kubernetes objects
  }
}
The snapshot tool is a command-line utility that can load and analyze snapshots captured by the snapshot plugin.
- Loads snapshots from ZIP files
- Recreates the scheduler environment from a snapshot
- Supports running scheduler actions on the snapshot data
- Provides detailed logging of operations
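The first item above, loading a snapshot from a ZIP file, can be illustrated outside the tool as well. The sketch below is not the snapshot tool's implementation; it only assumes the archive layout described in this document (a single snapshot.json with config, schedulerParams, and rawObjects at the top level). The function name load_snapshot is hypothetical, and treating a missing section as an error is a choice made for this example.

```python
import json
import zipfile

# Top-level sections that this document describes for snapshot.json.
EXPECTED_SECTIONS = ("config", "schedulerParams", "rawObjects")


def load_snapshot(path: str, member: str = "snapshot.json") -> dict:
    """Read and parse the JSON snapshot stored inside the ZIP archive at path."""
    with zipfile.ZipFile(path) as archive:
        with archive.open(member) as handle:
            snapshot = json.load(handle)

    # Assumption for this sketch: all three sections must be present.
    missing = [key for key in EXPECTED_SECTIONS if key not in snapshot]
    if missing:
        raise ValueError(f"snapshot is missing sections: {', '.join(missing)}")
    return snapshot
```

A call such as load_snapshot("snapshot.zip") returns a plain dictionary, so the configuration and raw objects can be inspected with ordinary Python tooling before running any scheduler actions.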
snapshot-tool --filename <snapshot-file> [--verbosity <log-level>]
- --filename: Path to the snapshot ZIP file (required)
- --verbosity: Logging verbosity level (default: 4)
# Load and analyze a snapshot
snapshot-tool --filename snapshot.zip
# Load and analyze a snapshot with increased verbosity
snapshot-tool --filename snapshot.zip --verbosity 5
The snapshot plugin (pkg/scheduler/plugins/snapshot/snapshot.go) implements the following key components:
- RawKubernetesObjects: Structure containing all captured Kubernetes objects
- Snapshot: Main structure containing configuration, parameters, and raw objects
- snapshotPlugin: Plugin implementation with HTTP endpoint handler
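To get a feel for what the raw objects section holds, a per-collection count is a useful sanity check on a parsed snapshot. The sketch below assumes that rawObjects maps collection names to lists of objects; the exact JSON field names are not listed in this document, so the keys in the sample ("pods", "nodes", "queues") are purely illustrative, and count_raw_objects is a hypothetical helper.

```python
from collections import Counter


def count_raw_objects(snapshot: dict) -> Counter:
    """Return the number of captured objects per collection in rawObjects.

    Collections that are null or not lists are counted as zero.
    """
    raw = snapshot.get("rawObjects") or {}
    counts = Counter()
    for kind, objects in raw.items():
        counts[kind] = len(objects) if isinstance(objects, list) else 0
    return counts


# Hypothetical snapshot fragment; the key names below are illustrative only.
sample = {
    "config": {},
    "schedulerParams": {},
    "rawObjects": {
        "pods": [{"metadata": {"name": "a"}}, {"metadata": {"name": "b"}}],
        "nodes": [{"metadata": {"name": "node-1"}}],
        "queues": None,
    },
}

# Print one line per collection, sorted by name.
for kind, total in sorted(count_raw_objects(sample).items()):
    print(f"{kind}: {total}")
```

Because the helper iterates over whatever keys are present, it does not need to be updated when the set of captured object kinds changes.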
The snapshot tool (cmd/snapshot-tool/main.go) implements:
- Snapshot loading and parsing
- Fake client creation with snapshot data
- Scheduler cache initialization
- Session management
- Action execution
Keep the following limitations in mind:
- The snapshot tool runs in a simulated environment, not against a live cluster
- Some real-time cluster features may not be available
- Resource constraints may differ from those of the original cluster