Merged
2 changes: 1 addition & 1 deletion cmd/main.go
@@ -340,7 +340,7 @@ func main() {
// Initialize commitments API for LIQUID interface (with Nova client for usage reporting)
commitmentsConfig := conf.GetConfigOrDie[commitments.Config]()
commitmentsAPI := commitments.NewAPIWithConfig(multiclusterClient, commitmentsConfig, novaClient)
-commitmentsAPI.Init(mux, metrics.Registry)
+commitmentsAPI.Init(mux, metrics.Registry, ctrl.Log.WithName("commitments-api"))

deschedulingsController := &nova.DetectorPipelineController{
Monitor: detectorPipelineMonitor,
104 changes: 104 additions & 0 deletions docs/reservations/committed-resource-reservations.md
@@ -0,0 +1,104 @@
# Committed Resource Reservation System

The committed resource reservation system manages capacity commitments, i.e. strict reservation guarantees usable by projects.
When customers pre-commit to resource usage, Cortex reserves capacity on hypervisors to guarantee availability.
The system integrates with Limes (via the LIQUID protocol) to receive commitments, expose usage and capacity data, and provide acceptance/rejection feedback.

## File Structure

```text
internal/scheduling/reservations/commitments/
├── config.go # Configuration (intervals, API flags, secrets)
├── controller.go # Reconciliation of reservations
├── syncer.go # Periodic sync task with Limes, ensures local state matches Limes' commitments
├── reservation_manager.go # Reservation CRUD operations
├── api.go # HTTP API initialization
├── api_change_commitments.go # Handles commitment changes from Limes and updates local reservations accordingly
├── api_report_usage.go # Report VM usage per project, accounting to commitments or PAYG
├── api_report_capacity.go # Report capacity per AZ
├── api_info.go # Readiness endpoint with versioning (of underlying flavor group configuration)
├── capacity.go # Capacity calculation from Hypervisor CRDs
├── usage.go # VM-to-commitment assignment logic
├── flavor_group_eligibility.go # Validates VMs belong to correct flavor groups
└── state.go # Commitment state helper functions
```

## Operations

### Configuration

| Helm Value | Description |
|------------|-------------|
| `committedResourceEnableChangeCommitmentsAPI` | Enable/disable the change-commitments endpoint |
| `committedResourceEnableReportUsageAPI` | Enable/disable the usage reporting endpoint |
| `committedResourceEnableReportCapacityAPI` | Enable/disable the capacity reporting endpoint |
| `committedResourceRequeueIntervalActive` | How often to revalidate active reservations |
| `committedResourceRequeueIntervalRetry` | Retry interval when knowledge is not ready |
| `committedResourceChangeAPIWatchReservationsTimeout` | Timeout waiting for reservations to become ready while processing commitment changes via API |
| `committedResourcePipelineDefault` | Default scheduling pipeline |
| `committedResourceFlavorGroupPipelines` | Map of flavor group to pipeline name |
| `committedResourceSyncInterval` | How often the syncer reconciles Limes commitments to Reservation CRDs |

Each API endpoint can be disabled independently. The periodic sync task can be disabled by removing it (`commitments-sync-task`) from the list of enabled tasks in the `cortex-nova` Helm chart.
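
As a rough illustration of independent enable flags, the sketch below wraps a handler so that it answers HTTP 503 while its flag is off, which is the behavior the `values.yaml` comments in this change describe for disabled endpoints. The `gate` and `statusFor` helpers are hypothetical names introduced here; the actual handlers may check their flags differently.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// gate wraps a handler and answers 503 while the endpoint is disabled.
// Hypothetical helper for illustration; not taken from the Cortex code base.
func gate(enabled bool, next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if !enabled {
			http.Error(w, "endpoint disabled", http.StatusServiceUnavailable)
			return
		}
		next(w, r)
	}
}

// statusFor sends a GET through a gated dummy handler and returns the status code.
func statusFor(enabled bool) int {
	ok := func(w http.ResponseWriter, _ *http.Request) { w.WriteHeader(http.StatusOK) }
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/v1/commitments/report-capacity", nil)
	gate(enabled, ok)(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(true), statusFor(false)) // prints "200 503"
}
```

Gating at the handler level keeps the route registered, so a disabled endpoint returns an explicit 503 rather than a 404.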

### Observability

Alerts and metrics are defined in `helm/bundles/cortex-nova/alerts/nova.alerts.yaml`. Key metric prefixes:
- `cortex_committed_resource_change_api_*` - Change API metrics
- `cortex_committed_resource_usage_api_*` - Usage API metrics
- `cortex_committed_resource_capacity_api_*` - Capacity API metrics

## Architecture Overview

```mermaid
flowchart LR
subgraph State
Res[(Reservation CRDs)]
end

ChangeAPI[Change API]
UsageAPI[Usage API]
Syncer[Syncer Task]
Controller[Controller]
Scheduler[Scheduler API]

ChangeAPI -->|CRUD| Res
Syncer -->|CRUD| Res
UsageAPI -->|read| Res
Res -->|watch| Controller
Controller -->|update spec/status| Res
Controller -->|placement request| Scheduler
```

Reservations are managed through the Change API, Syncer Task, and Controller reconciliation. The Usage API provides read-only access to report usage data back to Limes.

### Change-Commitments API

The change-commitments API receives batched commitment changes from Limes. A request can contain multiple commitment changes across different projects and flavor groups. The semantics are **all-or-nothing**: if any commitment in the batch cannot be fulfilled (e.g., insufficient capacity), the entire request is rejected and rolled back.

Cortex performs CRUD operations on local Reservation CRDs to match the new desired state:
- Creates new reservations for increased commitment amounts
- Deletes existing reservations for decreased commitment amounts
- Preserves existing reservations that already have VMs allocated, when possible
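
The all-or-nothing semantics can be sketched with an in-memory model: apply the batch to a copy of the state and hand the copy back only if every change fits. The `change` type, the slot-based capacity model, and `applyBatch` are assumptions made for this example; the real implementation operates on Reservation CRDs and waits for them to become ready.

```go
package main

import (
	"errors"
	"fmt"
)

// change is a hypothetical, simplified commitment change: a project asks for
// a number of additional reservation slots.
type change struct {
	Project string
	Slots   int
}

var errInsufficientCapacity = errors.New("insufficient capacity")

// applyBatch applies every change or none. It mutates only a copy of the
// state, so returning the original values on failure acts as the rollback.
func applyBatch(reserved map[string]int, free int, batch []change) (map[string]int, int, error) {
	next := make(map[string]int, len(reserved))
	for k, v := range reserved {
		next[k] = v
	}
	remaining := free
	for _, c := range batch {
		if c.Slots > remaining {
			return reserved, free, fmt.Errorf("project %s: %w", c.Project, errInsufficientCapacity)
		}
		remaining -= c.Slots
		next[c.Project] += c.Slots
	}
	return next, remaining, nil
}

func main() {
	state := map[string]int{"project-a": 2}
	// The second change does not fit, so the first one must not be applied either.
	state, free, err := applyBatch(state, 3, []change{{"project-a", 1}, {"project-b", 5}})
	fmt.Println(state, free, err)
}
```

Working on a copy avoids having to undo partial writes. With persisted CRDs the equivalent step is an explicit rollback of already-created reservations.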

### Syncer Task

The syncer task runs periodically and fetches all commitments from Limes. It syncs the local Reservation CRD state to match Limes' view of commitments.

### Controller (Reconciliation)

The controller watches Reservation CRDs and performs reconciliation:

1. **For new reservations** (no target host assigned):
   - Calls Cortex for scheduling to find a suitable host
   - Assigns the target host and marks the reservation as Ready

2. **For existing reservations** (already have a target host):
   - Validates that allocated VMs are still on the expected host
   - Updates allocations if VMs have migrated or been deleted
   - Requeues for periodic revalidation
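
The two branches above can be sketched as a single function. Everything here is a simplification under stated assumptions: `reservation` stands in for the Reservation CRD, `schedule` for the placement request, and `vmHost` for the lookup of a VM's current host. Dropping an allocation when a VM has moved or vanished is one plausible reading of "updates allocations", not a confirmed detail.

```go
package main

import "fmt"

// reservation is a hypothetical, simplified stand-in for a Reservation CRD.
type reservation struct {
	TargetHost string
	Ready      bool
	VMs        map[string]bool // IDs of VMs allocated to this reservation
}

// reconcile mirrors the two branches described above. schedule stands in for
// the placement request; vmHost reports where a VM currently runs ("" if the
// VM no longer exists). It returns true when the reservation should be
// requeued for periodic revalidation.
func reconcile(r *reservation, schedule func() (string, error), vmHost func(id string) string) (bool, error) {
	if r.TargetHost == "" {
		// New reservation: ask the scheduler for a host and mark it ready.
		host, err := schedule()
		if err != nil {
			return false, err
		}
		r.TargetHost, r.Ready = host, true
		return false, nil
	}
	// Existing reservation: drop allocations whose VM moved away or was deleted.
	for id := range r.VMs {
		if vmHost(id) != r.TargetHost {
			delete(r.VMs, id)
		}
	}
	return true, nil
}

func main() {
	r := &reservation{VMs: map[string]bool{}}
	requeue, err := reconcile(r, func() (string, error) { return "host-1", nil }, nil)
	fmt.Println(r.TargetHost, r.Ready, requeue, err)
}
```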

### Usage API

This API reports, for a given project, the total committed resources and usage per flavor group. For each VM, it reports whether the VM counts toward a specific commitment or is treated as pay-as-you-go (PAYG). This assignment is deterministic and may differ from the actual Cortex internal assignment used for scheduling.
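
One way to make such an assignment deterministic is to fix the iteration order before filling commitments. The sketch below sorts VMs and commitments by ID and assigns first-fit, falling back to PAYG. The ordering key, the `vm` and `commitment` types, and the first-fit rule are assumptions for illustration; the actual logic in `usage.go` may order and match differently.

```go
package main

import (
	"fmt"
	"sort"
)

// vm and commitment are hypothetical, simplified types for this sketch.
type vm struct {
	ID    string
	Units int // resource units the VM consumes in its flavor group
}

type commitment struct {
	ID       string
	Capacity int // remaining committed units
}

const payg = "PAYG"

// assign maps each VM ID to a commitment ID, or to "PAYG" when no commitment
// has room. Sorting copies of both inputs makes the result independent of the
// order in which VMs and commitments arrive.
func assign(vms []vm, commitments []commitment) map[string]string {
	vs := append([]vm(nil), vms...)
	cs := append([]commitment(nil), commitments...)
	sort.Slice(vs, func(i, j int) bool { return vs[i].ID < vs[j].ID })
	sort.Slice(cs, func(i, j int) bool { return cs[i].ID < cs[j].ID })

	out := make(map[string]string, len(vs))
	for _, v := range vs {
		out[v.ID] = payg
		for i := range cs {
			if cs[i].Capacity >= v.Units {
				cs[i].Capacity -= v.Units
				out[v.ID] = cs[i].ID
				break
			}
		}
	}
	return out
}

func main() {
	got := assign(
		[]vm{{"vm-b", 2}, {"vm-a", 2}, {"vm-c", 2}},
		[]commitment{{"commit-1", 4}},
	)
	fmt.Println(got)
}
```

In this example `vm-a` and `vm-b` are assigned to `commit-1` and `vm-c` falls back to PAYG, regardless of the input order.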

6 changes: 6 additions & 0 deletions helm/bundles/cortex-nova/values.yaml
@@ -164,6 +164,12 @@ cortex-scheduling-controllers:
# Whether the change-commitments API endpoint is active
# When false, the endpoint returns HTTP 503. The info endpoint remains available.
committedResourceEnableChangeCommitmentsAPI: true
+# Whether the report-usage API endpoint is active
+# When false, the endpoint returns HTTP 503.
+committedResourceEnableReportUsageAPI: true
+# Whether the report-capacity API endpoint is active
+# When false, the endpoint returns HTTP 503.
+committedResourceEnableReportCapacityAPI: true
# OvercommitMappings is a list of mappings that map hypervisor traits to
# overcommit ratios. Note that this list is applied in order, so if there
# are multiple mappings applying to the same hypervisors, the last mapping
8 changes: 7 additions & 1 deletion internal/scheduling/reservations/commitments/api.go
@@ -9,6 +9,7 @@ import (
"sync"

"github.com/cobaltcore-dev/cortex/internal/scheduling/nova"
+"github.com/go-logr/logr"
"github.com/prometheus/client_golang/prometheus"
"sigs.k8s.io/controller-runtime/pkg/client"
)
@@ -46,12 +47,17 @@ func NewAPIWithConfig(client client.Client, config Config, novaClient UsageNovaC
}
}

-func (api *HTTPAPI) Init(mux *http.ServeMux, registry prometheus.Registerer) {
+func (api *HTTPAPI) Init(mux *http.ServeMux, registry prometheus.Registerer, log logr.Logger) {
 	registry.MustRegister(&api.monitor)
 	registry.MustRegister(&api.usageMonitor)
 	registry.MustRegister(&api.capacityMonitor)
 	mux.HandleFunc("/v1/commitments/change-commitments", api.HandleChangeCommitments)
 	mux.HandleFunc("/v1/commitments/report-capacity", api.HandleReportCapacity)
 	mux.HandleFunc("/v1/commitments/info", api.HandleInfo)
 	mux.HandleFunc("/v1/commitments/projects/", api.HandleReportUsage) // matches /v1/commitments/projects/:project_id/report-usage
+
+	log.Info("commitments API initialized",
+		"changeCommitmentsEnabled", api.config.EnableChangeCommitmentsAPI,
+		"reportUsageEnabled", api.config.EnableReportUsageAPI,
+		"reportCapacityEnabled", api.config.EnableReportCapacityAPI)
 }
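
Because the usage route is registered on the prefix `/v1/commitments/projects/`, the handler has to pull `:project_id` out of the path itself. A minimal sketch of that extraction follows. `projectIDFromPath` is a hypothetical helper written for this note, and how `HandleReportUsage` actually parses the path is not shown in this diff.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	prefix = "/v1/commitments/projects/"
	suffix = "/report-usage"
)

// projectIDFromPath extracts :project_id from
// /v1/commitments/projects/:project_id/report-usage.
// It rejects paths with an empty or multi-segment project ID.
func projectIDFromPath(path string) (string, bool) {
	if !strings.HasPrefix(path, prefix) {
		return "", false
	}
	rest := strings.TrimPrefix(path, prefix)
	if !strings.HasSuffix(rest, suffix) {
		return "", false
	}
	id := strings.TrimSuffix(rest, suffix)
	if id == "" || strings.Contains(id, "/") {
		return "", false
	}
	return id, true
}

func main() {
	fmt.Println(projectIDFromPath("/v1/commitments/projects/abc123/report-usage"))
}
```

Trimming the prefix before checking the suffix matters: the two share a slash, so checking both against the full path would wrongly accept `/v1/commitments/projects/report-usage`.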
@@ -997,7 +997,7 @@ func newCommitmentTestEnv(
}
mux := http.NewServeMux()
registry := prometheus.NewRegistry()
-api.Init(mux, registry)
+api.Init(mux, registry, log.Log)
httpServer := httptest.NewServer(mux)

env.HTTPServer = httpServer
@@ -540,7 +540,7 @@ func newUsageTestEnv(
api := NewAPIWithConfig(k8sClient, DefaultConfig(), novaClient)
mux := http.NewServeMux()
registry := prometheus.NewRegistry()
-api.Init(mux, registry)
+api.Init(mux, registry, log.Log)
httpServer := httptest.NewServer(mux)

return &UsageTestEnv{
12 changes: 10 additions & 2 deletions internal/scheduling/reservations/commitments/usage.go
@@ -63,6 +63,12 @@ func (c *UsageCalculator) CalculateUsage(
return liquid.ServiceUsageReport{}, fmt.Errorf("failed to get flavor groups: %w", err)
}

+	// Get info version from Knowledge CRD (used by Limes to detect metadata changes)
+	var infoVersion int64 = -1
+	if knowledgeCRD, err := knowledge.Get(ctx); err == nil && knowledgeCRD != nil && !knowledgeCRD.Status.LastContentChange.IsZero() {
+		infoVersion = knowledgeCRD.Status.LastContentChange.Unix()
+	}

// Step 2: Build commitment capacity map from K8s Reservation CRDs
commitmentsByAZFlavorGroup, err := c.buildCommitmentCapacityMap(ctx, log, projectID)
if err != nil {
@@ -80,7 +86,7 @@ func (c *UsageCalculator) CalculateUsage(
vmAssignments, assignedToCommitments := c.assignVMsToCommitments(vms, commitmentsByAZFlavorGroup)

// Step 5: Build the response
-report := c.buildUsageResponse(vms, vmAssignments, flavorGroups, allAZs)
+report := c.buildUsageResponse(vms, vmAssignments, flavorGroups, allAZs, infoVersion)

log.Info("completed usage report",
"projectID", projectID,
@@ -336,6 +342,7 @@ func (c *UsageCalculator) buildUsageResponse(
vmAssignments map[string]string,
flavorGroups map[string]compute.FlavorGroupFeature,
allAZs []liquid.AvailabilityZone,
+infoVersion int64,
) liquid.ServiceUsageReport {
// Initialize resources map for flavor groups that accept commitments
resources := make(map[liquid.ResourceName]*liquid.ResourceUsageReport)
@@ -420,7 +427,8 @@ func (c *UsageCalculator) buildUsageResponse(
}

return liquid.ServiceUsageReport{
-Resources: resources,
+InfoVersion: infoVersion,
+Resources: resources,
}
}
