Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughCommitments API now registers a Prometheus monitor and records request-level metrics; change-commitments handler centralizes status-code handling and records metrics on all exit paths. Logging was refactored to use context-derived loggers. Five warning-level Prometheus alerts for the committed-resource API were added. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant HTTPAPI
participant Monitor
participant Reservations
Client->>HTTPAPI: POST /change-commitments (request)
HTTPAPI->>HTTPAPI: parse request, set startTime, derive requestID/context
HTTPAPI->>Reservations: processCommitmentChanges(ctx, req)
Reservations-->>HTTPAPI: result or error (success, version mismatch, caches not ready, timeout)
HTTPAPI->>Monitor: recordMetrics(statusCode, duration, commitmentCount, result)
HTTPAPI-->>Client: HTTP response (statusCode, body)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/scheduling/reservations/commitments/api_change_commitments.go (1)
64-70:⚠️ Potential issue | 🟠 MajorPreserve
r.Context()when creating the request-scoped context.Using
context.Background()here drops client cancellation, server deadlines, and any request-scoped values. Because this handler keepsapi.changeMutexlocked whilewatchReservationsUntilReady()waits onctx.Done()(lines 265–408), a disconnected or timed-out request can keep the entire change-commitments API serialized until the watch timeout expires.🔧 Minimal fix
- ctx := reservations.WithGlobalRequestID(context.Background(), "committed-resource-"+requestID) + ctx := reservations.WithGlobalRequestID(r.Context(), "committed-resource-"+requestID)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/api_change_commitments.go` around lines 64 - 70, The handler creates a new context with reservations.WithGlobalRequestID using context.Background(), which drops the incoming request's cancellation and deadlines and can block api.changeMutex while watchReservationsUntilReady waits on ctx.Done(); change the base context to the incoming request context by calling reservations.WithGlobalRequestID(r.Context(), "committed-resource-"+requestID) so request cancellation and deadlines propagate; ensure LoggerFromContext continues to use that returned ctx so logging and downstream calls (e.g., watchReservationsUntilReady and any api.changeMutex-protected code) observe request cancellation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@helm/bundles/cortex-nova/alerts/nova.alerts.yaml`:
- Around line 296-298: The alert CortexNovaCommittedResourceLatencyTooHigh is
computing p95 per status_code; instead aggregate histogram buckets across
status_code before calling histogram_quantile so you get an overall p95. Replace
the current expr that directly calls histogram_quantile on
cortex_committed_resource_change_api_request_duration_seconds_bucket{service="cortex-nova-metrics"}
with an expression that first sums the buckets by the histogram "le" label
(e.g., sum by (le)(...)[5m] with rate applied) and then pass that summed series
into histogram_quantile(0.95, ...), keeping the same threshold and for duration.
- Around line 311-315: The alert CortexNovaCommittedResourceRejectionRateTooHigh
uses cortex_committed_resource_change_api_commitment_changes_total with a
numerator filtered by result="rejected" but the denominator unfiltered, causing
vector mismatch and effectively computing rejected/rejected; change the
expression to aggregate away the result label on both sides (e.g., use
sum(rate(...[1h])) grouped by the same labels like service or use sum
without(result)) so you compute sum(rate(...{result="rejected"}[1h])) /
sum(rate(...[1h])) > 0.5 with identical grouping, ensuring the denominator
represents total commits rather than only rejected ones.
---
Outside diff comments:
In `@internal/scheduling/reservations/commitments/api_change_commitments.go`:
- Around line 64-70: The handler creates a new context with
reservations.WithGlobalRequestID using context.Background(), which drops the
incoming request's cancellation and deadlines and can block api.changeMutex
while watchReservationsUntilReady waits on ctx.Done(); change the base context
to the incoming request context by calling
reservations.WithGlobalRequestID(r.Context(), "committed-resource-"+requestID)
so request cancellation and deadlines propagate; ensure LoggerFromContext
continues to use that returned ctx so logging and downstream calls (e.g.,
watchReservationsUntilReady and any api.changeMutex-protected code) observe
request cancellation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3172b0d7-e30c-42c4-a228-69705d1f4794
📒 Files selected for processing (15)
cmd/main.gohelm/bundles/cortex-nova/alerts/nova.alerts.yamlinternal/scheduling/reservations/commitments/api.gointernal/scheduling/reservations/commitments/api_change_commitments.gointernal/scheduling/reservations/commitments/api_change_commitments_metrics.gointernal/scheduling/reservations/commitments/api_change_commitments_monitor.gointernal/scheduling/reservations/commitments/api_change_commitments_monitor_test.gointernal/scheduling/reservations/commitments/api_change_commitments_test.gointernal/scheduling/reservations/commitments/api_info.gointernal/scheduling/reservations/commitments/api_report_capacity.gointernal/scheduling/reservations/commitments/client.gointernal/scheduling/reservations/commitments/context.gointernal/scheduling/reservations/commitments/controller.gointernal/scheduling/reservations/commitments/state.gointernal/scheduling/reservations/commitments/syncer.go
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
internal/scheduling/reservations/commitments/api_change_commitments.go (1)
97-97: Inconsistent use ofstatusCodevariable.For consistency with the rest of the function, use
w.WriteHeader(statusCode)instead of the hardcodedhttp.StatusOK.♻️ Proposed fix
- w.WriteHeader(http.StatusOK) + w.WriteHeader(statusCode)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/api_change_commitments.go` at line 97, The response currently calls w.WriteHeader(http.StatusOK) which is inconsistent with the rest of the handler that computes and uses the variable statusCode; change that call to use w.WriteHeader(statusCode) instead (ensure the existing statusCode variable is used and correctly set earlier in the function so the response status remains consistent with prior logic where statusCode is determined).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@internal/scheduling/reservations/commitments/api_change_commitments_monitor_test.go`:
- Around line 160-213: Move the helper functions newCommitmentRequest,
createCommitment, and buildRequestJSON from api_change_commitments_test.go into
this test file so TestCountCommitments can call them locally; ensure you copy
their exact signatures and any dependent types (e.g., CommitmentChangeRequest,
commitment struct fields) and adjust imports if necessary, then remove the
external references so TestCountCommitments and the countCommitments call
compile without relying on functions defined in another test file.
In `@internal/scheduling/reservations/commitments/api_change_commitments.go`:
- Around line 109-114: Replace fragile substring checks on err.Error() with
explicit error types or sentinel errors and use errors.Is or a typed error
interface to map to HTTP codes; for example define sentinel errors like
ErrVersionMismatch and ErrCachesNotReady (or implement an error type with a
StatusCode() int) and then in the code that currently checks
strings.Contains(err.Error(), "version mismatch") / "caches not ready" perform
errors.Is(err, ErrVersionMismatch) / errors.Is(err, ErrCachesNotReady) or cast
to the typed error and read StatusCode() to set statusCode to
http.StatusConflict or http.StatusServiceUnavailable respectively, updating the
logic around the err variable where statusCode is derived.
---
Nitpick comments:
In `@internal/scheduling/reservations/commitments/api_change_commitments.go`:
- Line 97: The response currently calls w.WriteHeader(http.StatusOK) which is
inconsistent with the rest of the handler that computes and uses the variable
statusCode; change that call to use w.WriteHeader(statusCode) instead (ensure
the existing statusCode variable is used and correctly set earlier in the
function so the response status remains consistent with prior logic where
statusCode is determined).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: df4da2e8-f521-46fd-9d2f-b018a3d246af
📒 Files selected for processing (4)
helm/bundles/cortex-nova/alerts/nova.alerts.yamlinternal/scheduling/reservations/commitments/api_change_commitments.gointernal/scheduling/reservations/commitments/api_change_commitments_monitor.gointernal/scheduling/reservations/commitments/api_change_commitments_monitor_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- helm/bundles/cortex-nova/alerts/nova.alerts.yaml
| func TestCountCommitments(t *testing.T) { | ||
| testCases := []struct { | ||
| name string | ||
| request CommitmentChangeRequest | ||
| expected int | ||
| }{ | ||
| { | ||
| name: "Single commitment", | ||
| request: newCommitmentRequest("az-a", false, 1234, | ||
| createCommitment("ram_hana_1", "project-A", "uuid-1", "confirmed", 2), | ||
| ), | ||
| expected: 1, | ||
| }, | ||
| { | ||
| name: "Multiple commitments same project", | ||
| request: newCommitmentRequest("az-a", false, 1234, | ||
| createCommitment("ram_hana_1", "project-A", "uuid-1", "confirmed", 2), | ||
| createCommitment("ram_hana_2", "project-A", "uuid-2", "confirmed", 2), | ||
| ), | ||
| expected: 2, | ||
| }, | ||
| { | ||
| name: "Multiple commitments multiple projects", | ||
| request: newCommitmentRequest("az-a", false, 1234, | ||
| createCommitment("ram_hana_1", "project-A", "uuid-1", "confirmed", 2), | ||
| createCommitment("ram_hana_1", "project-B", "uuid-2", "confirmed", 3), | ||
| createCommitment("ram_gp_1", "project-C", "uuid-3", "confirmed", 1), | ||
| ), | ||
| expected: 3, | ||
| }, | ||
| { | ||
| name: "Empty request", | ||
| request: newCommitmentRequest("az-a", false, 1234), | ||
| expected: 0, | ||
| }, | ||
| } | ||
|
|
||
| for _, tc := range testCases { | ||
| t.Run(tc.name, func(t *testing.T) { | ||
| // Convert test request to liquid request | ||
| reqJSON := buildRequestJSON(tc.request) | ||
| var req liquid.CommitmentChangeRequest | ||
| if err := json.Unmarshal([]byte(reqJSON), &req); err != nil { | ||
| t.Fatalf("Failed to parse request: %v", err) | ||
| } | ||
|
|
||
| result := countCommitments(req) | ||
|
|
||
| if result != tc.expected { | ||
| t.Errorf("Expected %d commitments, got %d", tc.expected, result) | ||
| } | ||
| }) | ||
| } | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for helper function definitions in the commitments package test files
rg -n "func (newCommitmentRequest|createCommitment|buildRequestJSON)" internal/scheduling/reservations/commitments/Repository: cobaltcore-dev/cortex
Length of output: 616
Move helper functions into the test file.
The helper functions newCommitmentRequest, createCommitment, and buildRequestJSON are defined in api_change_commitments_test.go but used in this file. Per coding guidelines, keep helper functions in the same file as the tests that use them.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@internal/scheduling/reservations/commitments/api_change_commitments_monitor_test.go`
around lines 160 - 213, Move the helper functions newCommitmentRequest,
createCommitment, and buildRequestJSON from api_change_commitments_test.go into
this test file so TestCountCommitments can call them locally; ensure you copy
their exact signatures and any dependent types (e.g., CommitmentChangeRequest,
commitment struct fields) and adjust imports if necessary, then remove the
external references so TestCountCommitments and the countCommitments call
compile without relying on functions defined in another test file.
| // Determine status code from error context (409 or 503) | ||
| if strings.Contains(err.Error(), "version mismatch") { | ||
| statusCode = http.StatusConflict | ||
| } else if strings.Contains(err.Error(), "caches not ready") { | ||
| statusCode = http.StatusServiceUnavailable | ||
| } |
There was a problem hiding this comment.
Fragile status code derivation via error string matching.
Determining HTTP status codes by inspecting error message substrings is brittle—if the error messages change, this logic silently breaks. Consider using typed errors or sentinel errors to convey the status code.
💡 Suggested approach using sentinel errors
+var (
+ errVersionMismatch = errors.New("version mismatch")
+ errCachesNotReady = errors.New("caches not ready")
+)
// In processCommitmentChanges, return the sentinel error:
-return errors.New("version mismatch")
+return errVersionMismatch
// In HandleChangeCommitments:
-if strings.Contains(err.Error(), "version mismatch") {
- statusCode = http.StatusConflict
-} else if strings.Contains(err.Error(), "caches not ready") {
- statusCode = http.StatusServiceUnavailable
-}
+if errors.Is(err, errVersionMismatch) {
+ statusCode = http.StatusConflict
+} else if errors.Is(err, errCachesNotReady) {
+ statusCode = http.StatusServiceUnavailable
+}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/scheduling/reservations/commitments/api_change_commitments.go`
around lines 109 - 114, Replace fragile substring checks on err.Error() with
explicit error types or sentinel errors and use errors.Is or a typed error
interface to map to HTTP codes; for example define sentinel errors like
ErrVersionMismatch and ErrCachesNotReady (or implement an error type with a
StatusCode() int) and then in the code that currently checks
strings.Contains(err.Error(), "version mismatch") / "caches not ready" perform
errors.Is(err, ErrVersionMismatch) / errors.Is(err, ErrCachesNotReady) or cast
to the typed error and read StatusCode() to set statusCode to
http.StatusConflict or http.StatusServiceUnavailable respectively, updating the
logic around the err variable where statusCode is derived.
Test Coverage ReportTest Coverage 📊: 68.5% |
No description provided.