Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 0 additions & 8 deletions api/v1alpha1/proposal_analysis_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -137,14 +137,6 @@ type ProposalResult struct {
// Must be one of: Reversible, Irreversible, Partial.
// +optional
Reversible Reversibility `json:"reversible,omitempty"`
// estimatedImpact is a Markdown-formatted description of the expected
// impact of the remediation on the system
// (e.g., "Brief pod restart, ~30s downtime").
// Maximum 1024 characters.
// +required
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=1024
EstimatedImpact string `json:"estimatedImpact,omitempty"`
// rollbackPlan describes how to undo the remediation if execution fails
// or causes unexpected issues. Only the execution step mutates cluster
// state, so rollback lives here alongside the actions it would undo.
Expand Down
10 changes: 0 additions & 10 deletions config/crd/bases/agentic.openshift.io_analysisresults.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -243,15 +243,6 @@ spec:
maxLength: 8192
minLength: 1
type: string
estimatedImpact:
description: |-
estimatedImpact is a Markdown-formatted description of the expected
impact of the remediation on the system
(e.g., "Brief pod restart, ~30s downtime").
Maximum 1024 characters.
maxLength: 1024
minLength: 1
type: string
reversible:
description: |-
reversible indicates whether the remediation can be rolled back
Expand Down Expand Up @@ -298,7 +289,6 @@ spec:
required:
- actions
- description
- estimatedImpact
- risk
type: object
rbac:
Expand Down
146 changes: 146 additions & 0 deletions controller/proposal/schema_crd_drift_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,146 @@
package proposal

import (
"encoding/json"
"os"
"path/filepath"
"testing"

apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
"sigs.k8s.io/yaml"
)

// TestAnalysisSchemaCoversCRDRequiredFields guards against the schema/CRD
// contract drift described in lightspeed-agentic-operator#162: the JSON
// schema sent to the LLM for structured output must mark as required every
// field the AnalysisResult CRD requires. When the two disagree, a valid LLM
// response can be rejected by CRD validation at status-patch time, which
// fails the analysis and orphans the sandbox pod.
//
// The generated CRD (which encodes the +required markers) is the source of
// truth. For every object node the two schemas share, the CRD's required set
// must be a subset of the LLM schema's required set. This would have caught
// both the original estimatedImpact incident and the latent verification gap.
func TestAnalysisSchemaCoversCRDRequiredFields(t *testing.T) {
crdPath := filepath.Join("..", "..", "config", "crd", "bases", "agentic.openshift.io_analysisresults.yaml")
raw, err := os.ReadFile(crdPath)
if err != nil {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MINOR — Fragile relative path to CRD file

This relies on go test setting CWD to the package directory, which is correct, but if the package or CRD path is ever restructured, the test fails with a confusing "read CRD" error rather than a clear message about the expected layout. Consider a small clarifying comment, or alternatively using runtime.Caller(0) to resolve the path relative to the source file rather than CWD.

t.Fatalf("read CRD: %v", err)
}

var crd apiextensionsv1.CustomResourceDefinition
if err := yaml.Unmarshal(raw, &crd); err != nil {
t.Fatalf("unmarshal CRD: %v", err)
}
if len(crd.Spec.Versions) == 0 {
t.Fatal("CRD declares no versions")
}

// The LLM analysis schema describes a single option; its per-option shape
// lives at properties.options.items.
var llm map[string]any
if err := json.Unmarshal(AnalysisOutputSchema, &llm); err != nil {
t.Fatalf("unmarshal AnalysisOutputSchema: %v", err)
}
llmOption, ok := digObject(llm, "properties", "options", "items")
if !ok {
t.Fatal("AnalysisOutputSchema is missing properties.options.items")
}

for _, v := range crd.Spec.Versions {
if v.Schema == nil || v.Schema.OpenAPIV3Schema == nil {
continue
}
// The matching per-option schema in the CRD is at status.options.items.
status, ok := v.Schema.OpenAPIV3Schema.Properties["status"]
if !ok {
continue
}
options, ok := status.Properties["options"]
if !ok || options.Items == nil || options.Items.Schema == nil {
t.Fatalf("CRD version %s is missing status.options.items schema", v.Name)
}
assertRequiredCoverage(t, "options[]", options.Items.Schema, llmOption)
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MAJOR — Guard test only covers AnalysisResult, leaving live drift in other schemas

The test achieves its stated goal for AnalysisResult, but the same drift class this PR fixed for estimatedImpact exists right now in at least two other schemas:

1. VerificationOutputSchema (schemas.go:199) — LLM schema requires ["name", "result"], but the VerificationResult CRD (agentic.openshift.io_verificationresults.yaml:119-123) requires ["name", "result", "source", "value"]. If the LLM omits source or value, the VerificationResult status patch will be rejected by CRD validation — the same failure mode as estimatedImpact.

2. ExecutionOutputSchema (schemas.go:172-179) — The verification sub-object has no required array at all, but the ExecutionResult CRD (agentic.openshift.io_executionresults.yaml:253-255) requires ["conditionOutcome", "summary"]. Same failure mode.

Extending this test to cover all four result CRDs (AnalysisResult, VerificationResult, ExecutionResult, EscalationResult) against their respective *OutputSchema variables would catch these immediately and would prevent the stated goal — "catch this class of drift automatically" — from being incomplete on merge.

// assertRequiredCoverage walks a CRD schema node and the corresponding LLM
// schema node in parallel, asserting that every field required by the CRD is
// also required by the LLM schema (for fields the LLM schema actually models).
func assertRequiredCoverage(t *testing.T, path string, crd *apiextensionsv1.JSONSchemaProps, llm map[string]any) {
t.Helper()
if crd == nil || llm == nil {
return
}

// Arrays: descend into the element schema on both sides.
if crd.Type == "array" {
if crd.Items == nil || crd.Items.Schema == nil {
return
}
llmItems, ok := asJSONObject(llm["items"])
if !ok {
return
}
assertRequiredCoverage(t, path+"[]", crd.Items.Schema, llmItems)
return
}

llmProps, _ := asJSONObject(llm["properties"])
llmRequired := requiredSet(llm["required"])
for _, req := range crd.Required {
// A field the CRD requires must be both modeled and required in the LLM
// output schema. If it is absent entirely the agent is never asked for
// it, so the status patch is rejected at runtime — the exact drift this
// guard exists to catch.
if _, modeled := llmProps[req]; !modeled {
t.Errorf("schema/CRD drift at %s: CRD requires %q but the LLM output schema does not model the field", path, req)
continue
}
if !llmRequired[req] {
t.Errorf("schema/CRD drift at %s: CRD requires %q but the LLM output schema does not mark it required", path, req)
}
Comment thread
coderabbitai[bot] marked this conversation as resolved.
Comment on lines +91 to +102

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MINOR — Guard only checks one direction (CRD→LLM); reverse direction (LLM→CRD) is unchecked

The test asserts every CRD-required field is also required in the LLM schema. This is the right check for the estimatedImpact bug class (CRD rejects an LLM response).

However, the reverse is also worth guarding: if the LLM schema models a property that does not exist in the CRD at all (e.g., a typo like "descrption" instead of "description"), that field would be silently pruned by the API server on the status patch, causing silent data loss. A reverse check — asserting every property in the LLM schema exists as a property in the CRD — would catch this. Not a blocker, but would complete the guard.

}

// Recurse into properties present in both schemas.
for name := range crd.Properties {
child := crd.Properties[name]
llmChild, ok := asJSONObject(llmProps[name])
if !ok {
continue
}
assertRequiredCoverage(t, path+"."+name, &child, llmChild)
}
}

// digObject walks nested object properties of a decoded JSON schema.
func digObject(m map[string]any, keys ...string) (map[string]any, bool) {
cur := m
for _, k := range keys {
next, ok := asJSONObject(cur[k])
if !ok {
return nil, false
}
cur = next
}
return cur, true
}

func asJSONObject(v any) (map[string]any, bool) {
m, ok := v.(map[string]any)
return m, ok
}

func requiredSet(v any) map[string]bool {
out := map[string]bool{}
arr, ok := v.([]any)
if !ok {
return out
}
for _, e := range arr {
if s, ok := e.(string); ok {
out[s] = true
}
}
return out
}
7 changes: 4 additions & 3 deletions controller/proposal/schemas.go
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,6 @@ var AnalysisOutputSchema = json.RawMessage(`{
},
"risk": { "type": "string", "enum": ["Low", "Medium", "High", "Critical"], "description": "Risk assessment. Low: safe to apply. Medium: review recommended. High: careful review required. Critical: manual approval strongly recommended" },
"reversible": { "type": "string", "enum": ["Reversible", "Irreversible", "Partial"], "description": "Whether this remediation can be rolled back. Reversible: fully undoable. Irreversible: cannot undo. Partial: some actions undoable" },
"estimatedImpact": { "type": "string", "description": "Expected impact on the system (e.g., 'Brief pod restart, ~30s downtime')" },
"rollbackPlan": {
"type": "object",
"description": "How to undo the remediation if it fails or causes issues. Required when reversible is Reversible or Partial.",
Expand All @@ -77,10 +76,12 @@ var AnalysisOutputSchema = json.RawMessage(`{
"command": { "type": "string", "description": "Command or API call to run (e.g., 'oc get pod -n production -l app=web -o jsonpath={.status.phase}')" },
"expected": { "type": "string", "description": "Expected output or condition (e.g., 'Running', 'ready=true')" },
"type": { "type": "string", "description": "Check category (e.g., 'command', 'metric', 'condition')" }
}
},
"required": ["name", "type"]
}
}
Comment on lines +79 to 82

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MAJOR — Tightening here is correct, but the same fix is needed in two other schemas

Good catch adding "required": ["name", "type"] for verification steps and "required": ["description"] for the verification object in the Analysis schema.

The same tightening is needed elsewhere:

  1. VerificationOutputSchema (line 199 in this file): Currently "required": ["name", "result"] but the CRD requires ["name", "result", "source", "value"]. Add "source" and "value" to the required array.

  2. ExecutionOutputSchema (lines 172-179 in this file): The verification sub-object has no required array, but the CRD requires ["conditionOutcome", "summary"]. Add "required": ["conditionOutcome", "summary"] to the verification object.

These are the same latent drift class noted in the issue description and would cause the exact same runtime failure (status patch rejected → result marked Failed → sandbox pod orphaned).

}
},
"required": ["description"]
Comment thread
coderabbitai[bot] marked this conversation as resolved.
},
"rbac": {
"type": "object",
Expand Down
3 changes: 1 addition & 2 deletions test/agent/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -255,8 +255,7 @@ func cannedResponse(phase, targetNS string) []byte {
{ "type": "patch", "description": "mock proposed action" }
],
"risk": "Low",
"reversible": "Reversible",
"estimatedImpact": "Brief pod restart, ~30s downtime"
"reversible": "Reversible"
},
"verification": {
"description": "mock verification plan",
Expand Down