Skip to content

evals: resync analysis schema with operator (drop estimatedImpact, tighten verification)#80

Open
harche wants to merge 1 commit into
openshift:mainfrom
harche:drop-estimated-impact
Open

evals: resync analysis schema with operator (drop estimatedImpact, tighten verification)#80
harche wants to merge 1 commit into
openshift:mainfrom
harche:drop-estimated-impact

Conversation

@harche

@harche harche commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Resyncs evals/schemas.py (ANALYSIS_WITH_COMPONENTS_SCHEMA, which mirrors the operator's AnalysisOutputSchema) with the operator change in openshift/lightspeed-agentic-operator#174.

Changes

  • Drop estimatedImpact from proposal — the operator removed it from the API type and CRD.
  • Add required: ["name", "type"] to verification.steps[] items and required: ["description"] to the verification object, matching the operator schema and the AnalysisResult CRD.

Why

The eval schema is sent as the outputSchema to /run and used for the post-hoc jsonschema.validate. Keeping it in lockstep with the operator ensures eval validation matches what the operator actually enforces, and avoids drift like the estimatedImpact schema/CRD mismatch that motivated #174.

Verification

  • Live eval: make eval EVAL_ARGS="-k claude" against Vertex / claude-opus-4-6test_find_token_skill[claude] PASSED. The model's native structured output conformed to the tightened schema (provider honors the nested required) and jsonschema.validate passed.
  • Offline: schema is meta-valid; the new constraints behave as intended (missing description/name/type → invalid; steps stays optional).

🤖 Generated with Claude Code

ANALYSIS_WITH_COMPONENTS_SCHEMA mirrors the operator's AnalysisOutputSchema.
Sync it with the operator change in
openshift/lightspeed-agentic-operator#174:

- drop estimatedImpact (removed from the operator API and CRD)
- require name+type on verification.steps[] items and description on the
  verification object, matching the operator schema / AnalysisResult CRD

Verified by the live claude/Vertex eval (test_find_token_skill) and offline
jsonschema checks. Pre-existing eval-specific divergences (token sentinels,
lowercase enums, boolean reversible) are intentional and left unchanged.
@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown

Warning

Review limit reached

@harche, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 41 minutes and 49 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9b37a696-bb19-4b67-b050-25a81649ca75

📥 Commits

Reviewing files that changed from the base of the PR and between 8c0481b and 41c53ec.

📒 Files selected for processing (1)
  • evals/schemas.py

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci

openshift-ci Bot commented Jun 19, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign raptorsun for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@harche

harche commented Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

Note: this eval fixture is a best-effort approximation of the operator's AnalysisOutputSchema, not a strict mirror that has to stay in lockstep — at runtime the operator sends its own schema, so this only needs to be a valid structured-output shape for the eval pipeline. It can keep eval-specific divergences (token/components/audit fields, enum casing) and doesn't need to block operator-side schema changes.

@openshift-ci

openshift-ci Bot commented Jun 19, 2026

Copy link
Copy Markdown

@harche: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@joshuawilson joshuawilson left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial Review

The 3 code changes are mechanically correct and exactly mirror operator PR #174. No bugs. But the framing overclaims alignment, and test coverage is single-provider only.

Concern 1: "Resync" title is misleading

The PR title says "resync analysis schema with operator" but significant pre-existing divergences remain untouched — several more impactful than estimatedImpact:

Path Sandbox (eval) Operator Severity
proposal.reversible {"type": "boolean"} {"type": "string", "enum": ["Reversible", "Irreversible", "Partial"]} Type mismatch
proposal.risk enum ["low", "medium", "high", "critical"] ["Low", "Medium", "High", "Critical"] Casing mismatch
diagnosis.confidence enum ["low", "medium", "high"] ["Low", "Medium", "High"] Casing mismatch
rollbackPlan location Under verification.properties Under proposal.properties Structural mismatch
diagnosis.required Includes "token" Does not Extra required field
options[].required Includes "rbac", "components" Does not Extra required fields

The reversible type mismatch is the most glaring: the eval asks models for a boolean while the operator expects one of three enum strings (Reversible/Irreversible/Partial). A model producing true/false for the eval would produce the wrong type at runtime.

Your comment acknowledges this is "a best-effort approximation" — which is fair, but then the PR title shouldn't say "resync." Consider: "evals: drop estimatedImpact, tighten verification required fields (match operator #174)".

Concern 2: Tightening verified with one provider only

The required additions are schema tightenings — previously valid responses become invalid. Verification was only against Claude (make eval EVAL_ARGS="-k claude"). If Gemini or OpenAI produce verification steps without name/type, those evals will now fail. Not blocking (evals are per-provider), but worth noting.

Concern 3: Operator PR #174 not merged

Both PRs are state: open. If #174 is revised (e.g., verification required changes), this PR would be pre-emptively tightened beyond what the operator enforces. Low risk since eval-only, but coordination matters.

Bottom line

Code changes are safe to merge. The diff correctly mirrors #174's three changes. Consider adjusting the title to not overclaim "resync" given the intentional divergences that remain.

@joshuawilson joshuawilson left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline comments on the three changed lines — see above for the full adversarial review.

Comment thread evals/schemas.py
"enum": ["low", "medium", "high", "critical"],
},
"reversible": {"type": "boolean"},
"estimatedImpact": {"type": "string"},

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removal is clean — estimatedImpact was never in proposal.required, so this is a pure relaxation. Models that still produce the field won't break since there's no additionalProperties: false.

Worth noting: two lines below this removal, reversible is {"type": "boolean"} while the operator defines it as {"type": "string", "enum": ["Reversible", "Irreversible", "Partial"]}. That's a bigger schema/operator divergence than estimatedImpact ever was — a model trained on this eval schema would produce the wrong type at runtime.

Comment thread evals/schemas.py
"expected": {"type": "string"},
"type": {"type": "string"},
},
"required": ["name", "type"],

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tightening was only verified with Claude (-k claude). Since output_schema is passed to each provider and post-hoc jsonschema.validate runs on the result, if Gemini or OpenAI omit name or type in step items, their evals will start failing after this merge. Might be worth a quick make eval across all providers, or at minimum noting this is a known risk.

Comment thread evals/schemas.py
},
},
},
"required": ["description"],

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Operator PR #174 is still open. If it gets revised (e.g., this required constraint is dropped or changed), this sandbox PR would be pre-emptively tightened beyond what the operator enforces. Low risk since eval-only, but worth tracking — consider adding a comment like # Aligned with operator PR #174 so future readers know the provenance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants