evals: resync analysis schema with operator (drop estimatedImpact, tighten verification) by harche · Pull Request #80 · openshift/lightspeed-agentic-sandbox

harche · 2026-06-19T15:13:39Z

Resyncs evals/schemas.py (ANALYSIS_WITH_COMPONENTS_SCHEMA, which mirrors the operator's AnalysisOutputSchema) with the operator change in openshift/lightspeed-agentic-operator#174.

Changes

Drop estimatedImpact from proposal — the operator removed it from the API type and CRD.
Add required: ["name", "type"] to verification.steps[] items and required: ["description"] to the verification object, matching the operator schema and the AnalysisResult CRD.

Why

The eval schema is sent as the outputSchema to /run and used for the post-hoc jsonschema.validate. Keeping it in lockstep with the operator ensures eval validation matches what the operator actually enforces, and avoids drift like the estimatedImpact schema/CRD mismatch that motivated #174.

Verification

Live eval: make eval EVAL_ARGS="-k claude" against Vertex / claude-opus-4-6 — test_find_token_skill[claude] PASSED. The model's native structured output conformed to the tightened schema (provider honors the nested required) and jsonschema.validate passed.
Offline: schema is meta-valid; the new constraints behave as intended (missing description/name/type → invalid; steps stays optional).

🤖 Generated with Claude Code

ANALYSIS_WITH_COMPONENTS_SCHEMA mirrors the operator's AnalysisOutputSchema. Sync it with the operator change in openshift/lightspeed-agentic-operator#174: - drop estimatedImpact (removed from the operator API and CRD) - require name+type on verification.steps[] items and description on the verification object, matching the operator schema / AnalysisResult CRD Verified by the live claude/Vertex eval (test_find_token_skill) and offline jsonschema checks. Pre-existing eval-specific divergences (token sentinels, lowercase enums, boolean reversible) are intentional and left unchanged.

coderabbitai · 2026-06-19T15:13:47Z

Warning

Review limit reached

@harche, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 41 minutes and 49 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9b37a696-bb19-4b67-b050-25a81649ca75

📥 Commits

Reviewing files that changed from the base of the PR and between 8c0481b and 41c53ec.

📒 Files selected for processing (1)

evals/schemas.py

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

openshift-ci · 2026-06-19T15:13:56Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign raptorsun for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

harche · 2026-06-19T15:20:27Z

Note: this eval fixture is a best-effort approximation of the operator's AnalysisOutputSchema, not a strict mirror that has to stay in lockstep — at runtime the operator sends its own schema, so this only needs to be a valid structured-output shape for the eval pipeline. It can keep eval-specific divergences (token/components/audit fields, enum casing) and doesn't need to block operator-side schema changes.

openshift-ci · 2026-06-19T15:24:59Z

@harche: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

joshuawilson

Adversarial Review

The 3 code changes are mechanically correct and exactly mirror operator PR #174. No bugs. But the framing overclaims alignment, and test coverage is single-provider only.

Concern 1: "Resync" title is misleading

The PR title says "resync analysis schema with operator" but significant pre-existing divergences remain untouched — several more impactful than estimatedImpact:

Path	Sandbox (eval)	Operator	Severity
`proposal.reversible`	`{"type": "boolean"}`	`{"type": "string", "enum": ["Reversible", "Irreversible", "Partial"]}`	Type mismatch
`proposal.risk` enum	`["low", "medium", "high", "critical"]`	`["Low", "Medium", "High", "Critical"]`	Casing mismatch
`diagnosis.confidence` enum	`["low", "medium", "high"]`	`["Low", "Medium", "High"]`	Casing mismatch
`rollbackPlan` location	Under `verification.properties`	Under `proposal.properties`	Structural mismatch
`diagnosis.required`	Includes `"token"`	Does not	Extra required field
`options[].required`	Includes `"rbac"`, `"components"`	Does not	Extra required fields

The reversible type mismatch is the most glaring: the eval asks models for a boolean while the operator expects one of three enum strings (Reversible/Irreversible/Partial). A model producing true/false for the eval would produce the wrong type at runtime.

Your comment acknowledges this is "a best-effort approximation" — which is fair, but then the PR title shouldn't say "resync." Consider: "evals: drop estimatedImpact, tighten verification required fields (match operator #174)".

Concern 2: Tightening verified with one provider only

The required additions are schema tightenings — previously valid responses become invalid. Verification was only against Claude (make eval EVAL_ARGS="-k claude"). If Gemini or OpenAI produce verification steps without name/type, those evals will now fail. Not blocking (evals are per-provider), but worth noting.

Concern 3: Operator PR #174 not merged

Both PRs are state: open. If #174 is revised (e.g., verification required changes), this PR would be pre-emptively tightened beyond what the operator enforces. Low risk since eval-only, but coordination matters.

Bottom line

Code changes are safe to merge. The diff correctly mirrors #174's three changes. Consider adjusting the title to not overclaim "resync" given the intentional divergences that remain.

joshuawilson

Inline comments on the three changed lines — see above for the full adversarial review.

joshuawilson · 2026-06-23T02:52:11Z

                                "enum": ["low", "medium", "high", "critical"],
                            },
                            "reversible": {"type": "boolean"},
-                            "estimatedImpact": {"type": "string"},


Removal is clean — estimatedImpact was never in proposal.required, so this is a pure relaxation. Models that still produce the field won't break since there's no additionalProperties: false.

Worth noting: two lines below this removal, reversible is {"type": "boolean"} while the operator defines it as {"type": "string", "enum": ["Reversible", "Irreversible", "Partial"]}. That's a bigger schema/operator divergence than estimatedImpact ever was — a model trained on this eval schema would produce the wrong type at runtime.

joshuawilson · 2026-06-23T02:52:15Z

                                        "expected": {"type": "string"},
                                        "type": {"type": "string"},
                                    },
+                                    "required": ["name", "type"],


This tightening was only verified with Claude (-k claude). Since output_schema is passed to each provider and post-hoc jsonschema.validate runs on the result, if Gemini or OpenAI omit name or type in step items, their evals will start failing after this merge. Might be worth a quick make eval across all providers, or at minimum noting this is a known risk.

joshuawilson · 2026-06-23T02:52:17Z

                                },
                            },
                        },
+                        "required": ["description"],


Operator PR #174 is still open. If it gets revised (e.g., this required constraint is dropped or changed), this sandbox PR would be pre-emptively tightened beyond what the operator enforces. Low risk since eval-only, but worth tracking — consider adding a comment like # Aligned with operator PR #174 so future readers know the provenance.

openshift-ci Bot requested review from blublinsky and joshuawilson June 19, 2026 15:13

joshuawilson reviewed Jun 23, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

evals: resync analysis schema with operator (drop estimatedImpact, tighten verification)#80

evals: resync analysis schema with operator (drop estimatedImpact, tighten verification)#80
harche wants to merge 1 commit into
openshift:mainfrom
harche:drop-estimated-impact

harche commented Jun 19, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 19, 2026

Review limit reached

Uh oh!

openshift-ci Bot commented Jun 19, 2026

Uh oh!

harche commented Jun 19, 2026

Uh oh!

openshift-ci Bot commented Jun 19, 2026

Uh oh!

joshuawilson left a comment

Uh oh!

joshuawilson left a comment

Uh oh!

joshuawilson Jun 23, 2026

Uh oh!

joshuawilson Jun 23, 2026

Uh oh!

joshuawilson Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

harche commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Why

Verification

Uh oh!

coderabbitai Bot commented Jun 19, 2026

Review limit reached

Uh oh!

openshift-ci Bot commented Jun 19, 2026

Uh oh!

harche commented Jun 19, 2026

Uh oh!

openshift-ci Bot commented Jun 19, 2026

Uh oh!

joshuawilson left a comment

Choose a reason for hiding this comment

Adversarial Review

Concern 1: "Resync" title is misleading

Concern 2: Tightening verified with one provider only

Concern 3: Operator PR #174 not merged

Bottom line

Uh oh!

joshuawilson left a comment

Choose a reason for hiding this comment

Uh oh!

joshuawilson Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

joshuawilson Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

joshuawilson Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

harche commented Jun 19, 2026 •

edited

Loading