3 changes: 2 additions & 1 deletion evals/schemas.py
@@ -50,7 +50,6 @@
"enum": ["low", "medium", "high", "critical"],
},
"reversible": {"type": "boolean"},
"estimatedImpact": {"type": "string"},

Removal is clean: `estimatedImpact` was never in `proposal.required`, so this is a pure relaxation. Models that still produce the field will still validate, since the schema does not set `additionalProperties: false`.

Worth noting: right next to this removal, `reversible` is `{"type": "boolean"}`, while the operator defines it as `{"type": "string", "enum": ["Reversible", "Irreversible", "Partial"]}`. That's a bigger schema/operator divergence than `estimatedImpact` ever was: a model trained on this eval schema would produce the wrong type at runtime.
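
To make the `reversible` mismatch concrete, here is a minimal sketch. The two fragments are the ones quoted above; the `accepts` helper is a hypothetical hand-rolled check that only understands `type` and `enum`, standing in for a real JSON Schema validator.

```python
# Fragments quoted in the comment above; everything else is illustrative.
EVAL_REVERSIBLE = {"type": "boolean"}
OPERATOR_REVERSIBLE = {
    "type": "string",
    "enum": ["Reversible", "Irreversible", "Partial"],
}

# Maps the two JSON Schema type names used here to Python types.
_PY_TYPES = {"boolean": bool, "string": str}


def accepts(fragment, value):
    """Hypothetical helper: checks only `type` and `enum`, nothing else."""
    if not isinstance(value, _PY_TYPES[fragment["type"]]):
        return False
    # With no `enum` in the fragment, any value of the right type passes.
    return value in fragment.get("enum", [value])


eval_style = True            # what a model following the eval schema emits
operator_style = "Partial"   # what the operator definition expects

print(accepts(EVAL_REVERSIBLE, eval_style))          # True
print(accepts(OPERATOR_REVERSIBLE, eval_style))      # False
print(accepts(OPERATOR_REVERSIBLE, operator_style))  # True
print(accepts(EVAL_REVERSIBLE, operator_style))      # False
```

No single value satisfies both definitions, so aligning them would mean changing one side rather than loosening either.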

},
"required": ["description", "actions", "risk", "reversible"],
},
@@ -69,6 +68,7 @@
"expected": {"type": "string"},
"type": {"type": "string"},
},
"required": ["name", "type"],

This tightening was only verified with Claude (`-k claude`). `output_schema` is passed to each provider and `jsonschema.validate` runs post hoc on the result, so if Gemini or OpenAI omit `name` or `type` in step items, their evals will start failing after this merge. It might be worth a quick `make eval` across all providers, or at minimum noting this as a known risk.
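
As a rough sketch of what a cross-provider spot check could look like before merging, assuming saved step outputs are available per provider: the helper name, the provider labels, and the sample data below are all made up for illustration and are not real eval results.

```python
# Keys made mandatory for step items by this change.
REQUIRED_STEP_KEYS = ("name", "type")


def steps_missing_keys(steps, required=REQUIRED_STEP_KEYS):
    """Return (index, missing_keys) for every step item lacking a required key."""
    problems = []
    for index, step in enumerate(steps):
        missing = [key for key in required if key not in step]
        if missing:
            problems.append((index, missing))
    return problems


# Made-up outputs keyed by placeholder provider labels (NOT real results).
sample_outputs = {
    "provider_a": [
        {"name": "check pods", "type": "command", "expected": "Running"},
    ],
    "provider_b": [
        {"type": "command", "expected": "Running"},  # no "name"
        {"name": "verify rollout"},                  # no "type"
    ],
}

report = {
    provider: steps_missing_keys(steps)
    for provider, steps in sample_outputs.items()
}

for provider, problems in report.items():
    print(provider, "ok" if not problems else problems)
```

A non-empty entry in `report` would flag a provider whose evals are likely to start failing once `name` and `type` are required.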

},
},
"rollbackPlan": {
@@ -79,6 +79,7 @@
},
},
},
"required": ["description"],

Operator PR #174 is still open. If it gets revised (e.g., this `required` constraint is dropped or changed), this sandbox PR would end up stricter than what the operator enforces. Low risk since this is eval-only, but worth tracking. Consider adding a comment like `# Aligned with operator PR #174` so future readers know the provenance.
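
For illustration, the provenance comment could sit directly above the constraint. Only the `required` list comes from the diff; the hunk does not show which object it belongs to, so the name `EXAMPLE_FRAGMENT` and the `description` property type are assumptions.

```python
# Hypothetical fragment: only the "required" list is taken from the diff.
EXAMPLE_FRAGMENT = {
    "type": "object",
    "properties": {
        "description": {"type": "string"},  # assumed type, not shown in the hunk
    },
    # Aligned with operator PR #174 (still open at review time);
    # revisit if that PR drops or changes this constraint.
    "required": ["description"],
}
```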

},
"rbac": {
"type": "object",