feat(core): report inconclusive status when all tests have execution errors #894
Open
Description
Problem
When all eval tests fail due to execution errors (e.g., Not Found from a misconfigured model, network failures, auth errors), the run still reports as PASS or FAIL based on the score threshold. This is misleading — no actual evaluation happened, so the result should not be treated as a grading outcome.
Example:
```
1/3 ❌ violates-lightweight-core | default | ERROR: Not Found
2/3 ❌ violates-ai-first-design | default | ERROR: Not Found
3/3 ❌ follows-principles | default | ERROR: Not Found
```
This reports as FAIL (score below threshold), but the failure isn't due to low eval scores — it's because the provider couldn't be reached at all.
Proposed Behavior
- When all tests in a run have `execution_error` status, report an `inconclusive` or `error` exit code/status distinct from threshold failure
- The JUnit XML output should also reflect this (e.g., tests marked as `error` rather than `failure`)
- CLI should print a clear message: "All tests had execution errors — no evaluation was performed"
- Consider a distinct exit code (e.g., exit 2 for execution errors vs exit 1 for threshold failure) so CI workflows can differentiate
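The proposed behavior could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `EvalResult`, `RunStatus`, `classify_run`, and `EXIT_CODES` are hypothetical names, and the handling of mixed runs (errored tests count as a score of 0 in the mean) and of empty runs is an assumption made only so the example is complete.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class RunStatus(Enum):
    PASS = "pass"
    FAIL = "fail"
    INCONCLUSIVE = "inconclusive"


# Hypothetical mapping, following the "exit 2 vs exit 1" suggestion.
EXIT_CODES = {
    RunStatus.PASS: 0,
    RunStatus.FAIL: 1,          # threshold failure
    RunStatus.INCONCLUSIVE: 2,  # every test hit an execution error
}


@dataclass
class EvalResult:
    name: str
    status: str                    # "ok" or "execution_error"
    score: Optional[float] = None  # None when the test never ran


def classify_run(results: Sequence[EvalResult], threshold: float) -> RunStatus:
    """Return the overall run status for a list of test results."""
    # Assumption: an empty run is also inconclusive, since nothing was graded.
    if not results:
        return RunStatus.INCONCLUSIVE

    # The case this issue is about: no test was actually evaluated.
    if all(r.status == "execution_error" for r in results):
        return RunStatus.INCONCLUSIVE

    # Assumption: in mixed runs, errored tests contribute a score of 0.
    mean_score = sum(r.score or 0.0 for r in results) / len(results)
    return RunStatus.PASS if mean_score >= threshold else RunStatus.FAIL
```

A CLI entry point could then end with something like `sys.exit(EXIT_CODES[classify_run(results, threshold)])`, which lets a CI workflow branch on exit code 2 (for example, retry or alert on infrastructure) separately from exit code 1.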
Acceptance Criteria
- Distinct exit code when all tests are execution errors
- Clear CLI messaging distinguishing execution errors from grading failures
- JUnit XML uses `<error>` elements (not `<failure>`) for execution errors
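
For the JUnit criterion, a writer might distinguish the two cases along these lines. This is a sketch under assumptions: the result dictionaries and their keys (`name`, `status`, `passed`, `message`) and the `to_junit` function are hypothetical, and only Python's standard `xml.etree.ElementTree` is used. The `errors` and `failures` counters on `<testsuite>` follow the common JUnit XML convention.

```python
import xml.etree.ElementTree as ET


def to_junit(results, suite_name="eval"):
    """Build a JUnit-style <testsuite> string from a list of result dicts."""
    suite = ET.Element("testsuite", name=suite_name, tests=str(len(results)))
    errors = 0
    failures = 0

    for r in results:
        case = ET.SubElement(suite, "testcase", name=r["name"])
        if r["status"] == "execution_error":
            # The test never produced a gradable answer: report <error>.
            errors += 1
            ET.SubElement(case, "error", message=r.get("message", ""))
        elif not r.get("passed", False):
            # The test ran but scored below the threshold: report <failure>.
            failures += 1
            ET.SubElement(case, "failure", message=r.get("message", ""))

    suite.set("errors", str(errors))
    suite.set("failures", str(failures))
    return ET.tostring(suite, encoding="unicode")


if __name__ == "__main__":
    print(to_junit([
        {"name": "violates-lightweight-core", "status": "execution_error",
         "message": "Not Found"},
        {"name": "follows-principles", "status": "ok", "passed": True},
    ]))
```

Most CI dashboards display `<error>` and `<failure>` separately, so a provider outage would no longer look like a grading regression.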