QA: harden flight recorder, ROI and benchmark #47
Merged
Fix three defects found while exercising the v0.9.0 flight/ROI/benchmark modules against real stored state, and add edge-case coverage.

Defects fixed:
- benchmark.load_capture raised an uncaught FileNotFoundError on a missing capture path, producing a traceback instead of a clean CLI exit 1. It now raises ValueError (caught by cli.main) with a clear message, and reports malformed JSON and non-object JSON the same way.
- benchmark.compare_captures crashed with AttributeError when a capture had "metrics": None (the dict.get fallback only triggered on a missing key, not a None/non-dict value). Added _metrics_view to fall back to the flat capture whenever metrics is absent or not a mapping.
- roi.roi_summary crashed in provider aggregation if an event payload was not a dict. It now skips non-dict payloads defensively.

Tests added (tests/test_roi_benchmark.py, tests/test_evidence.py):
- fresh/uninitialized store: roi_summary and `roi --json` exit 0, no crash.
- flight-record on unknown task -> CLI exit 1 with "Unknown task".
- bare task (no worktree/gates/claims): task_metrics, capture, and flight record render without crashing.
- provider_usage aggregation across model-only, provider+model, and agent-only events, plus a non-dict payload that must be ignored.
- compare_captures fallback (flat capture, metrics=None) and all _verdict branches (tie/win/loss/merge_ready swing).
- benchmark compare CLI with missing and malformed JSON -> exit 1.

Test count: 15 -> 32 in the targeted files; full suite 303 passing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>