Child of #75. Builds on Phase 1 (#79). Full suite + automated aggregation + first tracked baseline.
Goal
Run the full task set end to end, aggregate, and commit a baseline we can diff over time (same discipline as the Rudder playground/bench baseline).
Deliverables
Out of scope
Charts/visualization are optional and can follow.
Child of #75. Builds on Phase 1 (#79). Full suite + automated aggregation + first tracked baseline.
Goal
Run the full task set end to end, aggregate, and commit a baseline we can diff over time (same discipline as the Rudder
playground/benchbaseline).Deliverables
aggregator: median time, median interventions, success rate per (framework, task) from the rawreport.jsonfiles.framework | median time | median interventions | success rate | n.Out of scope
Charts/visualization are optional and can follow.