measurement: confidence calibration score (do confidence scores predict outcomes?) #56

From the loop-engineering measurement canon (medium priority). The Loop Scorecard (v1.4.0) now measures outcome quality, goal progress, gap resolution, and lesson application. The one canon metric not yet implemented is **confidence calibration**: for each domain, correlate confidence-at-decision (from decision_log) with the realized outcome quality_multiplier (from outcomes.json). In a well-calibrated domain, high-confidence cycles should yield high quality; miscalibration suggests that Orient is updating confidence based on activity rather than value.

Compute (every N cycles, e.g. in 6-C4): per domain, the Pearson or sign correlation between confidence_at_decision and the cycle's quality_multiplier over the retained history, surfaced as a 'Calibration' line on the scorecard. This was deferred from the v1.4.0 measurement release to keep that PR focused; the data (decision_log confidence and outcomes quality) already exists.
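
The computation described above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the record shapes (dicts with `domain`, `cycle`, `confidence_at_decision`, and `quality_multiplier` keys), the join on `(domain, cycle)`, the `min_samples` threshold, and the function names are all assumptions, since the issue does not specify the decision_log or outcomes.json schemas. "Sign correlation" is interpreted here as the mean product of the signs of each value's deviation from its mean, which is one of several reasonable readings.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation, or None when undefined (n < 2 or zero variance)."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def sign(v):
    return (v > 0) - (v < 0)


def sign_correlation(xs, ys):
    """Mean product of deviation signs, in [-1, 1] (assumed definition)."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    return sum(sign(x - mx) * sign(y - my) for x, y in zip(xs, ys)) / n


def calibration_by_domain(decisions, outcomes, min_samples=5):
    """Join decisions to outcomes on (domain, cycle); correlate per domain."""
    # Hypothetical schema: one outcome record per (domain, cycle).
    quality = {(o["domain"], o["cycle"]): o["quality_multiplier"] for o in outcomes}
    pairs = defaultdict(list)
    for d in decisions:
        key = (d["domain"], d["cycle"])
        if key in quality:  # skip decisions whose outcome is not recorded yet
            pairs[d["domain"]].append((d["confidence_at_decision"], quality[key]))
    report = {}
    for domain, rows in pairs.items():
        xs = [c for c, _ in rows]
        ys = [q for _, q in rows]
        enough = len(rows) >= min_samples  # too few points: report nothing
        report[domain] = {
            "n": len(rows),
            "pearson": pearson(xs, ys) if enough else None,
            "sign": sign_correlation(xs, ys) if enough else None,
        }
    return report


def calibration_line(report):
    """Render one possible 'Calibration' scorecard line from the report."""
    parts = []
    for domain in sorted(report):
        r = report[domain]["pearson"]
        parts.append(f"{domain} n/a" if r is None else f"{domain} r={r:+.2f}")
    return "Calibration: " + ", ".join(parts)
```

Returning `None` for domains with too little history or constant confidence keeps an undefined correlation from being shown as zero, which would otherwise read as "uncalibrated" rather than "not enough data".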
