From the loop-engineering measurement canon (medium priority). The Loop Scorecard (v1.4.0) now measures outcome quality, goal progress, gap resolution, and lesson application. The one canon metric not yet implemented is confidence calibration: for each domain, correlate confidence-at-decision (decision_log) against the realized outcome quality_multiplier (outcomes.json). In a well-calibrated domain, high-confidence cycles should yield high quality; a weak or negative correlation suggests Orient is updating confidence based on activity rather than value.
Compute every N cycles (e.g. in 6-C4): for each domain, the Pearson or sign correlation between confidence_at_decision and that cycle's quality_multiplier over the retained history, surfaced as a 'Calibration' line on the scorecard. This was deferred from the v1.4.0 measurement release to keep that PR focused; the required data (decision_log confidence and outcomes quality) already exists.
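
A minimal sketch of the per-domain computation, for discussion only. It assumes decision_log entries and outcomes.json entries can be joined on a shared cycle identifier; the key names "cycle" and "domain", the record shapes, and the function names are hypothetical and not taken from the actual schema. Only the Pearson variant is shown. Domains with fewer than three paired cycles, or with zero variance in either series, return None rather than a misleading number.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    n = len(xs)
    if n < 3:  # too few points to say anything about calibration
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:  # constant series: correlation is undefined
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def calibration_by_domain(decisions, outcomes):
    """Correlate confidence_at_decision with quality_multiplier per domain.

    decisions: iterable of dicts with hypothetical keys
               "cycle", "domain", "confidence_at_decision"
    outcomes:  iterable of dicts with hypothetical keys
               "cycle", "quality_multiplier"
    """
    quality = {o["cycle"]: o["quality_multiplier"] for o in outcomes}
    pairs = defaultdict(list)
    for d in decisions:
        q = quality.get(d["cycle"])
        if q is not None:  # skip decisions with no recorded outcome
            pairs[d["domain"]].append((d["confidence_at_decision"], q))
    return {
        domain: pearson([c for c, _ in pts], [q for _, q in pts])
        for domain, pts in pairs.items()
    }
```

The returned mapping (domain to correlation or None) could then be formatted into the scorecard's 'Calibration' line; how None is displayed (for example as "insufficient data") is left open here.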