
fix(golden): refresh knob_step reference #296

Open
j0sh wants to merge 2 commits into main from codex/update-knob-step-reference


@j0sh j0sh commented Jun 25, 2026

Contributor

This requires an updated knob_step.tar.gz to be uploaded to Hugging Face; I'm not sure I have access to do that.


Summary

Refresh the knob_step golden reference from current DEMON main
and tighten its thresholds around the new canonical.

Also document that refs.json's per-scenario env block is
provenance only, not part of the golden pass/fail contract.

What Changed

  • Replaced the knob_step reference bundle hash and canonical hash in
    tests/golden/refs.json
  • Recalibrated knob_step thresholds for the refreshed canonical:
    mel_l2 0.08, rms_db_diff 0.1, win_cos_min 0.999
  • Updated the threshold note to explain the 2026-06-25 refresh and the
    observed same-pod / fresh-pod verification spread
  • Normalized the recorded pod URL to remove the signed query token
  • Added a README note clarifying that refs.json env metadata is
    provenance only and may be sparse for queued production-pod captures
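As an illustration of the pod URL normalization listed above, here is a minimal sketch that drops the query string (where a signed token would live) before the URL is recorded. The function name `strip_signed_query` and the example URL are hypothetical; the PR does not show how the harness performs this step.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_signed_query(url: str) -> str:
    """Return the URL with its query string and fragment removed.

    Hypothetical helper: assumes the signed token is carried entirely
    in the query string, so dropping the query is sufficient.
    """
    parts = urlsplit(url)
    # Keep scheme, host and path; blank out query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


if __name__ == "__main__":
    # Placeholder URL, not a real pod address.
    print(strip_signed_query("https://pod.example.invalid/run/1?token=abc"))
```

Dropping the whole query rather than a single named parameter avoids having to know the token's parameter name, at the cost of also discarding any non-secret query parameters.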

Why

The old 2026-06-04 knob_step canonical had drifted enough that clean
current runs were failing the calibrated mel_l2 <= 0.13 gate.

A fresh reference captured from DEMON main fixes that mismatch.
Same-pod repeat-3 noise remained tiny, while fresh-pod verification
runs against the new canonical stayed comfortably inside the updated
thresholds.
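The gate described here can be sketched roughly as follows. The threshold values are the ones listed in this PR; the function name, the shape of the `metrics` dict, and the comparison directions for `rms_db_diff` (upper bound on the absolute difference) and `win_cos_min` (lower bound) are assumptions inferred from the metric names, not taken from the harness.

```python
# Threshold values from this PR's refreshed knob_step entry.
KNOB_STEP_THRESHOLDS = {
    "mel_l2": 0.08,        # upper bound (the old gate was mel_l2 <= 0.13)
    "rms_db_diff": 0.1,    # assumed: upper bound on the absolute dB difference
    "win_cos_min": 0.999,  # assumed: lower bound on windowed cosine similarity
}


def failed_metrics(metrics: dict, thresholds: dict = KNOB_STEP_THRESHOLDS) -> list:
    """Return the names of metrics that violate their thresholds.

    Hypothetical helper; an empty list means the run passes the gate.
    """
    failures = []
    if metrics["mel_l2"] > thresholds["mel_l2"]:
        failures.append("mel_l2")
    if abs(metrics["rms_db_diff"]) > thresholds["rms_db_diff"]:
        failures.append("rms_db_diff")
    if metrics["win_cos_min"] < thresholds["win_cos_min"]:
        failures.append("win_cos_min")
    return failures
```

Only the metric values enter the comparison in this sketch; nothing from the per-scenario env block is consulted, which is consistent with the note that env metadata is provenance only.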

Verification

  • Reproduced the old failure against live pods before the refresh
  • Captured a fresh knob_step canonical from DEMON main
  • Ran a same-pod repeat-3 variance probe
  • Verified fresh-pod golden runs pass against the refreshed canonical

j0sh added 2 commits June 25, 2026 10:33
Refresh the knob_step canonical bundle from DEMON main after the 2026-06-04 reference drifted past the calibrated mel_l2 gate. Keep the fix scoped to refs.json: update the bundle/canonical hashes, tighten thresholds to the observed 2026-06-25 same-pod and fresh-pod runs, and strip the signed pod query token from the recorded pod URL.
Document that refs.json env metadata is provenance only and not part of the golden comparison contract. This matches the current harness behavior and explains why queued production-pod captures may carry sparse or normalized env fields without affecting pass/fail semantics.
@j0sh j0sh requested a review from ryanontheinside June 25, 2026 18:23