Skip to content

fix(SatisfactionCapture): don't fabricate a neutral 5 when there's no real rating#1370

Open
lewta wants to merge 3 commits into
danielmiessler:mainfrom
lewta:fix/satisfaction-skip-on-inference-failure
Open

fix(SatisfactionCapture): don't fabricate a neutral 5 when there's no real rating#1370
lewta wants to merge 3 commits into
danielmiessler:mainfrom
lewta:fix/satisfaction-skip-on-inference-failure

Conversation

@lewta

@lewta lewta commented Jun 20, 2026

Copy link
Copy Markdown

Problem

SatisfactionCapture.hook.ts writes a fabricated rating: 5 whenever it lacks a
real satisfaction signal, in three places:

  1. Inference call returns failure → wrote {rating:5,"Inference failed…"}.
  2. Inference call throws → wrote {rating:5,"Inference error…"}.
  3. Inference succeeds but the model omits/garbles the rating
    const rating = (… valid …) ? r.rating : 5; quietly substituted 5.

A fabricated 5 is "we don't know" masquerading as "neutral." These rows
accumulate and flatten every today/week/month average — the aggregator
(PAI/statusline-command.sh ~L1422-1438) flat-means .rating and never reads
.confidence, so low-confidence placeholders still count at full weight. In one
real install, 238 of 881 signals (27%) were "Inference failed" placeholders.

Fix

Introduce isValidRating(x) as the single source of truth (a real number in
1-10). Persist a rating only when it's valid; in every other case log for
observability and write nothing. The success path no longer coerces a
missing rating to 5. No behavior change when the model returns a valid rating.

Alternatives considered (and why skipping wins)

  • Lower the confidence on the placeholder row — no effect. The aggregator
    flat-means .rating and ignores .confidence; the failure rows are already
    written at confidence: 0.3 and still skewed every average.
  • Write a sentinel rating excluded from averages — would require teaching
    every rating consumer (the statusline jq, LearningPatternSynthesis.ts, …) to
    recognize and exclude it: a larger, multi-file change for no added benefit.

Skipping keeps the change small while preserving observability — failures are
still logged to stderr. If failures should be queryable, the right home is a
separate failure stream, not the ratings signal.

Testing evidence

Two tests under Releases/v5.0.0/.claude/hooks/tests/:

  • isValidRating.test.ts — unit test of the validity guard (accepts 1-10;
    rejects null/undefined/0/11/NaN/Infinity/strings/objects). bun test → 2 pass.
  • SatisfactionCapture.failure-path.test.sh — behavioral: forces a deterministic
    inference failure (removes claude from PATH), runs the hook against a
    throwaway temp PAI_DIR (never touches a real ~/.claude), asserts no row
    written. Before/after on identical input:
[PRE-FIX ] inference failure → wrote 1 row {"rating":5,"sentiment_summary":"Inference failed: …"}
[POST-FIX] inference failure → wrote 0 rows; stderr: "… skipping signal (no rating written)"
→ PASS (exit 0)

lewta added 3 commits June 20, 2026 09:09
… defaulting to 5

A failed/timed-out classification was being written to ratings.jsonl as a
neutral rating:5 row. These placeholder rows accumulate (in one running
install, ~27% of all signals) and drag every today/week/month average toward
5, making the satisfaction trend meaningless. Both failure branches now log
the error for observability and write no rating.
…skip

Reproducibly proves the failure path writes no ratings row (forces an
inference failure by removing claude from PATH, asserts 0 rows + skip log).
Writes only to a throwaway temp PAI_DIR; never touches a real ~/.claude tree.
… no valid 1-10 value

The success branch defaulted a missing/garbled rating to 5 (`? r.rating : 5`),
the same fabrication the failure branches did. Extract isValidRating() as the
single source of truth and only persist when the model returned a real 1-10
rating; otherwise log and skip. Guards main() behind import.meta.main so the
helper can be unit-tested without executing the hook. Adds isValidRating unit test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant