fix(SatisfactionCapture): don't fabricate a neutral 5 when there's no real rating#1370
Open
lewta wants to merge 3 commits into
Open
fix(SatisfactionCapture): don't fabricate a neutral 5 when there's no real rating#1370lewta wants to merge 3 commits into
lewta wants to merge 3 commits into
Conversation
… defaulting to 5 A failed/timed-out classification was being written to ratings.jsonl as a neutral rating:5 row. These placeholder rows accumulate (in one running install, ~27% of all signals) and drag every today/week/month average toward 5, making the satisfaction trend meaningless. Both failure branches now log the error for observability and write no rating.
…skip Reproducibly proves the failure path writes no ratings row (forces an inference failure by removing claude from PATH, asserts 0 rows + skip log). Writes only to a throwaway temp PAI_DIR; never touches a real ~/.claude tree.
… no valid 1-10 value The success branch defaulted a missing/garbled rating to 5 (`? r.rating : 5`), the same fabrication the failure branches did. Extract isValidRating() as the single source of truth and only persist when the model returned a real 1-10 rating; otherwise log and skip. Guards main() behind import.meta.main so the helper can be unit-tested without executing the hook. Adds isValidRating unit test.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
SatisfactionCapture.hook.tswrites a fabricatedrating: 5whenever it lacks areal satisfaction signal, in three places:
{rating:5,"Inference failed…"}.{rating:5,"Inference error…"}.const rating = (… valid …) ? r.rating : 5;quietly substituted 5.A fabricated 5 is "we don't know" masquerading as "neutral." These rows
accumulate and flatten every today/week/month average — the aggregator
(
PAI/statusline-command.sh~L1422-1438) flat-means.ratingand never reads.confidence, so low-confidence placeholders still count at full weight. In onereal install, 238 of 881 signals (27%) were
"Inference failed"placeholders.Fix
Introduce
isValidRating(x)as the single source of truth (a real number in1-10). Persist a rating only when it's valid; in every other case log for
observability and write nothing. The success path no longer coerces a
missing rating to 5. No behavior change when the model returns a valid rating.
Alternatives considered (and why skipping wins)
flat-means
.ratingand ignores.confidence; the failure rows are alreadywritten at
confidence: 0.3and still skewed every average.every rating consumer (the statusline jq,
LearningPatternSynthesis.ts, …) torecognize and exclude it: a larger, multi-file change for no added benefit.
Skipping keeps the change small while preserving observability — failures are
still logged to stderr. If failures should be queryable, the right home is a
separate failure stream, not the ratings signal.
Testing evidence
Two tests under
Releases/v5.0.0/.claude/hooks/tests/:isValidRating.test.ts— unit test of the validity guard (accepts 1-10;rejects null/undefined/0/11/NaN/Infinity/strings/objects).
bun test→ 2 pass.SatisfactionCapture.failure-path.test.sh— behavioral: forces a deterministicinference failure (removes
claudefromPATH), runs the hook against athrowaway temp
PAI_DIR(never touches a real~/.claude), asserts no rowwritten. Before/after on identical input: