calibrate: lexicon density_threshold 2.0 -> 3.0 (cut detector false-positives) #495
Merged
Approved calibration delta (ralplan run 2026-06-03-1359-33c4, Critic APPROVE).
Measured, fixtures-first: src/features stays deterministic; burstiness/MATTR/
ko-diagnostics UNCHANGED (burstiness-FP is a deferred separate delta).
Lever: DEFAULT_LEXICON_DENSITY_THRESHOLD 2.0 -> 3.0 (src/features/lexicon-core.js),
mirrored in .patina.default.yaml, SKILL.md, core/stylometry.md (doc-sync tests).
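
To make the lever concrete, here is a minimal sketch of how a density threshold can gate a lexicon signal. It is not the code in src/features/lexicon-core.js. The density definition (matches per 100 tokens), the tokenizer, the function names `lexiconDensity` and `lexiconFires`, and the `minMatches` default of 2 are all assumptions made for illustration; only the constant name and its 2.0 -> 3.0 change come from this PR.

```javascript
// Hypothetical sketch, not the project's implementation.
// Assumption: density = lexicon matches per 100 tokens.
const DEFAULT_LEXICON_DENSITY_THRESHOLD = 3.0; // was 2.0 before this PR

function lexiconDensity(text, lexicon) {
  // Naive tokenizer (assumption): runs of letters, digits, apostrophes.
  const tokens = text.toLowerCase().match(/[\p{L}\p{N}']+/gu) ?? [];
  if (tokens.length === 0) return { matches: 0, density: 0 };
  const terms = new Set(lexicon.map((t) => t.toLowerCase()));
  const matches = tokens.filter((t) => terms.has(t)).length;
  return { matches, density: (100 * matches) / tokens.length };
}

function lexiconFires(text, lexicon, opts = {}) {
  const {
    densityThreshold = DEFAULT_LEXICON_DENSITY_THRESHOLD,
    minMatches = 2, // hypothetical default; the real value is not stated here
  } = opts;
  const { matches, density } = lexiconDensity(text, lexicon);
  // The signal fires only when both the count and the density clear their bars.
  return matches >= minMatches && density >= densityThreshold;
}
```

Under these assumptions, a text whose density sits between 2.0 and 3.0 fires at the old threshold and stays quiet at the new one, which is the kind of borderline case the change is meant to stop flagging.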
The lexicon signal was largely a false-positive generator on modern text; the
C1 experiment (16-candidate grid over the 49 fixtures + private KO/EN corpus via
analyzeText opts) showed dT=3.0 is the smallest change delivering the clean win.
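
One way to read "smallest change delivering the clean win" is as a selection rule over the grid results. The sketch below assumes a "clean win" means human FP at or below some target with no loss of AI recall; that criterion, the `maxHumanFp` parameter, and the function name `pickSmallestChange` are hypothetical. Only the two EN rows shown are taken from this PR; the other grid candidates are not listed in the source and are not invented here.

```javascript
// Hypothetical selection rule, not the C1 experiment's actual code.
// Each candidate: { threshold, humanFp (percent), aiRecall (percent) }.
function pickSmallestChange(candidates, baseline, { maxHumanFp }) {
  // Keep candidates that meet the FP target without giving up recall.
  const acceptable = candidates.filter(
    (c) => c.humanFp <= maxHumanFp && c.aiRecall >= baseline.aiRecall
  );
  if (acceptable.length === 0) return null;
  // Among those, prefer the threshold closest to the baseline.
  const distance = (c) => Math.abs(c.threshold - baseline.threshold);
  return acceptable.reduce((best, c) => (distance(c) < distance(best) ? c : best));
}

// EN figures reported in this PR; the 5.0 target below is an assumption.
const baseline = { threshold: 2.0, humanFp: 15.0, aiRecall: 86.9 };
const candidates = [baseline, { threshold: 3.0, humanFp: 5.0, aiRecall: 86.9 }];
const chosen = pickSmallestChange(candidates, baseline, { maxHumanFp: 5.0 });
console.log(chosen.threshold); // 3
```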
Effect (re-scored manifests, current analyzer + dT=3.0):
- EN human FP 15.0% (30/200) -> 5.0% (10/200), AI recall unchanged at 86.9%.
- KO human FP 14.0% (35/250), recall unchanged at 59.2% (lexicon does not move
KO; KO FPs are burstiness-driven -> deferred. The prior 16.8% was the stale
2026-05-22 analyzer; re-scoring corrects it).
- 49 suspect-zone fixtures stay 100% accuracy / ROC-AUC + PR-AUC 1.000.
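
The percentages above follow directly from the counts in parentheses. A small check of that arithmetic (the helper name `ratePercent` is illustrative, not part of the repo):

```javascript
// Converts a count out of a total into a percentage string with one decimal.
function ratePercent(count, total) {
  if (total <= 0) throw new RangeError("total must be positive");
  return ((100 * count) / total).toFixed(1);
}

console.log(ratePercent(30, 200)); // "15.0"  EN human FP before
console.log(ratePercent(10, 200)); // "5.0"   EN human FP after
console.log(ratePercent(35, 250)); // "14.0"  KO human FP (unchanged by this PR)
```

In absolute terms the EN change is 20 fewer human texts flagged out of 200, a drop of 10 percentage points.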
Refreshed docs/benchmarks/rebaseline-{ko,en}-latest, rebaseline-low-fpr-{ko,en}-latest,
and audit notes. No raw text committed (0 text rows; check:no-private-assets OK).
Verify: npm test 766/766; npm run benchmark 100% / AUC 1.000; lint; release:check 4.3.0.
Summary
Approved calibration delta (ralplan run 2026-06-03-1359-33c4, Critic APPROVE). This is the deferred "separate approved-delta PR" from the corpus-expansion effort. Measured, fixtures-first; src/features stays deterministic; burstiness/MATTR/ko-diagnostics unchanged (burstiness-FP deferred to a separate delta).

Change

DEFAULT_LEXICON_DENSITY_THRESHOLD 2.0 → 3.0 (src/features/lexicon-core.js), mirrored in .patina.default.yaml, SKILL.md, core/stylometry.md (doc-sync parity tests enforce it). Single constant; minMatches unchanged.

The C1 experiment (16-candidate grid over the 49 fixtures + private KO/EN corpus via analyzeText opts overrides) showed the lexicon signal is largely a false-positive generator on modern text, and dT=3.0 is the smallest change delivering the clean win.

Effect (re-scored manifests)

- EN: human FP 15.0% (30/200) → 5.0% (10/200); AI recall unchanged at 86.9%.
- KO: human FP 14.0%* (35/250); recall unchanged at 59.2%.
- 49 suspect-zone fixtures: 100% accuracy, ROC-AUC and PR-AUC 1.000.

*KO 16.8% was the stale 2026-05-22 analyzer; re-scoring corrects it to 14.0%. The threshold change does not move KO; KO FPs are burstiness-driven and deferred.

Refreshed: rebaseline-{ko,en}-latest, rebaseline-low-fpr-{ko,en}-latest, audit notes. check:no-private-assets OK.

Verification

npm test 766/766 · npm run benchmark 100% / AUC 1.000 · npm run lint clean · npm run check:no-private-assets clean · npm run release:check 4.3.0 sync intact.
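
For readers unfamiliar with doc-sync parity checks like the one mentioned above, here is a rough sketch of the idea: read the value mirrored in a config file and compare it with the code constant. This is not the repo's test. The YAML key name `density_threshold` is inferred from the PR title, and the regex-based reader and the function names are assumptions; a real test would likely parse the file properly.

```javascript
// Hypothetical parity check, not the project's doc-sync test.
const DEFAULT_LEXICON_DENSITY_THRESHOLD = 3.0;

// Assumption: the config contains a line like "density_threshold: 3.0".
function readYamlThreshold(yamlText) {
  const m = yamlText.match(/^\s*density_threshold:\s*([0-9]+(?:\.[0-9]+)?)\s*$/m);
  return m ? Number(m[1]) : null;
}

function checkParity(yamlText, expected = DEFAULT_LEXICON_DENSITY_THRESHOLD) {
  const found = readYamlThreshold(yamlText);
  return { ok: found === expected, found, expected };
}

// Inline strings stand in for the real file contents.
console.log(checkParity("lexicon:\n  density_threshold: 3.0\n").ok); // true
console.log(checkParity("lexicon:\n  density_threshold: 2.0\n").ok); // false
```

A check of this shape fails as soon as the constant and a mirrored document drift apart, which is why a single-constant change still touches several files.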