calibrate: lexicon density_threshold 2.0 -> 3.0 (cut detector false-positives) #495

Merged: devswha merged 1 commit into main from bot/calibration-lexicon-fp on Jun 14, 2026


devswha commented Jun 14, 2026

Summary

Approved calibration delta (ralplan run 2026-06-03-1359-33c4, Critic APPROVE). This is the deferred "separate approved-delta PR" from the corpus-expansion effort. The change is measured and fixtures-first: src/features stays deterministic, and burstiness, MATTR, and ko-diagnostics are unchanged (the burstiness false-positive fix is deferred to a separate delta).

Change

Raises DEFAULT_LEXICON_DENSITY_THRESHOLD from 2.0 to 3.0 in src/features/lexicon-core.js and mirrors the new value in .patina.default.yaml, SKILL.md, and core/stylometry.md (doc-sync parity tests enforce that these agree). Only this single constant changes; minMatches is unchanged.
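
The gate this constant controls can be pictured roughly as below. This is a hypothetical sketch, not the code in src/features/lexicon-core.js: it assumes "density" means lexicon hits per 100 words and that the signal fires only when both the density threshold and minMatches are met. The function name lexiconSignalFires and the placeholder minMatches value of 2 are invented for illustration.

```javascript
// Hypothetical sketch; NOT the actual src/features/lexicon-core.js code.
// Assumption: density = lexicon hits per 100 words, and the signal needs
// both the density threshold and a minimum raw match count.
const DEFAULT_LEXICON_DENSITY_THRESHOLD = 3.0; // was 2.0 before this PR
const HYPOTHETICAL_MIN_MATCHES = 2; // placeholder value; the PR leaves minMatches unchanged

function lexiconSignalFires(matchCount, wordCount, opts = {}) {
  const densityThreshold = opts.densityThreshold ?? DEFAULT_LEXICON_DENSITY_THRESHOLD;
  const minMatches = opts.minMatches ?? HYPOTHETICAL_MIN_MATCHES;
  if (wordCount <= 0) return false; // nothing to measure
  const densityPer100 = (matchCount / wordCount) * 100;
  return matchCount >= minMatches && densityPer100 >= densityThreshold;
}

// A 400-word text with 10 lexicon hits has density 2.5 per 100 words:
// it fired under the old 2.0 threshold but no longer fires at 3.0.
console.log(lexiconSignalFires(10, 400, { densityThreshold: 2.0 })); // true
console.log(lexiconSignalFires(10, 400)); // false
```

Under this model, raising the threshold can only remove firings (those with density in [2.0, 3.0)); it never adds new ones.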

The C1 experiment (a 16-candidate grid over the 49 fixtures plus the private KO/EN corpus, run through analyzeText opts overrides) showed that the lexicon signal is largely a false-positive generator on modern text, and that dT=3.0 is the smallest change that delivers the clean win: lower human FP with AI recall unchanged.
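
The "smallest change delivering the clean win" selection can be sketched as a sweep over candidate thresholds. This is a hypothetical harness, not the C1 experiment itself (which ran analyzeText with opts overrides): it assumes each document already carries a lexicon density and a human/AI label, ignores every other signal, and defines a "clean win" as recall at least equal to the baseline's with the human FP rate at or below a chosen target. The names scoreAt and smallestCleanWin and the toy data are invented.

```javascript
// Hypothetical sketch of a threshold sweep; NOT the actual C1 harness.
// Each doc has a precomputed lexicon density and a label ("human" | "ai").
// A doc counts as flagged when its density reaches the threshold.
function scoreAt(docs, threshold) {
  let fp = 0, humans = 0, tp = 0, ais = 0;
  for (const d of docs) {
    const flagged = d.density >= threshold;
    if (d.label === "human") { humans++; if (flagged) fp++; }
    else { ais++; if (flagged) tp++; }
  }
  return { threshold, fpRate: humans ? fp / humans : 0, recall: ais ? tp / ais : 0 };
}

// Walk candidates above the baseline from smallest to largest and return the
// first one that keeps recall at the baseline and meets the FP target.
function smallestCleanWin(docs, baseline, candidates, maxFpRate) {
  const base = scoreAt(docs, baseline);
  const sorted = [...candidates].filter((t) => t > baseline).sort((a, b) => a - b);
  for (const t of sorted) {
    const s = scoreAt(docs, t);
    if (s.recall >= base.recall && s.fpRate <= maxFpRate) return s;
  }
  return null; // no candidate satisfied both conditions
}

// Invented toy data, not figures from this PR.
const toyDocs = [
  { label: "human", density: 2.2 },
  { label: "human", density: 2.8 },
  { label: "human", density: 1.0 },
  { label: "human", density: 3.4 },
  { label: "ai", density: 4.5 },
  { label: "ai", density: 5.1 },
];
console.log(smallestCleanWin(toyDocs, 2.0, [2.5, 3.0, 3.5, 4.0], 0.25));
// → { threshold: 3, fpRate: 0.25, recall: 1 }
```

Preferring the smallest qualifying candidate keeps the change to the detector as conservative as possible.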

Effect (re-scored manifests)

               before           after            AI recall
EN human FP    15.0% (30/200)   5.0% (10/200)    86.9% (unchanged)
KO human FP    16.8%*           14.0% (35/250)   59.2% (unchanged)

*The KO 16.8% figure came from the stale 2026-05-22 analyzer; re-scoring with the current analyzer corrects it to 14.0%. The threshold change itself does not move KO: KO false positives are burstiness-driven, and that fix is deferred.
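
The FP rates in the table are plain ratios: human-written documents flagged as AI over all human-written documents. Recomputing them from the listed counts also gives the size of the EN drop in relative terms (the stale KO 16.8% has no counts listed, so it is left out):

```javascript
// Recompute the table's rates from the raw counts.
function pct(numerator, denominator) {
  return (100 * numerator / denominator).toFixed(1) + "%";
}

const enBefore = pct(30, 200); // "15.0%"
const enAfter = pct(10, 200);  // "5.0%"
const koAfter = pct(35, 250);  // "14.0%"
// Relative reduction in EN human false positives: (30 - 10) / 30.
const enRelativeCut = pct(30 - 10, 30); // "66.7%"
console.log({ enBefore, enAfter, koAfter, enRelativeCut });
```

So the EN change removes about two thirds of the human false positives while recall stays put.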

  • The 49 suspect-zone fixtures stay at 100% accuracy, with ROC-AUC and PR-AUC both 1.000.
  • Refreshed docs/benchmarks/rebaseline-{ko,en}-latest, rebaseline-low-fpr-{ko,en}-latest, and the audit notes.
  • No raw text committed (0 text rows); check:no-private-assets passes.
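
A ROC-AUC of 1.000 on the fixtures means, up to rounding, that every AI-labeled fixture outranks every human-labeled one. A minimal sketch of the standard rank-based (Mann-Whitney) ROC-AUC, assuming each fixture gets a numeric AI-likelihood score; this is not the benchmark script's code, and rocAuc and the scores are illustrative.

```javascript
// Rank-based ROC-AUC: the fraction of (positive, negative) pairs in which the
// positive (AI) example scores higher; ties count as half a win.
// Pairwise O(P * N) version, fine for a few dozen fixtures.
function rocAuc(scores, labels) {
  const pos = [], neg = [];
  scores.forEach((s, i) => (labels[i] ? pos : neg).push(s));
  if (pos.length === 0 || neg.length === 0) {
    throw new Error("need at least one positive and one negative example");
  }
  let wins = 0;
  for (const p of pos) {
    for (const n of neg) {
      if (p > n) wins += 1;
      else if (p === n) wins += 0.5;
    }
  }
  return wins / (pos.length * neg.length);
}

// Invented scores: label true = AI-written, false = human-written.
console.log(rocAuc([0.9, 0.8, 0.3, 0.1], [true, true, false, false])); // 1
console.log(rocAuc([0.9, 0.2, 0.3, 0.1], [true, true, false, false])); // 0.75
```

A single AI fixture scoring below a human one is enough to pull the value under 1, which is why an unchanged 1.000 is a useful regression check on the fixtures.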

Verification

  • npm test: 766/766 passing
  • npm run benchmark: 100% accuracy / AUC 1.000
  • npm run lint: clean
  • npm run check:no-private-assets: clean
  • npm run release:check: 4.3.0 sync intact

… false-positives

Approved calibration delta (ralplan run 2026-06-03-1359-33c4, Critic APPROVE).
Measured, fixtures-first: src/features stays deterministic; burstiness/MATTR/
ko-diagnostics UNCHANGED (burstiness-FP is a deferred separate delta).

Lever: DEFAULT_LEXICON_DENSITY_THRESHOLD 2.0 -> 3.0 (src/features/lexicon-core.js),
mirrored in .patina.default.yaml, SKILL.md, core/stylometry.md (doc-sync tests).
The lexicon signal was largely a false-positive generator on modern text; the
C1 experiment (16-candidate grid over the 49 fixtures + private KO/EN corpus via
analyzeText opts) showed dT=3.0 is the smallest change delivering the clean win.

Effect (re-scored manifests, current analyzer + dT=3.0):
- EN human FP 15.0% (30/200) -> 5.0% (10/200), AI recall unchanged at 86.9%.
- KO human FP 14.0% (35/250), recall unchanged at 59.2% (lexicon does not move
  KO; KO FPs are burstiness-driven -> deferred. The prior 16.8% was the stale
  2026-05-22 analyzer; re-scoring corrects it).
- 49 suspect-zone fixtures stay 100% accuracy / ROC-AUC + PR-AUC 1.000.

Refreshed docs/benchmarks/rebaseline-{ko,en}-latest, rebaseline-low-fpr-{ko,en}-latest,
and audit notes. No raw text committed (0 text rows; check:no-private-assets OK).

Verify: npm test 766/766; npm run benchmark 100% / AUC 1.000; lint; release:check 4.3.0.

vercel Bot commented Jun 14, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project   Deployment   Actions            Updated (UTC)
patina    Ready        Preview, Comment   Jun 14, 2026 2:09pm

devswha merged commit 4d32751 into main Jun 14, 2026
8 checks passed
devswha deleted the bot/calibration-lexicon-fp branch June 14, 2026 14:09