feat(ko): uniform plain-다 ending-monotony hot signal — KO×GPT catch 45→82.5% (5.2.0) by devswha · Pull Request #498 · devswha/patina

devswha · 2026-06-15T10:12:48Z

Summary

Adds a KO-only deterministic hot signal, the uniform plain-다 register, that narrows the biggest measured detection gap (KO×GPT catch was 45%). Bumps to 5.2.0 (minor).

A paragraph is hot when all three conditions hold: declarative -다 endings dominate (ratio ≥ 0.6 and count ≥ 2), sentence lengths are uniform (burstiness CV below the low band), and the paragraph has ≥ 20 tokens. Unlike the standard burstiness trigger, it does not require 3 sentences, so it catches the short, length-uniform AI Korean that the band gate skipped. The -다 and low-CV conjuncts spare formal human Korean (same -다 endings, but varied sentence lengths and therefore high CV) and conversational Korean (요/습니다 endings); the 20-token floor spares terse snippets.
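For reviewers, a minimal sketch of the rule as described above. It is not the code in src/features/index.js. The 0.6 ratio, the count of 2, and the 20-token floor come from this description. Everything else is an assumption made for illustration: the `LOW_CV_BAND` value, whitespace tokenization, the sentence splitter, the population-variance CV, and the helper names.

```javascript
// Hypothetical sketch of the ending-monotony rule; not the shipped implementation.
const DA_RATIO_MIN = 0.6; // from the PR description
const DA_COUNT_MIN = 2;   // from the PR description
const TOKEN_FLOOR = 20;   // from the PR description
const LOW_CV_BAND = 0.3;  // ASSUMED: the PR does not state the low-band cutoff

// ASSUMED splitter: break on whitespace that follows sentence-final punctuation.
function splitSentences(paragraph) {
  return paragraph
    .split(/(?<=[.!?。])\s+/)
    .map((s) => s.trim())
    .filter(Boolean);
}

// ASSUMED tokenizer: whitespace-delimited tokens.
function tokenize(text) {
  return text.split(/\s+/).filter(Boolean);
}

// Coefficient of variation (population standard deviation / mean).
function coefficientOfVariation(values) {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  if (mean === 0) return 0;
  const variance =
    values.reduce((acc, v) => acc + (v - mean) ** 2, 0) / values.length;
  return Math.sqrt(variance) / mean;
}

// ASSUMED test for a plain declarative ending: the sentence ends in 다,
// but not in the polite 니다 (습니다/ㅂ니다), which the PR says is spared.
function endsInPlainDa(sentence) {
  const stripped = sentence.replace(/[.!?。"'”’)\]\s]+$/u, "");
  return stripped.endsWith("다") && !stripped.endsWith("니다");
}

function koreanEndingMonotonySketch(paragraph) {
  const sentences = splitSentences(paragraph);
  const tokenCount = tokenize(paragraph).length;
  if (sentences.length < 2 || tokenCount < TOKEN_FLOOR) return false;

  const daCount = sentences.filter(endsInPlainDa).length;
  const daRatio = daCount / sentences.length;
  const cv = coefficientOfVariation(sentences.map((s) => tokenize(s).length));

  // All three conjuncts must hold; note there is no 3-sentence minimum.
  return daCount >= DA_COUNT_MIN && daRatio >= DA_RATIO_MIN && cv < LOW_CV_BAND;
}
```

Under these assumptions, a two-sentence paragraph with equal-length plain-다 sentences fires, while the same text in 습니다 register, a short 4-token snippet, or a pair of very unequal sentences stays cold.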

Implemented as a first-class signal (not the advisory koPostEditese payload), wired into the per-paragraph hot OR in src/features/index.js, mirrored in playground/analyzer.js for browser parity, and counted in rebaseline-score trigger counts.
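A rough sketch of what "wired into the per-paragraph hot OR" and "counted in trigger counts" could look like. Only `koreanEndingMonotony` is a name taken from this PR; the other signal names (`burstinessLow`, `mattrLow`, `lexiconHit`), the record shape, and both function names are hypothetical and do not describe the actual src/features/index.js.

```javascript
// Hypothetical wiring; field and function names other than
// koreanEndingMonotony are illustrative assumptions.
function isParagraphHot(signals, lang) {
  const sharedHot = Boolean(
    signals.burstinessLow || signals.mattrLow || signals.lexiconHit
  );
  // The KO-only signal joins the OR only for Korean text, so other
  // languages evaluate the same expression as before the change.
  const koHot = lang === "ko" && Boolean(signals.koreanEndingMonotony);
  return sharedHot || koHot;
}

// Tally how many paragraphs each signal fired on.
function countTriggers(paragraphs, lang) {
  const counts = {
    burstinessLow: 0,
    mattrLow: 0,
    lexiconHit: 0,
    koreanEndingMonotony: 0,
  };
  for (const signals of paragraphs) {
    for (const name of Object.keys(counts)) {
      if (name === "koreanEndingMonotony" && lang !== "ko") continue;
      if (signals[name]) counts[name] += 1;
    }
  }
  return counts;
}
```

Gating on the language inside the OR is one way to keep EN/ZH/JA output unchanged; the real code may gate earlier, for example by never computing the signal for non-Korean input.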

Measured impact (KO rebaseline manifest, n=380, deterministic analyzer)

metric	before → after
KO×GPT catch	45.0% → 82.5%
KO recall	59.2% → 70.8%
KO accuracy	77.6% → 80.8%
KO F1	0.644 → 0.716
KO precision	70.6% → 72.4%
human-control FPR	12.8% → 14.0% (within published 11.6–21.7% CI)

15 new true positives for 3 new false positives (5:1), so precision rose too. EN/ZH/JA are byte-identical (KO-only). The frozen public claim manifests and the headline catch/FP claim are refreshed on the next dedicated rebaseline pass, not here.
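The table can be cross-checked with a few lines of arithmetic. The confusion counts below are not taken from the rebaseline manifest: they are back-calculated from the reported percentages, n=380, the 250 human controls, and the 15 new TP / 3 new FP, so treat them as an assumption. The helper names are hypothetical.

```javascript
// Back-calculated counts (ASSUMED, not read from the manifest):
// 380 samples = 250 human controls + 130 AI samples.
const HUMAN_TOTAL = 250;
const AI_TOTAL = 380 - HUMAN_TOTAL;

function metrics({ tp, fp }) {
  const tn = HUMAN_TOTAL - fp;
  const recall = tp / AI_TOTAL;
  const precision = tp / (tp + fp);
  return {
    recall,
    precision,
    accuracy: (tp + tn) / (AI_TOTAL + HUMAN_TOTAL),
    f1: (2 * precision * recall) / (precision + recall),
    fpr: fp / HUMAN_TOTAL,
  };
}

const pct = (x) => (x * 100).toFixed(1) + "%";

const before = metrics({ tp: 77, fp: 32 });
const after = metrics({ tp: 77 + 15, fp: 32 + 3 }); // 15 new TP, 3 new FP

// KO×GPT subset: 22 misses at 45% catch implies 40 samples (ASSUMED).
const gptCatchBefore = 18 / 40;
const gptCatchAfter = (18 + 15) / 40;

console.log("recall", pct(before.recall), "->", pct(after.recall));
console.log("precision", pct(before.precision), "->", pct(after.precision));
console.log("accuracy", pct(before.accuracy), "->", pct(after.accuracy));
console.log("F1", before.f1.toFixed(3), "->", after.f1.toFixed(3));
console.log("FPR", pct(before.fpr), "->", pct(after.fpr));
console.log("KO×GPT catch", pct(gptCatchBefore), "->", pct(gptCatchAfter));
```

With these counts every row of the table reproduces, and the KO×GPT figures are consistent with all 15 new true positives falling in that subset.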

How it was derived

Root cause: all 22 missed KO×GPT samples were single-paragraph 2–3 sentence snippets, and KO detection relied entirely on burstiness (which needs ≥3 sentences); MATTR/lexicon/ko-diagnostics fire ~0% for KO. The -다-monotony + low-CV separator was found by length-matched comparison of missed-AI vs 250 human controls, and the 20-token floor + low-CV conjunct were added after the signal over-fired on terse toy fixtures.
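To illustrate the root cause, here is a hedged sketch of a burstiness trigger with a 3-sentence minimum. The minimum comes from the text above; the `LOW_CV_BAND` cutoff, the population-variance CV, and the function name are assumptions, not the analyzer's actual code.

```javascript
const MIN_SENTENCES = 3; // from the PR: burstiness needs >= 3 sentences
const LOW_CV_BAND = 0.3; // ASSUMED cutoff for the low band

// Takes per-sentence token counts for one paragraph.
function burstinessTriggerSketch(sentenceLengths) {
  if (sentenceLengths.length < MIN_SENTENCES) {
    // Too few sentences: the gate skips the paragraph entirely.
    return { evaluated: false, hot: false };
  }
  const mean =
    sentenceLengths.reduce((a, b) => a + b, 0) / sentenceLengths.length;
  const variance =
    sentenceLengths.reduce((acc, v) => acc + (v - mean) ** 2, 0) /
    sentenceLengths.length;
  const cv = mean === 0 ? 0 : Math.sqrt(variance) / mean;
  return { evaluated: true, hot: cv < LOW_CV_BAND, cv };
}

// A perfectly uniform 2-sentence snippet is never evaluated,
// which is the gap the new signal targets.
console.log(burstinessTriggerSketch([11, 11]).evaluated);
```

A paragraph of two 11-token sentences has a CV of 0, yet it never reaches the CV comparison, which matches the pattern of the missed samples described above.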

Verification

npm test — 787 pass / 0 fail (5 new ending-monotony unit tests incl. precision guards)
npm run benchmark — 49-fixture suite still 100% (natural KO fixtures stay cold)
npm run lint — syntax OK, cspell 0 issues
npm run release:check — OK for 5.2.0
npm run check:no-private-assets — OK
node ↔ playground parity confirmed on KO samples

Documented in core/stylometry.md (hot rule + calibration + failure mode) and SKILL.md.

Adds a KO-only per-paragraph deterministic hot signal (koreanEndingMonotony): fires when declarative -다 endings dominate (ratio >= 0.6, count >= 2) AND burstiness CV is below the low band AND the paragraph has >= 20 tokens. Unlike the standard burstiness trigger it does not require 3 sentences, so it catches short, length-uniform AI Korean the band gate skipped, while the -다 + low-CV conjuncts spare formal human Korean (varied lengths -> high CV) and conversational Korean (요/습니다), and the 20-token floor spares terse snippets. Implemented as a first-class signal (not the advisory koPostEditese payload), wired into the per-paragraph hot OR in src/features/index.js, mirrored in playground/analyzer.js for browser parity, and counted in rebaseline-score's trigger_counts. KO rebaseline (n=380): KO×GPT catch 45.0->82.5%, recall 59.2->70.8%, F1 0.644->0.716, FPR 12.8->14.0% (within published CI); EN unchanged. Documented in core/stylometry.md and SKILL.md with calibration.

Minor bump for the KO uniform plain-다 ending-monotony hot signal. KO-only deterministic stylometry addition; en/zh/ja byte-identical; FP within published tolerance. Syncs all version surfaces and adds the CHANGELOG 5.2.0 entry.

vercel · 2026-06-15T10:12:50Z

Project	Deployment	Actions	Updated (UTC)
patina	Ready	Preview, Comment	Jun 15, 2026 10:12am

devswha added 2 commits June 15, 2026 19:11

chore(release): 5.2.0

f9d94ef

vercel Bot deployed to Preview June 15, 2026 10:12

devswha merged commit 030ebce into main Jun 15, 2026
8 checks passed

devswha deleted the bot/ko-da-monotony-signal branch June 15, 2026 10:17

devswha merged 2 commits into main from bot/ko-da-monotony-signal