feat(ko): uniform plain-다 ending-monotony hot signal — KO×GPT catch 45→82.5% (5.2.0) #498

Merged
devswha merged 2 commits into main from bot/ko-da-monotony-signal
Jun 15, 2026

devswha (Owner) commented Jun 15, 2026

Summary

Adds a KO-only deterministic hot signal, the uniform plain-다 register, that closes the biggest measured detection gap (KO×GPT catch rate, 45% before this change). Bumps to 5.2.0 (minor).

A paragraph is hot when all three conditions hold: declarative -다 endings dominate (ratio ≥ 0.6 and count ≥ 2), sentence lengths are uniform (burstiness CV below the low band), and the paragraph has ≥ 20 tokens. Unlike the standard burstiness trigger, it does not require 3 sentences, so it catches short, length-uniform AI Korean that the band gate skipped. The -다 and low-CV conjuncts spare formal human Korean (same -다, but varied sentence lengths → high CV) and conversational Korean (요/습니다); the 20-token floor spares terse snippets.
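The rule above can be sketched in a few lines. This is a minimal illustration, not the code in src/features/index.js: only the 0.6 ratio, the count of 2, and the 20-token floor come from this description. The `LOW_CV_BAND` value, the whitespace tokenizer, the punctuation-based sentence splitter, and the helper names (`splitSentences`, `tokenCount`, `isPoliteNida`, `endsInPlainDa`, `lengthCv`) are all assumptions made for the example.

```javascript
// Thresholds from the PR description.
const DA_RATIO_MIN = 0.6;
const DA_COUNT_MIN = 2;
const TOKEN_FLOOR = 20;
// Assumption: the real low-band cutoff is not stated here; 0.3 is a placeholder.
const LOW_CV_BAND = 0.3;

// Naive splitter (assumption): break after sentence-final punctuation.
function splitSentences(paragraph) {
  return paragraph
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Whitespace token count (assumption: the real tokenizer may differ).
function tokenCount(text) {
  return text.split(/\s+/).filter(Boolean).length;
}

// Polite -ㅂ니다/-습니다 forms: "니다" preceded by a syllable whose final
// consonant is ㅂ (jongseong index 17 in the Unicode Hangul syllable block).
function isPoliteNida(core) {
  if (!core.endsWith('니다') || core.length < 3) return false;
  const code = core.charCodeAt(core.length - 3);
  if (code < 0xac00 || code > 0xd7a3) return false;
  return (code - 0xac00) % 28 === 17;
}

// Plain declarative ending: ends in 다 once trailing punctuation is stripped,
// and is not a polite -니다 form.
function endsInPlainDa(sentence) {
  const core = sentence.replace(/[.!?"'”’)\]]+$/u, '');
  return core.endsWith('다') && !isPoliteNida(core);
}

// Coefficient of variation (population std / mean) of sentence lengths in tokens.
function lengthCv(sentences) {
  const lengths = sentences.map(tokenCount);
  const mean = lengths.reduce((a, b) => a + b, 0) / lengths.length;
  if (mean === 0) return 0;
  const variance =
    lengths.reduce((acc, n) => acc + (n - mean) ** 2, 0) / lengths.length;
  return Math.sqrt(variance) / mean;
}

// Hot only when all conjuncts hold; note there is no 3-sentence minimum.
function koreanEndingMonotony(paragraph) {
  const sentences = splitSentences(paragraph);
  if (sentences.length === 0) {
    return { hot: false, daCount: 0, daRatio: 0, cv: 0, tokens: 0 };
  }
  const daCount = sentences.filter(endsInPlainDa).length;
  const daRatio = daCount / sentences.length;
  const cv = lengthCv(sentences);
  const tokens = tokenCount(paragraph);
  const hot =
    daRatio >= DA_RATIO_MIN &&
    daCount >= DA_COUNT_MIN &&
    cv < LOW_CV_BAND &&
    tokens >= TOKEN_FLOOR;
  return { hot, daCount, daRatio, cv, tokens };
}
```

Because every condition is a conjunct, a single failing guard keeps the paragraph cold: polite endings drop the -다 count, varied sentence lengths push the CV above the band, and a short snippet fails the token floor.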

Implemented as a first-class signal (not the advisory koPostEditese payload), wired into the per-paragraph hot OR in src/features/index.js, mirrored in playground/analyzer.js for browser parity, and counted in rebaseline-score trigger counts.
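As a rough picture of what "wired into the per-paragraph hot OR" and "counted in trigger counts" mean, the sketch below treats each paragraph as a map from signal name to result. The shape of that map, the `paragraphIsHot` and `countTriggers` helpers, and the `burstiness` key are hypothetical; only the `koreanEndingMonotony` name appears in this PR.

```javascript
// Hypothetical shape: { signalName: { hot: boolean, ... }, ... } per paragraph.

// A paragraph is hot if ANY signal fires (logical OR across signals).
function paragraphIsHot(signals) {
  return Object.values(signals).some((s) => Boolean(s && s.hot));
}

// Tally how many paragraphs each signal fired on.
function countTriggers(paragraphs) {
  const counts = {};
  for (const signals of paragraphs) {
    for (const [name, result] of Object.entries(signals)) {
      if (result && result.hot) {
        counts[name] = (counts[name] || 0) + 1;
      }
    }
  }
  return counts;
}

// Example: the new signal fires where the burstiness trigger stays cold.
const exampleParagraphs = [
  { burstiness: { hot: false }, koreanEndingMonotony: { hot: true } },
  { burstiness: { hot: true }, koreanEndingMonotony: { hot: false } },
  { burstiness: { hot: false }, koreanEndingMonotony: { hot: false } },
];
const hotFlags = exampleParagraphs.map(paragraphIsHot);
const triggerCounts = countTriggers(exampleParagraphs);
```

Keeping the new signal as one more operand of the OR means existing triggers behave exactly as before; the signal can only add hot paragraphs, never remove them.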

Measured impact (KO rebaseline manifest, n=380, deterministic analyzer)

| metric | before | after |
| --- | --- | --- |
| KO×GPT catch | 45.0% | 82.5% |
| KO recall | 59.2% | 70.8% |
| KO accuracy | 77.6% | 80.8% |
| KO F1 | 0.644 | 0.716 |
| KO precision | 70.6% | 72.4% |
| human-control FPR | 12.8% | 14.0% (within published 11.6–21.7% CI) |

15 new true positives for 3 new false positives (5:1), so precision rose too. EN/ZH/JA output is byte-identical because the signal is KO-only. The frozen public claim manifests and the headline catch/FP claim will be refreshed on the next dedicated rebaseline pass, not here.
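The reported figures can be cross-checked against one consistent set of confusion counts. The counts below are not stated in this PR; they are inferred, assuming the n=380 manifest splits into 250 human controls (the number given in the derivation note) and 130 AI samples, with 15 new true positives and 3 new false positives. The `metrics` helper is illustrative only.

```javascript
// Standard binary-classification metrics from confusion counts.
function metrics({ tp, fp, fn, tn }) {
  const precision = tp / (tp + fp);
  const recall = tp / (tp + fn);
  const accuracy = (tp + tn) / (tp + fp + fn + tn);
  const f1 = (2 * precision * recall) / (precision + recall);
  const fpr = fp / (fp + tn);
  return { precision, recall, accuracy, f1, fpr };
}

// Inferred counts (assumption): 130 AI samples, 250 human controls.
const before = { tp: 77, fp: 32, fn: 53, tn: 218 };
// After: +15 true positives (fn drops by 15), +3 false positives (tn drops by 3).
const after = {
  tp: before.tp + 15,
  fp: before.fp + 3,
  fn: before.fn - 15,
  tn: before.tn - 3,
};

const pct = (x) => (x * 100).toFixed(1) + '%';
for (const [label, counts] of [['before', before], ['after', after]]) {
  const m = metrics(counts);
  console.log(
    label,
    'recall', pct(m.recall),
    'precision', pct(m.precision),
    'accuracy', pct(m.accuracy),
    'F1', m.f1.toFixed(3),
    'FPR', pct(m.fpr)
  );
}
```

Under those assumed counts every rounded value matches the reported before and after figures. The same arithmetic explains the KO×GPT row: 22 misses at a 45.0% catch rate implies 40 KO×GPT samples (18 caught), and 15 more catches gives 33/40 = 82.5%.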

How it was derived

Root cause: all 22 missed KO×GPT samples were single-paragraph snippets of 2–3 sentences, and KO detection relied entirely on burstiness, which needs ≥3 sentences. The other signals (MATTR, lexicon, ko-diagnostics) fire on ~0% of KO samples. The -다-monotony + low-CV separator was found by a length-matched comparison of the missed AI samples against 250 human controls. The 20-token floor and the low-CV conjunct were added after the signal over-fired on terse toy fixtures.

Verification

  • npm test: 787 pass / 0 fail (5 new ending-monotony unit tests incl. precision guards)
  • npm run benchmark: 49-fixture suite still 100% (natural KO fixtures stay cold)
  • npm run lint: syntax OK, cspell 0 issues
  • npm run release:check: OK for 5.2.0
  • npm run check:no-private-assets: OK
  • node ↔ playground parity confirmed on KO samples

Documented in core/stylometry.md (hot rule + calibration + failure mode) and SKILL.md.

devswha added 2 commits June 15, 2026 19:11
Adds a KO-only per-paragraph deterministic hot signal (koreanEndingMonotony):
fires when declarative -다 endings dominate (ratio >= 0.6, count >= 2) AND
burstiness CV is below the low band AND the paragraph has >= 20 tokens. Unlike
the standard burstiness trigger it does not require 3 sentences, so it catches
short, length-uniform AI Korean the band gate skipped, while the -다 + low-CV
conjuncts spare formal human Korean (varied lengths -> high CV) and
conversational Korean (요/습니다), and the 20-token floor spares terse snippets.

Implemented as a first-class signal (not the advisory koPostEditese payload),
wired into the per-paragraph hot OR in src/features/index.js, mirrored in
playground/analyzer.js for browser parity, and counted in rebaseline-score's
trigger_counts. KO rebaseline (n=380): KO×GPT catch 45.0->82.5%, recall
59.2->70.8%, F1 0.644->0.716, FPR 12.8->14.0% (within published CI); EN
unchanged. Documented in core/stylometry.md and SKILL.md with calibration.

Minor bump for the KO uniform plain-다 ending-monotony hot signal. KO-only
deterministic stylometry addition; en/zh/ja byte-identical; FP within published
tolerance. Syncs all version surfaces and adds the CHANGELOG 5.2.0 entry.

devswha merged commit 030ebce into main Jun 15, 2026
8 checks passed
devswha deleted the bot/ko-da-monotony-signal branch June 15, 2026 10:17