
corpus: KO collection wave (measure-only, G007) #491

Merged. devswha merged 1 commit into main from bot/corpus-ko-wave1 on Jun 14, 2026.

devswha (Owner) commented Jun 14, 2026

Summary

Wave 1 of the approved corpus-expansion plan (.gjc/plans/ralplan/2026-06-03-1359-33c4). Measure-only: no detector threshold change, no src/features change.

What's in the manifest

artifacts/rebaseline-2025/manifest.ko.scored.public.jsonl — 380 hash-only rows:

  • 250 natural-human controls (5 registers × 50, reused)
  • 120 ai-like positives across 3 model families (gpt 40 / claude 40 / gemini 40)
  • 5 lightly-edited-ai + 5 heavily-edited-ai (one light + one heavy per register)

Raw text stays in the gitignored private workspace; only hashes, metadata, and scores are committed, and .gitignore allowlists the new hash-only manifest.
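Because the committed manifest is hash-only, one cheap sanity check is to tally rows per bucket against the composition listed above and reject any row that still carries raw text. This is a minimal sketch under stated assumptions: the row schema (sha256, label, register, optional family) and the "text" field name are hypothetical, and the repo's real manifest fields and its check:no-private-assets script may work differently.

```typescript
// Hypothetical manifest row: field names are assumptions, not the real schema.
interface ManifestRow {
  sha256: string;
  label: "human" | "ai" | "light-edit" | "heavy-edit";
  register: string;
  family?: "gpt" | "claude" | "gemini";
}

// Expected bucket sizes for the KO wave-1 manifest (380 rows in total).
const EXPECTED: Record<string, number> = {
  human: 250,
  "ai:gpt": 40,
  "ai:claude": 40,
  "ai:gemini": 40,
  "light-edit": 5,
  "heavy-edit": 5,
};

// Count rows per bucket; AI positives are split by model family.
function tally(jsonl: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    const row = JSON.parse(line) as ManifestRow;
    // A hash-only manifest must never carry raw text ("text" is an assumed field name).
    if ("text" in row) throw new Error(`raw text present in row ${row.sha256}`);
    const key = row.label === "ai" && row.family ? `ai:${row.family}` : row.label;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Return a list of mismatches; an empty list means the composition matches.
function checkComposition(counts: Map<string, number>): string[] {
  const problems: string[] = [];
  for (const [key, want] of Object.entries(EXPECTED)) {
    const got = counts.get(key) ?? 0;
    if (got !== want) problems.push(`${key}: expected ${want}, got ${got}`);
  }
  for (const key of counts.keys()) {
    if (!(key in EXPECTED)) problems.push(`unexpected bucket ${key}`);
  }
  return problems;
}
```

Edited-AI rows are bucketed by edit level only, not by family; that split is also an assumption about how the manifest is organized.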

Findings (docs/benchmarks/)

  • rebaseline-ko-latest.{md,json}: accuracy 75.0%, recall 59.2%, false-positive rate 16.8%; catch rate by family gpt 50% / claude 62.5% / gemini 67.5%. The public claim gate stays BLOCKED because per-family n is below 100, which the measure-only plan lists as an explicit Non-Goal.
  • rebaseline-low-fpr-ko-latest.{md,json}: B4 TPR at 1% and 5% FPR, overall (ko) and per register (ko × register). Overall TPR at 5% FPR is 0.0%: high-scoring human controls block low-FPR operation. This is the honest "corpus is hard" outcome, and it motivates a future, separately approved calibration delta.
  • rebaseline-audit-ko-latest.md: an operator audit of perfect and boundary samples found 0 mislabeled and 0 too-easy.
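The headline numbers above are mutually consistent. Assuming recall is computed over all 130 AI-authored rows (120 ai-like plus 10 edited) and the FP rate over the 250 human controls, the counts TP 77, FN 53, FP 42, TN 208 reproduce 75.0% / 59.2% / 16.8%, and the family catch rates correspond to 20, 25, and 27 of 40. These counts are back-derived from the published percentages, not read from the report. The sketch also shows one common way to compute TPR at a fixed FPR, which makes it clear how a few human controls tied at the top score drive TPR to 0. The strict "score > threshold" rule, the tie handling, and the helper names are assumptions; the repo's B4 computation may differ.

```typescript
// Confusion counts for a binary "flag as AI" decision.
interface Confusion {
  tp: number;
  fn: number;
  fp: number;
  tn: number;
}

function headlineRates(c: Confusion): { accuracy: number; recall: number; fpRate: number } {
  const total = c.tp + c.fn + c.fp + c.tn;
  return {
    accuracy: (c.tp + c.tn) / total,
    recall: c.tp / (c.tp + c.fn), // share of AI-authored rows flagged
    fpRate: c.fp / (c.fp + c.tn), // share of human controls flagged
  };
}

// Back-derived (assumed) counts: 130 AI-authored rows, 250 human controls.
const koWave1: Confusion = { tp: 77, fn: 53, fp: 42, tn: 208 };

const pct = (x: number): string => (x * 100).toFixed(1);

// TPR at a target FPR: scan candidate thresholds, keep those whose FPR on
// human scores stays within budget, and report the best TPR among them.
// Decision rule (assumed): flag a sample when score > threshold.
function tprAtFpr(humanScores: number[], aiScores: number[], targetFpr: number): number {
  const candidates = [-Infinity, ...humanScores, ...aiScores, Infinity];
  let best = 0;
  for (const t of candidates) {
    const fp = humanScores.filter((s) => s > t).length;
    if (fp / humanScores.length > targetFpr) continue;
    const tp = aiScores.filter((s) => s > t).length;
    best = Math.max(best, tp / aiScores.length);
  }
  return best;
}

// Toy scores (invented for illustration): two human controls tie at the maximum.
const toyHuman = [0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.5, 0.6, 1.0, 1.0];
const toyAi = [0.7, 0.8, 0.9, 1.0];

const r = headlineRates(koWave1);
// → accuracy 75.0%, recall 59.2%, FP 16.8%
console.log(`accuracy ${pct(r.accuracy)}%, recall ${pct(r.recall)}%, FP ${pct(r.fpRate)}%`);
console.log(`toy TPR@5%FPR = ${tprAtFpr(toyHuman, toyAi, 0.05)}`); // → 0
console.log(`toy TPR@20%FPR = ${tprAtFpr(toyHuman, toyAi, 0.2)}`); // → 1
```

In the toy data, two of ten human controls share the top score, so any FPR budget under 20% pushes the threshold up to that score and nothing is flagged. The PR reports the same kind of blockage on the real ko corpus at 5% FPR.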

Verification

  • npm test: 766/766 pass
  • npm run benchmark: 100% / ROC-AUC 1.000 / PR-AUC 1.000 (baseline fixtures unchanged)
  • benchmark:report, benchmark:robustness, check:no-private-assets, and lint all pass

vercel Bot (Contributor) commented Jun 14, 2026

Vercel deployment for project patina: Ready (updated Jun 14, 2026 11:59am UTC).

devswha merged commit c877a97 into main on Jun 14, 2026 (8 checks passed).
devswha deleted the bot/corpus-ko-wave1 branch on June 14, 2026 at 11:59.