
corpus: ZH source-feasibility gate (G009) #493

Merged
devswha merged 1 commit into main from bot/corpus-zh-feasibility on Jun 14, 2026

@devswha devswha commented Jun 14, 2026

Owner

Summary

Wave 3 source-feasibility gate (G009). Metadata-only; no raw text; no code change. Builds a candidate source inventory + dry-run evidence + GO/NO-GO recommendation. The GO/NO-GO decision and any collection are reserved for the maintainer.

Deliverable

  • artifacts/rebaseline-2025/sources.zh-public.jsonl — 18 Wikimedia CC-BY-SA candidate sources across academic-summary / blog / technical-how-to, full schema (url/language/register/domain/source_type/source_license/source_review/reviewer_notes/redistribution/constraints), no raw text.
  • docs/research/zh-source-feasibility.md — dry-run evidence + recommendation.
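
As an illustration of what "full schema, no raw text" could mean in practice, here is a minimal sketch of a row-level check over a JSONL inventory. The required field names are the ones listed above; the function name `validate_inventory` and the `FORBIDDEN_FIELDS` key names are assumptions made for this example and are not taken from the repository's actual validator.

```python
import json

# Field names taken from the schema listed in the PR description.
REQUIRED_FIELDS = (
    "url", "language", "register", "domain", "source_type",
    "source_license", "source_review", "reviewer_notes",
    "redistribution", "constraints",
)
# Assumed names for keys that would carry raw text; the real check may differ.
FORBIDDEN_FIELDS = ("text", "raw_text", "content")


def validate_inventory(lines):
    """Return a list of (line_number, message) errors for JSONL lines."""
    errors = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(row, dict):
            errors.append((lineno, "row is not a JSON object"))
            continue
        missing = [f for f in REQUIRED_FIELDS if f not in row]
        if missing:
            errors.append((lineno, "missing fields: " + ", ".join(missing)))
        present = [f for f in FORBIDDEN_FIELDS if f in row]
        if present:
            errors.append((lineno, "raw-text fields present: " + ", ".join(present)))
    return errors
```

Passing an open file handle works because iterating a text file yields one line at a time, for example `validate_inventory(open(path, encoding="utf-8"))`. An empty result corresponds to the "0 errors" outcome reported below.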

Dry-run evidence

The inventory validates (18 rows, 0 errors). A bounded dry run at small caps would collect 13 candidates across 3 registers (academic 5 / blog 5 / technical 3), projecting to ≥100 across ≥3 registers at full caps.
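
The per-register arithmetic can be sketched as a capped tally that only counts candidates and never writes text. This is a hypothetical illustration: `dry_run_tally`, `summarize`, and the shape of the candidate records are assumptions, and the short labels "academic / blog / technical" are mapped here to the register names academic-summary / blog / technical-how-to.

```python
from collections import Counter


def dry_run_tally(candidates, cap_per_register):
    """Count candidates that would be collected, at most cap_per_register each."""
    tally = Counter()
    for candidate in candidates:
        register = candidate["register"]
        if tally[register] < cap_per_register:
            tally[register] += 1  # counted only; no text is fetched or stored
    return tally


def summarize(tally):
    """Return (total candidates, number of registers with at least one)."""
    return sum(tally.values()), sum(1 for n in tally.values() if n > 0)


# Per-register counts reported in this PR's dry run.
reported = Counter({"academic-summary": 5, "blog": 5, "technical-how-to": 3})
total, registers = summarize(reported)
print(total, registers)  # 13 3
```

The same `summarize` call on a full-cap tally would give the two numbers the recommendation depends on: total yield and register count.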

Recommendation: GO (conditional)

Feasible via CC-BY-SA Wikimedia, conditioned on maintainer ratification of hash-only redistribution, a full run to confirm ≥100 yield, and the 3-register scope. STOP for maintainer GO/NO-GO before any ZH collection.

Verify: check:no-private-assets passes (0 forbidden); no threshold/src/features change.

Wave 3 source-feasibility gate. Builds a metadata-only candidate source
inventory artifacts/rebaseline-2025/sources.zh-public.jsonl (18 Wikimedia
CC-BY-SA sources across academic-summary/blog/technical-how-to) with the full
schema (url/language/register/domain/source_type/source_license/source_review/
reviewer_notes/redistribution/constraints). No raw text.

Dry-run evidence (no text written): inventory validates (18 rows, 0 errors);
would-collect 13 candidates across 3 registers at small caps, projecting to
>=100 across >=3 registers at full caps.

Recommendation: GO (conditional) — feasible via CC-BY-SA Wikimedia, conditioned
on maintainer ratification of the hash-only redistribution choice, a full
collection run to confirm the >=100 yield, and accepting the 3-register scope
(product-doc/chat-update deferred). See docs/research/zh-source-feasibility.md.

STOP for maintainer GO/NO-GO before any ZH collection. Measure-only; no
threshold or src/features change; check:no-private-assets passes.

vercel Bot commented Jun 14, 2026

Contributor

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
patina | Ready | Preview, Comment | Jun 14, 2026 12:33pm


@devswha devswha merged commit 21e958f into main Jun 14, 2026
8 checks passed
@devswha devswha deleted the bot/corpus-zh-feasibility branch June 14, 2026 12:33
