feat(bundles): add cricket domain knowledge bundle (OKF v0.4) #144
Open
i-m-arul wants to merge 1 commit into
A hand-crafted, curated OKF bundle for the cricket domain: not agent-generated from BigQuery, but authored from cricket domain expertise and validated against the CricketStudio OKF specification. Demonstrates OKF applied to a knowledge domain (metrics, entities, methodology, agent Q&A patterns, narrative stories) rather than a data catalog. Intended to show OKF's range beyond database schemas.

Files:
- spec/types.md: 20 cricket type values
- spec/provenance.md: source_boundary convention
- spec/sample-size.md: minimum data floors before claims are valid
- metrics/: 3 metric definitions (batting SR, bowling economy, death-overs)
- players/: annotated example player entity
- teams/: annotated example team entity
- venues/: annotated example venue entity
- dossier/: verified Q&A pattern for AI agents
- stories/: annotated narrative layer example (story type)
- sources/cricsheet.md: open data source (CC BY 3.0)

Reference implementation: https://okf.cricketstudio.ai
GitHub: https://github.com/i-m-arul/cricketstudio-okf
License: CC-BY-4.0
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.

Author
@googlebot I have signed the CLA!
Summary
This PR adds a cricket domain knowledge bundle to okf/bundles/cricket/, a hand-crafted, curated OKF implementation for the cricket domain.

Unlike the existing BigQuery-backed bundles (stackoverflow, ga4, crypto_bitcoin), this is a domain knowledge bundle, not a data catalog. It demonstrates OKF applied to sports domain expertise: metrics, player entities, venue entities, methodology, verified Q&A patterns for AI agents, and a narrative story layer.
Cricket is a useful test case for OKF at domain scale: multiple formats with incompatible comparison contexts, phase-level statistics (powerplay / middle / death overs), strict sample-size requirements before rankings are valid, and multiple data sources with different license boundaries.
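To make the sample-size point concrete, here is a minimal Python sketch of a metric gated by a minimum data floor. The formulas are the standard definitions of batting strike rate (runs per 100 balls faced) and bowling economy (runs conceded per six-ball over). The function names and the floor values MIN_BALLS_FACED and MIN_BALLS_BOWLED are hypothetical and are not taken from spec/sample-size.md.

```python
from typing import Optional

# Hypothetical floors for illustration only; the bundle's real values
# live in spec/sample-size.md and may differ.
MIN_BALLS_FACED = 100
MIN_BALLS_BOWLED = 120


def batting_strike_rate(runs: int, balls_faced: int) -> Optional[float]:
    """Runs scored per 100 balls faced, or None below the sample floor."""
    if balls_faced < MIN_BALLS_FACED:
        return None  # too little data to support a ranking claim
    return 100.0 * runs / balls_faced


def bowling_economy(runs_conceded: int, balls_bowled: int) -> Optional[float]:
    """Runs conceded per six-ball over, or None below the sample floor."""
    if balls_bowled < MIN_BALLS_BOWLED:
        return None
    return 6.0 * runs_conceded / balls_bowled
```

Returning None instead of a number is one way to keep an agent from ranking a player on a handful of deliveries; a real implementation might instead attach a confidence flag.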
What's included
- spec/types.md
- spec/provenance.md: source_boundary convention (6 values) + confidence + freshness
- spec/sample-size.md
- metrics/
- players/
- teams/
- venues/
- dossier/
- stories/: story type example; provenance-backed narrative with "what it doesn't say" required
- sources/cricsheet.md

Why this is different from existing samples
The existing samples run the reference agent against structured datasets. This bundle is hand-authored by cricket domain experts and validated with an open-source validator (validate_okf.py) rather than generated by the reference agent.

This is intentional: the goal is to show what OKF looks like when domain knowledge is curated by humans rather than extracted from a database. Both are valid OKF; this is the other half of the use case.
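As a rough illustration of the kind of check a validator could run on a hand-authored file, the sketch below inspects one markdown file's frontmatter and, for the story type described in this PR, looks for the required "What It Doesn't Say" section and a provenance: block. This is not the actual validate_okf.py. The function name check_okf_file, the assumption that the type is stored under a top-level type key, and the assumption that provenance: sits in the frontmatter are all mine.

```python
from typing import List


def check_okf_file(text: str) -> List[str]:
    """Return a list of problems found in one OKF markdown file (sketch)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["missing frontmatter block"]

    # Find the closing '---' delimiter of the frontmatter.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return ["unterminated frontmatter block"]

    frontmatter = lines[1:end]
    body = "\n".join(lines[end + 1:])

    # Top-level keys are unindented "key: value" lines (no YAML parser needed here).
    top_level = [ln for ln in frontmatter if ln and not ln[0].isspace() and ":" in ln]
    fields = {ln.split(":", 1)[0].strip(): ln.split(":", 1)[1].strip() for ln in top_level}

    problems: List[str] = []
    if "type" not in fields:  # assumed key name
        problems.append("frontmatter has no type")

    if fields.get("type") == "story":
        if "provenance" not in fields:
            problems.append("story is missing a provenance: block")
        # Accept straight or curly apostrophes in the section heading.
        if "what it doesn't say" not in body.lower().replace("\u2019", "'"):
            problems.append("story is missing a 'What It Doesn't Say' section")
    return problems
```

A production validator would more likely parse the frontmatter with a YAML library and check headings structurally; plain string matching keeps this sketch dependency-free.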
New type: story
The story type in spec/types.md is the most novel addition. It defines a format for provenance-backed domain narratives: a file that tells a cricket story (hook + data + wow moment) but requires a "What It Doesn't Say" section and a mandatory provenance: block. This makes narrative content as citable and agent-readable as metric definitions.

Reference implementation
A 430+ file CI-validated implementation of this bundle pattern is live at https://okf.cricketstudio.ai, including 65 player profiles, 10 metric definitions, 37 dossier Q&A patterns, 5 cricket stories, and 8 research reports.
GitHub: https://github.com/i-m-arul/cricketstudio-okf
Validator: python scripts/validate_okf.py (open source)

Compatibility with Google OKF v0.1
All files use standard OKF frontmatter. Additional fields (source_boundary, entity_id, same_as, related, dataset_version) are additive; the spec explicitly permits additional keys. Field aliases: canonical_page = resource, last_verified = timestamp.

License
- spec/, metrics/, dossier/, stories/, README.md, index.md: CC-BY-4.0
- sources/cricsheet.md: documents data licensed CC BY 3.0 (Cricsheet.org); no raw Cricsheet data is included
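Returning to the field aliases listed in the compatibility section: a consumer that only knows the standard keys could normalize a bundle's frontmatter along these lines. This is a sketch that assumes canonical_page and last_verified are the bundle's names and resource and timestamp are the standard ones; the helper to_standard_keys is hypothetical and not part of the bundle or of OKF.

```python
from typing import Any, Dict

# Alias pairs from the compatibility note; the direction
# (bundle name -> standard name) is an assumption.
FIELD_ALIASES = {
    "canonical_page": "resource",
    "last_verified": "timestamp",
}


def to_standard_keys(frontmatter: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with aliased keys renamed; all other keys pass through."""
    return {FIELD_ALIASES.get(key, key): value for key, value in frontmatter.items()}
```

Additive fields such as entity_id are left untouched, which matches the claim that they do not conflict with the standard frontmatter.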