
feat(bundles): add cricket domain knowledge bundle (OKF v0.4) #144

Open
i-m-arul wants to merge 1 commit into GoogleCloudPlatform:main from i-m-arul:add-cricket-domain-bundle

@i-m-arul


Summary

This PR adds a cricket domain knowledge bundle at okf/bundles/cricket/: a hand-curated OKF implementation for the cricket domain.

Unlike the existing BigQuery-backed bundles (stackoverflow, ga4, crypto_bitcoin), this is a domain knowledge bundle, not a data catalog. It demonstrates OKF applied to sports domain expertise: metrics, player entities, venue entities, methodology, verified Q&A patterns for AI agents, and a narrative story layer.

Cricket is a useful test case for OKF at domain scale: multiple formats with incompatible comparison contexts, phase-level statistics (powerplay / middle / death overs), strict sample-size requirements before rankings are valid, and multiple data sources with different license boundaries.

What's included

| Path | Content |
| --- | --- |
| spec/types.md | 20 cricket type values extending OKF's open type system |
| spec/provenance.md | source_boundary convention (6 values) + confidence + freshness |
| spec/sample-size.md | Minimum data floors before claims are valid (≥30 balls batting, ≥15 bowling) |
| metrics/ | 3 metric definitions: batting SR, bowling economy, death-overs economy |
| players/ | Annotated example player entity (placeholder values, not real player data) |
| teams/ | Annotated example team entity |
| venues/ | Annotated example venue entity |
| dossier/ | Verified Q&A pattern showing how agents should answer cricket questions |
| stories/ | Annotated story type example: provenance-backed narrative with "what it doesn't say" required |
| sources/cricsheet.md | Open data source declaration (Cricsheet CC BY 3.0) |
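
To illustrate how the sample-size floors interact with the metric definitions listed above, here is a minimal sketch. It uses the standard cricket formulas (strike rate as runs per 100 balls faced, economy as runs conceded per six-ball over). The function names and the `None` return for under-sampled inputs are hypothetical, and reading the bowling floor of 15 as balls is an assumption; the files in spec/ and metrics/ remain the authority.

```python
from typing import Optional

# Floors quoted in this PR for spec/sample-size.md.
# Treating the bowling floor as a count of balls is an assumption.
MIN_BALLS_BATTING = 30
MIN_BALLS_BOWLING = 15


def batting_strike_rate(runs: int, balls_faced: int) -> Optional[float]:
    """Runs per 100 balls faced, or None when the sample is below the floor."""
    if balls_faced < MIN_BALLS_BATTING:
        return None
    return 100.0 * runs / balls_faced


def bowling_economy(runs_conceded: int, balls_bowled: int) -> Optional[float]:
    """Runs conceded per six-ball over, or None when the sample is below the floor."""
    if balls_bowled < MIN_BALLS_BOWLING:
        return None
    return 6.0 * runs_conceded / balls_bowled
```

Returning `None` rather than a number is one way to force callers, including agents, to handle the "not enough data" case explicitly instead of ranking on a tiny sample.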

Why this is different from existing samples

The existing samples run the reference agent against structured datasets. This bundle is hand-authored by cricket domain experts and validated with an open-source validator (validate_okf.py) rather than generated by the reference agent.

This is intentional — the goal is to show what OKF looks like when domain knowledge is curated by humans rather than extracted from a database. Both are valid OKF; this is the other half of the use case.

New type: story

The story type in spec/types.md is the most novel addition. It defines a format for provenance-backed domain narratives — a file that tells a cricket story (hook + data + wow moment) but requires a "What It Doesn't Say" section and a mandatory provenance: block. This makes narrative content as citable and agent-readable as metric definitions.
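
The two requirements on a story file could be checked roughly as follows. This is a sketch only, not the logic of validate_okf.py: the function name `check_story`, the `---` frontmatter delimiters, and the assumption that the required section appears as a markdown heading are all illustrative.

```python
import re

REQUIRED_HEADING = "what it doesn't say"


def check_story(text: str) -> list[str]:
    """Return a list of problems found in a story file; empty means it passes."""
    problems = []
    # Frontmatter is assumed to be the block between the first two '---' lines.
    match = re.match(r"\A---\s*\n(.*?)\n---\s*(?:\n|\Z)", text, flags=re.DOTALL)
    if match is None:
        return ["missing frontmatter"]
    frontmatter, body = match.group(1), text[match.end():]
    # A top-level 'provenance:' key starts at column 0.
    if not re.search(r"^provenance:", frontmatter, flags=re.MULTILINE):
        problems.append("missing provenance: block")
    # Collect heading titles, normalising case and curly apostrophes.
    headings = [
        line.lstrip("#").strip().lower().replace("\u2019", "'")
        for line in body.splitlines()
        if line.startswith("#")
    ]
    if REQUIRED_HEADING not in headings:
        problems.append("missing 'What It Doesn't Say' section")
    return problems
```

A check like this only confirms the section and block exist; whether the caveats are meaningful still needs human review.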

Reference implementation

A 430+ file CI-validated implementation of this bundle pattern is live at https://okf.cricketstudio.ai, including 65 player profiles, 10 metric definitions, 37 dossier Q&A patterns, 5 cricket stories, and 8 research reports.

GitHub: https://github.com/i-m-arul/cricketstudio-okf
Validator: python scripts/validate_okf.py (open source)

Compatibility with Google OKF v0.1

All files use standard OKF frontmatter. Additional fields (source_boundary, entity_id, same_as, related, dataset_version) are additive — the spec explicitly permits additional keys. Field aliases: canonical_page = resource, last_verified = timestamp.
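
A consumer that only understands the standard field names could normalise the aliases before reading a file. The sketch below assumes the frontmatter has already been parsed into a dict and that the aliases map from the bundle's names to the standard ones (canonical_page to resource, last_verified to timestamp); the function name is hypothetical.

```python
# Alias pairs taken from the PR text; the direction
# (bundle name -> standard OKF name) is an assumption.
FIELD_ALIASES = {
    "canonical_page": "resource",
    "last_verified": "timestamp",
}


def normalize_frontmatter(frontmatter: dict) -> dict:
    """Return a copy with aliased keys renamed; other keys pass through unchanged."""
    normalized = {}
    for key, value in frontmatter.items():
        target = FIELD_ALIASES.get(key, key)
        # If the standard key was already seen, do not let its alias overwrite it.
        if target in normalized and key != target:
            continue
        normalized[target] = value
    return normalized
```

Additive fields such as entity_id or same_as pass through untouched, which matches the claim that the extra keys do not interfere with standard OKF readers.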

License

  • spec/, metrics/, dossier/, stories/, README.md, index.md: CC-BY-4.0
  • sources/cricsheet.md: documents data licensed CC BY 3.0 (Cricsheet.org); no raw Cricsheet data is included
  • All entity example files: CC-BY-4.0 (annotated placeholders only; no real player statistics)

A hand-crafted, curated OKF bundle for the cricket domain — not
agent-generated from BigQuery, but authored from cricket domain
expertise and validated against the CricketStudio OKF specification.

Demonstrates OKF applied to a knowledge domain (metrics, entities,
methodology, agent Q&A patterns, narrative stories) rather than a
data catalog. Intended to show OKF's range beyond database schemas.

Files:
- spec/types.md          — 20 cricket type values
- spec/provenance.md     — source_boundary convention
- spec/sample-size.md    — minimum data floors before claims are valid
- metrics/               — 3 metric definitions (batting SR, bowling economy, death-overs)
- players/               — annotated example player entity
- teams/                 — annotated example team entity
- venues/                — annotated example venue entity
- dossier/               — verified Q&A pattern for AI agents
- stories/               — annotated narrative layer example (story type)
- sources/cricsheet.md   — open data source (CC BY 3.0)

Reference implementation: https://okf.cricketstudio.ai
GitHub: https://github.com/i-m-arul/cricketstudio-okf
License: CC-BY-4.0
@google-cla

google-cla Bot commented Jun 24, 2026


Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@i-m-arul

Author

@googlebot I have signed the CLA!
