diff --git a/okf/bundles/cricket/README.md b/okf/bundles/cricket/README.md new file mode 100644 index 0000000..09108c8 --- /dev/null +++ b/okf/bundles/cricket/README.md @@ -0,0 +1,58 @@ +# Cricket Domain Example Bundle for Google OKF + +This directory contains a standalone cricket domain example bundle demonstrating the Cricket OKF profile on top of [Google Open Knowledge Format (OKF) v0.1](https://github.com/GoogleCloudPlatform/knowledge-catalog/blob/main/okf/SPEC.md). + +## What this bundle demonstrates + +- A **cricket type vocabulary** extending Google OKF's open `type` system with 20 cricket-specific values +- A **provenance convention** adding source traceability, license declaration, and freshness fields that Google OKF v0.1 leaves undefined +- A **sample-size doctrine** defining minimum data thresholds before cricket rankings are valid +- **Metric definition files** (batting strike rate, bowling economy, death-overs economy) showing the standard format for citable cricket metrics +- **Entity files** (player, team, venue) with canonical URLs, external IDs, and relationship links +- A **dossier file** showing a verified Q&A pattern for AI agents +- A **story file** showing the narrative layer format — provenance-backed cricket narrative with scope, sample size, and stated limitations + +## File Structure + +``` +examples/cricket/ +├── README.md +├── index.md # Bundle overview +├── spec/ +│ ├── types.md # Cricket type vocabulary (20 types) +│ ├── provenance.md # Provenance convention +│ └── sample-size.md # Sample-size doctrine +├── metrics/ +│ ├── batting-strike-rate.md # Metric definition +│ ├── bowling-economy.md # Metric definition +│ └── death-overs-economy.md # Phase metric definition +├── players/ +│ └── example-t20-batter.md # Annotated example player entity +├── teams/ +│ └── example-t20-team.md # Annotated example team entity +├── venues/ +│ └── example-cricket-ground.md # Annotated example venue entity +├── dossier/ +│ └── example-agent-pattern.md # 
Verified Q&A pattern for agents +├── stories/ +│ └── example-cricket-story.md # Annotated example story (narrative layer) +└── sources/ + └── cricsheet.md # Open data source declaration (CC BY 3.0) +``` + +## Live Reference Implementation + +The full CricketStudio OKF bundle at **https://okf.cricketstudio.ai** is a CI-validated, 430+ file implementation of this profile, including: +- 65 IPL player profiles with phase splits (powerplay / middle / death) +- 10 metric definitions +- 37 dossier Q&A patterns +- 5 provenance-backed cricket stories (Journeys) +- 8 research reports + +GitHub: https://github.com/i-m-arul/cricketstudio-okf + +## License + +- `spec/`, `metrics/`, `dossier/`, `stories/`, `README.md`, `index.md` — CC-BY-4.0 +- `sources/cricsheet.md` — documents data licensed CC BY 3.0 (Cricsheet) +- Entity example files — CC-BY-4.0 (labeled examples only; no real player data) diff --git a/okf/bundles/cricket/dossier/example-agent-pattern.md b/okf/bundles/cricket/dossier/example-agent-pattern.md new file mode 100644 index 0000000..01b881a --- /dev/null +++ b/okf/bundles/cricket/dossier/example-agent-pattern.md @@ -0,0 +1,81 @@ +--- +type: dossier +title: Who leads T20 powerplay batting strike rate? +description: Annotated example dossier file for the Cricket OKF domain bundle. Demonstrates the verified Q&A pattern format for AI agents — user question, correct answer pattern, data table, citation behavior, and caveats. Replace values with real sourced data. 
+status: active +last_verified: 2026-06-22 +license: CC-BY-4.0 +source_system: ExampleCricketData +source_boundary: public_open_data +entity_id: example:dossier:t20-powerplay-sr-leader +resource: https://example.org/leaderboards/powerplay-strike-rate +canonical_page: https://example.org/leaderboards/powerplay-strike-rate +provenance: + source: Cricsheet CC BY 3.0 · example competition 2023–2025 · 50 matches + confidence: high + snapshot: example-dataset-2025-12-01 +tags: + - cricket + - dossier + - powerplay + - batting + - strike-rate +--- + +## User Question + +> Who has the best powerplay batting strike rate in [competition]? + +## Correct Answer Pattern + +> **[Player Name]** ([Team]) leads [Competition] powerplay batting strike rate at **[SR] SR** from **[N] powerplay balls** ([Runs] runs, [X] × 6s) across [Competition] [seasons]. Floor: ≥60 powerplay balls faced. Source: [Data source] · snapshot [date]. + +Replace bracketed values with real sourced data before publishing. + +--- + +## How a Dossier File Works + +A **dossier** file teaches AI agents how to answer a class of cricket questions correctly: + +1. **User Question** — a real question agents are asked +2. **Correct Answer Pattern** — the verified answer with all required scope and citation elements +3. **Data Table** — the backing data (top 3–10 entries) +4. **Citation Behavior** — what scope and floor to state in any citation +5. **Caveats** — what the data does and does not show + +--- + +## Example Data Table (Replace with real sourced data) + +| Rank | Player | Team | PP SR | PP Balls | PP Runs | 6s | +|------|--------|------|-------|----------|---------|-----| +| 1 | Player A | Team X | 194.3 | 123 | 239 | 16 | +| 2 | Player B | Team Y | 190.3 | 72 | 137 | 8 | +| 3 | Player C | Team Z | 188.0 | 225 | 423 | 31 | + +Floor: ≥60 powerplay balls faced. Source: [Data source] · [seasons] · snapshot [date]. 
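The example table can be reproduced from raw counts. The following is a minimal Python sketch, not part of the bundle: `PowerplayLine` and `rank_powerplay_sr` are hypothetical names, the rows are the placeholder values from the table, and "Player D" is an invented sub-floor entry added only to show the filter.

```python
from dataclasses import dataclass

PP_FLOOR_BALLS = 60  # floor stated in this dossier: >=60 powerplay balls faced


@dataclass(frozen=True)
class PowerplayLine:
    """Hypothetical record of one player's powerplay batting counts."""
    player: str
    runs: int
    balls: int

    @property
    def strike_rate(self) -> float:
        # Strike rate = runs per 100 balls faced
        return self.runs * 100 / self.balls


def rank_powerplay_sr(lines, floor=PP_FLOOR_BALLS):
    """Drop sub-floor players, then rank the rest by strike rate, descending."""
    eligible = [line for line in lines if line.balls >= floor]
    return sorted(eligible, key=lambda line: line.strike_rate, reverse=True)


# Placeholder rows from the example table, plus one invented sub-floor player
rows = [
    PowerplayLine("Player C", 423, 225),
    PowerplayLine("Player A", 239, 123),
    PowerplayLine("Player B", 137, 72),
    PowerplayLine("Player D", 60, 20),  # high rate, but only 20 balls: not rankable
]
ranked = rank_powerplay_sr(rows)
for rank, line in enumerate(ranked, start=1):
    print(rank, line.player, round(line.strike_rate, 1), line.balls)
```

Filtering before sorting means a sub-floor player never receives a rank, which matches the rule that such players get "insufficient data for ranking" rather than a position.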
+ +--- + +## Citation Behavior + +State "[Competition] powerplay (overs 1–6) · [seasons] · floor ≥60 PP balls · [License]" with any powerplay SR claim. + +--- + +## Caveats + +- The leader on rate may not have the largest sample. Compare both rate and sample size. +- Floor is ≥60 powerplay balls — players who bat lower in the order may not qualify. +- Phase stats are more variable than career aggregates — sample matters more here. +- [Competition] [Year] data not included until after the season ends and the dataset is refreshed. + +--- + +## Agent Non-Negotiables + +- Never rank a player below the ≥60 PP balls floor. +- Always state the competition, season, and floor in the citation. +- If the player asked about is sub-floor, say "insufficient data for ranking" — do not invent a position. +- Do not compare across competitions (e.g., IPL PP SR vs MLC PP SR) without declaring both datasets and the floor applied. diff --git a/okf/bundles/cricket/index.md b/okf/bundles/cricket/index.md new file mode 100644 index 0000000..55531fc --- /dev/null +++ b/okf/bundles/cricket/index.md @@ -0,0 +1,88 @@ +--- +type: index +title: Cricket Domain Bundle for Google OKF +description: A cricket domain example bundle for Google Open Knowledge Format (OKF) v0.1. Demonstrates type vocabulary, provenance convention, sample-size doctrine, metric definitions, entity files, and agent Q&A patterns for cricket knowledge. +status: active +last_verified: 2026-06-22 +license: CC-BY-4.0 +source_system: CricketStudio +source_boundary: methodology_only +tags: + - cricket + - okf + - domain-example + - google-okf +--- + +## What Is This? + +This is a cricket domain example bundle for [Google OKF v0.1](https://github.com/GoogleCloudPlatform/knowledge-catalog/blob/main/okf/SPEC.md). + +Cricket is one of the world's most data-rich sports, with complex metrics (batting strike rate, bowling economy, death-overs phase splits), multi-format competitions (T20, ODI, Test), and 150+ years of records. 
It is a strong test case for structured knowledge representation. + +This bundle shows how the open `type` field, extended frontmatter, and a domain provenance convention can be used to make cricket knowledge portable, citable, and agent-readable. + +--- + +## Cricket OKF Type Vocabulary + +The table below lists 15 of the 20 cricket-specific `type` values defined in the type vocabulary: + +| Type | Description | +|------|-------------| +| `player` | An individual cricket player | +| `team` | A cricket franchise or national team | +| `venue` | A cricket ground or stadium | +| `match` | A single cricket match | +| `league` | A cricket competition (IPL, MLC, BBL) | +| `season` | A single edition of a competition | +| `metric` | A cricket metric definition | +| `methodology` | An operational rule or doctrine | +| `research` | An analytical report | +| `dossier` | A verified Q&A pattern for AI agents | +| `record` | An all-time or historical record | +| `leaderboard` | A ranked list for a specific metric | +| `source` | A data source declaration | +| `index` | A directory or category index | +| `spec` | A formal specification document | + +See [spec/types.md](./spec/types.md) for definitions and example frontmatter for each type. + +--- + +## Key Domain Conventions + +### Provenance + +Cricket data comes from multiple sources with different licenses. Every data-bearing file declares `source_boundary`: + +- `public_open_data` — Cricsheet CC BY 3.0 (ball-by-ball open data) +- `derived_claims_only` — claims derived from a licensed feed +- `methodology_only` — formulas and rules, no data + +See [spec/provenance.md](./spec/provenance.md). + +### Sample-Size Floors + +Cricket rankings are meaningless below minimum data thresholds. This bundle defines: + +- ≥30 balls faced (batting aggregate) +- ≥60 balls faced (phase stats: powerplay/middle/death) +- ≥15 balls bowled (bowling aggregate) +- ≥5 deliveries (head-to-head) +- ≥3 matches (venue stats) + +See [spec/sample-size.md](./spec/sample-size.md). 
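The floors listed above lend themselves to a small lookup. This is a hedged sketch only: the dictionary keys and the function names `meets_floor` and `ranking_status` are assumptions made for illustration, not identifiers defined by the bundle.

```python
# Floors as listed in this index. Units differ by context, so each entry is annotated.
FLOORS = {
    "batting_aggregate": 30,  # balls faced
    "batting_phase": 60,      # balls faced in the phase
    "bowling_aggregate": 15,  # balls bowled
    "head_to_head": 5,        # deliveries
    "venue": 3,               # matches
}


def meets_floor(context: str, sample: int) -> bool:
    """Return True when the sample reaches the declared floor for the context."""
    return sample >= FLOORS[context]


def ranking_status(context: str, sample: int) -> str:
    """Give the label an agent could attach to a claim before ranking it."""
    if meets_floor(context, sample):
        return "rankable"
    return "insufficient data for ranking"


print(ranking_status("batting_phase", 72))  # above the phase floor
print(ranking_status("venue", 2))           # below the venue floor
```

Keeping the floors in one table makes it harder for a leaderboard to apply a different threshold from the one it cites.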
+ +### Metric Files + +Every metric has a canonical definition file with: formula, required inputs, valid scope, sample-size floor, ranking rule, edge cases, limitations, example calculation, and citation guidance. + +See [metrics/batting-strike-rate.md](./metrics/batting-strike-rate.md). + +--- + +## Live Implementation + +Full reference implementation: **https://okf.cricketstudio.ai** +GitHub: https://github.com/i-m-arul/cricketstudio-okf diff --git a/okf/bundles/cricket/metrics/batting-strike-rate.md b/okf/bundles/cricket/metrics/batting-strike-rate.md new file mode 100644 index 0000000..aed9d89 --- /dev/null +++ b/okf/bundles/cricket/metrics/batting-strike-rate.md @@ -0,0 +1,85 @@ +--- +type: metric +title: Batting Strike Rate +description: Runs scored per 100 balls faced. Core T20 batting efficiency metric. Required floor — 30 balls (aggregate), 60 balls (phase). +status: active +last_verified: 2026-06-22 +license: CC-BY-4.0 +source_system: CricketStudio +source_boundary: methodology_only +entity_id: example:metric:batting-strike-rate +resource: https://okf.cricketstudio.ai/metrics/batting-strike-rate +tags: + - cricket + - metric + - batting + - strike-rate +--- + +## Definition + +Batting Strike Rate measures how many runs a batter scores per 100 balls faced. It is the primary efficiency metric for T20 batting — higher is better, all else equal. + +## Formula + +``` +Strike Rate = (Runs scored ÷ Balls faced) × 100 +``` + +## Required Inputs + +- `runs`: integer — total runs credited to the batter (boundaries plus runs run between the wickets; excludes byes and leg byes) +- `balls_faced`: integer — deliveries received, excluding wides (no-balls count as balls faced) + +## Valid Scope + +Applicable to T20 and T20I formats. Also used in ODI and Test cricket but with different typical ranges. Do not compare T20 SR directly with ODI SR — the context and typical values differ significantly. + +Phase-specific scope: powerplay (overs 1–6), middle overs (7–15), death overs (16–20). 
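The phase boundaries above translate directly into a lookup by over number. A minimal sketch, assuming 1-based over numbers in a 20-over innings; `phase_for_over` is a hypothetical helper name, and data sources that number overs from 0 would need 1 added first.

```python
def phase_for_over(over: int) -> str:
    """Map a 1-based over number in a 20-over innings to its phase."""
    if not 1 <= over <= 20:
        raise ValueError(f"over must be between 1 and 20, got {over}")
    if over <= 6:
        return "powerplay"  # overs 1-6
    if over <= 15:
        return "middle"     # overs 7-15
    return "death"          # overs 16-20


for over in (1, 6, 7, 15, 16, 20):
    print(over, phase_for_over(over))
```

Raising on out-of-range input, rather than defaulting to a phase, keeps super-over or malformed records from silently inflating a phase sample.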
+ +## Sample-Size Floor + +- Aggregate (career or season): ≥ **30 balls faced** +- Phase-specific SR: ≥ **60 balls faced in that phase** + +Players below floor must not appear in ranked lists. Disclose sample size in any citation. + +## Ranking Rule + +Higher strike rate = better. Ranked descending. + +Note: SR alone does not account for match situation or wicket value. Context metrics (boundary %, dot-ball %, average) should be cited alongside SR for full batting assessment. + +## Edge Cases + +- A batter who faces 0 balls has no strike rate (undefined, not 0). +- Wides do not count toward the batter's balls faced. Byes and leg byes come off deliveries that do count as faced, but their runs are not credited to the batter. +- Retired hurt: balls faced count; the innings is marked incomplete. +- Super overs: typically included in aggregate counts unless explicitly scoped to regular innings. + +## Known Limitations + +- Does not account for match situation, required run rate, or wicket value. +- Cross-era comparison is unreliable without context (IPL 2008 average SR was ~125; IPL 2026 average SR is ~145+). +- A very high SR from a small sample (e.g., 5 balls) does not support a meaningful ranking. + +## Example Calculation + +A batter scores 180 runs from 100 balls in a T20 tournament. + +``` +Strike Rate = (180 ÷ 100) × 100 = 180.0 +``` + +This batter qualifies for ranking (≥30 balls). At 100 balls, the sample is robust. + +## Citation Guidance + +When citing a batting strike rate ranking: + +1. State competition and season (e.g., MLC 2023–2025, all-time). +2. State the floor (≥30 balls / ≥60 balls for phase). +3. Link to this metric definition. +4. State the dataset snapshot version. + +Example: "MLC all-time powerplay batting SR (floor ≥60 PP balls, 2023–2025, Cricsheet CC BY 3.0 snapshot 2026-06-20)." 
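The formula, the zero-ball edge case, and the two floors in this metric file can be combined in a few lines. This is an illustrative sketch under assumptions: `strike_rate` and `is_rankable` are hypothetical names, and the function expects counts that already follow the input rules (wides excluded from balls faced).

```python
from typing import Optional

AGGREGATE_FLOOR = 30  # balls faced, career or season
PHASE_FLOOR = 60      # balls faced within a single phase


def strike_rate(runs: int, balls_faced: int) -> Optional[float]:
    """Runs per 100 balls faced; None when no balls were faced (undefined, not 0)."""
    if balls_faced == 0:
        return None
    return runs * 100 / balls_faced


def is_rankable(balls_faced: int, phase: bool = False) -> bool:
    """Apply the phase floor for phase-specific SR, otherwise the aggregate floor."""
    floor = PHASE_FLOOR if phase else AGGREGATE_FLOOR
    return balls_faced >= floor


# The worked example from this file: 180 runs from 100 balls
print(strike_rate(180, 100), is_rankable(100))
# A batter who has not faced a ball has no strike rate
print(strike_rate(0, 0))
```

Returning `None` instead of `0.0` keeps an unused batter from sorting to the bottom of a list as if they had scored slowly.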
diff --git a/okf/bundles/cricket/metrics/bowling-economy.md b/okf/bundles/cricket/metrics/bowling-economy.md new file mode 100644 index 0000000..2b29171 --- /dev/null +++ b/okf/bundles/cricket/metrics/bowling-economy.md @@ -0,0 +1,82 @@ +--- +type: metric +title: Bowling Economy Rate +description: Runs conceded per 6 balls (per over). Core T20 bowling efficiency metric. Required floor — 15 balls bowled (aggregate), 30 balls (phase). +status: active +last_verified: 2026-06-22 +license: CC-BY-4.0 +source_system: CricketStudio +source_boundary: methodology_only +entity_id: example:metric:bowling-economy +resource: https://okf.cricketstudio.ai/metrics/bowling-economy +tags: + - cricket + - metric + - bowling + - economy +--- + +## Definition + +Bowling Economy Rate measures how many runs a bowler concedes per over (6 balls). It is the primary efficiency metric for T20 bowling — lower is better, all else equal. + +## Formula + +``` +Economy = (Runs conceded ÷ Balls bowled) × 6 +``` + +## Required Inputs + +- `runs_conceded`: integer — total runs conceded (includes extras off the bowler: wides, no-balls; excludes byes and leg byes) +- `balls_bowled`: integer — legal deliveries bowled (excludes wides and no-balls; only legal balls count toward overs) + +Note: Wides and no-balls add runs to the numerator but never count as balls in the denominator, because each must be re-bowled. Implementation must match the scoring system used in the source data (Cricsheet convention). + +## Valid Scope + +Applicable to T20, T20I, ODI, and List A cricket. Do not compare T20 economy with ODI economy directly — typical T20 economy is 7–10 RPO; typical ODI economy is 5–7 RPO. + +## Sample-Size Floor + +- Aggregate (career or season): ≥ **15 balls bowled** +- Phase-specific economy: ≥ **30 balls bowled in that phase** + +## Ranking Rule + +Lower economy = better. Ranked ascending. 
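The formula and input rules above can be sketched from per-delivery records. The `Delivery` dataclass and its field names are hypothetical and chosen for illustration; a real implementation would map them from whatever fields the source data provides.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Delivery:
    """Hypothetical per-ball record; field names are illustrative only."""
    runs_off_bat: int = 0
    wides: int = 0
    no_balls: int = 0
    byes: int = 0
    leg_byes: int = 0


def economy(deliveries: Iterable[Delivery]) -> Optional[float]:
    """Runs conceded per six legal balls; None when no legal ball was bowled."""
    runs_conceded = 0
    balls_bowled = 0
    for d in deliveries:
        # Byes and leg byes are not charged to the bowler
        runs_conceded += d.runs_off_bat + d.wides + d.no_balls
        # Wides and no-balls add runs but are not legal balls
        if d.wides == 0 and d.no_balls == 0:
            balls_bowled += 1
    if balls_bowled == 0:
        return None
    return runs_conceded * 6 / balls_bowled


# One over: six legal balls for a single each, one wide, and four byes off the last ball
over = [Delivery(runs_off_bat=1) for _ in range(5)]
over += [Delivery(wides=1), Delivery(runs_off_bat=1, byes=0), Delivery(byes=4)]
print(economy(over))
```

In this sketch the ball that yields four byes still counts as a legal ball in the denominator even though its runs are excluded, which is what separates byes from wides.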
+ +Economy is context-sensitive: a bowler bowling in the death overs is expected to have a higher economy than one bowling only in the powerplay. Phase-specific economy should always be cited with the phase. + +## Edge Cases + +- A bowler who bowls 0 legal balls has no economy (undefined, not 0). +- Super overs: typically included unless explicitly scoped to regular innings. +- Incomplete overs: partial over balls still count in the economy calculation. + +## Known Limitations + +- Does not account for wickets taken. A bowler with economy 9.0 and 5 wickets is more valuable than one with economy 7.0 and 0 wickets — economy alone does not capture this. +- Phase comparisons only meaningful within the same phase scope. +- High-economy spells in favorable match situations (defending large totals) may not indicate poor bowling. + +## Example Calculation + +A bowler concedes 45 runs from 30 balls: + +``` +Economy = (45 ÷ 30) × 6 = 9.0 RPO +``` + +30 balls = 5 overs. This bowler qualifies for ranking (≥15 balls). + +## Citation Guidance + +When citing a bowling economy ranking: + +1. State competition, season, and phase (e.g., death overs, IPL 2026). +2. State the floor (≥15 balls / ≥30 balls for phase). +3. Link to this metric definition. +4. State the dataset snapshot version. + +Example: "IPL 2026 death-overs bowling economy (floor ≥30 death balls, overs 17–20, snapshot 2026-06-18)." diff --git a/okf/bundles/cricket/metrics/death-overs-economy.md b/okf/bundles/cricket/metrics/death-overs-economy.md new file mode 100644 index 0000000..98baeee --- /dev/null +++ b/okf/bundles/cricket/metrics/death-overs-economy.md @@ -0,0 +1,88 @@ +--- +type: metric +title: Death Overs Economy +description: Bowling economy rate in the final overs of a T20 innings (overs 17–20, or 16–20). Phase-specific metric — floor is 30 death balls bowled. Higher-stakes phase with typically elevated economy across all bowlers. 
+status: active +last_verified: 2026-06-22 +license: CC-BY-4.0 +source_system: CricketStudio +source_boundary: methodology_only +entity_id: example:metric:death-overs-economy +resource: https://okf.cricketstudio.ai/metrics/death-overs-economy +tags: + - cricket + - metric + - bowling + - death-overs + - phase +--- + +## Definition + +Death Overs Economy measures a bowler's runs conceded per over during the death-overs phase of a T20 innings. The death phase is the highest-pressure phase — batters take maximum risk and typical economy rates are elevated across all bowlers. + +## Phase Definition + +- **Death overs:** Overs 17–20 of a T20 innings (some implementations use overs 16–20) +- CricketStudio OKF convention: **overs 16–20** (the final 5 overs) +- Always state which phase definition is used when citing this metric + +## Formula + +``` +Death Economy = (Runs conceded in death overs ÷ Balls bowled in death overs) × 6 +``` + +## Required Inputs + +- `death_runs_conceded`: integer — runs conceded in the death phase only +- `death_balls_bowled`: integer — legal balls bowled in the death phase only + +## Valid Scope + +T20 and T20I formats only. Not applicable to ODI or Test cricket (no fixed death phase in longer formats). + +## Sample-Size Floor + +- Phase-specific: ≥ **30 balls bowled in death overs** (= 5 overs) +- This is a higher floor than the aggregate bowling economy (≥15 balls) because phase stats are more variable and need more data to stabilize. + +## Ranking Rule + +Lower economy = better. Ranked ascending within the death phase. + +Do not directly compare death-over economy with powerplay economy or full-innings economy — different phases have structurally different typical values. + +## Edge Cases + +- A bowler who bowls only 1 death over in an entire season has no rankable death economy. +- Super overs are not death overs — they are separate innings. +- If a match is rain-affected and overs 17–20 are not bowled, those balls do not contribute. 
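Because two phase definitions are in use, an implementation may want the first death over to be an explicit parameter. The sketch below assumes a hypothetical input of `(over, runs_conceded, is_legal)` tuples with 1-based overs, drawn from regular innings only (super overs already excluded upstream); `death_economy` is an illustrative name.

```python
from typing import Iterable, Optional, Tuple

DEATH_FLOOR_BALLS = 30  # floor from this metric file


def death_economy(
    balls: Iterable[Tuple[int, int, bool]],
    first_death_over: int = 16,  # use 17 for the overs 17-20 definition
    last_over: int = 20,
) -> Tuple[Optional[float], int]:
    """Return (economy, legal balls bowled) restricted to the death phase."""
    runs = 0
    legal_balls = 0
    for over, conceded, is_legal in balls:
        if first_death_over <= over <= last_over:
            runs += conceded
            if is_legal:
                legal_balls += 1
    if legal_balls == 0:
        return None, 0
    return runs * 6 / legal_balls, legal_balls


# Season sample totalling 72 runs from 42 death balls, plus middle-overs balls to ignore
season = [(18, 2, True)] * 30 + [(20, 1, True)] * 12 + [(10, 6, True)] * 6
econ, legal = death_economy(season)
print(round(econ, 1), legal, legal >= DEATH_FLOOR_BALLS)
```

Returning the ball count alongside the rate lets the caller apply the floor and quote the sample size in the same citation.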
+ +## Known Limitations + +- Economy alone does not capture wicket-taking ability. A death bowler with economy 9.5 and 15 wickets may be more valuable than one at 8.0 and 3 wickets. +- Match situation affects death economy significantly — defending 220 vs. 150 produces different bowler incentives and batter aggression. +- Small samples are especially misleading in death overs — always apply the ≥30 ball floor. + +## Example Calculation + +A bowler concedes 72 runs from 42 death-over balls across a season: + +``` +Death Economy = (72 ÷ 42) × 6 = 10.3 RPO +``` + +42 balls = 7 overs. Qualifies for ranking (≥30 balls). + +## Citation Guidance + +When citing a death-overs economy ranking: + +1. State the phase definition used: overs 16–20 or 17–20. +2. State competition and season. +3. State the floor: ≥30 death balls bowled. +4. Link to this metric definition. +5. Note that economy in this phase is typically higher than full-innings economy. + +Example: "MLC all-time death-overs economy (overs 16–20, floor ≥30 death balls, 2023–2025, Cricsheet CC BY 3.0 snapshot 2026-06-20)." diff --git a/okf/bundles/cricket/players/example-t20-batter.md b/okf/bundles/cricket/players/example-t20-batter.md new file mode 100644 index 0000000..e1166f0 --- /dev/null +++ b/okf/bundles/cricket/players/example-t20-batter.md @@ -0,0 +1,74 @@ +--- +type: player +title: Example T20 Batter +description: Annotated example player file for the Cricket OKF domain bundle. Demonstrates frontmatter fields, provenance convention, external IDs, phase splits, and canonical page linking. Replace with a real player using sourced data. 
+status: active +last_verified: 2026-06-22 +license: CC-BY-4.0 +source_system: ExampleCricketData +source_boundary: public_open_data +entity_id: example:player:t20-batter-001 +resource: https://example.org/players/t20-batter-001 +canonical_page: https://example.org/players/t20-batter-001 +provenance: + source: Cricsheet CC BY 3.0 · example competition 2023–2025 · 50 matches + confidence: high + snapshot: example-dataset-2025-12-01 +tags: + - cricket + - player + - batter +aliases: + - T20 Batter + - Example Batter + - EB +same_as: + cricsheet: example_t20_batter_001 + wikidata: Q000000001 +related: + - ../teams/example-t20-team.md + - ../metrics/batting-strike-rate.md +--- + +## Summary + +**Example T20 Batter** is a right-handed opening batter who plays for Example T20 Team. This file is an annotated example — it demonstrates the Cricket OKF player file format using placeholder values. Real player files are built from sourced data (e.g., Cricsheet CC BY 3.0). + +For current computed claims, use the canonical page: https://example.org/players/t20-batter-001 + +--- + +## Career Batting (Example Competition, 2023–2025) + +| Metric | Value | Floor | Notes | +|--------|-------|-------|-------| +| Matches | 42 | — | | +| Innings | 40 | — | | +| Runs | 1,240 | — | | +| Balls faced | 920 | — | | +| Strike Rate | 134.8 | ≥30 balls | Qualifies | +| Average | 31.0 | — | | +| High Score | 87 | — | | +| Fours | 98 | — | | +| Sixes | 42 | — | | + +Source: Example data for illustration. Replace with real Cricsheet-sourced values. + +--- + +## Phase Splits (Powerplay, overs 1–6) + +| Metric | Value | Floor | +|--------|-------|-------| +| Balls faced (PP) | 310 | ≥60 | +| Runs (PP) | 450 | — | +| Strike Rate (PP) | 145.2 | Qualifies | + +--- + +## Agent Guidance + +- Do not use this file's values for any real cricket claim — this is an annotated example. +- For real OKF player files, every statistic must trace to a declared source with a dataset version. 
+- The `entity_id` must be unique across the bundle; same-name players are disambiguated by differentiating the slug. +- Do not infer this player's identity from the title — always resolve via `entity_id` or `same_as.cricsheet`. diff --git a/okf/bundles/cricket/sources/cricsheet.md b/okf/bundles/cricket/sources/cricsheet.md new file mode 100644 index 0000000..29436c8 --- /dev/null +++ b/okf/bundles/cricket/sources/cricsheet.md @@ -0,0 +1,91 @@ +--- +type: source +title: Cricsheet — CC BY 3.0 Open Cricket Data +description: Open ball-by-ball cricket match data. Primary open-data source for IPL historical (2007/08–2025) and Major League Cricket (MLC 2023–2025) in the CricketStudio OKF bundle. License — Creative Commons Attribution 3.0 (CC BY 3.0). +status: active +last_verified: 2026-06-22 +license: CC-BY-3.0 +source_system: Cricsheet +source_boundary: public_open_data +entity_id: example:source:cricsheet +resource: https://cricsheet.org +canonical_page: https://cricsheet.org +tags: + - cricket + - source + - open-data + - cricsheet +--- + +## What Is Cricsheet? + +[Cricsheet](https://cricsheet.org) is an open cricket data project that publishes ball-by-ball data for international and domestic cricket matches in YAML and JSON formats. It is the primary open-data source for cricket analytics. + +## License + +**Creative Commons Attribution 3.0 Unported (CC BY 3.0)** + +You are free to: +- Share — copy and redistribute the material in any medium or format +- Adapt — remix, transform, and build upon the material for any purpose + +Under the following terms: +- **Attribution** — You must give appropriate credit to Cricsheet (https://cricsheet.org) and the data contributors. 
+ +Full license: https://creativecommons.org/licenses/by/3.0/ + +## Attribution Requirement + +When publishing OKF files derived from Cricsheet data, include attribution in the provenance field: + +```yaml +provenance: + source: Cricsheet CC BY 3.0 · [competition] · [N] matches · [snapshot date] +``` + +And in any published document or dataset README: + +> Ball-by-ball data from Cricsheet (https://cricsheet.org), CC BY 3.0. + +## What Is Included + +Cricsheet publishes data for (among others): +- Indian Premier League (IPL) — historical seasons +- Major League Cricket (MLC) — 2023, 2024, 2025 +- T20 Internationals +- ODI Internationals +- Test matches +- Various domestic T20 leagues + +Coverage varies by competition and season. Check cricsheet.org for current coverage. + +## What Is NOT Included + +- Live or real-time ball-by-ball data +- Current-season data before Cricsheet publishes it +- Full BCCI-licensed IPL 2026 data (the current season may not be on Cricsheet yet) + +## Allowed Use in OKF Bundles + +- Publish derived claims (aggregated statistics, rankings, averages) with attribution. +- Link to Cricsheet as the source. +- Include metadata and methodology derived from the data. + +**Do not:** +- Reproduce raw Cricsheet YAML/JSON files verbatim in the OKF bundle without checking current Cricsheet terms. +- Claim data as your own without attribution. +- Use Cricsheet data to imply endorsement by Cricsheet. 
+ +## Source Boundary Declaration + +Files derived from Cricsheet must use: + +```yaml +source_boundary: public_open_data +license: CC-BY-3.0 +``` + +## Related + +- [Cricsheet website](https://cricsheet.org) +- [CC BY 3.0 license](https://creativecommons.org/licenses/by/3.0/) diff --git a/okf/bundles/cricket/spec/provenance.md b/okf/bundles/cricket/spec/provenance.md new file mode 100644 index 0000000..2678350 --- /dev/null +++ b/okf/bundles/cricket/spec/provenance.md @@ -0,0 +1,90 @@ +--- +type: spec +title: Cricket OKF Provenance Convention +description: How to declare source, boundary, confidence, and freshness in a cricket OKF file. Google OKF v0.1 has no provenance fields — this convention adds them for cricket data trust and AI-safe citation. +status: active +last_verified: 2026-06-22 +license: CC-BY-4.0 +source_system: CricketStudio +source_boundary: methodology_only +tags: + - cricket + - okf + - provenance + - licensing +--- + +## Why Cricket OKF Needs Provenance + +Google OKF v0.1 is intentionally minimal — one required field (`type`) and five recommended fields. For cricket data, this is not enough. + +Cricket claims are only trustworthy when they include: +- **Source** — which dataset and license +- **Boundary** — what redistribution is permitted +- **Confidence** — how reliable the data is +- **Freshness** — when it was last verified + +Without these, an AI agent cannot safely cite a claim, a journalist cannot publish it, and a developer cannot trust it. + +--- + +## Provenance Fields + +### `provenance` block (required for data-bearing files) + +```yaml +provenance: + source: Cricsheet CC BY 3.0 · example competition 2024 · 50 matches + confidence: high # high | medium | low + snapshot: example-dataset-2024-06-01 # optional + notes: Season 2024 only — earlier seasons not included. 
# optional +``` + +### `source_boundary` (required for all files) + +Declares the redistribution envelope: + +| Value | Meaning | +|-------|---------| +| `public_open_data` | From a publicly licensed open source (e.g., Cricsheet CC BY 3.0). Redistribution with attribution permitted. | +| `derived_claims_only` | Derived from a licensed feed. Raw feed not redistributed; derived claims and links only. | +| `methodology_only` | Formulas, rules, spec — no cricket data. | +| `manual_curated_knowledge` | Curated from public knowledge; no raw data redistribution. | + +### `last_verified` / `timestamp` + +ISO-8601 date when the content was last verified. Both field names are accepted: + +```yaml +last_verified: 2026-06-22 +timestamp: 2026-06-22 # Google OKF v0.1 recommended name +``` + +### `license` + +SPDX identifier or plain string: + +```yaml +license: CC-BY-4.0 # for methodology, spec, curated content +license: CC-BY-3.0 # for Cricsheet-derived content +``` + +--- + +## Decision Rule for source_boundary + +``` +Data from Cricsheet? → public_open_data +Data from a licensed third-party feed? → derived_claims_only +Formulas / methodology / spec? → methodology_only +Curated from public knowledge? → manual_curated_knowledge +``` + +--- + +## Non-Negotiables + +- Never declare `public_open_data` for data from a licensed feed. +- Never omit `last_verified` from a data-bearing file. +- Never cite generated prose as source evidence. +- When uncertain about confidence, use `medium`, not `high`. diff --git a/okf/bundles/cricket/spec/sample-size.md b/okf/bundles/cricket/spec/sample-size.md new file mode 100644 index 0000000..39d390c --- /dev/null +++ b/okf/bundles/cricket/spec/sample-size.md @@ -0,0 +1,71 @@ +--- +type: spec +title: Cricket OKF Sample-Size Doctrine +description: Minimum data thresholds before a cricket claim is valid for ranking or comparison. Defines floors for batting, bowling, phase, H2H, and venue. Essential for AI agents — never rank below floor. 
+status: active +last_verified: 2026-06-22 +license: CC-BY-4.0 +source_system: CricketStudio +source_boundary: methodology_only +tags: + - cricket + - okf + - sample-size + - methodology +--- + +## Why Floors Exist + +A batter who scores 50 from 10 balls has a strike rate of 500. A bowler who concedes 30 runs from 12 balls has an economy of 15. Neither of these is meaningful as a ranking — both are small-sample artefacts. + +Sample-size floors protect the credibility of ranked claims. + +--- + +## Standard Cricket OKF Sample-Size Floors + +| Context | Floor | +|---------|-------| +| Batting aggregate (career or season) | ≥ 30 balls faced | +| Batting — powerplay phase (overs 1–6) | ≥ 60 balls faced in phase | +| Batting — middle overs phase (overs 7–15) | ≥ 60 balls faced in phase | +| Batting — death overs phase (overs 16–20) | ≥ 60 balls faced in phase | +| Bowling aggregate (career or season) | ≥ 15 balls bowled | +| Bowling — any phase | ≥ 30 balls bowled in phase | +| Head-to-head (batter vs bowler) | ≥ 5 deliveries faced | +| Venue stat (chase rate, toss tendency) | ≥ 3 matches at venue | + +--- + +## How to Declare a Floor + +In metric files: + +```markdown +## Sample-Size Floor + +Minimum 30 balls faced (aggregate). For phase-specific strike rate: minimum 60 balls in that phase. +Players below floor must not appear in ranked lists. +``` + +In leaderboard files (frontmatter): + +```yaml +provenance: + notes: "Floor: ≥60 powerplay balls faced." +``` + +--- + +## What an Agent Must Do + +1. Always check the declared floor before citing a ranked claim. +2. Never present a sub-floor player as a ranked result. +3. If a player is below floor, say "insufficient data for ranking" — do not invent a position. +4. When citing any ranking, reproduce the floor in the citation. + +--- + +## Cross-Competition Floor Rule + +When comparing players across competitions, the floor applies **per competition**. Do not aggregate balls across IPL and MLC to meet a floor for either. 
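The cross-competition rule is easy to get wrong when counts are pooled before the floor is checked. A minimal sketch, with the hypothetical helper `competitions_meeting_floor` and invented ball counts, checks each competition on its own:

```python
from typing import Dict, List


def competitions_meeting_floor(balls_by_competition: Dict[str, int], floor: int) -> List[str]:
    """Return the competitions where the player meets the floor independently."""
    return sorted(
        competition
        for competition, balls in balls_by_competition.items()
        if balls >= floor
    )


# Invented powerplay ball counts for one batter, checked against the phase floor of 60
player_pp_balls = {"IPL": 40, "MLC": 25}
print(sum(player_pp_balls.values()))                    # pooled total exceeds 60
print(competitions_meeting_floor(player_pp_balls, 60))  # yet neither competition qualifies
```

Here the pooled total is above the floor, but the player is rankable in neither competition, which is the outcome the rule requires.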
diff --git a/okf/bundles/cricket/spec/types.md b/okf/bundles/cricket/spec/types.md new file mode 100644 index 0000000..3066ce2 --- /dev/null +++ b/okf/bundles/cricket/spec/types.md @@ -0,0 +1,170 @@ +--- +type: spec +title: Cricket OKF Type Vocabulary +description: Canonical cricket type values for use with Google OKF v0.1. Defines 20 cricket-specific types, their purpose, and example frontmatter. Extends Google OKF's open type system without forking it. +status: active +last_verified: 2026-06-23 +license: CC-BY-4.0 +source_system: CricketStudio +source_boundary: methodology_only +tags: + - cricket + - okf + - types + - vocabulary +--- + +## Overview + +Google OKF v0.1 uses an open `type` system — values are not centrally registered, and consumers must tolerate unknown types. This document defines the cricket domain vocabulary: 20 `type` values that cricket OKF bundles should use for interoperability. + +These are **recommendations**, not a closed enum. A producer may use additional cricket-specific types as needed. 
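A consumer that follows the open-type rule would treat the vocabulary as a set of recognized values rather than a validation gate. The sketch below is an assumption about how such a consumer might look, not behavior mandated by OKF: `resolve_type` is a hypothetical name, and the frontmatter is assumed to be already parsed into a dictionary.

```python
# The 20 type values defined in this vocabulary document
CRICKET_TYPES = frozenset({
    "player", "team", "venue", "match", "league", "season",
    "metric", "methodology", "research", "dossier", "story", "spec", "source",
    "record", "leaderboard", "claim", "runbook", "reference", "api",
    "index",
})


def resolve_type(frontmatter: dict):
    """Return (type value, recognized flag); unknown types are accepted, not rejected."""
    value = frontmatter.get("type")
    if not isinstance(value, str) or not value.strip():
        raise ValueError("frontmatter is missing a usable `type` value")
    value = value.strip()
    return value, value in CRICKET_TYPES


print(resolve_type({"type": "player"}))
print(resolve_type({"type": "umpire"}))  # not in the vocabulary, still tolerated
```

Only a missing or empty `type` raises; an unrecognized value is passed through with a flag so the caller can decide how to render it.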
+
+---
+
+## Cricket Type Vocabulary
+
+### Entity Types
+
+```
+player      — An individual cricket player
+team        — A cricket franchise or national team
+venue       — A cricket ground or stadium
+match       — A single cricket match
+league      — A cricket competition (IPL, MLC, BBL, PSL)
+season      — A single edition of a competition
+```
+
+### Knowledge Types
+
+```
+metric      — A cricket metric definition (formula, floor, limitations)
+methodology — An operational rule or doctrine (sample-size, ranking eligibility)
+research    — An analytical report or investigation
+dossier     — A verified Q&A pattern for AI agents
+story       — A provenance-backed cricket narrative (scope, sample size, and "what it doesn't say" required)
+spec        — A formal specification document
+source      — A data source declaration (license, boundary)
+record      — An all-time or historical record
+leaderboard — A ranked list for a specific metric and scope
+claim       — A single isolated, citable cricket assertion with full provenance
+runbook     — An operational procedure for data refresh, dispute resolution, or maintenance
+reference   — An external resource pointer (API endpoint, dataset, third-party tool)
+api         — An API endpoint descriptor with request/response schema
+```
+
+### Navigation Types
+
+```
+index       — A directory or category overview
+```
+
+---
+
+## Example Frontmatter by Type
+
+### player
+
+```yaml
+---
+type: player
+title: Example T20 Batter
+description: Cricket OKF player concept. Links to canonical page for live computed claims.
+status: active
+last_verified: 2026-06-22
+license: CC-BY-3.0
+source_system: ExampleCricketData
+source_boundary: public_open_data
+entity_id: example:player:t20-batter-001
+canonical_page: https://example.org/players/t20-batter-001
+resource: https://example.org/players/t20-batter-001
+provenance:
+  source: Cricsheet CC BY 3.0
+  confidence: high
+tags:
+  - cricket
+  - player
+same_as:
+  cricsheet: t20_batter_001
+---
+```
+
+### metric
+
+```yaml
+---
+type: metric
+title: Batting Strike Rate
+description: Runs scored per 100 balls faced. Core T20 batting efficiency metric.
+status: active
+last_verified: 2026-06-22
+license: CC-BY-4.0
+source_system: CricketAnalytics
+source_boundary: methodology_only
+entity_id: example:metric:batting-strike-rate
+resource: https://example.org/metrics/batting-strike-rate
+tags:
+  - cricket
+  - metric
+  - batting
+---
+```
+
+### dossier
+
+```yaml
+---
+type: dossier
+title: Who leads T20 powerplay batting strike rate?
+description: Verified agent answer pattern for powerplay SR leader. Includes scope, floor, and citation behavior.
+status: active
+last_verified: 2026-06-23
+license: CC-BY-4.0
+source_system: CricketAnalytics
+source_boundary: public_open_data
+entity_id: example:dossier:t20-powerplay-sr-leader
+canonical_page: https://example.org/leaderboards/powerplay-strike-rate
+resource: https://example.org/leaderboards/powerplay-strike-rate
+provenance:
+  source: Cricsheet CC BY 3.0 · example competition 2024
+  confidence: high
+tags:
+  - cricket
+  - dossier
+  - powerplay
+---
+```
+
+### story
+
+```yaml
+---
+type: story
+title: Example Cricket Story
+description: A provenance-backed cricket narrative. Must include scope, sample size, what the data says, and what it doesn't say.
+status: active
+last_verified: 2026-06-23
+license: CC-BY-4.0
+source_system: ExampleCricketData
+source_boundary: derived_claims_only
+entity_id: example:story:t20-toss-effect
+canonical_page: https://example.org/stories/t20-toss-effect
+resource: https://example.org/stories/t20-toss-effect
+provenance:
+  source: Cricsheet CC BY 3.0 · example competition 2019–2025 · 500 matches
+  confidence: high
+  dataset_version: 2025-12-01
+  notes: All figures derived from Cricsheet open data. No licensed feed data included.
+tags:
+  - cricket
+  - story
+  - toss
+  - T20
+---
+```
+
+---
+
+## Disambiguation Rule
+
+The same `type` value (e.g., `metric`) may appear in files across different directories. Disambiguation is via `entity_id`, not filename. Consumers should always resolve entities by `entity_id`, not by slug or path alone.
diff --git a/okf/bundles/cricket/stories/example-cricket-story.md b/okf/bundles/cricket/stories/example-cricket-story.md
new file mode 100644
index 0000000..5aec332
--- /dev/null
+++ b/okf/bundles/cricket/stories/example-cricket-story.md
@@ -0,0 +1,100 @@
+---
+type: story
+title: Example Cricket Story — The Toss Effect
+description: Annotated example story file for the Cricket OKF domain bundle. Demonstrates the narrative layer format — hook, data, wow, limitations, and related concepts — backed by provenance. Replace with real sourced data.
+status: active
+last_verified: 2026-06-23
+license: CC-BY-4.0
+source_system: ExampleCricketData
+source_boundary: derived_claims_only
+entity_id: example:story:t20-toss-effect
+canonical_page: https://example.org/stories/t20-toss-effect
+resource: https://example.org/stories/t20-toss-effect
+provenance:
+  source: Cricsheet CC BY 3.0 · example T20 competition 2019–2025 · 500 matches
+  confidence: high
+  dataset_version: 2025-12-01
+  notes: All figures derived from Cricsheet open data. No licensed feed data included. Replace with real dataset reference before publishing.
+tags:
+  - cricket
+  - story
+  - toss
+  - T20
+  - venue
+related:
+  - ../metrics/bowling-economy.md
+  - ../spec/sample-size.md
+  - ../venues/example-cricket-ground.md
+  - ../dossier/example-agent-pattern.md
+---
+
+## The Question Nobody Asked
+
+Does winning the toss actually matter in T20 cricket — or are teams just choosing what they'd already decided?
+
+---
+
+## What the Data Says
+
+Across [N] matches in [Example T20 Competition] from [Year] to [Year]:
+
+| Finding | Value | Sample |
+|---------|-------|--------|
+| Toss winners' win rate | [X]% | [N] matches |
+| Bowl-first decisions | [X]% of toss wins | [N] decisions |
+| First-innings average score | [X] | [N] innings |
+| Second-innings average score | [X] | [N] innings |
+| Chase success rate | [X]% | [N] chases |
+
+At [Example Ground], the split is more pronounced: [X]% of toss winners chose to bowl first, but the first-innings average ([X]) exceeds the second-innings average ([X]).
+
+Floor: Minimum [N] matches per venue before venue-level claims are cited.
+
+Source: Cricsheet CC BY 3.0 · [Competition] [Years] · snapshot [Date]. Replace bracketed values with real figures before publishing.
+
+---
+
+## The Wow
+
+Teams overwhelmingly choose to bowl first at [Example Ground] — but the team batting first scores more runs on average. The herd consensus and the data point in opposite directions.
+
+---
+
+## What It Doesn't Say
+
+- This does not prove batting first always wins — pitch conditions vary match to match.
+- Toss winners who bowl first and lose may still be making the right decision given information they had at toss time (pitch inspection, weather forecast, squad composition).
+- A larger sample (≥30 matches per venue) is needed before venue-level claims are cited as reliable.
+- This analysis covers [Competition] only — findings may not transfer to other formats or venues.
+
+---
+
+## How to Use This File Format
+
+A `story` type file in Cricket OKF is a **provenance-backed cricket narrative**. It is not a raw data dump or a match report. It must:
+
+1. **State a clear question** — the hook that frames the analysis.
+2. **Show the data** — real numbers with source, scope, and sample size.
+3. **State the wow** — the counterintuitive or surprising finding in plain English.
+4. **Disclose limitations** — what the data doesn't say, can't say, or leaves open.
+5. **Link related concepts** — metric definitions, venue files, dossier patterns.
+
+The `source_boundary: derived_claims_only` flag signals that this file contains derived analysis, not raw feed data. The `provenance:` block is mandatory for all `story` type files.
+
+---
+
+## Agent Guidance
+
+- Do not use the placeholder values in this file for any real cricket claim — this is an annotated example.
+- When citing a real OKF story file, include: the story URL, the dataset version, the competition, the sample size, and the stated limitation.
+- Do not cross-compare findings from different stories without declaring both datasets and their scope.
+- If a user asks for a stat that appears in a story file but was not stated with a floor, do not infer it qualifies — check the methodology files.
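The mandatory `provenance:` block and the citation guidance above could be enforced with a small check. This is a sketch under assumptions: it expects the frontmatter to be already parsed into a dict, and `story_problems`, `build_citation`, and the citation field names are hypothetical, not defined by Cricket OKF.

```python
# The five items an agent should include when citing a story file.
# Field names are illustrative; the guidance only names the items.
CITATION_FIELDS = (
    "story_url", "dataset_version", "competition", "sample_size", "limitation",
)


def story_problems(frontmatter: dict) -> list[str]:
    """List reasons a parsed frontmatter dict fails the story requirements."""
    problems = []
    if frontmatter.get("type") != "story":
        problems.append("type is not 'story'")
    provenance = frontmatter.get("provenance")
    if not isinstance(provenance, dict) or not provenance:
        problems.append("provenance block is missing or empty")
    return problems


def build_citation(**fields) -> dict:
    """Build a citation, refusing to omit any of the five required items."""
    missing = [name for name in CITATION_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError("citation is missing: " + ", ".join(missing))
    return {name: fields[name] for name in CITATION_FIELDS}


story = {
    "type": "story",
    "source_boundary": "derived_claims_only",
    "provenance": {"source": "Cricsheet CC BY 3.0", "dataset_version": "2025-12-01"},
}
problems = story_problems(story)
citation = build_citation(
    story_url="https://example.org/stories/t20-toss-effect",
    dataset_version="2025-12-01",
    competition="Example T20 Competition",
    sample_size="500 matches",
    limitation="Covers one competition only",
)
```

Raising on a missing citation item, rather than filling a default, matches the guidance that an agent should not infer what a story did not state.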
+
+---
+
+## Related Concepts
+
+- [Bowling Economy](../metrics/bowling-economy.md) — metric definition with formula, floor, and limitations
+- [Sample-Size Doctrine](../spec/sample-size.md) — minimum data thresholds before claims are valid
+- [Example Cricket Ground](../venues/example-cricket-ground.md) — venue entity with innings averages
+- [Example Agent Pattern](../dossier/example-agent-pattern.md) — how agents should answer powerplay SR questions
diff --git a/okf/bundles/cricket/teams/example-t20-team.md b/okf/bundles/cricket/teams/example-t20-team.md
new file mode 100644
index 0000000..1e96cbf
--- /dev/null
+++ b/okf/bundles/cricket/teams/example-t20-team.md
@@ -0,0 +1,52 @@
+---
+type: team
+title: Example T20 Team
+description: Annotated example team file for the Cricket OKF domain bundle. Demonstrates team frontmatter, season records, head-to-head structure, and canonical page linking. Replace with a real team using sourced data.
+status: active
+last_verified: 2026-06-22
+license: CC-BY-4.0
+source_system: ExampleCricketData
+source_boundary: public_open_data
+entity_id: example:team:t20-team-001
+resource: https://example.org/teams/t20-team-001
+canonical_page: https://example.org/teams/t20-team-001
+provenance:
+  source: Cricsheet CC BY 3.0 · example competition 2023–2025
+  confidence: high
+tags:
+  - cricket
+  - team
+aliases:
+  - ET Team
+  - Example Team
+related:
+  - ../players/example-t20-batter.md
+---
+
+## Summary
+
+**Example T20 Team** competes in the Example T20 Competition (ETC). This is an annotated example demonstrating the Cricket OKF team file format. Real team files are built from sourced match results.
+
+For current computed claims, use the canonical page.
+
+---
+
+## All-Time Record (Example Competition, 2023–2025)
+
+| Season | M | W | L | NR | Win % |
+|--------|---|---|---|----|-------|
+| 2023 | 8 | 5 | 3 | 0 | 62.5% |
+| 2024 | 8 | 6 | 2 | 0 | 75.0% |
+| 2025 | 10 | 7 | 3 | 0 | 70.0% |
+| **All-time** | **26** | **18** | **8** | **0** | **69.2%** |
+
+Source: Example data for illustration. Replace with real Cricsheet-sourced match results.
+
+---
+
+## Agent Guidance
+
+- Do not cite this file's values for any real cricket claim.
+- For real OKF team files, every record must trace to declared match results with a dataset version.
+- Team names change over time — use `aliases` for historical names; do not create a separate file.
+- Season-specific records belong in `season` files; this file holds the all-time team record.
diff --git a/okf/bundles/cricket/venues/example-cricket-ground.md b/okf/bundles/cricket/venues/example-cricket-ground.md
new file mode 100644
index 0000000..32787d0
--- /dev/null
+++ b/okf/bundles/cricket/venues/example-cricket-ground.md
@@ -0,0 +1,49 @@
+---
+type: venue
+title: Example Cricket Ground
+description: Annotated example venue file for the Cricket OKF domain bundle. Demonstrates venue frontmatter, match stats, toss tendency, and canonical page linking. Replace with a real venue using sourced data.
+status: active
+last_verified: 2026-06-22
+license: CC-BY-4.0
+source_system: ExampleCricketData
+source_boundary: public_open_data
+entity_id: example:venue:example-cricket-ground
+resource: https://example.org/venues/example-cricket-ground
+canonical_page: https://example.org/venues/example-cricket-ground
+provenance:
+  source: Cricsheet CC BY 3.0 · 30 matches at this venue
+  confidence: high
+tags:
+  - cricket
+  - venue
+aliases:
+  - Example Ground
+  - ECG
+---
+
+## Summary
+
+**Example Cricket Ground** is an outdoor T20 cricket venue located in Example City. This is an annotated example demonstrating the Cricket OKF venue file format.
Real venue files require ≥3 matches at the venue before publishing stats.
+
+---
+
+## Venue Stats (30 matches, 2023–2025)
+
+| Metric | Value | Notes |
+|--------|-------|-------|
+| Matches | 30 | ≥3 floor: qualifies |
+| First-innings avg score | 162.4 | 30 first innings |
+| Second-innings avg score | 148.2 | 30 chases |
+| Chase success rate | 43.3% | 13 wins in 30 chases |
+| Toss winner bats first | 56.7% | 17 of 30 toss winners |
+
+Source: Example data for illustration.
+
+---
+
+## Agent Guidance
+
+- Minimum 3 matches at a venue before publishing stats — state the match count.
+- Venue names are often ambiguous (same city, multiple grounds). Always include the full ground name and city in the `title` and `aliases`.
+- Chase success rate is computed from complete matches only (no-results excluded).
+- Do not compare this venue's stats with another venue from a different competition without declaring the scope difference.
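The venue percentages can be derived from raw counts with a floor check applied first. A minimal sketch, assuming the counts given in the Notes column of the example table; `venue_summary`, `pct`, and the result keys are hypothetical names.

```python
VENUE_FLOOR_MATCHES = 3  # minimum matches at a venue before stats are published


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def venue_summary(matches: int, chases_won: int, completed_chases: int,
                  toss_winner_batted_first: int) -> dict:
    """Compute venue rates, or report insufficient data below the floor."""
    if matches < VENUE_FLOOR_MATCHES:
        return {"matches": matches, "status": "insufficient data"}
    return {
        "matches": matches,
        # No-results are excluded by counting completed chases only.
        "chase_success_pct": pct(chases_won, completed_chases),
        "toss_winner_bats_first_pct": pct(toss_winner_batted_first, matches),
    }


# Counts from the example table: 13 wins in 30 chases, 17 of 30 toss winners.
summary = venue_summary(matches=30, chases_won=13, completed_chases=30,
                        toss_winner_batted_first=17)
```

Keeping the match count in the returned summary makes it straightforward to state the sample alongside any published rate.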