58 changes: 58 additions & 0 deletions okf/bundles/cricket/README.md
@@ -0,0 +1,58 @@
# Cricket Domain Example Bundle for Google OKF

This directory contains a standalone cricket domain example bundle demonstrating the Cricket OKF profile on top of [Google Open Knowledge Format (OKF) v0.1](https://github.com/GoogleCloudPlatform/knowledge-catalog/blob/main/okf/SPEC.md).

## What this bundle demonstrates

- A **cricket type vocabulary** extending Google OKF's open `type` system with 20 cricket-specific values
- A **provenance convention** adding source traceability, license declaration, and freshness fields that Google OKF v0.1 leaves undefined
- A **sample-size doctrine** defining minimum data thresholds before cricket rankings are valid
- **Metric definition files** (batting strike rate, bowling economy, death-overs economy) showing the standard format for citable cricket metrics
- **Entity files** (player, team, venue) with canonical URLs, external IDs, and relationship links
- A **dossier file** showing a verified Q&A pattern for AI agents
- A **story file** showing the narrative layer format — provenance-backed cricket narrative with scope, sample size, and stated limitations

## File Structure

```
okf/bundles/cricket/
├── README.md
├── index.md                        # Bundle overview
├── spec/
│   ├── types.md                    # Cricket type vocabulary (20 types)
│   ├── provenance.md               # Provenance convention
│   └── sample-size.md              # Sample-size doctrine
├── metrics/
│   ├── batting-strike-rate.md      # Metric definition
│   ├── bowling-economy.md          # Metric definition
│   └── death-overs-economy.md      # Phase metric definition
├── players/
│   └── example-t20-batter.md       # Annotated example player entity
├── teams/
│   └── example-t20-team.md         # Annotated example team entity
├── venues/
│   └── example-cricket-ground.md   # Annotated example venue entity
├── dossier/
│   └── example-agent-pattern.md    # Verified Q&A pattern for agents
├── stories/
│   └── example-cricket-story.md    # Annotated example story (narrative layer)
└── sources/
    └── cricsheet.md                # Open data source declaration (CC BY 3.0)
```

## Live Reference Implementation

The full CricketStudio OKF bundle at **https://okf.cricketstudio.ai** is a CI-validated, 430+ file implementation of this profile, including:
- 65 IPL player profiles with phase splits (powerplay / middle / death)
- 10 metric definitions
- 37 dossier Q&A patterns
- 5 provenance-backed cricket stories (Journeys)
- 8 research reports

GitHub: https://github.com/i-m-arul/cricketstudio-okf

## License

- `spec/`, `metrics/`, `dossier/`, `stories/`, `README.md`, `index.md` — CC-BY-4.0
- `sources/cricsheet.md` — documents data licensed CC BY 3.0 (Cricsheet)
- Entity example files — CC-BY-4.0 (labeled examples only; no real player data)
81 changes: 81 additions & 0 deletions okf/bundles/cricket/dossier/example-agent-pattern.md
@@ -0,0 +1,81 @@
---
type: dossier
title: Who leads T20 powerplay batting strike rate?
description: Annotated example dossier file for the Cricket OKF domain bundle. Demonstrates the verified Q&A pattern format for AI agents — user question, correct answer pattern, data table, citation behavior, and caveats. Replace values with real sourced data.
status: active
last_verified: 2026-06-22
license: CC-BY-4.0
source_system: ExampleCricketData
source_boundary: public_open_data
entity_id: example:dossier:t20-powerplay-sr-leader
resource: https://example.org/leaderboards/powerplay-strike-rate
canonical_page: https://example.org/leaderboards/powerplay-strike-rate
provenance:
  source: Cricsheet CC BY 3.0 · example competition 2023–2025 · 50 matches
  confidence: high
  snapshot: example-dataset-2025-12-01
tags:
- cricket
- dossier
- powerplay
- batting
- strike-rate
---

## User Question

> Who has the best powerplay batting strike rate in [competition]?

## Correct Answer Pattern

> **[Player Name]** ([Team]) leads [Competition] powerplay batting strike rate at **[SR] SR** from **[N] powerplay balls** ([Runs] runs, [X] × 6s) across [Competition] [seasons]. Floor: ≥60 powerplay balls faced. Source: [Data source] · snapshot [date].

Replace bracketed values with real sourced data before publishing.

---

## How a Dossier File Works

A **dossier** file teaches AI agents how to answer a class of cricket questions correctly:

1. **User Question** — a real question agents are asked
2. **Correct Answer Pattern** — the verified answer with all required scope and citation elements
3. **Data Table** — the backing data (top 3–10 entries)
4. **Citation Behavior** — what scope and floor to state in any citation
5. **Caveats** — what the data does and does not show

---

## Example Data Table (Replace with real sourced data)

| Rank | Player | Team | PP SR | PP Balls | PP Runs | 6s |
|------|--------|------|-------|----------|---------|-----|
| 1 | Player A | Team X | 194.3 | 123 | 239 | 16 |
| 2 | Player B | Team Y | 190.3 | 72 | 137 | 8 |
| 3 | Player C | Team Z | 188.0 | 225 | 423 | 31 |

Floor: ≥60 powerplay balls faced. Source: [Data source] · [seasons] · snapshot [date].
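
As a rough illustration of how the floor interacts with the ranking, the following Python sketch recomputes the example table. The `PowerplayLine` class, the `rank_powerplay_sr` helper, and the sub-floor `Player D` row are hypothetical names and data introduced here for illustration; they are not part of the bundle.

```python
from dataclasses import dataclass
from typing import List, Optional

PP_FLOOR_BALLS = 60  # phase floor stated in this dossier


@dataclass
class PowerplayLine:
    """One player's powerplay batting totals (hypothetical record shape)."""
    player: str
    runs: int
    balls: int

    @property
    def strike_rate(self) -> Optional[float]:
        # Undefined (None) when no balls were faced; otherwise runs per 100 balls.
        if self.balls == 0:
            return None
        return round(self.runs / self.balls * 100, 1)


def rank_powerplay_sr(lines: List[PowerplayLine],
                      floor: int = PP_FLOOR_BALLS) -> List[PowerplayLine]:
    """Drop sub-floor players, then sort the rest by strike rate, descending."""
    qualified = [line for line in lines if line.balls >= floor]
    return sorted(qualified, key=lambda line: line.runs / line.balls, reverse=True)


rows = [
    PowerplayLine("Player C", 423, 225),
    PowerplayLine("Player A", 239, 123),
    PowerplayLine("Player B", 137, 72),
    PowerplayLine("Player D", 40, 18),  # hypothetical sub-floor player
]

ranked = rank_powerplay_sr(rows)
for position, line in enumerate(ranked, start=1):
    print(position, line.player, line.strike_rate, line.balls)
```

In this sketch, Player D has the highest raw rate but never appears in the ranking, because 18 balls is below the 60-ball floor.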

---

## Citation Behavior

State "[Competition] powerplay (overs 1–6) · [seasons] · floor ≥60 PP balls · [License]" with any powerplay SR claim.

---

## Caveats

- The leader on rate may not have the largest sample. Compare both rate and sample size.
- Floor is ≥60 powerplay balls — players who bat lower in the order may not qualify.
- Phase stats are more variable than career aggregates — sample matters more here.
- [Competition] [Year] data is not included until the season has ended and the dataset has been refreshed.

---

## Agent Non-Negotiables

- Never rank a player who has faced fewer than 60 powerplay balls (the ≥60 PP balls floor).
- Always state the competition, season, and floor in the citation.
- If the player asked about is sub-floor, say "insufficient data for ranking" — do not invent a position.
- Do not compare across competitions (e.g., IPL PP SR vs MLC PP SR) without declaring both datasets and the floor applied.
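
The rules above could be enforced in an agent's answer layer along the following lines. This is a minimal sketch, not a prescribed implementation: the function name `answer_rank_question`, its parameters, and the wording of the returned sentence are all assumptions made for illustration.

```python
from typing import Dict, List

PP_FLOOR_BALLS = 60  # floor stated in this dossier


def answer_rank_question(player: str,
                         ranked: List[str],
                         pp_balls: Dict[str, int],
                         competition: str,
                         seasons: str,
                         floor: int = PP_FLOOR_BALLS) -> str:
    """Return a rank statement with scope and floor, or a refusal for sub-floor players."""
    balls = pp_balls.get(player, 0)
    # Sub-floor or unknown players get no position at all.
    if balls < floor or player not in ranked:
        return "insufficient data for ranking"
    position = ranked.index(player) + 1
    return (f"{player} is #{position} for powerplay batting strike rate "
            f"({competition}, {seasons}, floor ≥{floor} PP balls, "
            f"{balls} balls faced)")


ranked_players = ["Player A", "Player B", "Player C"]
balls_faced = {"Player A": 123, "Player B": 72, "Player C": 225, "Player D": 18}

print(answer_rank_question("Player B", ranked_players, balls_faced,
                           "[Competition]", "[seasons]"))
print(answer_rank_question("Player D", ranked_players, balls_faced,
                           "[Competition]", "[seasons]"))
```

The design choice here is that the refusal path is checked before any position lookup, so a sub-floor player can never be assigned a rank by accident.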
88 changes: 88 additions & 0 deletions okf/bundles/cricket/index.md
@@ -0,0 +1,88 @@
---
type: index
title: Cricket Domain Bundle for Google OKF
description: A cricket domain example bundle for Google Open Knowledge Format (OKF) v0.1. Demonstrates type vocabulary, provenance convention, sample-size doctrine, metric definitions, entity files, and agent Q&A patterns for cricket knowledge.
status: active
last_verified: 2026-06-22
license: CC-BY-4.0
source_system: CricketStudio
source_boundary: methodology_only
tags:
- cricket
- okf
- domain-example
- google-okf
---

## What Is This?

This is a cricket domain example bundle for [Google OKF v0.1](https://github.com/GoogleCloudPlatform/knowledge-catalog/blob/main/okf/SPEC.md).

Cricket is one of the world's most data-rich sports, with complex metrics (batting strike rate, bowling economy, death-overs phase splits), multi-format competitions (T20, ODI, Test), and 150+ years of records. It is a strong test case for structured knowledge representation.

This bundle shows how the open `type` field, extended frontmatter, and a domain provenance convention can be used to make cricket knowledge portable, citable, and agent-readable.

---

## Cricket OKF Type Vocabulary

This bundle uses 15 cricket-specific `type` values:

| Type | Description |
|------|-------------|
| `player` | An individual cricket player |
| `team` | A cricket franchise or national team |
| `venue` | A cricket ground or stadium |
| `match` | A single cricket match |
| `league` | A cricket competition (IPL, MLC, BBL) |
| `season` | A single edition of a competition |
| `metric` | A cricket metric definition |
| `methodology` | An operational rule or doctrine |
| `research` | An analytical report |
| `dossier` | A verified Q&A pattern for AI agents |
| `record` | An all-time or historical record |
| `leaderboard` | A ranked list for a specific metric |
| `source` | A data source declaration |
| `index` | A directory or category index |
| `spec` | A formal specification document |

See [spec/types.md](./spec/types.md) for definitions and example frontmatter for each type.

---

## Key Domain Conventions

### Provenance

Cricket data comes from multiple sources with different licenses. Every data-bearing file declares `source_boundary`:

- `public_open_data`: Cricsheet CC BY 3.0 (ball-by-ball open data)
- `derived_claims_only`: claims derived from a licensed feed
- `methodology_only`: formulas and rules, no data

See [spec/provenance.md](./spec/provenance.md).
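
A validator for this field might look like the sketch below. It assumes the three values listed above are the complete set, which `spec/provenance.md` may or may not confirm; the `SourceBoundary` enum and `check_source_boundary` function are hypothetical names, and the frontmatter is assumed to have already been parsed into a plain dictionary.

```python
from enum import Enum


class SourceBoundary(str, Enum):
    # Values copied from the list above; treating them as exhaustive is an assumption.
    PUBLIC_OPEN_DATA = "public_open_data"
    DERIVED_CLAIMS_ONLY = "derived_claims_only"
    METHODOLOGY_ONLY = "methodology_only"


def check_source_boundary(frontmatter: dict) -> SourceBoundary:
    """Return the declared boundary, or raise if it is missing or unrecognised."""
    value = frontmatter.get("source_boundary")
    try:
        return SourceBoundary(value)
    except ValueError:
        raise ValueError(f"unknown or missing source_boundary: {value!r}") from None


print(check_source_boundary({"type": "metric", "source_boundary": "methodology_only"}))
```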

### Sample-Size Floors

Cricket rankings are meaningless below minimum data thresholds. This bundle defines:

- ≥30 balls faced (batting aggregate)
- ≥60 balls faced (phase stats: powerplay/middle/death)
- ≥15 balls bowled (bowling aggregate)
- ≥5 deliveries (head-to-head)
- ≥3 matches (venue stats)

See [spec/sample-size.md](./spec/sample-size.md).
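
One way to keep these thresholds in a single place is a small lookup table, sketched below. The scope keys (`batting_aggregate`, `batting_phase`, and so on) and the `meets_floor` helper are hypothetical names chosen for this example; only the numeric thresholds come from the list above.

```python
# Thresholds from the list above; the scope key names are illustrative assumptions.
SAMPLE_FLOORS = {
    "batting_aggregate": 30,  # balls faced
    "batting_phase": 60,      # balls faced in powerplay / middle / death
    "bowling_aggregate": 15,  # balls bowled
    "head_to_head": 5,        # deliveries
    "venue": 3,               # matches
}


def meets_floor(scope: str, sample: int) -> bool:
    """True when the sample is large enough to appear in a ranking for this scope."""
    if scope not in SAMPLE_FLOORS:
        raise KeyError(f"no sample-size floor defined for scope {scope!r}")
    return sample >= SAMPLE_FLOORS[scope]


print(meets_floor("batting_phase", 72))   # 72 powerplay balls
print(meets_floor("batting_phase", 18))   # 18 powerplay balls
```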

### Metric Files

Every metric has a canonical definition file with: formula, required inputs, valid scope, sample-size floor, ranking rule, edge cases, limitations, example calculation, and citation guidance.

See [metrics/batting-strike-rate.md](./metrics/batting-strike-rate.md).

---

## Live Implementation

Full reference implementation: **https://okf.cricketstudio.ai**
GitHub: https://github.com/i-m-arul/cricketstudio-okf
85 changes: 85 additions & 0 deletions okf/bundles/cricket/metrics/batting-strike-rate.md
@@ -0,0 +1,85 @@
---
type: metric
title: Batting Strike Rate
description: Runs scored per 100 balls faced. Core T20 batting efficiency metric. Required floor — 30 balls (aggregate), 60 balls (phase).
status: active
last_verified: 2026-06-22
license: CC-BY-4.0
source_system: CricketStudio
source_boundary: methodology_only
entity_id: example:metric:batting-strike-rate
resource: https://okf.cricketstudio.ai/metrics/batting-strike-rate
tags:
- cricket
- metric
- batting
- strike-rate
---

## Definition

Batting Strike Rate measures how many runs a batter scores per 100 balls faced. It is the primary efficiency metric for T20 batting — higher is better, all else equal.

## Formula

```
Strike Rate = (Runs scored ÷ Balls faced) × 100
```

## Required Inputs

- `runs`: integer. Total runs scored off the bat (boundaries plus runs run between the wickets); excludes byes, leg byes, and all other extras.
- `balls_faced`: integer. Deliveries faced by the batter; wides are excluded, while no-balls count as balls faced.

## Valid Scope

Applicable to T20 and T20I formats. Also used in ODI and Test cricket but with different typical ranges. Do not compare T20 SR directly with ODI SR — the context and typical values differ significantly.

Phase-specific scope: powerplay (overs 1–6), middle overs (7–15), death overs (16–20).

## Sample-Size Floor

- Aggregate (career or season): ≥ **30 balls faced**
- Phase-specific SR: ≥ **60 balls faced in that phase**

Players below floor must not appear in ranked lists. Disclose sample size in any citation.

## Ranking Rule

Higher strike rate = better. Ranked descending.

Note: SR alone does not account for match situation or wicket value. Context metrics (boundary %, dot-ball %, average) should be cited alongside SR for full batting assessment.

## Edge Cases

- A batter who faces 0 balls has no strike rate (undefined, not 0).
- Wides do not count toward the batter's balls faced. Deliveries that produce byes or leg byes do count as balls faced, but those runs are not credited to the batter.
- Retired hurt: balls faced count; the innings is marked incomplete.
- Super overs: typically included in aggregate counts unless explicitly scoped to regular innings.
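
A minimal sketch of the calculation, including the zero-ball case, is given below. The delivery record shape (`batter_runs`, `wide`, `no_ball` keys) is a simplified, hypothetical structure for illustration and is not the Cricsheet schema; the helper names are likewise assumptions.

```python
from typing import Dict, List, Optional, Tuple


def batting_totals(deliveries: List[Dict]) -> Tuple[int, int]:
    """Sum runs off the bat and count balls faced (wides excluded, no-balls included)."""
    runs = sum(d.get("batter_runs", 0) for d in deliveries)
    balls = sum(1 for d in deliveries if not d.get("wide", False))
    return runs, balls


def strike_rate(runs: int, balls_faced: int) -> Optional[float]:
    """Runs per 100 balls; None (undefined) when no balls were faced."""
    if balls_faced == 0:
        return None
    return runs / balls_faced * 100


innings = [
    {"batter_runs": 4},
    {"batter_runs": 0, "wide": True},     # wide: not a ball faced
    {"batter_runs": 1},
    {"batter_runs": 6, "no_ball": True},  # no-ball: still a ball faced
    {"batter_runs": 0},
]

runs, balls = batting_totals(innings)
print(runs, balls, strike_rate(runs, balls))
print(strike_rate(0, 0))
```

Returning `None` rather than `0` for a batter who faced no balls keeps an undefined rate from being sorted to the bottom of a ranking as if it were a real value.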

## Known Limitations

- Does not account for match situation, required run rate, or wicket value.
- Cross-era comparison unreliable without context (IPL 2008 average SR was ~125; IPL 2026 average SR is ~145+).
- A very high SR from a small sample (e.g., 5 balls) is not a meaningful ranking.

## Example Calculation

A batter scores 180 runs from 100 balls in a T20 tournament.

```
Strike Rate = (180 ÷ 100) × 100 = 180.0
```

This batter qualifies for ranking (≥30 balls). At 100 balls, the sample is more than three times the aggregate floor.

## Citation Guidance

When citing a batting strike rate ranking:

1. State competition and season (e.g., MLC 2023–2025, all-time).
2. State the floor (≥30 balls / ≥60 balls for phase).
3. Link to this metric definition.
4. State the dataset snapshot version.

Example: "MLC all-time powerplay batting SR (floor ≥60 PP balls, 2023–2025, Cricsheet CC BY 3.0 snapshot 2026-06-20)."
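
The four citation elements could be assembled programmatically, as in the sketch below. The function name `format_sr_citation`, its parameters, and the exact output wording are assumptions for illustration; the definition link reuses the `resource` URL from this file's frontmatter.

```python
from typing import Optional

# Taken from the `resource` field in this file's frontmatter.
METRIC_URL = "https://okf.cricketstudio.ai/metrics/batting-strike-rate"


def format_sr_citation(competition: str,
                       seasons: str,
                       floor_balls: int,
                       source: str,
                       snapshot: str,
                       phase: Optional[str] = None) -> str:
    """Build a citation string; refuse to build one if a required element is missing."""
    required = {
        "competition": competition,
        "seasons": seasons,
        "floor_balls": floor_balls,
        "source": source,
        "snapshot": snapshot,
    }
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError(f"citation is missing: {', '.join(missing)}")
    scope = f"{competition} {phase} batting SR" if phase else f"{competition} batting SR"
    return (f"{scope} (floor ≥{floor_balls} balls, {seasons}, {source}, "
            f"snapshot {snapshot}; definition: {METRIC_URL})")


print(format_sr_citation("MLC", "2023–2025", 60,
                         "Cricsheet CC BY 3.0", "2026-06-20", phase="powerplay"))
```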
82 changes: 82 additions & 0 deletions okf/bundles/cricket/metrics/bowling-economy.md
@@ -0,0 +1,82 @@
---
type: metric
title: Bowling Economy Rate
description: Runs conceded per 6 balls (per over). Core T20 bowling efficiency metric. Required floor — 15 balls bowled (aggregate), 30 balls (phase).
status: active
last_verified: 2026-06-22
license: CC-BY-4.0
source_system: CricketStudio
source_boundary: methodology_only
entity_id: example:metric:bowling-economy
resource: https://okf.cricketstudio.ai/metrics/bowling-economy
tags:
- cricket
- metric
- bowling
- economy
---

## Definition

Bowling Economy Rate measures how many runs a bowler concedes per over (6 balls). It is the primary efficiency metric for T20 bowling — lower is better, all else equal.

## Formula

```
Economy = (Runs conceded ÷ Balls bowled) × 6
```

## Required Inputs

- `runs_conceded`: integer — total runs conceded (includes extras off the bowler: wides, no-balls; excludes byes and leg byes)
- `balls_bowled`: integer — legal deliveries bowled (excludes wides and no-balls, which are extras; legal balls count toward overs)

Note: Wides and no-balls add runs to the numerator but are not legal deliveries, so they never count toward `balls_bowled`. Implementation must match the scoring system used in the source data (Cricsheet convention).
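
To make the numerator and denominator rules concrete, here is a small sketch that aggregates ball-by-ball records. The record keys (`off_bat`, `wides`, `noballs`, `byes`, `legbyes`) are a simplified, hypothetical shape chosen for this example and should not be read as the exact Cricsheet schema; `bowling_totals` is also an illustrative name.

```python
from typing import Dict, List, Tuple


def bowling_totals(deliveries: List[Dict[str, int]]) -> Tuple[int, int]:
    """Return (runs_conceded, balls_bowled) for one bowler.

    Runs off the bat, wides, and no-balls are charged to the bowler.
    Byes and leg byes are not. Only deliveries that are neither a wide
    nor a no-ball count as balls bowled.
    """
    runs_conceded = 0
    balls_bowled = 0
    for d in deliveries:
        wides = d.get("wides", 0)
        noballs = d.get("noballs", 0)
        runs_conceded += d.get("off_bat", 0) + wides + noballs
        if wides == 0 and noballs == 0:
            balls_bowled += 1
    return runs_conceded, balls_bowled


over = [
    {"off_bat": 1},
    {"off_bat": 0},
    {"wides": 1},                  # charged to bowler, not a legal ball
    {"off_bat": 4},
    {"noballs": 1, "off_bat": 4},  # charged to bowler, not a legal ball
    {"legbyes": 2},                # legal ball, runs not charged to bowler
    {"off_bat": 0},
    {"off_bat": 2},
]

runs_conceded, balls_bowled = bowling_totals(over)
economy = runs_conceded * 6 / balls_bowled
print(runs_conceded, balls_bowled, economy)
```

Eight deliveries were bowled in this example, but only six count toward the denominator, while the wide and the no-ball still add to the runs conceded.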

## Valid Scope

Applicable to T20, T20I, ODI, and List A cricket. Do not compare T20 economy with ODI economy directly — typical T20 economy is 7–10 RPO; typical ODI economy is 5–7 RPO.

## Sample-Size Floor

- Aggregate (career or season): ≥ **15 balls bowled**
- Phase-specific economy: ≥ **30 balls bowled in that phase**

## Ranking Rule

Lower economy = better. Ranked ascending.

Economy is context-sensitive: a bowler bowling in the death overs is expected to have a higher economy than one bowling only in the powerplay. Phase-specific economy should always be cited with the phase.

## Edge Cases

- A bowler who bowls 0 legal balls has no economy (undefined, not 0).
- Super overs: typically included unless explicitly scoped to regular innings.
- Incomplete overs: partial over balls still count in the economy calculation.
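
When a source reports overs in scorecard notation rather than as a ball count, the partial over has to be converted before applying the formula. The sketch below assumes the conventional notation in which `4.3` means 4 overs and 3 balls; the helper names `overs_to_balls` and `economy_rate` are hypothetical.

```python
from typing import Optional


def overs_to_balls(overs: str) -> int:
    """Convert scorecard notation such as '4.3' (4 overs, 3 balls) to legal balls."""
    whole, _, part = overs.partition(".")
    extra_balls = int(part) if part else 0
    if not 0 <= extra_balls <= 5:
        raise ValueError(f"invalid overs notation: {overs!r}")
    return int(whole) * 6 + extra_balls


def economy_rate(runs_conceded: int, balls_bowled: int) -> Optional[float]:
    """Runs per six legal balls; None (undefined) when no legal balls were bowled."""
    if balls_bowled == 0:
        return None
    return runs_conceded * 6 / balls_bowled


print(overs_to_balls("4.3"))
print(economy_rate(36, overs_to_balls("4.3")))
print(economy_rate(45, overs_to_balls("5.0")))
print(economy_rate(0, 0))
```

Parsing the notation as a string avoids treating `4.3` as a decimal number of overs, which would understate the balls bowled.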

## Known Limitations

- Does not account for wickets taken. A bowler with economy 9.0 and 5 wickets is more valuable than one with economy 7.0 and 0 wickets — economy alone does not capture this.
- Phase comparisons only meaningful within the same phase scope.
- High-economy spells in favorable match situations (defending large totals) may not indicate poor bowling.

## Example Calculation

A bowler concedes 45 runs from 30 balls:

```
Economy = (45 ÷ 30) × 6 = 9.0 RPO
```

30 balls = 5 overs. This bowler qualifies for ranking (≥15 balls).

## Citation Guidance

When citing a bowling economy ranking:

1. State competition, season, and phase (e.g., death overs, IPL 2026).
2. State the floor (≥15 balls / ≥30 balls for phase).
3. Link to this metric definition.
4. State the dataset snapshot version.

Example: "IPL 2026 death-overs bowling economy (floor ≥30 death balls, overs 16–20, snapshot 2026-06-18)."