examples(migration_v5): fixture foundation for the four-guarantee MAKO notebook (#107 phase 1) #155

Draft
caohy1988 wants to merge 2 commits into GoogleCloudPlatform:main from caohy1988:feat/migration-v5-notebook

@caohy1988
Contributor

Draft PR — fixtures only. The four-guarantee notebook (`examples/migration_v5_demo_notebook.ipynb`) follows as a second commit on this branch once the fixture shape is settled.

Why this is a separate review unit

The risky design choices in #107 aren't the notebook prose — they're the fixture contracts:

  • 5-entity MAKO subset vs full 41-entity import
  • id as the primary key strategy
  • skos:notation sort / display-token behavior (intentionally NON-sorted on DecisionPoint to exercise the round-3 lex-min rule end-to-end)
  • Seeded event shape + whether it cleanly exercises all four guarantees
  • Reference extractor scope (only `mako_decision` is handwritten; `context_captured` stays AI-handled to demonstrate the Beat 3 contrast)

Surfacing these in a focused draft PR with a `README.md` keeps fixture-design debate cheap to resolve here, instead of burying it inside 30 executed notebook cells.
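To make the `skos:notation` bullet concrete, here is a minimal sketch of a lex-min display-token rule. It assumes the rule means "when an entity carries several notations, pick the lexicographically smallest one regardless of source order". The function name and sample notations are hypothetical, and the actual round-3 rule may normalize or tie-break differently.

```python
def display_token(notations):
    """Return the lexicographically smallest notation, or None if there are none.

    Assumption: plain string comparison, with no case folding or trimming.
    """
    return min(notations, default=None)


# Hypothetical notations, deliberately listed out of order the way the
# DecisionPoint fixture is described: the result must not depend on order.
unsorted_notations = ["DP-SELECT", "DP", "DP-CHOOSE"]

assert display_token(unsorted_notations) == "DP"
assert display_token(sorted(unsorted_notations)) == "DP"
```

A fixture whose notations are already sorted would pass under a "take the first value" implementation too, which is why the non-sorted input is the more useful test.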

What's in here

| File | Origin |
| --- | --- |
| `mako_core.ttl` | User-authored; pulled from the gist. The real MAKO ontology. |
| `ontology.yaml` | Auto-generated via `gm import-owl --include-namespace https://ontology.yahoo.com/mako/`. Full 41 entities; 17 `FILL_IN` placeholders preserved. |
| `ontology_demo.yaml` | Hand-curated 5-entity demo subset (`AgentSession`, `DecisionPoint`, `Candidate`, `SelectionOutcome`, `ContextSnapshot`). Validates clean. |
| `binding.yaml` + `table_ddl.sql` | Auto-scaffolded via `gm scaffold`. |
| `property_graph.sql` | User-authored `CREATE PROPERTY GRAPH`; Beat 1's "you own the graph definition" evidence. |
| `seed_events.py` | Deterministic seeded RNG producing 404 events across 50 sessions. Same seed → byte-identical output. |
| `reference_extractors.py` | Handwritten extractor for `mako_decision` + `EXTRACTORS`/`RESOLVED_GRAPH`/`SPEC` contract for `bqaa-revalidate-extractors`. |
| `revalidation_thresholds.json` | Beat 3 threshold gates. |
| `README.md` | Design decisions, alternatives I rejected, per-beat coverage check, validation commands run. Read this first. |

Validation commands already run (all four pass)

```bash
python -m bigquery_ontology.cli validate examples/migration_v5/ontology_demo.yaml
python -m bigquery_ontology.cli validate examples/migration_v5/binding.yaml \
  --ontology examples/migration_v5/ontology_demo.yaml
PYTHONPATH=src:examples/migration_v5 python -c "import seed_events; print(len(seed_events.generate_events()))"
PYTHONPATH=src:examples/migration_v5 python -c "import seed_events, reference_extractors; ev = next(e for e in seed_events.generate_events() if e.event_type == 'mako_decision'); out = reference_extractors.extract_mako_decision_event(ev.to_dict(), None); print(len(out.nodes), len(out.edges))"
```

What this PR is NOT

  • The notebook itself — lands as a second commit on this branch once fixture shape is settled.
  • NODE/FIELD/EDGE synthetic failure fixtures for Beat 3.6 — tightly coupled to the cell that exercises them; lands with the notebook.
  • End-to-end execution against real BigQuery — the notebook commit will execute every cell and inline outputs.
  • `docs/README.md` / `CHANGELOG.md` entries — land with the notebook PR so the index points at a complete artifact.

Test plan

  • Two `gm validate` invocations pass clean.
  • Seed generator deterministically produces 404 events across 50 sessions.
  • Reference extractor on the first `mako_decision` event produces the expected MAKO shape (8 nodes, 7 edges).
  • (Pending review) Fixture-shape decisions in `README.md` get sign-off before notebook implementation begins.
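For reviewers who want to see what "deterministically" buys, here is a minimal sketch of the property being tested. `generate_events` below is a hypothetical stand-in, not the real `seed_events.generate_events()`. The event field names and the `session_start` / `session_end` type names are assumptions. Only the seed value, the 2-4 decisions per session, and the 3-5 candidates per decision come from this PR.

```python
import hashlib
import json
import random

SEED = 20260512  # seed value quoted in the commit message


def generate_events(seed=SEED, sessions=50):
    """Hypothetical stand-in generator; the event fields are illustrative only."""
    rng = random.Random(seed)  # private RNG, independent of global random state
    events = []
    for s in range(sessions):
        sid = f"session-{s:03d}"
        events.append({"session_id": sid, "event_type": "session_start"})
        for d in range(rng.randint(2, 4)):  # 2-4 decision points per session
            did = f"{sid}-dp-{d}"
            events.append(
                {"session_id": sid, "event_type": "context_captured", "decision_id": did}
            )
            count = rng.randint(3, 5)  # 3-5 candidates per decision
            candidates = [f"{did}-cand-{c}" for c in range(count)]
            events.append(
                {
                    "session_id": sid,
                    "event_type": "mako_decision",
                    "decision_id": did,
                    "candidates": candidates,
                    "selected": rng.choice(candidates),
                }
            )
        events.append({"session_id": sid, "event_type": "session_end"})
    return events


def fingerprint(events):
    """Hash a canonical JSON encoding so equal fingerprints mean identical bytes."""
    payload = json.dumps(events, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


first = generate_events()
second = generate_events()
assert fingerprint(first) == fingerprint(second)  # same seed, same bytes

# Each decision contributes one context event and one decision event.
decisions = sum(e["event_type"] == "mako_decision" for e in first)
assert len(first) == 2 * 50 + 2 * decisions
```

The stand-in will not necessarily reproduce the 404-event total, since that depends on the real generator's draw order. The count identity in the last assertion is the same one behind 50 + 50 + 152 + 152 = 404.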

Once this PR's fixture shape is approved (or adjusted), I'll add the notebook commit on the same branch and flip this draft to ready.

caohy1988 added 2 commits May 12, 2026 16:15
…ee notebook

Phase 1 of issue GoogleCloudPlatform#107: fixtures directory under
examples/migration_v5/ that the four-guarantee MAKO demo
notebook (landing in a follow-up commit) consumes.

Inputs from the user-authored side (the "what you actually
have to provide" floor — one ontology file):

* mako_core.ttl — the real MAKO ontology pulled from the
  reference gist
  (https://gist.github.com/haiyuan-eng-google/a69ff6282ebcc877f77f9aa4e3db1afd).
  Domain-agnostic
  decision semantics, agent coordination, outcome tracking
  per Yahoo Monetization Platform's design doc.

Generated / scaffolded artifacts (the SDK + plugin produce
these; the demo never asks the user to hand-author them):

* ontology.yaml — full 380-line auto-import via gm import-owl
  --include-namespace https://ontology.yahoo.com/mako/.
  Captures all 41 MAKO entities; the notebook displays this
  in Section 0 to show the realistic "import → resolve
  FILL_INs → curate" workflow.
* ontology_demo.yaml — hand-curated 5-entity demo subset
  (AgentSession, DecisionPoint, Candidate, SelectionOutcome,
  ContextSnapshot) with FILL_IN primary keys resolved to id
  per MAKO's "every artifact has a stable identifier"
  contract. Validates clean against gm validate. The other
  36 MAKO entities would scaffold the same way — narrowing
  keeps the notebook's four-guarantee story focused.
* binding.yaml — auto-scaffolded via gm scaffold --ontology
  ontology_demo.yaml --dataset migration_v5_demo --project
  test-project-0728-467323 --out scaffold/. Demonstrates
  the "one file in, two files out" minimum-input path the
  storyboard's Section 0.5 calls out.
* table_ddl.sql — companion to binding.yaml, also from gm
  scaffold.
* property_graph.sql — user-authored CREATE PROPERTY GRAPH
  for the demo subset. Beat 1's "you own the graph
  definition" evidence. Uses __DATASET__ placeholder for
  per-run dataset substitution.
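The `__DATASET__` substitution mentioned for property_graph.sql could look roughly like the sketch below. The helper name, the validation regex, and the sample template (including the graph name) are all hypothetical; the notebook may do the substitution differently.

```python
import re

PLACEHOLDER = "__DATASET__"

# Assumption: a conservative pattern for dataset names (letters, digits,
# underscores, not starting with a digit), so nothing unexpected is spliced
# into the SQL text.
_DATASET_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def render_sql(template, dataset):
    """Replace every __DATASET__ placeholder with a validated dataset name."""
    if not _DATASET_RE.fullmatch(dataset):
        raise ValueError(f"unexpected dataset name: {dataset!r}")
    if PLACEHOLDER not in template:
        raise ValueError("template contains no __DATASET__ placeholder")
    return template.replace(PLACEHOLDER, dataset)


# Hypothetical one-line template; the real property_graph.sql is longer.
template = "CREATE PROPERTY GRAPH __DATASET__.mako_demo_graph"
print(render_sql(template, "migration_v5_demo"))
```

Failing loudly when the placeholder is absent guards against running a stale, already-substituted file against the wrong dataset.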

Demo-specific Python fixtures (the notebook imports these):

* seed_events.py — deterministic seeded RNG generator
  producing 404 events across 50 sessions: each session
  contains 2-4 decision points; each decision evaluates
  3-5 candidates against a context snapshot and produces
  one outcome. Seeded RNG (_RANDOM_SEED = 20260512) so the
  notebook's outputs round-trip byte-identically across
  runs. Event shape mirrors the BQ AA plugin's payload so
  Beat 3 extractors see the same surface as production.
* reference_extractors.py — handwritten extractor for
  mako_decision events (1 DecisionPoint + 1
  SelectionOutcome + 1 ContextSnapshot reference + N
  Candidates + their edges per event), plus the exact
  EXTRACTORS / RESOLVED_GRAPH / SPEC module contract
  bqaa-revalidate-extractors requires. Mirrors the BKA
  decision pattern from
  bigquery_agent_analytics.structured_extraction.
* revalidation_thresholds.json — threshold gate values for
  bqaa-revalidate-extractors --thresholds-json: 95%
  compiled_unchanged_rate, 99% parity_match_rate, etc.
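As a rough illustration of how a threshold gate behaves, the sketch below assumes the JSON is a flat mapping from metric name to minimum acceptable rate. The actual schema of revalidation_thresholds.json and the gating logic inside bqaa-revalidate-extractors are not shown in this PR, so treat the structure and the `failed_gates` helper as hypothetical. Only the two metric names and their 95% / 99% values come from the commit message.

```python
import json

# Assumed flat layout; the real file may nest or name things differently.
THRESHOLDS_JSON = '{"compiled_unchanged_rate": 0.95, "parity_match_rate": 0.99}'


def failed_gates(metrics, thresholds):
    """Return the sorted names of metrics that are missing or below their minimum."""
    return sorted(
        name
        for name, minimum in thresholds.items()
        if metrics.get(name, 0.0) < minimum  # a missing metric counts as a failure
    )


thresholds = json.loads(THRESHOLDS_JSON)

# Hypothetical observed rates: the first clears its gate, the second does not.
observed = {"compiled_unchanged_rate": 0.97, "parity_match_rate": 0.985}
print(failed_gates(observed, thresholds))  # ['parity_match_rate']
```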

Validation done in this commit:

* gm validate examples/migration_v5/ontology_demo.yaml →
  clean.
* gm validate examples/migration_v5/binding.yaml --ontology
  examples/migration_v5/ontology_demo.yaml → clean.
* seed_events.py generates 404 events across 50 sessions
  (50 session-start + 50 session-end + 152 context + 152
  decision events).
* reference_extractors.extract_mako_decision_event() on the
  first seeded decision produces 8 nodes (1 DecisionPoint +
  1 Outcome + 1 Context + 5 Candidates) and 7 edges.
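The 8-node / 7-edge figure follows from the per-event shape: 3 fixed nodes plus N candidates, and N + 2 edges if every edge starts at the DecisionPoint. The sketch below is a hypothetical stand-in for `extract_mako_decision_event`. The event field names, the edge labels, and the star-shaped topology are assumptions chosen to match those counts, not the real extractor's output.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionResult:
    """Minimal container exposing .nodes and .edges, as the validation one-liner uses."""
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)


def extract_decision(event):
    """Build 3 + N nodes and N + 2 edges from one decision event (assumed shape)."""
    decision = ("DecisionPoint", event["decision_id"])
    outcome = ("SelectionOutcome", event["outcome_id"])
    context = ("ContextSnapshot", event["context_id"])
    candidates = [("Candidate", cid) for cid in event["candidate_ids"]]

    result = ExtractionResult(nodes=[decision, outcome, context, *candidates])
    # Hypothetical edge labels; every edge originates at the DecisionPoint.
    result.edges.append((decision, "HAS_OUTCOME", outcome))
    result.edges.append((decision, "USED_CONTEXT", context))
    result.edges.extend((decision, "EVALUATED", cand) for cand in candidates)
    return result


# Hypothetical event with five candidates, matching the first seeded decision's count.
event = {
    "decision_id": "dp-0",
    "outcome_id": "out-0",
    "context_id": "ctx-0",
    "candidate_ids": [f"cand-{i}" for i in range(5)],
}
out = extract_decision(event)
print(len(out.nodes), len(out.edges))  # 8 7
```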

Next commit lands the notebook itself.

Surface the design decisions baked into the fixtures so the
fixture-shape review can happen before the notebook PR.
Per-decision rationale on: 5-entity MAKO subset, id-as-primary-key
strategy, NON-sorted skos:notation on DecisionPoint (exercises the
round-3 lex-min display-token rule end-to-end), seeded event shape
+ per-beat coverage check, reference extractor scope. Lists the
four validation commands already run + their pass status.