examples(migration_v5): fixture foundation for the four-guarantee MAKO notebook (#107 phase 1) #155
Draft
caohy1988 wants to merge 2 commits into
…ee notebook

Phase 1 of issue GoogleCloudPlatform#107: fixtures directory under examples/migration_v5/ that the four-guarantee MAKO demo notebook (landing in a follow-up commit) consumes.

Inputs from the user-authored side (the "what you actually have to provide" floor: one ontology file):

* mako_core.ttl: the real MAKO ontology pulled from the reference gist (https://gist.github.com/haiyuan-eng-google/a69ff6282ebcc877f77f9aa4e3db1afd). Domain-agnostic decision semantics, agent coordination, outcome tracking per Yahoo Monetization Platform's design doc.

Generated / scaffolded artifacts (the SDK + plugin produce these; the demo never asks the user to hand-author them):

* ontology.yaml: full 380-line auto-import via gm import-owl --include-namespace https://ontology.yahoo.com/mako/. Captures all 41 MAKO entities; the notebook displays this in Section 0 to show the realistic "import → resolve FILL_INs → curate" workflow.
* ontology_demo.yaml: hand-curated 5-entity demo subset (AgentSession, DecisionPoint, Candidate, SelectionOutcome, ContextSnapshot) with FILL_IN primary keys resolved to id per MAKO's "every artifact has a stable identifier" contract. Validates clean against gm validate. The other 36 MAKO entities would scaffold the same way; narrowing keeps the notebook's four-guarantee story focused.
* binding.yaml: auto-scaffolded via gm scaffold --ontology ontology_demo.yaml --dataset migration_v5_demo --project test-project-0728-467323 --out scaffold/. Demonstrates the "one file in, two files out" minimum-input path the storyboard's Section 0.5 calls out.
* table_ddl.sql: companion to binding.yaml, also from gm scaffold.
* property_graph.sql: user-authored CREATE PROPERTY GRAPH for the demo subset. Beat 1's "you own the graph definition" evidence. Uses __DATASET__ placeholder for per-run dataset substitution.
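The `__DATASET__` substitution mentioned for property_graph.sql can be pictured with a minimal Python sketch. The helper name `render_sql`, the template fragment, and the dataset name are hypothetical and only illustrate the idea; the actual property_graph.sql and the notebook's substitution code are not reproduced here.

```python
def render_sql(template: str, dataset: str) -> str:
    """Replace every __DATASET__ token with a concrete dataset name."""
    # Guard against names that would not be a plain identifier
    # (an assumed safety check, not something the fixtures are known to do).
    if not dataset.replace("_", "").isalnum():
        raise ValueError(f"unexpected dataset name: {dataset!r}")
    return template.replace("__DATASET__", dataset)


# Hypothetical template fragment, standing in for the real SQL file.
TEMPLATE = (
    "CREATE PROPERTY GRAPH __DATASET__.demo_graph "
    "NODE TABLES (__DATASET__.decision_point)"
)

rendered = render_sql(TEMPLATE, "migration_v5_demo_run1")
print(rendered)
```

Keeping the placeholder in the checked-in file means each notebook run can target its own dataset without editing the SQL by hand.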
Demo-specific Python fixtures (the notebook imports these):

* seed_events.py: deterministic seeded RNG generator producing 404 events across 50 sessions: each session contains 2-4 decision points; each decision evaluates 3-5 candidates against a context snapshot and produces one outcome. Seeded RNG (_RANDOM_SEED = 20260512) so the notebook's outputs round-trip byte-identically across runs. Event shape mirrors the BQ AA plugin's payload so Beat 3 extractors see the same surface as production.
* reference_extractors.py: handwritten extractor for mako_decision events (1 DecisionPoint + 1 SelectionOutcome + 1 ContextSnapshot reference + N Candidates + their edges per event), plus the exact EXTRACTORS / RESOLVED_GRAPH / SPEC module contract bqaa-revalidate-extractors requires. Mirrors the BKA decision pattern from bigquery_agent_analytics.structured_extraction.
* revalidation_thresholds.json: threshold gate values for bqaa-revalidate-extractors --thresholds-json: 95% compiled_unchanged_rate, 99% parity_match_rate, etc.

Validation done in this commit:

* gm validate examples/migration_v5/ontology_demo.yaml → clean.
* gm validate examples/migration_v5/binding.yaml --ontology examples/migration_v5/ontology_demo.yaml → clean.
* seed_events.py generates 404 events across 50 sessions (50 session-start + 50 session-end + 152 context + 152 decision events).
* reference_extractors.extract_mako_decision_event() on the first seeded decision produces 8 nodes (1 DecisionPoint + 1 Outcome + 1 Context + 5 Candidates) and 7 edges.

Next commit lands the notebook itself.
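As a rough illustration of what a threshold gate of this kind does, the sketch below compares observed rates against minimums. The JSON layout (a flat mapping of metric name to minimum rate), the function name `failed_gates`, and the sample report are assumptions; only the two metric names and their 95% / 99% values come from the commit message, and the real bqaa-revalidate-extractors tool may structure and evaluate its thresholds differently.

```python
import json

# Assumed flat layout; the real revalidation_thresholds.json may differ.
THRESHOLDS_JSON = '{"compiled_unchanged_rate": 0.95, "parity_match_rate": 0.99}'


def failed_gates(metrics: dict, thresholds: dict) -> list:
    """Return the names of metrics that are missing or below their minimum."""
    failures = []
    for name, minimum in thresholds.items():
        observed = metrics.get(name)
        if observed is None or observed < minimum:
            failures.append(name)
    return failures


thresholds = json.loads(THRESHOLDS_JSON)

# Hypothetical revalidation report, invented for this example.
report = {"compiled_unchanged_rate": 0.97, "parity_match_rate": 0.985}
print(failed_gates(report, thresholds))  # ['parity_match_rate']
```

Treating a missing metric as a failure is a deliberate choice in this sketch: a gate that silently passes when a rate was never computed would defeat its purpose.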
Surface the design decisions baked into the fixtures so the fixture-shape review can happen before the notebook PR. Per-decision rationale on: 5-entity MAKO subset, id-as-primary-key strategy, NON-sorted skos:notation on DecisionPoint (exercises the round-3 lex-min display-token rule end-to-end), seeded event shape + per-beat coverage check, reference extractor scope. Lists the four validation commands already run + their pass status.
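To make the point of the NON-sorted `skos:notation` fixture concrete, here is a minimal sketch under one assumption: that the round-3 lex-min rule picks the lexicographically smallest notation as the display token, regardless of declaration order. The function name `display_token` and the notation values are hypothetical and do not come from the fixtures.

```python
def display_token(notations):
    """Pick the lexicographically smallest notation (assumed lex-min rule)."""
    if not notations:
        raise ValueError("at least one notation is required")
    return min(notations)


# Hypothetical notations, declared in deliberately NON-sorted order.
declared = ["DP-B", "DP-A", "DP-C"]

# The chosen token should not depend on declaration order.
print(display_token(declared))                       # DP-A
print(display_token(list(reversed(declared))))       # DP-A
```

If the fixture listed its notations already sorted, a buggy "take the first one" implementation would pass by accident; the unsorted order is what lets the rule be exercised end-to-end.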
Draft PR — fixtures only. The four-guarantee notebook (`examples/migration_v5_demo_notebook.ipynb`) follows as a second commit on this branch once the fixture shape is settled.
Why this is a separate review unit
The risky design choices in #107 aren't the notebook prose — they're the fixture contracts:
- `id` as the primary key strategy
- `skos:notation` sort / display-token behavior (intentionally NON-sorted on `DecisionPoint` to exercise the round-3 lex-min rule end-to-end)
- Reference extractor scope (`mako_decision` is handwritten; `context_captured` stays AI-handled to demonstrate the Beat 3 contrast)

Surfacing these as a focused draft PR with a `README.md` makes fixture-design debate cheap to resolve here rather than buried inside 30 executed notebook cells.

What's in here
Validation commands already run (all four pass)
```bash
python -m bigquery_ontology.cli validate examples/migration_v5/ontology_demo.yaml
python -m bigquery_ontology.cli validate examples/migration_v5/binding.yaml \
--ontology examples/migration_v5/ontology_demo.yaml
PYTHONPATH=src:examples/migration_v5 python -c "import seed_events; print(len(seed_events.generate_events()))"
PYTHONPATH=src:examples/migration_v5 python -c "import seed_events, reference_extractors; ev = next(e for e in seed_events.generate_events() if e.event_type == 'mako_decision'); out = reference_extractors.extract_mako_decision_event(ev.to_dict(), None); print(len(out.nodes), len(out.edges))"
```
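The third command relies on `generate_events()` returning the same events on every run. A minimal sketch of how a seeded generator achieves that is shown below. It is not the real seed_events.py: the `Event` dataclass, its fields, and the `session_start` / `session_end` type names are assumptions, and because the draw order differs from the real fixture, this sketch is not expected to reproduce the 404-event total.

```python
import random
from dataclasses import dataclass

_RANDOM_SEED = 20260512  # seed value quoted in the fixture description


@dataclass(frozen=True)
class Event:
    """Hypothetical event shape; the real payload has more fields."""
    session_id: str
    event_type: str
    candidate_count: int = 0


def generate_events(sessions: int = 50, seed: int = _RANDOM_SEED) -> list:
    # A private Random instance keeps the output independent of any
    # other code that touches the global random state.
    rng = random.Random(seed)
    events = []
    for s in range(sessions):
        sid = f"session-{s:03d}"
        events.append(Event(sid, "session_start"))
        for _ in range(rng.randint(2, 4)):  # 2-4 decision points per session
            events.append(Event(sid, "context_captured"))
            # each decision evaluates 3-5 candidates
            events.append(Event(sid, "mako_decision", rng.randint(3, 5)))
        events.append(Event(sid, "session_end"))
    return events


first, second = generate_events(), generate_events()
print(first == second)  # True: same seed, same events
```

Using `random.Random(seed)` rather than `random.seed(seed)` is the design choice that matters here: it makes the generator safe to call from a notebook where other cells may also draw random numbers.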
What this PR is NOT
Test plan
Once this PR's fixture shape is approved (or adjusted), I'll add the notebook commit on the same branch and flip this draft to ready.