
feat(schema): audit and update base schema files #9215

Draft
gkatre wants to merge 2 commits into main from base-schema-curator/2026-04-15



@gkatre gkatre commented Apr 16, 2026

Summary

Comprehensive audit and update of all base schema files in bigquery_etl/schema/, performed by the base-schema-curator agent.

Changes Applied (Safe, Mechanical)

  • Type corrections (2 fields): Fixed app_version (INTEGER -> STRING) and os_version (INTEGER -> STRING) in global.yaml -- descriptions clearly indicate these hold version strings like "1.0.3" and "100.9.11"
  • Missing modes added (119 fields): Added explicit mode: NULLABLE to all 43 fields in global.yaml and 76 fields in ads_derived.yaml that lacked a mode key
  • Type normalization (2 fields): Changed BOOL to BOOLEAN for targets_default_site and targets_default_zone in ads_derived.yaml
  • Missing types added (2 fields): Added type: STRING to sites and zones in ads_derived.yaml
  • Descriptions filled/improved (5 fields): payout (was null), price (was "Price."), creative_type (was describing flight_name), legacy_telemetry_client_id (was generic), uapi_clicks (was incorrectly labeled "impressions")
  • Fields promoted to global.yaml (13 fields): activated, android_sdk_version, app_build, app_channel, city, days_since_first_seen, days_since_seen, default_search_engine, device_manufacturer, device_model, is_new_profile, metric_date, new_profiles -- each appears in 4+ distinct datasets
  • New file search_derived.yaml (10 fields): tagged_sap, tagged_follow_on, search_with_ads, ad_click, organic, sap, normalized_engine, unknown, ad_click_organic, search_with_ads_organic
  • New file firefox_desktop_derived.yaml (5 fields): attribution_dlsource, attribution_ua, is_dau, windows_build_number, windows_version
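
The two purely mechanical rules above (explicit `mode: NULLABLE` and `BOOL` to `BOOLEAN`) can be sketched roughly as follows. This is an illustration, not the curator agent's actual code: `normalize_field` and `TYPE_ALIASES` are hypothetical names, and it assumes each field is a mapping with `name`, `type`, `mode`, and `description` keys. The INTEGER to STRING fixes and the added `type: STRING` entries were judged from field descriptions, so they are deliberately not automated here.

```python
from copy import deepcopy

# Hypothetical alias table; only BOOL -> BOOLEAN comes from this PR.
TYPE_ALIASES = {"BOOL": "BOOLEAN"}


def normalize_field(field):
    """Return a normalized copy of one schema field entry (assumed to be a dict)."""
    out = deepcopy(field)
    # Replace a type alias with its canonical spelling, if a type is present.
    if "type" in out:
        out["type"] = TYPE_ALIASES.get(out["type"], out["type"])
    # Make the mode explicit without overwriting an existing value.
    out.setdefault("mode", "NULLABLE")
    return out


# Illustrative entries only; descriptions are omitted for brevity.
fields = [
    {"name": "targets_default_site", "type": "BOOL"},
    {"name": "app_version", "type": "STRING", "mode": "NULLABLE"},
]
normalized = [normalize_field(f) for f in fields]
```

Working on copies keeps the original entries available for a before/after comparison during review.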

Items Requiring Human Review (13)

See bigquery_etl/schema/SCHEMA_AUDIT_RECOMMENDATIONS.md for the full list. Key items:

  • source_file canonical/alias conflict between global.yaml and ads_derived.yaml
  • creative_type description vs. field name mismatch (needs source verification)
  • dau and profile_group_id cross-file duplicates between global.yaml and ads_derived.yaml
  • Recommendations for creating app_newtab.yaml, app_mobile.yaml, and dataset-specific schema files for 7 additional datasets
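
For reviewers who want to re-check the cross-file duplicates listed above, a minimal sketch is shown here. It assumes the schema files have already been parsed into a mapping of file name to a list of field dicts; `find_cross_file_duplicates` is a hypothetical helper, not part of bqetl, and the sample data below is illustrative rather than the real file contents.

```python
def find_cross_file_duplicates(schemas):
    """Map each field name defined in more than one file to the sorted file names."""
    seen = {}
    for filename, fields in schemas.items():
        for field in fields:
            # A set avoids counting a name twice if one file repeats it.
            seen.setdefault(field["name"], set()).add(filename)
    return {
        name: sorted(files)
        for name, files in seen.items()
        if len(files) > 1
    }


# Illustrative data; the real files contain many more fields.
schemas = {
    "global.yaml": [{"name": "dau"}, {"name": "profile_group_id"}, {"name": "city"}],
    "ads_derived.yaml": [{"name": "dau"}, {"name": "profile_group_id"}, {"name": "payout"}],
}
duplicates = find_cross_file_duplicates(schemas)
```

A name reported here is only a candidate for review: whether the definitions actually conflict still depends on comparing their types and descriptions.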

Test plan

  • All YAML files validate successfully with yaml.safe_load()
  • Pre-commit hooks pass (yamllint, trim trailing whitespace, fix end of files)
  • Run ./bqetl query schema update <table> --use-global-schema on a sample table to verify new global fields are applied correctly
  • Review recommendations document for items requiring team input
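
The first test-plan item could be extended from "parses as YAML" to "every field has the expected keys" with something like the sketch below. It assumes each schema file has a top-level `fields` list of mappings, which should be confirmed against the actual files; `check_schema` and `check_file` are hypothetical names. `check_file` needs the third-party PyYAML package and uses only `yaml.safe_load`.

```python
REQUIRED_KEYS = ("name", "type", "mode")


def check_schema(doc):
    """Return a list of problems found in one parsed schema document."""
    fields = doc.get("fields") if isinstance(doc, dict) else None
    if not isinstance(fields, list):
        return ["top-level 'fields' list is missing"]
    problems = []
    for index, field in enumerate(fields):
        if not isinstance(field, dict):
            problems.append(f"<field {index}>: not a mapping")
            continue
        label = field.get("name", f"<field {index}>")
        for key in REQUIRED_KEYS:
            if key not in field:
                problems.append(f"{label}: missing '{key}'")
    return problems


def check_file(path):
    """Parse one YAML file and check it; requires PyYAML to be installed."""
    import yaml  # imported lazily so check_schema works without PyYAML

    with open(path, encoding="utf-8") as handle:
        return check_schema(yaml.safe_load(handle))
```

Calling `check_file` on each file under `bigquery_etl/schema/` and failing on any non-empty result would cover both the parse check and the missing-mode and missing-type cases this PR fixes.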

🤖 Generated with Claude Code

- Fix type mismatches: app_version (INTEGER->STRING), os_version (INTEGER->STRING) in global.yaml
- Add mode: NULLABLE to all 43 fields in global.yaml and 76 fields in ads_derived.yaml that were missing explicit mode
- Fix BOOL->BOOLEAN for targets_default_site and targets_default_zone in ads_derived.yaml
- Fill missing/minimal descriptions: payout (was null), price (was "Price."), creative_type (was describing flight_name), legacy_telemetry_client_id, uapi_clicks in ads_derived.yaml
- Add missing type: STRING for sites and zones fields in ads_derived.yaml
- Promote 13 cross-dataset fields to global.yaml: activated, android_sdk_version, app_build, app_channel, city, days_since_first_seen, days_since_seen, default_search_engine, device_manufacturer, device_model, is_new_profile, metric_date, new_profiles
- Create search_derived.yaml with 10 search-specific fields (tagged_sap, tagged_follow_on, search_with_ads, ad_click, organic, sap, normalized_engine, unknown, ad_click_organic, search_with_ads_organic)
- Create firefox_desktop_derived.yaml with 5 desktop-specific fields (attribution_dlsource, attribution_ua, is_dau, windows_build_number, windows_version)
- Add SCHEMA_AUDIT_RECOMMENDATIONS.md with 13 items requiring human review

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@gkatre gkatre requested a review from a team April 16, 2026 00:02
2 participants