feat(firefox_desktop_derived): enrich schema and add README for newtab_content_items_daily_v1#9250

Draft
gkatre wants to merge 2 commits into main from schema-creation-agent/newtab-content-items-daily-v1

@gkatre commented Apr 24, 2026

Summary

  • Enriched schema.yaml for firefox_desktop_derived.newtab_content_items_daily_v1 — all 19 fields now have descriptions sourced from global.yaml (4 fields), upstream newtab schemas (2 fields), query context (1 field), and retained/corrected existing descriptions (12 fields)
  • Created README.md (139 lines) documenting the dual-source data flow (legacy newtab_v1 ping + dedicated newtab_content ping), aggregation logic, example queries, and field conventions
  • Created newtab_content_items_daily_v1_missing_metadata.yaml recommending 3 newtab-specific columns (newtab_content_surface_id, corpus_item_id, newtab_content_ping_version) for addition to app_newtab.yaml
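
To double-check the 4 + 2 + 1 + 12 = 19 breakdown, here is a hypothetical Python sketch (not part of bigquery-etl). It hard-codes the column list from the README's grain and metrics and the per-source grouping from the commit message (the 12 retained columns are simply the remainder), rather than reading schema.yaml or the metadata summary, and flags any column with no source, two sources, or an unknown name.

```python
from collections import Counter

# The 19 columns: 16 grain keys (README Overview) plus 3 metrics (README Key Fields).
SCHEMA_FIELDS = {
    "submission_date", "channel", "country", "newtab_content_surface_id",
    "corpus_item_id", "position", "is_sponsored", "is_section_followed",
    "matches_selected_topic", "received_rank", "section", "section_position",
    "topic", "content_redacted", "newtab_content_ping_version", "app_version",
    "impression_count", "click_count", "dismiss_count",
}

# Description source per column, per the commit message; the 12 "retained" columns
# are the remainder after the 4 + 2 + 1 explicitly named ones.
DESCRIPTION_SOURCES = {
    "global.yaml": ["submission_date", "channel", "country", "app_version"],
    "upstream newtab schemas": ["newtab_content_surface_id", "corpus_item_id"],
    "query context": ["newtab_content_ping_version"],
    "retained/corrected": [
        "position", "is_sponsored", "is_section_followed", "matches_selected_topic",
        "received_rank", "section", "section_position", "topic", "content_redacted",
        "impression_count", "click_count", "dismiss_count",
    ],
}


def check_coverage(fields, sources):
    """Tally columns per source and flag gaps, overlaps, and unknown names."""
    counts = Counter({source: len(cols) for source, cols in sources.items()})
    claims = Counter(col for cols in sources.values() for col in cols)
    unattributed = fields - set(claims)  # columns with no listed source
    duplicated = {col for col, n in claims.items() if n > 1}  # columns listed twice
    unknown = set(claims) - fields  # names that are not schema columns
    return counts, unattributed, duplicated, unknown


counts, unattributed, duplicated, unknown = check_coverage(SCHEMA_FIELDS, DESCRIPTION_SOURCES)
print(dict(counts), "total:", sum(counts.values()))
```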

Changes

  • sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_v1/schema.yaml — all 19 fields enriched: 4 from global.yaml base schema, 2 from upstream newtab source schemas, 1 from query context, 12 retained with 3 quality fixes (typos and clarifications)
  • sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_v1/README.md — new file (139 lines) covering Overview, Data Flow (Mermaid), How It Works, Key Fields, 3 graduated example queries, Implementation Notes, Notes & Conventions, Schema & Related Tables
  • bigquery_etl/schema/missing_metadata/newtab_content_items_daily_v1_missing_metadata.yaml — 3 columns not found in any live base schema, recommended for app_newtab.yaml
  • bigquery_etl/schema/missing_metadata/newtab_content_items_daily_v1-metadata-summary.md — full enrichment run summary with per-column source tracking

🤖 Generated with Claude Code

…b_content_items_daily_v1

- schema.yaml: all 19 fields enriched — 4 from global.yaml (submission_date, channel, country, app_version), 2 from upstream source schemas (newtab_content_surface_id, corpus_item_id), 1 from query context (newtab_content_ping_version), 12 retained with 3 quality fixes (matches_selected_topic typo, section_position typo, content_redacted clarified)
- README.md: created (139 lines) covering data flow from two sources (newtab_v1 legacy + newtab_content ping), How It Works, Key Fields, 3 graduated example queries, Implementation Notes, Notes & Conventions
- newtab_content_items_daily_v1_missing_metadata.yaml: 3 non-base-schema columns recommended for app_newtab.yaml
- newtab_content_items_daily_v1-metadata-summary.md: full enrichment summary

Co-Authored-By: claude-sonnet-4-6 <noreply@anthropic.com>
@gkatre requested a review from a team on April 24, 2026 07:22

@github-actions bot commented May 5, 2026

Integration report for "feat(firefox_desktop_derived): enrich schema and add README for newtab_content_items_daily_v1"

sql.diff

Only in /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_v1: README.md
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_v1/README.md /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_v1/README.md
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_v1/README.md	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_v1/README.md	2026-05-05 20:46:45.589664005 +0000
@@ -0,0 +1,139 @@
+# Newtab Content Items Daily
+
+Daily aggregation of Pocket/newtab content item actions (impressions, clicks, dismissals) for Firefox desktop, one row per unique combination of date, surface, corpus item, position, and dimensional attributes.
+
+---
+
+## 📌 Overview
+
+| | |
+|---|---|
+| **Grain** | One row per `(submission_date, channel, country, newtab_content_surface_id, corpus_item_id, position, is_sponsored, is_section_followed, matches_selected_topic, received_rank, section, section_position, topic, content_redacted, newtab_content_ping_version, app_version)` |
+| **Source** | `moz-fx-data-shared-prod.firefox_desktop_stable.newtab_v1` + `moz-fx-data-shared-prod.firefox_desktop.newtab_content` |
+| **DAG** | `bqetl_newtab` · daily · incremental |
+| **Partitioning** | `submission_date` *(partition filter required)* |
+| **Clustering** | `channel`, `country` |
+| **Retention** | No automatic expiration |
+| **Owner** | lmcfall@mozilla.com |
+| **Version** | v1 (initial version) |
+
+**Use cases:** Pocket content engagement analysis · sponsored vs. organic item performance · section and topic click-through rates
+
+---
+
+## 🗺️ Data Flow
+
+```mermaid
+flowchart TD
+  A1[Legacy newtab events — Pocket category<br/>`moz-fx-data-shared-prod.firefox_desktop_stable.newtab_v1`] -->|filter @submission_date<br/>category IN pocket, name IN impression/click/dismiss| B[**This query**]
+  A2[Dedicated newtab_content ping<br/>`moz-fx-data-shared-prod.firefox_desktop.newtab_content`] -->|filter @submission_date<br/>category IN newtab_content, ping_version IS NOT NULL| B
+  B --> C[Partitioned table<br/>time: `submission_date`<br/>cluster: `channel`, `country`]
+```
+
+---
+
+## 🧠 How It Works
+
+1. **Input** — Events are unnested from two sources: legacy `newtab_v1` (Pocket category, app version ≥ 121) and the dedicated `newtab_content` ping (filtered to non-null ping version to confirm Newtab Content ping origin).
+2. **Flattening** — Event extra fields (`corpus_item_id`, `position`, `is_sponsored`, `section`, `topic`, etc.) are extracted via `mozfun.map.get_key` and cast to their target types.
+3. **Surface resolution** — `newtab_content_surface_id` is resolved using `mozfun.newtab.surface_id_country()`; for legacy events with no surface ID, `mozfun.newtab.scheduled_surface_id_v1()` is used as fallback.
+4. **Aggregation** — Events are grouped by all dimensional keys; `COUNTIF(event_name = 'impression')`, `COUNTIF(event_name = 'click')`, and `COUNTIF(event_name = 'dismiss')` produce the `impression_count`, `click_count`, and `dismiss_count` columns.
+5. **Data inclusion** — Legacy source excludes `content_redacted = 'true'` events (those are counted in the newtab_content ping path). Only app version ≥ 121 is included from the legacy source to prevent duplicates from pre-Glean releases.
+
+---
+
+## 🧾 Key Fields
+
+### Dimensions
+
+| Category | Fields |
+|---|---|
+| Date & Geo | `submission_date`, `country` |
+| Browser | `channel`, `app_version` |
+| Newtab config | `newtab_content_surface_id`, `newtab_content_ping_version`, `content_redacted` |
+| Item | `corpus_item_id`, `position`, `received_rank`, `is_sponsored`, `topic` |
+| Section | `section`, `section_position`, `is_section_followed`, `matches_selected_topic` |
+
+### Metrics
+
+| Category | Fields |
+|---|---|
+| Engagement | `impression_count`, `click_count`, `dismiss_count` |
+
+---
+
+## 🧩 Example Queries
+
+```sql
+-- 1. Daily total impressions, clicks, and dismissals for the last 7 days
+SELECT
+  submission_date,
+  SUM(impression_count) AS impressions,
+  SUM(click_count) AS clicks,
+  SUM(dismiss_count) AS dismissals
+FROM `moz-fx-data-shared-prod.firefox_desktop_derived.newtab_content_items_daily_v1`
+WHERE submission_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
+GROUP BY 1
+ORDER BY 1 DESC;
+```
+
+```sql
+-- 2. Click-through rate by topic for a single day
+SELECT
+  submission_date,
+  topic,
+  SUM(impression_count) AS impressions,
+  SUM(click_count) AS clicks,
+  SAFE_DIVIDE(SUM(click_count), SUM(impression_count)) AS ctr
+FROM `moz-fx-data-shared-prod.firefox_desktop_derived.newtab_content_items_daily_v1`
+WHERE submission_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
+GROUP BY 1, 2
+ORDER BY impressions DESC;
+```
+
+```sql
+-- 3. Sponsored vs. organic engagement in the US over the last 30 days
+SELECT
+  submission_date,
+  country,
+  is_sponsored,
+  SUM(impression_count) AS impressions,
+  SUM(click_count) AS clicks,
+  SUM(dismiss_count) AS dismissals,
+  SAFE_DIVIDE(SUM(click_count), SUM(impression_count)) AS ctr
+FROM `moz-fx-data-shared-prod.firefox_desktop_derived.newtab_content_items_daily_v1`
+WHERE submission_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
+  AND country = 'US'
+GROUP BY 1, 2, 3
+ORDER BY 1 DESC, impressions DESC;
+```
+
+---
+
+## 🔧 Implementation Notes
+
+- Incremental: filtered by `@submission_date`; one partition written per run.
+- Two source paths are unioned: legacy newtab ping (Pocket events, app version ≥ 121) and the dedicated newtab_content ping (identified by non-null `newtab_content_ping_version`).
+- `content_redacted = 'true'` rows are excluded from the legacy source path; those events are captured instead via the newtab_content ping to avoid double-counting.
+- Country is resolved via `mozfun.newtab.surface_id_country()` rather than Fastly relay IP (`normalized_country_code`) for the newtab_content ping path.
+- Use `SAFE_DIVIDE()` for all CTR/ratio calculations to avoid division-by-zero on rows with zero impressions.
+
+---
+
+## 📌 Notes & Conventions
+
+- `impression_count` = `COUNTIF(event_name = 'impression')` — total item impressions per dimensional group.
+- `click_count` = `COUNTIF(event_name = 'click')` — total item clicks per dimensional group.
+- `dismiss_count` = `COUNTIF(event_name = 'dismiss')` — total item dismissals per dimensional group.
+- `corpus_item_id` is the canonical Newtab content identifier, replacing legacy `tile_id` and `scheduled_corpus_item_id`.
+- `newtab_content_ping_version` is NULL for rows sourced from the legacy newtab ping; non-null values indicate the dedicated Newtab Content ping.
+- `content_redacted` is NULL for newtab_content ping rows; `'false'` for legacy newtab ping rows included in this table.
+
+---
+
+## 🗃️ Schema & Related Tables
+
+- Full field definitions: [`schema.yaml`](schema.yaml)
+- **Upstream (legacy)**: `moz-fx-data-shared-prod.firefox_desktop_stable.newtab_v1` — raw Glean newtab ping events for Firefox desktop
+- **Upstream (content ping)**: `moz-fx-data-shared-prod.firefox_desktop.newtab_content` — dedicated newtab content ping with item-level detail
+- **Downstream**: Used by Pocket/Newtab teams for content performance reporting and recommendation quality analysis
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_v1/schema.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_v1/schema.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_v1/schema.yaml	2026-05-05 20:46:49.778683420 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_v1/schema.yaml	2026-05-05 20:46:45.622665735 +0000
@@ -2,19 +2,26 @@
 - name: submission_date
   type: DATE
   mode: NULLABLE
-  description: Date when client action took place
+  description: The date when the telemetry ping is received on the server side.
 - name: channel
   type: STRING
   mode: NULLABLE
+  description: The normalized channel the application is being distributed on.
 - name: country
   type: STRING
   mode: NULLABLE
+  description: Name of the country in which the activity took place, as determined
+    by the IP geolocation.
 - name: newtab_content_surface_id
   type: STRING
   mode: NULLABLE
+  description: The surface identifier for the newtab content ping.
 - name: corpus_item_id
   type: STRING
   mode: NULLABLE
+  description: A content identifier. For organic Newtab recommendations it is an opaque
+    id produced by Newtab's recommendation systems that corresponds uniquely to the
+    URL. This is the replacement for tile_id and scheduled_corpus_item_id.
 - name: position
   type: INTEGER
   mode: NULLABLE
@@ -30,14 +37,12 @@
 - name: matches_selected_topic
   type: STRING
   mode: NULLABLE
-  description: >
-    Returns value based on if a the topic of the pocket recommendation
-    matches one of the user-selected topic categories
+  description: Indicates whether the topic of the Pocket recommendation matches one
+    of the user-selected topic categories.
 - name: received_rank
   type: INTEGER
   mode: NULLABLE
-  description: >
-    The rank or order of the recommendation at the time it was sent to
+  description: The rank or order of the recommendation at the time it was sent to
     the client.
 - name: section
   type: STRING
@@ -46,7 +51,7 @@
 - name: section_position
   type: INTEGER
   mode: NULLABLE
-  description: If click belongs in a section, the numberic position of the section
+  description: If the item belongs to a section, the numeric position of that section.
 - name: topic
   type: STRING
   mode: NULLABLE
@@ -54,10 +59,14 @@
 - name: content_redacted
   type: STRING
   mode: NULLABLE
-  description: Are content details sent separately in the newtab_content ping
+  description: Whether content details were redacted and sent via the newtab_content ping;
+    'false' for legacy newtab ping rows in this table, NULL for newtab_content ping rows.
 - name: newtab_content_ping_version
   type: INTEGER
   mode: NULLABLE
+  description: The version of the newtab_content ping schema used to send the event.
+    Used to distinguish events originating from the dedicated Newtab Content ping
+    versus the legacy newtab ping.
 - name: impression_count
   type: INTEGER
   mode: NULLABLE
@@ -73,3 +82,4 @@
 - name: app_version
   type: INTEGER
   mode: NULLABLE
+  description: Version of the browser, stored as an integer (e.g. 121).

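The README's How It Works steps can be modeled in memory to make the inclusion rules concrete. This is a minimal Python sketch, not the production query: the event dictionaries and the helpers `keep_legacy`, `keep_content`, and `aggregate` are hypothetical, and it assumes the `mozfun.map.get_key` flattening and the `mozfun.newtab.*` surface resolution have already happened upstream.

```python
from collections import Counter

# Grain keys from the README's Overview table.
GRAIN = (
    "submission_date", "channel", "country", "newtab_content_surface_id",
    "corpus_item_id", "position", "is_sponsored", "is_section_followed",
    "matches_selected_topic", "received_rank", "section", "section_position",
    "topic", "content_redacted", "newtab_content_ping_version", "app_version",
)
# Event name -> metric column it increments.
METRICS = {"impression": "impression_count", "click": "click_count", "dismiss": "dismiss_count"}


def keep_legacy(event):
    # Legacy newtab_v1 path: Pocket events from app version >= 121, excluding
    # redacted events (those arrive through the newtab_content ping instead).
    return (
        event.get("category") == "pocket"
        and event.get("event_name") in METRICS
        and (event.get("app_version") or 0) >= 121
        and event.get("content_redacted") != "true"
    )


def keep_content(event):
    # Dedicated newtab_content path: identified by a non-null ping version.
    return (
        event.get("category") == "newtab_content"
        and event.get("event_name") in METRICS
        and event.get("newtab_content_ping_version") is not None
    )


def aggregate(legacy_events, content_events):
    # Union both paths, then count each event type per grain key (like COUNTIF).
    kept = [e for e in legacy_events if keep_legacy(e)]
    kept += [e for e in content_events if keep_content(e)]
    groups = {}
    for event in kept:
        key = tuple(event.get(col) for col in GRAIN)
        groups.setdefault(key, Counter())[METRICS[event["event_name"]]] += 1
    return [
        {**dict(zip(GRAIN, key)), **{col: counts[col] for col in METRICS.values()}}
        for key, counts in groups.items()
    ]


# Usage with made-up events for a single corpus item.
base = {"submission_date": "2026-05-04", "channel": "release", "country": "US", "corpus_item_id": "item-1"}
legacy = [
    {**base, "category": "pocket", "event_name": "impression", "app_version": 125, "content_redacted": "false"},
    {**base, "category": "pocket", "event_name": "impression", "app_version": 125, "content_redacted": "false"},
    {**base, "category": "pocket", "event_name": "click", "app_version": 125, "content_redacted": "false"},
    {**base, "category": "pocket", "event_name": "impression", "app_version": 125, "content_redacted": "true"},  # dropped: redacted
    {**base, "category": "pocket", "event_name": "impression", "app_version": 120, "content_redacted": "false"},  # dropped: < 121
]
content = [
    {**base, "category": "newtab_content", "event_name": "dismiss", "newtab_content_ping_version": 1},
    {**base, "category": "newtab_content", "event_name": "click", "newtab_content_ping_version": None},  # dropped: no ping version
]
rows = aggregate(legacy, content)
```

Because the legacy and content rows differ in `content_redacted` and `newtab_content_ping_version`, they never collapse into the same grain key, and dropping legacy `content_redacted = 'true'` events is what keeps a redacted item from being counted on both paths.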

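The README recommends `SAFE_DIVIDE()` for CTR, and example query 2 divides summed clicks by summed impressions per topic. A small Python sketch of the same rule, assuming rows shaped like this table's output; `safe_divide` and `ctr_by_topic` are hypothetical helpers that stand in for BigQuery's NULL-on-zero behavior with `None`.

```python
from collections import defaultdict


def safe_divide(numerator, denominator):
    # Like BigQuery SAFE_DIVIDE: return None (NULL) instead of failing on a zero divisor.
    if denominator == 0:
        return None
    return numerator / denominator


def ctr_by_topic(rows):
    # Sum counts per topic first, then divide once, as in example query 2.
    totals = defaultdict(lambda: [0, 0])  # topic -> [clicks, impressions]
    for row in rows:
        totals[row["topic"]][0] += row["click_count"]
        totals[row["topic"]][1] += row["impression_count"]
    return {topic: safe_divide(clicks, imps) for topic, (clicks, imps) in totals.items()}


rows = [
    {"topic": "sports", "impression_count": 60, "click_count": 3},
    {"topic": "sports", "impression_count": 40, "click_count": 2},
    {"topic": "food", "impression_count": 0, "click_count": 0},
]
print(ctr_by_topic(rows))
```

Summing before dividing weights each row by its impressions; averaging per-row ratios would give a different CTR whenever rows carry different impression counts.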