add visit level widget table#9346
Conversation
There was a problem hiding this comment.
Overall this looks like a clean, well-scoped addition that closely mirrors the established newtab_visits_daily_v2 pattern, which makes it easy to read and reason about. Manual verification via the linked doc and the consistent dimension handling between visit-level and widget-level grains are both nice. Comments below are mostly polish / robustness suggestions — none are blockers.
Themes:
- Grain & contracts. The metadata describes the grain as
(newtab_visit_id, widget_name, widget_size)persubmission_date, but the SQL doesn't guard against NULLwidget_name/widget_sizefrom malformed extras, and the LEFT JOIN can leaveuser_action_countsas NULL for impression-only widgets. Worth deciding whether either is intentional and documenting it. - DRY. Three CTEs each re-extract
newtab_visit_id/widget_name/widget_sizefromevent_details; hoisting these intoevents_unnestedwould simplify the rest of the query. - View surface. The new view adds
windows_versionbut keeps the underlyingwindows_build_number. Excluding the raw build number (or at least describing both in the view metadata) would tighten the user-facing contract. The metadata description also currently omits theapp_namecolumn the view adds. - Style. A couple of comment-indentation slips in
visit_aggregationsand curly-apostrophe Unicode characters in two schema descriptions —./bqetl formatplus a quick s/’/'/ should clean these up. - Data quality. No
checks.sqlwas added; given this table will back experiment analysis, a few standard partition/uniqueness/non-null checks would be cheap insurance. - Minor: The PR description references
newtab_visits_daily_v1as the model table, but the actual current table in the repo isnewtab_visits_daily_v2. Worth updating the description for future readers chasing the lineage.
The reviewer checklist item about not duplicating an existing dataset is worth a quick gut-check from a domain owner — confirming that no existing firefox_desktop_derived widget aggregation already covers this — but assuming that's been validated, the structure here is sound.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
land-edi
left a comment
There was a problem hiding this comment.
I am wondering whether we should create a new dataset for widgets given it is copying all the same columns we already have in newtab_visits_daily, which include today everything except widgets. Do you think it is will be better to just include the few widget columns you are creating itno that dataset? Beside that, everything else looks good
|
Did a quick run of the most recent file and compared to the validated v1 version. There were <1000 discrepancies and the enabled and disabled columns look fine in the aggregate so I think we're good. Updated the validation doc with this info. |
Integration report for "add visit level widget table"
|
This PR adds a visit-level aggregate table for widgets.
Widgets key experiments related to widgets are currently happening, and widgets are expected to be rolled out widely in June. This table enables the analysis of widget interactions at the visit level and will be used to create a client-level table that can be used in experiments.
The table was modeled on
firefox_desktop_derived.newtab_visits_daily_v1. It copies the same strategy for creating dimensions for visits.The table was manually verified for several visits, as documented here: https://docs.google.com/document/d/1d4H1JegVWJ4EhvAvVd95VD0WftS8nozdD6xitx81ji0/edit?tab=t.0