feat(DENG-11100): Create new base table zendesk_ticket_sla_v1 + KPI View#9338
Conversation
Claude Code Review: Claude finished @phil-lee70's task in 4m 51s.
Posted a formal PR review with 11 inline comments and a summary covering correctness risks, repo conventions, performance hygiene, and schema nits. Top items to address before merge:
Summary
Nice, well-documented introduction of a SUMO Zendesk SLA base table and KPI view. The business-minutes pipeline (week-expansion, schedule intersection, holiday subtraction) is laid out cleanly and the rolling-13-month, non-incremental rebuild strategy is well-justified for late-arriving CSAT/resolution updates. Metadata and schema descriptions are unusually thorough — appreciated.
A few cross-cutting observations; details inline.
Correctness / data-quality (please address before merge)
- **CSAT join can fan out tickets.** The `csat` CTE is not deduplicated by `ticket_id`. If any ticket has more than one survey response (or more than one `rating_scale` answer per response), the final `LEFT JOIN` multiplies ticket rows and silently inflates every downstream KPI. Suggest a `GROUP BY ticket_id` (or `QUALIFY ROW_NUMBER()`) inside the CTE plus a quick spot-check query.
- **`tag LIKE '%test%'` is too broad** — matches `latest`, `tested`, `contest`, etc., dropping legitimate tickets from KPI denominators. Prefer an explicit allow-list or a word-boundary regex.
- **Time-zone fallback risk:** the `CASE` mapping for `time_zone` falls through to the raw Zendesk label, which may not be a valid IANA zone for `DATETIME(...)`. A defensive default plus a data check would make breakage loud rather than silent.
Conventions / repo guidance
- View name carries a `_v1` suffix, which `docs/reference/recommended_practices.md` advises against (views should be a stable interface). The PR description also references it as `sumo_zendesk_kpis_v1`, but the directory is `sumo_zendesk_sla_kpis_v1` — please pin one canonical name and update the cross-reference in `zendesk_ticket_sla_v1/metadata.yaml`.
- Mixed source projects (`moz-fx-data-shared-prod.zendesk_syndicate.ticket` vs. `moz-fx-sumo-prod.zendesk.*`) — worth a one-line comment explaining why or, ideally, reading tickets from the same project as the joined tables to avoid silent schema/coverage drift.
Performance / hygiene (non-blocking)
- Add `clustering` on `product` and `automation_category` at table-creation time — cheap now, expensive later.
- The `schedule` table is read twice; folding `start_time`/`end_time` into the `active_schedule` CTE removes the foot-gun the inline comment correctly warns about.
- `raw_delta_in_minutes` is rounded to whole minutes while neighboring fields are rounded to one decimal — consider keeping precision for the schedule-intersection bounds.
- Consider whether the simple `solved → open` reopen detection is robust enough given that automation classification depends on it.
Schema nits
- `is_one_touch` reads as a flag but is `INTEGER`; aligning on `BOOLEAN` (matching `is_excluded_from_sla` and `survey_responded`) would be more consistent.
- Document the `csat_percentage` `NULL` semantics when no group has a rated ticket.
Overall the design is solid; the main thing I'd want resolved before this lands is the CSAT fan-out check and the test-tag matcher.
```sql
csat AS (
  SELECT
    t_s.ticket_id,
    a_s.rating_category
  FROM
    `moz-fx-sumo-prod.zendesk.csat_survey_answer` AS a_s
  JOIN
    `moz-fx-sumo-prod.zendesk.ticket_csat_survey` AS t_s
    ON a_s.survey_response_id = t_s.survey_response_id
    AND a_s.type = 'rating_scale'
)
```
Potential row fan-out on CSAT join (correctness risk).
The csat CTE selects (ticket_id, rating_category) without grouping or de-duplicating. If ticket_csat_survey has more than one row per ticket (re-surveyed tickets, multiple survey responses, or test data), or if csat_survey_answer has more than one rating_scale answer per survey_response_id, the LEFT JOIN csat ON tw.ticket_id = csat.ticket_id at the bottom of the query will multiply ticket rows. Because every ticket-level metric (agent_reply_count, is_one_touch, business minutes, …) is then double-counted in the KPI view, this would silently inflate FCR and CSAT denominators.
Suggest aggregating to one row per ticket — e.g. pick the latest survey response, or MAX/ANY_VALUE the rating — and add a GROUP BY ticket_id here. A qualify ROW_NUMBER() OVER (PARTITION BY ticket_id ORDER BY survey_response_id DESC) = 1 works too. Worth verifying with a quick SELECT ticket_id, COUNT(*) FROM csat GROUP BY ticket_id HAVING COUNT(*) > 1 LIMIT 10 before merge.
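A deduplicated version of the CTE might look like the following sketch. Using `survey_response_id` descending as a recency proxy is an assumption (it also breaks ties arbitrarily between multiple `rating_scale` answers on the same response):

```sql
-- Sketch: one row per ticket, keeping the answer from the most recent survey
-- response. Assumes survey_response_id increases with recency.
csat AS (
  SELECT
    t_s.ticket_id,
    a_s.rating_category
  FROM
    `moz-fx-sumo-prod.zendesk.csat_survey_answer` AS a_s
  JOIN
    `moz-fx-sumo-prod.zendesk.ticket_csat_survey` AS t_s
    ON a_s.survey_response_id = t_s.survey_response_id
    AND a_s.type = 'rating_scale'
  QUALIFY
    ROW_NUMBER() OVER (
      PARTITION BY t_s.ticket_id
      ORDER BY t_s.survey_response_id DESC
    ) = 1
)
```

`QUALIFY` filters on the window function without a wrapper subquery, so the CTE shape is otherwise unchanged.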
```sql
ANY_VALUE(
  CASE time_zone
    WHEN 'Central Time (US & Canada)' THEN 'America/Chicago'
    WHEN 'Eastern Time (US & Canada)' THEN 'America/New_York'
    WHEN 'Mountain Time (US & Canada)' THEN 'America/Denver'
    WHEN 'Pacific Time (US & Canada)' THEN 'America/Los_Angeles'
    ELSE time_zone
  END
) AS schedule_tz
```
Time-zone ELSE fallback can silently break business-minutes math.
DATETIME(ticket_created_at, schedule_tz) (used downstream) requires an IANA-style identifier such as America/Los_Angeles. The ELSE time_zone branch passes Zendesk's raw time_zone string through unmodified — if Zendesk ever returns a friendly label not in the explicit list (e.g. "Alaska", "Arizona", "London"), the query will either fail at runtime or compute against UTC, producing wrong business-minute values.
Two suggestions:
- Filter the `name = 'Mozilla Support Hours'` clause in this CTE so we know which TZ values are actually possible, and document that the mapping is exhaustive for that schedule, or
- Replace `ELSE time_zone` with `ELSE 'UTC'` plus a defensive `ASSERT` / data check that fails the run if an unmapped label appears, so the breakage is loud rather than silent.
Either way, please add a short comment naming the assumption — today the only protection is the schedule's name filter on line 140.
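The loud-failure variant of option 2 could be a standalone pre-run check like the sketch below (assumes `name` and `time_zone` are columns on the `schedule` table, as the CTE above implies; where such a check plugs into this repo's tooling is an open question):

```sql
-- Sketch: fail the run if Zendesk introduces a label the CASE mapping
-- doesn't cover, instead of silently computing against a bad zone.
ASSERT NOT EXISTS (
  SELECT 1
  FROM `moz-fx-sumo-prod.zendesk.schedule`
  WHERE name = 'Mozilla Support Hours'
    AND time_zone NOT IN (
      'Central Time (US & Canada)',
      'Eastern Time (US & Canada)',
      'Mountain Time (US & Canada)',
      'Pacific Time (US & Canada)'
    )
) AS 'Unmapped Zendesk time_zone label on the Mozilla Support Hours schedule';
```

Paired with `ELSE 'UTC'` in the `CASE`, this keeps the query runnable while making any new label impossible to miss.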
```sql
test_tickets AS (
  SELECT DISTINCT
    tt.ticket_id
  FROM
    `moz-fx-sumo-prod.zendesk.ticket_tag` AS tt
  JOIN
    tickets_in_window AS tw
    ON tw.ticket_id = tt.ticket_id
  WHERE
    tt.tag LIKE '%test%'
),
```
tag LIKE '%test%' is overly broad.
This will also match tags such as `latest`, `tested`, `protest`, `contest`, `attestation`, etc., flagging legitimate tickets as `is_excluded_from_sla = TRUE` and silently dropping them from KPI denominators.
If the intent is the literal token test, prefer either:
- An explicit allow-list of known test tags (`tag IN ('test', 'qa-test', 'internal-test', …)`), or
- Word-boundary matching: `REGEXP_CONTAINS(tag, r'(?:^|[-_])test(?:$|[-_])')`.
Worth confirming with the SUMO support team which tags are actually used to mark test tickets and pinning the list.
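The difference is easy to eyeball with an inline table (the sample tags are hypothetical illustrations, not tags known to exist in SUMO data):

```sql
-- Sketch: LIKE '%test%' vs word-boundary matching over sample tags.
SELECT
  tag,
  tag LIKE '%test%' AS like_match,
  REGEXP_CONTAINS(tag, r'(?:^|[-_])test(?:$|[-_])') AS word_boundary_match
FROM
  UNNEST(['test', 'qa-test', 'latest', 'contest', 'attestation']) AS tag;
-- like_match is TRUE for all five rows; word_boundary_match is TRUE
-- only for 'test' and 'qa-test'.
```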
```sql
-- Intersect each week's window with the schedule. The s.id = wp.schedule_id filter is
-- critical — without it every schedule in the table contributes, multiplying minutes.
intercepted_periods AS (
  SELECT
    wp.ticket_id,
    wp.event_type,
    wp.schedule_id,
    wp.schedule_tz,
    wp.week_number,
    wp.ticket_created_at,
    wp.event_at,
    wp.calendar_minutes,
    s.start_time AS schedule_interval_start,
    LEAST(wp.ticket_week_end_time, s.end_time) - GREATEST(
      wp.ticket_week_start_time,
      s.start_time
    ) AS scheduled_minutes
  FROM
    weekly_periods AS wp
  JOIN
    `moz-fx-sumo-prod.zendesk.schedule` AS s
    ON s.id = wp.schedule_id
    AND wp.ticket_week_start_time <= s.end_time
    AND wp.ticket_week_end_time >= s.start_time
),
```
The schedule table is read twice; consider consolidating.
active_schedule already pre-filters to the single Mozilla Support Hours row but only keeps schedule_id and schedule_tz. Here the full schedule table is re-scanned to get start_time/end_time, with s.id = wp.schedule_id as the only safety net (called out correctly in the comment above).
Two equivalent simplifications that remove the foot-gun entirely:
- Pre-filter in this join itself: ``JOIN `moz-fx-sumo-prod.zendesk.schedule` AS s ON s.name = 'Mozilla Support Hours' AND s.id = wp.schedule_id``, or
- Extend `active_schedule` to also keep `start_time`/`end_time` (one row per slot, not aggregated) and join only against that.
Option 2 is cleaner and means there's no scenario where someone deletes the s.id = wp.schedule_id predicate and silently multiplies minutes.
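A sketch of option 2 (column names mirror the existing CTEs; the exact current shape of `active_schedule` is an assumption based on the description above):

```sql
-- Sketch: active_schedule keeps the interval columns, one row per slot,
-- so intercepted_periods never re-reads the raw schedule table.
active_schedule AS (
  SELECT
    s.id AS schedule_id,
    s.start_time,
    s.end_time
    -- plus the IANA-mapped schedule_tz expression from the existing CTE
  FROM
    `moz-fx-sumo-prod.zendesk.schedule` AS s
  WHERE
    s.name = 'Mozilla Support Hours'
)
```

`intercepted_periods` then joins `active_schedule` instead of the raw table, and the `s.id = wp.schedule_id` predicate stops being load-bearing.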
```sql
-- GREATEST guards against any data weirdness where event_at < created_at.
GREATEST(
  0,
  ROUND(TIMESTAMP_DIFF(te.event_at, te.ticket_created_at, SECOND) / 60.0, 0)
) AS raw_delta_in_minutes
```
Integer-minute rounding here can drift from calendar_minutes.
raw_delta_in_minutes is rounded to whole minutes (decimals=0) while calendar_minutes upstream and business_minutes downstream are rounded to one decimal. Because the weekly-period expansion uses raw_delta_in_minutes to derive the upper bound of GENERATE_ARRAY(...), a ticket whose actual delta is e.g. 90.4 minutes uses 90 minutes for the schedule intersection, which can produce business-minute totals slightly inconsistent with calendar-minute totals (and with what Zendesk Explore shows).
If the goal is to keep GENERATE_ARRAY integer-bounded, leave this as-is but compute the schedule-intersection bounds against (start_time_in_minutes_from_week + raw_delta_in_minutes_float) so the precision is preserved at the boundary. Otherwise consider rounding everything consistently to 1 decimal.
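The carry-both-columns variant could look like this sketch (`raw_delta_in_minutes_float` is a hypothetical new column, not one in the current query):

```sql
-- Sketch: keep the integer value for GENERATE_ARRAY bounds, but carry the
-- fractional delta alongside for the schedule-intersection boundary.
GREATEST(
  0,
  TIMESTAMP_DIFF(te.event_at, te.ticket_created_at, SECOND) / 60.0
) AS raw_delta_in_minutes_float,
GREATEST(
  0,
  ROUND(TIMESTAMP_DIFF(te.event_at, te.ticket_created_at, SECOND) / 60.0, 0)
) AS raw_delta_in_minutes
```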
```yaml
bigquery:
  time_partitioning:
    type: day
    field: ticket_created_date
    require_partition_filter: false
```
Consider adding clustering.
The table will be filtered downstream almost exclusively by product, automation_category, ticket_group, and is_excluded_from_sla (see the KPI view). Adding a clustering block alongside time_partitioning would meaningfully reduce bytes scanned for those queries — example:
```yaml
bigquery:
  time_partitioning:
    type: day
    field: ticket_created_date
    require_partition_filter: false
  clustering:
    fields:
      - product
      - automation_category
```

Not blocking, but cheap to add at table-creation time and expensive to add later (requires a backfill).
```sql
CREATE OR REPLACE VIEW
  `moz-fx-data-shared-prod.sumo_metrics_derived.sumo_zendesk_sla_kpis_v1`
AS
```
View name carries a _v1 version suffix.
Per docs/reference/recommended_practices.md: "Views should generally not be named with a version suffix; a view represents a stable interface for users and whenever possible should maintain compatibility with existing queries."
Suggest renaming the directory to sumo_zendesk_sla_kpis/ (no suffix) so future schema iterations can land non-breaking changes without a v2 cutover for downstream dashboards. Some _derived views in the repo do carry a _v1_live suffix, but that's a different convention used for live (streaming) variants and doesn't apply here.
Also worth considering: the description in the PR body refers to this as sumo_zendesk_kpis_v1, but the directory is named sumo_zendesk_sla_kpis_v1. Pick one canonical name and update the metadata cross-reference in zendesk_ticket_sla_v1/metadata.yaml (line 32).
```sql
automation_category,
COUNT(*) AS tickets_resolved,
COUNTIF(is_one_touch = 1) AS one_touch_tickets,
COUNTIF(rating_category = 'good') AS satisfied_surveys,
COUNTIF(rating_category IN ('good', 'bad')) AS rated_surveys,
-- FCR (%) = one-touch resolved tickets / total resolved tickets * 100.
-- Mirrors Zendesk Explore's "% One-touch tickets" metric.
SAFE_DIVIDE(COUNTIF(is_one_touch = 1), COUNT(*)) * 100 AS fcr_percentage,
-- CSAT (%) = good ratings / (good + bad ratings) * 100.
-- Tickets without a rating are excluded from the denominator.
SAFE_DIVIDE(
  COUNTIF(rating_category = 'good'),
  COUNTIF(rating_category IN ('good', 'bad'))
) * 100 AS csat_percentage
```
`satisfied_surveys` / `rated_surveys` count tickets, not surveys (naming grain mismatch).
COUNTIF(rating_category = 'good') returns satisfied_surveys, but the field naming is slightly misleading — satisfied_surveys actually counts the subset of resolved tickets (after is_excluded_from_sla = FALSE) that received a good rating, not the subset of surveys. Likewise rated_surveys counts the rated subset of resolved tickets, not the survey universe.
Two minor improvements:
- Rename to `satisfied_tickets`/`rated_tickets` so the grain matches the column name, or
- Filter `resolved_tickets` to `WHERE rating_category IS NOT NULL` in a separate CTE for the survey numerators so the names are accurate.
Also: csat_percentage will be NULL when no tickets in the group received any rating (because SAFE_DIVIDE(0, 0) IS NULL). Consumers should be aware — worth documenting on the metadata description.
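The rename option is purely cosmetic in the view (same expressions as today, grain-accurate names):

```sql
-- Sketch: renamed survey columns; logic unchanged.
COUNTIF(rating_category = 'good') AS satisfied_tickets,
COUNTIF(rating_category IN ('good', 'bad')) AS rated_tickets,
SAFE_DIVIDE(
  COUNTIF(rating_category = 'good'),
  COUNTIF(rating_category IN ('good', 'bad'))
) * 100 AS csat_percentage  -- NULL when no ticket in the group was rated
```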
```yaml
- name: is_one_touch
  type: INTEGER
  mode: NULLABLE
  description: |
    1 if the ticket was resolved with fewer than two agent replies (FCR proxy);
    0 otherwise. Unresolved tickets are 0.
```
is_one_touch would read more naturally as BOOLEAN.
It's only ever 0 or 1 and is consumed as a flag (the KPI view does COUNTIF(is_one_touch = 1)). A BOOLEAN column with COUNTIF(is_one_touch) is more idiomatic and matches survey_responded / is_excluded_from_sla already in this schema.
Same observation for the INTEGER choice — is_excluded_from_sla is BOOLEAN, so the schema currently uses both conventions for what are conceptually flags. Aligning on BOOLEAN everywhere is the cleaner choice; just be aware it's a breaking change for any consumer that already does is_one_touch = 1.
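A sketch of the BOOLEAN variant — the defining expression is an assumption inferred from the column description ("resolved with fewer than two agent replies"); the repo's actual logic and column names (`resolved_at`, `agent_reply_count`) may differ:

```sql
-- Base table sketch: BOOLEAN flag instead of 0/1 INTEGER.
-- Assumed definition per the schema description; hypothetical column names.
resolved_at IS NOT NULL AND agent_reply_count < 2 AS is_one_touch,

-- KPI view then simplifies:
--   COUNTIF(is_one_touch = 1)  →  COUNTIF(is_one_touch)
```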
```sql
WITH tickets_in_window AS (
  SELECT
    t.id AS ticket_id,
    t.created_at AS ticket_created_at,
    DATE(t.created_at) AS ticket_created_date,
    t.status,
    t.group_id,
    t.custom_product,
    COALESCE(m.product_mapping, t.custom_product) AS product
  FROM
    `moz-fx-data-shared-prod.zendesk_syndicate.ticket` AS t
  LEFT JOIN
    `moz-fx-data-shared-prod.static.cx_product_mappings_v1` AS m
    ON m.product = t.custom_product
    AND m.source = 'Zendesk'
  WHERE
    DATE(t.created_at)
    BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 13 MONTH)
    AND CURRENT_DATE()
```
Mixed source projects — please double-check.
tickets_in_window reads from moz-fx-data-shared-prod.zendesk_syndicate.ticket, while every other Zendesk source in this query (ticket_comment, user, ticket_field_history, ticket_tag, schedule, schedule_holiday, csat_survey_answer, ticket_csat_survey, group) reads from moz-fx-sumo-prod.zendesk.*.
Two risks:
- Schema drift between the two copies — if `zendesk_syndicate.ticket` lags or has different filters than `moz-fx-sumo-prod.zendesk.ticket`, the join keys (`ticket_id`) may silently miss tickets that exist in one but not the other.
- Cross-project query cost — depending on access patterns, querying both projects in one job is fine but worth verifying the dry-run cost.
If moz-fx-sumo-prod.zendesk.ticket exists, prefer reading from a single project for consistency. If zendesk_syndicate.ticket is the canonical source for SUMO derived datasets going forward, please add a one-line comment explaining why so future readers don't second-guess the choice.
Integration report for "feat(DENG-11100): Create new base table zendesk_ticket_sla_v1 + KPI View"
Description
This PR creates the `zendesk_ticket_sla_v1` base table for the Customer Experience data model and a view for the KPIs.

KPIs created by this table:

- `first_reply_time_*`
- `full_resolution_time_*`

KPI List:
user-data-sources-and-metrics
Customer Experience Project Planning Doc
Related Tickets & Documents
Validation
Compared this PR's `query.sql` against the original `mzl-sumo-metrics/dashboard queries/zendesk_sla_metrics.sql` at the per-ticket grain for `first_reply_time_calendar_minutes` and `first_reply_time_business_minutes`.

Date run: 2026-05-08
Alignment applied
- `Sumo Test`/`VPN QA` groups, `%test%`-tagged tickets
- `GROUP BY` so per-ticket values are exposed

Row counts
Statistical profile
- `calendar_minutes` null count
- `calendar_minutes` min/max/avg
- `business_minutes` null count
- `business_minutes` min/max/avg

Per-ticket value diff (matched 93,854 tickets)

- `first_reply_time_calendar_minutes`
- `first_reply_time_business_minutes`

Verdict: 100% per-ticket parity on both columns across the 93,854 tickets present in both queries. The avg/null-count differences are explained entirely by the 6 new-only tickets.
Why the 6 new-only tickets: they have `group_id IS NULL`. The dashboard query's filter `g.name NOT IN ('Sumo Test', 'VPN QA')` is NULL-unsafe — when `g.name IS NULL` the predicate is UNKNOWN and the row is silently dropped. The new query uses `COALESCE(g.name, '') NOT IN (...)`, which correctly retains tickets without a group. This is an intentional bug fix in the new query, not a regression.

Validation query
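The NULL-unsafe vs. NULL-safe behavior is easy to demonstrate with a three-row inline table (the group names here are the real exclusions from the query; the standalone harness itself is illustrative):

```sql
-- Sketch: NOT IN yields UNKNOWN for NULL names (row dropped by WHERE),
-- while the COALESCE form keeps ungrouped tickets.
WITH groups AS (
  SELECT 'Support' AS name
  UNION ALL SELECT 'Sumo Test'
  UNION ALL SELECT CAST(NULL AS STRING)
)
SELECT
  name,
  name NOT IN ('Sumo Test', 'VPN QA') AS unsafe_keep,
  COALESCE(name, '') NOT IN ('Sumo Test', 'VPN QA') AS safe_keep
FROM groups;
-- 'Support'   → unsafe_keep TRUE,  safe_keep TRUE
-- 'Sumo Test' → unsafe_keep FALSE, safe_keep FALSE
-- NULL        → unsafe_keep NULL,  safe_keep TRUE
```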
Reviewer, please follow this checklist