feat(bqaa): ADK 2.0 minimum producer cut (#293 v5) #6
Implements the customer-driven mid-June producer-only subset of the ADK 2.0 observability work tracked in google#293 (parent google#190). The customer needs these specific fields visible in BigQuery before their ADK 2.0 production cutover; full v15 contract (google#190) lands incrementally.

This change is producer-only. No consumer-side SDK / typed-view work is included — the customer reads base-table JSON directly during the mid-June window.

What lands
----------

A1/A2 — every ADK-enriched row carries the attributes.adk envelope:
  attributes.adk.schema_version (_ADK_ENVELOPE_SCHEMA_VERSION = "1")
  attributes.adk.app_name (from InvocationContext.session.app_name)

A3 — rows from on_event_callback additionally carry:
  attributes.adk.source_event_id (Event.id, reliable join key)
  Note: never fabricated — callback rows without an originating Event leave it JSON-null.

C1 — attributes.adk.node = {path, run_id, parent_path}. parent_path is derived from path; default-empty path (NodeInfo.path = "") is preserved verbatim with parent_path = null, no synthesis.

C2 — attributes.adk.branch (absent stays JSON null).

C3 — attributes.adk.scope = null | {id, kind} per google#198 / google#293 v5 derivation order:
  (1) None → null,
  (2) name@run_id / path/name@run_id → node_run,
  (3) any other non-empty string → function_call (model-provided FC IDs match here),
  (4) empty/non-string → unknown with warning.

C4 — emit AGENT_TRANSFER from event.actions.transfer_to_agent. Payload pinned: from_agent = event.author, to_agent = the target. Verified against EventActions.transfer_to_agent which stores the target only.

C5 — emit EVENT_COMPACTION from event.actions.compaction. Float start_timestamp / end_timestamp preserved with fractional precision (consumer view conversion deferred).

C6 — emit AGENT_STATE_CHECKPOINT when actions.agent_state is not None OR actions.end_of_agent is True. Allows {agent_state: null, end_of_agent: true} payloads. Inline payload only; GCS offload for oversized state deferred.

C7 — emit TOOL_PAUSED for each event.long_running_tool_ids id, with attributes.pause_kind (derived from function_call NAME via _HITL_PAUSE_KIND_MAP — hitl_* for adk_request_*, tool otherwise) and attributes.function_call_id. HITL routing is unchanged: HITL function_responses stay on HITL_*_COMPLETED, NEVER emit TOOL_COMPLETED. Non-HITL function_responses arriving via on_user_message_callback emit TOOL_COMPLETED with pause_kind='tool' so the customer can pair (TOOL_PAUSED ↔ TOOL_COMPLETED) on (app_name, user_id, session_id, function_call_id) directly in SQL. Pause registry / pause_orphan semantics deferred to google#206.

C8 — attributes.adk.{route, render_ui_widgets, rewind_before_invocation_id} mirror EventActions, flat-with-prefix per google#203 (matches the rest of the attributes.adk.* envelope convention).

D1 — delete the deprecated on_state_change_callback stub (never called by ADK 2.0; verified no callers).

Compatibility
-------------

* AGENT_RESPONSE retains the legacy flat extras (source_event_id, source_event_author, source_event_branch) for backward compat. The canonical keys are now under attributes.adk.*.
* The HITL test fixtures use Mock events without long_running_tool_ids or .id; the envelope helper is defensive against missing attrs.
* No EventData / _log_event signature change. Added one optional field EventData.source_event: Optional[Event] = None — a minimal B0 (google#194) step. Callbacks that have access to the source Event pass it through; others leave it None (and the envelope correctly leaves A3/C1/C2/C3 null on those rows).

Tests
-----

257 plugin tests pass (238 existing + 19 new):

* envelope shape on event-originating and non-event-originating rows
* node parent_path derivation with both empty and nested paths
* _derive_scope for None, bare node, path/node, FC IDs, empty string
* C4/C5/C6 emit paths
* C5 fractional float-epoch precision round-trip
* C6 both-shape coverage + id-stabilization regression guard (Event.model_post_init auto-assigns id even when constructor omits it)
* C7 TOOL_PAUSED pause_kind derivation for non-HITL and HITL
* C7 HITL non-routing: HITL function_response → HITL_*_COMPLETED only, NEVER TOOL_COMPLETED
* C7 user-message TOOL_COMPLETED with pause_kind='tool'
* C8 flat-with-prefix route / rewind_before_invocation_id
* D1: on_state_change_callback removed from the public surface

Refs: google#293 (v5), google#190, google#194, google#196, google#197, google#198, google#199, google#200, google#201, google#202,
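The C3 derivation order in the commit message above can be sketched in Python as follows. This is a minimal illustration, not the plugin's `_derive_scope`: the function name `derive_scope`, the regular expression used to recognise `name@run_id` / `path/name@run_id`, and the choice of `id = None` for the `unknown` case are all assumptions made for the example.

```python
import logging
import re
from typing import Any, Optional

logger = logging.getLogger(__name__)

# Assumption: a node-run id is "name@run_id", optionally prefixed by "path/".
# The plugin's real matching rule is not shown in this thread.
_NODE_RUN_RE = re.compile(r"^(?:[^@\s]+/)?[^@/\s]+@[^@\s]+$")


def derive_scope(raw: Any) -> Optional[dict]:
    """Hypothetical stand-in for _derive_scope, following the C3 order."""
    # (1) None -> JSON null
    if raw is None:
        return None
    if isinstance(raw, str) and raw:
        # (2) name@run_id or path/name@run_id -> node_run
        if _NODE_RUN_RE.match(raw):
            return {"id": raw, "kind": "node_run"}
        # (3) any other non-empty string -> function_call
        return {"id": raw, "kind": "function_call"}
    # (4) empty or non-string -> unknown, with a warning
    # Assumption: the id is left empty here; the real helper may differ.
    logger.warning("unrecognised scope value: %r", raw)
    return {"id": None, "kind": "unknown"}
```

A model-provided function-call id such as `adk-58381d5a-ea48-4d75-972c-34f3f3c172f2` contains no `@`, so under this sketch it falls through to rule (3).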
Caught in review of #6: the C7 pair keys (pause_kind, function_call_id) were being passed via EventData.extra_attributes, which _enrich_attributes() copies at the top of attrs *before* attrs["adk"] = _build_adk_envelope(...). That landed them at attributes.pause_kind / attributes.function_call_id, not attributes.adk.pause_kind / attributes.adk.function_call_id.

The customer SQL pinned in google#293 v5 acceptance #3 is:

  JSON_VALUE(attributes, '$.adk.function_call_id') = JSON_VALUE(...)

so the pair join would have returned null on every row. This commit makes the contract match the SQL.

Changes:

* EventData gains adk_extras: dict[str, Any], a sibling of extra_attributes that lives INSIDE attributes.adk.
* _enrich_attributes merges adk_extras into the envelope after _build_adk_envelope (envelope wins on conflict — producer-derived identity fields like source_event_id are the source of truth).
* The two emit sites (TOOL_PAUSED in on_event_callback, TOOL_COMPLETED in on_user_message_callback) pass the pair keys via adk_extras= instead of extra_attributes=.
* The three C7 tests are updated to assert json.loads(row["attributes"])["adk"]["pause_kind"] etc., locking in the right shape this time.

Full plugin suite: 252 passed.
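The difference between the two attribute paths, and the "envelope wins on conflict" merge, can be shown with a small sketch. The function `enrich_attributes` below is a hypothetical simplification, not the plugin's `_enrich_attributes`; the sample values (`demo_app`, `fc-123`) are made up for illustration.

```python
import json
from typing import Any


def enrich_attributes(extra_attributes: dict[str, Any],
                      envelope: dict[str, Any],
                      adk_extras: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical simplification of the attribute-enrichment step."""
    attrs = dict(extra_attributes)  # top-level extras are copied first
    # adk_extras are merged INSIDE the envelope; envelope keys win on conflict.
    attrs["adk"] = {**adk_extras, **envelope}
    return attrs


envelope = {"schema_version": "1", "app_name": "demo_app"}
pair_keys = {"pause_kind": "tool", "function_call_id": "fc-123"}

# Before the fix: pair keys travel as extra_attributes and land at the top level.
row_before = json.loads(json.dumps(enrich_attributes(pair_keys, envelope, {})))
# After the fix: pair keys travel as adk_extras and land under "adk".
row_after = json.loads(json.dumps(enrich_attributes({}, envelope, pair_keys)))

print(row_before["adk"].get("function_call_id"))  # absent under "adk"
print(row_after["adk"]["function_call_id"])       # present under "adk"
```

With the pre-fix shape, a lookup at `$.adk.function_call_id` finds nothing, which is why the pinned customer join would have matched no rows.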
|
Good catch — verified and fixed. The reviewer is right that Fix (commit 236b790)
|
End-to-end validation against real BigQuery + real Gemini 3.5 Flash
Ran the PR's plugin code against a fresh BigQuery dataset, calling
Result: 16/16 validations PASS (10 from the main run + 6 from a supplemental run that closed envelope-field gaps the first run didn't query).
Coverage table
How the validation was done
# Phase 1 — real Gemini 3.5 Flash, global endpoint:
agent = LlmAgent(name="approval_agent", model="gemini-3.5-flash",
tools=[LongRunningFunctionTool(func=submit_for_human_approval)])
# Gemini called the tool → emitted long_running_tool_ids:
# {'adk-58381d5a-ea48-4d75-972c-34f3f3c172f2'}
# Phase 2 — synthetic Events for paths Gemini won't trigger:
await plugin.on_event_callback(invocation_context=ic, event=ev_transfer)
await plugin.on_event_callback(invocation_context=ic, event=ev_compact)
await plugin.on_event_callback(invocation_context=ic, event=ev_state)
await plugin.on_event_callback(invocation_context=ic, event=ev_end)
await plugin.on_event_callback(invocation_context=ic, event=ev_actions) # route/widgets/rewind
await plugin.on_user_message_callback(invocation_context=ic, user_message=hitl_msg)
await plugin.on_user_message_callback(invocation_context=ic, user_message=nonhitl_msg)
# Phase 3 — plugin.shutdown(), wait for Storage Write API visibility, query BQ.
Concrete proof of the C7 regression fix
The previously-fixed bug (pair keys at the wrong JSON path) was specifically validated by querying the exact SQL the customer will use:
SELECT JSON_VALUE(attributes, '$.adk.pause_kind') AS pk,
JSON_VALUE(attributes, '$.adk.function_call_id') AS fcid
FROM `…agent_events`
WHERE session_id = '…' AND event_type = 'TOOL_PAUSED';
-- → [{'pk': 'tool', 'fcid': 'adk-58381d5a-ea48-4d75-972c-34f3f3c172f2'}]
Both keys resolve to non-null values — i.e. the customer's
Explicitly not validated (deferred per google#293 v5)
Test infrastructure
|
Backward-compatibility analysis
Classification: backward-compatible additive telemetry, not a breaking change.
What is NOT changing
What IS observable (intentionally)
Operational nuances for the customer / analytics team
Caveats — who might need to update something
Summary
Backward-compatible additive telemetry. The only group with action items pre-merge is anyone running brittle assertions on exact event-type sets or attribute-JSON shape — and that group is tracked separately under google#211.
Downstream SQL recipes for the new ADK 2.0 events
Every query below targets the BQAA agent_events table. Replace the placeholders:
DECLARE _PROJECT STRING DEFAULT 'your-project';
DECLARE _DATASET STRING DEFAULT 'your_dataset';
DECLARE _TABLE STRING DEFAULT 'agent_events';
DECLARE _SINCE TIMESTAMP DEFAULT TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY);
Or inline a fully-qualified table reference like
1. Envelope health checks (A1 / A2 / A3)
1.1 — Confirm
SELECT
COUNT(*) AS total,
COUNTIF(JSON_VALUE(attributes, '$.adk.schema_version') IS NULL) AS missing_schema_version,
COUNTIF(JSON_VALUE(attributes, '$.adk.app_name') IS NULL) AS missing_app_name,
COUNT(DISTINCT JSON_VALUE(attributes, '$.adk.schema_version')) AS distinct_schema_versions
FROM `proj.ds.agent_events`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY);
1.2 — Event-originating vs callback-only row split (A3 source_event_id presence)
SELECT
event_type,
COUNTIF(JSON_VALUE(attributes, '$.adk.source_event_id') IS NOT NULL) AS event_originating,
COUNTIF(JSON_VALUE(attributes, '$.adk.source_event_id') IS NULL) AS callback_only,
COUNT(*) AS total
FROM `proj.ds.agent_events`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
GROUP BY event_type
ORDER BY total DESC;
Helps surface which event types are produced from non-
2. Workflow DAG via
|
| Field used by this PR | Present on v1 EventActions? | Effect after upgrade |
|---|---|---|
| actions.transfer_to_agent | Yes (it's the original 1.x transfer mechanism) | Existing 1.x-shaped agents that transfer now emit a new AGENT_TRANSFER row per transfer |
| actions.compaction | Yes (v1 EventActions.compaction: Optional[EventCompaction]) | 1.x agents that triggered compaction now emit EVENT_COMPACTION rows |
| actions.agent_state / actions.end_of_agent | Yes (both fields exist on v1 EventActions) | 1.x agents that set either now emit AGENT_STATE_CHECKPOINT rows |
| event.long_running_tool_ids | Yes (v1 event.py: long_running_tool_ids: Optional[set[str]] = None) | 1.x agents with long-running tools now emit TOOL_PAUSED rows, and user-message-side resumes emit non-HITL TOOL_COMPLETED rows with the pair keys |
| actions.rewind_before_invocation_id | Yes | 1.x agents that requested a rewind now stamp attributes.adk.rewind_before_invocation_id |
| actions.render_ui_widgets | Yes | Stamped under attributes.adk.render_ui_widgets when present |
| actions.route | No — route is ADK-2.0-only on EventActions | 1.x agents never produce a non-null attributes.adk.route; null is the correct/expected value |
Key observation: even though most of these EventActions fields existed in 1.x, the 1.x BQAA plugin never emitted the new event types (AGENT_TRANSFER, EVENT_COMPACTION, AGENT_STATE_CHECKPOINT, TOOL_PAUSED) — they were silently ignored. The data was already in the agent execution; it just wasn't logged.
So for a 1.x → 2.x upgrade:
- All existing event types continue to be emitted with the same content shape. Old SQL keeps working.
- The new event types and the attributes.adk.* envelope are additive. New SQL gets new observability into actions that were already happening.
- No agent-behavior change. The plugin is observation-only; it doesn't modify the agent's event flow.
- Brittle assertions need a refresh. If your downstream has a hardcoded allowlist of event types, or a strict row-count test, or strict JSON-schema validation on the attributes column, you'll need to update those to permit the new types and the adk envelope. This is the same caveat already captured in the BC analysis above.
Quick "am I about to break?" check for a 1.x → 2.x upgrade
Run before the upgrade against the old 1.x data:
SELECT event_type, COUNT(*) AS n
FROM `proj.ds.agent_events`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY event_type
ORDER BY n DESC;
Run after the upgrade. If AGENT_TRANSFER, EVENT_COMPACTION, AGENT_STATE_CHECKPOINT, or TOOL_PAUSED show up where they didn't before — that's expected (your agent was always taking those actions; you can now see them).
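The before/after comparison can also be automated once both result sets are exported. The sketch below assumes each run of the query above has been loaded into a dict mapping `event_type` to row count; the helper name `classify_new_types` and the sample counts are hypothetical.

```python
# New event types named in this thread; seeing them after the upgrade is expected.
EXPECTED_NEW_TYPES = {
    "AGENT_TRANSFER",
    "EVENT_COMPACTION",
    "AGENT_STATE_CHECKPOINT",
    "TOOL_PAUSED",
}


def classify_new_types(before: dict, after: dict) -> tuple:
    """Split event types seen only after the upgrade into (expected, unexpected)."""
    appeared = set(after) - set(before)
    expected = sorted(appeared & EXPECTED_NEW_TYPES)
    unexpected = sorted(appeared - EXPECTED_NEW_TYPES)
    return expected, unexpected


# Hypothetical counts from the two runs of the query above.
before = {"AGENT_RESPONSE": 120, "TOOL_COMPLETED": 40}
after = {"AGENT_RESPONSE": 118, "TOOL_COMPLETED": 43, "TOOL_PAUSED": 7}

expected, unexpected = classify_new_types(before, after)
print(expected, unexpected)
```

Anything in the second list is a type this thread does not account for and is worth a closer look before updating allowlists.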
|
Fresh review of the SQL-library / ADK 1.x impact comment. The structure is useful and the 1.x compatibility framing checks out against Findings
Verified Good
After the three SQL edits above, the library is in good shape as the customer-facing query starter set. |
SQL library v2 — three corrections
All three findings verified before posting:
The long-running duration join in Section 6 intentionally omits
Below: the three corrected sections inline. Everything else in the v1 library stands.
Section 5 (corrected) — Agent-state checkpoints
5.1 — Checkpoint stream with correct shape discriminator
SELECT
JSON_VALUE(attributes, '$.adk.app_name') AS app_name,
user_id,
session_id,
invocation_id,
agent,
timestamp,
JSON_QUERY(content, '$.agent_state') AS agent_state,
SAFE_CAST(JSON_VALUE(content, '$.end_of_agent') AS BOOL) AS end_of_agent,
CASE
WHEN JSON_QUERY(content, '$.agent_state') IS NULL
AND SAFE_CAST(JSON_VALUE(content, '$.end_of_agent') AS BOOL) = TRUE
THEN 'end_only'
WHEN JSON_QUERY(content, '$.agent_state') IS NOT NULL
AND COALESCE(SAFE_CAST(JSON_VALUE(content, '$.end_of_agent') AS BOOL), FALSE) = FALSE
THEN 'state_only'
ELSE 'both'
END AS shape
FROM `proj.ds.agent_events`
WHERE event_type = 'AGENT_STATE_CHECKPOINT'
AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
ORDER BY timestamp DESC
LIMIT 100;
5.2 — Checkpoint frequency per agent (uses
SELECT
JSON_VALUE(attributes, '$.adk.app_name') AS app_name,
agent,
COUNTIF(JSON_QUERY(content, '$.agent_state') IS NOT NULL) AS state_checkpoints,
COUNTIF(SAFE_CAST(JSON_VALUE(content, '$.end_of_agent') AS BOOL) = TRUE) AS end_of_agent_signals,
COUNT(*) AS total
FROM `proj.ds.agent_events`
WHERE event_type = 'AGENT_STATE_CHECKPOINT'
AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
GROUP BY 1, 2
ORDER BY total DESC;
Sections 2 / 4 / 7 / 10 (corrected) — full telemetry identity in grouping
2.1 — Workflow DAG join
SELECT
JSON_VALUE(attributes, '$.adk.app_name') AS app_name,
user_id,
session_id,
invocation_id,
JSON_VALUE(attributes, '$.adk.node.path') AS node_path,
JSON_VALUE(attributes, '$.adk.node.parent_path') AS parent_path,
JSON_VALUE(attributes, '$.adk.node.run_id') AS run_id,
COUNT(*) AS events_at_node
FROM `proj.ds.agent_events`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
AND JSON_VALUE(attributes, '$.adk.node.path') IS NOT NULL
AND JSON_VALUE(attributes, '$.adk.node.path') != ''
GROUP BY 1, 2, 3, 4, 5, 6, 7
ORDER BY app_name, user_id, session_id, invocation_id, node_path;
2.2 — Workflow node fan-out per invocation
SELECT
JSON_VALUE(attributes, '$.adk.app_name') AS app_name,
user_id,
session_id,
invocation_id,
COUNT(DISTINCT JSON_VALUE(attributes, '$.adk.node.path')) AS distinct_node_paths,
COUNT(*) AS total_events
FROM `proj.ds.agent_events`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
AND JSON_VALUE(attributes, '$.adk.node.path') IS NOT NULL
AND JSON_VALUE(attributes, '$.adk.node.path') != ''
GROUP BY 1, 2, 3, 4
HAVING distinct_node_paths > 1
ORDER BY distinct_node_paths DESC
LIMIT 50;
4.1 — Compaction list with full identity
SELECT
JSON_VALUE(attributes, '$.adk.app_name') AS app_name,
user_id,
session_id,
invocation_id,
TIMESTAMP_MICROS(CAST(CAST(JSON_VALUE(content, '$.start_timestamp') AS FLOAT64) * 1000000 AS INT64)) AS window_start,
TIMESTAMP_MICROS(CAST(CAST(JSON_VALUE(content, '$.end_timestamp') AS FLOAT64) * 1000000 AS INT64)) AS window_end,
CAST(JSON_VALUE(content, '$.end_timestamp') AS FLOAT64)
- CAST(JSON_VALUE(content, '$.start_timestamp') AS FLOAT64) AS window_seconds
FROM `proj.ds.agent_events`
WHERE event_type = 'EVENT_COMPACTION'
AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
ORDER BY window_seconds DESC
LIMIT 50;
7.1 — HITL pair stream with app_name in the grouping
SELECT
JSON_VALUE(attributes, '$.adk.app_name') AS app_name,
user_id,
session_id,
invocation_id,
COUNTIF(event_type = 'HITL_CONFIRMATION_REQUEST') AS conf_request,
COUNTIF(event_type = 'HITL_CONFIRMATION_REQUEST_COMPLETED') AS conf_completed,
COUNTIF(event_type = 'HITL_CREDENTIAL_REQUEST') AS cred_request,
COUNTIF(event_type = 'HITL_CREDENTIAL_REQUEST_COMPLETED') AS cred_completed,
COUNTIF(event_type = 'HITL_INPUT_REQUEST') AS inp_request,
COUNTIF(event_type = 'HITL_INPUT_REQUEST_COMPLETED') AS inp_completed
FROM `proj.ds.agent_events`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
AND event_type LIKE 'HITL_%'
GROUP BY 1, 2, 3, 4
ORDER BY (conf_request + cred_request + inp_request) DESC
LIMIT 50;
10 — Branch fan-out with full identity
SELECT
JSON_VALUE(attributes, '$.adk.app_name') AS app_name,
user_id,
session_id,
invocation_id,
COUNT(DISTINCT JSON_VALUE(attributes, '$.adk.branch')) AS distinct_branches,
ARRAY_AGG(DISTINCT JSON_VALUE(attributes, '$.adk.branch') IGNORE NULLS LIMIT 10) AS branches
FROM `proj.ds.agent_events`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
AND JSON_VALUE(attributes, '$.adk.source_event_id') IS NOT NULL
GROUP BY 1, 2, 3, 4
HAVING distinct_branches > 1
ORDER BY distinct_branches DESC
LIMIT 50;
Section 6.1 (corrected) — per-stream dedupe added
The two-CTE shape avoids the single-partition cross-event-type bug. The added
WITH paused_dedup AS (
SELECT * EXCEPT(rn)
FROM (
SELECT
JSON_VALUE(attributes, '$.adk.app_name') AS app_name,
user_id, session_id,
JSON_VALUE(attributes, '$.adk.function_call_id') AS function_call_id,
JSON_VALUE(content, '$.tool') AS tool,
timestamp AS pause_ts,
ROW_NUMBER() OVER (
PARTITION BY
JSON_VALUE(attributes, '$.adk.app_name'),
user_id, session_id,
JSON_VALUE(attributes, '$.adk.function_call_id')
ORDER BY timestamp
) AS rn
FROM `proj.ds.agent_events`
WHERE event_type = 'TOOL_PAUSED'
AND JSON_VALUE(attributes, '$.adk.pause_kind') = 'tool'
AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
)
WHERE rn = 1
),
completed_dedup AS (
SELECT * EXCEPT(rn)
FROM (
SELECT
JSON_VALUE(attributes, '$.adk.app_name') AS app_name,
user_id, session_id,
JSON_VALUE(attributes, '$.adk.function_call_id') AS function_call_id,
JSON_VALUE(content, '$.tool') AS tool,
timestamp AS complete_ts,
ROW_NUMBER() OVER (
PARTITION BY
JSON_VALUE(attributes, '$.adk.app_name'),
user_id, session_id,
JSON_VALUE(attributes, '$.adk.function_call_id')
ORDER BY timestamp
) AS rn
FROM `proj.ds.agent_events`
WHERE event_type = 'TOOL_COMPLETED'
AND JSON_VALUE(attributes, '$.adk.pause_kind') = 'tool'
AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
)
WHERE rn = 1
)
SELECT
p.app_name, p.user_id, p.session_id, p.function_call_id, p.tool,
p.pause_ts, c.complete_ts,
TIMESTAMP_DIFF(c.complete_ts, p.pause_ts, SECOND) AS pause_seconds
FROM paused_dedup p
JOIN completed_dedup c
USING (app_name, user_id, session_id, function_call_id)
ORDER BY pause_seconds DESC
LIMIT 100;
Why this layout vs a single combined partition: putting
v1 sections that need no change
1 (envelope health) / 3 (transfer chains) / 8 (action attributes) / 9 (scope breakdown) — already correct as posted. The 1.x impact analysis stands.
|
Fresh review of SQL library v2. The three corrections are materially right: Section 5 now uses the right JSON function for object state, Sections 2/4/7/10 carry full invocation identity, and Section 6.1 now dedupes per stream. I found two remaining follow-ups worth tightening before this becomes customer copy/paste material. Findings
Verified Good
No blocker to the SQL library direction. I would make the two edits above so the recipes do not drift the moment google#206 lands and so shared-table users do not accidentally merge unrelated apps in their rollups. |
SQL library v3 — two final follow-ups
Both findings are real:
Section 6.1's join-time guard
Section 6.1 (v3) — pause_orphan-safe + timestamp-ordered join
WITH paused_dedup AS (
SELECT * EXCEPT(rn)
FROM (
SELECT
JSON_VALUE(attributes, '$.adk.app_name') AS app_name,
user_id, session_id,
JSON_VALUE(attributes, '$.adk.function_call_id') AS function_call_id,
JSON_VALUE(content, '$.tool') AS tool,
timestamp AS pause_ts,
ROW_NUMBER() OVER (
PARTITION BY
JSON_VALUE(attributes, '$.adk.app_name'),
user_id, session_id,
JSON_VALUE(attributes, '$.adk.function_call_id')
ORDER BY timestamp
) AS rn
FROM `proj.ds.agent_events`
WHERE event_type = 'TOOL_PAUSED'
AND JSON_VALUE(attributes, '$.adk.pause_kind') = 'tool'
AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
)
WHERE rn = 1
),
completed_dedup AS (
SELECT * EXCEPT(rn)
FROM (
SELECT
JSON_VALUE(attributes, '$.adk.app_name') AS app_name,
user_id, session_id,
JSON_VALUE(attributes, '$.adk.function_call_id') AS function_call_id,
JSON_VALUE(content, '$.tool') AS tool,
timestamp AS complete_ts,
ROW_NUMBER() OVER (
PARTITION BY
JSON_VALUE(attributes, '$.adk.app_name'),
user_id, session_id,
JSON_VALUE(attributes, '$.adk.function_call_id')
ORDER BY timestamp
) AS rn
FROM `proj.ds.agent_events`
WHERE event_type = 'TOOL_COMPLETED'
AND JSON_VALUE(attributes, '$.adk.pause_kind') = 'tool'
AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
-- Forward-compat with #206: exclude orphan-tagged completions
-- from the healthy-pair branch. Null-safe vs current #293 rows.
AND COALESCE(SAFE_CAST(JSON_VALUE(attributes, '$.adk.pause_orphan') AS BOOL), FALSE) = FALSE
)
WHERE rn = 1
)
SELECT
p.app_name, p.user_id, p.session_id, p.function_call_id, p.tool,
p.pause_ts, c.complete_ts,
TIMESTAMP_DIFF(c.complete_ts, p.pause_ts, SECOND) AS pause_seconds
FROM paused_dedup p
JOIN completed_dedup c
USING (app_name, user_id, session_id, function_call_id)
WHERE c.complete_ts >= p.pause_ts -- guard against clock skew / replay
ORDER BY pause_seconds DESC
LIMIT 100;
When google#206 lands, an orphan branch can be added as a sibling query (or
Section 3.2 (v3) — top transfer pairs per app
SELECT
JSON_VALUE(attributes, '$.adk.app_name') AS app_name,
JSON_VALUE(content, '$.from_agent') AS from_agent,
JSON_VALUE(content, '$.to_agent') AS to_agent,
COUNT(*) AS transfers
FROM `proj.ds.agent_events`
WHERE event_type = 'AGENT_TRANSFER'
AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
GROUP BY 1, 2, 3
ORDER BY transfers DESC
LIMIT 25;
For a true fleet-wide rollup across all apps, drop the
Section 8.1 (v3) — routing histogram per app
SELECT
JSON_VALUE(attributes, '$.adk.app_name') AS app_name,
JSON_VALUE(attributes, '$.adk.route') AS route_value,
COUNT(*) AS occurrences,
COUNT(DISTINCT invocation_id) AS invocations
FROM `proj.ds.agent_events`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
AND JSON_VALUE(attributes, '$.adk.route') IS NOT NULL
GROUP BY 1, 2
ORDER BY occurrences DESC;
Section 8.2 (v3) — rewind requests with identity
SELECT
JSON_VALUE(attributes, '$.adk.app_name') AS app_name,
user_id,
session_id,
invocation_id AS rewinding_invocation_id,
JSON_VALUE(attributes, '$.adk.rewind_before_invocation_id') AS rewinding_to,
agent,
timestamp
FROM `proj.ds.agent_events`
WHERE JSON_VALUE(attributes, '$.adk.rewind_before_invocation_id') IS NOT NULL
AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
ORDER BY timestamp DESC
LIMIT 50;
Section 8.3 (v3) — widget render rows with identity
SELECT
JSON_VALUE(attributes, '$.adk.app_name') AS app_name,
user_id,
session_id,
invocation_id,
agent,
ARRAY_LENGTH(JSON_QUERY_ARRAY(attributes, '$.adk.render_ui_widgets')) AS widget_count,
JSON_VALUE(attributes, '$.adk.render_ui_widgets[0].provider') AS first_widget_provider
FROM `proj.ds.agent_events`
WHERE JSON_QUERY(attributes, '$.adk.render_ui_widgets') IS NOT NULL
AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
ORDER BY widget_count DESC
LIMIT 50;
Net: the library now stays correct when google#206 ships and respects per-app boundaries in shared
|
Fresh review of SQL library v3. The two fixes are directionally correct and the emitted JSON paths match the PR branch. I found one real SQL robustness issue in Section 6.1 and one small consistency nit in Section 8.1. Findings
Verified Good
No producer-code concern here. This is just making the customer SQL robust before it gets copied into docs or notebooks. |
SQL library v4 — order-of-operations fix in Section 6.1 + identity-count fix in Section 8.1
Section 6.1 (v4) — dedupe + join + QUALIFY
WITH paused AS (
SELECT DISTINCT
JSON_VALUE(attributes, '$.adk.app_name') AS app_name,
user_id,
session_id,
JSON_VALUE(attributes, '$.adk.function_call_id') AS function_call_id,
JSON_VALUE(content, '$.tool') AS tool,
timestamp AS pause_ts
FROM `proj.ds.agent_events`
WHERE event_type = 'TOOL_PAUSED'
AND JSON_VALUE(attributes, '$.adk.pause_kind') = 'tool'
AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
),
completed AS (
SELECT DISTINCT
JSON_VALUE(attributes, '$.adk.app_name') AS app_name,
user_id,
session_id,
JSON_VALUE(attributes, '$.adk.function_call_id') AS function_call_id,
JSON_VALUE(content, '$.tool') AS tool,
timestamp AS complete_ts
FROM `proj.ds.agent_events`
WHERE event_type = 'TOOL_COMPLETED'
AND JSON_VALUE(attributes, '$.adk.pause_kind') = 'tool'
-- Forward-compat with #206: orphan-tagged completions are excluded
-- from the healthy-pair branch. Null-safe vs current #293 rows.
AND COALESCE(SAFE_CAST(JSON_VALUE(attributes, '$.adk.pause_orphan') AS BOOL), FALSE) = FALSE
AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
)
SELECT
p.app_name, p.user_id, p.session_id, p.function_call_id, p.tool,
p.pause_ts, c.complete_ts,
TIMESTAMP_DIFF(c.complete_ts, p.pause_ts, SECOND) AS pause_seconds
FROM paused p
JOIN completed c
USING (app_name, user_id, session_id, function_call_id)
-- The guard happens BEFORE choosing the winning completion, so a skewed
-- early completion can't shadow a later valid one.
WHERE c.complete_ts >= p.pause_ts
-- For each (pause), pick the earliest valid completion after it.
QUALIFY ROW_NUMBER() OVER (
PARTITION BY p.app_name, p.user_id, p.session_id, p.function_call_id, p.pause_ts
ORDER BY c.complete_ts
) = 1
ORDER BY pause_seconds DESC
LIMIT 100;
This shape:
Section 8.1 (v4) — full-identity invocation count
SELECT
JSON_VALUE(attributes, '$.adk.app_name') AS app_name,
JSON_VALUE(attributes, '$.adk.route') AS route_value,
COUNT(*) AS occurrences,
COUNT(DISTINCT TO_JSON_STRING(STRUCT(user_id, session_id, invocation_id))) AS invocations
FROM `proj.ds.agent_events`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
AND JSON_VALUE(attributes, '$.adk.route') IS NOT NULL
GROUP BY 1, 2
ORDER BY occurrences DESC;
Same pattern works anywhere else a downstream user wants a strict-identity distinct count. Two other v3 queries (Section 2.2 fan-out, Section 10 branch fan-out) already group by
No producer-code concern; the PR branch's emitted JSON paths still match. Other v3 sections stand as-posted.
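The order-of-operations point behind Section 6.1 (v4) can be checked outside SQL with a small Python sketch: filter completions to those at or after the pause first, then pick the earliest. The function `pair_pauses`, the key tuple, and the timestamps are hypothetical and exist only to illustrate the logic of the WHERE guard followed by QUALIFY.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def pair_pauses(paused, completed):
    """Pair each pause with the earliest completion at or after it.

    paused / completed: iterables of (key, timestamp), where key stands in for
    (app_name, user_id, session_id, function_call_id).
    """
    completions_by_key = defaultdict(list)
    for key, complete_ts in completed:
        completions_by_key[key].append(complete_ts)

    pairs = []
    for key, pause_ts in sorted(set(paused)):  # set() mirrors SELECT DISTINCT
        # Guard first (complete_ts >= pause_ts), then choose the winner.
        valid = [ts for ts in completions_by_key.get(key, []) if ts >= pause_ts]
        if valid:
            pairs.append((key, pause_ts, min(valid)))
    return pairs


key = ("demo_app", "user-1", "session-1", "fc-123")
t0 = datetime(2025, 1, 1, 12, 0, 0)
paused = [(key, t0)]
# One skewed completion before the pause, one valid completion after it.
completed = [(key, t0 - timedelta(seconds=5)), (key, t0 + timedelta(seconds=30))]

pairs = pair_pauses(paused, completed)
print(pairs)
```

If the earliest completion were chosen before applying the guard, the skewed row would win and then be filtered out, leaving the pause unpaired; applying the guard first keeps the later valid completion.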
|
Fresh final review of SQL library v4.
Findings
No remaining blockers found. The two v4 changes close the last issues I had on the customer SQL:
Verified Good
Residual note: Section 6.1 is a healthy-pair recipe only. Once google#206 lands, add the sibling orphan branch ( |
… JSON-null The docstring claimed callback-only rows leave A3/C1/C2/C3/C8 keys 'JSON null'. The implementation actually returns early when source_event is None (line 2837-2838) so those keys are absent from the envelope, not written as null. Behavior is correct (and what the google#293 v5 contract intends). Updating the docstring to match — and noting that because the surrounding column is BigQuery JSON, an omitted key resolves to SQL NULL via JSON_VALUE(attributes, '$.adk.<field>'), so consumer SQL using 'IS NOT NULL' to gate Event-originating rows works without the producer writing explicit JSON nulls. Caught by the RFC google#97 review against the haiyuan-eng-google SDK repo; no code change required, docstring-only fix.
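The omitted-versus-null distinction can be illustrated with a rough Python analogue of a scalar JSON path lookup. The helper `json_value` below is a hypothetical simplification written for this example; it is not BigQuery's `JSON_VALUE`, only a stand-in that returns `None` both when a key is absent and when it holds an explicit JSON null.

```python
import json


def json_value(doc: str, *path: str):
    """Hypothetical scalar lookup: None for an absent key or a JSON null."""
    node = json.loads(doc)
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None  # omitted key
        node = node[key]
    return node  # an explicit JSON null also comes back as None


# Callback-only row: the producer omits source_event_id entirely.
omitted = '{"adk": {"schema_version": "1"}}'
# What an explicit JSON null would look like (not what the producer writes).
explicit_null = '{"adk": {"schema_version": "1", "source_event_id": null}}'
# Event-originating row.
present = '{"adk": {"schema_version": "1", "source_event_id": "ev-1"}}'

for doc in (omitted, explicit_null, present):
    print(json_value(doc, "adk", "source_event_id") is not None)
```

Both the omitted and the explicit-null rows fail the "is not None" gate, which is the Python counterpart of consumer SQL gating on `IS NOT NULL`.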
|
Pushed a docstring-only follow-up ( Caught by the RFC google#97 review at the SDK repo. No code change required; the omitted-vs-null distinction matters because:
Updated docstring now reads:
|
… contract Last stale 'writes those attributes as null' reference in the producer code. Behavior is unchanged; the helper omits the keys (return early at :2837 when source_event is None) and JSON_VALUE on the BigQuery JSON column resolves an omitted key to SQL NULL, so consumer gating with 'IS NOT NULL' works without explicit JSON nulls. Caught by the RFC google#97 final review pass; matches the corrected _build_adk_envelope docstring in 8d8eb05.
Following review feedback that docstrings shouldn't reference GitHub issue numbers or PR review-thread revisions. The technical substance (contract names like 'A1/A2 envelope', 'C7 pair-key emit', 'flat-with- prefix', 'HITL non-routing') stays where it aids navigation; only the '#NNN' and 'v5' annotations come out. 20 sites swept across the plugin module and test file. Behavior and test names unchanged; suite still 252/252. The existing 'google#4645' reference in workflow plumbing is left alone -- it was not introduced by this change.
Consolidated SQL recipe library (final — v5)
This comment is the single canonical version after the v1 → v5 review cycle. All earlier SQL comments are superseded by this one. Every section here uses the latest contract decisions:
Replace these placeholders in the FROM clauses below:
-- proj.ds.agent_events → your fully-qualified table reference
The
1. Envelope health checks (A1 / A2 / A3)
1.1 — Confirm
SELECT
COUNT(*) AS total,
COUNTIF(JSON_VALUE(attributes, '$.adk.schema_version') IS NULL) AS missing_schema_version,
COUNTIF(JSON_VALUE(attributes, '$.adk.app_name') IS NULL) AS missing_app_name,
COUNT(DISTINCT JSON_VALUE(attributes, '$.adk.schema_version')) AS distinct_schema_versions
FROM `proj.ds.agent_events`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY);
1.2 — Event-originating vs callback-only row split (A3 source_event_id presence)
SELECT
event_type,
COUNTIF(JSON_VALUE(attributes, '$.adk.source_event_id') IS NOT NULL) AS event_originating,
COUNTIF(JSON_VALUE(attributes, '$.adk.source_event_id') IS NULL) AS callback_only,
COUNT(*) AS total
FROM `proj.ds.agent_events`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
GROUP BY event_type
ORDER BY total DESC;
Surfaces which event types are produced from non-
2. Workflow DAG via
|
|
Fresh review of the consolidated SQL recipe library. The consolidation is the right move and most v1→v4 fixes carried over cleanly. I found two remaining SQL issues worth fixing before this becomes the canonical copy/paste comment. Findings
Verified Good
Net: fix Section 5's JSON-null presence checks and Section 6.2's timestamp-aware completion existence check. After that, this can be the canonical SQL library. |
|
Fresh review of the edited canonical SQL comment.
Findings
No SQL correctness blockers found. The two v5 fixes are now folded in correctly:
Small Polish
Verified Good
Net: the SQL library is canonical after the two wording updates above. |
|
Final SQL recipe review. No remaining findings. Verified:
The canonical SQL library is ready to use as the copy/paste reference. |
Draft — for iteration in this fork before raising upstream.
Originally raised at google#6015 against google/adk-python; moved here to iterate independently first.
This PR implements the customer-driven minimum producer subset of the ADK 2.0 observability work tracked in GoogleCloudPlatform/BigQuery-Agent-Analytics-SDK#293 (parent: google#190). The customer needs these specific fields visible in BigQuery before their ADK 2.0 production cutover; the full v15 contract lands incrementally.
Producer-only — no consumer-side SDK / typed-view work is needed for the customer to query the new data; they read base-table JSON directly during the mid-June window.
What lands
- attributes.adk.{schema_version, app_name}
- on_event_callback rows carry attributes.adk.source_event_id — never fabricated on callback-only rows
- attributes.adk.node = {path, run_id, parent_path} — default-empty path preserved verbatim, no synthesis
- attributes.adk.branch (absent stays JSON null)
- attributes.adk.scope = null | {id, kind} per google#198 / google#293 v5 derivation
- AGENT_TRANSFER (from_agent = event.author, to_agent = actions.transfer_to_agent)
- EVENT_COMPACTION — fractional float-epoch seconds preserved
- AGENT_STATE_CHECKPOINT (both {agent_state, end_of_agent} shapes), inline payload only
- TOOL_PAUSED with pause_kind (HITL-aware via _HITL_PAUSE_KIND_MAP) + function_call_id; user-message TOOL_COMPLETED with pause_kind='tool' for non-HITL; HITL function_responses stay on HITL_*_COMPLETED, never emit TOOL_COMPLETED
- attributes.adk.{route, render_ui_widgets, rewind_before_invocation_id} mirror
- on_state_change_callback stub
Explicitly deferred (post-mid-June, per google#293 v5)
WORKFLOW_NODE_STARTING/COMPLETED event types
attributes.adk.node today.
pause_orphan semantics / read-after-write visibility
TOOL_PAUSED ↔ TOOL_COMPLETED SQL joins.
attributes.adk.otel_span_id
source_event_id ↔ ADK's span-side associated_event_ids.
AGENT_STATE_CHECKPOINT
on_event_callback paths
EventData.source_event: Optional[Event] = None as a minimal step; full per-callback coverage matrix lands with google#194.
Compatibility
- AGENT_RESPONSE retains the legacy flat extras (source_event_id, source_event_author, source_event_branch) for backward compat alongside the new canonical attributes.adk.* envelope. Existing consumers continue to work.
- EventData signature gains one optional field (source_event). No breaking change.
Tests
252 plugin tests pass (233 existing on fork's main + 19 new). Note: this fork's main doesn't yet have upstream's TestDropStats class (introduced in google/adk-python after this fork's last sync); that class is not in this PR.
- node.parent_path derivation with both empty and nested paths
- _derive_scope for None, bare node, path/node, FC IDs, empty string
- (Event.model_post_init auto-assigns id even when the constructor omits it — _create_agent_state_event is covered)
- TOOL_PAUSED pause_kind derivation for non-HITL and HITL
- function_response → HITL_*_COMPLETED only, never TOOL_COMPLETED
- TOOL_COMPLETED with pause_kind='tool'
- route / rewind_before_invocation_id
- on_state_change_callback removed from the public surface
References
Test plan
- pytest tests/unittests/plugins/test_bigquery_agent_analytics_plugin.py — 252/252 pass
- pyink --config pyproject.toml src/ tests/ + isort src/ tests/ — clean
🤖 Generated with Claude Code