feat: bqaa-revalidate-extractors CLI for local revalidation (#75 follow-up to C2.d)#150
Merged
haiyuan-eng-google merged 5 commits intoMay 12, 2026
Conversation
…ogleCloudPlatform#75 follow-up to C2.d) Operationalizes revalidate_compiled_extractors so ops can run periodic revalidation without writing Python. **Local inputs only** this round; --events-bq-query is a deliberate follow-up so the auth/location/pagination surface stays isolated from the operational-loop contract. Public surface (src/bigquery_agent_analytics/extractor_compilation/cli_revalidate.py): * main(argv) — argparse-driven entry. Wired through pyproject.toml [project.scripts] as ``bqaa-revalidate-extractors``. * Flags: --bundles-root, --events-jsonl, --reference-extractors-module, --thresholds-json (optional), --report-out. * Exit codes: 0 pass / 1 threshold violation (report still written) / 2 usage-or-input error (report not written). Reference module contract: the dotted-path module passed to --reference-extractors-module must expose EXTRACTORS: dict[str, Callable[[dict, Any], StructuredExtractionResult]] RESOLVED_GRAPH: ResolvedGraph # from resolve(ontology, binding) SPEC: Any = None # optional The CLI deliberately carries no ontology / binding flags because the reference module is the operational contract that defined both the event_type-to-callable mapping AND the spec they validate against. One module, one contract. Implementation notes: * Fingerprint is auto-detected from the first bundle's manifest; mixed fingerprints under one --bundles-root fail-closed at exit 2 (a deployment mistake for revalidation). * JSONL parsing is strict — a malformed line aborts with exit 2 naming the line number. The harness's per-event skipped_events path is for legitimately-shaped events without coverage, not corrupted input. * Reference-module shape is checked at the CLI boundary (missing EXTRACTORS / RESOLVED_GRAPH; non-dict EXTRACTORS; non-string-or-empty keys; non-callable values) so the harness never sees a malformed registry. * Threshold JSON is validated via the existing RevalidationThresholds.__post_init__ bounds check; unknown fields also rejected so a typo doesn't silently produce a no-op gate. * Report JSON shape: {"report": ..., "threshold_check": null | {"ok": bool, "violations": [str, ...]}}. Tests (tests/test_extractor_compilation_cli_revalidate.py, 15 cases): * TestCliEndToEnd (3): happy path (exit 0, threshold_check=null); threshold pass (exit 0, ok=True); threshold violation (exit 1, report still written with violations listed). All run against real BKA fixtures plus hand-built bundles whose compiled source either agrees with or drifts from the reference. * TestCliUsageErrors (11): missing events file; malformed JSONL line; missing bundles root; mixed-fingerprint bundles root; empty bundles root; reference module not importable; reference module missing EXTRACTORS; reference module missing RESOLVED_GRAPH; bad EXTRACTORS shape; thresholds JSON with unknown field; thresholds JSON with out-of-range rate. * test_console_script_entry_point_registered (1): locks the pyproject.toml entry so a typo fails CI rather than breaking the binary at user-install time. Docs: docs/extractor_compilation_revalidate_cli.md describes the contract, exit codes, report shape, and reference-module surface. docs/README.md index entry + CHANGELOG entry added. 189 related extractor_compilation tests pass.
Addresses PR GoogleCloudPlatform#150 round-1 reviewer findings. P1 - --report-out parent-directory preflight + write wrap. The docs promise usage/input errors exit 2 with no report, but ``--report-out missing/report.json`` used to raise FileNotFoundError out of ``_write_report``. Now: * ``_load_config`` preflight checks ``report_out.parent.is_dir()`` before doing any work; missing parent surfaces as a clean ``_CliError`` with a "create it before running" message. * ``_write_report`` wraps ``path.write_text(...)`` in try/except OSError so the late-failure cases (permissions, disk full, parent removed between preflight and write) also surface as ``_CliError`` rather than a traceback. P2 - I/O exception wrapping on --events-jsonl and --thresholds-json. The previous code only caught JSONDecodeError; OSError (permission denied, file removed between is_file check and open) and UnicodeError (invalid UTF-8 in a tampered or wrong-encoding file) leaked as raw tracebacks. Both readers now catch both exception types and surface them as ``_CliError`` with the file path named. Tests: 15 -> 18. Three new cases in ``TestCliUsageErrors``: * test_report_out_parent_missing_exits_two_cleanly: ``--report-out tmp_path/does-not-exist/report.json`` exits 2 with "does not exist" in the message; the missing parent dir is NOT created. * test_events_jsonl_invalid_utf8_exits_two_cleanly: binary bytes (0xff 0xfe ...) in the JSONL file exits 2 with "UTF-8" in the message; no report leaked. * test_thresholds_json_invalid_utf8_exits_two_cleanly: same defensive shape on the thresholds reader. Docs: ``--report-out`` flag description spells out the preflight + late-failure-wrap behavior; ``TestCliUsageErrors`` test count bumped to 14 plus three new lines for the new cases.
Addresses PR GoogleCloudPlatform#150 round-2 reviewer finding. P2 - argparse's default ArgumentParser.error() writes the usage line + error message to stderr and then calls sys.exit(2), which bypasses main()'s documented "returns an exit code" contract. Tests calling ``main([])`` would catch a raised SystemExit instead of receiving the documented ``EXIT_USAGE_ERROR`` return value. Fix: subclass argparse.ArgumentParser as ``_NonExitingArgumentParser`` whose ``error()`` raises ``_CliError`` instead of calling sys.exit. The usage line is still printed to stderr first (matches argparse's default UX); main() catches ``_CliError`` from ``parse_args()`` through the same boundary that handles every other usage failure and returns ``EXIT_USAGE_ERROR``. ``exit()`` is NOT overridden, so ``--help`` and ``--version`` still terminate via ``SystemExit(0)`` — that's the expected terminal behavior for those flags. The override only catches argument errors that argparse would otherwise route through ``sys.exit(2)``. Tests: 18 -> 20. Two new cases in ``TestCliUsageErrors``: * test_missing_required_flag_returns_two_not_systemexit: ``main([])`` returns 2 (no raised SystemExit); stderr contains the standard "the following arguments are required" wording plus the usage line. * test_unrecognized_flag_returns_two_not_systemexit: all required flags supplied plus a bogus ``--no-such-flag`` returns 2 with "unrecognized arguments" in stderr. Docs: exit-code table calls out that ``main(argv)`` returns ``EXIT_USAGE_ERROR`` rather than raising on bad flags; ``--help`` / ``--version`` still ``SystemExit(0)`` behavior is documented as intentional.
…xit-code table PR GoogleCloudPlatform#150 round-3 P3 cleanup. The CLI's main() docstring, _NonExitingArgumentParser docstring, and the exit-code table in the doc page all claimed ``--version`` exited via SystemExit(0) — but the parser doesn't actually define a ``--version`` action. ``main(["--version"])`` falls through to the missing-required-args path and returns 2. Fix: remove the ``--version`` claims to match reality. The CLI inherits ``--help`` from argparse for free; ``--version`` is deliberately deferred until the package has a stable version-emission strategy (the existing pyproject.toml ``version`` field would need wiring through ``importlib.metadata.version("bigquery-agent-analytics")``, which is scope creep beyond the reviewer's actual finding). No code changes; no test changes. All 19 CLI tests + 1 entry-point skip still pass.
haiyuan-eng-google
approved these changes
May 12, 2026
Merged
3 tasks
haiyuan-eng-google
pushed a commit
that referenced
this pull request
May 12, 2026
…ors (#75 follow-up) (#151) * feat: --events-bq-query-file event source for bqaa-revalidate-extractors (#75 follow-up) The CLI now accepts a BigQuery event source in addition to --events-jsonl; the two are mutually exclusive and exactly one must be supplied. Promised follow-up to the local-only landing of PR #150. Contract: the SQL must produce exactly one column named ``event_json`` (STRING) per row, containing a JSON-encoded event dict — same shape --events-jsonl consumes line-by-line. The CLI does NOT auto-shape bigquery.Row objects; the query writer controls projection (typically via ``TO_JSON_STRING(STRUCT(...))``). Explicit non-goal: row-shape auto-inference would couple the CLI to every caller's table schema; the single-column contract keeps the CLI predictable. New flags: * --events-bq-query-file PATH (mutex with --events-jsonl; argparse mutex group with required=True so "both" / "neither" both return exit 2 via the existing _CliError boundary). * --bq-project PROJECT_ID — OPTIONAL. When absent the BigQuery client falls back to Application Default Credentials / environment for project inference. If both the flag and the inferred project are empty, the CLI exits 2 with "Set --bq-project explicitly" rather than confusing the operator with a downstream BigQuery API error. * --bq-location LOC — defaults to "US"; ignored under --events-jsonl. Error handling — every failure mode surfaces as a clean exit 2 with the file path and (for row-level errors) the 0-based row index named, so an operator can find the offender with ``LIMIT N OFFSET row_index``: * BigQuery raises (auth, syntax, table-not-found, permission) → "BigQuery query failed: <Type>: <message>" * Row missing the event_json column → "row N: missing required column 'event_json'" * event_json value isn't a STRING (struct projected without TO_JSON_STRING) → "row N: 'event_json' must be STRING" * event_json STRING isn't valid JSON → "row N: invalid JSON in 'event_json'" * event_json decodes to non-object (array, scalar) → "row N: 'event_json' decodes to <type>, expected a JSON object" * Empty / whitespace-only .sql file → "is empty" * Invalid UTF-8 in .sql file → "not valid UTF-8" Implementation notes: * Client construction is centralized behind ``_make_bq_client(project, location)`` so unit tests inject fakes via ``monkeypatch.setattr(cli_revalidate, "_make_bq_client", ...)`` rather than wiring through every call site. The real factory tries ``bigquery.Client(project=...)`` first; falls back to ``bigquery.Client()`` if project is None; raises _CliError if client.project ends up falsy. * ``_load_events_from_bq`` reads the .sql file (with the same UTF-8 / OSError wrapping pattern as _load_jsonl / _load_thresholds), constructs the client via the factory, runs ``client.query(sql).result()`` inside a single try/except that catches Exception (not BaseException, so KeyboardInterrupt / SystemExit still propagate), then iterates rows with row-indexed validation. * ``_load_config`` dispatches on whichever event-source flag is set; argparse's mutex group enforces exactly one. Tests: 20 -> 31 in tests/test_extractor_compilation_cli_revalidate.py. * TestCliUsageErrors gains two cases for the mutex (both / neither event sources). * New TestCliEventsBQ class (9 cases): happy path, ADC project inference, no-project-anywhere, query exceptions, every row-shape failure mode (missing column, non-string, invalid JSON, non-dict decode), empty .sql file. Live BQ test in tests/test_extractor_compilation_cli_revalidate_bq_live.py, gated behind BQAA_RUN_LIVE_TESTS=1 + BQAA_RUN_LIVE_BQ_REVALIDATE_TESTS=1 + PROJECT_ID + DATASET_ID. Creates a temp table, inserts two event_json rows shaped like BKA decisions, runs the CLI, asserts the report is written with both events as compiled_unchanged + parity_matches, deletes the table on the way out. Docs: CLI doc page describes the contract, error mapping, project resolution, and adds a BigQuery usage example. docs/README.md index entry + CHANGELOG entry updated. * fix(revalidate-cli): wrap BQ client construction + enforce exactly-one-column + fix stale example SQL Addresses PR #151 round-1 reviewer findings. P1 - BigQuery client construction now wrapped in its own try/except. Previously ``_make_bq_client(...)`` ran OUTSIDE the try that wraps query failures, so auth / ADC / invalid- credentials / network failures escaped as raw tracebacks despite the docs promising clean CLI errors. The construction wrap distinguishes the failure mode from query-time failures via a separate prefix: * "BigQuery client construction failed: ..." for auth/ADC errors before ``client.query`` runs. * "BigQuery query failed: ..." for syntax/permission/table- not-found at query time. Our own ``_CliError`` from project-inference failure inside the factory passes through unchanged (no re-wrapping). P2 - "Exactly one column named event_json" contract is now enforced. Previously a query returning ``event_json, extra_col`` was silently accepted because the row loop only reads ``row["event_json"]``. Now validated against the first row's keys before iteration; mismatch fails with exit 2 listing the offending column set so an operator can fix the SQL. Check is via ``sorted(rows[0].keys())`` (works for both ``bigquery.Row`` and dict fakes). Empty result sets are a no-op — nothing to validate, harness produces a zero-event report. P3 - Stale example SQL fixed. The doc previously used ``WHERE event_timestamp BETWEEN @start_time AND @end_time``, but the CLI doesn't accept query parameters; copying the example as-is would fail at BigQuery. Replaced with literal ``TIMESTAMP('2026-05-01')`` / ``TIMESTAMP('2026-05-12')`` and explicit doc text: "The SQL file must be fully self-contained — the CLI does not accept query parameters, so substitute concrete literals before invoking." Tests: 31 -> 33 in tests/test_extractor_compilation_cli_revalidate.py. * test_bq_client_construction_failure_returns_two: monkey-patches _make_bq_client to raise RuntimeError; asserts exit 2 with "BigQuery client construction failed" prefix + the underlying message. * test_bq_extra_column_rejected: row with extra_col key fails with "exactly one column ... got ['event_json', 'extra_col']"; report is NOT written. All 32 CLI tests + 1 entry-point skip pass; pyink + isort clean on the touched files. * fix(revalidate-cli): validate column contract via job.schema (catches zero-row wrong-schema case) PR #151 round-2 reviewer finding. P2 - The previous "exactly one column" check only ran when rows is non-empty. A query like ``SELECT event_json, extra_col FROM t WHERE FALSE`` returns zero rows but still has the wrong schema; the CLI silently skipped validation and wrote a successful zero-event report — a misleading "all clear" signal when the SQL itself is broken. Fix: derive column names via a new ``_query_result_column_names(job, rows)`` helper that prefers ``job.schema`` (real ``bigquery.QueryJob`` populates this regardless of row count) and falls back to ``sorted(rows[0].keys())`` only when the job lacks a usable schema (test fakes that don't simulate it). Returns ``None`` on the degenerate "fake + zero rows + no schema" case, which the caller treats as "skip the contract check" — that preserves existing fake-based tests without weakening the real-BigQuery path (production always has schema). The fake test client now supports an optional ``schema=`` parameter (list of objects exposing ``.name``) so tests can simulate both correct and incorrect schemas independently of the row payload. Tests: 33 -> 35. * test_bq_extra_column_rejected_on_empty_result_set: zero rows + ``schema=[event_json, extra_col]`` → exit 2 with the column-set listed; report NOT written. The bug reproducer. * test_bq_correct_schema_empty_result_set_succeeds: zero rows + ``schema=[event_json]`` → exit 0 with a zero-event report. Locks the design choice that an empty-but-well-shaped result is a valid revalidation outcome, not an error. Docs: failure-mode table notes the schema-based check; contract section + test list updated.
7 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Operationalizes
revalidate_compiled_extractorsso ops can run periodic checks without writing Python. Local inputs only this round —--events-bq-queryis a deliberate follow-up so the auth/location/pagination surface stays isolated from the operational-loop contract.Flags
--bundles-root--events-jsonl--reference-extractors-moduleEXTRACTORS+RESOLVED_GRAPH(+ optionalSPEC).--thresholds-jsonRevalidationThresholdsfields.--report-outRevalidationReport+ThresholdCheckResult.Reference module contract
The CLI carries no ontology/binding flags because the reference module is the operational contract that defined both the event_type-to-callable mapping AND the spec they validate against. One module, one contract.
Exit codes
01violations[]2Test plan
tests/test_extractor_compilation_cli_revalidate.py— 15 cases:TestCliEndToEnd(3): happy path; threshold pass; threshold violation.TestCliUsageErrors(11): missing events file; malformed JSONL line; missing bundles root; mixed-fingerprint bundles root; empty bundles root; reference module not importable; reference module missingEXTRACTORS; reference module missingRESOLVED_GRAPH; badEXTRACTORSshape; thresholds JSON with unknown field; thresholds JSON with out-of-range rate.test_console_script_entry_point_registered(1): locks thepyproject.toml [project.scripts]entry so a typo in the entry-point string fails CI rather than breaking the binary at user-install time.extractor_compilationtests pass.pyink+isortclean.Out of scope (deferred)
--events-bq-query— follow-up PR; brings auth + location + pagination + error handling.--report-outwrites a local file; pushing elsewhere is the caller's concern.