
feat: bqaa-revalidate-extractors CLI for local revalidation (#75 follow-up to C2.d)#150

Merged
haiyuan-eng-google merged 5 commits into GoogleCloudPlatform:main from caohy1988:feat/extractor-revalidate-cli
May 12, 2026

Conversation

@caohy1988 (Contributor)

Summary

Operationalizes revalidate_compiled_extractors so ops can run periodic checks without writing Python. Local inputs only this round — --events-bq-query is a deliberate follow-up so the auth/location/pagination surface stays isolated from the operational-loop contract.

Flags

| Flag | Required | Description |
| --- | --- | --- |
| `--bundles-root` | yes | Bundle root; fingerprint auto-detected from the first manifest; mixed fingerprints fail closed. |
| `--events-jsonl` | yes | One event JSON object per line; a malformed line aborts with its line number. |
| `--reference-extractors-module` | yes | Dotted path; the module exposes `EXTRACTORS` + `RESOLVED_GRAPH` (+ optional `SPEC`). |
| `--thresholds-json` | no | JSON object with any subset of `RevalidationThresholds` fields. |
| `--report-out` | yes | Combined JSON: raw `RevalidationReport` + `ThresholdCheckResult`. |

Reference module contract

EXTRACTORS:     dict[str, Callable[[dict, Any], StructuredExtractionResult]]
RESOLVED_GRAPH: ResolvedGraph     # from resolve(ontology, binding)
SPEC:           Any = None        # optional

The CLI carries no ontology/binding flags because the reference module is the operational contract that defines both the event_type-to-callable mapping and the spec they validate against. One module, one contract.

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | Pass (no thresholds OR all thresholds passed) |
| 1 | Threshold violation; report still written with `violations[]` |
| 2 | Usage/load/input error; report not written |

Test plan

  • tests/test_extractor_compilation_cli_revalidate.py — 15 cases:
    • TestCliEndToEnd (3): happy path; threshold pass; threshold violation.
    • TestCliUsageErrors (11): missing events file; malformed JSONL line; missing bundles root; mixed-fingerprint bundles root; empty bundles root; reference module not importable; reference module missing EXTRACTORS; reference module missing RESOLVED_GRAPH; bad EXTRACTORS shape; thresholds JSON with unknown field; thresholds JSON with out-of-range rate.
    • test_console_script_entry_point_registered (1): locks the pyproject.toml [project.scripts] entry so a typo in the entry-point string fails CI rather than breaking the binary at user-install time.
  • 189 related extractor_compilation tests pass.
  • pyink + isort clean.

Out of scope (deferred)

  • --events-bq-query — follow-up PR; brings auth + location + pagination + error handling.
  • Scheduled execution — operator owns cron / Cloud Scheduler / GitHub Actions.
  • BQ persistence of reports — --report-out writes a local file; pushing elsewhere is the caller's concern.
  • Rollout doc — lands after the BQ-query follow-up so the doc has the full CLI surface to describe.

caohy1988 and others added 5 commits May 12, 2026 01:31
…ogleCloudPlatform#75 follow-up to C2.d)

Operationalizes revalidate_compiled_extractors so ops can run
periodic revalidation without writing Python. **Local inputs
only** this round; --events-bq-query is a deliberate follow-up
so the auth/location/pagination surface stays isolated from the
operational-loop contract.

Public surface
(src/bigquery_agent_analytics/extractor_compilation/cli_revalidate.py):

* main(argv) — argparse-driven entry. Wired through
  pyproject.toml [project.scripts] as
  ``bqaa-revalidate-extractors``.
* Flags: --bundles-root, --events-jsonl,
  --reference-extractors-module, --thresholds-json
  (optional), --report-out.
* Exit codes: 0 pass / 1 threshold violation (report still
  written) / 2 usage-or-input error (report not written).

Reference module contract: the dotted-path module passed to
--reference-extractors-module must expose

    EXTRACTORS:     dict[str, Callable[[dict, Any], StructuredExtractionResult]]
    RESOLVED_GRAPH: ResolvedGraph     # from resolve(ontology, binding)
    SPEC:           Any = None        # optional

The CLI deliberately carries no ontology / binding flags
because the reference module is the operational contract that
defines both the event_type-to-callable mapping AND the spec
they validate against. One module, one contract.

Implementation notes:

* Fingerprint is auto-detected from the first bundle's
  manifest; mixed fingerprints under one --bundles-root
  fail-closed at exit 2 (a deployment mistake for
  revalidation).
* JSONL parsing is strict — a malformed line aborts with
  exit 2 naming the line number. The harness's per-event
  skipped_events path is for legitimately-shaped events
  without coverage, not corrupted input.
* Reference-module shape is checked at the CLI boundary
  (missing EXTRACTORS / RESOLVED_GRAPH; non-dict EXTRACTORS;
  non-string-or-empty keys; non-callable values) so the
  harness never sees a malformed registry.
* Threshold JSON is validated via the existing
  RevalidationThresholds.__post_init__ bounds check; unknown
  fields are also rejected so a typo doesn't silently produce
  a no-op gate.
* Report JSON shape: {"report": ..., "threshold_check": null
  | {"ok": bool, "violations": [str, ...]}}.
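The combined report can be consumed like this. A hedged sketch: the outer `{"report": ..., "threshold_check": ...}` shape is the documented one, but the inner field names and the violation string are illustrative only.

```python
import json

# Hedged sketch: consume the combined report written by --report-out.
# "events_seen" and the violation wording are made up for illustration.
raw = json.dumps({
    "report": {"events_seen": 2},
    "threshold_check": {"ok": False, "violations": ["parity rate below floor"]},
})
payload = json.loads(raw)
check = payload["threshold_check"]
if check is None:
    status = "no thresholds configured"
elif check["ok"]:
    status = "all thresholds passed"
else:
    status = "; ".join(check["violations"])
print(status)  # -> parity rate below floor
```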

Tests
(tests/test_extractor_compilation_cli_revalidate.py, 15 cases):

* TestCliEndToEnd (3): happy path (exit 0,
  threshold_check=null); threshold pass (exit 0, ok=True);
  threshold violation (exit 1, report still written with
  violations listed). All run against real BKA fixtures plus
  hand-built bundles whose compiled source either agrees with
  or drifts from the reference.
* TestCliUsageErrors (11): missing events file; malformed
  JSONL line; missing bundles root; mixed-fingerprint bundles
  root; empty bundles root; reference module not importable;
  reference module missing EXTRACTORS; reference module
  missing RESOLVED_GRAPH; bad EXTRACTORS shape; thresholds
  JSON with unknown field; thresholds JSON with
  out-of-range rate.
* test_console_script_entry_point_registered (1): locks the
  pyproject.toml entry so a typo fails CI rather than
  breaking the binary at user-install time.

Docs: docs/extractor_compilation_revalidate_cli.md describes
the contract, exit codes, report shape, and reference-module
surface. docs/README.md index entry + CHANGELOG entry added.

189 related extractor_compilation tests pass.
Addresses PR GoogleCloudPlatform#150 round-1 reviewer findings.

P1 - --report-out parent-directory preflight + write wrap.
The docs promise usage/input errors exit 2 with no report,
but ``--report-out missing/report.json`` used to raise
FileNotFoundError out of ``_write_report``. Now:

* ``_load_config`` preflight checks ``report_out.parent.is_dir()``
  before doing any work; missing parent surfaces as a clean
  ``_CliError`` with a "create it before running" message.
* ``_write_report`` wraps ``path.write_text(...)`` in
  try/except OSError so the late-failure cases (permissions,
  disk full, parent removed between preflight and write)
  also surface as ``_CliError`` rather than a traceback.

P2 - I/O exception wrapping on --events-jsonl and
--thresholds-json. The previous code only caught
JSONDecodeError; OSError (permission denied, file removed
between is_file check and open) and UnicodeError (invalid
UTF-8 in a tampered or wrong-encoding file) leaked as raw
tracebacks. Both readers now catch both exception types and
surface them as ``_CliError`` with the file path named.

Tests: 15 -> 18. Three new cases in ``TestCliUsageErrors``:

* test_report_out_parent_missing_exits_two_cleanly:
  ``--report-out tmp_path/does-not-exist/report.json`` exits
  2 with "does not exist" in the message; the missing
  parent dir is NOT created.
* test_events_jsonl_invalid_utf8_exits_two_cleanly: binary
  bytes (0xff 0xfe ...) in the JSONL file exits 2 with
  "UTF-8" in the message; no report leaked.
* test_thresholds_json_invalid_utf8_exits_two_cleanly: same
  defensive shape on the thresholds reader.

Docs: the ``--report-out`` flag description spells out the
preflight + late-failure-wrap behavior; the
``TestCliUsageErrors`` test count is bumped to 14, with three
new lines for the new cases.
Addresses PR GoogleCloudPlatform#150 round-2 reviewer finding.

P2 - argparse's default ArgumentParser.error() writes the
usage line + error message to stderr and then calls sys.exit(2),
which bypasses main()'s documented "returns an exit code"
contract. Tests calling ``main([])`` would catch a raised
SystemExit instead of receiving the documented
``EXIT_USAGE_ERROR`` return value.

Fix: subclass argparse.ArgumentParser as
``_NonExitingArgumentParser`` whose ``error()`` raises
``_CliError`` instead of calling sys.exit. The usage line is
still printed to stderr first (matches argparse's default
UX); main() catches ``_CliError`` from ``parse_args()``
through the same boundary that handles every other usage
failure and returns ``EXIT_USAGE_ERROR``.

``exit()`` is NOT overridden, so ``--help`` and ``--version``
still terminate via ``SystemExit(0)`` — that's the expected
terminal behavior for those flags. The override only catches
argument errors that argparse would otherwise route through
``sys.exit(2)``.
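The override can be sketched as below. This is a minimal reduction: the real CLI wires several flags and a fuller `main()`; only one illustrative flag appears here.

```python
import argparse
import sys


class _CliError(Exception):
    """Stand-in for the CLI's internal usage/input error type."""


class _NonExitingArgumentParser(argparse.ArgumentParser):
    # argparse's default error() prints usage + message to stderr, then
    # calls sys.exit(2). Raising instead keeps main()'s "returns an exit
    # code" contract. exit() is NOT overridden, so --help still
    # terminates via SystemExit(0).
    def error(self, message):
        self.print_usage(sys.stderr)
        raise _CliError(message)


EXIT_USAGE_ERROR = 2


def main(argv):
    parser = _NonExitingArgumentParser(prog="bqaa-revalidate-extractors")
    parser.add_argument("--events-jsonl", required=True)  # illustrative flag
    try:
        parser.parse_args(argv)
    except _CliError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return EXIT_USAGE_ERROR
    return 0
```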

Tests: 18 -> 20. Two new cases in ``TestCliUsageErrors``:

* test_missing_required_flag_returns_two_not_systemexit:
  ``main([])`` returns 2 (no raised SystemExit); stderr
  contains the standard "the following arguments are
  required" wording plus the usage line.
* test_unrecognized_flag_returns_two_not_systemexit: all
  required flags supplied plus a bogus ``--no-such-flag``
  returns 2 with "unrecognized arguments" in stderr.

Docs: exit-code table calls out that ``main(argv)`` returns
``EXIT_USAGE_ERROR`` rather than raising on bad flags;
``--help`` / ``--version`` still ``SystemExit(0)`` behavior is
documented as intentional.
…xit-code table

PR GoogleCloudPlatform#150 round-3 P3 cleanup. The CLI's main() docstring,
_NonExitingArgumentParser docstring, and the exit-code
table in the doc page all claimed ``--version`` exited via
SystemExit(0) — but the parser doesn't actually define a
``--version`` action. ``main(["--version"])`` falls through
to the missing-required-args path and returns 2.

Fix: remove the ``--version`` claims to match reality. The
CLI inherits ``--help`` from argparse for free; ``--version``
is deliberately deferred until the package has a stable
version-emission strategy (the existing pyproject.toml
``version`` field would need wiring through
``importlib.metadata.version("bigquery-agent-analytics")``,
which is scope creep beyond the reviewer's actual finding).

No code changes; no test changes. All 19 CLI tests + 1
entry-point skip still pass.
@haiyuan-eng-google haiyuan-eng-google merged commit c172bcb into GoogleCloudPlatform:main May 12, 2026
9 checks passed
haiyuan-eng-google pushed a commit that referenced this pull request May 12, 2026
…ors (#75 follow-up) (#151)

* feat: --events-bq-query-file event source for bqaa-revalidate-extractors (#75 follow-up)

The CLI now accepts a BigQuery event source in addition to
--events-jsonl; the two are mutually exclusive and exactly one
must be supplied. Promised follow-up to the local-only landing
of PR #150.

Contract: the SQL must produce exactly one column named
``event_json`` (STRING) per row, containing a JSON-encoded
event dict — same shape --events-jsonl consumes line-by-line.
The CLI does NOT auto-shape bigquery.Row objects; the query
writer controls projection (typically via
``TO_JSON_STRING(STRUCT(...))``). Explicit non-goal: row-shape
auto-inference would couple the CLI to every caller's table
schema; the single-column contract keeps the CLI predictable.

New flags:

* --events-bq-query-file PATH (mutex with --events-jsonl;
  argparse mutex group with required=True so "both" / "neither"
  both return exit 2 via the existing _CliError boundary).
* --bq-project PROJECT_ID — OPTIONAL. When absent the BigQuery
  client falls back to Application Default Credentials /
  environment for project inference. If both the flag and the
  inferred project are empty, the CLI exits 2 with
  "Set --bq-project explicitly" rather than confusing the
  operator with a downstream BigQuery API error.
* --bq-location LOC — defaults to "US"; ignored under
  --events-jsonl.

Error handling — every failure mode surfaces as a clean exit 2
with the file path and (for row-level errors) the 0-based row
index named, so an operator can find the offender with
``LIMIT N OFFSET row_index``:

* BigQuery raises (auth, syntax, table-not-found, permission)
  → "BigQuery query failed: <Type>: <message>"
* Row missing the event_json column
  → "row N: missing required column 'event_json'"
* event_json value isn't a STRING (struct projected without
  TO_JSON_STRING) → "row N: 'event_json' must be STRING"
* event_json STRING isn't valid JSON
  → "row N: invalid JSON in 'event_json'"
* event_json decodes to non-object (array, scalar)
  → "row N: 'event_json' decodes to <type>, expected a JSON
  object"
* Empty / whitespace-only .sql file → "is empty"
* Invalid UTF-8 in .sql file → "not valid UTF-8"

Implementation notes:

* Client construction is centralized behind
  ``_make_bq_client(project, location)`` so unit tests inject
  fakes via ``monkeypatch.setattr(cli_revalidate,
  "_make_bq_client", ...)`` rather than wiring through every
  call site. The real factory tries
  ``bigquery.Client(project=...)`` first; falls back to
  ``bigquery.Client()`` if project is None; raises _CliError if
  client.project ends up falsy.
* ``_load_events_from_bq`` reads the .sql file (with the same
  UTF-8 / OSError wrapping pattern as _load_jsonl /
  _load_thresholds), constructs the client via the factory,
  runs ``client.query(sql).result()`` inside a single
  try/except that catches Exception (not BaseException, so
  KeyboardInterrupt / SystemExit still propagate), then
  iterates rows with row-indexed validation.
* ``_load_config`` dispatches on whichever event-source flag
  is set; argparse's mutex group enforces exactly one.

Tests: 20 -> 31 in tests/test_extractor_compilation_cli_revalidate.py.

* TestCliUsageErrors gains two cases for the mutex (both /
  neither event sources).
* New TestCliEventsBQ class (9 cases): happy path, ADC
  project inference, no-project-anywhere, query exceptions,
  every row-shape failure mode (missing column, non-string,
  invalid JSON, non-dict decode), empty .sql file.

Live BQ test in tests/test_extractor_compilation_cli_revalidate_bq_live.py,
gated behind BQAA_RUN_LIVE_TESTS=1 +
BQAA_RUN_LIVE_BQ_REVALIDATE_TESTS=1 + PROJECT_ID + DATASET_ID.
Creates a temp table, inserts two event_json rows shaped like
BKA decisions, runs the CLI, asserts the report is written
with both events as compiled_unchanged + parity_matches,
deletes the table on the way out.

Docs: CLI doc page describes the contract, error mapping,
project resolution, and adds a BigQuery usage example.
docs/README.md index entry + CHANGELOG entry updated.

* fix(revalidate-cli): wrap BQ client construction + enforce exactly-one-column + fix stale example SQL

Addresses PR #151 round-1 reviewer findings.

P1 - BigQuery client construction now wrapped in its own
try/except. Previously ``_make_bq_client(...)`` ran OUTSIDE
the try that wraps query failures, so auth / ADC / invalid-
credentials / network failures escaped as raw tracebacks
despite the docs promising clean CLI errors. The construction
wrap distinguishes the failure mode from query-time failures
via a separate prefix:

* "BigQuery client construction failed: ..." for auth/ADC
  errors before ``client.query`` runs.
* "BigQuery query failed: ..." for syntax/permission/table-
  not-found at query time.

Our own ``_CliError`` from project-inference failure inside
the factory passes through unchanged (no re-wrapping).

P2 - "Exactly one column named event_json" contract is now
enforced. Previously a query returning ``event_json,
extra_col`` was silently accepted because the row loop only
reads ``row["event_json"]``. Now validated against the first
row's keys before iteration; mismatch fails with exit 2
listing the offending column set so an operator can fix the
SQL.

Check is via ``sorted(rows[0].keys())`` (works for both
``bigquery.Row`` and dict fakes). Empty result sets are a
no-op — nothing to validate, harness produces a zero-event
report.

P3 - Stale example SQL fixed. The doc previously used
``WHERE event_timestamp BETWEEN @start_time AND @end_time``,
but the CLI doesn't accept query parameters; copying the
example as-is would fail at BigQuery. Replaced with literal
``TIMESTAMP('2026-05-01')`` / ``TIMESTAMP('2026-05-12')``
and explicit doc text: "The SQL file must be fully
self-contained — the CLI does not accept query parameters,
so substitute concrete literals before invoking."

Tests: 31 -> 33 in
tests/test_extractor_compilation_cli_revalidate.py.

* test_bq_client_construction_failure_returns_two:
  monkey-patches _make_bq_client to raise RuntimeError;
  asserts exit 2 with "BigQuery client construction failed"
  prefix + the underlying message.
* test_bq_extra_column_rejected: row with extra_col key
  fails with "exactly one column ... got ['event_json',
  'extra_col']"; report is NOT written.

All 32 CLI tests + 1 entry-point skip pass; pyink + isort
clean on the touched files.

* fix(revalidate-cli): validate column contract via job.schema (catches zero-row wrong-schema case)

PR #151 round-2 reviewer finding.

P2 - The previous "exactly one column" check only ran when
rows is non-empty. A query like ``SELECT event_json,
extra_col FROM t WHERE FALSE`` returns zero rows but still
has the wrong schema; the CLI silently skipped validation
and wrote a successful zero-event report — a misleading
"all clear" signal when the SQL itself is broken.

Fix: derive column names via a new
``_query_result_column_names(job, rows)`` helper that
prefers ``job.schema`` (real ``bigquery.QueryJob`` populates
this regardless of row count) and falls back to
``sorted(rows[0].keys())`` only when the job lacks a usable
schema (test fakes that don't simulate it). Returns ``None``
on the degenerate "fake + zero rows + no schema" case,
which the caller treats as "skip the contract check" — that
preserves existing fake-based tests without weakening the
real-BigQuery path (production always has schema).

The fake test client now supports an optional ``schema=``
parameter (list of objects exposing ``.name``) so tests can
simulate both correct and incorrect schemas independently
of the row payload.

Tests: 33 -> 35.

* test_bq_extra_column_rejected_on_empty_result_set:
  zero rows + ``schema=[event_json, extra_col]`` → exit 2
  with the column-set listed; report NOT written. The
  bug reproducer.
* test_bq_correct_schema_empty_result_set_succeeds:
  zero rows + ``schema=[event_json]`` → exit 0 with a
  zero-event report. Locks the design choice that an
  empty-but-well-shaped result is a valid revalidation
  outcome, not an error.

Docs: failure-mode table notes the schema-based check;
contract section + test list updated.