Stage identical baseline copies and test renames for lineage tracking by gigistark-google · Pull Request #122 · GoogleCloudPlatform/BigQuery-Agent-Analytics-SDK

gigistark-google · 2026-05-06T18:54:13Z

Creates copies of relevant files for child pull request for better tracking of changes.

caohy1988

Review of PR #122 head `3468e9c`

This is the "stage identical baseline copies" PR in the three-PR stack (#121 → #122 → #123). The stated intent is to pre-stage four new files as byte-identical copies of existing source files so that git's rename detection can track the move when PR #123 deletes the originals and modifies the new files. The PR is mechanical by design.

Verified

V1 — Source copies are byte-identical to originals

diff <(grep -v "^#" evaluators.py) <(grep -v "^#" system_evaluator.py)              → no diff
diff <(grep -v "^#" grader_pipeline.py) <(grep -v "^#" aggregate_grader.py)         → no diff
diff <(grep -v "^#" multi_trial.py) <(grep -v "^#" multi_trial_performance_evaluator.py) → no diff
diff <(grep -v "^#" trace_evaluator.py) <(grep -v "^#" performance_evaluator.py)    → no diff

V2 — Test files are renamed (not copied), git rename detection succeeds

git diff --stat shows:

tests/{test_grader_pipeline.py => test_aggregate_grader.py}                      | 10 +--
tests/{test_multi_trial.py => test_multi_trial_performance_evaluator.py}         |  0
tests/{test_trace_evaluator.py => test_performance_evaluator.py}                 |  0
tests/{test_sdk_evaluators.py => test_system_evaluator.py}                       | 90 +++++++++++-----------

The => arrows confirm git tracked these as renames. The 90-line and 10-line edits to two of them are the rename refactor's textual changes (likely CodeEvaluator → SystemEvaluator and similar test-name updates) — that part is consistent with the stack's intent.

Concerns

C1 — Asymmetric copy strategy may cause class-identity issues if this PR lands alone

This PR is intended to be a transient state on the way to #123. In its current form, if it merged alone, the SDK would have two identical-but-distinct classes under different import paths:

bigquery_agent_analytics.evaluators.SystemEvaluator
bigquery_agent_analytics.system_evaluator.SystemEvaluator

These are NOT the same object — class identity (isinstance, is) would fail across the two paths. Anyone who does from bigquery_agent_analytics.evaluators import SystemEvaluator as E1 and from bigquery_agent_analytics.system_evaluator import SystemEvaluator as E2 gets E1 is E2 → False.

If the merge sequence is strictly 121 → 122 → 123 → main, this is fine — #123 promotes one as canonical and converts the other into a shim. But if #122 ever lands without #123 immediately after, downstream code that does isinstance checks across modules breaks silently.

Recommendation: either land #122 and #123 as a single merge (squash), or add a guard test in #122 that asserts identity-cross-path so the transient-broken state is caught before it can be merged in isolation.

(This is a process concern, not a code concern. The actual files are correctly staged.)

C2 — Stale base, 24 commits behind `origin/main`

Same base as PR #121 (292320b, 24 commits behind). When this lands, both PRs need to be rebased onto current main; the four copy files in this PR are based on the pre-rebase versions of their originals. After PR #121's rebase, the originals in main may have drifted, and the copies in PR #122 will need to be re-staged from the new originals. Otherwise the "byte-identical" invariant breaks.

C3 — `evaluators.py` and `system_evaluator.py` both export `SystemEvaluator` to `all`

After this PR, bigquery_agent_analytics.__init__.py does:

from .evaluators import SystemEvaluator  # one copy

…but if anyone imports from bigquery_agent_analytics.system_evaluator.SystemEvaluator, they get a different object. Until #123 lands, the two definitions diverge in identity even though their content is identical. See C1.

Verdict

Comment-only — the PR does exactly what it says (stage copies for lineage tracking), and the diffs verify the byte-identical claim. The concerns are about the merge-ordering contract with #123. Either:

Confirm in the PR description that #122 and #123 land together as a single merge (or as an immediately-sequenced pair), or
Add an identity-cross-path guard test to make the transient-broken state non-mergeable in isolation.

Plus the stale-base rebase concern (C2) applies to the whole stack.

caohy1988 · 2026-05-11T22:36:56Z

Fresh follow-up after the existing COMMENT review:

Medium — the staged modules are not actually exercised by the renamed tests.

The new files are staged (system_evaluator.py, performance_evaluator.py, aggregate_grader.py, multi_trial_performance_evaluator.py), but the renamed tests still import the original modules:

tests/test_system_evaluator.py imports from bigquery_agent_analytics.evaluators, not bigquery_agent_analytics.system_evaluator.
tests/test_performance_evaluator.py imports from bigquery_agent_analytics.trace_evaluator, not bigquery_agent_analytics.performance_evaluator.
tests/test_aggregate_grader.py imports from bigquery_agent_analytics.grader_pipeline, not bigquery_agent_analytics.aggregate_grader.
tests/test_multi_trial_performance_evaluator.py imports from bigquery_agent_analytics.multi_trial, not bigquery_agent_analytics.multi_trial_performance_evaluator.

So the byte-identical copies are present on disk, but this PR does not prove those new import paths work. That is separate from the already-noted class-identity issue. If #122 is intended to be mergeable as an intermediate stage, add smoke/identity tests like:

from bigquery_agent_analytics.system_evaluator import SystemEvaluator as NewSystemEvaluator
from bigquery_agent_analytics.evaluators import SystemEvaluator as OldSystemEvaluator

assert NewSystemEvaluator is OldSystemEvaluator

or make the tests import the staged modules directly. If #122 will never merge alone, this is another reason to squash #122 + #123 rather than land the intermediate state.

Rename CodeEvaluator to SystemEvaluator to align with its focus on system-level metrics. A CodeEvaluator alias is kept in evaluators.py for backward-compatibility.

gigistark-google mentioned this pull request May 6, 2026

Refactor: Rename CodeEvaluator to SystemEvaluator #110

Closed

caohy1988 mentioned this pull request May 11, 2026

Refactor: Rename CodeEvaluator to SystemEvaluator #121

Open

caohy1988 reviewed May 11, 2026

View reviewed changes

caohy1988 mentioned this pull request May 11, 2026

Unify One-Sided & Side-by-Side Performance Metrics in the PerformanceEvaluator. #123

Open

gigistark-google added 2 commits May 12, 2026 19:58

Refactor: Rename CodeEvaluator to SystemEvaluator

8e3569d

Rename CodeEvaluator to SystemEvaluator to align with its focus on system-level metrics. A CodeEvaluator alias is kept in evaluators.py for backward-compatibility.

Stage identical baseline copies and test renames for lineage tracking

e6d0c78

gigistark-google force-pushed the pr2_stage branch from 3468e9c to e6d0c78 Compare May 12, 2026 20:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stage identical baseline copies and test renames for lineage tracking#122

Stage identical baseline copies and test renames for lineage tracking#122
gigistark-google wants to merge 2 commits into
GoogleCloudPlatform:mainfrom
gigistark-google:pr2_stage

gigistark-google commented May 6, 2026

Uh oh!

caohy1988 left a comment

Uh oh!

caohy1988 commented May 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

gigistark-google commented May 6, 2026

Uh oh!

caohy1988 left a comment

Choose a reason for hiding this comment

Review of PR #122 head 3468e9c

Verified

V1 — Source copies are byte-identical to originals

V2 — Test files are renamed (not copied), git rename detection succeeds

Concerns

C1 — Asymmetric copy strategy may cause class-identity issues if this PR lands alone

C2 — Stale base, 24 commits behind origin/main

C3 — evaluators.py and system_evaluator.py both export SystemEvaluator to __all__

Verdict

Uh oh!

caohy1988 commented May 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Review of PR #122 head `3468e9c`

C2 — Stale base, 24 commits behind `origin/main`

C3 — `evaluators.py` and `system_evaluator.py` both export `SystemEvaluator` to `all`