Skip to content

Stage identical baseline copies and test renames for lineage tracking#122

Open
gigistark-google wants to merge 2 commits into
GoogleCloudPlatform:mainfrom
gigistark-google:pr2_stage
Open

Stage identical baseline copies and test renames for lineage tracking#122
gigistark-google wants to merge 2 commits into
GoogleCloudPlatform:mainfrom
gigistark-google:pr2_stage

Conversation

@gigistark-google
Copy link
Copy Markdown

Creates copies of relevant files for child pull request for better tracking of changes.

Copy link
Copy Markdown
Contributor

@caohy1988 caohy1988 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of PR #122 head 3468e9c

This is the "stage identical baseline copies" PR in the three-PR stack (#121#122#123). The stated intent is to pre-stage four new files as byte-identical copies of existing source files so that git's rename detection can track the move when PR #123 deletes the originals and modifies the new files. The PR is mechanical by design.

Verified

V1 — Source copies are byte-identical to originals

diff <(grep -v "^#" evaluators.py) <(grep -v "^#" system_evaluator.py)              → no diff
diff <(grep -v "^#" grader_pipeline.py) <(grep -v "^#" aggregate_grader.py)         → no diff
diff <(grep -v "^#" multi_trial.py) <(grep -v "^#" multi_trial_performance_evaluator.py) → no diff
diff <(grep -v "^#" trace_evaluator.py) <(grep -v "^#" performance_evaluator.py)    → no diff

V2 — Test files are renamed (not copied), git rename detection succeeds

git diff --stat shows:

tests/{test_grader_pipeline.py => test_aggregate_grader.py}                      | 10 +--
tests/{test_multi_trial.py => test_multi_trial_performance_evaluator.py}         |  0
tests/{test_trace_evaluator.py => test_performance_evaluator.py}                 |  0
tests/{test_sdk_evaluators.py => test_system_evaluator.py}                       | 90 +++++++++++-----------

The => arrows confirm git tracked these as renames. The 90-line and 10-line edits to two of them are the rename refactor's textual changes (likely CodeEvaluatorSystemEvaluator and similar test-name updates) — that part is consistent with the stack's intent.

Concerns

C1 — Asymmetric copy strategy may cause class-identity issues if this PR lands alone

This PR is intended to be a transient state on the way to #123. In its current form, if it merged alone, the SDK would have two identical-but-distinct classes under different import paths:

  • bigquery_agent_analytics.evaluators.SystemEvaluator
  • bigquery_agent_analytics.system_evaluator.SystemEvaluator

These are NOT the same object — class identity (isinstance, is) would fail across the two paths. Anyone who does from bigquery_agent_analytics.evaluators import SystemEvaluator as E1 and from bigquery_agent_analytics.system_evaluator import SystemEvaluator as E2 gets E1 is E2 → False.

If the merge sequence is strictly 121 → 122 → 123 → main, this is fine — #123 promotes one as canonical and converts the other into a shim. But if #122 ever lands without #123 immediately after, downstream code that does isinstance checks across modules breaks silently.

Recommendation: either land #122 and #123 as a single merge (squash), or add a guard test in #122 that asserts identity-cross-path so the transient-broken state is caught before it can be merged in isolation.

(This is a process concern, not a code concern. The actual files are correctly staged.)

C2 — Stale base, 24 commits behind origin/main

Same base as PR #121 (292320b, 24 commits behind). When this lands, both PRs need to be rebased onto current main; the four copy files in this PR are based on the pre-rebase versions of their originals. After PR #121's rebase, the originals in main may have drifted, and the copies in PR #122 will need to be re-staged from the new originals. Otherwise the "byte-identical" invariant breaks.

C3 — evaluators.py and system_evaluator.py both export SystemEvaluator to __all__

After this PR, bigquery_agent_analytics.__init__.py does:

from .evaluators import SystemEvaluator  # one copy

…but if anyone imports from bigquery_agent_analytics.system_evaluator.SystemEvaluator, they get a different object. Until #123 lands, the two definitions diverge in identity even though their content is identical. See C1.

Verdict

Comment-only — the PR does exactly what it says (stage copies for lineage tracking), and the diffs verify the byte-identical claim. The concerns are about the merge-ordering contract with #123. Either:

  • Confirm in the PR description that #122 and #123 land together as a single merge (or as an immediately-sequenced pair), or
  • Add an identity-cross-path guard test to make the transient-broken state non-mergeable in isolation.

Plus the stale-base rebase concern (C2) applies to the whole stack.

@caohy1988
Copy link
Copy Markdown
Contributor

Fresh follow-up after the existing COMMENT review:

Medium — the staged modules are not actually exercised by the renamed tests.

The new files are staged (system_evaluator.py, performance_evaluator.py, aggregate_grader.py, multi_trial_performance_evaluator.py), but the renamed tests still import the original modules:

  • tests/test_system_evaluator.py imports from bigquery_agent_analytics.evaluators, not bigquery_agent_analytics.system_evaluator.
  • tests/test_performance_evaluator.py imports from bigquery_agent_analytics.trace_evaluator, not bigquery_agent_analytics.performance_evaluator.
  • tests/test_aggregate_grader.py imports from bigquery_agent_analytics.grader_pipeline, not bigquery_agent_analytics.aggregate_grader.
  • tests/test_multi_trial_performance_evaluator.py imports from bigquery_agent_analytics.multi_trial, not bigquery_agent_analytics.multi_trial_performance_evaluator.

So the byte-identical copies are present on disk, but this PR does not prove those new import paths work. That is separate from the already-noted class-identity issue. If #122 is intended to be mergeable as an intermediate stage, add smoke/identity tests like:

from bigquery_agent_analytics.system_evaluator import SystemEvaluator as NewSystemEvaluator
from bigquery_agent_analytics.evaluators import SystemEvaluator as OldSystemEvaluator

assert NewSystemEvaluator is OldSystemEvaluator

or make the tests import the staged modules directly. If #122 will never merge alone, this is another reason to squash #122 + #123 rather than land the intermediate state.

Rename CodeEvaluator to SystemEvaluator to align with its focus on system-level metrics. A CodeEvaluator alias is kept in evaluators.py for backward-compatibility.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants