Stage identical baseline copies and test renames for lineage tracking#122
Stage identical baseline copies and test renames for lineage tracking#122gigistark-google wants to merge 2 commits into
Conversation
caohy1988
left a comment
There was a problem hiding this comment.
Review of PR #122 head 3468e9c
This is the "stage identical baseline copies" PR in the three-PR stack (#121 → #122 → #123). The stated intent is to pre-stage four new files as byte-identical copies of existing source files so that git's rename detection can track the move when PR #123 deletes the originals and modifies the new files. The PR is mechanical by design.
Verified
V1 — Source copies are byte-identical to originals
diff <(grep -v "^#" evaluators.py) <(grep -v "^#" system_evaluator.py) → no diff
diff <(grep -v "^#" grader_pipeline.py) <(grep -v "^#" aggregate_grader.py) → no diff
diff <(grep -v "^#" multi_trial.py) <(grep -v "^#" multi_trial_performance_evaluator.py) → no diff
diff <(grep -v "^#" trace_evaluator.py) <(grep -v "^#" performance_evaluator.py) → no diff
V2 — Test files are renamed (not copied), git rename detection succeeds
git diff --stat shows:
tests/{test_grader_pipeline.py => test_aggregate_grader.py} | 10 +--
tests/{test_multi_trial.py => test_multi_trial_performance_evaluator.py} | 0
tests/{test_trace_evaluator.py => test_performance_evaluator.py} | 0
tests/{test_sdk_evaluators.py => test_system_evaluator.py} | 90 +++++++++++-----------
The => arrows confirm git tracked these as renames. The 90-line and 10-line edits to two of them are the rename refactor's textual changes (likely CodeEvaluator → SystemEvaluator and similar test-name updates) — that part is consistent with the stack's intent.
Concerns
C1 — Asymmetric copy strategy may cause class-identity issues if this PR lands alone
This PR is intended to be a transient state on the way to #123. In its current form, if it merged alone, the SDK would have two identical-but-distinct classes under different import paths:
bigquery_agent_analytics.evaluators.SystemEvaluatorbigquery_agent_analytics.system_evaluator.SystemEvaluator
These are NOT the same object — class identity (isinstance, is) would fail across the two paths. Anyone who does from bigquery_agent_analytics.evaluators import SystemEvaluator as E1 and from bigquery_agent_analytics.system_evaluator import SystemEvaluator as E2 gets E1 is E2 → False.
If the merge sequence is strictly 121 → 122 → 123 → main, this is fine — #123 promotes one as canonical and converts the other into a shim. But if #122 ever lands without #123 immediately after, downstream code that does isinstance checks across modules breaks silently.
Recommendation: either land #122 and #123 as a single merge (squash), or add a guard test in #122 that asserts identity-cross-path so the transient-broken state is caught before it can be merged in isolation.
(This is a process concern, not a code concern. The actual files are correctly staged.)
C2 — Stale base, 24 commits behind origin/main
Same base as PR #121 (292320b, 24 commits behind). When this lands, both PRs need to be rebased onto current main; the four copy files in this PR are based on the pre-rebase versions of their originals. After PR #121's rebase, the originals in main may have drifted, and the copies in PR #122 will need to be re-staged from the new originals. Otherwise the "byte-identical" invariant breaks.
C3 — evaluators.py and system_evaluator.py both export SystemEvaluator to __all__
After this PR, bigquery_agent_analytics.__init__.py does:
from .evaluators import SystemEvaluator # one copy…but if anyone imports from bigquery_agent_analytics.system_evaluator.SystemEvaluator, they get a different object. Until #123 lands, the two definitions diverge in identity even though their content is identical. See C1.
Verdict
Comment-only — the PR does exactly what it says (stage copies for lineage tracking), and the diffs verify the byte-identical claim. The concerns are about the merge-ordering contract with #123. Either:
- Confirm in the PR description that #122 and #123 land together as a single merge (or as an immediately-sequenced pair), or
- Add an identity-cross-path guard test to make the transient-broken state non-mergeable in isolation.
Plus the stale-base rebase concern (C2) applies to the whole stack.
|
Fresh follow-up after the existing COMMENT review: Medium — the staged modules are not actually exercised by the renamed tests. The new files are staged (
So the byte-identical copies are present on disk, but this PR does not prove those new import paths work. That is separate from the already-noted class-identity issue. If #122 is intended to be mergeable as an intermediate stage, add smoke/identity tests like: from bigquery_agent_analytics.system_evaluator import SystemEvaluator as NewSystemEvaluator
from bigquery_agent_analytics.evaluators import SystemEvaluator as OldSystemEvaluator
assert NewSystemEvaluator is OldSystemEvaluatoror make the tests import the staged modules directly. If #122 will never merge alone, this is another reason to squash #122 + #123 rather than land the intermediate state. |
Rename CodeEvaluator to SystemEvaluator to align with its focus on system-level metrics. A CodeEvaluator alias is kept in evaluators.py for backward-compatibility.
3468e9c to
e6d0c78
Compare
Creates copies of relevant files for child pull request for better tracking of changes.