Retrofit comparator suites to capture-not-abort; default mode shadow (PR 2/7)#1525
Merged
Merged
Conversation
…(PR 2/7)
Builds on the error-comparison engine (PR 1). Two changes:
1. Default ERROR_COMPARISON_MODE flipped from off to shadow. Shadow runs
deep error comparison and records DIFF rows but never fails CI (only a
re-thrown Throwable does), so the weekly run stays green while error
diffs become visible. An unrecognized/unset flag now resolves to shadow.
2. Existing suites capture errors instead of letting them propagate and
abort the case:
- StatementSelect / StatementOther / PreparedStatementTypes: each side's
executeQuery/execute is captured via Captures.resultOrThrowable and
handed to ResultSetComparator.compare, which dispatches on type (both
ResultSets -> cell diff; a Throwable involved -> deep error compare).
Previously a one-sided throw produced an ERROR row and aborted.
- StatementDml / VolumeOperations: replaced ad-hoc instanceof-Exception
checks (class + sometimes message only) with a gate-aware ErrorDiffs
bridge, gaining SQLState/code/message comparison when the gate is on.
New ErrorDiffs helper honors the ERROR_COMPARISON_MODE gate so `off` is a
true kill switch for DML/Volume too: deep comparison when enabled, minimal
legacy class-only check when off — consistent with the ResultSetComparator
path. Added Captures.resultOrThrowable / closeIfResultSet helpers.
Exception-class comparison remains intentionally strict (exact match); the
subclass-tolerance-vs-baseline decision is deferred to PR 7 (from shadow
data). Message normalization also deferred to PR 7.
ErrorComparatorTest: 17 tests (added ErrorDiffs gate coverage). Reviewed
with Isaac Review across two rounds.
NO_CHANGELOG=true (test-only comparator tooling).
Co-authored-by: Isaac
Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
sreekanth-db
added a commit
that referenced
this pull request
Jul 3, 2026
## What PR 3 of the error / negative-case comparison series (builds on #1524, #1525). Adds the first batch of `NEGATIVE_*` suites — error-provoking inputs the positive suites never run — comparing each endpoint's **error behavior** (exception class, SQLState, vendor code, message) via the engine from PR 1/2, under the `shadow` default. All five are **read-only** and safe on the shared connections. | Suite | Cases | |---|---| | `NEGATIVE_STATEMENT_SELECT` | missing table/column, `SELET` syntax error, `1/0`, `CAST('x' AS INT)`, `executeQuery` on an INSERT (wrong method) | | `NEGATIVE_STATEMENT_OTHER` | SHOW/DESCRIBE/EXPLAIN/SET on bad targets, `getMoreResults()` after consumption, `getUpdateCount()` before `execute()` | | `NEGATIVE_PARAM_BINDING` | out-of-range `setInt` index, too-few params, type mismatch at execute, `setBigDecimal` over precision/scale | | `NEGATIVE_PREPARED_METADATA` | `clearParameters` then unbound execute; `getMetaData()` before execute on invalid SQL / missing table / missing column | | `NEGATIVE_TYPE_CONVERSION` | `getInt`/`getByte` overflow, non-numeric `getInt`, `getObject(col, Integer.class)` on VARCHAR, `getInt` on ARRAY, `getBigDecimal` on STRUCT | Each provider captures result-or-throwable per side and folds through the existing `ResultSetComparator` / `Captures.compareCall` path. Registered in the `TestSuite` enum (auto-included under `DEFAULT_PARAMS` via `allExcept`). README suite table updated. ## Note on `Endpoint.java` This PR also **Spotless-formats `config/Endpoint.java`**, which was committed to `comparator-v2` in a non-Spotless-clean state and reformats on every build (a phantom diff for anyone running `mvn spotless:apply`). Folding the fix in here stops it recurring. **No behavior change** in that file — formatting only. ## Live validation Ran against the peco serverless warehouse (Thrift-vs-SEA, `ERROR_COMPARISON_MODE=shadow`) via a narrowed local runner: **27/27 cases executed, no harness errors.** Observed divergences are **message-wording only** (same class / SQLState / server error code; Thrift's Hive-wrapper prose vs SEA's direct text) — exactly what message normalization in PR 7 will address — plus one genuine one-sided finding: `getMoreResults()` returns `Boolean` on one side and throws a `STATEMENT_CLOSED` exception on the other. ## Testing Live shadow run (above); clean compile; Isaac Review (`/review --uncommitted`) — 0 findings (all raised comments filtered as not-real, confirming the read-only / shared-connection design). NO_CHANGELOG=true (test-only comparator tooling, no driver behavior change). This pull request and its description were written by Isaac. Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
sreekanth-db
added a commit
that referenced
this pull request
Jul 3, 2026
…1527) ## What PR 4 of the error / negative-case comparison series (builds on #1524, #1525, #1526). Adds the error-provoking DDL, DML, and batch suites — each compares the two endpoints' **error behavior** (class, SQLState, vendor code, message) via the gate-aware `ErrorDiffs` bridge, under the `shadow` default. | Suite | Cases | |---|---| | `NEGATIVE_STATEMENT_DDL` | DROP/CREATE/ALTER on missing or duplicate objects (± `IF EXISTS`), missing schema/catalog, malformed syntax | | `NEGATIVE_STATEMENT_DML` | type-mismatch INSERT, NOT NULL violation, DML on a missing table, `executeUpdate` on a SELECT (wrong method), write overflow | | `NEGATIVE_STATEMENT_BATCH` | `executeBatch` partial/full failure and a ResultSet-producing command; when both sides throw `BatchUpdateException`, also compares per-element update counts | ## Isolation These suites mutate catalog objects, so each runs in **its own namespace** under `comparator_ddl_tests` (seeded fresh, dropped in a `finally`), keeping them safe on the shared connections. ## Included fix: NPE in the shared `ErrorDiffs.compare` The neither-threw case is now guarded **centrally** (returns empty, mirroring the deep path's `NOT_APPLICABLE`) before reaching `legacyClassOnly`, which dereferences the null `error()` of a RETURNED outcome. The new suites call `ErrorDiffs.compare` unconditionally, so an `off`-mode both-succeed case (e.g. `DROP TABLE IF EXISTS`) would otherwise NPE and abort. This touches the shared helper introduced in #1525; added a regression test. (Prior callers were unaffected because they guarded with `if threw`.) ## Live validation Ran against the peco serverless warehouse (Thrift-vs-SEA, `ERROR_COMPARISON_MODE=shadow`) **twice** — 21/21 cases, identical results both runs. Critically, positive `STATEMENT_DDL` / `STATEMENT_DML` run in the **same JVM after** the destructive negatives and still **PASS** — confirming no shared-connection poisoning. Observed divergences are the expected Thrift-Hive-wrapper-vs-SEA-direct-text kind (same server error codes: `CAST_INVALID_INPUT`, `SCHEMA_NOT_FOUND`, `TABLE_OR_VIEW_NOT_FOUND`, etc.), which message normalization in PR 7 will address. ## Testing `ErrorComparatorTest` — 18/18 (added the neither-threw guard regression test); two consistent live shadow runs; clean compile; Isaac Review (`/review --uncommitted`) — 0 findings after the NPE fix. NO_CHANGELOG=true (test-only comparator tooling, no driver behavior change). This pull request and its description were written by Isaac. Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
PR 2 of the error / negative-case comparison series (builds on #1524). Two changes:
1. Default
ERROR_COMPARISON_MODEflippedoff→shadow. Shadow runs deep error comparison and records DIFF rows but never fails CI (only a re-thrown Throwable does), so the weekly run stays green while error-shaped diffs become visible. An unset or unrecognized flag now resolves toshadow.2. Existing suites now capture errors instead of propagating (which aborted the case):
StatementSelect/StatementOther/PreparedStatementTypes— each side'sexecuteQuery/executeis captured viaCaptures.resultOrThrowableand handed toResultSetComparator.compare, which dispatches on runtime type (both ResultSets → cell diff; a Throwable involved → deep error comparison). Previously a one-sided throw produced anERRORrow and aborted the case.StatementDml/VolumeOperations— replaced ad-hocinstanceof Exceptionchecks (class + sometimes message only) with the gate-awareErrorDiffsbridge, gaining SQLState / vendor-code / message comparison when the gate is on.New
ErrorDiffshelper honors theERROR_COMPARISON_MODEgate sooffis a true kill switch for DML/Volume too: deep comparison when enabled, minimal legacy class-only check when off — consistent with theResultSetComparatorpath. AddedCaptures.resultOrThrowable/closeIfResultSet.Deliberate scope decisions
Behavior note
With the shadow default, error-provoking inputs in the retrofitted positive suites will now surface as DIFF rows in the CSV (visibility without failing CI) — the intended shadow behavior. The weekly report will show more error-shaped diffs than before.
Testing
ErrorComparatorTest— 17/17 passing, including newErrorDiffsgate coverage (off → legacy class-only, no SQLState/message diff; shadow → deep comparison; off still flags a differing class). Reviewed with Isaac Review (/review --uncommitted) across two rounds.NO_CHANGELOG=true (test-only comparator tooling, no driver behavior change).
This pull request and its description were written by Isaac.
This PR is assisted by Isaac Autopilot: View the full session spec, test plan, and artifacts.