Honor caller-provided rowType in HiveViewExpander.expandView#607
Honor caller-provided rowType in HiveViewExpander.expandView#607simbadzina wants to merge 1 commit into
Conversation
6d5714a to
55e0f7a
Compare
HiveViewExpander.expandView() previously discarded the rowType argument and returned whatever the re-parsed view body produced. When the caller-recorded row type and the expanded RelNode's row type differed in field order, downstream consumers that rely on positional access could land on the wrong RelNode-output position. Prefix-named sibling columns (one name a prefix of another) are particularly prone to swapping under such mismatches. This change wraps the expanded RelRoot in a LogicalProject that re-aligns fields to the caller's rowType by name (case-insensitive). The wrapper is a no-op when the orderings already match. If a caller-expected field is missing from the expanded body, or if arities differ, the method falls back to returning the original RelRoot unchanged. Adds: - Unit tests for alignToRowType covering aligned, case-only-difference, reorder, prefix-pair, missing-field, and arity-mismatch cases. - An integration test that drives the full expandView path with a deliberately reversed caller-provided rowType.
55e0f7a to
26a65b6
Compare
There was a problem hiding this comment.
Pull request overview
This PR fixes Hive view expansion so expanded view outputs can be reordered to match the caller-provided row type, preventing positional column mismatches in downstream view-on-view scenarios.
Changes:
- Adds
HiveViewExpander.alignToRowTypeto reorder expanded view fields by case-insensitive name when needed. - Adds unit coverage for no-op, reorder, prefix-sibling, missing-field, and arity fallback cases.
- Adds an integration fixture and test for the full
expandViewpath.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
coral-hive/src/main/java/com/linkedin/coral/hive/hive2rel/HiveViewExpander.java |
Adds row-type alignment after view query conversion. |
coral-hive/src/test/java/com/linkedin/coral/hive/hive2rel/HiveViewExpanderTest.java |
Adds unit tests for alignment behavior. |
coral-hive/src/test/java/com/linkedin/coral/hive/hive2rel/HiveToRelConverterTest.java |
Adds integration regression test for caller row-type ordering. |
coral-hive/src/test/java/com/linkedin/coral/hive/hive2rel/TestUtils.java |
Adds Hive test table/view fixtures used by the integration test. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /** | ||
| * Wrap {@code root} in a {@link LogicalProject} that re-orders its output | ||
| * fields to match {@code expected} by name (case-insensitive). No-op when the | ||
| * orderings already agree. If a name is missing from {@code root} or arities | ||
| * differ, returns {@code root} unchanged and logs a warning. | ||
| */ |
There was a problem hiding this comment.
Is not the right behavior to throw an exception? There should not be a disagreement between the caller expectation and the view structure in the first place.
There was a problem hiding this comment.
Do you mean for when there is a mismatch like an missing field, or even when there is an ordering issue?
For the ordering, I think we can self-heal drift with the alignment code since there is a complete info.
I'm assuming a change like a new column could show up a mismatch, missing field.
Automatically taking the root would expose the column to consumers, whereas an error would break the view. I think for select * we'd want the new column to seamless appear.
Summary
When a downstream view does
SELECT * FROM upstreamandupstream's HMS-recorded row type disagrees with what its re-parsed body produces in field order, the downstream's positional access can land on the wrong field. The case that surfaces this most painfully is prefix-name sibling pairs:id(INTEGER)idHash(VARCHAR)idHash(VARCHAR)id(INTEGER)A case-insensitive lookup of
idagainst the body's row type can matchidHash— the resolver gets confused when a row type contains both a short name and a longer name that has it as a prefix. Every consumer of the view then silently reads values from the wrong source column.The root cause is that
HiveViewExpander.expandView()was discarding itsrowTypeargument and returning whatever the re-parsed view body produced, violating theViewExpandercontract that the returnedRelRoot's row type must match the caller-providedrowType.Fix
After
convertQuery, wrap the returnedRelRootin aLogicalProjectthat re-aligns fields to the caller-providedrowTypeby name (case-insensitive):flowchart LR C["Caller's rowType<br/>[id INTEGER, idHash VARCHAR]"] --> E B["Re-parsed body's rowType<br/>[idHash VARCHAR, id INTEGER]"] --> E[alignToRowType] E --> Q{Arity match<br/>and field names<br/>aligned by order?} Q -->|Yes| K[Return original RelRoot] Q -->|All names present<br/>but different order| W["Wrap in LogicalProject<br/>that re-binds by name<br/>(case-insensitive)"] Q -->|Arity differs<br/>or a name is missing| L[Log warn, return original]The safe-fallback paths preserve pre-fix behavior for callers whose expanded body genuinely produces a different shape, while making such cases visible in logs.
Test plan
alignToRowTypeinHiveViewExpanderTestcovering: already-aligned (no-op fast path), case-only differences (no-op), reordered fields (wraps inProject), single prefix-sibling pair (id/idHash), multiple prefix-sibling pairs in one row type (id/idHash,email/emailVerified,country/countryCodesimultaneously — mirrors the realistic failure shape at production scale), missing field (safe fallback), and different arity (safe fallback).testExpandViewAlignsToCallerRowTypeinHiveToRelConverterTestthat drives the fullexpandViewpath with a deliberately reversed caller-provided rowType.alignToRowTypetoreturn root;causes exactly the tests targeting the fix to fail (testReorderedFieldsWrapInProject,testPrefixSiblingsReorderedCorrectly,testMultiplePrefixSiblingsAllReorderedCorrectly,testExpandViewAlignsToCallerRowType); the safe-fallback and no-op tests still pass. Restoring the fix turns the suite green.:coral-hive:testand:coral-trino:testsuites pass with no regressions.Manual verification
Validated end-to-end against a real production view-on-view chain (Avro-backed base table → intermediate Hive view → downstream view) using internal tooling that drives the full Coral translation path. Confirmed:
Also characterized the failure surface by varying the upstream view shape: the bug fires specifically when the upstream lists the longer prefix-extension name before its shorter sibling (e.g.
idHashbeforeid).SELECT *upstreams and shorter-name-first explicit projections do not trigger the bug.🤖 Generated with Claude Code