HIVE-29413: Generalise column related APIs in Table.java by ramitg254 · Pull Request #6413 · apache/hive

ramitg254 · 2026-04-07T12:51:34Z

What changes were proposed in this pull request?

Added getEffectivePartCols() and used it wherever possible to avoid code duplication.

Why are the changes needed?

getPartCols() does not support Iceberg tables.

Does this PR introduce any user-facing change?

No

How was this patch tested?

CI tests and a local build.

deniskuzZ · 2026-04-09T20:12:19Z

@ramitg254 please take a look at 9e7535c. I would suggest following a similar approach.

ramitg254 · 2026-04-10T05:36:07Z

9e7535c

But there we are creating a separate method, getEffectivePartCols(), and leaving getPartCols() as it is. As per our discussion on that closed PR, we shouldn't do that and should only go ahead with updating getPartCols().

deniskuzZ · 2026-04-10T06:50:49Z

> 9e7535c
>
> But there we are creating a separate method, getEffectivePartCols(), and leaving getPartCols() as it is. As per our discussion on that closed PR, we shouldn't do that and should only go ahead with updating getPartCols().

Where did I say that? The ask was to keep the original method unchanged. Same here.

ramitg254 · 2026-04-10T07:08:55Z

Oh, I got confused by this comment: #6337 (comment), in which getSupportedPartCols() was just a separate method similar to getEffectivePartCols().

ramitg254 · 2026-04-10T07:18:52Z

I am fine with that earlier approach as well, but I recently saw https://issues.apache.org/jira/browse/HIVE-29525, so I thought that, as a first step towards solving it, we should have unified getPartCols() and getCols() methods that give results similar to those for native Hive tables. The plan logic can be taken care of later, when that ticket is addressed.
So I was first focusing on making getPartCols() unified for Iceberg tables as well.

Please share your thoughts on this idea.
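
For readers following the thread, the two options under discussion can be sketched roughly as follows. This is an illustration only: `TableSketch` and `PartitionAware` are hypothetical stand-ins, not Hive's `Table` or storage-handler classes, and the sketch assumes that the metastore-level partition column list is empty for non-native tables.

```java
import java.util.List;

final class PartColsSketch {

  // Hypothetical stand-in for a storage handler that knows its own partition columns.
  interface PartitionAware {
    List<String> partitionColumns();
  }

  static final class TableSketch {
    private final List<String> metastorePartCols;
    private final PartitionAware handler; // null for a native table (assumption)

    TableSketch(List<String> metastorePartCols, PartitionAware handler) {
      this.metastorePartCols = metastorePartCols;
      this.handler = handler;
    }

    // Option A: the original accessor stays unchanged.
    List<String> getPartCols() {
      return metastorePartCols;
    }

    // Option A adds a separate accessor that consults the handler when one exists.
    // Option B (unifying getPartCols()) would move this logic into getPartCols() itself.
    List<String> getEffectivePartCols() {
      return handler != null ? handler.partitionColumns() : getPartCols();
    }
  }

  public static void main(String[] args) {
    TableSketch nativeTable = new TableSketch(List.of("dt"), null);
    TableSketch nonNative = new TableSketch(List.of(), () -> List.of("region"));
    System.out.println(nativeTable.getEffectivePartCols()); // [dt]
    System.out.println(nonNative.getPartCols());            // []
    System.out.println(nonNative.getEffectivePartCols());   // [region]
  }
}
```

With a separate accessor, existing callers of getPartCols() keep their behavior and each call site opts in explicitly. A unified getPartCols() changes every caller at once, which is why the choice between the two has to be settled before the call sites are touched.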

deniskuzZ · 2026-06-18T14:40:42Z

    return Boolean.parseBoolean(properties.getProperty(hive_metastoreConstants.TABLE_IS_CTAS));
  }

+  public static boolean isTableTypeSet(Map<String, String> params) {


Keep this private.

It is already used here as well: https://github.com/apache/hive/pull/6413/changes#diff-93864ecf035fe51b92185015da842a56837cea89064813de39c278c6f8fed03cR1993
so I kept it public.

The comment and the method name (i.e. isTableTypeSet) are not in sync:

// If source is Iceberg table set the schema and the partition spec

Done, updated the comment:

// parameter table_type is set to "ICEBERG" in case of Iceberg tables
// set the schema and the partition spec accordingly

deniskuzZ

LGTM, pending tests

Copilot

Pull request overview

This PR generalizes Hive’s column/partition-column handling (centered around ql.metadata.Table) to better support non-native tables (notably Iceberg), and updates several analyzers/rewriters to use the generalized APIs (getAllCols(), storage-handler partition keys, etc.) to reduce duplication and improve correctness.

Changes:

- Generalize column/partition-column APIs and update call sites to use Table.getAllCols()/getPartCols() appropriately for non-native partitioning.
- Refactor parts of query planning/rewriting (e.g., table scan schema construction, UPDATE/MERGE rewrites) to align column ordering using column-index lookups.
- Relax/adjust some partition-column behaviors for non-native tables (e.g., allowing updates to Iceberg partition columns where they are regular columns).

Reviewed changes

Copilot reviewed 102 out of 103 changed files in this pull request and generated 13 comments.


| File | Description |
| --- | --- |
| storage-api/src/java/org/apache/hadoop/hive/common/io/CacheTag.java | Broadens partition-spec input type; needs deterministic ordering when encoding cache tags. |
| ql/src/java/org/apache/hadoop/hive/ql/session/LineageState.java | Adds an overload to set lineage based on effective columns for non-native partitioning. |
| ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java | Generalizes partition spec type from `LinkedHashMap` to `Map`. |
| ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java | Refactors table scan row schema construction to align SerDe fields/partition cols by table column index. |
| ql/src/java/org/apache/hadoop/hive/ql/parse/RewriteSemanticAnalyzer.java | Adjusts UPDATE validation to treat non-native partition columns as updatable regular columns. |
| ql/src/java/org/apache/hadoop/hive/ql/parse/rewrite/sql/MultiInsertSqlGenerator.java | Skips emitting native partition columns for non-native partitioning targets. |
| ql/src/java/org/apache/hadoop/hive/ql/parse/rewrite/SplitUpdateRewriter.java | Reworks UPDATE rewrite value alignment using table column indices and non-native partition semantics. |
| ql/src/java/org/apache/hadoop/hive/ql/parse/rewrite/SplitMergeRewriter.java | Adjusts MERGE rewrite value list sizing to use `getAllCols()`. |
| ql/src/java/org/apache/hadoop/hive/ql/parse/rewrite/MergeRewriter.java | Reworks MERGE value construction to align by column index and handle non-native partition columns. |
| ql/src/java/org/apache/hadoop/hive/ql/parse/MergeSemanticAnalyzer.java | Updates MERGE analyzer to use `getAllCols()` sizing (currently with a compilation issue). |
| ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java | Introduces generalized column/partition APIs and name→index lookup helpers used by updated call sites. |

Comments suppressed due to low confidence (1)

storage-api/src/java/org/apache/hadoop/hive/common/io/CacheTag.java:107

build(fullTableName, partDescMap) now accepts a generic Map, but the cache tag depends on the iteration order of partDescMap.entrySet(). If callers pass an unordered map (e.g. HashMap/Map.of), the same partition spec can produce different CacheTags, leading to cache misses / duplicate cache entries. Consider canonicalizing the partition spec order (e.g. sort by key) before encoding.
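
The canonicalization suggested above could look roughly like the following. This is a minimal sketch, not the actual CacheTag code: the class name, the `encode` method, and the `/key=value` output format are all hypothetical and chosen only to show the ordering issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: not the actual CacheTag implementation. */
final class PartitionSpecEncodingSketch {

  // Hypothetical encoder: appends "key=value" pairs in key order, so the result
  // does not depend on the iteration order of the caller's map.
  static String encode(String fullTableName, Map<String, String> partDescMap) {
    StringBuilder sb = new StringBuilder(fullTableName);
    // Copying into a TreeMap yields iteration in natural (lexicographic) key order.
    for (Map.Entry<String, String> e : new TreeMap<>(partDescMap).entrySet()) {
      sb.append('/').append(e.getKey()).append('=').append(e.getValue());
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    Map<String, String> a = new LinkedHashMap<>();
    a.put("year", "2026");
    a.put("month", "04");
    Map<String, String> b = new LinkedHashMap<>();
    b.put("month", "04");
    b.put("year", "2026");
    // Same spec, different insertion order, identical encoding.
    System.out.println(encode("db.tbl", a).equals(encode("db.tbl", b))); // true
    System.out.println(encode("db.tbl", a)); // db.tbl/month=04/year=2026
  }
}
```

Sorting by key is only appropriate if the order of partition levels carries no meaning for the tag. If that order is significant, the safer alternative is to keep requiring an ordered map type at the API boundary.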


ramitg254 · 2026-06-18T19:07:27Z

+  public Integer getColumnIndexByName(String colName) {
+    ensureColumnsIndexed();
+    TableColumn column = columnsByName.get(colName.toLowerCase());
+    return column != null ? column.index() : null;
+  }


I don't mind adding an exception such as:

if (column == null) { throw new SemanticException(ErrorMsg.INVALID_COLUMN.getMsg(" '" + colName + "'")); }

If we do this for getColumnIndexByName, we should do the same for getFieldSchemaByName.

However, adding it would cascade into many places: getPartCols() also uses getFieldSchemaByName, so if it throws, the exception has to be propagated to every location where getPartCols() is used.

I think we can avoid it for now, since we currently don't use it anywhere the result can be null.
@deniskuzZ WDYT?
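
The trade-off described above can be illustrated with a small standalone sketch. It is not the PR's code: `ColumnIndexSketch`, `requireColumnIndexByName`, and `UnknownColumnException` are hypothetical names, and the `TableColumn` record only models the name and index that the diff above appears to rely on.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class ColumnIndexSketch {

  // Hypothetical stand-in for the PR's TableColumn; only name and index are modeled.
  record TableColumn(String name, int index) {}

  // Hypothetical checked exception, standing in for a SemanticException-style error.
  static final class UnknownColumnException extends Exception {
    UnknownColumnException(String colName) {
      super("Invalid column '" + colName + "'");
    }
  }

  private final Map<String, TableColumn> columnsByName = new HashMap<>();

  ColumnIndexSketch(List<String> columnNames) {
    for (int i = 0; i < columnNames.size(); i++) {
      String name = columnNames.get(i);
      columnsByName.put(name.toLowerCase(Locale.ROOT), new TableColumn(name, i));
    }
  }

  // Lenient variant, mirroring the diff: returns null for an unknown column.
  Integer getColumnIndexByName(String colName) {
    TableColumn column = columnsByName.get(colName.toLowerCase(Locale.ROOT));
    return column != null ? column.index() : null;
  }

  // Strict variant: every caller must now handle or declare the checked exception.
  int requireColumnIndexByName(String colName) throws UnknownColumnException {
    Integer index = getColumnIndexByName(colName);
    if (index == null) {
      throw new UnknownColumnException(colName);
    }
    return index;
  }

  public static void main(String[] args) throws UnknownColumnException {
    ColumnIndexSketch table = new ColumnIndexSketch(List.of("id", "Name", "ts"));
    System.out.println(table.getColumnIndexByName("NAME"));     // 1
    System.out.println(table.getColumnIndexByName("missing"));  // null
    System.out.println(table.requireColumnIndexByName("ts"));   // 2
  }
}
```

The strict variant forces a `throws` clause (or a try/catch) onto each caller, which is the cascade mentioned above. The lenient variant avoids that, at the cost of a possible NullPointerException if a caller ever unboxes a null result.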


sonarqubecloud · 2026-06-19T06:33:28Z

Quality Gate passed

Issues
15 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
1.2% Duplication on New Code

See analysis details on SonarQube Cloud

asf-ci-hive added tests pending tests unstable and removed tests pending labels Apr 7, 2026

ramitg254 force-pushed the HIVE-29413 branch from 0d2baee to d97e174 Compare April 8, 2026 13:31

asf-ci-hive added tests pending tests unstable and removed tests unstable tests pending labels Apr 8, 2026

ramitg254 force-pushed the HIVE-29413 branch from d97e174 to 9e87b12 Compare April 8, 2026 18:21

asf-ci-hive added tests pending tests unstable and removed tests unstable tests pending labels Apr 8, 2026

ramitg254 force-pushed the HIVE-29413 branch from 9e87b12 to 565a2eb Compare April 9, 2026 10:03

asf-ci-hive added tests pending tests unstable and removed tests unstable tests pending labels Apr 9, 2026

ramitg254 mentioned this pull request Apr 9, 2026

HIVE-29413: Avoid code duplication by updating getPartCols method for iceberg tables #6337

Closed

asf-ci-hive added tests pending and removed tests unstable labels Apr 9, 2026

asf-ci-hive added tests unstable and removed tests pending labels Apr 9, 2026

asf-ci-hive added tests pending and removed tests unstable labels Apr 10, 2026

ramitg254 added 25 commits June 17, 2026 11:49

updated update implementation

954609b

updated partition pruning and query rewriting

f7544ed

changes related to metatable

6053034

corrected alter and semantic analyzer implementation

409cbe8

updated merge implementation and test output

83940d9

updated ctas create and tests output

7606f66

# Conflicts: # iceberg/iceberg-handler/src/test/results/positive/llap/iceberg_bucket_map_join_7.q.out

updated stats autogather and test output

ebb6085

updated getPartitionKeys

d897942

removed getStorageSchemaCols part-1

44b2be4

removed getStorageSchemaCols part-2

3e49846

removed getStorageSchemaCols part-3

150b8a2

removed workaround

4b03195

addressed sonar issues

e2277cf

non part cols retrieval made lazy

247c205

reviewed required changes

b17625d

Change-Id: I09ffba356ac47e3416c8b6717e8671d2cb6432b8

corrected partition.getCols for iceberg table

6693fca

Change-Id: I6812d068be702edf01b158c3e7eacadeadbd5c2b

added wrapper for lineage

2c30769

Change-Id: I6f5d81c5be00e080753efd5e8157b14fcfde2dde

reverted getPartitionKeys

ebb31ca

Change-Id: I5a95791b4d756fe949ee4697204be9de3025c6b5

refractor-1

e30d562

Change-Id: I852cf156a1a5ac20525bb9376f4cbb5783b8e1e3

refractor-2

f1d8876

Change-Id: Ib1542e38c65c7811074b97d7da7aaf780f3895dd

correction for rebased iceberg view commit recently merged to master

336347e

Change-Id: Ie3440239248870fb44c6e956d5be10271b28a1a0

removed isTableTypeSet and merged partition column comments

fe7bed3

Change-Id: Ic8f6d6c391ba49e12de7d04b7bb20542d879b0a3

updated conflicts for rebase

7238d7c

Change-Id: I2017f7c771f8f611b5d8900d3e0276817842b41f

reverted to user comment override and refractored

1359ac8

Change-Id: I87dc65eb942e3db1b1a0d9f2608f1b88b44b11e6

moved helpers to MetaStoreUtils

2f8945d

Change-Id: Ic0fefede8b079205c5580a31f59f195b6b1e94e0

deniskuzZ reviewed Jun 18, 2026


deniskuzZ approved these changes Jun 18, 2026


Copilot AI reviewed Jun 18, 2026


moved to HiveTableUtil

e37919d

Change-Id: I5a34151d5ec0d4e1759eee5fcc6339b5fd6b92d3

4 participants
