Skip to content

HIVE-29413: Generalise column related APIs in Table.java#6413

Open
ramitg254 wants to merge 38 commits into
apache:masterfrom
ramitg254:HIVE-29413
Open

HIVE-29413: Generalise column related APIs in Table.java#6413
ramitg254 wants to merge 38 commits into
apache:masterfrom
ramitg254:HIVE-29413

Conversation

@ramitg254

@ramitg254 ramitg254 commented Apr 7, 2026

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

added getEffectivePartCols() in most places possible to avoid code duplication.

Why are the changes needed?

getPartCols() does not have support for iceberg tables.

Does this PR introduce any user-facing change?

No

How was this patch tested?

ci tests and local build

@deniskuzZ

Copy link
Copy Markdown
Member

@ramitg254 please take a look: 9e7535c. I would suggest following similar approach

@ramitg254

ramitg254 commented Apr 10, 2026

Copy link
Copy Markdown
Contributor Author

9e7535c

but here we are creating separate method getEffectivePartCols() and leaving getPartCols() as it is, which as per our discussion on that closed pr we shouldn't do that, and only go ahead with updating getPartCols()

@deniskuzZ

deniskuzZ commented Apr 10, 2026

Copy link
Copy Markdown
Member

9e7535c

but here we are creating separate method getEffectivePartCols() and leaving getPartCols() as it is, which as per our discussion on that closed pr we shouldn't do that, and only go ahead with updating getPartCols()

Where did I say that? The ask was to keep the original method unchanged. same here

@ramitg254

ramitg254 commented Apr 10, 2026

Copy link
Copy Markdown
Contributor Author

oh I got confused due to this comment: #6337 (comment) in which getSupportedPartCols() was just separate method similar to getEffectivePartCols()

@ramitg254

ramitg254 commented Apr 10, 2026

Copy link
Copy Markdown
Contributor Author

I am fine with that earlier approach as well but recently I saw this one: https://issues.apache.org/jira/browse/HIVE-29525 so I thought we should have unified getPartCols() and getCols() which gives similar results as native hive tables as first step towards solving this after that those plan logics can be taken care of later on when that ticket will be addressed.
So I was first focussing on making getPartCols() unified for iceberg tables as well.

please share your thoughts on this idea

ramitg254 added 25 commits June 17, 2026 11:49
# Conflicts:
#	iceberg/iceberg-handler/src/test/results/positive/llap/iceberg_bucket_map_join_7.q.out
Change-Id: I09ffba356ac47e3416c8b6717e8671d2cb6432b8
Change-Id: I6812d068be702edf01b158c3e7eacadeadbd5c2b
Change-Id: I6f5d81c5be00e080753efd5e8157b14fcfde2dde
Change-Id: I5a95791b4d756fe949ee4697204be9de3025c6b5
Change-Id: I852cf156a1a5ac20525bb9376f4cbb5783b8e1e3
Change-Id: Ib1542e38c65c7811074b97d7da7aaf780f3895dd
Change-Id: Ie3440239248870fb44c6e956d5be10271b28a1a0
Change-Id: Ic8f6d6c391ba49e12de7d04b7bb20542d879b0a3
Change-Id: I2017f7c771f8f611b5d8900d3e0276817842b41f
Change-Id: I87dc65eb942e3db1b1a0d9f2608f1b88b44b11e6
Change-Id: Ic0fefede8b079205c5580a31f59f195b6b1e94e0
return Boolean.parseBoolean(properties.getProperty(hive_metastoreConstants.TABLE_IS_CTAS));
}

public static boolean isTableTypeSet(Map<String, String> params) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep private

@ramitg254 ramitg254 Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deniskuzZ deniskuzZ Jun 18, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment and method name (i.e. isTableTypeSet ) are not in sync

// If source is Iceberg table set the schema and the partition spec

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, updated the comment:

// parameter table_type is set to "ICEBERG" in case of Iceberg tables
// set the schema and the partition spec accordingly

@deniskuzZ deniskuzZ left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, pending tests

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR generalizes Hive’s column/partition-column handling (centered around ql.metadata.Table) to better support non-native tables (notably Iceberg), and updates several analyzers/rewriters to use the generalized APIs (getAllCols(), storage-handler partition keys, etc.) to reduce duplication and improve correctness.

Changes:

  • Generalize column/partition-column APIs and update call sites to use Table.getAllCols()/getPartCols() appropriately for non-native partitioning.
  • Refactor parts of query planning/rewriting (e.g., table scan schema construction, UPDATE/MERGE rewrites) to align column ordering using column-index lookups.
  • Relax/adjust some partition-column behaviors for non-native tables (e.g., allowing updates to Iceberg partition columns where they are regular columns).

Reviewed changes

Copilot reviewed 102 out of 103 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
storage-api/src/java/org/apache/hadoop/hive/common/io/CacheTag.java Broadens partition-spec input type; needs deterministic ordering when encoding cache tags.
ql/src/java/org/apache/hadoop/hive/ql/session/LineageState.java Adds an overload to set lineage based on effective columns for non-native partitioning.
ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java Generalizes partition spec type from LinkedHashMap to Map.
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java Refactors table scan row schema construction to align SerDe fields/partition cols by table column index.
ql/src/java/org/apache/hadoop/hive/ql/parse/RewriteSemanticAnalyzer.java Adjusts UPDATE validation to treat non-native partition columns as updatable regular columns.
ql/src/java/org/apache/hadoop/hive/ql/parse/rewrite/sql/MultiInsertSqlGenerator.java Skips emitting native partition columns for non-native partitioning targets.
ql/src/java/org/apache/hadoop/hive/ql/parse/rewrite/SplitUpdateRewriter.java Reworks UPDATE rewrite value alignment using table column indices and non-native partition semantics.
ql/src/java/org/apache/hadoop/hive/ql/parse/rewrite/SplitMergeRewriter.java Adjusts MERGE rewrite value list sizing to use getAllCols().
ql/src/java/org/apache/hadoop/hive/ql/parse/rewrite/MergeRewriter.java Reworks MERGE value construction to align by column index and handle non-native partition columns.
ql/src/java/org/apache/hadoop/hive/ql/parse/MergeSemanticAnalyzer.java Updates MERGE analyzer to use getAllCols() sizing (currently with a compilation issue).
ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java Introduces generalized column/partition APIs and name→index lookup helpers used by updated call sites.
Comments suppressed due to low confidence (1)

storage-api/src/java/org/apache/hadoop/hive/common/io/CacheTag.java:107

  • build(fullTableName, partDescMap) now accepts a generic Map, but the cache tag depends on the iteration order of partDescMap.entrySet(). If callers pass an unordered map (e.g. HashMap/Map.of), the same partition spec can produce different CacheTags, leading to cache misses / duplicate cache entries. Consider canonicalizing the partition spec order (e.g. sort by key) before encoding.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
Comment thread ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
Comment thread ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
Comment on lines +768 to +772
public Integer getColumnIndexByName(String colName) {
ensureColumnsIndexed();
TableColumn column = columnsByName.get(colName.toLowerCase());
return column != null ? column.index() : null;
}

@ramitg254 ramitg254 Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind adding exception like:

if (column == null) {
      throw new SemanticException(ErrorMsg.INVALID_COLUMN.getMsg(" '" + colName + "'"));
    }

if we are doing this for getColumnIndexByName then we should do same for getFieldSchemaByName

but adding this will cascade into lot of places where it is getting used as getPartCols() also uses getFieldSchemaByName so if we throw exception then we need cascade it to every location where getPartCols() is used.

I think we can avoid it for now as currently we don't use it in any places where it can be null.
@deniskuzZ WDYT?

Comment thread ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
Comment thread ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
Comment thread ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java
Comment thread ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java
Change-Id: I5a34151d5ec0d4e1759eee5fcc6339b5fd6b92d3
@sonarqubecloud

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants