HIVE-29413: Generalise column related APIs in Table.java#6413
HIVE-29413: Generalise column related APIs in Table.java#6413ramitg254 wants to merge 38 commits into
Conversation
|
@ramitg254 please take a look: 9e7535c. I would suggest following similar approach |
|
but here we are creating separate method getEffectivePartCols() and leaving getPartCols() as it is, which as per our discussion on that closed pr we shouldn't do that, and only go ahead with updating getPartCols() |
Where did I say that? The ask was to keep the original method unchanged. same here |
|
oh I got confused due to this comment: #6337 (comment) in which getSupportedPartCols() was just separate method similar to getEffectivePartCols() |
|
I am fine with that earlier approach as well but recently I saw this one: https://issues.apache.org/jira/browse/HIVE-29525 so I thought we should have unified getPartCols() and getCols() which gives similar results as native hive tables as first step towards solving this after that those plan logics can be taken care of later on when that ticket will be addressed. please share your thoughts on this idea |
# Conflicts: # iceberg/iceberg-handler/src/test/results/positive/llap/iceberg_bucket_map_join_7.q.out
Change-Id: I09ffba356ac47e3416c8b6717e8671d2cb6432b8
Change-Id: I6812d068be702edf01b158c3e7eacadeadbd5c2b
Change-Id: I6f5d81c5be00e080753efd5e8157b14fcfde2dde
Change-Id: I5a95791b4d756fe949ee4697204be9de3025c6b5
Change-Id: I852cf156a1a5ac20525bb9376f4cbb5783b8e1e3
Change-Id: Ib1542e38c65c7811074b97d7da7aaf780f3895dd
Change-Id: Ie3440239248870fb44c6e956d5be10271b28a1a0
Change-Id: Ic8f6d6c391ba49e12de7d04b7bb20542d879b0a3
Change-Id: I2017f7c771f8f611b5d8900d3e0276817842b41f
Change-Id: I87dc65eb942e3db1b1a0d9f2608f1b88b44b11e6
Change-Id: Ic0fefede8b079205c5580a31f59f195b6b1e94e0
| return Boolean.parseBoolean(properties.getProperty(hive_metastoreConstants.TABLE_IS_CTAS)); | ||
| } | ||
|
|
||
| public static boolean isTableTypeSet(Map<String, String> params) { |
There was a problem hiding this comment.
already used here as well https://github.com/apache/hive/pull/6413/changes#diff-93864ecf035fe51b92185015da842a56837cea89064813de39c278c6f8fed03cR1993
so kept public
There was a problem hiding this comment.
comment and method name (i.e. isTableTypeSet ) are not in sync
// If source is Iceberg table set the schema and the partition spec
There was a problem hiding this comment.
done, updated the comment:
// parameter table_type is set to "ICEBERG" in case of Iceberg tables
// set the schema and the partition spec accordingly
There was a problem hiding this comment.
Pull request overview
This PR generalizes Hive’s column/partition-column handling (centered around ql.metadata.Table) to better support non-native tables (notably Iceberg), and updates several analyzers/rewriters to use the generalized APIs (getAllCols(), storage-handler partition keys, etc.) to reduce duplication and improve correctness.
Changes:
- Generalize column/partition-column APIs and update call sites to use
Table.getAllCols()/getPartCols()appropriately for non-native partitioning. - Refactor parts of query planning/rewriting (e.g., table scan schema construction, UPDATE/MERGE rewrites) to align column ordering using column-index lookups.
- Relax/adjust some partition-column behaviors for non-native tables (e.g., allowing updates to Iceberg partition columns where they are regular columns).
Reviewed changes
Copilot reviewed 102 out of 103 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
| storage-api/src/java/org/apache/hadoop/hive/common/io/CacheTag.java | Broadens partition-spec input type; needs deterministic ordering when encoding cache tags. |
| ql/src/java/org/apache/hadoop/hive/ql/session/LineageState.java | Adds an overload to set lineage based on effective columns for non-native partitioning. |
| ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java | Generalizes partition spec type from LinkedHashMap to Map. |
| ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java | Refactors table scan row schema construction to align SerDe fields/partition cols by table column index. |
| ql/src/java/org/apache/hadoop/hive/ql/parse/RewriteSemanticAnalyzer.java | Adjusts UPDATE validation to treat non-native partition columns as updatable regular columns. |
| ql/src/java/org/apache/hadoop/hive/ql/parse/rewrite/sql/MultiInsertSqlGenerator.java | Skips emitting native partition columns for non-native partitioning targets. |
| ql/src/java/org/apache/hadoop/hive/ql/parse/rewrite/SplitUpdateRewriter.java | Reworks UPDATE rewrite value alignment using table column indices and non-native partition semantics. |
| ql/src/java/org/apache/hadoop/hive/ql/parse/rewrite/SplitMergeRewriter.java | Adjusts MERGE rewrite value list sizing to use getAllCols(). |
| ql/src/java/org/apache/hadoop/hive/ql/parse/rewrite/MergeRewriter.java | Reworks MERGE value construction to align by column index and handle non-native partition columns. |
| ql/src/java/org/apache/hadoop/hive/ql/parse/MergeSemanticAnalyzer.java | Updates MERGE analyzer to use getAllCols() sizing (currently with a compilation issue). |
| ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java | Introduces generalized column/partition APIs and name→index lookup helpers used by updated call sites. |
Comments suppressed due to low confidence (1)
storage-api/src/java/org/apache/hadoop/hive/common/io/CacheTag.java:107
build(fullTableName, partDescMap)now accepts a genericMap, but the cache tag depends on the iteration order ofpartDescMap.entrySet(). If callers pass an unordered map (e.g.HashMap/Map.of), the same partition spec can produce differentCacheTags, leading to cache misses / duplicate cache entries. Consider canonicalizing the partition spec order (e.g. sort by key) before encoding.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| public Integer getColumnIndexByName(String colName) { | ||
| ensureColumnsIndexed(); | ||
| TableColumn column = columnsByName.get(colName.toLowerCase()); | ||
| return column != null ? column.index() : null; | ||
| } |
There was a problem hiding this comment.
I don't mind adding exception like:
if (column == null) {
throw new SemanticException(ErrorMsg.INVALID_COLUMN.getMsg(" '" + colName + "'"));
}
if we are doing this for getColumnIndexByName then we should do same for getFieldSchemaByName
but adding this will cascade into lot of places where it is getting used as getPartCols() also uses getFieldSchemaByName so if we throw exception then we need cascade it to every location where getPartCols() is used.
I think we can avoid it for now as currently we don't use it in any places where it can be null.
@deniskuzZ WDYT?
Change-Id: I5a34151d5ec0d4e1759eee5fcc6339b5fd6b92d3
|



What changes were proposed in this pull request?
added getEffectivePartCols() in most places possible to avoid code duplication.
Why are the changes needed?
getPartCols() does not have support for iceberg tables.
Does this PR introduce any user-facing change?
No
How was this patch tested?
ci tests and local build