From d92ac7f542fc5841960a9360fcaca2d8644c1187 Mon Sep 17 00:00:00 2001
From: Jacky Lee
Date: Sat, 28 Mar 2026 16:43:34 +0800
Subject: [PATCH 1/4] Docs: Add Hive Metastore schema validation warnings for
 DROP COLUMN and REORDER

When using a Hive catalog, ALTER TABLE DROP COLUMN (non-last column) and
ALTER COLUMN REORDER fail because the Hive Metastore validates schema
changes by comparing column types positionally. Dropping a middle column
shifts subsequent columns, causing HMS to reject the change as an
incompatible type change via
MetaStoreUtils#throwExceptionIfIncompatibleColTypeChange.

Add warning admonitions to spark-ddl.md (DROP COLUMN and REORDER
sections) and flink-ddl.md (Hive catalog section) documenting the
limitation, the workaround
(hive.metastore.disallow.incompatible.col.type.changes=false), and the
trade-off (the Hive engine can no longer read the table).
---
 docs/docs/flink-ddl.md | 14 ++++++++++++++
 docs/docs/spark-ddl.md | 22 ++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/docs/docs/flink-ddl.md b/docs/docs/flink-ddl.md
index 756256f0df4f..448e2e45924d 100644
--- a/docs/docs/flink-ddl.md
+++ b/docs/docs/flink-ddl.md
@@ -45,6 +45,20 @@ The following properties can be set if using the Hive catalog:
 * `hive-conf-dir`: Path to a directory containing a `hive-site.xml` configuration file which will be used to provide custom Hive configuration values. The value of `hive.metastore.warehouse.dir` from `<hive-conf-dir>/hive-site.xml` (or hive configure file from classpath) will be overwritten with the `warehouse` value if setting both `hive-conf-dir` and `warehouse` when creating iceberg catalog.
 * `hadoop-conf-dir`: Path to a directory containing `core-site.xml` and `hdfs-site.xml` configuration files which will be used to provide custom Hadoop configuration values.
 
+!!! warning "Hive Catalog Limitation"
+    The Hive Metastore validates schema changes by comparing column types **positionally**
+    (`hive.metastore.disallow.incompatible.col.type.changes`, default `true`). When using a Hive catalog,
+    schema evolution operations that change column positions — such as dropping a non-last column or
+    reordering columns — may fail regardless of which engine performs the change (Spark, Flink Java API, etc.).
+
+    To work around this, set `hive.metastore.disallow.incompatible.col.type.changes=false` in `hive-site.xml`
+    or add the equivalent property to the Hive catalog configuration.
+
+    **Trade-off:** After disabling this check, the Hive engine may no longer be able to read the table
+    correctly due to the schema mismatch in the Hive Metastore. Iceberg-aware engines (Spark, Flink,
+    Trino, etc.) will continue to work correctly, as they read schema from Iceberg metadata rather
+    than the Hive Metastore.
+
 #### Hadoop catalog
 
 Iceberg also supports a directory-based catalog in HDFS that can be configured using `'catalog-type'='hadoop'`:
diff --git a/docs/docs/spark-ddl.md b/docs/docs/spark-ddl.md
index 0b3f7389a3fa..5ecf323eec92 100644
--- a/docs/docs/spark-ddl.md
+++ b/docs/docs/spark-ddl.md
@@ -302,6 +302,13 @@ ALTER TABLE prod.db.sample ALTER COLUMN col FIRST;
 ALTER TABLE prod.db.sample ALTER COLUMN nested.col AFTER other_col;
 ```
 
+!!! warning "Hive Catalog Limitation"
+    When using a **Hive catalog**, reordering columns may fail with an error from the Hive Metastore.
+    The Hive Metastore validates schema changes by comparing column types **positionally** — reordering
+    columns causes positional type mismatches, which it rejects as incompatible changes.
+
+    See the [DROP COLUMN section](#alter-table-drop-column) for the workaround and trade-offs.
+
 Nullability for a non-nullable column can be changed using `DROP NOT NULL`:
 
 ```sql
@@ -323,6 +330,21 @@ ALTER TABLE prod.db.sample DROP COLUMN id;
 ALTER TABLE prod.db.sample DROP COLUMN point.z;
 ```
 
+!!! warning "Hive Catalog Limitation"
+    When using a **Hive catalog**, dropping a column that is not the last column in the table schema
+    may fail with an error from the Hive Metastore (HMS). This occurs because HMS validates schema
+    changes by comparing column types **positionally** — dropping a middle column shifts subsequent
+    columns, which HMS interprets as incompatible type changes
+    (`MetaStoreUtils#throwExceptionIfIncompatibleColTypeChange`).
+
+    To work around this, disable the HMS schema compatibility check by setting
+    `hive.metastore.disallow.incompatible.col.type.changes=false` in `hive-site.xml`, or by passing
+    `--conf spark.hadoop.hive.metastore.disallow.incompatible.col.type.changes=false` when starting Spark.
+
+    **Trade-off:** After applying this workaround, the Hive engine may no longer be able to read the table
+    correctly due to the schema mismatch in the Hive Metastore. Iceberg-aware engines (Spark, Flink, Trino, etc.)
+    will continue to work correctly, as they read schema from Iceberg metadata rather than HMS.
+
 ## `ALTER TABLE` SQL extensions
 
 These commands are available in Spark 3 when using Iceberg [SQL extensions](spark-configuration.md#sql-extensions).
From 8ed99054e39416b95ee7470d315b66b5b44c73b4 Mon Sep 17 00:00:00 2001
From: Jacky Lee
Date: Tue, 31 Mar 2026 16:32:03 +0800
Subject: [PATCH 2/4] Docs: Clarify HMS workaround for embedded vs remote
 deployment
---
 docs/docs/flink-ddl.md | 7 +++++--
 docs/docs/spark-ddl.md | 6 ++++--
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/docs/docs/flink-ddl.md b/docs/docs/flink-ddl.md
index 448e2e45924d..118434ff9905 100644
--- a/docs/docs/flink-ddl.md
+++ b/docs/docs/flink-ddl.md
@@ -51,8 +51,11 @@ The following properties can be set if using the Hive catalog:
     schema evolution operations that change column positions — such as dropping a non-last column or
     reordering columns — may fail regardless of which engine performs the change (Spark, Flink Java API, etc.).
 
-    To work around this, set `hive.metastore.disallow.incompatible.col.type.changes=false` in `hive-site.xml`
-    or add the equivalent property to the Hive catalog configuration.
+    To work around this, disable the HMS schema compatibility check by setting
+    `hive.metastore.disallow.incompatible.col.type.changes=false`:
+
+    - **Remote HMS:** Set this property in the HMS server's `hive-site.xml`.
+    - **Embedded HMS:** Add the equivalent property to the Hive catalog configuration.
 
     **Trade-off:** After disabling this check, the Hive engine may no longer be able to read the table
     correctly due to the schema mismatch in the Hive Metastore. Iceberg-aware engines (Spark, Flink,
diff --git a/docs/docs/spark-ddl.md b/docs/docs/spark-ddl.md
index 5ecf323eec92..cfe44e816614 100644
--- a/docs/docs/spark-ddl.md
+++ b/docs/docs/spark-ddl.md
@@ -338,8 +338,10 @@ ALTER TABLE prod.db.sample DROP COLUMN point.z;
     (`MetaStoreUtils#throwExceptionIfIncompatibleColTypeChange`).
 
     To work around this, disable the HMS schema compatibility check by setting
-    `hive.metastore.disallow.incompatible.col.type.changes=false` in `hive-site.xml`, or by passing
-    `--conf spark.hadoop.hive.metastore.disallow.incompatible.col.type.changes=false` when starting Spark.
+    `hive.metastore.disallow.incompatible.col.type.changes=false`:
+
+    - **Remote HMS:** Set this property in the HMS server's `hive-site.xml`.
+    - **Embedded HMS:** Pass `--conf spark.hadoop.hive.metastore.disallow.incompatible.col.type.changes=false` when starting Spark.
 
     **Trade-off:** After applying this workaround, the Hive engine may no longer be able to read the table
     correctly due to the schema mismatch in the Hive Metastore. Iceberg-aware engines (Spark, Flink, Trino, etc.)

From 7c8417f1ecaf10b4fbec3534a76ba363cb1c94a9 Mon Sep 17 00:00:00 2001
From: Jacky Lee
Date: Wed, 1 Apr 2026 13:21:24 +0800
Subject: [PATCH 3/4] Docs: add more warnings for spark-ddl.md
---
 docs/docs/spark-ddl.md | 51 +++++++++++++++++++++++++-----------------
 1 file changed, 31 insertions(+), 20 deletions(-)

diff --git a/docs/docs/spark-ddl.md b/docs/docs/spark-ddl.md
index cfe44e816614..5bfcbeb5e239 100644
--- a/docs/docs/spark-ddl.md
+++ b/docs/docs/spark-ddl.md
@@ -170,6 +170,27 @@ Iceberg has full `ALTER TABLE` support in Spark 3, including:
 
 In addition, [SQL extensions](spark-configuration.md#sql-extensions) can be used to add support for partition evolution and setting a table's write order
 
+!!! warning "Hive Catalog Limitation"
+    The Hive Metastore (HMS) validates schema changes by comparing column types **positionally**
+    (`hive.metastore.disallow.incompatible.col.type.changes`, default `true`). Any schema evolution
+    operation that shifts column positions will fail when using a Hive catalog.
+    Affected operations include:
+
+    - `ADD COLUMN` with `FIRST` or `AFTER` clauses
+    - `ALTER COLUMN` with `FIRST` or `AFTER` clauses (reordering)
+    - `DROP COLUMN` on a non-last column
+
+    To work around this, disable the HMS schema compatibility check by setting
+    `hive.metastore.disallow.incompatible.col.type.changes=false`:
+
+    - **Remote HMS:** Set this property in the HMS server's `hive-site.xml`.
+    - **Embedded HMS:** Pass `--conf spark.hadoop.hive.metastore.disallow.incompatible.col.type.changes=false` when starting Spark.
+
+    **Trade-off:** After disabling this check, the Hive engine may no longer be able to read the table
+    correctly due to the schema mismatch in the Hive Metastore. Iceberg-aware engines (Spark, Flink,
+    Trino, etc.) will continue to work correctly, as they read schema from Iceberg metadata rather
+    than HMS.
+
 ### `ALTER TABLE ... RENAME TO`
 
 ```sql
@@ -259,6 +280,11 @@ ALTER TABLE prod.db.sample
 ADD COLUMN nested.new_column bigint FIRST;
 ```
 
+!!! warning "Hive Catalog Limitation"
+    When using a Hive catalog, adding a column with `FIRST` or `AFTER` may fail due to HMS positional
+    schema validation. See the [Hive Catalog Limitation](#hive-catalog-limitation) above for details
+    and workaround.
+
 ### `ALTER TABLE ... RENAME COLUMN`
 
 Iceberg allows any field to be renamed. To rename a field, use `RENAME COLUMN`:
@@ -303,11 +329,8 @@ ALTER TABLE prod.db.sample ALTER COLUMN nested.col AFTER other_col;
 ```
 
 !!! warning "Hive Catalog Limitation"
-    When using a **Hive catalog**, reordering columns may fail with an error from the Hive Metastore.
-    The Hive Metastore validates schema changes by comparing column types **positionally** — reordering
-    columns causes positional type mismatches, which it rejects as incompatible changes.
-
-    See the [DROP COLUMN section](#alter-table-drop-column) for the workaround and trade-offs.
+    When using a Hive catalog, reordering columns may fail due to HMS positional schema validation.
+    See the [Hive Catalog Limitation](#hive-catalog-limitation) above for details and workaround.
 
 Nullability for a non-nullable column can be changed using `DROP NOT NULL`:
 
@@ -331,21 +354,9 @@ ALTER TABLE prod.db.sample DROP COLUMN point.z;
 ```
 
 !!! warning "Hive Catalog Limitation"
-    When using a **Hive catalog**, dropping a column that is not the last column in the table schema
-    may fail with an error from the Hive Metastore (HMS). This occurs because HMS validates schema
-    changes by comparing column types **positionally** — dropping a middle column shifts subsequent
-    columns, which HMS interprets as incompatible type changes
-    (`MetaStoreUtils#throwExceptionIfIncompatibleColTypeChange`).
-
-    To work around this, disable the HMS schema compatibility check by setting
-    `hive.metastore.disallow.incompatible.col.type.changes=false`:
-
-    - **Remote HMS:** Set this property in the HMS server's `hive-site.xml`.
-    - **Embedded HMS:** Pass `--conf spark.hadoop.hive.metastore.disallow.incompatible.col.type.changes=false` when starting Spark.
-
-    **Trade-off:** After applying this workaround, the Hive engine may no longer be able to read the table
-    correctly due to the schema mismatch in the Hive Metastore. Iceberg-aware engines (Spark, Flink, Trino, etc.)
-    will continue to work correctly, as they read schema from Iceberg metadata rather than HMS.
+    When using a Hive catalog, dropping a non-last column may fail due to HMS positional schema
+    validation. See the [Hive Catalog Limitation](#hive-catalog-limitation) above for details
+    and workaround.
 
 ## `ALTER TABLE` SQL extensions

From 6d1a400a191aaacd27c0e745de6299b0fd24bb15 Mon Sep 17 00:00:00 2001
From: jackylee
Date: Sun, 5 Apr 2026 17:32:34 +0800
Subject: [PATCH 4/4] Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
 docs/docs/flink-ddl.md | 2 +-
 docs/docs/spark-ddl.md | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/docs/flink-ddl.md b/docs/docs/flink-ddl.md
index 118434ff9905..7a45ed9099a5 100644
--- a/docs/docs/flink-ddl.md
+++ b/docs/docs/flink-ddl.md
@@ -51,7 +51,7 @@ The following properties can be set if using the Hive catalog:
     schema evolution operations that change column positions — such as dropping a non-last column or
     reordering columns — may fail regardless of which engine performs the change (Spark, Flink Java API, etc.).
 
-    To work around this, disable the HMS schema compatibility check by setting
+    To work around this, disable the Hive Metastore (HMS) schema compatibility check by setting
     `hive.metastore.disallow.incompatible.col.type.changes=false`:
 
     - **Remote HMS:** Set this property in the HMS server's `hive-site.xml`.
diff --git a/docs/docs/spark-ddl.md b/docs/docs/spark-ddl.md
index 5bfcbeb5e239..8adfd7310c19 100644
--- a/docs/docs/spark-ddl.md
+++ b/docs/docs/spark-ddl.md
@@ -282,7 +282,7 @@ ADD COLUMN nested.new_column bigint FIRST;
 
 !!! warning "Hive Catalog Limitation"
     When using a Hive catalog, adding a column with `FIRST` or `AFTER` may fail due to HMS positional
-    schema validation. See the [Hive Catalog Limitation](#hive-catalog-limitation) above for details
+    schema validation. See the warning above for details
     and workaround.
 
 ### `ALTER TABLE ... RENAME COLUMN`
@@ -330,7 +330,7 @@ ALTER TABLE prod.db.sample ALTER COLUMN nested.col AFTER other_col;
 
 !!! warning "Hive Catalog Limitation"
     When using a Hive catalog, reordering columns may fail due to HMS positional schema validation.
-    See the [Hive Catalog Limitation](#hive-catalog-limitation) above for details and workaround.
+    See the Hive Catalog Limitation warning above for details and the workaround.
 
 Nullability for a non-nullable column can be changed using `DROP NOT NULL`:
 
@@ -355,8 +355,8 @@ ALTER TABLE prod.db.sample DROP COLUMN point.z;
 
 !!! warning "Hive Catalog Limitation"
     When using a Hive catalog, dropping a non-last column may fail due to HMS positional schema
-    validation. See the [Hive Catalog Limitation](#hive-catalog-limitation) above for details
-    and workaround.
+    validation. See the Hive Catalog Limitation warning above for details and the
+    workaround.
 
 ## `ALTER TABLE` SQL extensions
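
Note for reviewers trying the workaround from this series locally: on a remote HMS deployment, the property the patches document would be set in the metastore server's `hive-site.xml`. A minimal sketch of the fragment (the property name and default come from the patches; the standard Hive `<property>` file layout is assumed):

```xml
<!-- hive-site.xml on the HMS server.
     Disables the positional column-type compatibility check so that
     Iceberg schema evolution (DROP COLUMN on a non-last column,
     ALTER COLUMN REORDER) is not rejected by the metastore.
     Trade-off, per the docs above: the Hive engine itself may no
     longer read the table correctly afterwards. -->
<property>
  <name>hive.metastore.disallow.incompatible.col.type.changes</name>
  <value>false</value>
</property>
```

For Spark with an embedded HMS, the equivalent is passed at launch as `--conf spark.hadoop.hive.metastore.disallow.incompatible.col.type.changes=false`, as the patched spark-ddl.md text describes.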