diff --git a/docs/docs/flink-ddl.md b/docs/docs/flink-ddl.md
index 756256f0df4f..7a45ed9099a5 100644
--- a/docs/docs/flink-ddl.md
+++ b/docs/docs/flink-ddl.md
@@ -45,6 +45,23 @@ The following properties can be set if using the Hive catalog:
 * `hive-conf-dir`: Path to a directory containing a `hive-site.xml` configuration file which will be used to provide custom Hive configuration values. The value of `hive.metastore.warehouse.dir` from `/hive-site.xml` (or hive configure file from classpath) will be overwritten with the `warehouse` value if setting both `hive-conf-dir` and `warehouse` when creating iceberg catalog.
 * `hadoop-conf-dir`: Path to a directory containing `core-site.xml` and `hdfs-site.xml` configuration files which will be used to provide custom Hadoop configuration values.
 
+!!! warning "Hive Catalog Limitation"
+    The Hive Metastore validates schema changes by comparing column types **positionally**
+    (`hive.metastore.disallow.incompatible.col.type.changes`, default `true`). When using a Hive catalog,
+    schema evolution operations that change column positions — such as dropping a non-last column or
+    reordering columns — may fail regardless of which engine performs the change (Spark, Flink Java API, etc.).
+
+    To work around this, disable the Hive Metastore (HMS) schema compatibility check by setting
+    `hive.metastore.disallow.incompatible.col.type.changes=false`:
+
+    - **Remote HMS:** Set this property in the HMS server's `hive-site.xml`.
+    - **Embedded HMS:** Add the equivalent property to the Hive catalog configuration.
+
+    **Trade-off:** After disabling this check, the Hive engine may no longer be able to read the table
+    correctly due to the schema mismatch in the Hive Metastore. Iceberg-aware engines (Spark, Flink,
+    Trino, etc.) will continue to work correctly, as they read schema from Iceberg metadata rather
+    than the Hive Metastore.
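+
+    For the embedded case, one way to supply the property is through a `hive-site.xml` made
+    visible to the catalog via the `hive-conf-dir` property documented above. This is a sketch
+    under that assumption, not a definitive recipe; the file location is whatever directory
+    `hive-conf-dir` points to:
+
+    ```xml
+    <!-- hive-site.xml in the directory referenced by hive-conf-dir (sketch) -->
+    <property>
+      <name>hive.metastore.disallow.incompatible.col.type.changes</name>
+      <value>false</value>
+    </property>
+    ```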
+
 #### Hadoop catalog
 
 Iceberg also supports a directory-based catalog in HDFS that can be configured using `'catalog-type'='hadoop'`:
diff --git a/docs/docs/spark-ddl.md b/docs/docs/spark-ddl.md
index 0b3f7389a3fa..8adfd7310c19 100644
--- a/docs/docs/spark-ddl.md
+++ b/docs/docs/spark-ddl.md
@@ -170,6 +170,27 @@ Iceberg has full `ALTER TABLE` support in Spark 3, including:
 
 In addition, [SQL extensions](spark-configuration.md#sql-extensions) can be used to add support for partition evolution and setting a table's write order
 
+!!! warning "Hive Catalog Limitation"
+    The Hive Metastore (HMS) validates schema changes by comparing column types **positionally**
+    (`hive.metastore.disallow.incompatible.col.type.changes`, default `true`). Any schema evolution
+    operation that shifts column positions will fail when using a Hive catalog. Affected operations
+    include:
+
+    - `ADD COLUMN` with `FIRST` or `AFTER` clauses
+    - `ALTER COLUMN` with `FIRST` or `AFTER` clauses (reordering)
+    - `DROP COLUMN` on a non-last column
+
+    To work around this, disable the HMS schema compatibility check by setting
+    `hive.metastore.disallow.incompatible.col.type.changes=false`:
+
+    - **Remote HMS:** Set this property in the HMS server's `hive-site.xml`.
+    - **Embedded HMS:** Pass `--conf spark.hadoop.hive.metastore.disallow.incompatible.col.type.changes=false` when starting Spark.
+
+    **Trade-off:** After disabling this check, the Hive engine may no longer be able to read the table
+    correctly due to the schema mismatch in the Hive Metastore. Iceberg-aware engines (Spark, Flink,
+    Trino, etc.) will continue to work correctly, as they read schema from Iceberg metadata rather
+    than HMS.
+
 ### `ALTER TABLE ... RENAME TO`
 
 ```sql
@@ -259,6 +280,11 @@ ALTER TABLE prod.db.sample
 ADD COLUMN nested.new_column bigint FIRST;
 ```
 
+!!! warning "Hive Catalog Limitation"
+    When using a Hive catalog, adding a column with `FIRST` or `AFTER` may fail due to HMS
+    positional schema validation. See the Hive Catalog Limitation warning above for details
+    and the workaround.
+
 ### `ALTER TABLE ... RENAME COLUMN`
 
 Iceberg allows any field to be renamed. To rename a field, use `RENAME COLUMN`:
@@ -302,6 +328,10 @@ ALTER TABLE prod.db.sample ALTER COLUMN col FIRST;
 ALTER TABLE prod.db.sample ALTER COLUMN nested.col AFTER other_col;
 ```
 
+!!! warning "Hive Catalog Limitation"
+    When using a Hive catalog, reordering columns may fail due to HMS positional schema validation.
+    See the Hive Catalog Limitation warning above for details and the workaround.
+
 Nullability for a non-nullable column can be changed using `DROP NOT NULL`:
 
 ```sql
@@ -323,6 +353,11 @@ ALTER TABLE prod.db.sample DROP COLUMN id;
 ALTER TABLE prod.db.sample DROP COLUMN point.z;
 ```
 
+!!! warning "Hive Catalog Limitation"
+    When using a Hive catalog, dropping a non-last column may fail due to HMS positional schema
+    validation. See the Hive Catalog Limitation warning above for details and the
+    workaround.
+
 ## `ALTER TABLE` SQL extensions
 
 These commands are available in Spark 3 when using Iceberg [SQL extensions](spark-configuration.md#sql-extensions).
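
For reference, the embedded-HMS workaround in the `spark-ddl.md` warning above amounts to passing the documented `--conf` flag at session launch. This is a sketch only; the choice of `spark-sql` (rather than `spark-shell` or `spark-submit`) and any other launch options depend on your deployment:

```shell
# Sketch: start a Spark SQL session with the HMS positional-compatibility check disabled.
# The --conf value is the one given in the warning; all other launch details are illustrative.
spark-sql \
  --conf spark.hadoop.hive.metastore.disallow.incompatible.col.type.changes=false
```

The `spark.hadoop.` prefix is Spark's standard mechanism for forwarding a property into the Hadoop/Hive configuration used by the session, which is why it reaches the embedded metastore.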