Docs: Add Hive Metastore schema validation warnings for schema evolution with Hive catalog#15814
…nd REORDER
When using a Hive catalog, ALTER TABLE DROP COLUMN (non-last column) and ALTER COLUMN REORDER fail because the Hive Metastore validates schema changes by comparing column types positionally. Dropping a middle column shifts subsequent columns, causing HMS to reject the change as an incompatible type change via `MetaStoreUtils#throwExceptionIfIncompatibleColTypeChange`.
Add warning admonitions to spark-ddl.md (DROP COLUMN and REORDER sections) and flink-ddl.md (Hive catalog section) documenting the limitation, the workaround (`hive.metastore.disallow.incompatible.col.type.changes=false`), and the trade-off (the Hive engine can no longer read the table).
docs/docs/spark-ddl.md
Outdated
To work around this, disable the HMS schema compatibility check by setting
`hive.metastore.disallow.incompatible.col.type.changes=false` in `hive-site.xml`, or by passing
`--conf spark.hadoop.hive.metastore.disallow.incompatible.col.type.changes=false` when starting Spark.
I cannot recall the details, but I remember that overriding this config on the client side does not work on some HMS versions.
Hm, I tried it with Hive 2.3.9 and 3.1.3. Do you remember which HMS version failed for you?
Did you test with an embedded HMS or a remote HMS? IIRC, Hive 2.3.9 with a remote HMS does not work. I will check with our Hive team and share the result today.
I was testing with an embedded HMS. I just asked the colleague in charge of Hive: this property should be configured in the metastore itself, unless the user runs an embedded HMS as I did.
I will update the doc, thanks.
Back here: I got a response that overriding `hive.metastore.disallow.incompatible.col.type.changes` on the client side requires HIVE-17832 and HIVE-17942.
I have checked a remote HMS at version 3.1.3, which contains HIVE-17832 and HIVE-17942, and client-side overriding still does not work. I have therefore updated the documentation as follows:
- Remote HMS: Set this property in the HMS server's hive-site.xml.
- Embedded HMS: Add the equivalent property to the Hive catalog configuration.
you should apply these patches to the HMS client ...
> you should apply these patches to the HMS client ...
Oh, got it. I will double-check with Hive client 3.x.
I have tested with Flink 1.20 and Hive client 3.1.3, and it does not work.
The HMS client also needs to call `setMetaConf` explicitly for the override to take effect. However, neither Spark, Flink, nor Iceberg currently does this, so configuring it directly in the job will not work.
@jackylee-ch, I see. I have double-checked with our Hive team: there are additional internal changes on our side that make client-side overriding work. Sorry about the incomplete information.
wypoon
left a comment
It may be better to add the warning in the ALTER TABLE section before any of the ALTER TABLE ... XXX commands.
There you can call out the ALTER TABLE ... ADD COLUMN, ALTER TABLE ... ALTER COLUMN, and ALTER TABLE ... DROP COLUMN commands, if they change the ordinal position of existing columns.
This way, the top-level explanation and the workaround are presented first. Then, under the individual examples, you can point out that they change the ordinal position of existing columns (without needing to explain the workaround again).
LGTM. I will leave the PR open for a couple more days in case @manuzhang or others want to take a look.
wypoon
left a comment
This is better now. Thanks.
Pull request overview
Adds documentation warnings about Hive Metastore (HMS) schema-evolution limitations when using an Iceberg Hive catalog, specifically for schema changes that shift column positions and can fail due to HMS positional validation.
Changes:
- Added a detailed “Hive Catalog Limitation” warning to Spark DDL docs, plus shorter per-operation warnings for `ADD COLUMN ... FIRST/AFTER`, `ALTER COLUMN ... FIRST/AFTER`, and `DROP COLUMN` (non-last).
- Added an engine-agnostic “Hive Catalog Limitation” warning to the Flink DDL Hive catalog configuration section, including a workaround and trade-off explanation.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| docs/docs/spark-ddl.md | Adds warning admonitions describing HMS positional schema validation issues and suggested workaround/trade-offs for Spark DDL schema evolution. |
| docs/docs/flink-ddl.md | Adds a Hive catalog limitation warning in Flink DDL docs to make the HMS behavior and workaround visible in Flink context. |
`hive.metastore.disallow.incompatible.col.type.changes=false`:

- **Remote HMS:** Set this property in the HMS server's `hive-site.xml`.
- **Embedded HMS:** Add the equivalent property to the Hive catalog configuration.
The “Embedded HMS” workaround is unclear in the context of this Flink Hive catalog section (where uri is required and configuration is typically provided via hive-conf-dir/classpath). Consider rephrasing this to explicitly instruct users to set the property in a hive-site.xml picked up via hive-conf-dir (or classpath) rather than suggesting an “embedded” metastore configuration path that may not apply.
Suggested change:
Before: - **Embedded HMS:** Add the equivalent property to the Hive catalog configuration.
After: - **When configuring the Hive catalog in Flink:** Set this property in a `hive-site.xml` that Flink picks up via `hive-conf-dir` or from the classpath.
“Embedded HMS” here means running HMS with Derby, launched inside the Flink JobManager (JM).
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
When using a Hive catalog, schema evolution operations that change column positions — such as ALTER TABLE ... DROP COLUMN (non-last column) and ALTER TABLE ... ALTER COLUMN ... FIRST/AFTER (reorder) — fail with InvalidOperationException from MetaStoreUtils#throwExceptionIfIncompatibleColTypeChange. This is because the Hive Metastore validates schema changes by comparing column types positionally (by index, not by name), controlled by hive.metastore.disallow.incompatible.col.type.changes (default true).
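To make the positional comparison concrete, here is a simplified sketch of the idea in Python. It is not the Hive implementation: the function name `incompatible_positions` and the sample schema are hypothetical, and type compatibility is reduced to exact string equality, whereas the real check may accept some compatible type changes.

```python
from typing import List, Tuple

Column = Tuple[str, str]  # (column name, column type)


def incompatible_positions(old_cols: List[Column], new_cols: List[Column]) -> List[str]:
    """Compare column types index by index, ignoring column names.

    Returns the names of new columns whose type differs from the old
    column at the same position. Simplified illustration only.
    """
    overlap = min(len(old_cols), len(new_cols))
    return [
        new_cols[i][0]
        for i in range(overlap)
        if old_cols[i][1] != new_cols[i][1]
    ]


# Hypothetical table schema.
old = [("id", "bigint"), ("name", "string"), ("ts", "timestamp")]

drop_last = old[:-1]                 # drop "ts": remaining positions unchanged
drop_middle = [old[0], old[2]]       # drop "name": "ts" shifts to index 1
reorder = [old[1], old[0], old[2]]   # move "name" first: indexes 0 and 1 swap

print(incompatible_positions(old, drop_last))    # nothing flagged
print(incompatible_positions(old, drop_middle))  # "ts" flagged
print(incompatible_positions(old, reorder))      # "name" and "id" flagged
```

In this sketch, dropping the last column passes because no surviving column changes index, while dropping a middle column or reordering is flagged even though no column's own type changed.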
This limitation is not documented anywhere in the Iceberg docs, though the Iceberg test suite itself works around it by setting METASTORE_DISALLOW_INCOMPATIBLE_COL_TYPE_CHANGES=false (TestHiveMetastore.java:269), and a test comment (HiveTableTest.java:283-287) explicitly describes the issue.
This PR adds !!! warning admonition blocks to: