Docs: Add Hive Metastore schema validation warnings for schema evolution with Hive catalog#15814

Open
jackylee-ch wants to merge 4 commits into apache:main from jackylee-ch:docs-hive-catalog-schema-evolution-warning

@jackylee-ch
Contributor

When using a Hive catalog, schema evolution operations that change column positions — such as ALTER TABLE ... DROP COLUMN (non-last column) and ALTER TABLE ... ALTER COLUMN ... FIRST/AFTER (reorder) — fail with InvalidOperationException from MetaStoreUtils#throwExceptionIfIncompatibleColTypeChange. This is because the Hive Metastore validates schema changes by comparing column types positionally (by index, not by name), controlled by hive.metastore.disallow.incompatible.col.type.changes (default true).
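
The positional comparison described above can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the actual HMS code: the `Column` class, the function names, and the example table are hypothetical, and "compatible" is simplified to exact type equality, whereas the real check may accept some type changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    type: str


class InvalidOperationException(Exception):
    """Stand-in for the HMS exception of the same name."""


def check_positional_compatibility(old_cols, new_cols):
    # Compare types index by index over the shared prefix; names are ignored.
    # Assumption: compatibility is simplified to exact type equality.
    bad = [
        new.name
        for old, new in zip(old_cols, new_cols)
        if old.type != new.type
    ]
    if bad:
        raise InvalidOperationException(
            "incompatible column type change at: " + ", ".join(bad)
        )


def would_pass(old_cols, new_cols):
    # Convenience wrapper: True if the modeled check accepts the change.
    try:
        check_positional_compatibility(old_cols, new_cols)
        return True
    except InvalidOperationException:
        return False


# Hypothetical table schema used only for illustration.
table = [Column("id", "bigint"), Column("name", "string"), Column("ts", "timestamp")]

drop_last = table[:2]                      # models DROP COLUMN ts
drop_middle = [table[0], table[2]]         # models DROP COLUMN name
reorder = [table[1], table[0], table[2]]   # models ALTER COLUMN name FIRST
```

In this model, `would_pass(table, drop_last)` is true because the remaining columns keep their positions, while `drop_middle` and `reorder` are rejected because a different type now sits at an existing index. If the shifted columns happened to share a type, the modeled check would not notice the change at all.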

This limitation is not documented anywhere in the Iceberg docs, though the Iceberg test suite itself works around it by setting METASTORE_DISALLOW_INCOMPATIBLE_COL_TYPE_CHANGES=false (TestHiveMetastore.java:269), and a test comment (HiveTableTest.java:283-287) explicitly describes the issue.

This PR adds !!! warning admonition blocks to:

  • spark-ddl.md — under the DROP COLUMN section (full explanation, workaround, and trade-off) and the ALTER COLUMN reorder section (with cross-reference)
  • flink-ddl.md — under the Hive catalog configuration section (engine-agnostic warning)

…nd REORDER

When using a Hive catalog, ALTER TABLE DROP COLUMN (non-last column) and
ALTER COLUMN REORDER fail because the Hive Metastore validates schema
changes by comparing column types positionally. Dropping a middle column
shifts subsequent columns, causing HMS to reject the change as an
incompatible type change via MetaStoreUtils#throwExceptionIfIncompatibleColTypeChange.

Add warning admonitions to spark-ddl.md (DROP COLUMN and REORDER sections)
and flink-ddl.md (Hive catalog section) documenting the limitation,
workaround (hive.metastore.disallow.incompatible.col.type.changes=false),
and trade-off (Hive engine can no longer read the table).
@github-actions github-actions bot added the docs label Mar 28, 2026

To work around this, disable the HMS schema compatibility check by setting
`hive.metastore.disallow.incompatible.col.type.changes=false` in `hive-site.xml`, or by passing
`--conf spark.hadoop.hive.metastore.disallow.incompatible.col.type.changes=false` when starting Spark.
Member

@pan3793 pan3793 Mar 29, 2026

I can't recall the details, but I remember that overriding this config on the client side does not work on some HMS versions.

Contributor Author

Hm, I have tried it with Hive 2.3.9 and 3.1.3. Do you remember which HMS version failed for you?

Member

Did you test with embedded HMS or remote HMS? IIRC, Hive 2.3.9 with remote HMS does not work. I will contact our Hive team and sync the result today.

Contributor Author

I was testing with embedded HMS. I just asked the colleague in charge of Hive: this property should be configured on the metastore itself, unless the user runs an embedded HMS as I did.

I will update the doc, thanks.

Member

Back here: I got a response that overriding hive.metastore.disallow.incompatible.col.type.changes on the client side requires HIVE-17832 and HIVE-17942.

Contributor Author

I have checked a remote HMS at version 3.1.3, which contains HIVE-17832 and HIVE-17942, and client-side overriding still doesn't work. Thus, I updated the documentation as follows.
- Remote HMS: Set this property in the HMS server's hive-site.xml.
- Embedded HMS: Add the equivalent property to the Hive catalog configuration.

Member

you should apply these patches to the HMS client ...

Contributor Author

> you should apply these patches to the HMS client ...

Oh, got it. I will double-check with Hive client 3.x.

Contributor Author

I have tested with Flink 1.20 and Hive client 3.1.3, and it doesn't work.

The HMS client also needs to call setMetaConf explicitly for the override to take effect. However, neither Spark, Flink, nor Iceberg currently does this, so configuring it directly in the job will not work.

Member

@jackylee-ch, I see. I have double-checked with our Hive team: there are additional internal changes that make client-side overriding work. Sorry about the incomplete information.

Contributor

@wypoon wypoon left a comment

It may be better to add the warning in the ALTER TABLE section before any of the ALTER TABLE ... XXX commands.
There you can call out the ALTER TABLE ... ADD COLUMN, ALTER TABLE ... ALTER COLUMN, and ALTER TABLE ... DROP COLUMN commands, if they change the ordinal position of existing columns.
This way, the top-level explanation and the workaround are presented first. Then, under the individual examples, you can point out that they change the ordinal position of existing columns (without needing to explain the workaround).

@jackylee-ch
Contributor Author

cc @manuzhang @huaxingao

@huaxingao
Contributor

LGTM. I will leave the PR open for a couple more days in case @manuzhang or others want to take a look.

Contributor

@wypoon wypoon left a comment

This is better now. Thanks.


Copilot AI left a comment

Pull request overview

Adds documentation warnings about Hive Metastore (HMS) schema-evolution limitations when using an Iceberg Hive catalog, specifically for schema changes that shift column positions and can fail due to HMS positional validation.

Changes:

  • Added a detailed “Hive Catalog Limitation” warning to Spark DDL docs, plus shorter per-operation warnings for ADD COLUMN ... FIRST/AFTER, ALTER COLUMN ... FIRST/AFTER, and DROP COLUMN (non-last).
  • Added an engine-agnostic “Hive Catalog Limitation” warning to the Flink DDL Hive catalog configuration section, including a workaround and trade-off explanation.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| docs/docs/spark-ddl.md | Adds warning admonitions describing HMS positional schema validation issues and suggested workaround/trade-offs for Spark DDL schema evolution. |
| docs/docs/flink-ddl.md | Adds a Hive catalog limitation warning in Flink DDL docs to make the HMS behavior and workaround visible in Flink context. |

`hive.metastore.disallow.incompatible.col.type.changes=false`:

- **Remote HMS:** Set this property in the HMS server's `hive-site.xml`.
- **Embedded HMS:** Add the equivalent property to the Hive catalog configuration.

Copilot AI Apr 4, 2026

The “Embedded HMS” workaround is unclear in the context of this Flink Hive catalog section (where uri is required and configuration is typically provided via hive-conf-dir/classpath). Consider rephrasing this to explicitly instruct users to set the property in a hive-site.xml picked up via hive-conf-dir (or classpath) rather than suggesting an “embedded” metastore configuration path that may not apply.

Suggested change
- **Embedded HMS:** Add the equivalent property to the Hive catalog configuration.
- **When configuring the Hive catalog in Flink:** Set this property in a `hive-site.xml`
that Flink picks up via `hive-conf-dir` or from the classpath.

Contributor Author

Embedded HMS means running HMS with Derby, launched inside the Flink JobManager (JM).

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
5 participants