fix(iceberg): normalize float/double stats before encoding#798
parisni wants to merge 3 commits into apache:main
vinishjail97 left a comment
Can we add a test for this? I want to understand the bug.
Hi @vinishjail97, I added a test and also generalized the fix to cover any type evolution. The problem is that old Parquet footers hold statistics in a format that differs from the type in the current Iceberg table. The idea is to coerce them on a best-effort basis.
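A minimal sketch of what such best-effort coercion could look like, assuming stats arrive as boxed Java numbers. The class `StatCoercion`, the enum `ColumnType`, and the method `coerce` are hypothetical names for illustration and are not XTable or Iceberg APIs; the change in this PR may be structured differently.

```java
import java.util.Optional;

final class StatCoercion {
  /** Hypothetical subset of column types; not XTable's internal type model. */
  enum ColumnType { INT, LONG, FLOAT, DOUBLE }

  /**
   * Best-effort coercion of a min/max stat read from an old file footer
   * to the column's current type. Only lossless widenings are applied
   * (int to long, float to double); anything else yields an empty result
   * so the caller can drop the stat instead of emitting a wrong bound.
   */
  static Optional<Number> coerce(Object stat, ColumnType current) {
    if (!(stat instanceof Number)) {
      return Optional.empty();
    }
    Number n = (Number) stat;
    switch (current) {
      case INT:
        if (n instanceof Integer) return Optional.of(n);
        return Optional.empty();
      case LONG:
        if (n instanceof Long) return Optional.of(n);
        if (n instanceof Integer) return Optional.of(Long.valueOf(n.longValue()));
        return Optional.empty();
      case FLOAT:
        // A double stat is not narrowed to float: rounding could move a bound.
        if (n instanceof Float) return Optional.of(n);
        return Optional.empty();
      case DOUBLE:
        if (n instanceof Double) return Optional.of(n);
        // float to double widening is exact, so the bound stays valid.
        if (n instanceof Float) return Optional.of(Double.valueOf(n.doubleValue()));
        return Optional.empty();
      default:
        return Optional.empty();
    }
  }
}
```

Returning an empty `Optional` rather than throwing reflects the "best effort" framing: a stat that cannot be coerced safely is simply omitted.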
@parisni if the issue is on the Hudi side, then the proper fix is to move this to the Hudi side. Otherwise every target needs to understand the output types from Hudi. The stats are meant to match the schema type according to the docs for ranges: https://github.com/apache/incubator-xtable/blob/main/xtable-api/src/main/java/org/apache/xtable/model/stat/Range.java#L40
> if the issue is on the Hudi side
Not sure about that. XTable gets the Hudi stats from the Parquet files, not from a Hudi API. So my understanding is that it is XTable's responsibility to coerce stats coming from the Parquet footer when type evolution has happened.
(In reply to the-other-tim-brown's comment of February 5, 2026, 4:51:17 PM UTC.)
Yes, this is what I am saying. The HudiConversionSource needs to comply with the XTable Spec. The target is assuming the source will produce the range data according to the spec.
What is the purpose of the pull request
This pull request normalizes float and double stats before encoding Iceberg column bounds.
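As a rough illustration of "normalize, then encode", the sketch below converts a numeric stat to the column's floating-point type before writing it as bytes. It uses only the JDK and mirrors the little-endian layout the Iceberg spec gives for float (4 bytes) and double (8 bytes) single values. The class and method names are hypothetical; this is not the code in this PR and does not call Iceberg's own conversion utilities.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class BoundEncoding {
  /** Encodes a bound for a double column; a Float stat is widened first. */
  static ByteBuffer encodeDoubleBound(Number stat) {
    double normalized = stat.doubleValue(); // exact for Float inputs
    ByteBuffer buf = ByteBuffer.allocate(Double.BYTES).order(ByteOrder.LITTLE_ENDIAN);
    buf.putDouble(0, normalized); // absolute put: position stays at 0
    return buf;
  }

  /** Encodes a bound for a float column; only Float stats are accepted. */
  static ByteBuffer encodeFloatBound(Number stat) {
    if (!(stat instanceof Float)) {
      // Assumption of this sketch: refuse lossy narrowing rather than round a bound.
      throw new IllegalArgumentException("expected a Float stat, got " + stat.getClass());
    }
    ByteBuffer buf = ByteBuffer.allocate(Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
    buf.putFloat(0, stat.floatValue());
    return buf;
  }
}
```

Without the normalization step, a `Float` stat handed to an encoder that expects a `Double` for a double column would be a type mismatch, which is the kind of situation the description above refers to.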
Brief change log
Verify this pull request
This pull request is a trivial rework / code cleanup without any test coverage.