fix(iceberg): normalize float/double stats before encoding#798

Open
parisni wants to merge 3 commits into apache:main from leboncoin:pr-fix-iceberg-stats

parisni commented Feb 3, 2026

Important Read

  • GitHub issue: TBD

What is the purpose of the pull request

This pull request normalizes float and double stats before encoding Iceberg column bounds.

Brief change log

  • Normalize min/max stat values for FLOAT/DOUBLE before Conversions.toByteBuffer.
  • Added helper to coerce numeric stats to the expected primitive type.
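
The normalization in the change log can be sketched as follows. This is a minimal illustration and not the code in this PR: `FloatStatNormalizer`, `TargetType`, `normalize`, and `encode` are hypothetical names, and `encode` is only a stand-in for `Conversions.toByteBuffer` that writes little-endian bytes so the example runs without Iceberg on the classpath.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: coerce a numeric stat to the boxed type that matches
// the column type before it is serialized as a column bound.
final class FloatStatNormalizer {
    // Assumed stand-in for the FLOAT/DOUBLE cases of the target schema type.
    enum TargetType { FLOAT, DOUBLE }

    // Returns a Float for FLOAT columns and a Double for DOUBLE columns.
    // Non-numeric values are returned unchanged.
    static Object normalize(Object stat, TargetType target) {
        if (!(stat instanceof Number)) {
            return stat;
        }
        Number n = (Number) stat;
        switch (target) {
            case FLOAT:
                return n.floatValue();
            case DOUBLE:
                return n.doubleValue();
            default:
                throw new IllegalArgumentException("Unsupported type: " + target);
        }
    }

    // Stand-in encoder (assumption): 4 bytes for Float, 8 bytes for Double,
    // little-endian. It rejects any other type, which is why a Float stat
    // must be normalized before it is encoded for a DOUBLE column.
    static ByteBuffer encode(Object value) {
        if (value instanceof Float) {
            return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN)
                    .putFloat(0, (Float) value);
        }
        if (value instanceof Double) {
            return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN)
                    .putDouble(0, (Double) value);
        }
        throw new IllegalArgumentException("Not a float/double stat: " + value);
    }

    public static void main(String[] args) {
        // A stat read as Float for a column whose current type is DOUBLE.
        Object raw = 1.5f;
        Object normalized = normalize(raw, TargetType.DOUBLE);
        System.out.println(normalized.getClass().getSimpleName());
        System.out.println(encode(normalized).remaining());
    }
}
```

Widening a `Float` to a `Double` is exact, so the bound keeps its value. Narrowing a `Double` to a `Float` can round, which is one reason to treat this as a sketch rather than a complete policy.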

Verify this pull request

This pull request is a trivial rework / code cleanup without any test coverage.

vinishjail97 left a comment

Can we add a test for this? I want to understand the bug.

parisni commented Feb 5, 2026

Hi @vinishjail97, I added a test and also generalized the fix to cover any type evolution.
I hit the problem on Hudi tables that had gone through type evolution, such as float->double or even string->int.

The problem is that the old Parquet footers hold statistics in a format that differs from the types of the current Iceberg table. The idea is to coerce them on a best-effort basis.
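
A best-effort coercion of this kind could look roughly like the sketch below. It is an illustration under assumptions, not the implementation in this PR: `BestEffortStatCoercion`, `TargetType`, and `coerce` are hypothetical names, and returning an empty `Optional` (dropping the stat) when a value cannot be converted exactly is an assumed policy.

```java
import java.math.BigDecimal;
import java.util.Optional;

// Hypothetical best-effort coercion of a footer stat to the current column type.
final class BestEffortStatCoercion {
    // Assumed stand-in for the current column type of the target table.
    enum TargetType { INT, LONG, FLOAT, DOUBLE, STRING }

    // Returns the coerced value, or Optional.empty() when the stat cannot be
    // converted (unparseable text, fractional value for an integer column,
    // overflow, NaN or infinity).
    static Optional<Object> coerce(Object stat, TargetType target) {
        if (stat == null) {
            return Optional.empty();
        }
        try {
            Object result;
            switch (target) {
                case INT:
                    result = toBigDecimal(stat).intValueExact();
                    break;
                case LONG:
                    result = toBigDecimal(stat).longValueExact();
                    break;
                case FLOAT:
                    result = toBigDecimal(stat).floatValue();
                    break;
                case DOUBLE:
                    result = toBigDecimal(stat).doubleValue();
                    break;
                case STRING:
                    result = stat.toString();
                    break;
                default:
                    return Optional.empty();
            }
            return Optional.of(result);
        } catch (ArithmeticException | NumberFormatException e) {
            // Best effort: drop the stat instead of failing the whole sync.
            return Optional.empty();
        }
    }

    // Converts numbers and numeric strings to an exact decimal representation.
    private static BigDecimal toBigDecimal(Object stat) {
        if (stat instanceof Float || stat instanceof Double) {
            // Exact binary value; throws NumberFormatException for NaN/infinity.
            return new BigDecimal(((Number) stat).doubleValue());
        }
        return new BigDecimal(stat.toString().trim());
    }

    public static void main(String[] args) {
        System.out.println(coerce("42", TargetType.INT));
        System.out.println(coerce("abc", TargetType.INT));
        System.out.println(coerce(1.5f, TargetType.DOUBLE));
    }
}
```

One caveat for string->int: string min/max values follow lexicographic order ("10" sorts before "9"), so even a string that parses cleanly is not guaranteed to be a valid numeric bound.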

@the-other-tim-brown

@parisni if the issue is on the Hudi side then the proper fix is to move this to the Hudi side. Otherwise every target needs to understand the output types from Hudi. The stats are meant to match the schema type according to the docs for ranges: https://github.com/apache/incubator-xtable/blob/main/xtable-api/src/main/java/org/apache/xtable/model/stat/Range.java#L40

parisni commented Feb 5, 2026 via email

@the-other-tim-brown

> > if the issue is on the Hudi side
>
> Not sure about that. XTable gets the Hudi stats from the Parquet files, not from a Hudi API. So my understanding is that it is XTable's responsibility to coerce stats coming from the Parquet footer when type evolution has happened.

Yes, this is what I am saying. The HudiConversionSource needs to comply with the XTable Spec. The target is assuming the source will produce the range data according to the spec.
