Core, API, Spec: Extend partition statistics support to unpartitioned tables by ebyhr · Pull Request #15896 · apache/iceberg

ebyhr · 2026-04-06T06:55:40Z

Proposed Change

Currently, the partition statistics file is only supported for partitioned tables.

iceberg/core/src/main/java/org/apache/iceberg/PartitionStatsHandler.java

Line 207 in b987e60

    
           Preconditions.checkArgument(Partitioning.isPartitioned(table), "Table must be partitioned");

While this design makes sense since it’s conceptually a “partition” statistics file, the information it contains is also valuable for unpartitioned tables.

For example, the Trino Iceberg connector currently needs to read manifest files (TableStatisticsReader) to build internal table statistics. If Iceberg provided a statistics file for unpartitioned tables as well, Trino (and other query engines) could leverage it directly, improving planning performance by avoiding expensive manifest reads.

Fixes Extend partition statistics support to unpartitioned tables #14375

ebyhr · 2026-04-06T07:27:36Z

format/spec.md

 | v1 | v2 | v3 | Field id, name | Type | Description |
 |----|----|----|----------------|------|-------------|
-| _required_ | _required_ | _required_ | **`1 partition`** | `struct<..>` | Partition data tuple, schema based on the unified partition type considering all specs in a table |
+| _required_ | _required_ | _required_ | **`1 partition`** | `struct<..>` | Partition data tuple, schema based on the unified partition type considering all specs in a table, empty for unpartitioned tables |


I'm happy to post this change to the mailing list if needed.

yes. any spec change has to go for vote.

And whats the use case? You want to know data files, delete files, file size for unpartitioned table? We can use metadata tables for that Or maybe the new v4 manifests can solve it (with less I/O)?

Also, since the name itself is partitions stats. It makes odd to generate the file for unpartitioned table.

@ajantha-bhat Thanks, I'll post after exploring another approach, since Parquet writer can't write an empty struct.

We want to obtain the record count more quickly and efficiently for CBO. As mentioned in the PR description, Trino currently needs to read manifest files to estimate table size.

Even if v4 manifests address this, migration would take a long time. We need a solution that works for tables with format versions < v4 as well.

It is still a behavior change for existing users.
Partition stats was computed only for partition tables. Now you are enabling for non-partition tables with NULL partition. Plus like I mentioned, it is still odd for non partition table to write this file as it is via compute_partition_stats method. Why does user has to call compute_partition_stats on non-partition table?

I can understand that you need a quick table level stats for CBO in Trino without doing I/O of all the manifests.

a) Can we introduce a new table level stats (along with NDV puffin file) in compute_table_stats? People can still refer it for both partition table and non-partition table if they need whole table level info?
b) Or Can we check if snapshot summary already has this table level stats you are looking for? (we don't have to do multiple IO of the files in that case). If not, can we enhance snapshot summary to include it?

ebyhr · 2026-04-06T07:55:56Z

This PR ran into a known issue in the Parquet writer, which cannot handle writing an empty struct. https://issues.apache.org/jira/browse/SPARK-20593

I'm going to change the value to NULL.

ajantha-bhat

I think we should explore other options as It is still a behavior change for existing users.

Partition stats was computed only for partition tables. Now you are enabling for non-partition tables with NULL partition. Plus like I mentioned, it is still odd for non partition table to write this file as it is via 'compute_partition_stats' method. Why does user has to call compute_partition_stats on non-partition table?

I can understand that you need a quick table level stats for CBO in Trino without doing I/O of all the manifests.

a) Can we introduce a new table level stats (along with NDV puffin file) in compute_table_stats? People can still refer it for both partition table and non-partition table if they need whole table level info?
b) Or Can we check if snapshot summary already has this table level stats you are looking for? (we don't have to do multiple IO of the files in that case). If not, can we enhance snapshot summary to include it?

github-actions bot added API core labels Apr 6, 2026

ebyhr force-pushed the ebi/unpartition-stats branch 2 times, most recently from 9bca81e to 0dd88d5 Compare April 6, 2026 07:06

ebyhr mentioned this pull request Apr 6, 2026

Extend partition statistics support to unpartitioned tables #14375

Open

6 tasks

ebyhr force-pushed the ebi/unpartition-stats branch from 0dd88d5 to da57f42 Compare April 6, 2026 07:10

github-actions bot added the Specification Issues that may introduce spec changes. label Apr 6, 2026

ebyhr force-pushed the ebi/unpartition-stats branch from da57f42 to 1cda0e7 Compare April 6, 2026 07:12

ebyhr changed the title ~~Core: Extend partition statistics support to unpartitioned tables~~ Core, API: Extend partition statistics support to unpartitioned tables Apr 6, 2026

ebyhr commented Apr 6, 2026

View reviewed changes

Core: Extend partition statistics support to unpartitioned tables

1a0c198

ebyhr force-pushed the ebi/unpartition-stats branch from 1cda0e7 to 1a0c198 Compare April 6, 2026 07:35

github-actions bot added the ORC label Apr 6, 2026

raunaqmorarka approved these changes Apr 6, 2026

View reviewed changes

fixup! Core: Extend partition statistics support to unpartitioned tables

638ab9f

ebyhr force-pushed the ebi/unpartition-stats branch from e6578a4 to 638ab9f Compare April 6, 2026 10:09

ebyhr changed the title ~~Core, API: Extend partition statistics support to unpartitioned tables~~ Core, API, Spec: Extend partition statistics support to unpartitioned tables Apr 6, 2026

ajantha-bhat suggested changes Apr 6, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Core, API, Spec: Extend partition statistics support to unpartitioned tables#15896

Core, API, Spec: Extend partition statistics support to unpartitioned tables#15896
ebyhr wants to merge 2 commits intoapache:mainfrom
ebyhr:ebi/unpartition-stats

ebyhr commented Apr 6, 2026 •

edited

Loading

Uh oh!

ebyhr Apr 6, 2026

Uh oh!

ajantha-bhat Apr 6, 2026

Uh oh!

ebyhr Apr 6, 2026

Uh oh!

ajantha-bhat Apr 6, 2026

Uh oh!

ebyhr commented Apr 6, 2026 •

edited

Loading

Uh oh!

ajantha-bhat left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

ebyhr commented Apr 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Proposed Change

Uh oh!

ebyhr Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

ajantha-bhat Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

ebyhr Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

ajantha-bhat Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

ebyhr commented Apr 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ajantha-bhat left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ebyhr commented Apr 6, 2026 •

edited

Loading

ebyhr commented Apr 6, 2026 •

edited

Loading