Summary
Apply a table's declared sort order when writing data files (append) and when compacting, so per-file column bounds are tight and non-overlapping across files. Sorting is currently listed as unsupported.
Current behavior
- README "Supported Features": Sorting ❌.
- `src/create.js` accepts and records a `sortOrder` in table metadata (`sort-orders` / `default-sort-order-id`), but `icebergAppend` writes records in input order, and compaction does not enforce the sort order.
Proposed
- Append: order records within each written data file by the table's default-sort-order (or a per-call override).
- Compaction / rewrite: produce sorted output files whose ranges are non-overlapping for the sort key.
- Keep it physical-only: row contents and counts unchanged.
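As a rough illustration of the append-side ordering, the sketch below sorts a batch of records before they are written. It is not the library's implementation: `makeComparator`, `sortRecords`, and the simplified `{ column, direction, nullOrder }` field shape are hypothetical names chosen for this example. Real Iceberg sort fields reference a source column id and a transform; this sketch assumes identity transforms and column names that are already resolved.

```javascript
// Hypothetical, simplified sort field: { column, direction, nullOrder }.
// Assumes identity transforms and already-resolved column names.
function compareValues(a, b) {
  if (a < b) return -1
  if (a > b) return 1
  return 0
}

// Build a row comparator from an ordered list of sort fields.
function makeComparator(sortFields) {
  return (rowA, rowB) => {
    for (const { column, direction = 'asc', nullOrder = 'nulls-first' } of sortFields) {
      const a = rowA[column] ?? null
      const b = rowB[column] ?? null
      if (a === null || b === null) {
        if (a === null && b === null) continue
        // Null placement is decided by nullOrder, independent of direction.
        const nullsFirst = nullOrder === 'nulls-first'
        return (a === null) === nullsFirst ? -1 : 1
      }
      const cmp = compareValues(a, b)
      if (cmp !== 0) return direction === 'desc' ? -cmp : cmp
    }
    return 0
  }
}

// Physical-only: returns a reordered copy; row contents and count are unchanged.
function sortRecords(records, sortFields) {
  return [...records].sort(makeComparator(sortFields))
}
```

For compaction, the same comparator could be applied to the combined input before splitting it into output files, so that consecutive files cover consecutive key ranges. Rows with equal sort keys may still straddle a file boundary, in which case adjacent files share a boundary value.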
Acceptance criteria
- Data files produced by append and compaction are ordered by the declared sort order.
- After compaction, per-file column bounds for the sort key are non-overlapping.
- Round-trip read returns identical rows (modulo order).
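The non-overlap criterion could be checked in a test along the following lines. This is a sketch under stated assumptions: `findOverlaps` is a hypothetical helper, and its input is assumed to be one `{ lower, upper }` pair per data file for the sort key, already decoded to comparable JavaScript values. It also assumes that two files sharing a single boundary value do not count as overlapping; a stricter check would use `<=`.

```javascript
// Hypothetical helper: returns pairs of file bounds whose ranges overlap.
// Input: array of { lower, upper } for the sort key, one entry per data file.
function findOverlaps(fileBounds) {
  const sorted = [...fileBounds].sort((a, b) =>
    a.lower < b.lower ? -1 : a.lower > b.lower ? 1 : 0
  )
  const overlaps = []
  // Track the file with the largest upper bound seen so far, so a wide
  // early file is still compared against every later file.
  let widest = sorted[0]
  for (let i = 1; i < sorted.length; i++) {
    const current = sorted[i]
    // Assumption: a shared boundary value (lower === upper) is tolerated.
    if (current.lower < widest.upper) overlaps.push([widest, current])
    if (current.upper > widest.upper) widest = current
  }
  return overlaps
}
```

An empty result means the files' sort-key ranges are disjoint apart from possibly shared boundary values.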
Why it matters
Sorted/clustered layout is what makes file-level and row-group-level pruning (companion issues) effective — without it, file and row-group bounds overlap and pruning can skip very little. Callers can pre-sort a single append's input as a partial workaround, but cross-file and compaction-time sort need engine support.