Write: honor declared sort orders on append and compaction #22

Description

@philcunliffe

Summary

Apply a table's declared sort order when writing data files (append) and when compacting, so per-file column bounds are tight and non-overlapping across files. Sorting is currently listed as unsupported.

Current behavior

  • README "Supported Features": Sorting ❌.
  • src/create.js accepts and records a sortOrder in table metadata (sort-orders / default-sort-order-id), but icebergAppend writes records in input order, and compaction does not enforce the sort order.

Proposed

  • Append: order records within each written data file by the table's default sort order (the entry in sort-orders referenced by default-sort-order-id), or by a per-call override.
  • Compaction / rewrite: produce sorted output files whose ranges are non-overlapping for the sort key.
  • Keep it physical-only: row contents and counts unchanged.
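
A minimal sketch of the in-file ordering step, for illustration only. It assumes the sort order uses the Iceberg metadata field shape (`source-id`, `transform`, `direction`, `null-order`), that records are plain objects keyed by column name, and that a `fieldNamesById` map from field id to column name is available. `makeSortComparator` and `sortRecords` are hypothetical names, not existing functions in this repo, and only the `identity` transform is handled.

```javascript
// Hypothetical helper: build a record comparator from an Iceberg sort order.
// Assumption: only the identity transform is supported in this sketch.
function makeSortComparator(sortOrder, fieldNamesById) {
  const keys = sortOrder.fields.map((f) => {
    if (f.transform !== 'identity') {
      throw new Error(`unsupported transform in sketch: ${f.transform}`)
    }
    return {
      name: fieldNamesById.get(f['source-id']),
      sign: f.direction === 'desc' ? -1 : 1,
      // -1 places nulls before non-null values, 1 places them after
      nullSign: f['null-order'] === 'nulls-first' ? -1 : 1,
    }
  })
  return (a, b) => {
    for (const { name, sign, nullSign } of keys) {
      const x = a[name] ?? null
      const y = b[name] ?? null
      if (x === null && y === null) continue
      if (x === null) return nullSign
      if (y === null) return -nullSign
      if (x < y) return -sign
      if (x > y) return sign
    }
    return 0
  }
}

// Returns a sorted copy, so the caller's input array is left untouched.
function sortRecords(records, sortOrder, fieldNamesById) {
  return [...records].sort(makeSortComparator(sortOrder, fieldNamesById))
}

// Usage: sort by field id 1 ("id"), descending, nulls last.
const sortOrder = {
  'order-id': 1,
  fields: [{ 'source-id': 1, transform: 'identity', direction: 'desc', 'null-order': 'nulls-last' }],
}
const fieldNamesById = new Map([[1, 'id']])
const sorted = sortRecords([{ id: 2 }, { id: null }, { id: 5 }], sortOrder, fieldNamesById)
```

Because the sort works on a copy and only reorders rows, row contents and counts stay unchanged, which matches the physical-only constraint above.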

Acceptance criteria

  • Data files produced by append and compaction are ordered by the declared sort order.
  • After compaction, per-file column bounds for the sort key are non-overlapping.
  • A round-trip read returns the same rows as before; only their order may differ.
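
The non-overlap criterion could be checked in a test along these lines. This is a sketch under assumptions: each file is represented as `{ path, lower, upper }` with already-decoded bounds for the leading sort column, and `findOverlaps` is a hypothetical helper name. Bounds that merely touch (one file's upper equal to the next file's lower) are reported as overlapping here; if duplicate key values may span files, that comparison would need to be relaxed.

```javascript
// Hypothetical test helper: report pairs of files whose [lower, upper]
// ranges for the sort key overlap. Assumes bounds are already decoded
// into comparable JavaScript values.
function findOverlaps(files) {
  const byLower = [...files].sort((a, b) =>
    a.lower < b.lower ? -1 : a.lower > b.lower ? 1 : 0
  )
  const overlaps = []
  // Track the file with the largest upper bound seen so far.
  let widest = byLower[0]
  for (let i = 1; i < byLower.length; i++) {
    const file = byLower[i]
    if (file.lower <= widest.upper) overlaps.push([widest.path, file.path])
    if (file.upper > widest.upper) widest = file
  }
  return overlaps
}

// Usage: an empty result means the layout satisfies the criterion.
const clean = findOverlaps([
  { path: 'a.parquet', lower: 1, upper: 5 },
  { path: 'b.parquet', lower: 6, upper: 9 },
])
const dirty = findOverlaps([
  { path: 'a.parquet', lower: 1, upper: 5 },
  { path: 'b.parquet', lower: 4, upper: 9 },
])
```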

Why it matters

Sorted/clustered layout is what makes file-level and row-group-level pruning (companion issues) effective. Without it, file and row-group bounds overlap and pruning can skip very little. Callers can pre-sort a single append's input as a partial workaround, but cross-file and compaction-time sorting need engine support.
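
A toy illustration of the pruning effect, using made-up bounds rather than measured data. `filesToScan` is a hypothetical stand-in for file-level pruning on an equality predicate: it keeps only the files whose bounds could contain the value.

```javascript
// Hypothetical stand-in for file-level pruning: keep files whose
// [lower, upper] bounds could contain the requested value.
function filesToScan(files, value) {
  return files.filter((f) => f.lower <= value && value <= f.upper)
}

// Unsorted appends: every file spans almost the whole key range.
const unsortedLayout = [
  { path: 'a.parquet', lower: 1, upper: 98 },
  { path: 'b.parquet', lower: 2, upper: 99 },
  { path: 'c.parquet', lower: 0, upper: 97 },
]

// Sorted and compacted: each file covers a disjoint slice of the range.
const clusteredLayout = [
  { path: 'a.parquet', lower: 0, upper: 33 },
  { path: 'b.parquet', lower: 34, upper: 66 },
  { path: 'c.parquet', lower: 67, upper: 99 },
]

const scannedUnsorted = filesToScan(unsortedLayout, 50).length   // 3 files
const scannedClustered = filesToScan(clusteredLayout, 50).length // 1 file
```

With the same rows and the same predicate, the clustered layout lets the reader skip two of the three files, while the unsorted layout skips none.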
