Add Iceberg output format and Postgres ingest flow#2

Merged
kraftaa merged 9 commits into main from iceberg on Mar 8, 2026

kraftaa commented Feb 25, 2026

Summary

This PR adds Apache Iceberg support to rustream and includes an ingest path for loading S3/local data into Postgres.
It is rebased on top of reliability improvements from main (idempotent incremental checkpoints + composite cursor paging).

What’s included

  1. Iceberg write support
  • new Iceberg writer pipeline and table write flow
  • load-or-create table behavior
  • commit via Iceberg transaction path
  2. Catalog integration
  • Iceberg catalog wiring and config support
  • warehouse/catalog initialization for Iceberg mode
  3. Sync integration
  • sync path supports format = iceberg
  • Parquet path remains available
  • incremental cursor logic from main preserved
  4. Ingest subcommand
  • adds S3/local -> Postgres ingest command path
  5. Docs/config updates
  • README updates for Iceberg + ingest usage
  • config examples updated for new format options

Important behavior note

  • Incremental correctness remains based on the composite cursor (incremental_column + incremental_tiebreaker_column) and the checkpointed progress state.
  • For Iceberg mode, progress is checkpointed after a successful Iceberg commit.
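
To make the two points above concrete, here is a minimal, self-contained sketch of composite-cursor paging with checkpoint-after-commit ordering. It is not taken from the rustream source: `Cursor`, `Row`, `next_page`, and `sync` are hypothetical names, both cursor columns are assumed to be integers, and the `commit` closure stands in for the Iceberg transaction commit.

```rust
// Hypothetical types for illustration; not taken from the rustream source.

/// Composite cursor: (incremental_column, incremental_tiebreaker_column).
/// The derived `Ord` compares fields in declaration order, so the tiebreaker
/// only matters when two rows share the same watermark.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Cursor {
    pub watermark: i64,  // assumed: e.g. updated_at as epoch millis
    pub tiebreaker: i64, // assumed: e.g. a unique id
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Row {
    pub cursor: Cursor,
    pub payload: String,
}

/// Return up to `limit` rows strictly after `after`, in cursor order.
/// Similar in spirit to `WHERE (wm, id) > ($1, $2) ORDER BY wm, id LIMIT $3`.
pub fn next_page(rows: &[Row], after: Option<Cursor>, limit: usize) -> Vec<Row> {
    let mut page: Vec<Row> = rows
        .iter()
        .filter(|r| after.map_or(true, |c| r.cursor > c))
        .cloned()
        .collect();
    page.sort_by_key(|r| r.cursor);
    page.truncate(limit);
    page
}

/// Sync all remaining pages. The checkpoint advances only after `commit`
/// returns Ok, so a failed commit leaves it unchanged and a retry re-reads
/// the same page instead of skipping it.
pub fn sync<F>(
    rows: &[Row],
    checkpoint: &mut Option<Cursor>,
    limit: usize,
    mut commit: F,
) -> Result<usize, String>
where
    F: FnMut(&[Row]) -> Result<(), String>,
{
    let mut written = 0;
    loop {
        let page = next_page(rows, *checkpoint, limit);
        let Some(last) = page.last() else {
            return Ok(written); // nothing left to read
        };
        let last_cursor = last.cursor;
        commit(&page)?; // stand-in for the Iceberg commit
        *checkpoint = Some(last_cursor); // record progress only after commit
        written += page.len();
    }
}
```

With a page size of 2 and three rows sharing one watermark, a cursor on the watermark alone would either skip or repeat the third row at the page boundary; the tiebreaker makes the boundary unambiguous.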

Files to review first

  • src/iceberg.rs
  • src/catalog.rs
  • src/sync.rs
  • src/main.rs
  • src/config.rs
  • README.md

Validation performed

  • cargo fmt
  • cargo test

Risk / follow-up

  • Add integration tests for Iceberg commit + resume semantics (especially duplicate watermark scenarios).
  • Validate catalog compatibility across target environments (warehouse path/backends).

The new `rustream ingest` subcommand reads Parquet/CSV files from the local
filesystem or S3 and writes them to Postgres via batch INSERT/UPSERT.
It supports three write modes (insert, upsert, truncate_insert), automatic
table creation from the file schema, glob-based file discovery, and
SQLite-based ingestion tracking to avoid reprocessing files.
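
As a rough illustration of how the three write modes could map to Postgres statements, here is a sketch under stated assumptions. `WriteMode`, `build_statements`, and the exact SQL shapes are hypothetical and not taken from the rustream source; only the mode names (insert, upsert, truncate_insert) come from the description above. Table and column names are assumed to be trusted, so no identifier quoting is done.

```rust
// Hypothetical sketch; names and SQL shapes are assumptions, not rustream's code.
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WriteMode {
    Insert,
    Upsert,
    TruncateInsert,
}

impl FromStr for WriteMode {
    type Err = String;

    /// Parse the mode names listed in the PR description.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "insert" => Ok(Self::Insert),
            "upsert" => Ok(Self::Upsert),
            "truncate_insert" => Ok(Self::TruncateInsert),
            other => Err(format!("unknown write mode: {other}")),
        }
    }
}

/// Build the statements for a single-row write with `$n` placeholders.
/// `key_columns` is only used by `Upsert` as the conflict target.
pub fn build_statements(
    mode: WriteMode,
    table: &str,
    columns: &[&str],
    key_columns: &[&str],
) -> Vec<String> {
    let cols = columns.join(", ");
    let params: Vec<String> = (1..=columns.len()).map(|i| format!("${i}")).collect();
    let insert = format!("INSERT INTO {table} ({cols}) VALUES ({})", params.join(", "));

    match mode {
        WriteMode::Insert => vec![insert],
        WriteMode::TruncateInsert => vec![format!("TRUNCATE TABLE {table}"), insert],
        WriteMode::Upsert => {
            // Update every non-key column from the proposed row.
            let updates: Vec<String> = columns
                .iter()
                .filter(|c| !key_columns.contains(c))
                .map(|c| format!("{c} = EXCLUDED.{c}"))
                .collect();
            let keys = key_columns.join(", ");
            if updates.is_empty() {
                // All columns are keys: nothing to update on conflict.
                vec![format!("{insert} ON CONFLICT ({keys}) DO NOTHING")]
            } else {
                let set = updates.join(", ");
                vec![format!("{insert} ON CONFLICT ({keys}) DO UPDATE SET {set}")]
            }
        }
    }
}
```

For example, `build_statements(WriteMode::Upsert, "events", &["id", "name"], &["id"])` yields a single `INSERT ... ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name` statement, while `TruncateInsert` yields a `TRUNCATE TABLE` followed by a plain `INSERT`. A real batch writer would presumably use multi-row VALUES lists and run the truncate and inserts in one transaction.
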
kraftaa self-assigned this on Feb 25, 2026
kraftaa merged commit 2a13329 into main on Mar 8, 2026
1 check passed