Per-user Iceberg warehouse with bring-your-own S3 storage

### Feature Summary

A **warehouse** here is a top-level entity in the catalog hierarchy (`Project → Warehouse → Namespace → Table`) that owns a set of namespaces (`results`, `runtime_stats`, `console_logs`) and the storage configuration (S3 bucket + credentials) backing their tables. This follows the [Lakekeeper warehouse concept](https://docs.lakekeeper.io/docs/nightly/concepts/).

Today Texera writes all execution outputs (`results`, `runtime_stats`, `console_logs`) into a single **global Iceberg warehouse**. All users share that one warehouse, and the platform absorbs the storage cost.

This issue proposes a **per-user warehouse** model: each user registers one or more warehouses, each backed by **their own S3 bucket** (Bring-Your-Own-S3). Storage cost follows the data owner; users get tenant-isolated namespaces and tables.

### Background / Motivation

- **Billing.** S3 cost should be attributed to the user who owns the data, not the platform.
- **Isolation.** Per-tenant namespaces/tables, no shared blast radius.
- **Builds on #4126.** That issue introduced the REST Catalog Service (Lakekeeper) layer. This issue is the next step: make Lakekeeper multi-tenant.

### Scope

Per-user warehouses are scoped to the **Kubernetes deployment**. Local / single-node Docker Compose deployments continue to work as today: `PsqlCatalog` remains supported and unchanged, and `RestCatalog` mode keeps its current single global Lakekeeper warehouse (no per-user split).

### Proposed Solution or Design

#### Data model

```
User ─1:N→ Warehouse                (new)
User ─1:N→ ComputingUnit            (existing)
ComputingUnit ─1:N→ Execution       (existing)
Warehouse ─1:N→ Execution           (new association)
```

**ER diagram:** 

<img width="395" height="246" alt="Image" src="https://github.com/user-attachments/assets/559e205c-9208-486c-9240-b7cd451aac1c" />
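The relationships above could be sketched as plain records. This is only an illustration: the class and field names (`owner_uid`, `cuid`, `eid`, `lakekeeper_warehouse_id`, and so on) are hypothetical and not taken from the Texera schema, and the `ComputingUnit → Warehouse` link anticipates the binding described in Flow B.

```python
from dataclasses import dataclass
from typing import Optional
from uuid import UUID


@dataclass(frozen=True)
class Warehouse:
    """Reference to a Lakekeeper warehouse plus non-secret metadata only."""
    owner_uid: int                    # User 1:N Warehouse
    name: str
    lakekeeper_warehouse_id: UUID     # UUID returned by Lakekeeper
    bucket: str
    region: str
    endpoint: Optional[str] = None    # only needed for S3-compatible stores


@dataclass(frozen=True)
class ComputingUnit:
    cuid: int
    owner_uid: int                    # User 1:N ComputingUnit
    warehouse_id: UUID                # warehouse picked when the CU is created


@dataclass(frozen=True)
class Execution:
    eid: int
    cuid: int                         # ComputingUnit 1:N Execution
    warehouse_id: UUID                # Warehouse 1:N Execution


def start_execution(eid: int, cu: ComputingUnit) -> Execution:
    # An execution inherits the warehouse bound to the CU it runs on.
    return Execution(eid=eid, cuid=cu.cuid, warehouse_id=cu.warehouse_id)
```

Note that `Warehouse` deliberately has no credential fields, matching the rule that secrets live only in Lakekeeper.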

#### Catalog hierarchy

Texera already has two `Catalog` implementations:

```
Catalog (interface)
├── PsqlCatalog          — backed by PostgreSQL
└── RestCatalog          — backed by any Iceberg REST Catalog service (Lakekeeper is one implementation of this)
```

This design uses **`RestCatalog` with Lakekeeper** as the REST Catalog service to deliver per-user warehouses. Lakekeeper owns S3 credentials in its own encrypted DB (Postgres); **Texera never persists raw S3 creds**, only the Lakekeeper warehouse UUID and non-secret metadata.

#### Flow A — Registering a warehouse

1. User fills the new Dashboard "Warehouse" tab with S3 bucket / endpoint / region / credentials.
2. Backend posts the credentials directly to Lakekeeper to create the warehouse. **Creds never touch the Texera DB.**
3. Lakekeeper returns the warehouse UUID; Texera stores the reference plus non-secret metadata.

**Sequence diagram:** 

<img width="946" height="433" alt="Image" src="https://github.com/user-attachments/assets/7306bb14-055f-459e-b20f-0800a24650e9" />
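A minimal sketch of the three registration steps, assuming an injected `post` callable that sends JSON to Lakekeeper and returns the decoded response. The endpoint path and JSON field names below are assumptions modelled loosely on Lakekeeper's management API and must be checked against its documentation; `WarehouseRecord` and `register_warehouse` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

# Hypothetical transport: (path, json_body) -> decoded JSON response.
PostFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


@dataclass(frozen=True)
class WarehouseRecord:
    """What Texera would persist: a reference plus non-secret metadata."""
    owner_uid: int
    name: str
    lakekeeper_warehouse_id: str
    bucket: str
    region: str
    endpoint: Optional[str]


def register_warehouse(
    post: PostFn,
    owner_uid: int,
    name: str,
    bucket: str,
    region: str,
    endpoint: Optional[str],
    access_key_id: str,
    secret_access_key: str,
) -> WarehouseRecord:
    # Step 2: credentials go straight to Lakekeeper.
    # Path and field names are assumptions, not confirmed by this issue.
    payload = {
        "warehouse-name": name,
        "storage-profile": {
            "type": "s3",
            "bucket": bucket,
            "region": region,
            "endpoint": endpoint,
        },
        "storage-credential": {
            "type": "s3",
            "credential-type": "access-key",
            "aws-access-key-id": access_key_id,
            "aws-secret-access-key": secret_access_key,
        },
    }
    response = post("/management/v1/warehouse", payload)

    # Step 3: keep only the returned UUID and non-secret metadata.
    return WarehouseRecord(
        owner_uid=owner_uid,
        name=name,
        lakekeeper_warehouse_id=str(response["warehouse-id"]),
        bucket=bucket,
        region=region,
        endpoint=endpoint,
    )
```

Because the record type has no credential fields, the secret cannot be written to the Texera DB by accident; it exists only in the request body sent to Lakekeeper.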

#### Flow B — Binding a warehouse to a CU

1. When the user creates a computing unit (CU), they pick which warehouse to use.
2. At execution time, Texera instantiates a `RestCatalog` for that CU using the warehouse's Lakekeeper UUID — no global singleton on the hot path.
3. Two-layer split at runtime:
   - **Catalog path** — `RestCatalog` talks to Lakekeeper for metadata operations (resolve table, create / commit snapshots, schema changes). Lakekeeper owns the warehouse → S3 path mapping.
   - **Data path** — the Iceberg client reads/writes Parquet **directly to the user's S3 bucket**, using short-lived credentials vended by Lakekeeper per request. Lakekeeper does not proxy S3 traffic.

Files land in the user's S3 bucket under the warehouse's root prefix, organized by namespace (`results` / `runtime_stats` / `console_logs`) and per-execution table.

**Sequence diagram (CU creation + RestCatalog instantiation):** 

<img width="923" height="399" alt="Image" src="https://github.com/user-attachments/assets/5e96ee80-c91a-4e51-834b-ed5c3b6b6f8c" />
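Step 2 of Flow B could look roughly like the sketch below: catalog properties are built per warehouse, and catalog instances are cached per CU instead of living in a global singleton. The property keys are standard Iceberg REST catalog properties; `PerUnitCatalogs`, `rest_catalog_properties`, and the injected `factory` are hypothetical, and whether `warehouse` should carry the Lakekeeper UUID or the warehouse name is an assumption to verify against Lakekeeper.

```python
from typing import Any, Callable, Dict, Tuple


def rest_catalog_properties(lakekeeper_uri: str, warehouse_ref: str) -> Dict[str, str]:
    """Build Iceberg REST catalog properties for one warehouse."""
    return {
        "type": "rest",
        "uri": lakekeeper_uri,
        # Assumption: the value Lakekeeper expects here (UUID vs. name)
        # must be confirmed before use.
        "warehouse": warehouse_ref,
        # Ask the catalog to vend short-lived storage credentials,
        # so the client talks to S3 directly (data path).
        "header.X-Iceberg-Access-Delegation": "vended-credentials",
    }


class PerUnitCatalogs:
    """Caches one catalog per (CU, warehouse) pair; no global instance."""

    def __init__(self, lakekeeper_uri: str, factory: Callable[[Dict[str, str]], Any]):
        self._uri = lakekeeper_uri
        self._factory = factory  # turns properties into a catalog client
        self._catalogs: Dict[Tuple[int, str], Any] = {}

    def for_unit(self, cuid: int, warehouse_ref: str) -> Any:
        key = (cuid, warehouse_ref)
        if key not in self._catalogs:
            props = rest_catalog_properties(self._uri, warehouse_ref)
            self._catalogs[key] = self._factory(props)
        return self._catalogs[key]
```

In a Python worker, the factory might be something like `lambda props: pyiceberg.catalog.load_catalog("cu", **props)`; on the JVM side the same properties would be passed to Iceberg's REST catalog instead.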

For the execution diagram, see #4126.

### Open questions

- Should a user own multiple warehouses, or exactly one? (The schema allows many.)
- Shared CU: when User A runs a workflow on a CU owned by User B, whose warehouse stores the results? In other words, should we allow User A to store results in User B's warehouse?
- Warehouse deletion semantics: hard-delete the Lakekeeper catalog and leave S3 data orphaned in the user's bucket (Texera has no write access to user buckets), or soft-archive the catalog so existing executions stay readable until the user explicitly purges?


