5 changes: 5 additions & 0 deletions .changeset/postgres-kit-parity-tszod.md
@@ -0,0 +1,5 @@
---
"smooai-clickhouse-kit": minor
---

Add TS + Zod code emit (codegen feature) for schema/consumer parity with postgres-kit.
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,15 @@
# @smooai/clickhouse-kit

## Unreleased

### Patch Changes

- Docs/metadata: neutralized external-toolkit references in the ROADMAP, package description/keywords, and source comments (now described as a schema-as-code toolkit with Zod schema emitters). No API change.

### Minor Changes

- TS + Zod code emit behind the `codegen` cargo feature (`src/codegen.rs`) — from a `TableSpec`, emit a TS row `interface`, a Zod **select** schema, and a Zod **insert** schema (columns with a ClickHouse `DEFAULT` become `.optional()`), for schema/consumer parity with `postgres-kit`. Mirrors the retired TS package's `createSelectSchema`/`createInsertSchema` output style: `camelCase` keys, 4-space formatting, and the same ClickHouse→TS/Zod type mapping (`String`/`UUID`/dates→`string`/`z.string()`, ints/floats→`number`/`z.number()`, `Bool`→`boolean`, `Array(String)`→`string[]`/`z.array(z.string())`, `Map(String,String)`→`Record<string,string>`/`z.record(z.string(),z.string())`, `JSON`→`unknown`/`z.unknown()`, `Nullable(T)`→optional `T | null`/`.nullable()`, `LowCardinality(T)` transparent → `T`).
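
The type mapping in this entry can be sketched as a small recursive function. This is an illustration only, not the contents of `src/codegen.rs`: the helper names (`unwrap_wrapper`, `ch_type_to_ts_zod`, `snake_to_camel`, `insert_field`) are hypothetical, `z.boolean()` for `Bool` and the `unknown` fallback for unlisted types are assumptions, and the emitted spellings follow the changelog text above rather than the crate's real output.

```rust
/// Strip `Name(` ... `)` and return the inner type text, if `ch` has that shape.
fn unwrap_wrapper<'a>(ch: &'a str, name: &str) -> Option<&'a str> {
    ch.strip_prefix(name)?.strip_prefix('(')?.strip_suffix(')')
}

/// Hypothetical mapper: ClickHouse type text -> (TypeScript type, Zod expression).
fn ch_type_to_ts_zod(ch: &str) -> (String, String) {
    let ch = ch.trim();
    // LowCardinality(T) is transparent: map the inner type.
    if let Some(inner) = unwrap_wrapper(ch, "LowCardinality") {
        return ch_type_to_ts_zod(inner);
    }
    // Nullable(T) becomes `T | null` and `.nullable()`.
    if let Some(inner) = unwrap_wrapper(ch, "Nullable") {
        let (ts, zod) = ch_type_to_ts_zod(inner);
        return (format!("{ts} | null"), format!("{zod}.nullable()"));
    }
    // Array(T) becomes `T[]`; parenthesize unions so `[]` binds to the whole type.
    if let Some(inner) = unwrap_wrapper(ch, "Array") {
        let (ts, zod) = ch_type_to_ts_zod(inner);
        let ts = if ts.contains('|') { format!("({ts})") } else { ts };
        return (format!("{ts}[]"), format!("z.array({zod})"));
    }
    // Only the Map(String, String) shape named in the changelog is handled here.
    let compact: String = ch.chars().filter(|c| !c.is_whitespace()).collect();
    if compact == "Map(String,String)" {
        return (
            "Record<string,string>".to_string(),
            "z.record(z.string(),z.string())".to_string(),
        );
    }
    let (ts, zod) = match ch {
        "String" | "UUID" => ("string", "z.string()"),
        // Date, Date32, DateTime, DateTime64(...) all surface as strings.
        t if t.starts_with("Date") => ("string", "z.string()"),
        "Int8" | "Int16" | "Int32" | "Int64" | "UInt8" | "UInt16" | "UInt32" | "UInt64"
        | "Float32" | "Float64" => ("number", "z.number()"),
        "Bool" => ("boolean", "z.boolean()"), // Zod side is an assumption
        // JSON, plus anything unrecognized (fallback is an assumption).
        _ => ("unknown", "z.unknown()"),
    };
    (ts.to_string(), zod.to_string())
}

/// Convert a snake_case column name to a camelCase key.
fn snake_to_camel(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    let mut upper_next = false;
    for c in name.chars() {
        if c == '_' {
            upper_next = true;
        } else if upper_next {
            out.extend(c.to_uppercase());
            upper_next = false;
        } else {
            out.push(c);
        }
    }
    out
}

/// One field line of an insert schema: columns with a DEFAULT become optional.
fn insert_field(column: &str, ch_type: &str, has_default: bool) -> String {
    let (_, zod) = ch_type_to_ts_zod(ch_type);
    let optional = if has_default { ".optional()" } else { "" };
    format!("{}: {zod}{optional},", snake_to_camel(column))
}
```

Under these assumptions, a `created_at DateTime64(3)` column with a `DEFAULT` would yield the insert-schema line `createdAt: z.string().optional(),`.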

## 0.2.0

### Minor Changes
36 changes: 36 additions & 0 deletions README.md
@@ -56,6 +56,8 @@ let ddl = to_create_table_sql(&table, &SchemaLimits::default())?;

Every identifier is validated (`^[A-Za-z_][A-Za-z0-9_]*$` + a length bound, backtick-quoted on render), column counts are bounded, and `ORDER BY` entries must be real columns — so a malicious table/column name can't inject SQL.
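
A minimal sketch of that identifier rule, written without a regex dependency. The crate's own `validate_identifier`/`quote_identifier` are not shown in this diff, so the names `is_valid_identifier` and `quote_checked` and the 64-byte `MAX_IDENT_LEN` bound below are assumptions made for illustration.

```rust
/// Assumed length bound; the crate's actual limit is not stated in this README.
const MAX_IDENT_LEN: usize = 64;

/// Equivalent of `^[A-Za-z_][A-Za-z0-9_]*$` plus a length bound.
fn is_valid_identifier(name: &str) -> bool {
    let mut chars = name.chars();
    // First character: ASCII letter or underscore (also rejects the empty string).
    let first_ok = matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_');
    first_ok
        && name.len() <= MAX_IDENT_LEN
        // Remaining characters: ASCII letters, digits, or underscore.
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Backtick-quote an identifier only after it has passed validation.
fn quote_checked(name: &str) -> Option<String> {
    if is_valid_identifier(name) {
        Some(format!("`{name}`"))
    } else {
        None
    }
}
```

Because a backtick, quote, space, or semicolon can never pass the character check, a rejected name never reaches the quoting step at all.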

Need an explicit precision/timezone? Use the parametrised `DateTime64` type — `{"datetime64": {"precision": 3, "timezone": "UTC"}}` renders `DateTime64(3, 'UTC')` (bare `"DateTime64"` still renders `DateTime64(3)`). Precision (`0..=9`) and the timezone charset (`^[A-Za-z0-9_+/-]{1,64}$`, the IANA shape) are validated before they reach SQL, so an untrusted timezone string can't inject.
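
The precision and timezone checks described above could look roughly like this. `render_datetime64` is a hypothetical standalone function, not the crate's `ColumnTypeSpec` API, and the error strings are made up for the example.

```rust
/// Hypothetical renderer: validate precision and timezone, then emit the type text.
fn render_datetime64(precision: u8, timezone: Option<&str>) -> Result<String, String> {
    // Precision must be in 0..=9.
    if precision > 9 {
        return Err(format!("precision {precision} out of range 0..=9"));
    }
    match timezone {
        None => Ok(format!("DateTime64({precision})")),
        Some(tz) => {
            // Equivalent of `^[A-Za-z0-9_+/-]{1,64}$`.
            let charset_ok = tz
                .bytes()
                .all(|b| b.is_ascii_alphanumeric() || matches!(b, b'_' | b'+' | b'/' | b'-'));
            if !(1..=64).contains(&tz.len()) || !charset_ok {
                return Err("timezone contains disallowed characters or length".to_string());
            }
            // Safe to interpolate: the charset excludes quotes and whitespace.
            Ok(format!("DateTime64({precision}, '{tz}')"))
        }
    }
}
```

Since the allowed charset contains no single quote, the timezone cannot close the SQL string literal it is rendered into.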

## The flexible (hybrid) table

The most-reused multi-tenant shape in one call — your mandatory + promoted typed columns, plus an `attrs Map(String, String)` catch-all and a `raw String`:
@@ -76,6 +78,40 @@ let table = flexible_table(
)?;
```

## Production-table DDL: partitioning, TTL, indexes, settings

Real production tables need `PARTITION BY`, a TTL policy, data-skipping indexes, and `SETTINGS`. `TableSpec` (and `FlexibleConfig`) carry these as additive fields, rendered in canonical ClickHouse clause order — `ENGINE` → `PARTITION BY` → `ORDER BY` → `TTL` → `SETTINGS`, with `INDEX` lines inside the column parens:

```rust
use clickhouse_kit::{IndexSpec, TableSpec, TtlMove, TtlSpec};

let table = TableSpec {
    // ...columns, engine, order_by...
    partition_by: Some("(organization_id, toDate(started_at))".into()),
    indexes: vec![IndexSpec {
        name: "idx_trace_id".into(),
        expression: "trace_id".into(),
        type_def: "bloom_filter(0.01)".into(),
        granularity: 1,
    }],
    ttl: Some(TtlSpec {
        column: "started_at".into(),
        move_to_volume_after: Some(TtlMove { interval: "14 DAY".into(), volume: "cold".into() }),
        delete_after: Some("180 DAY".into()),
    }),
    settings: vec![
        ("storage_policy".into(), "'hot_cold'".into()),
        ("index_granularity".into(), "8192".into()),
    ],
    // ...
};
// TTL toDateTime(started_at) + INTERVAL 14 DAY TO VOLUME 'cold', toDateTime(started_at) + INTERVAL 180 DAY DELETE
```

A `DateTime64` TTL column is automatically wrapped in `toDateTime(...)`. All four fields are optional/empty by default, so existing specs render exactly as before.
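
To make the TTL rendering concrete, here is a sketch that reproduces the clause shown in the comment above. The structs are local stand-ins that copy only the field names visible in this README, and `render_ttl` with its `column_is_datetime64` flag is a hypothetical signature; the crate's renderer may differ, for example in how it quotes the column.

```rust
/// Local stand-ins mirroring the field names in the README example.
struct TtlMove {
    interval: String,
    volume: String,
}

struct TtlSpec {
    column: String,
    move_to_volume_after: Option<TtlMove>,
    delete_after: Option<String>,
}

/// Hypothetical renderer: returns `None` when no TTL action is configured.
fn render_ttl(ttl: &TtlSpec, column_is_datetime64: bool) -> Option<String> {
    // A DateTime64 column is wrapped in toDateTime(...) before interval arithmetic.
    let base = if column_is_datetime64 {
        format!("toDateTime({})", ttl.column)
    } else {
        ttl.column.clone()
    };
    let mut parts = Vec::new();
    if let Some(mv) = &ttl.move_to_volume_after {
        parts.push(format!(
            "{base} + INTERVAL {} TO VOLUME '{}'",
            mv.interval, mv.volume
        ));
    }
    if let Some(del) = &ttl.delete_after {
        parts.push(format!("{base} + INTERVAL {del} DELETE"));
    }
    if parts.is_empty() {
        None
    } else {
        Some(format!("TTL {}", parts.join(", ")))
    }
}
```

Returning `None` for an empty spec is one way to keep existing specs rendering unchanged when no TTL is set.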

**Safety posture:** these knobs are **app-controlled raw fragments** emitted verbatim — `partition_by`, the index `expression`/`type_def`, the TTL `interval`/`volume`/`delete_after`, and the settings RHS values are _not_ validated, exactly like `engine`. Only identifiers are validated: the index `name`, and the TTL `column` (which must also be a real column in the table). Never build the raw fragments from untrusted input.
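
The split between validated identifiers and verbatim fragments can be illustrated with a data-skipping index line. This is a sketch under assumptions: `IndexSpec` here is a local stand-in using the README's field names, `granularity` is assumed to be a `u32`, `render_index` is hypothetical, and backtick-quoting the index name is inferred from the identifier rule rather than confirmed for indexes.

```rust
/// Local stand-in mirroring the field names in the README example.
struct IndexSpec {
    name: String,
    expression: String,
    type_def: String,
    granularity: u32, // integer width is an assumption
}

/// Same identifier shape as table and column names (length bound omitted here).
fn is_identifier(name: &str) -> bool {
    let mut chars = name.chars();
    matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_')
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Hypothetical renderer for one INDEX line inside the column parens.
fn render_index(ix: &IndexSpec) -> Result<String, String> {
    // Only the name is validated; it is the one identifier in the line.
    if !is_identifier(&ix.name) {
        return Err(format!("invalid index name: {:?}", ix.name));
    }
    // `expression` and `type_def` are app-controlled raw fragments, emitted verbatim.
    Ok(format!(
        "INDEX `{}` {} TYPE {} GRANULARITY {}",
        ix.name, ix.expression, ix.type_def, ix.granularity
    ))
}
```

Note that nothing stops `expression` or `type_def` from carrying arbitrary SQL in this sketch, which is exactly why they must come from application code and never from user input.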

## Ingest: flatten + coerce

Shape an arbitrary record to a (possibly dynamic) table — known keys land in their columns, the long tail flattens into `attrs`, and `raw` captures the original:
4 changes: 2 additions & 2 deletions ROADMAP.md
@@ -2,7 +2,7 @@

## v0.1 (shipped) — static, developer-authored schemas

"Drizzle for ClickHouse": a developer authors a table once, at compile time, as a literal — `clickhouseTable(name, columns, options)` → `toCreateTableSql` (DDL) + inferred row type (`InferSelect`) + `createSelectSchema`/`createInsertSchema` (drizzle-zod) + forward-only migrations (`generate`/`migrate`/`check`, no auto-diff). Column system: `ch.*` (`ChColumn`). Minimal, TS-only, forward-only, MIT.
A schema-as-code toolkit for ClickHouse: a developer authors a table once, at compile time, as a literal — `clickhouseTable(name, columns, options)` → `toCreateTableSql` (DDL) + inferred row type (`InferSelect`) + `createSelectSchema`/`createInsertSchema` (Zod schema emitters) + forward-only migrations (`generate`/`migrate`/`check`, no auto-diff). Column system: `ch.*` (`ChColumn`). Minimal, TS-only, forward-only, MIT.

## v0.2 — the safe foundation for flexible, user-driven, multi-tenant schemas

@@ -38,4 +38,4 @@ ClickHouse has two schema populations with different natural owners. This mirror

Started: the Rust **safety core** (`crates/clickhouse-kit/src/safety.rs`) — `validate_identifier`/`quote_identifier`, the `ColumnTypeSpec` allowlist (+ `to_ch_type`/`is_datetime64`), bounds + reserved — plus runtime **table DDL generation** (`table.rs`: `to_create_table_sql` from an untrusted spec, with identifier/allowlist/bounds/dup guards). Verified **end-to-end against a real ClickHouse** via testcontainers (generate DDL → apply → introspect `system.columns` → insert/select round-trip); the ported adversarial unit suite (injection, disallowed types, bounds, dup columns) is green too. CI runs unit + the testcontainers integration.

**Full surface landed (built via a 4-way Rust fan-out, lead-integrated):** `flexible_table` (the hybrid), `flatten_record` + `coerce_to_table`, `diff_columns` + `alter_add_columns_sql` (additive-only), and the I/O layer — a driver-agnostic `ChExecutor` trait + `run_migrations` (forward-only) + `check_drift` — with a second testcontainers integration exercising migrate + drift against a real ClickHouse. Plus the **TS→Rust bridge** (`codegen` + `introspect`): `ch_type_to_rust` / `rust_row_struct` map a ClickHouse table to a `#[derive(Row)]` struct, and `introspect_row_struct(&exec, table, name)` does live-table → Rust source in one call (verified by a third testcontainers integration: create table → introspect → generated struct). **41 unit + 3 real-ClickHouse integration tests, clippy `-D warnings` clean.** Published to crates.io as **`smooai-clickhouse-kit`** (manual `publish-crate.yml`, `SMOOAI_CARGO_REGISTRY_TOKEN`). Rows stay Serde-native.
**Full surface landed:** the runtime toolkit — `flexible_table`, `flatten_record`/`coerce_to_table`, `diff_columns`/`alter_add_columns_sql` (additive-only), the driver-agnostic `ChExecutor` trait + `run_migrations` (forward-only) + `check_drift` — plus **production-table DDL** on `TableSpec` (`partition_by`/`ttl`/`indexes`/`settings`, parametrized `DateTime64(p, tz)`; SMOODEV-2115). And **codegen both directions**: `ch_type_to_rust`/`rust_row_struct` + `introspect_row_struct` (live ClickHouse → Rust `#[derive(Row)]` — the TS→Rust bridge), and `emit_row_interface`/`emit_select_schema`/`emit_insert_schema`/`emit_ts_module` (a `TableSpec` → TS interface + Zod select/insert schemas). Verified against real ClickHouse via three testcontainers integrations (DDL round-trip; migrate + drift; introspect → codegen); clippy `-D warnings` clean. Published to crates.io as **`smooai-clickhouse-kit` 0.3.0** (manual `publish-crate.yml`, `SMOOAI_CARGO_REGISTRY_TOKEN`). Rows stay Serde-native.
2 changes: 1 addition & 1 deletion crates/clickhouse-kit/Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion crates/clickhouse-kit/Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "smooai-clickhouse-kit"
version = "0.2.0"
version = "0.3.0"
edition = "2021"
rust-version = "1.75"
license = "MIT"
36 changes: 36 additions & 0 deletions crates/clickhouse-kit/README.md
@@ -56,6 +56,8 @@ let ddl = to_create_table_sql(&table, &SchemaLimits::default())?;

Every identifier is validated (`^[A-Za-z_][A-Za-z0-9_]*$` + a length bound, backtick-quoted on render), column counts are bounded, and `ORDER BY` entries must be real columns — so a malicious table/column name can't inject SQL.

Need an explicit precision/timezone? Use the parametrised `DateTime64` type — `{"datetime64": {"precision": 3, "timezone": "UTC"}}` renders `DateTime64(3, 'UTC')` (bare `"DateTime64"` still renders `DateTime64(3)`). Precision (`0..=9`) and the timezone charset (`^[A-Za-z0-9_+/-]{1,64}$`, the IANA shape) are validated before they reach SQL, so an untrusted timezone string can't inject.

## The flexible (hybrid) table

The most-reused multi-tenant shape in one call — your mandatory + promoted typed columns, plus an `attrs Map(String, String)` catch-all and a `raw String`:
@@ -76,6 +78,40 @@ let table = flexible_table(
)?;
```

## Production-table DDL: partitioning, TTL, indexes, settings

Real production tables need `PARTITION BY`, a TTL policy, data-skipping indexes, and `SETTINGS`. `TableSpec` (and `FlexibleConfig`) carry these as additive fields, rendered in canonical ClickHouse clause order — `ENGINE` → `PARTITION BY` → `ORDER BY` → `TTL` → `SETTINGS`, with `INDEX` lines inside the column parens:

```rust
use clickhouse_kit::{IndexSpec, TableSpec, TtlMove, TtlSpec};

let table = TableSpec {
    // ...columns, engine, order_by...
    partition_by: Some("(organization_id, toDate(started_at))".into()),
    indexes: vec![IndexSpec {
        name: "idx_trace_id".into(),
        expression: "trace_id".into(),
        type_def: "bloom_filter(0.01)".into(),
        granularity: 1,
    }],
    ttl: Some(TtlSpec {
        column: "started_at".into(),
        move_to_volume_after: Some(TtlMove { interval: "14 DAY".into(), volume: "cold".into() }),
        delete_after: Some("180 DAY".into()),
    }),
    settings: vec![
        ("storage_policy".into(), "'hot_cold'".into()),
        ("index_granularity".into(), "8192".into()),
    ],
    // ...
};
// TTL toDateTime(started_at) + INTERVAL 14 DAY TO VOLUME 'cold', toDateTime(started_at) + INTERVAL 180 DAY DELETE
```

A `DateTime64` TTL column is automatically wrapped in `toDateTime(...)`. All four fields are optional/empty by default, so existing specs render exactly as before.

**Safety posture:** these knobs are **app-controlled raw fragments** emitted verbatim — `partition_by`, the index `expression`/`type_def`, the TTL `interval`/`volume`/`delete_after`, and the settings RHS values are _not_ validated, exactly like `engine`. Only identifiers are validated: the index `name`, and the TTL `column` (which must also be a real column in the table). Never build the raw fragments from untrusted input.

## Ingest: flatten + coerce

Shape an arbitrary record to a (possibly dynamic) table — known keys land in their columns, the long tail flattens into `attrs`, and `raw` captures the original: