diff --git a/README.md b/README.md
index 99f1dc6..4408117 100644
--- a/README.md
+++ b/README.md
@@ -5,9 +5,14 @@
 [![CI](https://github.com/SmooAI/clickhouse-kit/actions/workflows/rust.yml/badge.svg)](https://github.com/SmooAI/clickhouse-kit/actions/workflows/rust.yml)
 [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](./LICENSE)
 
-**A safe-by-construction schema toolkit for ClickHouse — built for user-defined, multi-tenant schemas.**
+**A safe-by-construction schema toolkit for ClickHouse — for user-defined, multi-tenant schemas, with a TypeScript→Rust bridge for the schemas you author by hand.**
 
-When your customers' data shapes are defined at runtime, you end up turning untrusted input into SQL. `smooai-clickhouse-kit` owns that boundary so the happy path makes **SQL injection and unbounded tables impossible, not merely discouraged** — an allowlisted type system, identifier validation, DDL generation, forward-only migrations, additive evolution, and drift detection. Rows stay [Serde](https://serde.rs)-native (use the [`clickhouse`](https://crates.io/crates/clickhouse) crate's `#[derive(Row)]`), so the kit never reimplements row mapping.
+The kit has two jobs:
+
+1. **Runtime toolkit (user-defined / multi-tenant tables).** When your customers' data shapes are defined at runtime, you end up turning untrusted input into SQL. The kit owns that boundary so the happy path makes **SQL injection and unbounded tables impossible, not merely discouraged** — an allowlisted type system, identifier validation, DDL generation, `flexible_table`, forward-only migrations, and additive evolution.
+2. **TS→Rust bridge (developer-authored tables).** When TypeScript owns a table's schema, `introspect` reads the live ClickHouse back into Rust and `codegen` emits the `#[derive(Row)]` struct, with `check_drift` asserting the Rust view ≡ the live DB. No more hand-copied row structs drifting from the schema.
+
+Either way, rows stay [Serde](https://serde.rs)-native (use the [`clickhouse`](https://crates.io/crates/clickhouse) crate's `#[derive(Row)]`) — the kit never reimplements row mapping.
 
 ```toml
 [dependencies]
@@ -98,6 +103,27 @@ let drift = check_drift(&exec, &[table]).await?;
 
 For growing a per-tenant table, `diff_columns` + `alter_add_columns_sql` emit a guarded, **additive-only** `ALTER TABLE … ADD COLUMN IF NOT EXISTS …` (identifiers quoted; types from your trusted spec, never from the live DB).
 
+## TS→Rust bridge: generate Rust rows from a TS-authored table
+
+When the schema lives in TypeScript, you don't hand-write (and re-sync) the Rust row struct — introspect the live table and generate it:
+
+```rust
+use clickhouse_kit::introspect_row_struct;
+
+// Reads system.columns for `events` and emits the Rust source:
+let src = introspect_row_struct(&exec, "events", "EventRow").await?;
+// #[derive(Debug, Clone, clickhouse::Row, serde::Serialize, serde::Deserialize)]
+// pub struct EventRow {
+//     pub id: String, // UUID
+//     pub org: String, // LowCardinality(String)
+//     pub n: u64,
+//     pub tags: Vec<String>,
+//     pub attrs: std::collections::HashMap<String, String>,
+// }
+```
+
+`ch_type_to_rust` / `rust_row_struct` are also exposed directly. Pair this with `check_drift` in CI to assert the generated Rust view stays ≡ the live (TS-owned) schema — so the Rust side can never silently diverge.
+
 ## Design
 
 - **Safe by construction.** The type allowlist is unrepresentable-by-default; identifiers are validated + quoted; tables are bounded. The dangerous bits are impossible, not discouraged.
diff --git a/ROADMAP.md b/ROADMAP.md
index b430d57..6c12fcd 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -28,15 +28,14 @@
 - **MIT, generic.** Frame every primitive as "multi-tenant ClickHouse," never coupled to one app.
 - **Safe by construction.** Every runtime/user-facing primitive validates input; the happy path makes SQL injection and unbounded tables impossible, not merely discouraged.
 
-## Next: Rust-canonical (the customer-shape work runs in Rust)
+## Source-of-truth model: TypeScript for static, Rust for dynamic (split by population)
 
-The flexible / multi-tenant surface (runtime construction, the safety layer, flatten/coerce, additive evolution) is consumed by **Rust** services (api-prime, audit-logs, the Ask-Your-Data data platform) — that's where untrusted customer schema input is turned into SQL, and _safe-by-construction only counts in the process holding the input_. So the canonical implementation moves to Rust:
+ClickHouse has two schema populations with different natural owners. This mirrors `smooai-postgres-kit`'s TS-source reframe for the developer-authored set, while keeping the runtime engine canonical where TypeScript can't reach:
 
-- **Canonical Rust crate** (`crates/clickhouse-kit`, crates.io, MIT) — rows are Serde-native (`#[derive(Row)]` via the `clickhouse` crate); the crate adds the allowlisted type system, identifier safety, DDL generation, runtime/`flexible` construction, additive evolution, migrations, and drift. The allowlist is _stronger_ than the TS version: disallowed types are **unrepresentable** (no enum variant), so untrusted JSON naming them fails to deserialize at the boundary.
-- **No TS consumers → the Rust crate is the standalone product.** Published as **`smooai-clickhouse-kit`** on crates.io (imports as `clickhouse_kit`). There is nothing on the TS side that needs to ride this, so there is **no WASM/npm binding**; the Rust services consume the crate directly. The original TS package served as the reference spec and is retired/static-only.
-- **The TS v0.1/v0.2 is the reference spec** the Rust port mirrors (the adversarial safety tests translate almost line-for-line).
-- **Tradeoff recorded:** TS compile-time `$inferSelect` row inference can't come from a WASM/runtime core — static TS-authored tables move to codegen'd types. Non-issue for dynamic tables (shapes unknown at compile time).
+- **Static, developer-authored tables** (observability, metrics, billing) → **TypeScript is the source of truth** (the `@smooai/clickhouse-kit` TS authoring DX). The Rust crate is the **TS→Rust bridge**: `introspect` reads the live ClickHouse → Rust, `codegen` (`rust_row_struct` / `ch_type_to_rust`) emits the `#[derive(Row)]` struct, and `check_drift` asserts the Rust view ≡ the live (TS-owned) schema — so the Rust side never hand-copies or silently diverges.
+- **Dynamic, customer-defined / multi-tenant tables** (Ask-Your-Data, custom tables, audit) → **Rust is canonical**. Created at runtime from untrusted input; safe-by-construction only counts in the process holding the input. The allowlisted type system, identifier safety, DDL gen, `flexible_table`, forward-only migrations, and additive evolution live here. The allowlist is **unrepresentable-by-default** — disallowed types have no enum variant, so untrusted JSON naming them fails to deserialize at the boundary.
+- **Crate:** `smooai-clickhouse-kit` on crates.io (imports as `clickhouse_kit`); rows stay Serde-native. **No WASM/npm binding** — the TS side authors static schemas in its own kit; the Rust side bridges + owns the dynamic engine.
 
 Started: the Rust **safety core** (`crates/clickhouse-kit/src/safety.rs`) — `validate_identifier`/`quote_identifier`, the `ColumnTypeSpec` allowlist (+ `to_ch_type`/`is_datetime64`), bounds + reserved — plus runtime **table DDL generation** (`table.rs`: `to_create_table_sql` from an untrusted spec, with identifier/allowlist/bounds/dup guards). Verified **end-to-end against a real ClickHouse** via testcontainers (generate DDL → apply → introspect `system.columns` → insert/select round-trip); the ported adversarial unit suite (injection, disallowed types, bounds, dup columns) is green too. CI runs unit + the testcontainers integration.
 
-**Full surface landed (built via a 4-way Rust fan-out, lead-integrated):** `flexible_table` (the hybrid), `flatten_record` + `coerce_to_table`, `diff_columns` + `alter_add_columns_sql` (additive-only), and the I/O layer — a driver-agnostic `ChExecutor` trait + `run_migrations` (forward-only) + `check_drift` — with a second testcontainers integration exercising migrate + drift against a real ClickHouse. **38 unit + 2 real-ClickHouse integration tests, clippy `-D warnings` clean.** Published to crates.io as **`smooai-clickhouse-kit`** (manual `publish-crate.yml`, `SMOOAI_CARGO_REGISTRY_TOKEN`). No WASM binding — there are no TS consumers; the Rust services consume the crate directly. Rows stay Serde-native.
+**Full surface landed (built via a 4-way Rust fan-out, lead-integrated):** `flexible_table` (the hybrid), `flatten_record` + `coerce_to_table`, `diff_columns` + `alter_add_columns_sql` (additive-only), and the I/O layer — a driver-agnostic `ChExecutor` trait + `run_migrations` (forward-only) + `check_drift` — with a second testcontainers integration exercising migrate + drift against a real ClickHouse. Plus the **TS→Rust bridge** (`codegen` + `introspect`): `ch_type_to_rust` / `rust_row_struct` map a ClickHouse table to a `#[derive(Row)]` struct, and `introspect_row_struct(&exec, table, name)` does live-table → Rust source in one call (verified by a third testcontainers integration: create table → introspect → generated struct). **41 unit + 3 real-ClickHouse integration tests, clippy `-D warnings` clean.** Published to crates.io as **`smooai-clickhouse-kit`** (manual `publish-crate.yml`, `SMOOAI_CARGO_REGISTRY_TOKEN`). Rows stay Serde-native.
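The ROADMAP's "unrepresentable-by-default" allowlist can be illustrated with a small standard-library-only sketch. `AllowedType`, `DisallowedType`, and `ch_type_name` are hypothetical stand-ins, not the crate's `ColumnTypeSpec` API; `FromStr` stands in for the serde deserialization the ROADMAP describes, and the input vocabulary (`"text"`, `"uint64"`, ...) is assumed. Because the enum has only allowed variants, an untrusted type name outside the list cannot produce a value, and the ClickHouse type written into DDL comes from the variant rather than from the caller's string.

```rust
use std::str::FromStr;

/// Hypothetical allowlist: each permitted column type is an enum variant, so a
/// disallowed type has no value it could parse into.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AllowedType {
    Text,
    UInt64,
    Float64,
    Bool,
    DateTime64Ms,
}

/// Error carrying the rejected (untrusted) type name for diagnostics.
#[derive(Debug, PartialEq, Eq)]
pub struct DisallowedType(pub String);

impl FromStr for AllowedType {
    type Err = DisallowedType;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Exact match only: the untrusted input is never trimmed, rewritten, or spliced into SQL.
        match s {
            "text" => Ok(AllowedType::Text),
            "uint64" => Ok(AllowedType::UInt64),
            "float64" => Ok(AllowedType::Float64),
            "bool" => Ok(AllowedType::Bool),
            "datetime64_ms" => Ok(AllowedType::DateTime64Ms),
            other => Err(DisallowedType(other.to_string())),
        }
    }
}

impl AllowedType {
    /// The DDL type string is chosen by the variant, never copied from the input.
    pub fn ch_type_name(self) -> &'static str {
        match self {
            AllowedType::Text => "String",
            AllowedType::UInt64 => "UInt64",
            AllowedType::Float64 => "Float64",
            AllowedType::Bool => "Bool",
            AllowedType::DateTime64Ms => "DateTime64(3)",
        }
    }
}
```

For example, `"uint64".parse::<AllowedType>()` yields a value whose `ch_type_name()` is `"UInt64"`, while a payload such as `"String); DROP TABLE t; --"` is rejected before any SQL is built.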
diff --git a/crates/clickhouse-kit/Cargo.toml b/crates/clickhouse-kit/Cargo.toml
index 530687f..6242bd1 100644
--- a/crates/clickhouse-kit/Cargo.toml
+++ b/crates/clickhouse-kit/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "smooai-clickhouse-kit"
-version = "0.1.0"
+version = "0.2.0"
 edition = "2021"
 rust-version = "1.75"
 license = "MIT"
diff --git a/crates/clickhouse-kit/README.md b/crates/clickhouse-kit/README.md
index 99f1dc6..4408117 100644
--- a/crates/clickhouse-kit/README.md
+++ b/crates/clickhouse-kit/README.md
@@ -5,9 +5,14 @@
 [![CI](https://github.com/SmooAI/clickhouse-kit/actions/workflows/rust.yml/badge.svg)](https://github.com/SmooAI/clickhouse-kit/actions/workflows/rust.yml)
 [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](./LICENSE)
 
-**A safe-by-construction schema toolkit for ClickHouse — built for user-defined, multi-tenant schemas.**
+**A safe-by-construction schema toolkit for ClickHouse — for user-defined, multi-tenant schemas, with a TypeScript→Rust bridge for the schemas you author by hand.**
 
-When your customers' data shapes are defined at runtime, you end up turning untrusted input into SQL. `smooai-clickhouse-kit` owns that boundary so the happy path makes **SQL injection and unbounded tables impossible, not merely discouraged** — an allowlisted type system, identifier validation, DDL generation, forward-only migrations, additive evolution, and drift detection. Rows stay [Serde](https://serde.rs)-native (use the [`clickhouse`](https://crates.io/crates/clickhouse) crate's `#[derive(Row)]`), so the kit never reimplements row mapping.
+The kit has two jobs:
+
+1. **Runtime toolkit (user-defined / multi-tenant tables).** When your customers' data shapes are defined at runtime, you end up turning untrusted input into SQL. The kit owns that boundary so the happy path makes **SQL injection and unbounded tables impossible, not merely discouraged** — an allowlisted type system, identifier validation, DDL generation, `flexible_table`, forward-only migrations, and additive evolution.
+2. **TS→Rust bridge (developer-authored tables).** When TypeScript owns a table's schema, `introspect` reads the live ClickHouse back into Rust and `codegen` emits the `#[derive(Row)]` struct, with `check_drift` asserting the Rust view ≡ the live DB. No more hand-copied row structs drifting from the schema.
+
+Either way, rows stay [Serde](https://serde.rs)-native (use the [`clickhouse`](https://crates.io/crates/clickhouse) crate's `#[derive(Row)]`) — the kit never reimplements row mapping.
 
 ```toml
 [dependencies]
@@ -98,6 +103,27 @@ let drift = check_drift(&exec, &[table]).await?;
 
 For growing a per-tenant table, `diff_columns` + `alter_add_columns_sql` emit a guarded, **additive-only** `ALTER TABLE … ADD COLUMN IF NOT EXISTS …` (identifiers quoted; types from your trusted spec, never from the live DB).
 
+## TS→Rust bridge: generate Rust rows from a TS-authored table
+
+When the schema lives in TypeScript, you don't hand-write (and re-sync) the Rust row struct — introspect the live table and generate it:
+
+```rust
+use clickhouse_kit::introspect_row_struct;
+
+// Reads system.columns for `events` and emits the Rust source:
+let src = introspect_row_struct(&exec, "events", "EventRow").await?;
+// #[derive(Debug, Clone, clickhouse::Row, serde::Serialize, serde::Deserialize)]
+// pub struct EventRow {
+//     pub id: String, // UUID
+//     pub org: String, // LowCardinality(String)
+//     pub n: u64,
+//     pub tags: Vec<String>,
+//     pub attrs: std::collections::HashMap<String, String>,
+// }
+```
+
+`ch_type_to_rust` / `rust_row_struct` are also exposed directly. Pair this with `check_drift` in CI to assert the generated Rust view stays ≡ the live (TS-owned) schema — so the Rust side can never silently diverge.
+
 ## Design
 
 - **Safe by construction.** The type allowlist is unrepresentable-by-default; identifiers are validated + quoted; tables are bounded. The dangerous bits are impossible, not discouraged.
diff --git a/crates/clickhouse-kit/src/codegen.rs b/crates/clickhouse-kit/src/codegen.rs
new file mode 100644
index 0000000..af2bf56
--- /dev/null
+++ b/crates/clickhouse-kit/src/codegen.rs
@@ -0,0 +1,168 @@
+//! TS→Rust bridge codegen. TypeScript owns the (static) schema; this turns a
+//! ClickHouse table's live/spec columns into a Rust **row struct** — `#[derive(Row,
+//! Deserialize)]` — so the Rust services get faithful, drift-checked rows for the
+//! TS-authored tables instead of hand-writing them (the class of bug that bit
+//! api-prime's hand-copied structs). Pair with `introspect` + `check_drift`: the
+//! generated rows are asserted ≡ the live ClickHouse in CI.
+//!
+//! The mapping is a faithful **scaffold**: ClickHouse temporal types map to
+//! `String` (the works-everywhere default over the HTTP/RowBinary boundary) — a
+//! consumer may refine those to `time`/`chrono` types behind the `clickhouse`
+//! crate's feature flags.
+
+/// Strip a single-arg wrapper like `Nullable(...)` / `Array(...)`, returning the inner.
+fn strip_wrapper<'a>(t: &'a str, name: &str) -> Option<&'a str> {
+    let prefix = format!("{name}(");
+    t.strip_prefix(&prefix)
+        .and_then(|rest| rest.strip_suffix(')'))
+}
+
+/// Split a `Map(...)` inner on its top-level comma (respecting nested parens).
+fn split_top_comma(inner: &str) -> Option<(&str, &str)> {
+    let mut depth = 0usize;
+    for (i, c) in inner.char_indices() {
+        match c {
+            '(' => depth += 1,
+            ')' => depth = depth.saturating_sub(1),
+            ',' if depth == 0 => return Some((inner[..i].trim(), inner[i + 1..].trim())),
+            _ => {}
+        }
+    }
+    None
+}
+
+/// Map a ClickHouse type string to the Rust type a `clickhouse`-crate row uses.
+/// Wrappers recurse; unknown scalars fall back to `String` (safe over the wire).
+pub fn ch_type_to_rust(ch_type: &str) -> String {
+    let t = ch_type.trim();
+    if let Some(inner) = strip_wrapper(t, "Nullable") {
+        return format!("Option<{}>", ch_type_to_rust(inner));
+    }
+    if let Some(inner) = strip_wrapper(t, "LowCardinality") {
+        return ch_type_to_rust(inner);
+    }
+    if let Some(inner) = strip_wrapper(t, "Array") {
+        return format!("Vec<{}>", ch_type_to_rust(inner));
+    }
+    if let Some(inner) = strip_wrapper(t, "Map") {
+        if let Some((k, v)) = split_top_comma(inner) {
+            return format!(
+                "std::collections::HashMap<{}, {}>",
+                ch_type_to_rust(k),
+                ch_type_to_rust(v)
+            );
+        }
+    }
+    // Scalar — match on the base type, ignoring any `(...)` parameters.
+    let base = t.split('(').next().unwrap_or(t).trim();
+    match base {
+        "Bool" => "bool",
+        "UInt8" => "u8",
+        "UInt16" => "u16",
+        "UInt32" => "u32",
+        "UInt64" => "u64",
+        "Int8" => "i8",
+        "Int16" => "i16",
+        "Int32" => "i32",
+        "Int64" => "i64",
+        "Float32" => "f32",
+        "Float64" => "f64",
+        // String, UUID, FixedString, Date*, DateTime*, IPv4/6, Enum*, JSON, and
+        // anything unrecognized → String (the safe over-the-wire default).
+        _ => "String",
+    }
+    .to_string()
+}
+
+/// Rust raw-ident escape for column names that collide with Rust keywords.
+fn rust_field_ident(name: &str) -> String {
+    const KEYWORDS: &[&str] = &[
+        "as", "break", "const", "continue", "crate", "else", "enum", "extern", "false", "fn",
+        "for", "if", "impl", "in", "let", "loop", "match", "mod", "move", "mut", "pub", "ref",
+        "return", "self", "static", "struct", "super", "trait", "true", "type", "unsafe", "use",
+        "where", "while", "async", "await", "dyn",
+    ];
+    if KEYWORDS.contains(&name) {
+        format!("r#{name}")
+    } else {
+        name.to_string()
+    }
+}
+
+/// Emit a Rust row struct for a table's columns — `(column_name, clickhouse_type)`
+/// pairs. Derives the `clickhouse` crate's `Row` + serde, so it deserializes
+/// straight from a query. The emitted source references `clickhouse::Row`
+/// (a dev/consumer dependency); this function only produces the string.
+pub fn rust_row_struct(struct_name: &str, columns: &[(String, String)]) -> String {
+    let mut out = String::new();
+    out.push_str(
+        "#[derive(Debug, Clone, clickhouse::Row, serde::Serialize, serde::Deserialize)]\n",
+    );
+    out.push_str(&format!("pub struct {struct_name} {{\n"));
+    for (name, ch_type) in columns {
+        let field = rust_field_ident(name);
+        // Preserve the exact column name for (de)serialization when the field was escaped.
+        if field != *name {
+            out.push_str(&format!("    #[serde(rename = \"{name}\")]\n"));
+        }
+        out.push_str(&format!("    pub {field}: {},\n", ch_type_to_rust(ch_type)));
+    }
+    out.push_str("}\n");
+    out
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn maps_scalars() {
+        assert_eq!(ch_type_to_rust("String"), "String");
+        assert_eq!(ch_type_to_rust("UInt64"), "u64");
+        assert_eq!(ch_type_to_rust("Int32"), "i32");
+        assert_eq!(ch_type_to_rust("Float64"), "f64");
+        assert_eq!(ch_type_to_rust("Bool"), "bool");
+        assert_eq!(ch_type_to_rust("UUID"), "String");
+        assert_eq!(ch_type_to_rust("DateTime64(3)"), "String");
+    }
+
+    #[test]
+    fn maps_wrappers_and_containers() {
+        assert_eq!(ch_type_to_rust("Nullable(String)"), "Option<String>");
+        assert_eq!(ch_type_to_rust("LowCardinality(String)"), "String");
+        assert_eq!(
+            ch_type_to_rust("LowCardinality(Nullable(String))"),
+            "Option<String>"
+        );
+        assert_eq!(ch_type_to_rust("Array(String)"), "Vec<String>");
+        assert_eq!(ch_type_to_rust("Array(UInt32)"), "Vec<u32>");
+        assert_eq!(
+            ch_type_to_rust("Map(String, String)"),
+            "std::collections::HashMap<String, String>"
+        );
+        assert_eq!(
+            ch_type_to_rust("Map(String, Array(UInt8))"),
+            "std::collections::HashMap<String, Vec<u8>>"
+        );
+    }
+
+    #[test]
+    fn emits_row_struct_with_keyword_escape() {
+        let cols = vec![
+            ("id".to_string(), "UUID".to_string()),
+            ("count".to_string(), "UInt64".to_string()),
+            ("type".to_string(), "LowCardinality(String)".to_string()),
+            ("tags".to_string(), "Array(String)".to_string()),
+        ];
+        let src = rust_row_struct("EventRow", &cols);
+        assert!(src.contains(
+            "#[derive(Debug, Clone, clickhouse::Row, serde::Serialize, serde::Deserialize)]"
+        ));
+        assert!(src.contains("pub struct EventRow {"));
+        assert!(src.contains("pub id: String,"));
+        assert!(src.contains("pub count: u64,"));
+        assert!(src.contains("#[serde(rename = \"type\")]"));
+        assert!(src.contains("pub r#type: String,"));
+        assert!(src.contains("pub tags: Vec<String>,"));
+    }
+}
diff --git a/crates/clickhouse-kit/src/introspect.rs b/crates/clickhouse-kit/src/introspect.rs
new file mode 100644
index 0000000..c82dab0
--- /dev/null
+++ b/crates/clickhouse-kit/src/introspect.rs
@@ -0,0 +1,29 @@
+//! The live→Rust half of the TS→Rust bridge. TypeScript authors the (static)
+//! schema and ClickHouse holds it; this reads it back into Rust — columns for
+//! `check_drift`, and a generated row struct via `codegen` — so the Rust side is a
+//! faithful, drift-checked view of the TS-owned schema and can never silently
+//! diverge.
+
+use crate::client::{ChError, ChExecutor};
+use crate::codegen::rust_row_struct;
+use crate::evolve::LiveColumn;
+
+/// Introspect a table's live columns (name + ClickHouse type) from `system.columns`.
+pub async fn introspect_columns(
+    exec: &impl ChExecutor,
+    table: &str,
+) -> Result<Vec<LiveColumn>, ChError> {
+    exec.fetch_columns(table).await
+}
+
+/// Introspect a live table and generate its Rust row struct source — the bridge
+/// one-liner (a TS-authored ClickHouse table → a Rust `#[derive(Row)]` struct).
+pub async fn introspect_row_struct(
+    exec: &impl ChExecutor,
+    table: &str,
+    struct_name: &str,
+) -> Result<String, ChError> {
+    let cols = introspect_columns(exec, table).await?;
+    let pairs: Vec<(String, String)> = cols.into_iter().map(|c| (c.name, c.type_name)).collect();
+    Ok(rust_row_struct(struct_name, &pairs))
+}
diff --git a/crates/clickhouse-kit/src/lib.rs b/crates/clickhouse-kit/src/lib.rs
index 45ed75b..56671d7 100644
--- a/crates/clickhouse-kit/src/lib.rs
+++ b/crates/clickhouse-kit/src/lib.rs
@@ -1,29 +1,37 @@
-//! # clickhouse-kit
+//! # smooai-clickhouse-kit (imports as `clickhouse_kit`)
 //!
-//! Safe-by-construction schema toolkit for ClickHouse — the Rust-canonical port of
-//! [`@smooai/clickhouse-kit`](https://github.com/SmooAI/clickhouse-kit). Rows are
-//! Serde-native (use the `clickhouse` crate's `#[derive(Row)]`); this crate adds
-//! what Serde doesn't: an allowlisted type system for **user-defined / multi-tenant**
-//! schemas, identifier safety, DDL generation, and additive evolution.
+//! A safe-by-construction ClickHouse schema toolkit with two jobs:
 //!
-//! The safety layer lives here because that's where untrusted customer input is
-//! turned into SQL — safe-by-construction only counts in the process holding the
-//! input. See the repo `ROADMAP.md`.
+//! - **TS→Rust bridge** for developer-authored (static) tables: TypeScript owns
+//!   the schema; [`introspect`] reads the live ClickHouse back into Rust and
+//!   [`codegen`] emits `#[derive(Row)]` structs, with [`check_drift`](crate::drift)
+//!   asserting the Rust view ≡ the live DB. Rows stay Serde-native.
+//! - **Runtime toolkit** for user-defined / multi-tenant (dynamic) tables: an
+//!   allowlisted type system, identifier safety, DDL generation, [`flexible_table`],
+//!   forward-only migrations, and additive evolution — the safe-by-construction path
+//!   for turning untrusted customer input into SQL (that guarantee only counts in
+//!   the process holding the input, which is why this layer is canonical in Rust).
+//!
+//! See the repo `ROADMAP.md`.
 
 pub mod client;
+pub mod codegen;
 pub mod drift;
 pub mod evolve;
 pub mod flatten;
 pub mod flexible;
+pub mod introspect;
 pub mod migrate;
 pub mod safety;
 pub mod table;
 
 pub use client::{ChError, ChExecutor};
+pub use codegen::{ch_type_to_rust, rust_row_struct};
 pub use drift::{check_drift, Drift, DriftResult};
 pub use evolve::{alter_add_columns_sql, diff_columns, ColumnDiff, LiveColumn};
 pub use flatten::{coerce_to_table, flatten_record, CoerceResult, FlattenOptions};
 pub use flexible::{flexible_table, FlexibleConfig};
+pub use introspect::{introspect_columns, introspect_row_struct};
 pub use migrate::{run_migrations, split_sql_statements, MigrationRunResult};
 pub use safety::{
     assert_column_count, assert_not_reserved, quote_identifier, validate_identifier,
diff --git a/crates/clickhouse-kit/tests/integration_bridge.rs b/crates/clickhouse-kit/tests/integration_bridge.rs
new file mode 100644
index 0000000..1661f1f
--- /dev/null
+++ b/crates/clickhouse-kit/tests/integration_bridge.rs
@@ -0,0 +1,111 @@
+//! TS→Rust bridge integration: introspect a real ClickHouse table and generate its
+//! Rust row struct. Proves the live → Rust codegen path end-to-end. Gated behind
+//! `#[ignore]` (Docker); CI runs it.
+
+use clickhouse::Client;
+use clickhouse_kit::{introspect_row_struct, ChError, ChExecutor, LiveColumn};
+use std::future::Future;
+use testcontainers_modules::{clickhouse::ClickHouse, testcontainers::runners::AsyncRunner};
+
+struct Exec(Client);
+
+#[derive(clickhouse::Row, serde::Deserialize)]
+struct ColRow {
+    name: String,
+    #[serde(rename = "type")]
+    ty: String,
+}
+
+#[derive(clickhouse::Row, serde::Deserialize)]
+struct StrRow {
+    v: String,
+}
+
+#[allow(clippy::manual_async_fn)]
+impl ChExecutor for Exec {
+    fn command(&self, sql: &str) -> impl Future<Output = Result<(), ChError>> + Send {
+        async move {
+            self.0
+                .query(sql)
+                .execute()
+                .await
+                .map_err(|e| ChError::Backend(e.to_string()))
+        }
+    }
+
+    fn fetch_strings(
+        &self,
+        sql: &str,
+    ) -> impl Future<Output = Result<Vec<String>, ChError>> + Send {
+        async move {
+            let rows = self
+                .0
+                .query(sql)
+                .fetch_all::<StrRow>()
+                .await
+                .map_err(|e| ChError::Backend(e.to_string()))?;
+            Ok(rows.into_iter().map(|r| r.v).collect())
+        }
+    }
+
+    fn fetch_columns(
+        &self,
+        table: &str,
+    ) -> impl Future<Output = Result<Vec<LiveColumn>, ChError>> + Send {
+        async move {
+            let q = format!("SELECT name, type FROM system.columns WHERE database = currentDatabase() AND table = '{table}' ORDER BY position");
+            let rows = self
+                .0
+                .query(&q)
+                .fetch_all::<ColRow>()
+                .await
+                .map_err(|e| ChError::Backend(e.to_string()))?;
+            Ok(rows
+                .into_iter()
+                .map(|r| LiveColumn {
+                    name: r.name,
+                    type_name: r.ty,
+                })
+                .collect())
+        }
+    }
+}
+
+#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
+#[ignore = "requires Docker (ClickHouse testcontainer)"]
+async fn introspect_then_codegen_row_struct() {
+    let node = ClickHouse::default()
+        .start()
+        .await
+        .expect("start clickhouse");
+    let port = node.get_host_port_ipv4(8123).await.expect("http port");
+    let exec = Exec(Client::default().with_url(format!("http://127.0.0.1:{port}")));
+
+    exec.command(
+        "CREATE TABLE events (id UUID, org LowCardinality(String), n UInt64, ratio Float64, tags Array(String), attrs Map(String, String)) ENGINE = MergeTree() ORDER BY (id)",
+    )
+    .await
+    .expect("create table");
+
+    // Live ClickHouse table -> generated Rust row struct.
+    let src = introspect_row_struct(&exec, "events", "EventRow")
+        .await
+        .expect("introspect + codegen");
+
+    assert!(
+        src.contains(
+            "#[derive(Debug, Clone, clickhouse::Row, serde::Serialize, serde::Deserialize)]"
+        ),
+        "{src}"
+    );
+    assert!(src.contains("pub struct EventRow {"), "{src}");
+    assert!(src.contains("pub id: String,"), "{src}"); // UUID -> String
+    assert!(src.contains("pub org: String,"), "{src}"); // LowCardinality(String) -> String
+    assert!(src.contains("pub n: u64,"), "{src}");
+    assert!(src.contains("pub ratio: f64,"), "{src}");
+    assert!(src.contains("pub tags: Vec<String>,"), "{src}");
+    assert!(
+        src.contains("pub attrs: std::collections::HashMap<String, String>,"),
+        "{src}"
+    );
+}
diff --git a/src/__tests__/migrate.test.ts b/src/__tests__/migrate.test.ts
index 533076c..af655eb 100644
--- a/src/__tests__/migrate.test.ts
+++ b/src/__tests__/migrate.test.ts
@@ -38,7 +38,7 @@ describe("splitSqlStatements", () => {
 describe("runClickHouseMigrations", () => {
     let dir: string;
     beforeEach(() => {
-        dir = mkdtempSync(path.join(tmpdir(), "chkit-"));
+        dir = mkdtempSync(path.join(tmpdir(), "smooai-ch-kit-"));
         writeFileSync(path.join(dir, "0001_a.sql"), "CREATE TABLE a (x String) ENGINE = Memory;\n");
         writeFileSync(path.join(dir, "0002_b.sql"), "CREATE TABLE b (y String) ENGINE = Memory;\n");
     });
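The README suggests pairing the generator with `check_drift` in CI. A complementary pattern, sketched here under the assumption that the generated row struct is committed to the repository, is a golden-file check: regenerate the source (for example with `introspect_row_struct` against the CI ClickHouse) and fail the build when it no longer matches the committed copy. `check_generated_is_current` is a hypothetical helper, not part of `clickhouse_kit`, and it uses only the standard library.

```rust
/// Hypothetical CI guard: compare freshly generated row-struct source with the
/// committed copy and report the first line where they diverge.
pub fn check_generated_is_current(fresh: &str, committed: &str) -> Result<(), String> {
    // Normalise CRLF line endings and trailing whitespace so only content changes count.
    let normalise = |s: &str| s.replace("\r\n", "\n").trim_end().to_string();
    let (fresh, committed) = (normalise(fresh), normalise(committed));
    if fresh == committed {
        return Ok(());
    }

    // Walk both sources line by line to locate the first difference.
    let mut fresh_lines = fresh.lines();
    let mut committed_lines = committed.lines();
    let mut line_no = 1usize;
    loop {
        match (fresh_lines.next(), committed_lines.next()) {
            (Some(a), Some(b)) if a == b => line_no += 1,
            // Defensive fallback: the strings differ but no line-level difference was found.
            (None, None) => return Err("generated row struct is stale".to_string()),
            (a, b) => {
                return Err(format!(
                    "generated row struct is stale at line {line_no}: regenerated {:?}, committed {:?}",
                    a.unwrap_or("<end of file>"),
                    b.unwrap_or("<end of file>")
                ))
            }
        }
    }
}
```

Unlike `check_drift`, which compares the schema itself, this guard only catches a committed struct that has fallen behind the generator's output; the two checks can run side by side.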