Snowflake -> ClickHouse Equivalent Concepts #6244
…into snowflake-equivalent-concepts
| Account | [Warehouse](/cloud/reference/warehouses) | Each service scales compute independently; storage is shared at the warehouse level. Tier and billing are set at the organization level, not per warehouse. |
| Database | [Database](/sql-reference/statements/create/database) | Logical container for tables. Snowflake uses a Database → Schema → Table hierarchy; ClickHouse flattens this to Database → Table. See [Schemas](#schemas) below. |
:::note[Warehouse terminology]
Definitely, let's keep this and link to the warehouses page.
| Row access policy | [Row policy](/sql-reference/statements/create/row-policy): a `WHERE`-style expression evaluated per user | Row policies apply transparently to every query against the table. |
| Sequence | [`generateSerialID`](/sql-reference/functions/other-functions#generateSerialID) for a Keeper-backed sequential counter; [`generateSnowflakeID`](/sql-reference/functions/uuid-functions#generateSnowflakeID) or [`generateUUIDv7`](/sql-reference/functions/uuid-functions#generateUUIDv7) for distributed unique IDs | `generateSerialID` is the closest match to an auto-incrementing sequence: a named, monotonic counter coordinated through ClickHouse Keeper. `generateSnowflakeID` and `generateUUIDv7` suit high-throughput unique IDs that don't need a shared counter. |
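On the Sequence row above: a Snowflake-style ID needs no shared counter because each node combines a timestamp, its own machine ID, and a local per-millisecond sequence. The Python sketch below packs and unpacks such an ID. It assumes the classic Snowflake layout (41-bit millisecond timestamp, 10-bit machine ID, 12-bit sequence); `pack_snowflake_id` and `unpack_snowflake_id` are hypothetical helpers, not ClickHouse functions, and the exact epoch and bit allocation used by `generateSnowflakeID` should be checked against its reference page.

```python
# Hypothetical sketch of a Snowflake-style 64-bit ID; not ClickHouse's implementation.
# Assumed layout (high to low bits): unused sign bit, 41-bit ms timestamp,
# 10-bit machine ID, 12-bit per-millisecond sequence.
TIMESTAMP_BITS = 41
MACHINE_BITS = 10
SEQUENCE_BITS = 12


def pack_snowflake_id(timestamp_ms: int, machine_id: int, sequence: int) -> int:
    """Combine the three fields into one integer; each must fit its bit width."""
    if not 0 <= timestamp_ms < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp_ms out of range")
    if not 0 <= machine_id < (1 << MACHINE_BITS):
        raise ValueError("machine_id out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    return (
        (timestamp_ms << (MACHINE_BITS + SEQUENCE_BITS))
        | (machine_id << SEQUENCE_BITS)
        | sequence
    )


def unpack_snowflake_id(snowflake_id: int) -> tuple[int, int, int]:
    """Split an ID back into (timestamp_ms, machine_id, sequence)."""
    sequence = snowflake_id & ((1 << SEQUENCE_BITS) - 1)
    machine_id = (snowflake_id >> SEQUENCE_BITS) & ((1 << MACHINE_BITS) - 1)
    timestamp_ms = snowflake_id >> (MACHINE_BITS + SEQUENCE_BITS)
    return timestamp_ms, machine_id, sequence


# Two nodes mint IDs in the same millisecond without coordinating.
a = pack_snowflake_id(1_700_000_000_000, machine_id=1, sequence=0)
b = pack_snowflake_id(1_700_000_000_000, machine_id=2, sequence=0)
print(a != b, unpack_snowflake_id(b))
```

Because the timestamp occupies the high bits, IDs minted later sort after earlier ones (roughly, across machines), which is why these IDs can stand in for an auto-increment when gaps are acceptable.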
:::note[Time Travel and backups]
Have we thought about keeping just this note and leaving these features out of the table, to cut down on repeated mentions?
| Snowflake | ClickHouse | Notes |
|---|---|---|
| Primary key (advisory) | Primary key, which drives the on-disk sort order and the [sparse primary index](/guides/best-practices/sparse-primary-indexes) | Where Snowflake's PK is advisory only, ClickHouse's PK is load-bearing: it determines physical layout and is used to prune granules, avoid re-sorts, and short-circuit `LIMIT`. Neither system enforces uniqueness. |
We should explicitly call out that our primary key doesn't have to be unique. Unique PKs are something of an industry standard.
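On the primary-key row above: ClickHouse's sparse index stores one mark per granule of sorted rows (8192 rows by default, via `index_granularity`), and a range filter on the key reads only granules whose marks can overlap the range. The Python sketch below models that idea with a tiny granule size. `build_sparse_index` and `granules_to_read` are hypothetical names, and the sketch ignores compound keys and other real-index details. Note that the sorted key column contains duplicates; pruning works without any uniqueness requirement.

```python
import bisect


def build_sparse_index(sorted_keys: list[int], granule_size: int) -> list[int]:
    """Keep the first key of each granule: one mark per granule."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_size)]


def granules_to_read(marks: list[int], low: int, high: int) -> list[int]:
    """Return indices of granules that may contain keys in [low, high]."""
    # A granule whose first key is below `low` can still hold `low` if the
    # next granule starts at or after `low`, so step one mark to the left.
    first = max(bisect.bisect_left(marks, low) - 1, 0)
    # Granules whose first key is greater than `high` cannot match.
    last = bisect.bisect_right(marks, high) - 1
    return list(range(first, last + 1))


keys = [1, 1, 2, 3, 3, 3, 5, 8, 8, 9, 12, 15]  # sorted, duplicates allowed
marks = build_sparse_index(keys, granule_size=3)  # [1, 3, 5, 9]
print(granules_to_read(marks, 10, 15))  # only the last of four granules is scanned
```

The index is conservative: it may select a granule that turns out to hold no matching rows, but it never skips one that does.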
| Foreign key (advisory) | Wide tables or [dictionaries](/dictionary) for lookups | ClickHouse doesn't accept foreign-key declarations even as advisory hints. |
Are we talking about foreign key constraints, or something else? I'm confused by this because, to me, a foreign key is just the join key.
| Search Optimization Service | Secondary indexes: [bloom-filter](/engines/table-engines/mergetree-family/mergetree#bloom-filter), token-bloom, [minmax](/engines/table-engines/mergetree-family/mergetree#minmax) | ClickHouse asks you to pick the index type per column and tune its parameters; there's no automatic equivalent. |
| Cortex Search | [Full-text index](/engines/table-engines/mergetree-family/textindexes) | Token index over string columns for in-database search. |
| `VECTOR` data type and vector search | [`Array(Float32)`](/sql-reference/data-types/array) or [`Array(BFloat16)`](/sql-reference/data-types/float#bfloat16) with a [vector ANN index](/engines/table-engines/mergetree-family/annindexes); or [`QBit`](/sql-reference/data-types/qbit) for tunable-precision search | ClickHouse has no dedicated `VECTOR` type. Embeddings are stored as `Array(Float32)`, or as `Array(BFloat16)` to halve storage, with an ANN index accelerating approximate nearest-neighbor lookups. `QBit` keeps full precision while letting you trade bits for speed at query time. |
| Materialized view | [Incremental MV](/materialized-view/incremental-materialized-view), which updates on each insert into a base table | The rules on what a view's source query may contain differ between the two systems; review both before porting an existing MV. In ClickHouse, the cost is paid at insert time. |
Fun fact: Snowflake materialized views are extremely limited and don't even support joins :)
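On the materialized view row above: an incremental MV in ClickHouse runs on each inserted block rather than re-reading the base table, which is why its cost lands at insert time. The Python sketch below models that trigger pattern: the view's transform sees only the new rows and folds them into a target table. `IncrementalMV` is a hypothetical toy; in ClickHouse the target is usually a table that can combine partial results during merges, which this in-memory dict only approximates.

```python
from collections import defaultdict


class IncrementalMV:
    """Toy model of an insert-triggered materialized view: count events per user."""

    def __init__(self) -> None:
        self.base_rows: list[dict] = []  # stands in for the source table
        self.target: dict[str, int] = defaultdict(int)  # stands in for the MV's target table
        self.rows_processed = 0  # work done by the view, for illustration

    def insert(self, block: list[dict]) -> None:
        self.base_rows.extend(block)
        # The transform runs only over the newly inserted block, never over base_rows.
        for row in block:
            self.target[row["user"]] += 1
        self.rows_processed += len(block)


mv = IncrementalMV()
mv.insert([{"user": "a"}, {"user": "b"}])
mv.insert([{"user": "a"}])
print(dict(mv.target), mv.rows_processed)
```

The view's work grows with the size of each insert, not with the size of the base table.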
| Network policies (IP allowlist) | IP allowlists and [private connectivity](/cloud/security/connectivity/private-networking): PrivateLink (AWS, Azure) and Private Service Connect (GCP) for ingress restriction | Private connectivity is available across the three major clouds. |
| Tri-Secret Secure (customer-managed keys) | [CMEK](/cloud/security/cmek) on the service | Supports key rotation and revocation. See the CMEK page for the current list of supported cloud providers. |
| Object tagging (governance metadata) | No direct equivalent | ClickHouse exposes metadata via `system.*` tables rather than user-defined tags. |
| Data classification (sensitive-data detection) | No direct equivalent | Not a managed feature; external tools (e.g. DataHub) cover this layer. |
We do support tagging, but it's definitely not at the level of Snowflake.
…into snowflake-equivalent-concepts
…/04_equivalent-concepts.md Co-authored-by: Amy Chen <46451573+amychen1776@users.noreply.github.com>
…ickHouse/clickhouse-docs into snowflake-equivalent-concepts
morsapaes left a comment
Didn't manage to review the whole PR yet, but adding some suggestions for what I was able to review this week.
## Schemas {#schemas}
A Snowflake schema serves multiple roles and has no single equivalent in ClickHouse.
A schema in Snowflake is technically equivalent to a database in ClickHouse.
in Snowflake.
:::
## Schemas {#schemas}
Not sure this section makes sense as is. It should probably be a subsection of the one above that explains how to map the Snowflake namespace hierarchy to the more restrictive hierarchy we use in ClickHouse. There are known issues when users migrate, e.g. with integrations like dbt.
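Since Snowflake names have three levels (database, schema, table) and ClickHouse names have two, a migration needs a naming convention. One option, in line with the earlier comment that a Snowflake schema corresponds to a ClickHouse database, is to map each schema to a ClickHouse database and optionally prefix it with the Snowflake database name to avoid collisions. The Python sketch below shows that convention; the `{database}_{schema}` pattern and the `map_snowflake_name` helper are hypothetical choices, not something ClickHouse or dbt prescribes.

```python
def map_snowflake_name(database: str, schema: str, table: str, *, fold_database: bool = True) -> str:
    """Map a three-part Snowflake name onto a two-part ClickHouse name.

    Hypothetical convention: the Snowflake schema becomes the ClickHouse database,
    optionally prefixed with the Snowflake database to avoid collisions.
    """
    ch_database = f"{database}_{schema}" if fold_database else schema
    # Snowflake folds unquoted identifiers to upper case, while ClickHouse
    # identifiers are case-sensitive, so this sketch normalizes to lower case.
    return f"{ch_database.lower()}.{table.lower()}"


print(map_snowflake_name("ANALYTICS", "PUBLIC", "ORDERS"))
print(map_snowflake_name("ANALYTICS", "STAGING", "ORDERS", fold_database=False))
```

Whichever convention is chosen, it needs to be applied consistently across tools (for example, in dbt model configuration) so that every integration resolves the same two-part names.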
| Warehouse size (XS through 6X-Large) | Vertical [autoscaling](/cloud/features/autoscaling/vertical) bounds | Sizing is configured as min/max memory and CPU bounds rather than discrete t-shirt sizes; setting min = max effectively fixes the size. |
| Multi-cluster warehouse | Manual [horizontal scaling](/cloud/features/autoscaling/horizontal) | ClickHouse scales replica count rather than cluster count, and the replica count is set manually. There's no direct equivalent to Snowflake's auto-scaling policies (`Standard`/`Economy`). |
| Auto-suspend / auto-resume | Service [idling](/cloud/features/autoscaling/idling) | Compute stops when there's no work and restarts on the next query. |
| Resource monitors (credit-quota spend caps) | [Workloads](/operations/workload-scheduling) for runtime scheduling; per-query limits (memory, threads, execution time) | ClickHouse workloads cover runtime resource scheduling but not spend caps; there's no primitive that suspends a service on hitting a credit threshold. |
We do have billing thresholds and threshold-based notifications, which might be worth mentioning. These are only informational, though; we don't cap or restrict usage.
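On the warehouse size row above: as a rough model (not the actual autoscaler logic), the effective size is whatever the autoscaler would pick, clamped to the configured min/max bounds, so setting min equal to max pins the size. The Python sketch below shows that clamping; `effective_memory_gib` is a hypothetical helper.

```python
def effective_memory_gib(desired_gib: float, min_gib: float, max_gib: float) -> float:
    """Clamp the size an autoscaler would pick to the configured bounds."""
    if min_gib > max_gib:
        raise ValueError("min_gib must not exceed max_gib")
    return max(min_gib, min(desired_gib, max_gib))


# Flexible bounds: the service follows load within the range.
print(effective_memory_gib(48, min_gib=16, max_gib=64))  # 48
# min == max: the size is fixed regardless of load, like picking one warehouse size.
print(effective_memory_gib(48, min_gib=32, max_gib=32))  # 32
```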
## Billing and pricing model {#billing}
ClickHouse Cloud meters compute as per-minute [compute units (8 GiB RAM, 2 vCPU)](/cloud/manage/billing/overview#how-is-compute-metered) rather than as credits scaled by warehouse size, charges for storage as compressed bytes without Time Travel or Fail-safe overhead, and bills backups as a separate line item rather than bundling them into retention windows. Most Snowflake "serverless compute" features (Snowpipe, Search Optimization, Auto-clustering, materialized view refresh, Cortex) are bundled into service compute on ClickHouse; [ClickPipes](/integrations/clickpipes) is the explicit exception and is [metered separately](/cloud/reference/billing/clickpipes). As in Snowflake, ClickHouse Cloud charges for public internet egress and cross-region data transfer and offers committed-spend discounts. See [ClickHouse Cloud pricing](/cloud/manage/billing/overview) for current rates, tiers, and commitment options.
Can we just follow the current documentation? This paragraph on billing is pretty convoluted. There, we simply say:
ClickHouse Cloud bills based on the usage of compute, storage, data transfer (egress over the internet and cross-region), and ClickPipes.
That's more direct and easier to understand. Backups are lumped into storage costs, so I don't think we need to (or should) mention them upfront.
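To show how per-minute compute-unit metering adds up, here is a small Python sketch. It assumes one compute unit is 8 GiB of RAM (with 2 vCPU), that usage is summed across replicas for the minutes the service is actually running (an idle service accrues no compute), and a hypothetical `price_per_unit_hour`; actual rates, tiers, and rounding rules are on the pricing page, not here.

```python
GIB_PER_COMPUTE_UNIT = 8  # 1 compute unit = 8 GiB RAM + 2 vCPU


def compute_unit_hours(memory_gib_per_replica: float, replicas: int, minutes: int) -> float:
    """Compute-unit hours for a service running at a fixed size for `minutes` minutes."""
    units = memory_gib_per_replica / GIB_PER_COMPUTE_UNIT * replicas
    return units * minutes / 60


def estimated_cost(unit_hours: float, price_per_unit_hour: float) -> float:
    """Hypothetical cost estimate; plug in the rate for your tier and region."""
    return unit_hours * price_per_unit_hour


# 3 replicas of 16 GiB for 90 minutes: 2 units each, 6 units total, 9 unit-hours.
hours = compute_unit_hours(16, replicas=3, minutes=90)
print(hours)
```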
## Storage and tables {#storage-tables}
In ClickHouse, a table's behavior is set at creation time: the engine (MergeTree family) determines merge and storage semantics, and `ORDER BY` / `PARTITION BY` / `TTL` clauses configure physical layout and retention. Many Snowflake per-feature settings map to a clause in the ClickHouse `CREATE TABLE` statement. Physical schema design also differs between platforms; see the [migration guide](./02_migration_guide.md) for design tradeoffs.
"Merge" is a ClickHouse-specific concept; +1 on simplifying this sentence. It feels like we've described this a million times, so we should be able to reuse existing descriptions:
In ClickHouse, you define storage and data layout upfront at table creation time. A `CREATE TABLE` statement specifies not only the columns and data types, but also the table engine and the sorting and indexing strategy through an `ORDER BY` clause. The `ORDER BY` clause is equivalent to a Snowflake clustering key: it defines how data is sorted on disk and indexed. In ClickHouse, unlike Snowflake, you don't incur additional background costs for maintaining the sort order once the table is created. This gives you direct control over query performance and storage costs.
Other clauses like `PARTITION BY` or `TTL` are available for partitioning, retention, and other data management strategies, as needed. Many of the settings you configure per-feature in Snowflake map to these clauses in a single `CREATE TABLE` statement. See the migration guide for design tradeoffs.
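As a rough model of the `PARTITION BY` and retention ideas above: rows are grouped by a partition expression (a year-month key here, analogous to partitioning on a month of the event date), and retention can be enforced by dropping whole partitions (as with `ALTER TABLE ... DROP PARTITION`) rather than deleting rows one by one; a `TTL` expression is instead applied by ClickHouse during background merges. The Python sketch below is illustrative only; `month_key`, `partition_rows`, and `drop_expired_partitions` are hypothetical helpers.

```python
from collections import defaultdict
from datetime import date


def month_key(d: date) -> int:
    """Partition expression: YYYYMM as an integer, e.g. 2024-03-15 -> 202403."""
    return d.year * 100 + d.month


def partition_rows(rows: list[tuple[date, str]]) -> dict[int, list[tuple[date, str]]]:
    """Group (event_date, payload) rows into partitions keyed by month."""
    partitions: dict[int, list[tuple[date, str]]] = defaultdict(list)
    for row in rows:
        partitions[month_key(row[0])].append(row)
    return dict(partitions)


def drop_expired_partitions(partitions: dict[int, list], oldest_kept: int) -> dict[int, list]:
    """Retention by whole partition: keep only months >= oldest_kept (a YYYYMM key)."""
    return {key: part for key, part in partitions.items() if key >= oldest_kept}


rows = [(date(2024, 1, 5), "a"), (date(2024, 1, 20), "b"), (date(2024, 3, 2), "c")]
parts = partition_rows(rows)
print(sorted(parts))  # [202401, 202403]
print(sorted(drop_expired_partitions(parts, 202402)))  # [202403]
```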
Summary
Adds a Snowflake -> ClickHouse equivalent concepts page to strengthen our migration story.