Snowflake -> ClickHouse Equivalent Concepts#6244

Draft: dhtclk wants to merge 17 commits into main from snowflake-equivalent-concepts

@dhtclk dhtclk commented May 18, 2026

Collaborator

Summary

Snowflake -> ClickHouse equivalent concepts page to strengthen our migration story.

Checklist

vercel Bot commented May 18, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| clickhouse-docs | Error | Comment | Jun 11, 2026 8:39pm |

4 Skipped Deployments

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| clickhouse-docs-jp | Ignored | | Jun 11, 2026 8:39pm |
| clickhouse-docs-ko | Ignored | Preview | Jun 11, 2026 8:39pm |
| clickhouse-docs-ru | Ignored | Preview | Jun 11, 2026 8:39pm |
| clickhouse-docs-zh | Ignored | Preview | Jun 11, 2026 8:39pm |

@Blargian Blargian left a comment

Member

@dhtclk structure looks great - left some comments around accuracy of some statements. We will need a few more pairs of eyes on this as well.

| Account | [Warehouse](/cloud/reference/warehouses) | Each service scales compute independently; storage is shared at the warehouse level. Tier and billing are set at the organization level, not per warehouse. |
| Database | [Database](/sql-reference/statements/create/database) | Logical container for tables. Snowflake uses a Database → Schema → Table hierarchy; ClickHouse flattens this to Database → Table. See [Schemas](#schemas) below. |

:::note[Warehouse terminology]

Contributor

Definitely, let's keep this and link to the warehouses page.

| Row access policy | [Row policy](/sql-reference/statements/create/row-policy) — a `WHERE`-style expression evaluated per user | Row policies apply transparently to every query against the table. |
| Sequence | [`generateSerialID`](/sql-reference/functions/other-functions#generateSerialID) for a Keeper-backed sequential counter; [`generateSnowflakeID`](/sql-reference/functions/uuid-functions#generateSnowflakeID) or [`generateUUIDv7`](/sql-reference/functions/uuid-functions#generateUUIDv7) for distributed unique IDs | `generateSerialID` is the closest match to an auto-incrementing sequence: a named, monotonic counter coordinated through ClickHouse Keeper. The UUID functions suit high-throughput unique IDs that don't need a shared counter. |

:::note[Time Travel and backups]

Contributor

Have we thought about keeping just this note and leaving these features out of the table, to cut down on the mentions?


| Snowflake | ClickHouse | Notes |
|---|---|---|
| Primary key (advisory) | Primary key — drives the on-disk sort order and the [sparse primary index](/guides/best-practices/sparse-primary-indexes) | Where Snowflake's PK is advisory only, ClickHouse's PK is load-bearing — it determines physical layout and is used to prune granules, avoid re-sorts, and short-circuit `LIMIT`. Neither system enforces uniqueness. |

Contributor

We should explicitly call out that our primary key doesn't have to be unique. Unique PKs are pretty much the industry standard.
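
To make the point concrete, here is a minimal sketch of the behavior, assuming a hypothetical `events_demo` table (the table, columns, and values are illustrative and not part of the PR):

```sql
-- Hypothetical table: the primary key defaults to the ORDER BY expression.
CREATE TABLE events_demo
(
    user_id    UInt64,
    event_time DateTime,
    action     String
)
ENGINE = MergeTree
ORDER BY (user_id, event_time);

-- Two rows with identical key values in one insert; neither is rejected.
INSERT INTO events_demo VALUES
    (1, '2026-01-01 00:00:00', 'click'),
    (1, '2026-01-01 00:00:00', 'click');

-- Both rows are stored: the key drives sort order and indexing, not uniqueness.
SELECT user_id, event_time, count() AS row_count
FROM events_demo
GROUP BY user_id, event_time;
```

The duplicate key is accepted because the primary key only controls how rows are sorted and indexed on disk.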

| Snowflake | ClickHouse | Notes |
|---|---|---|
| Primary key (advisory) | Primary key — drives the on-disk sort order and the [sparse primary index](/guides/best-practices/sparse-primary-indexes) | Where Snowflake's PK is advisory only, ClickHouse's PK is load-bearing — it determines physical layout and is used to prune granules, avoid re-sorts, and short-circuit `LIMIT`. Neither system enforces uniqueness. |
| Foreign key (advisory) | Wide tables or [dictionaries](/dictionary) for lookups | ClickHouse doesn't accept foreign-key declarations even as advisory hints. |

Contributor

Are we talking about foreign key constraints or...? I'm confused by this because foreign keys to me are just the join key

| Search Optimization Service | Secondary indexes — [bloom-filter](/engines/table-engines/mergetree-family/mergetree#bloom-filter), token-bloom, [minmax](/engines/table-engines/mergetree-family/mergetree#minmax) | ClickHouse asks you to pick the index type per column and tune its parameters; there's no automatic equivalent. |
| Cortex Search / Snowflake Cortex Search | [Full-text index](/engines/table-engines/mergetree-family/textindexes) | Token index over string columns for in-database search. |
| `VECTOR` data type and vector search | [`Array(Float32)`](/sql-reference/data-types/array) or [`Array(BFloat16)`](/sql-reference/data-types/float#bfloat16) with a [vector ANN index](/engines/table-engines/mergetree-family/annindexes); or [`QBit`](/sql-reference/data-types/qbit) for tunable-precision search | ClickHouse has no dedicated `VECTOR` type. Embeddings store as `Array(Float32)`, or `Array(BFloat16)` to halve storage, with an ANN index accelerating approximate nearest-neighbor lookups. `QBit` keeps full precision while letting you trade bits for speed at query time. |
| Materialized view | [Incremental MV](/materialized-view/incremental-materialized-view) — updates on each insert into a base table | Source-shape rules differ; review both before porting an existing MV. Cost is paid at insert time in ClickHouse. |

Contributor

fun fact - Snowflake views are extremely limited and don't even support joins :)

| Network policies (IP allowlist) | IP allowlists and [private connectivity](/cloud/security/connectivity/private-networking) — PrivateLink (AWS, Azure) and Private Service Connect (GCP) for ingress restriction | Private connectivity is available across the three major clouds. |
| Tri-Secret Secure (customer-managed keys) | [CMEK](/cloud/security/cmek) on the service | Supports key rotation and revocation. See the CMEK page for the current list of supported cloud providers. |
| Object tagging (governance metadata) | — | ClickHouse exposes metadata via `system.*` tables rather than user-defined tags. |
| Data classification (sensitive-data detection) | — | Not a managed feature; external tools (e.g. DataHub) cover this layer. |

Contributor

We do support tagging, but it's definitely not at the level of Snowflake's.

…/04_equivalent-concepts.md

Co-authored-by: Amy Chen <46451573+amychen1776@users.noreply.github.com>
…/04_equivalent-concepts.md

Co-authored-by: Amy Chen <46451573+amychen1776@users.noreply.github.com>

@morsapaes morsapaes left a comment

Contributor

Didn't manage to review the whole PR yet, but adding some suggestions for what I was able to review this week.


## Schemas {#schemas}

A Snowflake schema serves multiple roles and has no single equivalent in ClickHouse.

Contributor

A schema in Snowflake is technically equivalent to a database in ClickHouse.

in Snowflake.
:::

## Schemas {#schemas}

Contributor

Not sure this section makes sense as is. It should probably be a subsection of the one above that explains how to map the Snowflake namespace hierarchy to the more restrictive hierarchy we use in ClickHouse. There are known issues when users migrate, e.g. with integrations like dbt.
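
As a starting point for that subsection, here is one possible convention, sketched under the assumption that each Snowflake `database.schema` pair is folded into a single ClickHouse database. The `ANALYTICS`, `RAW`, and `orders` names are hypothetical, and the PR doesn't prescribe this mapping:

```sql
-- Snowflake object: ANALYTICS.RAW.ORDERS (database.schema.table)
-- Hypothetical ClickHouse mapping: combine database and schema into one database name.
CREATE DATABASE IF NOT EXISTS analytics_raw;

CREATE TABLE analytics_raw.orders
(
    order_id    UInt64,
    customer_id UInt64,
    created_at  DateTime
)
ENGINE = MergeTree
ORDER BY (customer_id, created_at);

-- Queries then use a two-part name instead of Snowflake's three-part name.
SELECT count() FROM analytics_raw.orders;
```

Whether a naming scheme like this plays well with tools such as dbt would need to be checked separately.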

| Warehouse size (XS through 6X-Large) | Vertical [autoscaling](/cloud/features/autoscaling/vertical) bounds | Sizing is configured as min/max memory and CPU bounds rather than discrete t-shirt sizes; setting min = max effectively fixes the size. |
| Multi-cluster warehouse | Manual [horizontal scaling](/cloud/features/autoscaling/horizontal) | ClickHouse scales replica count rather than cluster count. There's no direct equivalent to Snowflake's auto-scaling policies (`Standard`/`Economy`); horizontal replica count is set manually. |
| Auto-suspend / auto-resume | Service [idling](/cloud/features/autoscaling/idling) | Compute stops when there's no work, restarts on the next query. |
| Resource monitors (credit-quota spend caps) | [Workloads](/operations/workload-scheduling) for runtime scheduling; per-query limits (memory, threads, execution time) | ClickHouse workloads cover runtime resource scheduling but not spend caps; there's no primitive that suspends a service on hitting a credit threshold. |

Contributor

We do have billing thresholds and threshold-based notifications, might be worth mentioning. These are only informational, though; we don't cap or restrict usage.


## Billing and pricing model {#billing}

ClickHouse Cloud meters compute as per-minute [compute units (8 GiB RAM, 2 vCPU)](/cloud/manage/billing/overview#how-is-compute-metered) rather than as credits scaled by warehouse size, charges for storage as compressed bytes without Time Travel or Fail-safe overhead, and bills backups as a separate line item rather than bundling them into retention windows. Most Snowflake "serverless compute" features (Snowpipe, Search Optimization, Auto-clustering, materialized view refresh, Cortex) are bundled into service compute on ClickHouse; [ClickPipes](/integrations/clickpipes) is the explicit exception and is [metered separately](/cloud/reference/billing/clickpipes). As in Snowflake, ClickHouse Cloud charges for public internet egress and cross-region data transfer and offers committed-spend discounts. See [ClickHouse Cloud pricing](/cloud/manage/billing/overview) for current rates, tiers, and commitment options.

Contributor

Can we just follow the current documentation? This paragraph on billing is pretty convoluted. Here, we simply say:

ClickHouse Cloud bills based on the usage of compute, storage, data transfer (egress over the internet and cross-region), and ClickPipes.

That's more direct and easier to understand. Backups are lumped into storage costs, so I don't think we need to (or should) mention them upfront.


## Storage and tables {#storage-tables}

In ClickHouse, a table's behavior is set at creation time: the engine (MergeTree family) determines merge and storage semantics, and `ORDER BY` / `PARTITION BY` / `TTL` clauses configure physical layout and retention. Many Snowflake per-feature settings map to a clause in the ClickHouse `CREATE TABLE` statement. Physical schema design also differs between platforms; see the [migration guide](./02_migration_guide.md) for design tradeoffs.

Contributor

"Merge" is a ClickHouse-specific concept, +1 on simplifying this sentence. It feels like we've described this a million times, we should be able to reuse existing descriptions:

In ClickHouse, you define storage and data layout upfront at table creation time. A CREATE TABLE statement specifies not only the columns and data types, but also the table engine and the sorting and indexing strategy through an ORDER BY clause. The ORDER BY clause is equivalent to a Snowflake clustering key: it defines how data is sorted on disk and indexed. In ClickHouse, unlike Snowflake, you don't incur additional background costs for maintaining the sort order once the table is created. This gives you direct control over query performance and storage costs.

Other clauses like PARTITION BY or TTL are available for partitioning, retention, and other data management strategies, as needed. Many of the settings you configure per-feature in Snowflake map to these clauses in a single CREATE TABLE statement. See the migration guide for design tradeoffs.
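
As a rough illustration of the paragraphs above, a single `CREATE TABLE` statement can carry the engine, sort order, partitioning, and retention settings. The `page_views` table, its columns, and the 90-day retention period are hypothetical:

```sql
-- Hypothetical table showing the clauses discussed above in one statement.
CREATE TABLE page_views
(
    event_date Date,
    event_time DateTime,
    user_id    UInt64,
    url        String
)
ENGINE = MergeTree                      -- table engine
PARTITION BY toYYYYMM(event_date)       -- optional: one partition per month
ORDER BY (user_id, event_time)          -- sort order on disk and sparse primary index
TTL event_date + INTERVAL 90 DAY;       -- optional: expire rows after 90 days
```

Only `ENGINE` and `ORDER BY` are needed for a basic MergeTree table. The `PARTITION BY` and `TTL` clauses are included here just to show where those settings live.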

@dhtclk added the "Don't Merge" label (Don't merge yet) Jun 11, 2026