diff --git a/docs-mintlify/docs.json b/docs-mintlify/docs.json index ce4e970f9f1b1..b9aab32aa5cb7 100644 --- a/docs-mintlify/docs.json +++ b/docs-mintlify/docs.json @@ -636,7 +636,8 @@ "recipes/pre-aggregations/disabling-pre-aggregations", "recipes/pre-aggregations/using-originalsql-and-rollups-effectively", "recipes/pre-aggregations/refreshing-select-partitions", - "recipes/pre-aggregations/joining-multiple-data-sources" + "recipes/pre-aggregations/joining-multiple-data-sources", + "recipes/pre-aggregations/mixed-refresh-cadences-rollup-join" ] }, { diff --git a/docs-mintlify/recipes/index.mdx b/docs-mintlify/recipes/index.mdx index 4b9406ea8797b..ac6e5eddd03e9 100644 --- a/docs-mintlify/recipes/index.mdx +++ b/docs-mintlify/recipes/index.mdx @@ -108,6 +108,9 @@ pre-aggregations, configuration, APIs, and AI. Join data from different warehouses with cross-database rollup joins. + + Combine a slow-changing fact rollup with a frequently-refreshed lookup rollup using a rollup join. + ## Configuration diff --git a/docs-mintlify/recipes/pre-aggregations/mixed-refresh-cadences-rollup-join.mdx b/docs-mintlify/recipes/pre-aggregations/mixed-refresh-cadences-rollup-join.mdx new file mode 100644 index 0000000000000..be59c9cb0b208 --- /dev/null +++ b/docs-mintlify/recipes/pre-aggregations/mixed-refresh-cadences-rollup-join.mdx @@ -0,0 +1,387 @@ +--- +title: Joining pre-aggregations with different refresh cadences +description: Use a rollup join to combine a heavy, slowly-refreshed fact pre-aggregation with a lightweight, frequently-refreshed lookup pre-aggregation so derived metrics stay fresh without rebuilding the full fact table. +--- + +## Use case + +A common modeling problem is computing a metric that depends on two inputs +which change at very different rates: + +- A **large fact table** that is expensive to aggregate and only needs to be + refreshed on a slow cadence (for example, daily or hourly). +- A **small lookup table** whose values are applied to each fact row and that + needs to be refreshed much more frequently than the fact aggregation. + +Some examples of this pattern: + +- Converting an amount column to a target currency using the latest foreign + exchange (FX) rates — either a single-currency column multiplied by a rate, + or an `amount` and `currency` column resolved with a `CASE` statement. +- Re-pricing inventory or order lines with a frequently-updated price list. +- Applying a frequently-tuned scoring weight, tax rate, or commission rate to + historical events. + +Combining both inputs into a single [rollup](/reference/data-modeling/pre-aggregations#rollup) +forces the entire pre-aggregation to refresh whenever the lookup values +change, which is wasteful. In the recipe below, we'll learn how to use a +[rollup join](/reference/data-modeling/pre-aggregations#rollup_join) to keep +each pre-aggregation on its own refresh schedule while still serving the +combined, derived query from pre-aggregations. + +We'll walk through the FX conversion variant as a concrete example, but the +same pattern applies to any of the use cases above. + + + `rollup_join` has several constraints documented on the + [pre-aggregations reference page](/reference/data-modeling/pre-aggregations#rollup_join). + In particular: it is currently in Preview, it is designed for joining data + across data sources, it can only join two rollups, and it is ephemeral — + set freshness controls on the referenced rollups rather than on the + `rollup_join` itself. + + The rollup on the right side of the join is also bounded by the number of + physical Cube Store partitions it can have, which depends on your Cube + Store compute tier. Note that these are Cube Store physical partitions — + not Cube logical partitions — and each one is roughly 50 MB in size and + contains up to ~1M rows. The exact per-tier limits are still being + finalized, but expect on the order of 5, 10, and 20 physical partitions + for S, M, and L tiers respectively. + + +## Data modeling + +We have two cubes: `orders`, which stores per-order amounts in the original +transaction currency, and `fx_rates`, which stores the latest exchange rate +from each currency to USD. + +The `orders` table looks like this: + +| id | currency | amount | created_at | +|----|----------|---------|---------------------| +| 1 | EUR | 120.00 | 2026-05-18 09:14:22 | +| 2 | GBP | 75.50 | 2026-05-18 11:02:47 | +| 3 | EUR | 245.10 | 2026-05-19 08:31:05 | +| 4 | USD | 310.00 | 2026-05-19 10:18:33 | +| 5 | GBP | 89.99 | 2026-05-19 12:44:51 | + +The `fx_rates` table looks like this: + +| currency | rate_to_usd | +|----------|-------------| +| EUR | 1.085 | +| GBP | 1.262 | +| USD | 1.000 | + +First, define a `rollup` pre-aggregation on `orders` that aggregates the +amount by `currency` and day. This is the heavy pre-aggregation, so we set a +slow `refresh_key` — for example, every day: + + + +```yaml title="YAML" +cubes: + - name: orders + + sql_table: public.orders + + joins: + - name: fx_rates + sql: "{CUBE}.currency = {fx_rates.currency}" + relationship: many_to_one + + dimensions: + - name: id + sql: id + type: number + primary_key: true + + - name: currency + sql: currency + type: string + + - name: created_at + sql: created_at + type: time + + measures: + - name: amount + sql: amount + type: sum + + pre_aggregations: + - name: orders_rollup + type: rollup + measures: + - amount + dimensions: + - currency + time_dimension: created_at + granularity: day + refresh_key: + every: 1 day + indexes: + - name: currency_index + columns: + - currency +``` + +```javascript title="JavaScript" +cube(`orders`, { + sql_table: `public.orders`, + + joins: { + fx_rates: { + sql: `${CUBE}.currency = ${fx_rates.currency}`, + relationship: `many_to_one` + } + }, + + dimensions: { + id: { + sql: `id`, + type: `number`, + primary_key: true + }, + + currency: { + sql: `currency`, + type: `string` + }, + + created_at: { + sql: `created_at`, + type: `time` + } + }, + + measures: { + amount: { + sql: `amount`, + type: `sum` + } + }, + + pre_aggregations: { + orders_rollup: { + type: `rollup`, + measures: [amount], + dimensions: [currency], + time_dimension: created_at, + granularity: `day`, + refresh_key: { + every: `1 day` + }, + indexes: { + currency_index: { + columns: [currency] + } + } + } + } +}) +``` + + + +Next, define a `rollup` pre-aggregation on `fx_rates`. This pre-aggregation is +small (one row per currency) and cheap to rebuild, so we give it a much +faster `refresh_key` than the orders rollup — for example, every hour: + + + +```yaml title="YAML" +cubes: + - name: fx_rates + + sql_table: public.fx_rates + + dimensions: + - name: currency + sql: currency + type: string + primary_key: true + + - name: rate_to_usd + sql: rate_to_usd + type: number + + pre_aggregations: + - name: fx_rates_rollup + type: rollup + dimensions: + - currency + - rate_to_usd + refresh_key: + every: 1 hour + indexes: + - name: currency_index + columns: + - currency +``` + +```javascript title="JavaScript" +cube(`fx_rates`, { + sql_table: `public.fx_rates`, + + dimensions: { + currency: { + sql: `currency`, + type: `string`, + primary_key: true + }, + + rate_to_usd: { + sql: `rate_to_usd`, + type: `number` + } + }, + + pre_aggregations: { + fx_rates_rollup: { + type: `rollup`, + dimensions: [currency, rate_to_usd], + refresh_key: { + every: `1 hour` + }, + indexes: { + currency_index: { + columns: [currency] + } + } + } + } +}) +``` + + + + + Both pre-aggregations must include an index on the join key (`currency` in + this example) for the `rollup_join` to match. The `fx_rates_rollup` also + needs `rate_to_usd` as a dimension so it's available downstream. + + +Finally, define a `rollup_join` pre-aggregation on `orders` that references +both rollups. This is an ephemeral pre-aggregation — it doesn't materialize +its own data, so it doesn't need a `refresh_key`. Cube serves queries from it +by joining the two underlying rollups on the fly: + + + +```yaml title="YAML" +cubes: + - name: orders + # ... + + pre_aggregations: + # ... + + - name: orders_with_fx_rollup + type: rollup_join + measures: + - amount + dimensions: + - currency + - fx_rates.rate_to_usd + time_dimension: created_at + granularity: day + rollups: + - fx_rates.fx_rates_rollup + - orders_rollup +``` + +```javascript title="JavaScript" +cube(`orders`, { + // ... + + pre_aggregations: { + // ... + + orders_with_fx_rollup: { + type: `rollup_join`, + measures: [amount], + dimensions: [currency, fx_rates.rate_to_usd], + time_dimension: created_at, + granularity: `day`, + rollups: [fx_rates.fx_rates_rollup, orders_rollup] + } + } +}) +``` + + + +To expose the USD-converted amount, add a derived measure on `orders` that +multiplies the order amount by the FX rate from the joined cube: + + + +```yaml title="YAML" +cubes: + - name: orders + # ... + + measures: + # ... + + - name: amount_usd + sql: "{CUBE.amount} * {fx_rates.rate_to_usd}" + type: number +``` + +```javascript title="JavaScript" +cube(`orders`, { + // ... + + measures: { + // ... + + amount_usd: { + sql: `${CUBE.amount} * ${fx_rates.rate_to_usd}`, + type: `number` + } + } +}) +``` + + + +## Query + +Let's query daily sales in USD by currency: + +```json +{ + "measures": ["orders.amount_usd"], + "dimensions": ["orders.currency"], + "timeDimensions": [ + { + "dimension": "orders.created_at", + "granularity": "day" + } + ] +} +``` + +## Result + +Cube serves the query from `orders_with_fx_rollup`, joining the cached +`orders_rollup` (refreshed daily) with the cached `fx_rates_rollup` +(refreshed hourly). The heavy aggregation never rebuilds when FX rates +change, but the converted totals always reflect the latest rates. + +```javascript +[ + { + "orders.created_at.day": "2026-05-19T00:00:00.000", + "orders.currency": "EUR", + "orders.amount_usd": "12450.32" + }, + { + "orders.created_at.day": "2026-05-19T00:00:00.000", + "orders.currency": "GBP", + "orders.amount_usd": "8930.17" + } +] +``` diff --git a/docs-mintlify/reference/data-modeling/pre-aggregations.mdx b/docs-mintlify/reference/data-modeling/pre-aggregations.mdx index 46e983ee14ee1..eb87efd60b62a 100644 --- a/docs-mintlify/reference/data-modeling/pre-aggregations.mdx +++ b/docs-mintlify/reference/data-modeling/pre-aggregations.mdx @@ -219,11 +219,17 @@ version. Rollup join is designed to join data across different data sources. -To join data within the same data source you can list members from different cubes within a regular rollup. -Rollup join can be used only to join two tables, and one can't contain more than -one partition so that the join can operate correctly. It means that at least one -table should be less than 1M of rows. There's work in progress to remove those -limitations. +To join data within the same data source you can list members from different +cubes within a regular rollup. Rollup join can be used only to join two +rollups. + +The rollup on the right side of the join is bounded by the number of physical +Cube Store partitions it can have, which depends on your Cube Store compute +tier. Note that these are Cube Store physical partitions — not Cube logical +partitions — and each one is roughly 50 MB in size and contains up to ~1M +rows. The exact per-tier limits are still being finalized, but expect on the +order of 5, 10, and 20 physical partitions for S, M, and L tiers +respectively.