diff --git a/documentation/query/sql/sample-by.md b/documentation/query/sql/sample-by.md index 954fa2272..ca5c9caa9 100644 --- a/documentation/query/sql/sample-by.md +++ b/documentation/query/sql/sample-by.md @@ -4,33 +4,29 @@ sidebar_label: SAMPLE BY description: SAMPLE BY SQL keyword reference documentation. --- -`SAMPLE BY` is used on [time-series data](/blog/what-is-time-series-data/) to summarize large datasets into -aggregates of homogeneous time chunks as part of a -[SELECT statement](/docs/query/sql/select/). +`SAMPLE BY` aggregates time-series data into time-based groups (hourly, daily, etc.) and applies aggregate functions to each group. It is QuestDB's primary tool for downsampling and summarizing large datasets. Supports interpolation for missing intervals via `FILL`. -To use `SAMPLE BY`, a table column needs to be specified as a -[designated timestamp](/docs/concepts/designated-timestamp/). - -Users performing `SAMPLE BY` queries on datasets **with missing data** may make -use of the [FILL](#fill-options) keyword to specify a fill behavior. +Requires a [designated timestamp](/docs/concepts/designated-timestamp/) column. ## Syntax -### SAMPLE BY keywords - -![Flow chart showing the syntax of the SAMPLE BY keywords](/images/docs/diagrams/sampleBy.svg) - -### FROM-TO keywords - -![Flow chart showing the syntax of the FROM-TO keywords](/images/docs/diagrams/fromTo.svg) - -### FILL keywords - -![Flow chart showing the syntax of the FILL keyword](/images/docs/diagrams/fill.svg) +```questdb-sql +SELECT columns, aggregate_functions +FROM table +[WHERE conditions] +SAMPLE BY + [FROM timestamp TO timestamp] + [FILL(fill_option)] + [ALIGN TO CALENDAR [TIME ZONE tz] [WITH OFFSET offset] | ALIGN TO FIRST OBSERVATION] +``` -### ALIGN TO keywords +### Key clauses -![Flow chart showing the syntax of the ALIGN TO keywords](/images/docs/diagrams/alignToCalTimeZone.svg) +- **`SAMPLE BY `**: Groups rows into time buckets (e.g., `1h`, `5m`, `1d`) +- **`FROM ... TO`**: Defines explicit time boundaries for filling missing intervals +- **`FILL(option)`**: Specifies how to handle missing time buckets (`NONE`, `NULL`, `PREV`, `LINEAR`, or constant) +- **`ALIGN TO CALENDAR`**: Aligns buckets to calendar boundaries (default). Supports `TIME ZONE` and `WITH OFFSET` +- **`ALIGN TO FIRST OBSERVATION`**: Aligns buckets relative to the first row's timestamp ## Sample units @@ -44,6 +40,7 @@ Where the unit for sampled groups may be one of the following: | unit | description | | ---- | ----------- | +| `n` | nanosecond | | `U` | microsecond | | `T` | millisecond | | `s` | second | @@ -63,74 +60,19 @@ FROM trades SAMPLE BY 1h; ``` -## FROM-TO - -:::note - -Versions prior to QuestDB 8.1.0 do not have access to this extension. - -Please see the new blog for more information. - -::: - -When using `SAMPLE BY` with `FILL`, you can fill missing rows within the result set with pre-determined values. - -However, this method will only fill rows between existing data in the data set and cannot fill rows outside of this range. - -To fill outside the bounds of the existing data, you can specify a fill range using a `FROM-TO` clause. The boundary -timestamps are expected in UTC. - -Note that `FROM-TO` clause can be used only on non-keyed SAMPLE BY queries, i.e. queries that have no grouping columns -other than the timestamp. - -#### Syntax - -Specify the shape of the query using `FROM` and `TO`: - -```questdb-sql title='Pre-filling trade data' demo -SELECT timestamp as ts, count() -FROM trades -SAMPLE BY 1d FROM '2009-01-01' TO '2009-01-10' FILL(NULL); -``` - -If no rows exist at the start of the range, QuestDB automatically fills in these rows. - -This is distinct from the `WHERE` clause with a simple rule of thumb - -`WHERE` controls what data flows in, `FROM-TO` controls what data flows out. - -Use both `FROM` and `TO` in isolation to pre-fill or post-fill data. If `FROM` is not provided, then the lower bound is the start of the dataset, aligned to calendar. The opposite is true omitting `TO`. - -#### `WHERE` clause optimisation - -If the user does not provide a `WHERE` clause, or the `WHERE` clause does not consider the designated timestamp, -QuestDB will add one for you, matching the `FROM-TO` interval. - -This means that the query will run optimally, and avoid touching data not relevant to the result. - -Therefore, we compile the prior query into something similar to this: - -```questdb-sql title='Pre-filling trade data with WHERE optimisation' demo -SELECT timestamp as ts, count() -FROM trades -WHERE timestamp >= '2009-01-01' - AND timestamp < '2009-01-10' -SAMPLE BY 1d FROM '2009-01-01' TO '2009-01-10' FILL(NULL); -``` - -#### Limitations - -Here are the current limits to this feature. +## Fill options -- This syntax is not compatible with `FILL(PREV)` or `FILL(LINEAR)`. -- This syntax is for `ALIGN TO CALENDAR` only (default alignment). -- Does not consider any specified `OFFSET`. -- This syntax is for non-keyed `SAMPLE BY` i.e. only designated timestamp and aggregate columns. +`FILL` specifies how to handle time intervals with no data. By default, missing intervals are skipped. Use `FILL` to interpolate or substitute values. -## Fill options +| fillOption | Description | +| ---------- | ------------------------------------------------------------------------------------------------------------------------- | +| `NONE` | No fill applied. If there is no data, the time sample will be skipped in the results. A table could be missing intervals. | +| `NULL` | Fills with `NULL` values. | +| `PREV` | Fills using the previous value. | +| `LINEAR` | Fills by linear interpolation of the 2 surrounding points. | +| `x` | Fills with a constant value - where `x` is the desired value, for example `FILL(100.05)`. | -The `FILL` keyword is optional and expects one or more `fillOption` strategies -which will be applied to one or more aggregate columns. The following -restrictions apply: +The following restrictions apply: - Keywords denoting fill strategies may not be combined. Only one option from `NONE`, `NULL`, `PREV`, `LINEAR` and constants may be used. @@ -146,14 +88,6 @@ SAMPLE BY 1h FILL(LINEAR) ALIGN TO ... ``` -| fillOption | Description | -| ---------- | ------------------------------------------------------------------------------------------------------------------------- | -| `NONE` | No fill applied. If there is no data, the time sample will be skipped in the results. A table could be missing intervals. | -| `NULL` | Fills with `NULL` values. | -| `PREV` | Fills using the previous value. | -| `LINEAR` | Fills by linear interpolation of the 2 surrounding points. | -| `x` | Fills with a constant value - where `x` is the desired value, for example `FILL(100.05)`. | - Consider an example table named `prices` which has no records during the entire third hour (`2021-01-01T03`): @@ -263,6 +197,49 @@ In the above query `min(price)` aggregate will get `FILL(NULL)` strategy applied, `max(price)` will get `FILL(10)`, and `avg(price)` will get `FILL(PREV)`. +## FROM-TO + +`FILL` only fills intervals between existing data points. To fill outside the bounds of existing data, use `FROM-TO` to specify explicit time boundaries. + +```questdb-sql title='Pre-filling trade data' demo +SELECT timestamp as ts, count() +FROM trades +SAMPLE BY 1d FROM '2009-01-01' TO '2009-01-10' FILL(NULL); +``` + +If no rows exist at the start of the range, QuestDB automatically fills in these rows. + +This is distinct from the `WHERE` clause with a simple rule of thumb - +`WHERE` controls what data flows in, `FROM-TO` controls what data flows out. + +Use both `FROM` and `TO` in isolation to pre-fill or post-fill data. If `FROM` is not provided, then the lower bound is the start of the dataset, aligned to calendar. The opposite is true omitting `TO`. + +The boundary timestamps are expected in UTC. `FROM-TO` can only be used on non-keyed SAMPLE BY queries (queries with no grouping columns other than the timestamp). + +### `WHERE` clause optimisation + +If the user does not provide a `WHERE` clause, or the `WHERE` clause does not consider the designated timestamp, +QuestDB will add one for you, matching the `FROM-TO` interval. + +This means that the query will run optimally, and avoid touching data not relevant to the result. + +Therefore, we compile the prior query into something similar to this: + +```questdb-sql title='Pre-filling trade data with WHERE optimisation' demo +SELECT timestamp as ts, count() +FROM trades +WHERE timestamp >= '2009-01-01' + AND timestamp < '2009-01-10' +SAMPLE BY 1d FROM '2009-01-01' TO '2009-01-10' FILL(NULL); +``` + +### Limitations + +- Not compatible with `FILL(PREV)` or `FILL(LINEAR)`. +- Only works with `ALIGN TO CALENDAR` (default alignment). +- Does not consider any specified `OFFSET`. +- Only for non-keyed `SAMPLE BY` (designated timestamp and aggregate columns only). See [Fill keyed queries with arbitrary intervals](/docs/cookbook/sql/time-series/fill-keyed-arbitrary-interval/) for a workaround. + ## Sample calculation The default time calculation of sampled groups is an absolute value, in other