docs/concepts/tests.md
Lines changed: 51 additions & 29 deletions
Original file line number
Diff line number
Diff line change
@@ -1,18 +1,20 @@
1
1
# Testing
2
2
3
-
Testing allows you to protect your project from regression by continuously verifying the output of each model matches your expectations. Unlike [audits](audits.md), tests are executed either on demand (for example, as part of a CI/CD job) or every time a new [plan](plans.md) is created.
3
+
Testing allows you to protect your project from regression by continuously verifying that the output of each model matches your expectations. Unlike [audits](audits.md), tests are executed either on demand (for example, as part of a CI/CD job) or every time a new [plan](plans.md) is created.
4
4
5
5
Similar to unit testing in software development, SQLMesh evaluates the model's logic against predefined inputs and then compares the output to expected outcomes provided as part of each test.
6
6
7
7
A comprehensive suite of tests can empower data practitioners to work with confidence, as it allows them to ensure models behave as expected after changes have been applied to them.
8
8
9
9
## Creating tests
10
10
11
-
Test suites are defined using YAML format within `.yaml` files in the `tests/` folder of your SQLMesh project. Each test within a suite file contains the following attributes:
11
+
Test suites are defined using the YAML format. Each suite is a file stored under the `tests/` folder of your SQLMesh project, and its name must begin with `test` and end in either `.yaml` or `.yml`.
12
+
13
+
Tests within a suite file contain the following attributes:
12
14
13
15
* The unique name of a test
14
16
* The name of the model targeted by this test
15
-
* Test inputs, which are defined per external table or upstream model referenced by the target model. Each test input consists of the following:
17
+
* Test inputs, which are defined per upstream model or external table referenced by the target model. Each test input consists of the following:
16
18
* The name of an upstream model or external table
17
19
* The list of rows defined as a mapping from a column name to a value associated with it
18
20
* Expected outputs, which are defined as follows:
@@ -21,8 +23,6 @@ Test suites are defined using YAML format within `.yaml` files in the `tests/` f
21
23
* [Optional] The dictionary of values for macro variables that will be set during model testing
22
24
* There are three special macro variables that can be overridden: `start`, `end`, and `execution_time`. Overriding any of them also overrides the date macros derived from it in your SQL queries. For example, setting `execution_time: 2022-01-01` determines the value of `execution_ds` in your queries.
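As a minimal sketch of the macro variable override described above, the fragment below pins `execution_time` for a single test. The test name and model name are hypothetical and only serve as an illustration; the inputs and outputs are left out for brevity.

```yaml
# Hypothetical test; the names below are illustrative and not part of any real project.
test_orders_snapshot:
  model: my_schema.orders_snapshot   # hypothetical target model
  # inputs and outputs omitted for brevity
  vars:
    # Fixes the execution time used when the model's date macros are rendered.
    execution_time: 2022-01-01
```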
23
25
24
-
A column may be omitted from a row (either input or output), in which case it will be implicitly added with the value `NULL`. For example, this can be useful when specifying input data for wide tables where some columns may not be required to define a test.
25
-
26
26
The YAML format is defined as follows:
27
27
28
28
```yaml linenums="1"
@@ -47,7 +47,7 @@ The YAML format is defined as follows:
47
47
<macro_variable_name>: <macro_variable_value>
48
48
```
49
49
50
-
Note: the `rows` key is optional in the above format, so the following would also be valid:
50
+
The `rows` key is optional in the above format, so the following would also be valid:
51
51
52
52
```
53
53
<unique_test_name>:
@@ -58,6 +58,26 @@ Note: the `rows` key is optional in the above format, so the following would als
58
58
...
59
59
```
60
60
61
+
### Omitting Columns
62
+
63
+
Defining the complete inputs and outputs for wide tables, i.e. tables with many columns, can become cumbersome. Therefore, if certain columns can be safely ignored, they may be omitted from any row, and their value will be treated as `NULL` for that row.
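To illustrate, here is a sketch of an input fragment with two rows written in both styles. It assumes the upstream model has the columns `id`, `item_id`, and `ds`, as in the `sqlmesh_example.incremental_model` example used elsewhere on this page; the values are made up.

```yaml
inputs:
  sqlmesh_example.incremental_model:
    rows:
      # item_id and ds are omitted here, so they are treated as NULL for this row.
      - id: 1
      # The same kind of row with the NULL values written out explicitly.
      - id: 2
        item_id: null
        ds: null
```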
64
+
65
+
Additionally, it's possible to test only a subset of the output columns by setting `partial` to `true` for the rows of interest:
66
+
67
+
```yaml linenums="1"
68
+
...
69
+
outputs:
70
+
query:
71
+
partial: true
72
+
rows:
73
+
- <column_name>: <column_value>
74
+
...
75
+
```
76
+
77
+
This is useful when we can't treat the missing columns as `NULL`, but still want to ignore them.
78
+
79
+
When `partial` is set, the rows need to be defined as a mapping under the `rows` key and the tested columns are only those that are referenced in them.
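A fuller sketch of a partial test might look like the following. The test name is hypothetical, and the expected value assumes that `sqlmesh_example.full_model` counts the distinct `id` values per `item_id`, as in the project generated by `sqlmesh init`; adjust it if your model differs.

```yaml
# Hypothetical test name; only the num_orders output column is checked.
test_full_model_num_orders_only:
  model: sqlmesh_example.full_model
  inputs:
    sqlmesh_example.incremental_model:
      rows:
        - id: 1
          item_id: 1
        - id: 2
          item_id: 1
        - id: 3
          item_id: 1
  outputs:
    query:
      partial: true
      rows:
        # item_id is not listed, so it is ignored rather than compared against NULL.
        - num_orders: 3
```

Without `partial: true`, the omitted `item_id` output column would be compared against `NULL`, and this test would be expected to fail because the model returns `item_id = 1`.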
80
+
61
81
### Example
62
82
63
83
In this example, we'll use the `sqlmesh_example.full_model` model, which is provided as part of the `sqlmesh init` command and defined as follows:
@@ -108,9 +128,9 @@ test_example_full_model:
108
128
num_orders: 1
109
129
```
110
130
111
-
Note that `ds` is redundant in the above test, since it is not referenced in `full_model`, so it may be omitted.
131
+
The `ds` column is not needed in the above test, since it is not referenced in `full_model`, so it may be omitted.
112
132
113
-
Let's also assume that we are only interested in testing the `num_orders` output column, i.e. we only care about the `id` input column of `sqlmesh_example.incremental_model`. Then, we could rewrite the above test more compactly as follows:
133
+
If we were only interested in testing the `num_orders` column, we could specify input values only for the `id` column of `sqlmesh_example.incremental_model`, which lets us rewrite the above test more compactly as follows:
114
134
115
135
```yaml linenums="1"
116
136
test_example_full_model:
@@ -127,7 +147,7 @@ test_example_full_model:
127
147
- num_orders: 3
128
148
```
129
149
130
-
Leaving out the input column `item_id` means that it will be implicitly added in all input rows with a `NULL` value. Thus, we expect the corresponding output column to only contain `NULL` values, which is indeed reflected in the above test since the `item_id` column is also omitted from `query`'s rows.
150
+
Since [omitted columns](#omitting-columns) are treated as `NULL`, this test also implicitly asserts that both the input and the output `item_id` columns are `NULL`, which is correct.
131
151
132
152
### Testing CTEs
133
153
@@ -185,11 +205,13 @@ test_example_full_model:
185
205
186
206
## Automatic test generation
187
207
188
-
Creating tests manually is a cumbersome and, ironically, error-prone process, especially as the number of rows and columns of the involved models grows. To address this, SQLMesh provides the [`create_test` command](../reference/cli.md#create_test), which can be used to automatically create tests for a given model.
208
+
Creating tests manually can be repetitive and error-prone, which is why SQLMesh also provides a way to automate this process using the [`create_test` command](../reference/cli.md#create_test).
209
+
210
+
This command can generate a complete test for a given model, as long as the tables of the upstream models it references exist in the project's data warehouse and are already populated with data.
189
211
190
212
### Example
191
213
192
-
Since we already have a test for `sqlmesh_example.full_model`, in this example we'll show how to generate a test for `sqlmesh_example.incremental_model`, which is provided as part of the `sqlmesh init` command and defined as follows:
214
+
In this example, we'll show how to generate a test for `sqlmesh_example.incremental_model`, which is another model provided as part of the `sqlmesh init` command and defined as follows:
193
215
194
216
```sql linenums="1"
195
217
MODEL (
@@ -213,34 +235,35 @@ WHERE
213
235
214
236
```
215
237
216
-
As one may expect, we need to start by specifying what the input data are for `sqlmesh_example.seed_model`. The `create_test` command achieves this by executing a user-supplied query against the target warehouse of the SQLMesh project to produce the input rows of the aforementioned model.
238
+
First, we need to specify the input data for the upstream model `sqlmesh_example.seed_model`. The `create_test` command starts by executing a user-supplied query against the project's data warehouse and uses the returned data to produce the test's input rows.
217
239
218
-
Let's assume that we're only interested in specifying three input rows for `sqlmesh_example.seed_model`. One way to do that is by executing the following query:
240
+
For instance, the following query will return three rows from the table corresponding to the model `sqlmesh_example.seed_model`:
219
241
220
242
```sql linenums="1"
221
243
SELECT * FROM sqlmesh_example.seed_model LIMIT 3
222
244
```
223
245
224
-
However, notice that `sqlmesh_example.incremental_model` also contains a filter which references the `@start_ds` and `@end_ds` [macro variables](macros/macro_variables.md). To ensure that the produced test will always pass, we modify the above query to constrain the value range of the `ds` column:
246
+
Next, notice that `sqlmesh_example.incremental_model` contains a filter which references the `@start_ds` and `@end_ds` [macro variables](macros/macro_variables.md).
247
+
248
+
To make the generated test deterministic and thus ensure that it will always succeed, we need to define these variables and modify the above query to constrain `ds` accordingly.
249
+
250
+
If we set `@start_ds` to `'2020-01-01'` and `@end_ds` to `'2020-01-04'`, the above query needs to be changed to:
225
251
226
252
```sql linenums="1"
227
-
-- The dates '2020-01-01' and '2020-01-04' have been picked arbitrarily
228
253
SELECT * FROM sqlmesh_example.seed_model WHERE ds BETWEEN '2020-01-01' AND '2020-01-04' LIMIT 3
229
254
```
230
255
231
-
We will also define these variables in the test, so that the filter of `sqlmesh_example.incremental_model` matches that range after it's been rendered.
256
+
Finally, combining this query with the proper macro variable definitions, we can compute the expected output for the model's query in order to generate the complete test.
232
257
233
-
Finally, we don't have to specify the output `query` attribute, since we can compute its values given the input data produced by the above query.
234
-
235
-
The following command captures all of the above:
258
+
This can be achieved using the following command:
236
259
237
260
```bash
238
261
$ sqlmesh create_test sqlmesh_example.incremental_model --query sqlmesh_example.seed_model "select * from sqlmesh_example.seed_model where ds between '2020-01-01' and '2020-01-04' limit 3" --var start '2020-01-01' --var end '2020-01-04'
239
262
```
240
263
241
-
Running this command produces the following new test, which is located at `tests/test_incremental_model.yaml`:
264
+
Running this creates the following new test, located at `tests/test_incremental_model.yaml`:
242
265
243
-
```yaml linenums="1" hl_lines="16-22"
266
+
```yaml linenums="1"
244
267
test_incremental_model:
245
268
model: sqlmesh_example.incremental_model
246
269
inputs:
@@ -270,7 +293,7 @@ test_incremental_model:
270
293
end: '2020-01-04'
271
294
```
272
295
273
-
As shown below, we now have two passing tests:
296
+
As shown below, we now have two passing tests. Hooray!
274
297
275
298
```bash
276
299
$ sqlmesh test
@@ -281,8 +304,6 @@ Ran 2 tests in 0.024s
281
304
OK
282
305
```
283
306
284
-
Note: since the `sqlmesh create_test` command executes queries directly in the target warehouse, the tables of the involved models must be built first, otherwise the queries will fail.
285
-
286
307
## Running tests
287
308
288
309
### Automatic testing with plan
@@ -292,6 +313,7 @@ Tests run automatically every time a new [plan](plans.md) is created.
292
313
### Manual testing with the CLI
293
314
294
315
You can execute tests on demand using the `sqlmesh test` command as follows:
316
+
295
317
```bash
296
318
$ sqlmesh test
297
319
.
@@ -321,18 +343,18 @@ Ran 1 test in 0.012s
321
343
FAILED (failures=1)
322
344
```
323
345
324
-
Note: when there are many differing columns, the corresponding DataFrame will be truncated by default, but it can be fully rendered using the `-v` option (verbose) of the `sqlmesh test` command.
346
+
Note: when there are many differing columns, the corresponding DataFrame will be truncated by default, but it can be fully displayed using the `-v` (verbose) option of the `sqlmesh test` command.
325
347
326
348
### Testing for specific models
327
349
328
350
To run a specific model test, pass in the suite file name followed by `::` and the name of the test:
329
351
330
-
```
331
-
sqlmesh test tests/test_full_model.yaml::test_example_full_model
352
+
```bash
353
+
$ sqlmesh test tests/test_full_model.yaml::test_example_full_model
332
354
```
333
355
334
356
You can also run tests that match a pattern or substring using a glob pathname expansion syntax:
docs/integrations/github.md
Lines changed: 1 addition & 2 deletions
Original file line number
Diff line number
Diff line change
@@ -213,8 +213,7 @@ on:
213
213
- created
214
214
```
215
215
216
-
Note: `issue_comment` event will not work until this change in merged to your main branch.
217
-
Therefore to enable this you will need to make the change in a branch, merge, and then future branches will support the deploy command.
216
+
Note: the `issue_comment` event will not work until this change is merged into your main branch. Therefore, to enable this you will need to make the change in a branch, merge, and then future branches will support the deploy command.
218
217
219
218
### Desynchronized Production Code and Data Configuration