
Commit c73f3e5

Chore: update unit test docs, fix minor github ci/cd bot docs typos (#2327)
* Chore: update unit test docs, fix minor github ci/cd bot docs typos
* First within -> under
* Get rid of comma
* Typo
* PR feedback
* Fixup
1 parent 9238997 commit c73f3e5


2 files changed, +52 -31 lines changed


docs/concepts/tests.md

Lines changed: 51 additions & 29 deletions
@@ -1,18 +1,20 @@
 # Testing
 
-Testing allows you to protect your project from regression by continuously verifying the output of each model matches your expectations. Unlike [audits](audits.md), tests are executed either on demand (for example, as part of a CI/CD job) or every time a new [plan](plans.md) is created.
+Testing allows you to protect your project from regression by continuously verifying that the output of each model matches your expectations. Unlike [audits](audits.md), tests are executed either on demand (for example, as part of a CI/CD job) or every time a new [plan](plans.md) is created.
 
 Similar to unit testing in software development, SQLMesh evaluates the model's logic against predefined inputs and then compares the output to expected outcomes provided as part of each test.
 
 A comprehensive suite of tests can empower data practitioners to work with confidence, as it allows them to ensure models behave as expected after changes have been applied to them.
 
 ## Creating tests
 
-Test suites are defined using YAML format within `.yaml` files in the `tests/` folder of your SQLMesh project. Each test within a suite file contains the following attributes:
+Test suites are defined using the YAML format. Each suite is a file whose name must begin with `test`, end in either `.yaml` or `.yml`, and is stored under the `tests/` folder of your SQLMesh project.
+
+Tests within a suite file contain the following attributes:
 
 * The unique name of a test
 * The name of the model targeted by this test
-* Test inputs, which are defined per external table or upstream model referenced by the target model. Each test input consists of the following:
+* Test inputs, which are defined per upstream model or external table referenced by the target model. Each test input consists of the following:
 * The name of an upstream model or external table
 * The list of rows defined as a mapping from a column name to a value associated with it
 * Expected outputs, which are defined as follows:
@@ -21,8 +23,6 @@ Test suites are defined using YAML format within `.yaml` files in the `tests/` f
 * [Optional] The dictionary of values for macro variables that will be set during model testing
 * There are three special macros that can be overridden, `start`, `end`, and `execution_time`. Overriding each will allow you to override the date macros in your SQL queries. For example, setting execution_time: 2022-01-01 -> execution_ds in your queries.
 
-A column may be omitted from a row (either input or output), in which case it will be implicitly added with the value `NULL`. For example, this can be useful when specifying input data for wide tables where some columns may not be required to define a test.
-
 The YAML format is defined as follows:
 
 ```yaml linenums="1"
@@ -47,7 +47,7 @@ The YAML format is defined as follows:
 <macro_variable_name>: <macro_variable_value>
 ```
 
-Note: the `rows` key is optional in the above format, so the following would also be valid:
+The `rows` key is optional in the above format, so the following would also be valid:
 
 ```
 <unique_test_name>:
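For reference, a complete suite file following the format described in this commit might look like the sketch below. It is illustrative only: the file name and model names are taken from this page, while the row values are hypothetical, and the expected output assumes that `sqlmesh_example.full_model` counts distinct `id` values per `item_id`.

```yaml
# tests/test_full_model.yaml (row values are hypothetical)
test_example_full_model:
  model: sqlmesh_example.full_model
  inputs:
    sqlmesh_example.incremental_model:
      rows:
        # ds is omitted: full_model does not reference it
        - id: 1
          item_id: 1
        - id: 2
          item_id: 1
        - id: 3
          item_id: 2
  outputs:
    query:
      rows:
        # Assumes full_model counts distinct ids per item_id
        - item_id: 1
          num_orders: 2
        - item_id: 2
          num_orders: 1
```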
@@ -58,6 +58,26 @@ Note: the `rows` key is optional in the above format, so the following would als
 ...
 ```
 
+### Omitting Columns
+
+Defining the complete inputs and outputs for wide tables, i.e. tables with many columns, can become cumbersome. Therefore, if certain columns can be safely ignored they may be omitted from any row and their value will be treated as `NULL` for that row.
+
+Additionally, it's possible to test only a subset of the output columns by setting `partial` to `true` for the rows of interest:
+
+```yaml linenums="1"
+  ...
+  outputs:
+    query:
+      partial: true
+      rows:
+        - <column_name>: <column_value>
+          ...
+```
+
+This is useful when we can't treat the missing columns as `NULL`, but still want to ignore them.
+
+When `partial` is set, the rows need to be defined as a mapping under the `rows` key and the tested columns are only those that are referenced in them.
+
 ### Example
 
 In this example, we'll use the `sqlmesh_example.full_model` model, which is provided as part of the `sqlmesh init` command and defined as follows:
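To make the `partial` option added in this commit concrete, the sketch below asserts only on `num_orders` and ignores the `item_id` output column instead of treating it as `NULL`. The test name and row values are hypothetical, and the expected count again assumes that `full_model` counts distinct `id` values per `item_id`; all input rows share one `item_id`, so the query should return a single row.

```yaml
# Hypothetical test name; only num_orders is asserted on
test_full_model_num_orders:
  model: sqlmesh_example.full_model
  inputs:
    sqlmesh_example.incremental_model:
      rows:
        - id: 1
          item_id: 1
        - id: 2
          item_id: 1
        - id: 3
          item_id: 1
  outputs:
    query:
      partial: true
      rows:
        # item_id is left out here and therefore not compared
        - num_orders: 3
```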
@@ -108,9 +128,9 @@ test_example_full_model:
 num_orders: 1
 ```
 
-Note that `ds` is redundant in the above test, since it is not referenced in `full_model`, so it may be omitted.
+The `ds` column is not needed in the above test, since it is not referenced in `full_model`, so it may be omitted.
 
-Let's also assume that we are only interested in testing the `num_orders` output column, i.e. we only care about the `id` input column of `sqlmesh_example.incremental_model`. Then, we could rewrite the above test more compactly as follows:
+If we were only interested in testing the `num_orders` column, we could only specify input values for the `id` column of `sqlmesh_example.incremental_model`, thus rewriting the above test more compactly as follows:
 
 ```yaml linenums="1"
 test_example_full_model:
@@ -127,7 +147,7 @@ test_example_full_model:
 - num_orders: 3
 ```
 
-Leaving out the input column `item_id` means that it will be implicitly added in all input rows with a `NULL` value. Thus, we expect the corresponding output column to only contain `NULL` values, which is indeed reflected in the above test since the `item_id` column is also omitted from `query`'s rows.
+Since [omitted columns](#omitting-columns) are treated as `NULL`, this test also implicitly asserts that both the input and the output `item_id` columns are `NULL`, which is correct.
 
 ### Testing CTEs
 
@@ -185,11 +205,13 @@ test_example_full_model:
 
 ## Automatic test generation
 
-Creating tests manually is a cumbersome and, ironically, error-prone process, especially as the number of rows and columns of the involved models grows. To address this, SQLMesh provides the [`create_test` command](../reference/cli.md#create_test), which can be used to automatically create tests for a given model.
+Creating tests manually can be repetitive and error-prone, which is why SQLMesh also provides a way to automate this process using the [`create_test` command](../reference/cli.md#create_test).
+
+This command can generate a complete test for a given model, as long as the tables of the upstream models it references exist in the project's data warehouse and are already populated with data.
 
 ### Example
 
-Since we already have a test for `sqlmesh_example.full_model`, in this example we'll show how to generate a test for `sqlmesh_example.incremental_model`, which is provided as part of the `sqlmesh init` command and defined as follows:
+In this example, we'll show how to generate a test for `sqlmesh_example.incremental_model`, which is another model provided as part of the `sqlmesh init` command and defined as follows:
 
 ```sql linenums="1"
 MODEL (
@@ -213,34 +235,35 @@ WHERE
 
 ```
 
-As one may expect, we need to start by specifying what the input data are for `sqlmesh_example.seed_model`. The `create_test` command achieves this by executing a user-supplied query against the target warehouse of the SQLMesh project to produce the input rows of the aforementioned model.
+Firstly, we need to specify the input data for the upstream model `sqlmesh_example.seed_model`. The `create_test` command starts by executing a user-supplied query against the project's data warehouse and uses the returned data to produce the test's input rows.
 
-Let's assume that we're only interested in specifying three input rows for `sqlmesh_example.seed_model`. One way to do that is by executing the following query:
+For instance, the following query will return three rows from the table corresponding to the model `sqlmesh_example.seed_model`:
 
 ```sql linenums="1"
 SELECT * FROM sqlmesh_example.seed_model LIMIT 3
 ```
 
-However, notice that `sqlmesh_example.incremental_model` also contains a filter which references the `@start_ds` and `@end_ds` [macro variables](macros/macro_variables.md). To ensure that the produced test will always pass, we modify the above query to constrain the value range of the `ds` column:
+Next, notice that `sqlmesh_example.incremental_model` contains a filter which references the `@start_ds` and `@end_ds` [macro variables](macros/macro_variables.md).
+
+To make the generated test deterministic and thus ensure that it will always succeed, we need to define these variables and modify the above query to constrain `ds` accordingly.
+
+If we set `@start_ds` to `'2020-01-01'` and `@end_ds` to `'2020-01-04'`, the above query needs to be changed to:
 
 ```sql linenums="1"
--- The dates '2020-01-01' and '2020-01-04' have been picked arbitrarily
 SELECT * FROM sqlmesh_example.seed_model WHERE ds BETWEEN '2020-01-01' AND '2020-01-04' LIMIT 3
 ```
 
-We will also define these variables in the test, so that the filter of `sqlmesh_example.incremental_model` matches that range after it's been rendered.
+Finally, combining this query with the proper macro variable definitions, we can compute the expected output for the model's query in order to generate the complete test.
 
-Finally, we don't have to specify the output `query` attribute, since we can compute its values given the input data produced by the above query.
-
-The following command captures all of the above:
+This can be achieved using the following command:
 
 ```bash
 $ sqlmesh create_test sqlmesh_example.incremental_model --query sqlmesh_example.seed_model "select * from sqlmesh_example.seed_model where ds between '2020-01-01' and '2020-01-04' limit 3" --var start '2020-01-01' --var end '2020-01-04'
 ```
 
-Running this command produces the following new test, which is located at `tests/test_incremental_model.yaml`:
+Running this creates the following new test, located at `tests/test_incremental_model.yaml`:
 
-```yaml linenums="1" hl_lines="16-22"
+```yaml linenums="1"
 test_incremental_model:
 model: sqlmesh_example.incremental_model
 inputs:
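The `--var start` and `--var end` options in the command above correspond to the macro variables stored in the generated test. A hand-written test could pin the same range itself, as in the sketch below. This example rests on assumptions: the test name and rows are hypothetical, the macro variables are assumed to be declared under a `vars` key, and `incremental_model` is assumed to pass `id`, `item_id` and `ds` through from `seed_model` unchanged while filtering `ds` to the `@start_ds`/`@end_ds` range, so the row dated `2020-01-05` is expected to be filtered out.

```yaml
# Hypothetical hand-written test; vars pins the @start_ds/@end_ds range
test_incremental_model_range:
  model: sqlmesh_example.incremental_model
  inputs:
    sqlmesh_example.seed_model:
      rows:
        - id: 1
          item_id: 1
          ds: '2020-01-02'
        - id: 2
          item_id: 1
          ds: '2020-01-05'  # outside the range, expected to be filtered out
  outputs:
    query:
      rows:
        - id: 1
          item_id: 1
          ds: '2020-01-02'
  vars:
    start: '2020-01-01'
    end: '2020-01-04'
```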
@@ -270,7 +293,7 @@ test_incremental_model:
 end: '2020-01-04'
 ```
 
-As shown below, we now have two passing tests:
+As shown below, we now have two passing tests. Hooray!
 
 ```bash
 $ sqlmesh test
@@ -281,8 +304,6 @@ Ran 2 tests in 0.024s
 OK
 ```
 
-Note: since the `sqlmesh create_test` command executes queries directly in the target warehouse, the tables of the involved models must be built first, otherwise the queries will fail.
-
 ## Running tests
 
 ### Automatic testing with plan
@@ -292,6 +313,7 @@ Tests run automatically every time a new [plan](plans.md) is created.
 ### Manual testing with the CLI
 
 You can execute tests on demand using the `sqlmesh test` command as follows:
+
 ```bash
 $ sqlmesh test
 .
@@ -321,18 +343,18 @@ Ran 1 test in 0.012s
 FAILED (failures=1)
 ```
 
-Note: when there are many differing columns, the corresponding DataFrame will be truncated by default, but it can be fully rendered using the `-v` option (verbose) of the `sqlmesh test` command.
+Note: when there are many differing columns, the corresponding DataFrame will be truncated by default, but it can be fully displayed using the `-v` (verbose) option of the `sqlmesh test` command.
 
 ### Testing for specific models
 
 To run a specific model test, pass in the suite file name followed by `::` and the name of the test:
 
-```
-sqlmesh test tests/test_full_model.yaml::test_example_full_model
+```bash
+$ sqlmesh test tests/test_full_model.yaml::test_example_full_model
 ```
 
 You can also run tests that match a pattern or substring using a glob pathname expansion syntax:
 
-```
-sqlmesh test tests/test_*
+```bash
+$ sqlmesh test tests/test_*
 ```

docs/integrations/github.md

Lines changed: 1 addition & 2 deletions
@@ -213,8 +213,7 @@ on:
 - created
 ```
 
-Note: `issue_comment` event will not work until this change in merged to your main branch.
-Therefore to enable this you will need to make the change in a branch, merge, and then future branches will support the deploy command.
+Note: the `issue_comment` event will not work until this change is merged into your main branch. Therefore, to enable this you will need to make the change in a branch, merge, and then future branches will support the deploy command.
 
 ### Desynchronized Production Code and Data Configuration
 
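The note above concerns the bot workflow's `issue_comment` trigger. A minimal fragment of the corresponding `on:` block, written in standard GitHub Actions syntax, is sketched below; the rest of the workflow file is omitted. GitHub only runs `issue_comment` workflows from the workflow file on the repository's default branch, which is why the trigger has no effect until the change is merged.

```yaml
# Fragment only: other triggers and jobs of the workflow are omitted
on:
  issue_comment:
    types:
      - created
```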