395 changes: 0 additions & 395 deletions docs/data-quality-checks/data-diff-check.md

This file was deleted.

82 changes: 82 additions & 0 deletions docs/data-quality-checks/data-diff/api.md
@@ -0,0 +1,82 @@
# :material-api:{ .middle style="color: var(--q-brick)" } Data Diff Check API

The Data Diff check is created and managed through the standard Quality Checks API by setting `rule` to `dataDiff` and listing the compared fields under `fields`. The reference container, Row Identifiers, Passthrough Fields, Comparators, and `diff_change_types` are all configured through the `properties` object.

!!! tip
    For complete API documentation, including request and response schemas, visit the [API docs](https://demo.qualytics.io/api/docs){:target="_blank"}.

## Endpoints

| Method | Path | Purpose |
|:---|:---|:---|
| `POST` | `/api/quality-checks` | Create a new Data Diff check. |
| `GET` | `/api/quality-checks/{id}` | Retrieve a Data Diff check by ID. |
| `PUT` | `/api/quality-checks/{id}` | Update an existing Data Diff check. |
| `DELETE` | `/api/quality-checks/{id}` | Archive a Data Diff check (soft delete). The check stops being evaluated by Scans and can be restored from the archive view. |

!!! note "What `PUT` can change"
    **Editable:** `description`, `fields`, `filter`, `tags`, `additional_metadata`, `anomaly_message_field`, `status`, `owner_id`, `default_anomaly_assignee_id`, and the `properties` keys `ref_datastore_id`, `ref_container_id`, `id_field_names`, `passthrough_field_names`, `diff_change_types`, `numeric_comparator`, `duration_comparator`, `string_comparator`.

    **Immutable:** `rule`, `container_id`, `template_id`. To change any of these, delete the check and create a new one.

**Permission:** `POST`, `PUT`, and `DELETE` require Author team permission (or above) on the target container's team; `GET` requires Reporter team permission (or above).
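
The editable/immutable split in the note above can be checked client-side before a `PUT` is sent. The following is a minimal sketch, not part of the API: the helper name `immutable_changes` is hypothetical, and it assumes you hold the current check as a dictionary (for example from a prior `GET`). Only the three immutable key names come from this page.

```python
# Immutable keys as listed in the "What PUT can change" note.
IMMUTABLE_KEYS = {"rule", "container_id", "template_id"}


def immutable_changes(existing: dict, update: dict) -> list[str]:
    """Return the immutable keys whose value in `update` differs from `existing`.

    Hypothetical client-side helper; the server remains the source of truth.
    """
    return sorted(
        key
        for key in IMMUTABLE_KEYS
        if key in update and update[key] != existing.get(key)
    )


existing = {
    "rule": "dataDiff",
    "container_id": 145,
    "template_id": None,
    "tags": ["replication"],
}
# The update changes an editable key (tags) and an immutable one (container_id).
update = {**existing, "container_id": 146, "tags": ["replication", "nightly"]}

print(immutable_changes(existing, update))  # ['container_id']
```

A non-empty result means the change calls for deleting the check and creating a new one rather than updating it.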

## Payload Example

The payload below, sent with `POST /api/quality-checks`, creates a Data Diff check that compares `N_NATIONKEY` and `N_NATIONNAME` between `NATION` and `NATION_BACKUP`, matching rows on `N_NATIONKEY`. It sets `diff_change_types` to `["removed", "changed"]` so that unmatched reference rows are not reported as `added` anomalies, a typical choice when the reference is a superset of the target (such as a long-lived backup).

```json
{
    "description": "Ensure NATION matches NATION_BACKUP on N_NATIONKEY and N_NATIONNAME",
    "rule": "dataDiff",
    "fields": ["N_NATIONKEY", "N_NATIONNAME"],
    "container_id": 145,
    "filter": null,
    "properties": {
        "ref_datastore_id": 22,
        "ref_container_id": 803,
        "id_field_names": ["N_NATIONKEY"],
        "passthrough_field_names": [],
        "diff_change_types": ["removed", "changed"]
    },
    "tags": ["replication"],
    "additional_metadata": {"jira": "DATA-1234"},
    "anomaly_message_field": null,
    "template_id": null,
    "status": "Active",
    "owner_id": 7,
    "default_anomaly_assignee_id": 12
}
```
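
As a rough illustration of sending this payload from a script, the sketch below builds the `POST` request with Python's standard library. The base URL `https://example.invalid`, the placeholder token, the `Bearer` authorization scheme, and the helper name `build_create_request` are all assumptions made for the example; consult the API docs for the authentication your deployment actually uses. The request is only constructed, not sent.

```python
import json
import urllib.request


def build_create_request(base_url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /api/quality-checks request.

    Hypothetical helper; the Bearer header is an assumption, not confirmed here.
    """
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/quality-checks",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


# Same values as the JSON payload above; optional fields left at null are omitted.
payload = {
    "description": "Ensure NATION matches NATION_BACKUP on N_NATIONKEY and N_NATIONNAME",
    "rule": "dataDiff",
    "fields": ["N_NATIONKEY", "N_NATIONNAME"],
    "container_id": 145,
    "properties": {
        "ref_datastore_id": 22,
        "ref_container_id": 803,
        "id_field_names": ["N_NATIONKEY"],
        "passthrough_field_names": [],
        "diff_change_types": ["removed", "changed"],
    },
    "tags": ["replication"],
    "status": "Active",
}

request = build_create_request("https://example.invalid", "YOUR_TOKEN", payload)
# Passing `request` to urllib.request.urlopen would send it; that call is
# left out so the sketch has no side effects.
```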

## Field Notes

| Field | Required | Notes |
|:---|:---:|:---|
| `description` | Yes | Free-text description shown in the UI. |
| `rule` | Yes | Must be `"dataDiff"`. |
| `fields` | Yes | Array of field names to compare between target and reference. Order does not affect evaluation. |
| `container_id` | Yes | ID of the target container (the dataset the check runs on). |
| `filter` | No | Spark SQL `WHERE` expression applied to the **target** container before matching. Send `null` for no filter. The reference container is always read in full. |
| `properties.ref_datastore_id` | Yes | ID of the datastore that holds the reference container. |
| `properties.ref_container_id` | Yes | ID of the reference container (table, view, or file) to compare against. |
| `properties.id_field_names` | No | Array of field names that form the compound key used to match target rows to reference rows. Required to produce `changed` diffs and to enable the Comparison Source Records view. Omit it (or send `[]`) to fall back to a symmetrical set difference that produces only `added`/`removed`. |
| `properties.passthrough_field_names` | No | Array of extra field names carried into the source-records output for context. Passthrough fields appear alongside diffed fields but are never themselves a reason for the anomaly to fire. |
| `properties.diff_change_types` | No | Subset of `["added", "removed", "changed"]` that restricts which diff statuses produce an anomaly. Defaults to all three when omitted. An empty list is rejected with HTTP 422; at least one status must be selected. Sending this property on an `isReplicaOf` check is also rejected. See [How It Works → Restricting Anomalies by Status](how-it-works.md#restricting-anomalies-by-status){:target="_blank"}. |
| `properties.numeric_comparator` | No | Numeric Comparator tolerance object. See [How It Works → Comparators](how-it-works.md#comparators){:target="_blank"}. |
| `properties.duration_comparator` | No | Duration Comparator tolerance object. See [How It Works → Comparators](how-it-works.md#comparators){:target="_blank"}. |
| `properties.string_comparator` | No | String Comparator tolerance object. See [How It Works → Comparators](how-it-works.md#comparators){:target="_blank"}. |
| `tags` | No | List of tag names applied to the check for filtering and organization. |
| `additional_metadata` | No | Free-form key-value pairs (typically links to catalog, tickets, governance records). |
| `anomaly_message_field` | No | **Not applicable to Data Diff.** Data Diff emits only Shape Anomalies, which use a fixed message template, so this field is silently ignored at evaluation. Send `null`. |
| `template_id` | No | ID of a Check Template to associate the check with. `null` if not using a template. |
| `status` | No | `"Active"` (default) or `"Draft"`. Draft checks are not evaluated by Scans. |
| `owner_id` | No | ID of the user who owns the check. Defaults to the user creating the check when omitted. |
| `default_anomaly_assignee_id` | No | ID of the user automatically assigned to anomalies produced by the check. When omitted, anomalies are created unassigned and must be triaged manually. |
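
The `diff_change_types` and `id_field_names` rules in the table above interact, and a small pre-flight check can make that visible before a request is sent. This is a sketch under stated assumptions: the function name `check_diff_change_types` is hypothetical, it mirrors only the rules written on this page (non-empty subset of the three statuses, default of all three when omitted, no `changed` diffs without Row Identifiers), and the server may enforce more than this.

```python
ALL_STATUSES = ("added", "removed", "changed")


def check_diff_change_types(properties: dict) -> list[str]:
    """Validate `diff_change_types` and return the statuses that can actually fire.

    Hypothetical client-side helper based on the Field Notes table.
    """
    # Defaults to all three statuses when the key is omitted.
    requested = properties.get("diff_change_types", list(ALL_STATUSES))
    # An empty list is rejected by the API with HTTP 422. An explicit None is
    # also treated as invalid here, which is an assumption of this sketch.
    if not requested:
        raise ValueError("diff_change_types must contain at least one status")
    unknown = [status for status in requested if status not in ALL_STATUSES]
    if unknown:
        raise ValueError(f"unknown diff statuses: {unknown}")
    # Without Row Identifiers the check falls back to a set difference,
    # which produces only added/removed, so `changed` can never fire.
    if not properties.get("id_field_names"):
        return [status for status in requested if status != "changed"]
    return list(requested)


with_ids = {"id_field_names": ["N_NATIONKEY"], "diff_change_types": ["removed", "changed"]}
without_ids = {"id_field_names": [], "diff_change_types": ["removed", "changed"]}

print(check_diff_change_types(with_ids))     # ['removed', 'changed']
print(check_diff_change_types(without_ids))  # ['removed']
```

The second call shows why selecting `changed` alone without `id_field_names` would leave a check that can never produce an anomaly.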

## Related

- [Introduction](introduction.md){:target="_blank"}: formal definition, field scope, and general/anomaly properties.
- [How It Works](how-it-works.md){:target="_blank"}: full semantics, Row Identifiers, Comparators, and edge cases.
- [Examples](examples.md){:target="_blank"}: three production scenarios with sample data and resulting anomalies.
- [FAQ](faq.md){:target="_blank"}: short answers to the most frequent questions.