diff --git a/docs/anomalies/deep-dive/fingerprints.md b/docs/anomalies/deep-dive/fingerprints.md index 404d67a7b8..38833de481 100644 --- a/docs/anomalies/deep-dive/fingerprints.md +++ b/docs/anomalies/deep-dive/fingerprints.md @@ -69,7 +69,7 @@ The lack of a persistent identifier means **Qualytics** cannot distinguish betwe To handle recurring anomalies in truncate-and-reload tables, configure your scan to use fingerprint-based duplicate handling. -Follow the steps in the [scan operation configuration](../../operations/scan/scan.md#configuration) to reach the correct settings. Then, under **Step 8 → Scan Settings**, open the [anomaly options section](https://userguide.qualytics.io/source-datastore/scan/#configuration:~:text=Step%208%3A%20Configure%20the%20Scan%20Settings) and enable both duplicate-handling options: +Follow the steps in the [scan operation configuration](../../operations/scan/how-tos/scan-settings.md){:target="_blank"} to reach the correct settings. Then, under **Step 8 → Scan Settings**, open the **Anomaly Options** section and enable both duplicate-handling options: - **Archive Duplicate Anomalies:** When the same 127 anomalies appear again after the table reload, Qualytics recognizes their fingerprints and automatically marks them as duplicates rather than new anomalies. - **Reactivate Recurring Anomalies:** If an anomaly was previously archived or resolved but reappears in subsequent scans, Qualytics reactivates the original anomaly record, maintaining full historical context. @@ -88,4 +88,4 @@ Enable these settings in Scan Settings of your Scan Operation: - **Archive Duplicate Anomalies** - **Reactivate Recurring Anomalies** -Set an appropriate **Anomaly Rollup Threshold** based on your data volume and tolerance for grouped anomalies. \ No newline at end of file +Set an appropriate **Maximum Record Anomalies per Check** based on your data volume and tolerance for grouped anomalies. 
\ No newline at end of file diff --git a/docs/anomalies/deep-dive/source-record.md b/docs/anomalies/deep-dive/source-record.md index 35e6c584ff..60057b2dcc 100644 --- a/docs/anomalies/deep-dive/source-record.md +++ b/docs/anomalies/deep-dive/source-record.md @@ -17,7 +17,7 @@ If the Anomaly Type is **Record**, the highlighted row(s) that failed the checks ## Source Record Visualization -The number of source records displayed per anomaly is determined by the **Maximum Source Examples per Anomaly** setting, which can be configured during [scan setup](../../operations/scan/scan.md#configuration){:target="_blank"}. The available limits are 10, 100, 1,000, or 10,000 records. The interface includes sticky headers that remain visible when scrolling through large datasets, making navigation easier during data review. +The number of source records displayed per anomaly is determined by the **Maximum Source Examples per Anomaly** setting, which can be configured during [scan setup](../../operations/scan/how-tos/scan-settings.md){:target="_blank"}. The available limits are 10, 100, 1,000, or 10,000 records. The interface includes sticky headers that remain visible when scrolling through large datasets, making navigation easier during data review. ![visualization](../../assets/anomalies/deep-dive/source-record/visualization.png) @@ -78,7 +78,7 @@ Click the **Download :material-download:** button to export all source records a ![download](../../assets/anomalies/deep-dive/source-record/download.png) !!! note - The download includes only the records that were captured during the scan. The number of available records depends on the **Maximum Source Examples per Anomaly** setting, configured in the [scan settings](../../operations/scan/scan.md#configuration){:target="_blank"}. If you need more records, increase the limit and re-run the scan. + The download includes only the records that were captured during the scan. 
The number of available records depends on the **Maximum Source Examples per Anomaly** setting, configured in the [scan settings](../../operations/scan/how-tos/scan-settings.md){:target="_blank"}. If you need more records, increase the limit and re-run the scan. ## Masked Fields in Source Records diff --git a/docs/anomalies/deep-dive/types.md b/docs/anomalies/deep-dive/types.md index 8f794e6031..f804ffc39d 100644 --- a/docs/anomalies/deep-dive/types.md +++ b/docs/anomalies/deep-dive/types.md @@ -47,6 +47,9 @@ A shape anomaly identifies an anomalous structure within the analyzed data. The !!! note Sometimes, shape anomalies only affect a subset of the dataset. This means that only certain rows exhibit the structural issue, rather than the entire dataset. +!!! note "Shape anomalies from the rollup threshold" + Shape anomalies can also be created when the number of individual record anomalies for a single check exceeds the **Maximum Record Anomalies per Check** threshold. When this happens, remaining violations are consolidated into a single rolled-up shape anomaly that preserves the total violation count, rather than producing one record anomaly per violation. The threshold is configurable in the scan Advanced Options; see [Scan Settings](../../operations/scan/how-tos/scan-settings.md#anomaly-options). + ## Example Use Case **Scenario** diff --git a/docs/anomalies/detection.md b/docs/anomalies/detection.md index 8b03b8d21f..3285af246a 100644 --- a/docs/anomalies/detection.md +++ b/docs/anomalies/detection.md @@ -38,7 +38,7 @@ Authored checks can range from simple, template-based checks to more complex rul The Scan operation asserts rigorous quality checks to identify any anomalies within the data. This step ensures data integrity and reliability by recording the analyzed data in your configured enrichment datastore, facilitating continuous data quality improvement. !!! 
note - For more information, please refer to the documentation [Scan Operation](../operations/scan/scan.md). + For more information, please refer to the documentation [Scan Operation](../operations/scan/getting-started.md). **6. Anomaly Analysis** diff --git a/docs/anomalies/faq.md b/docs/anomalies/faq.md index 526d62f1c7..f5cbcd5789 100644 --- a/docs/anomalies/faq.md +++ b/docs/anomalies/faq.md @@ -102,7 +102,7 @@ Source records are cached locally for up to **8 hours** to keep the UI responsiv #### How many source records can I download? -The CSV download includes every record that was captured during the scan, up to the **maximum source records per anomaly** configured in your [Scan settings](../operations/scan/scan.md#configuration){:target="_blank"}. The CSV is capped at 250 MB. +The CSV download includes every record that was captured during the scan, up to the **maximum source records per anomaly** configured in your [Scan settings](../operations/scan/how-tos/scan-settings.md){:target="_blank"}. The CSV is capped at 250 MB. #### Can I see the raw value of a masked field? 
diff --git a/docs/assets/operations/runs/by-types/scan/aborted-detail-checks.png b/docs/assets/operations/runs/by-types/scan/aborted-detail-checks.png new file mode 100644 index 0000000000..c09abccc03 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/aborted-detail-checks.png differ diff --git a/docs/assets/operations/runs/by-types/scan/aborted-detail-overview.png b/docs/assets/operations/runs/by-types/scan/aborted-detail-overview.png new file mode 100644 index 0000000000..475b397fb3 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/aborted-detail-overview.png differ diff --git a/docs/assets/operations/runs/by-types/scan/aborted-detail-partitions.png b/docs/assets/operations/runs/by-types/scan/aborted-detail-partitions.png new file mode 100644 index 0000000000..f030793b6f Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/aborted-detail-partitions.png differ diff --git a/docs/assets/operations/runs/by-types/scan/aborted-detail-results.png b/docs/assets/operations/runs/by-types/scan/aborted-detail-results.png new file mode 100644 index 0000000000..9b8367f157 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/aborted-detail-results.png differ diff --git a/docs/assets/operations/runs/by-types/scan/aborted-detail-summary.png b/docs/assets/operations/runs/by-types/scan/aborted-detail-summary.png new file mode 100644 index 0000000000..960a210103 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/aborted-detail-summary.png differ diff --git a/docs/assets/operations/runs/by-types/scan/aborted-row-overview.png b/docs/assets/operations/runs/by-types/scan/aborted-row-overview.png new file mode 100644 index 0000000000..2a89ad28f4 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/aborted-row-overview.png differ diff --git a/docs/assets/operations/runs/by-types/scan/aborted-row-settings.png b/docs/assets/operations/runs/by-types/scan/aborted-row-settings.png 
new file mode 100644 index 0000000000..0a34987c07 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/aborted-row-settings.png differ diff --git a/docs/assets/operations/runs/by-types/scan/aborted-row-summary.png b/docs/assets/operations/runs/by-types/scan/aborted-row-summary.png new file mode 100644 index 0000000000..c457d3398f Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/aborted-row-summary.png differ diff --git a/docs/assets/operations/runs/by-types/scan/failure-detail-checks.png b/docs/assets/operations/runs/by-types/scan/failure-detail-checks.png new file mode 100644 index 0000000000..33097c6cb6 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/failure-detail-checks.png differ diff --git a/docs/assets/operations/runs/by-types/scan/failure-detail-overview.png b/docs/assets/operations/runs/by-types/scan/failure-detail-overview.png new file mode 100644 index 0000000000..1a29cbaacd Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/failure-detail-overview.png differ diff --git a/docs/assets/operations/runs/by-types/scan/failure-detail-partitions.png b/docs/assets/operations/runs/by-types/scan/failure-detail-partitions.png new file mode 100644 index 0000000000..9b8f27ef06 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/failure-detail-partitions.png differ diff --git a/docs/assets/operations/runs/by-types/scan/failure-detail-results.png b/docs/assets/operations/runs/by-types/scan/failure-detail-results.png new file mode 100644 index 0000000000..cb67329f11 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/failure-detail-results.png differ diff --git a/docs/assets/operations/runs/by-types/scan/failure-detail-summary.png b/docs/assets/operations/runs/by-types/scan/failure-detail-summary.png new file mode 100644 index 0000000000..37bbc990b1 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/failure-detail-summary.png 
differ diff --git a/docs/assets/operations/runs/by-types/scan/failure-row-overview.png b/docs/assets/operations/runs/by-types/scan/failure-row-overview.png new file mode 100644 index 0000000000..72a458c05b Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/failure-row-overview.png differ diff --git a/docs/assets/operations/runs/by-types/scan/failure-row-settings.png b/docs/assets/operations/runs/by-types/scan/failure-row-settings.png new file mode 100644 index 0000000000..187f9cf33b Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/failure-row-settings.png differ diff --git a/docs/assets/operations/runs/by-types/scan/failure-row-summary.png b/docs/assets/operations/runs/by-types/scan/failure-row-summary.png new file mode 100644 index 0000000000..36d53f1721 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/failure-row-summary.png differ diff --git a/docs/assets/operations/runs/by-types/scan/running-detail-checks.png b/docs/assets/operations/runs/by-types/scan/running-detail-checks.png new file mode 100644 index 0000000000..6672820e60 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/running-detail-checks.png differ diff --git a/docs/assets/operations/runs/by-types/scan/running-detail-overview.png b/docs/assets/operations/runs/by-types/scan/running-detail-overview.png new file mode 100644 index 0000000000..43944fb16f Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/running-detail-overview.png differ diff --git a/docs/assets/operations/runs/by-types/scan/running-detail-partitions.png b/docs/assets/operations/runs/by-types/scan/running-detail-partitions.png new file mode 100644 index 0000000000..ff9ce7aba0 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/running-detail-partitions.png differ diff --git a/docs/assets/operations/runs/by-types/scan/running-detail-results.png b/docs/assets/operations/runs/by-types/scan/running-detail-results.png new 
file mode 100644 index 0000000000..28deb7e2a8 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/running-detail-results.png differ diff --git a/docs/assets/operations/runs/by-types/scan/running-detail-summary.png b/docs/assets/operations/runs/by-types/scan/running-detail-summary.png new file mode 100644 index 0000000000..7954ca28bc Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/running-detail-summary.png differ diff --git a/docs/assets/operations/runs/by-types/scan/running-row-overview.png b/docs/assets/operations/runs/by-types/scan/running-row-overview.png new file mode 100644 index 0000000000..66b2fdd967 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/running-row-overview.png differ diff --git a/docs/assets/operations/runs/by-types/scan/running-row-settings.png b/docs/assets/operations/runs/by-types/scan/running-row-settings.png new file mode 100644 index 0000000000..fa9ce3147c Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/running-row-settings.png differ diff --git a/docs/assets/operations/runs/by-types/scan/running-row-summary.png b/docs/assets/operations/runs/by-types/scan/running-row-summary.png new file mode 100644 index 0000000000..954b8c61a2 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/running-row-summary.png differ diff --git a/docs/assets/operations/runs/by-types/scan/success-detail-checks.png b/docs/assets/operations/runs/by-types/scan/success-detail-checks.png new file mode 100644 index 0000000000..f2b599f9f1 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/success-detail-checks.png differ diff --git a/docs/assets/operations/runs/by-types/scan/success-detail-overview.png b/docs/assets/operations/runs/by-types/scan/success-detail-overview.png new file mode 100644 index 0000000000..0fa27b10d6 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/success-detail-overview.png differ diff --git 
a/docs/assets/operations/runs/by-types/scan/success-detail-partitions.png b/docs/assets/operations/runs/by-types/scan/success-detail-partitions.png new file mode 100644 index 0000000000..b19c85a59e Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/success-detail-partitions.png differ diff --git a/docs/assets/operations/runs/by-types/scan/success-detail-results.png b/docs/assets/operations/runs/by-types/scan/success-detail-results.png new file mode 100644 index 0000000000..69d7a676d1 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/success-detail-results.png differ diff --git a/docs/assets/operations/runs/by-types/scan/success-detail-summary.png b/docs/assets/operations/runs/by-types/scan/success-detail-summary.png new file mode 100644 index 0000000000..62e7b11c54 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/success-detail-summary.png differ diff --git a/docs/assets/operations/runs/by-types/scan/success-row-overview.png b/docs/assets/operations/runs/by-types/scan/success-row-overview.png new file mode 100644 index 0000000000..9b7283b986 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/success-row-overview.png differ diff --git a/docs/assets/operations/runs/by-types/scan/success-row-settings.png b/docs/assets/operations/runs/by-types/scan/success-row-settings.png new file mode 100644 index 0000000000..e8fb4f9591 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/success-row-settings.png differ diff --git a/docs/assets/operations/runs/by-types/scan/success-row-summary.png b/docs/assets/operations/runs/by-types/scan/success-row-summary.png new file mode 100644 index 0000000000..4b385c55c6 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/success-row-summary.png differ diff --git a/docs/assets/operations/runs/by-types/scan/success-with-warning-detail-checks.png 
b/docs/assets/operations/runs/by-types/scan/success-with-warning-detail-checks.png new file mode 100644 index 0000000000..085cd524c7 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/success-with-warning-detail-checks.png differ diff --git a/docs/assets/operations/runs/by-types/scan/success-with-warning-detail-overview.png b/docs/assets/operations/runs/by-types/scan/success-with-warning-detail-overview.png new file mode 100644 index 0000000000..a5db0c90f1 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/success-with-warning-detail-overview.png differ diff --git a/docs/assets/operations/runs/by-types/scan/success-with-warning-detail-partitions.png b/docs/assets/operations/runs/by-types/scan/success-with-warning-detail-partitions.png new file mode 100644 index 0000000000..c801cba4d9 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/success-with-warning-detail-partitions.png differ diff --git a/docs/assets/operations/runs/by-types/scan/success-with-warning-detail-results.png b/docs/assets/operations/runs/by-types/scan/success-with-warning-detail-results.png new file mode 100644 index 0000000000..98003ef120 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/success-with-warning-detail-results.png differ diff --git a/docs/assets/operations/runs/by-types/scan/success-with-warning-detail-source-records.png b/docs/assets/operations/runs/by-types/scan/success-with-warning-detail-source-records.png new file mode 100644 index 0000000000..8697f424c1 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/success-with-warning-detail-source-records.png differ diff --git a/docs/assets/operations/runs/by-types/scan/success-with-warning-detail-summary.png b/docs/assets/operations/runs/by-types/scan/success-with-warning-detail-summary.png new file mode 100644 index 0000000000..9b01a50dad Binary files /dev/null and 
b/docs/assets/operations/runs/by-types/scan/success-with-warning-detail-summary.png differ diff --git a/docs/assets/operations/runs/by-types/scan/success-with-warning-row-overview.png b/docs/assets/operations/runs/by-types/scan/success-with-warning-row-overview.png new file mode 100644 index 0000000000..8f0b62f074 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/success-with-warning-row-overview.png differ diff --git a/docs/assets/operations/runs/by-types/scan/success-with-warning-row-settings.png b/docs/assets/operations/runs/by-types/scan/success-with-warning-row-settings.png new file mode 100644 index 0000000000..64d51dd1e0 Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/success-with-warning-row-settings.png differ diff --git a/docs/assets/operations/runs/by-types/scan/success-with-warning-row-summary.png b/docs/assets/operations/runs/by-types/scan/success-with-warning-row-summary.png new file mode 100644 index 0000000000..5f2e873bdc Binary files /dev/null and b/docs/assets/operations/runs/by-types/scan/success-with-warning-row-summary.png differ diff --git a/docs/assets/operations/runs/getting-started/activity-tab-runs.png b/docs/assets/operations/runs/getting-started/activity-tab-runs.png new file mode 100644 index 0000000000..0942ec5c4d Binary files /dev/null and b/docs/assets/operations/runs/getting-started/activity-tab-runs.png differ diff --git a/docs/assets/operations/scan/getting-started/getting-started-1.png b/docs/assets/operations/scan/getting-started/getting-started-1.png new file mode 100644 index 0000000000..1018be71af Binary files /dev/null and b/docs/assets/operations/scan/getting-started/getting-started-1.png differ diff --git a/docs/assets/operations/scan/how-tos/read-settings/step-1-strategy.png b/docs/assets/operations/scan/how-tos/read-settings/step-1-strategy.png new file mode 100644 index 0000000000..8f67fe9bac Binary files /dev/null and 
b/docs/assets/operations/scan/how-tos/read-settings/step-1-strategy.png differ diff --git a/docs/assets/operations/scan/how-tos/read-settings/step-2-starting-threshold.png b/docs/assets/operations/scan/how-tos/read-settings/step-2-starting-threshold.png new file mode 100644 index 0000000000..c4111c512c Binary files /dev/null and b/docs/assets/operations/scan/how-tos/read-settings/step-2-starting-threshold.png differ diff --git a/docs/assets/operations/scan/how-tos/read-settings/step-3-record-limit-custom.png b/docs/assets/operations/scan/how-tos/read-settings/step-3-record-limit-custom.png new file mode 100644 index 0000000000..eb6a7d7a52 Binary files /dev/null and b/docs/assets/operations/scan/how-tos/read-settings/step-3-record-limit-custom.png differ diff --git a/docs/assets/operations/scan/how-tos/read-settings/step-4-record-limit-button.png b/docs/assets/operations/scan/how-tos/read-settings/step-4-record-limit-button.png new file mode 100644 index 0000000000..9a032115d4 Binary files /dev/null and b/docs/assets/operations/scan/how-tos/read-settings/step-4-record-limit-button.png differ diff --git a/docs/assets/operations/scan/how-tos/read-settings/step-5-record-limit-menu.png b/docs/assets/operations/scan/how-tos/read-settings/step-5-record-limit-menu.png new file mode 100644 index 0000000000..82e2ab200b Binary files /dev/null and b/docs/assets/operations/scan/how-tos/read-settings/step-5-record-limit-menu.png differ diff --git a/docs/assets/operations/scan/how-tos/read-settings/step-6-record-limit-selected.png b/docs/assets/operations/scan/how-tos/read-settings/step-6-record-limit-selected.png new file mode 100644 index 0000000000..6ffb8b9c34 Binary files /dev/null and b/docs/assets/operations/scan/how-tos/read-settings/step-6-record-limit-selected.png differ diff --git a/docs/assets/operations/scan/how-tos/scan-settings/step-1-anomaly-options-full.png b/docs/assets/operations/scan/how-tos/scan-settings/step-1-anomaly-options-full.png new file mode 100644 
index 0000000000..3a8800e25e Binary files /dev/null and b/docs/assets/operations/scan/how-tos/scan-settings/step-1-anomaly-options-full.png differ diff --git a/docs/assets/operations/scan/how-tos/scan-settings/step-2-anomaly-options-incremental.png b/docs/assets/operations/scan/how-tos/scan-settings/step-2-anomaly-options-incremental.png new file mode 100644 index 0000000000..5a15fa1fa3 Binary files /dev/null and b/docs/assets/operations/scan/how-tos/scan-settings/step-2-anomaly-options-incremental.png differ diff --git a/docs/assets/operations/scan/how-tos/scan-settings/step-3-advanced-options-full.png b/docs/assets/operations/scan/how-tos/scan-settings/step-3-advanced-options-full.png new file mode 100644 index 0000000000..6fd60cabae Binary files /dev/null and b/docs/assets/operations/scan/how-tos/scan-settings/step-3-advanced-options-full.png differ diff --git a/docs/assets/operations/scan/how-tos/scan-settings/step-4-advanced-options-incremental.png b/docs/assets/operations/scan/how-tos/scan-settings/step-4-advanced-options-incremental.png new file mode 100644 index 0000000000..d199b80e96 Binary files /dev/null and b/docs/assets/operations/scan/how-tos/scan-settings/step-4-advanced-options-incremental.png differ diff --git a/docs/assets/operations/scan/how-tos/schedule-options/step-1-form-fields.png b/docs/assets/operations/scan/how-tos/schedule-options/step-1-form-fields.png new file mode 100644 index 0000000000..66d69f7477 Binary files /dev/null and b/docs/assets/operations/scan/how-tos/schedule-options/step-1-form-fields.png differ diff --git a/docs/assets/operations/scan/how-tos/schedule-options/step-2-hourly.png b/docs/assets/operations/scan/how-tos/schedule-options/step-2-hourly.png new file mode 100644 index 0000000000..23c97eb7d1 Binary files /dev/null and b/docs/assets/operations/scan/how-tos/schedule-options/step-2-hourly.png differ diff --git a/docs/assets/operations/scan/how-tos/schedule-options/step-3-daily.png 
b/docs/assets/operations/scan/how-tos/schedule-options/step-3-daily.png new file mode 100644 index 0000000000..e04f8c5cfd Binary files /dev/null and b/docs/assets/operations/scan/how-tos/schedule-options/step-3-daily.png differ diff --git a/docs/assets/operations/scan/how-tos/schedule-options/step-4-weekly.png b/docs/assets/operations/scan/how-tos/schedule-options/step-4-weekly.png new file mode 100644 index 0000000000..f4e10143a4 Binary files /dev/null and b/docs/assets/operations/scan/how-tos/schedule-options/step-4-weekly.png differ diff --git a/docs/assets/operations/scan/how-tos/schedule-options/step-5-monthly.png b/docs/assets/operations/scan/how-tos/schedule-options/step-5-monthly.png new file mode 100644 index 0000000000..da960985f7 Binary files /dev/null and b/docs/assets/operations/scan/how-tos/schedule-options/step-5-monthly.png differ diff --git a/docs/assets/operations/scan/how-tos/schedule-options/step-6-advanced.png b/docs/assets/operations/scan/how-tos/schedule-options/step-6-advanced.png new file mode 100644 index 0000000000..b2ecd5beab Binary files /dev/null and b/docs/assets/operations/scan/how-tos/schedule-options/step-6-advanced.png differ diff --git a/docs/assets/operations/scan/how-tos/select-check-categories/step-1-categories.png b/docs/assets/operations/scan/how-tos/select-check-categories/step-1-categories.png new file mode 100644 index 0000000000..84b5058267 Binary files /dev/null and b/docs/assets/operations/scan/how-tos/select-check-categories/step-1-categories.png differ diff --git a/docs/assets/operations/scan/how-tos/select-tables/step-1-all.png b/docs/assets/operations/scan/how-tos/select-tables/step-1-all.png new file mode 100644 index 0000000000..8f48f96962 Binary files /dev/null and b/docs/assets/operations/scan/how-tos/select-tables/step-1-all.png differ diff --git a/docs/assets/operations/scan/how-tos/select-tables/step-2-specific.png b/docs/assets/operations/scan/how-tos/select-tables/step-2-specific.png new file mode 100644 
index 0000000000..0ebd62330d Binary files /dev/null and b/docs/assets/operations/scan/how-tos/select-tables/step-2-specific.png differ diff --git a/docs/assets/operations/scan/how-tos/select-tables/step-3-tag.png b/docs/assets/operations/scan/how-tos/select-tables/step-3-tag.png new file mode 100644 index 0000000000..9088321cae Binary files /dev/null and b/docs/assets/operations/scan/how-tos/select-tables/step-3-tag.png differ diff --git a/docs/assets/operations/scan/step-1-side-menu.png b/docs/assets/operations/scan/step-1-side-menu.png deleted file mode 100644 index 07b6580635..0000000000 Binary files a/docs/assets/operations/scan/step-1-side-menu.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-10-incremental.png b/docs/assets/operations/scan/step-10-incremental.png deleted file mode 100644 index d0e44ef8bc..0000000000 Binary files a/docs/assets/operations/scan/step-10-incremental.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-11-starting-threshold.png b/docs/assets/operations/scan/step-11-starting-threshold.png deleted file mode 100644 index ab1531458a..0000000000 Binary files a/docs/assets/operations/scan/step-11-starting-threshold.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-12-record-limit-line.png b/docs/assets/operations/scan/step-12-record-limit-line.png deleted file mode 100644 index d267ea022f..0000000000 Binary files a/docs/assets/operations/scan/step-12-record-limit-line.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-13-record-limit-options.png b/docs/assets/operations/scan/step-13-record-limit-options.png deleted file mode 100644 index b8ce34cb2f..0000000000 Binary files a/docs/assets/operations/scan/step-13-record-limit-options.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-14-next-button.png b/docs/assets/operations/scan/step-14-next-button.png deleted file mode 100644 index 19dff994f8..0000000000 Binary files 
a/docs/assets/operations/scan/step-14-next-button.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-15-anomaly-option.png b/docs/assets/operations/scan/step-15-anomaly-option.png deleted file mode 100644 index e178d0154f..0000000000 Binary files a/docs/assets/operations/scan/step-15-anomaly-option.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-16-anomalyy.png b/docs/assets/operations/scan/step-16-anomalyy.png deleted file mode 100644 index 3ad1eb1504..0000000000 Binary files a/docs/assets/operations/scan/step-16-anomalyy.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-17-source-record-limit.png b/docs/assets/operations/scan/step-17-source-record-limit.png deleted file mode 100644 index e41fb3c3ab..0000000000 Binary files a/docs/assets/operations/scan/step-17-source-record-limit.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-18-record-limit.png b/docs/assets/operations/scan/step-18-record-limit.png deleted file mode 100644 index e234031c15..0000000000 Binary files a/docs/assets/operations/scan/step-18-record-limit.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-19-run-now.png b/docs/assets/operations/scan/step-19-run-now.png deleted file mode 100644 index 9ff257827b..0000000000 Binary files a/docs/assets/operations/scan/step-19-run-now.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-2-details-page.png b/docs/assets/operations/scan/step-2-details-page.png deleted file mode 100644 index 901b59c052..0000000000 Binary files a/docs/assets/operations/scan/step-2-details-page.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-20-click-schedule.png b/docs/assets/operations/scan/step-20-click-schedule.png deleted file mode 100644 index f0a2023fa2..0000000000 Binary files a/docs/assets/operations/scan/step-20-click-schedule.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-21-hourly.png 
b/docs/assets/operations/scan/step-21-hourly.png deleted file mode 100644 index 6c07a7fabb..0000000000 Binary files a/docs/assets/operations/scan/step-21-hourly.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-22-daily.png b/docs/assets/operations/scan/step-22-daily.png deleted file mode 100644 index f31651e0bc..0000000000 Binary files a/docs/assets/operations/scan/step-22-daily.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-23-weekly.png b/docs/assets/operations/scan/step-23-weekly.png deleted file mode 100644 index 8a4e690b7f..0000000000 Binary files a/docs/assets/operations/scan/step-23-weekly.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-24-monthly.png b/docs/assets/operations/scan/step-24-monthly.png deleted file mode 100644 index 298a326c1d..0000000000 Binary files a/docs/assets/operations/scan/step-24-monthly.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-25-advanced.png b/docs/assets/operations/scan/step-25-advanced.png deleted file mode 100644 index d9dc7381e7..0000000000 Binary files a/docs/assets/operations/scan/step-25-advanced.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-26-schedule-name.png b/docs/assets/operations/scan/step-26-schedule-name.png deleted file mode 100644 index 123e5a1522..0000000000 Binary files a/docs/assets/operations/scan/step-26-schedule-name.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-27-schedule.png b/docs/assets/operations/scan/step-27-schedule.png deleted file mode 100644 index 1779e7ace9..0000000000 Binary files a/docs/assets/operations/scan/step-27-schedule.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-28-activity-operation.png b/docs/assets/operations/scan/step-28-activity-operation.png deleted file mode 100644 index 77ce1f6782..0000000000 Binary files a/docs/assets/operations/scan/step-28-activity-operation.png and /dev/null differ diff --git 
a/docs/assets/operations/scan/step-29-activity.png b/docs/assets/operations/scan/step-29-activity.png deleted file mode 100644 index 8f5e1840c3..0000000000 Binary files a/docs/assets/operations/scan/step-29-activity.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-3-run.png b/docs/assets/operations/scan/step-3-run.png deleted file mode 100644 index 1811982e47..0000000000 Binary files a/docs/assets/operations/scan/step-3-run.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-30-running.png b/docs/assets/operations/scan/step-30-running.png deleted file mode 100644 index cfd2a1cb28..0000000000 Binary files a/docs/assets/operations/scan/step-30-running.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-31-aborted-operation.png b/docs/assets/operations/scan/step-31-aborted-operation.png deleted file mode 100644 index 0d221ff54e..0000000000 Binary files a/docs/assets/operations/scan/step-31-aborted-operation.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-32-warning.png b/docs/assets/operations/scan/step-32-warning.png deleted file mode 100644 index 55705f7d77..0000000000 Binary files a/docs/assets/operations/scan/step-32-warning.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-33-success.png b/docs/assets/operations/scan/step-33-success.png deleted file mode 100644 index 97d7b05c32..0000000000 Binary files a/docs/assets/operations/scan/step-33-success.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-34-records-scan-operation.png b/docs/assets/operations/scan/step-34-records-scan-operation.png deleted file mode 100644 index d944dfe949..0000000000 Binary files a/docs/assets/operations/scan/step-34-records-scan-operation.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-35-result-scan-operation.png b/docs/assets/operations/scan/step-35-result-scan-operation.png deleted file mode 100644 index a295cb757b..0000000000 Binary 
files a/docs/assets/operations/scan/step-35-result-scan-operation.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-36-result.png b/docs/assets/operations/scan/step-36-result.png deleted file mode 100644 index ccdc0d837e..0000000000 Binary files a/docs/assets/operations/scan/step-36-result.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-37-drop-down.png b/docs/assets/operations/scan/step-37-drop-down.png deleted file mode 100644 index 55c5842011..0000000000 Binary files a/docs/assets/operations/scan/step-37-drop-down.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-4-all-operation.png b/docs/assets/operations/scan/step-4-all-operation.png deleted file mode 100644 index dceb1f6238..0000000000 Binary files a/docs/assets/operations/scan/step-4-all-operation.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-5-specific.png b/docs/assets/operations/scan/step-5-specific.png deleted file mode 100644 index 8fc50e6429..0000000000 Binary files a/docs/assets/operations/scan/step-5-specific.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-6-tag.png b/docs/assets/operations/scan/step-6-tag.png deleted file mode 100644 index 0c1d9a48cd..0000000000 Binary files a/docs/assets/operations/scan/step-6-tag.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-7-next.png b/docs/assets/operations/scan/step-7-next.png deleted file mode 100644 index b21d599deb..0000000000 Binary files a/docs/assets/operations/scan/step-7-next.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-8-select-check.png b/docs/assets/operations/scan/step-8-select-check.png deleted file mode 100644 index 61ad2e8c9e..0000000000 Binary files a/docs/assets/operations/scan/step-8-select-check.png and /dev/null differ diff --git a/docs/assets/operations/scan/step-9-nextt.png b/docs/assets/operations/scan/step-9-nextt.png deleted file mode 100644 index 906da0e830..0000000000 
Binary files a/docs/assets/operations/scan/step-9-nextt.png and /dev/null differ diff --git a/docs/container/actions-on-container.md b/docs/container/actions-on-container.md index bbc0316c76..d53ea1c381 100644 --- a/docs/container/actions-on-container.md +++ b/docs/container/actions-on-container.md @@ -40,5 +40,5 @@ The **Run** button provides options to execute operations on datasets, such as p | No. | Options | Description | | :---- | :---- | :---- | | **1.** | Profile | **Profile** allows you to run a profiling operation to analyze the data structure, gather metadata, set thresholds, and define record limits for comprehensive dataset profiling.
**Note:** For profile operation, please refer to the [Profile Operation documentation](../operations/profile/profile.md){target="_blank"}. | -| **2.** | Scan | **Scan** allows you to perform data quality checks, configure scan strategies, and detect anomalies in the dataset.
**Note:** For scan operation, please refer to the [Scan Operation documentation](../operations/scan/scan.md){target="_blank"}. | +| **2.** | Scan | **Scan** allows you to perform data quality checks, configure scan strategies, and detect anomalies in the dataset.
**Note:** For scan operation, please refer to the [Scan Operation documentation](../operations/scan/getting-started.md){target="_blank"}. | | **3.** | External Scan | **External Scan** allows you to upload a file and validate its data against predefined checks in the selected table.
**Note:** For external scan, please refer to the [ External Scan documentation](../operations/external-scan/external-scan.md){target="_blank"}. | \ No newline at end of file diff --git a/docs/container/manage-tables-and-files/run.md b/docs/container/manage-tables-and-files/run.md index aaa46269f6..ffb5a9406c 100644 --- a/docs/container/manage-tables-and-files/run.md +++ b/docs/container/manage-tables-and-files/run.md @@ -16,4 +16,4 @@ Under **Run**, choose the type of operation you want to perform: To understand how a profile operation is performed, you can follow the remaining steps from the documentation [Profile Operation.](../../operations/profile/profile.md#configuration){target="_blank"}. -To understand how a scan operation is performed, you can follow the remaining steps from the documentation [Scan Operation.](../../operations/scan/scan.md#configuration){target="_blank"}. +To understand how a scan operation is performed, you can follow the remaining steps from the documentation [Scan Operation.](../../operations/scan/getting-started.md){target="_blank"}. diff --git a/docs/container/overview.md b/docs/container/overview.md index 636dfec35f..13e61ab220 100644 --- a/docs/container/overview.md +++ b/docs/container/overview.md @@ -57,7 +57,7 @@ Totals are calculated from sampled data, not the full dataset. Values may differ After a Sync operation, each container is assigned a status (**Available**, **Changed**, **Inaccessible**, or **Unloadable**) based on its current state in the datastore. !!! note - For the full list and definitions, refer to [Container Statuses](../operations/sync/sync.md#container-statuses){target="_blank"} on the Sync Operation page. 
If a container is marked **Unloadable** after 3 consecutive scan or profile failures, see [Unloadable Container Error](../operations/scan/scan.md#unloadable-container-error){target="_blank"} for resolution steps (run a Sync for tables, views, or file patterns; for computed assets, force an edit on the asset and re-save it with Validate then Save). + For the full list and definitions, refer to [Container Statuses](../operations/sync/sync.md#container-statuses){target="_blank"} on the Sync Operation page. If a container is marked **Unloadable** after 3 consecutive scan or profile failures, see [Unloadable Container Error](../operations/scan/troubleshooting.md#unloadable-container-error){target="_blank"} for resolution steps (run a Sync for tables, views, or file patterns; for computed assets, force an edit on the asset and re-save it with Validate then Save). ## Actions on Container diff --git a/docs/data-quality-checks/authored-check.md b/docs/data-quality-checks/authored-check.md index e8c9a43389..638420cbc8 100644 --- a/docs/data-quality-checks/authored-check.md +++ b/docs/data-quality-checks/authored-check.md @@ -113,7 +113,7 @@ If the validation fails, a red message will appear saying **"Failed Validation"* ![failed](../assets/data-quality-checks/authored-check/failed.png) !!! note "Container marked as Unloadable" - If validation fails with a message like `Container '' is marked as Unloadable. No attempt was made to load the container due to multiple consecutive failures in prior operations.`, the underlying container has been skipped after 3 consecutive scan or profile failures. For tables, views, and file patterns, run a [Sync Operation](../operations/sync/sync.md) on the datastore to reset the status. For computed assets (Computed Tables, Computed Files, Computed Joins), force an edit on the asset (click **Edit**), then click **Validate** and **Save** to re-evaluate the definition. 
See [Unloadable Container Error](../operations/scan/scan.md#unloadable-container-error) for the full resolution steps. + If validation fails with a message like `Container '' is marked as Unloadable. No attempt was made to load the container due to multiple consecutive failures in prior operations.`, the underlying container has been skipped after 3 consecutive scan or profile failures. For tables, views, and file patterns, run a [Sync Operation](../operations/sync/sync.md) on the datastore to reset the status. For computed assets (Computed Tables, Computed Files, Computed Joins), force an edit on the asset (click **Edit**), then click **Validate** and **Save** to re-evaluate the definition. See [Unloadable Container Error](../operations/scan/troubleshooting.md#unloadable-container-error) for the full resolution steps. **Step 5:** Once you have a successful validation, click the **"Save"** button. diff --git a/docs/explore/activity.md b/docs/explore/activity.md index 6b05d0946d..78405f504e 100644 --- a/docs/explore/activity.md +++ b/docs/explore/activity.md @@ -2,7 +2,7 @@ **Activity** in Qualytics provides a comprehensive view of all operations, helping users monitor and analyze the performance and workflows across various source datastores. Activities are categorized into **Runs** and **Schedule** operations, offering distinct insights into executed and scheduled activities. -The Rerun and Resume options depend on the type of operation. [Profile](../operations/profile/profile.md) and [Scan](../operations/scan/scan.md) support both because the system can remember where it stopped and continue from there. [Sync](../operations/sync/sync.md), [Export](../operations/export-operation/export-operation.md), and [Materialize](../operations/materialize-operation/materialize-operation.md) only support Rerun, since the system can't pick up from where it left off and must start over. External Scan doesn't support either option, as they don't apply to it. 
+The Rerun and Resume options depend on the type of operation. [Profile](../operations/profile/profile.md) and [Scan](../operations/scan/getting-started.md) support both because the system can remember where it stopped and continue from there. [Sync](../operations/sync/sync.md), [Export](../operations/export-operation/export-operation.md), and [Materialize](../operations/materialize-operation/materialize-operation.md) only support Rerun, since the system can't pick up from where it left off and must start over. External Scan doesn't support either option, as they don't apply to it. Let's get started 🚀 @@ -16,7 +16,7 @@ Let's get started 🚀 ![activity](../assets/explore/activity/activity-light.png) -You will be navigated to the **Activity** tab and here you'll see a list of operations [sync](../operations/sync/sync.md), [profile](../operations/profile/profile.md), [scan](../operations/scan/scan.md), and [external scan](../operations/external-scan/external-scan.md) across different source datastores. +You will be navigated to the **Activity** tab and here you'll see a list of operations [sync](../operations/sync/sync.md), [profile](../operations/profile/profile.md), [scan](../operations/scan/getting-started.md), and [external scan](../operations/external-scan/external-scan.md) across different source datastores. ![list](../assets/explore/activity/list-light.png) @@ -28,7 +28,7 @@ Activities are divided into two categories: Runs and Schedule Operations. Runs p ### Runs -Runs provide a complete record of all executed operations across various source datastores. This section enables users to monitor and review activities such as [sync](../operations/sync/sync.md), [profile](../operations/profile/profile.md), [scan](../operations/scan/scan.md), and [external scan](../operations/external-scan/external-scan.md).
Each run displays key details like the operation type, status, execution time, duration, and triggering method, offering a clear overview of system performance and data processing workflows. +Runs provide a complete record of all executed operations across various source datastores. This section enables users to monitor and review activities such as [sync](../operations/sync/sync.md), [profile](../operations/profile/profile.md), [scan](../operations/scan/getting-started.md), and [external scan](../operations/external-scan/external-scan.md). Each run displays key details like the operation type, status, execution time, duration, and triggering method, offering a clear overview of system performance and data processing workflows. ![run](../assets/explore/activity/runs-light.png) @@ -37,9 +37,17 @@ Runs provide a complete record of all executed operations across various source | 1. | Select Source Datastore | Select specific source datastores to focus on their operations. | | 2. | Search | This feature helps users quickly find specific identifiers. | | 3. | Sort By | **Sort By** option helps users organize the list of performed operations by criteria like Duration and Created Date for quick access. | -| 4. | Filter | The filter lets users easily refine the list of performed operations by choosing a specific Type [Scan](../operations/scan/scan.md), [Sync](../operations/sync/sync.md), [Profile](../operations/profile/profile.md), [External Scan](../operations/external-scan/external-scan.md), etc. along with Status (Success, Failure, Running, and Aborted) or **Has Logs** to view operations that completed with logs. | +| 4. | Filter | The filter lets users easily refine the list of performed operations by choosing a specific Type [Scan](../operations/scan/getting-started.md), [Sync](../operations/sync/sync.md), [Profile](../operations/profile/profile.md), [External Scan](../operations/external-scan/external-scan.md), etc. 
along with Status (Success, Failure, Running, and Aborted) or **Has Logs** to view operations that completed with logs. | | 5. | Activity Heatmap | The **Activity Heatmap** shows daily activity levels, with color intensity indicating operation counts. Hovering over a square reveals details for that day. | -| 6. | Operation List | Shows a list of operations [**sync**](../operations/sync/sync.md), [**profile**](../operations/profile/profile.md), [**scan**](../operations/scan/scan.md), and [**external scan**](../operations/external-scan/external-scan.md), etc performed across various source datastores. | +| 6. | Operation List | Shows a list of operations [**sync**](../operations/sync/sync.md), [**profile**](../operations/profile/profile.md), [**scan**](../operations/scan/getting-started.md), and [**external scan**](../operations/external-scan/external-scan.md), etc performed across various source datastores. | + +#### Auto-Resolved indicator on Scan operations + +When a Scan operation ran with [Auto Resolve Anomalies](../operations/scan/deep-dive/read-strategies.md#auto-resolve-on-full-scans) enabled, the operation row shows a green **Anomalies Auto-Resolved** pill joined to the existing **Anomalies** pill, with the count of previously open anomalies this scan automatically resolved. Expanding the operation row reveals the same value as a stat tile in the summary card. + + + +The pill and stat are hidden when the scan ran as Incremental, when **Auto Resolve Anomalies** was turned off, or when the scan did not auto-resolve any anomaly. ### Activity Heatmap @@ -98,7 +106,7 @@ If the operation is a **Materialize** or **Export** run, users can click the **V ### Schedule -The Schedule section provides a complete record of all scheduled operations across various source datastores. 
This section enables users to monitor and review scheduled operations such as [sync](../operations/sync/sync.md), [profile](../operations/profile/profile.md), and [scan](../operations/scan/scan.md). Each scheduled operation includes key details like operation type, scheduled time, and triggering method, giving users a clear overview of system performance and data workflows. +The Schedule section provides a complete record of all scheduled operations across various source datastores. This section enables users to monitor and review scheduled operations such as [sync](../operations/sync/sync.md), [profile](../operations/profile/profile.md), and [scan](../operations/scan/getting-started.md). Each scheduled operation includes key details like operation type, scheduled time, and triggering method, giving users a clear overview of system performance and data workflows. ![schedule](../assets/explore/activity/schedule-light.png) @@ -107,8 +115,8 @@ The Schedule section provides a complete record of all scheduled operations acro | 1. | Selected Source Datastores | Select specific source datastores to focus on their operations. | | 2. | Search | This feature helps users quickly find specific identifiers. | | 3. | Sort By | **Sort By** option helps users organize the list of scheduled operations by criteria like Created Date and Operations for quick access. | -| 4. | Filter | The filter lets users easily refine the list of scheduled operations by choosing a specific operation type: [Scan](../operations/scan/scan.md), [Sync](../operations/sync/sync.md), [Profile](../operations/profile/profile.md), etc. to view. | -| 5. | Operation List | Shows the list of scheduled operations such as [sync](../operations/sync/sync.md), [profile](../operations/profile/profile.md), [scan](../operations/scan/scan.md), etc across various source datastores. | +| 4. 
| Filter | The filter lets users easily refine the list of scheduled operations by choosing a specific operation type: [Scan](../operations/scan/getting-started.md), [Sync](../operations/sync/sync.md), [Profile](../operations/profile/profile.md), etc. to view. | +| 5. | Operation List | Shows the list of scheduled operations such as [sync](../operations/sync/sync.md), [profile](../operations/profile/profile.md), [scan](../operations/scan/getting-started.md), etc across various source datastores. | #### Deactivate Schedule Operation diff --git a/docs/explore/insights.md b/docs/explore/insights.md index 7f6d1972aa..a156eb16ad 100644 --- a/docs/explore/insights.md +++ b/docs/explore/insights.md @@ -153,11 +153,11 @@ The Checks & Profiling section provides a consolidated view of your active check ![Screenshot](../assets/explore/insights/checks-7-light.png) -**1. Passing Check:** Displays the real-time number of passed checks that were successfully completed during the [**scan**](../operations/scan/scan.md) or [**profile operation**](../operations/profile/profile.md), indicating that the data met the set quality criteria. +**1. Passing Check:** Displays the real-time number of passed checks that were successfully completed during the [**scan**](../operations/scan/getting-started.md) or [**profile operation**](../operations/profile/profile.md), indicating that the data met the set quality criteria. ![passed-check](../assets/explore/insights/passed-check-8-light.png) -**2. Failing Checks:** This shows the real-time number of checks that did not pass during the [**scan**](../operations/scan/scan.md) or [**profile operation**](../operations/profile/profile.md), indicating data that did not meet the quality criteria. +**2. Failing Checks:** This shows the real-time number of checks that did not pass during the [**scan**](../operations/scan/getting-started.md) or [**profile operation**](../operations/profile/profile.md), indicating data that did not meet the quality criteria. 
![failed-check](../assets/explore/insights/failed-check-9-light.png) @@ -274,7 +274,7 @@ Field Profiled shows the number of fields processed during the profile runs. It ## Scans -[**Scans**](../operations/scan/scan.md) section provides a clear overview of all scanning activities within a selected period. It helps users keep track of how many scans were performed and how many anomalies were detected during those scans. This section makes it easier to understand the scanning process and manage data by offering insight into how often scans occur. +[**Scans**](../operations/scan/getting-started.md) section provides a clear overview of all scanning activities within a selected period. It helps users keep track of how many scans were performed and how many anomalies were detected during those scans. This section makes it easier to understand the scanning process and manage data by offering insight into how often scans occur. ![scans](../assets/explore/insights/scan-22-light.png) diff --git a/docs/operations/export-operation/export-operation.md b/docs/operations/export-operation/export-operation.md index 78b1dce7ac..3ea74cf7d0 100644 --- a/docs/operations/export-operation/export-operation.md +++ b/docs/operations/export-operation/export-operation.md @@ -1,4 +1,4 @@ -# Export Operation +# :material-database-arrow-right-outline:{ .middle style="color: var(--q-brick)" } Export Operation Qualytics metadata export feature lets you capture the changing states of your data. You can export metadata for Quality Checks, Field Profiles, and Anomalies from selected profiles into an enrichment datastore so that you can perform deeper analysis, identify trends, detect issues, and make informed decisions based on your data. 
diff --git a/docs/operations/external-scan/external-scan.md b/docs/operations/external-scan/external-scan.md index ccbb492d2a..e19ec4ba87 100644 --- a/docs/operations/external-scan/external-scan.md +++ b/docs/operations/external-scan/external-scan.md @@ -1,4 +1,4 @@ -# External Scan Operation +# :material-database-alert-outline:{ .middle style="color: var(--q-brick)" } External Scan Operation An external scan is ideal for ad hoc scenarios, where you may receive a file intended to be replicated to a source datastore. Before loading, you can perform an external scan to ensure the file aligns with existing data standards. The schema of the file must match the target table or file pattern that has already been profiled within Qualytics, allowing you to reuse the quality checks to identify any issues before data integration. diff --git a/docs/operations/materialize-operation/materialize-operation.md b/docs/operations/materialize-operation/materialize-operation.md index 5f5796c432..ad19d98f93 100644 --- a/docs/operations/materialize-operation/materialize-operation.md +++ b/docs/operations/materialize-operation/materialize-operation.md @@ -1,4 +1,4 @@ -# Materialize Operation +# :material-database-arrow-down-outline:{ .middle style="color: var(--q-brick)" } Materialize Operation **Materialize Operation** captures snapshots of selected containers from a **source datastore** and exports them to an **enrichment datastore** for seamless data loading. Users can run it instantly or schedule it at set intervals, ensuring structured data is readily available for analysis and integration. diff --git a/docs/operations/overview.md b/docs/operations/overview.md new file mode 100644 index 0000000000..e94b2f031b --- /dev/null +++ b/docs/operations/overview.md @@ -0,0 +1,71 @@ +# Operations + +Operations are the actions Qualytics runs against a source datastore. 
They cover refreshing its inventory, profiling content, scanning data, exporting enrichment results, and promoting checks between environments. Each execution is saved as a **Run**, with its status, duration, summary metrics, and a downloadable report, so every workflow can be audited and re-run later. + +
+ +- :material-play-circle:{ .lg .middle } **Runs** + + --- + + Every operation execution is recorded as a Run. Learn the row anatomy, the lifecycle (Queued → Running → Success/Failure/Aborted), and the actions (Abort, Resume, Rerun, Delete) available in each state. + + [:octicons-arrow-right-24: Runs](runs/getting-started.md) + +- :material-database-sync-outline:{ .lg .middle } **Sync** + + --- + + Refreshes the inventory of containers, their schemas, and incremental identifiers from the source datastore. + + [:octicons-arrow-right-24: Sync](sync/sync.md) + +- :material-database-eye-outline:{ .lg .middle } **Profile** + + --- + + Analyzes the content of selected containers to compute statistics and infer baseline quality checks. + + [:octicons-arrow-right-24: Profile](profile/profile.md) + +- :material-database-search-outline:{ .lg .middle } **Scan** + + --- + + Runs the datastore's quality checks against the data and writes detected anomalies to the linked Enrichment Datastore. + + [:octicons-arrow-right-24: Scan](scan/getting-started.md) + +- :material-database-alert-outline:{ .lg .middle } **External Scan** + + --- + + Records the results of a scan executed outside Qualytics (for example, from a CLI step in your own pipeline) so the findings appear in the Activity tab. + + [:octicons-arrow-right-24: External Scan](external-scan/external-scan.md) + +- :material-database-arrow-right-outline:{ .lg .middle } **Export Operation** + + --- + + Exports the contents of an enrichment table to an external destination. Use it to share anomalies, source records, or computed data with downstream consumers. + + [:octicons-arrow-right-24: Export Operation](export-operation/export-operation.md) + +- :material-database-arrow-down-outline:{ .lg .middle } **Materialize Operation** + + --- + + Writes the result of computed tables, files, or joins back into the enrichment datastore so they become available for further analysis, scanning, or export.
+ + [:octicons-arrow-right-24: Materialize Operation](materialize-operation/materialize-operation.md) + +- :material-database-arrow-up-outline:{ .lg .middle } **Promote** + + --- + + Copies metadata (quality checks, computed fields, computed tables, and computed files) between source datastores. Use it to roll out checks from staging to production or to keep environments aligned. + + [:octicons-arrow-right-24: Promote](promote/overview.md) + +
diff --git a/docs/operations/profile/profile.md b/docs/operations/profile/profile.md index e7cb91d550..fd5ab2c939 100644 --- a/docs/operations/profile/profile.md +++ b/docs/operations/profile/profile.md @@ -1,4 +1,4 @@ -# Profile Operation +# :material-database-eye-outline:{ .middle style="color: var(--q-brick)" } Profile Operation The Profile Operation is a comprehensive analysis conducted on every record within all available containers in a datastore. This process is aimed at understanding and improving data quality by generating metadata for each field within the collections of data (like tables or files). @@ -750,4 +750,4 @@ This happens after a container has failed in 3 consecutive scan or profile opera - For tables, views, and file patterns, run a [Sync Operation](../sync/sync.md) on the datastore to reset the status. - For computed assets (Computed Tables, Computed Files, Computed Joins), force an edit on the asset (click **Edit**), then click **Validate** and **Save** to re-evaluate the definition. -See [Unloadable Container Error](../scan/scan.md#unloadable-container-error) for the full resolution steps and common root causes. +See [Unloadable Container Error](../scan/troubleshooting.md#unloadable-container-error) for the full resolution steps and common root causes. diff --git a/docs/operations/promote/overview.md b/docs/operations/promote/overview.md index 6e1bcc501a..f33f85bc24 100644 --- a/docs/operations/promote/overview.md +++ b/docs/operations/promote/overview.md @@ -1,4 +1,4 @@ -# Promote Overview +# :material-database-arrow-up-outline:{ .middle style="color: var(--q-brick)" } Promote Overview Promote is a core operation in Qualytics that allows you to copy and reuse data quality assets across containers and datastores. 
Instead of manually recreating quality checks, computed fields, computed tables, or computed files in each environment, you can promote them from a source to a destination, preserving definitions and ensuring consistency across your data quality ecosystem. diff --git a/docs/operations/runs/api.md b/docs/operations/runs/api.md new file mode 100644 index 0000000000..b2f8797824 --- /dev/null +++ b/docs/operations/runs/api.md @@ -0,0 +1,203 @@ +# :material-api:{ .middle style="color: var(--q-brick)" } Runs API + +The endpoints documented on this page are used to **manage** existing Runs: list them, fetch a single Run, abort an in-flight Run, resume or rerun a finished one, and delete a Run record. For payload examples to **create** a new Run, see the API page for the specific operation type (for example, [Scan API](../scan/api.md){:target="_blank"} or [Promote API](../promote/api.md){:target="_blank"}). + +All endpoints use the base URL of your Qualytics deployment (for example, `https://your-instance.qualytics.io/api`). + +!!! tip + For complete API documentation, including request and response schemas, visit the [API docs](https://demo.qualytics.io/api/docs){:target="_blank"}. + +## Listing Runs + +Send `GET /api/operations` to retrieve a paginated list of Runs across all operation types. The endpoint supports filtering and sorting via the query parameters below. + +| Query parameter | Type | Description | +| :--- | :--- | :--- | +| `id` | integer | Filter by a specific Run ID. | +| `schedule_id` | integer | Filter by the schedule that triggered the Runs. | +| `operation_type` | string | One of `catalog` (Sync operation), `profile`, `scan`, `external_scan`, `export`, `materialize`, `promote`. | +| `finished` | boolean | `true` returns only finished Runs; `false` returns only in-progress Runs. | +| `result` | list of strings | Accepted values: `queued`, `running`, `success`, `failure`, `aborted`. Repeat the parameter to pass multiple.
| +| `created_date`, `start_date`, `end_date` | date (`YYYY-MM-DD`) | Filter by the Run's creation, start, or end date. | +| `datastore` | list of integers | Filter by datastore ID. Repeat to pass multiple. | +| `container` | list of integers | Filter by container ID. Repeat to pass multiple. | +| `has_logs` | boolean | `true` returns only Runs that produced log entries. | +| `offset` | integer | Timezone offset in minutes (used to align date filters with the caller's timezone). | +| `sort_created` | `asc` or `desc` | Sort the result by `created_date`. | +| `sort_duration` | `asc` or `desc` | Sort the result by `duration`. | + +??? example "List recent Scan Runs in terminal states" + + **Request**: + + ```bash + curl -X GET "https://your-instance.qualytics.io/api/operations?operation_type=scan&result=success&result=failure&sort_created=desc" \ + -H "Authorization: Bearer YOUR_API_TOKEN" + ``` + + **Response** (abbreviated): + + ```json + { + "items": [ + { + "id": 53388, + "type": "scan", + "created": "YYYY-MM-DDTHH:MM:SS.ssssssZ", + "start_time": "YYYY-MM-DDTHH:MM:SS.ssssssZ", + "end_time": "YYYY-MM-DDTHH:MM:SS.ssssssZ", + "result": "success", + "message": null, + "triggered_by": "user@example.com", + "datastore": { "id": 101, "name": "Datastore-Sample" }, + "status": { + "total_containers": 14, + "containers_analyzed": 14, + "partitions_scanned": 42, + "records_processed": 700000, + "anomalies_identified": 0 + } + } + ], + "total": 1, + "page": 1, + "size": 50, + "pages": 1 + } + ``` + +## Retrieving a Run + +Send `GET /api/operations/{id}` to return the full detail payload for a single Run, including the configuration it used, the summary metrics, and the triggering metadata. + +For the full shape of the response object (including operation-specific fields such as `auto_resolved_anomaly_count` on Scan Runs), see the response example in [Scan API: Retrieving Scan operation information](../scan/api.md#retrieving-scan-operation-information){:target="_blank"}. 
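
The `result` values accepted by the listing filter split into in-progress states (`queued`, `running`) and terminal states (`success`, `failure`, `aborted`), so a script that fetches a Run by ID can use that split to decide when to stop polling. The following is a minimal sketch under stated assumptions, not a documented client: it assumes a POSIX shell with `curl` and `jq` installed, and that the single-Run response exposes the same top-level `result` field shown in the list items. The helper names `is_terminal_result` and `wait_for_run`, the `QUALYTICS_URL` and `QUALYTICS_TOKEN` variables, and the 30-second interval are all hypothetical.

```bash
#!/bin/sh
# Hypothetical helper: exits 0 when a Run's `result` is a terminal state.
is_terminal_result() {
  case "$1" in
    success|failure|aborted) return 0 ;;  # the Run has finished
    *) return 1 ;;                        # queued, running, or unrecognized
  esac
}

# Hypothetical polling loop around GET /api/operations/{id}.
# Assumes QUALYTICS_URL (e.g. https://your-instance.qualytics.io) and
# QUALYTICS_TOKEN are set, and that the response has a top-level `result`.
# Usage: wait_for_run RUN_ID [MAX_ATTEMPTS]
wait_for_run() {
  run_id="$1"
  attempts_left="${2:-20}"
  while [ "$attempts_left" -gt 0 ]; do
    result=$(curl -s -X GET "${QUALYTICS_URL}/api/operations/${run_id}" \
      -H "Authorization: Bearer ${QUALYTICS_TOKEN}" | jq -r '.result')
    if is_terminal_result "$result"; then
      printf '%s\n' "$result"   # print the final state for the caller
      return 0
    fi
    attempts_left=$((attempts_left - 1))
    sleep 30                    # arbitrary interval chosen for illustration
  done
  return 1                      # gave up before the Run reached a terminal state
}
```

Calling `wait_for_run 53388` would print the final `result` once the Run finishes, or return a non-zero status after the attempt budget is exhausted. Bounding the attempts keeps the loop from spinning forever if the response shape differs from the assumption above.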
+ +??? example "Fetch a single Run by ID" + + **Request**: + + ```bash + curl -X GET "https://your-instance.qualytics.io/api/operations/53388" \ + -H "Authorization: Bearer YOUR_API_TOKEN" + ``` + +## Listing containers in a Run + +Send `GET /api/operations/{id}/containers` to return the per-container breakdown of a Run (one entry per container that the Run targeted), with the container-level status and counters. Most useful for Profile and Scan Runs that track progress at the container layer. + +??? example "List containers processed in a Scan Run" + + **Request**: + + ```bash + curl -X GET "https://your-instance.qualytics.io/api/operations/53388/containers" \ + -H "Authorization: Bearer YOUR_API_TOKEN" + ``` + + **Response** (abbreviated): + + ```json + { + "items": [ + { + "id": 456, + "container": { "id": 234, "name": "customers", "container_type": "table" }, + "start_time": "YYYY-MM-DDTHH:MM:SS.ssssssZ", + "end_time": "YYYY-MM-DDTHH:MM:SS.ssssssZ", + "records_processed": 50000, + "anomaly_count": 0, + "result": "success", + "message": null + } + ], + "total": 14 + } + ``` + +## Aborting a Run + +Send `PUT /api/operations/abort/{id}` to stop an in-progress Run. The Run transitions to **Aborted** as soon as the worker can stop safely, and partial results captured up to the stop point are preserved. + +**Permission**: Editor team permission on the target datastore. + +??? example "Abort an in-flight Run" + + **Request**: + + ```bash + curl -X PUT "https://your-instance.qualytics.io/api/operations/abort/53388" \ + -H "Authorization: Bearer YOUR_API_TOKEN" + ``` + + **Response** (abbreviated): the endpoint returns the updated Run with `result` set to `aborted` and `end_time` populated. + +## Resuming a Run + +Send `PUT /api/operations/run/{id}` to continue an Aborted or Failure Run from the unprocessed containers (or partitions). Containers already completed in the original Run are not re-read. + +Supported by Profile, Scan, and Promote. 
Not supported by Sync, External Scan, Export, or Materialize. + +**Permission**: Editor team permission on the target datastore. + +| Query parameter | Type | Description | +| :--- | :--- | :--- | +| `force_abort` | boolean | When `true`, forces the current operation to abort before resuming. Default `false`. | + +??? example "Resume a Run from where it stopped" + + **Request**: + + ```bash + curl -X PUT "https://your-instance.qualytics.io/api/operations/run/53204" \ + -H "Authorization: Bearer YOUR_API_TOKEN" + ``` + + **Response** (abbreviated): the endpoint returns the updated Run with `result` set back to `running` (or briefly `queued`) and the new dispatch timestamp. + +## Rerunning a Run + +Send `PUT /api/operations/rerun/{id}` to start a brand-new Run that reuses the configuration of an existing Run. Every container is processed from scratch, regardless of whether it succeeded before. + +Supported by Sync, Profile, Scan, Export, Materialize, and Promote. Not supported by External Scan. + +**Permission**: Editor team permission on the target datastore. + +??? example "Rerun a finished Run with the same configuration" + + **Request**: + + ```bash + curl -X PUT "https://your-instance.qualytics.io/api/operations/rerun/53388" \ + -H "Authorization: Bearer YOUR_API_TOKEN" + ``` + + **Response** (abbreviated): the endpoint returns the **new** Run created from the original's configuration, with a fresh `id`, `created` timestamp, and `result` set to `queued` or `running`. + +## Deleting a Run + +Send `DELETE /api/operations/{id}` to remove the Run record (and its summary metrics) from the Activity list. Anomalies, computed data, and other downstream artifacts produced by the Run are **not** removed; only the Run's own record is. + +**Permission**: Editor team permission on the target datastore. + +??? 
example "Delete a Run record from the Activity list" + + **Request**: + + ```bash + curl -X DELETE "https://your-instance.qualytics.io/api/operations/53388" \ + -H "Authorization: Bearer YOUR_API_TOKEN" + ``` + + **Response**: `204 No Content` on success. + +## Error Responses + +| Status Code | Description | +| :--- | :--- | +| `400 Bad Request` | The payload is malformed or the `id` path parameter is invalid. | +| `401 Unauthorized` | Missing or invalid API token. | +| `403 Forbidden` | The user does not have the required team permission on the datastore. | +| `404 Not Found` | A Run with the specified ID does not exist. | +| `409 Conflict` | The Run is not in a state that supports the action (for example, calling Abort on a Run that already finished, or Resume on a Run type that does not support it). | +| `422 Unprocessable Entity` | A query parameter or payload value fails schema validation. | + diff --git a/docs/operations/runs/by-types/profile/aborted.md b/docs/operations/runs/by-types/profile/aborted.md new file mode 100644 index 0000000000..6c7ae83463 --- /dev/null +++ b/docs/operations/runs/by-types/profile/aborted.md @@ -0,0 +1,22 @@ +# :material-stop-circle-outline:{ .middle style="color: var(--q-brick)" } Profile: Aborted + + + +## See also + +- [Profile: Success](success.md) +- [Profile: Success with Warning](success-with-warning.md) +- [Profile: Failure](failure.md) +- [Profile: Running](running.md) +- [Profile: Queued](queued.md) +- [Lifecycle](../../deep-dive/lifecycle.md) +- [Available Actions](../../deep-dive/actions.md) diff --git a/docs/operations/runs/by-types/profile/failure.md b/docs/operations/runs/by-types/profile/failure.md new file mode 100644 index 0000000000..00b333c624 --- /dev/null +++ b/docs/operations/runs/by-types/profile/failure.md @@ -0,0 +1,23 @@ +# :material-alert-circle-outline:{ .middle style="color: var(--q-brick)" } Profile: Failure + + + +## See also + +- [Profile: Success](success.md)
+- [Profile: Success with Warning](success-with-warning.md) +- [Profile: Aborted](aborted.md) +- [Profile: Running](running.md) +- [Profile: Queued](queued.md) +- [Lifecycle](../../deep-dive/lifecycle.md) +- [Available Actions](../../deep-dive/actions.md) diff --git a/docs/operations/runs/by-types/profile/queued.md b/docs/operations/runs/by-types/profile/queued.md new file mode 100644 index 0000000000..61fd4297ad --- /dev/null +++ b/docs/operations/runs/by-types/profile/queued.md @@ -0,0 +1,22 @@ +# :material-circle-outline:{ .middle style="color: var(--q-brick)" } Profile: Queued + + + +## See also + +- [Profile: Running](running.md) +- [Profile: Success](success.md) +- [Profile: Success with Warning](success-with-warning.md) +- [Profile: Failure](failure.md) +- [Profile: Aborted](aborted.md) +- [Lifecycle](../../deep-dive/lifecycle.md) +- [Available Actions](../../deep-dive/actions.md) diff --git a/docs/operations/runs/by-types/profile/running.md b/docs/operations/runs/by-types/profile/running.md new file mode 100644 index 0000000000..54a8723c64 --- /dev/null +++ b/docs/operations/runs/by-types/profile/running.md @@ -0,0 +1,23 @@ +# :material-progress-clock:{ .middle style="color: var(--q-brick)" } Profile: Running + + + +## See also + +- [Profile: Success](success.md) +- [Profile: Success with Warning](success-with-warning.md) +- [Profile: Failure](failure.md) +- [Profile: Aborted](aborted.md) +- [Profile: Queued](queued.md) +- [Lifecycle](../../deep-dive/lifecycle.md) +- [Available Actions](../../deep-dive/actions.md) diff --git a/docs/operations/runs/by-types/profile/success-with-warning.md b/docs/operations/runs/by-types/profile/success-with-warning.md new file mode 100644 index 0000000000..0ca0238f92 --- /dev/null +++ b/docs/operations/runs/by-types/profile/success-with-warning.md @@ -0,0 +1,22 @@ +# :material-alert-circle:{ .middle style="color: var(--q-brick)" } Profile: Success with Warning + + + +## See also + +- 
[Profile: Success](success.md) +- [Profile: Failure](failure.md) +- [Profile: Aborted](aborted.md) +- [Profile: Running](running.md) +- [Profile: Queued](queued.md) +- [Lifecycle](../../deep-dive/lifecycle.md) +- [Available Actions](../../deep-dive/actions.md) diff --git a/docs/operations/runs/by-types/profile/success.md b/docs/operations/runs/by-types/profile/success.md new file mode 100644 index 0000000000..07798b69cd --- /dev/null +++ b/docs/operations/runs/by-types/profile/success.md @@ -0,0 +1,24 @@ +# :material-check-circle-outline:{ .middle style="color: var(--q-brick)" } Profile: Success + + + +## See also + +- [Profile: Success with Warning](success-with-warning.md) +- [Profile: Failure](failure.md) +- [Profile: Aborted](aborted.md) +- [Profile: Running](running.md) +- [Profile: Queued](queued.md) +- [Lifecycle](../../deep-dive/lifecycle.md) +- [Available Actions](../../deep-dive/actions.md) diff --git a/docs/operations/runs/by-types/scan/aborted.md b/docs/operations/runs/by-types/scan/aborted.md new file mode 100644 index 0000000000..7a0f2d08b5 --- /dev/null +++ b/docs/operations/runs/by-types/scan/aborted.md @@ -0,0 +1,266 @@ +# :material-alert-circle:{ .middle style="color: var(--q-brick)" } Scan: Aborted + +Aborted operations were stopped manually by a user (or by the platform during shutdown) before completion. Aborted operations expose a **Resume** action that picks up from where the operation stopped, alongside the usual Rerun and Delete actions. The Status badge shows the red **Aborted** state. + +## Expanded row in the operations list + +### Header of the operation + +![aborted-row-overview](../../../../assets/operations/runs/by-types/scan/aborted-row-overview.png) + +| No. | Element | What it shows | +| --- | --- | --- | +| 1 | Operation ID and type | The unique identifier (for example `#53204`) and the operation type (Scan). 
The :material-arrow-top-right: icon next to the ID links to the dedicated **Overview** of the operation. | +| 2 | Status badge | The red **Aborted** badge indicating the operation was stopped before completion. | +| 3 | Time info | **Started At** (for example `Started at Jun 5 2026, 10:10 PM (BRT)`) and **Duration** (the time elapsed up to the abort, for example `Took 5 minutes`). | +| 4 | Progress | Containers processed before the abort against the total requested (`5 / 14 Tables`). | +| 5 | Triggered by | The user (avatar and name) who launched the operation. The user who **aborted** the operation is recorded in the Timeline, not in this field. | +| 6 | Schedule | The named schedule if recurring, otherwise `No schedule`. | +| 7 | Quick stats icons | A cluster of small status icons on the right of the row. Each icon shows a tooltip on hover summarizing configuration and result counters at a glance. See the **Right-side icons (No. 7) in detail** breakdown below this table. **Not** the row action buttons. | + +The red Aborted badge differs from Failure in intent: Aborted means a user (or system shutdown) deliberately stopped the run; Failure means a fatal error did. + +**Right-side icons (No. 7) in detail** + +The scan-row cluster contains the following icons, left to right: + +| Icon | Tooltip | Meaning | +| :--: | --- | --- | +| :material-signal: | **Incremental Field** | Whether the scan ran in Incremental mode (filled when active) or Full (grey when the scan was Full). | +| :material-wrench: | **Remediation Strategy** | The remediation strategy used for this run (filled when `Append` or `Overwrite` is set, grey when `None`). | +| :material-alert: / :material-check-bold: | **Anomalies Identified** | Total number of anomalies detected before the abort, with **Open** and **Archived** counts in the tooltip. Reflects partial work. 
| +| :material-triangle::material-check-bold: | **Anomalies Auto-Resolved** | Number of previously open anomalies (Active or Acknowledged) that the partial run automatically resolved. Shown only on Full scans where Auto Resolve was enabled and at least one anomaly was resolved before the abort. | + +--- + +### Details of the operation + +Expanding the row reveals the **Settings** used for the run (read-only) and the inline action buttons, including **Resume** which is unique to Aborted (and recoverable Failure) operations. + +![aborted-row-settings](../../../../assets/operations/runs/by-types/scan/aborted-row-settings.png) + +| No. | Setting | What it shows | +| --- | --- | --- | +| 1 | Check Categories | Which categories were configured. | +| 2 | Incremental | Read strategy used. | +| 3 | Read Record Limit | Per-container record cap. | +| 4 | Archive Duplicate Anomalies | Setting from [Step 4](../../../scan/how-tos/scan-settings.md). | +| 5 | Reactivate Recurring Anomalies | Setting from [Step 4](../../../scan/how-tos/scan-settings.md). | +| 6 | Auto Resolve Anomalies | Setting from [Step 4](../../../scan/how-tos/scan-settings.md) (Full scans only). | +| 7 | Maximum Record Anomalies per Check | The rollup threshold used. | +| 8 | Maximum Source Examples per Anomaly | Source examples cap used. | +| 9 | Remediation Strategy | The remediation strategy applied. | +| 10 | Results | Opens the Scan Results modal for the containers processed before the abort. | +| 11 | Resume | Continues the operation from where it stopped, with the same settings. | +| 12 | Rerun | Replays the operation as a fresh run with the same configuration, starting from container 1 (does not preserve what was already processed before the abort; use **Resume** instead when you want to continue without reprocessing). | +| 13 | Delete | Removes the operation record from the Activity list. Anomalies and other downstream artifacts produced by the Run are preserved. 
| + +**Resume** is the differentiator for Aborted operations: it picks up exactly where the run stopped (container N+1 onwards) without reprocessing the containers that already completed. + +--- + +### Summary + +![aborted-row-summary](../../../../assets/operations/runs/by-types/scan/aborted-row-summary.png) + +| No. | Metric | What it shows | +| --- | --- | --- | +| 1 | Tables Requested | Total containers targeted (full requested list, unchanged by the abort). | +| 2 | Tables Scanned | How many containers completed before the abort. | +| 3 | Partitions Scanned | Partitions read up to the abort. | +| 4 | Records Scanned | Records processed up to the abort. | +| 5 | Anomalies Identified | Anomalies detected on the containers that completed before the abort. | + +For Aborted scans the counters reflect partial work. Use these numbers to estimate what is still pending if you choose **Resume**. + +### Logs + +The **Logs** block on the expanded Aborted row surfaces the `[INFO]`, `[WARN]`, and `[ERROR]` lines emitted during the partial run, in chronological order. For Aborted operations, the block ends with an `[INFO]` line that records the abort actor and timestamp (for example, *"Operation aborted by Sarah Mitchell at 2026-06-05 22:21 UTC"*). The same block is surfaced on the operation detail page in a wider view (see the **Logs block** under **Operation detail page** below). + +The Logs confirm who stopped the run and when. Review them before deciding whether to **Resume** or **Rerun**. + +--- + +## Operation detail page + +Clicking the row opens the dedicated operation detail page, which has two top-level tabs: **Overview** and **Results**. The Overview tab presents the operation's properties, settings, partial metrics, log entries, and chronological timeline. The Results tab drills into the containers that completed before the abort. 
+ +### Overview tab + +The Overview tab opens with a snapshot of the run, organized into five blocks: **Operation**, **Settings**, **Summary**, **Logs**, and **Timeline**. + +![aborted-detail-overview](../../../../assets/operations/runs/by-types/scan/aborted-detail-overview.png) + +#### Operation block + +The **Operation** block at the top carries the properties that summarize the run identity and outcome up to the abort. + +| No. | Element | What it shows | +| --- | --- | --- | +| 1 | Status | The state badge (Aborted, red). | +| 2 | Started At | Exact start timestamp. | +| 3 | Duration | How long the run lasted up to the abort. | +| 4 | Progress | Containers processed against total requested (often partial, for example `5 / 14`). | +| 5 | Triggered By | User who launched the operation, or schedule. | +| 6 | Schedule | The named schedule (or `No schedule`). | +| 7 | Remediation Strategy | Effective remediation strategy for the run. | + +#### Settings block + +The **Settings** block lists the scan settings used for the run (same fields as the row's Details section, minus the action buttons). + +| No. | Setting | What it shows | +| --- | --- | --- | +| 1 | Check Categories | Categories selected in [Step 2](../../../scan/how-tos/select-check-categories.md) (for example, `Metadata, Data_Integrity`). | +| 2 | Incremental | Whether Incremental was the read strategy (Enabled/Disabled). | +| 3 | Read Record Limit | Per-container record cap (`All` if uncapped). | +| 4 | Archive Duplicate Anomalies | Setting from [Step 4](../../../scan/how-tos/scan-settings.md) (Enabled/Disabled). | +| 5 | Reactivate Recurring Anomalies | Setting from [Step 4](../../../scan/how-tos/scan-settings.md). | +| 6 | Auto-Resolve Anomalies | Setting from [Step 4](../../../scan/how-tos/scan-settings.md) (Full scans only). | +| 7 | Max Record Anomalies per Check | Rollup threshold used. | +| 8 | Max Source Examples per Anomaly | Source examples cap used. 
| + +Together, the Operation and Settings blocks act as the canonical "what happened" snapshot of the run. If the operation looks wrong, this is the first place to check the inputs. + +#### Summary block + +Scrolling down on the Overview tab shows the **Summary** block, which repeats the partial metrics from the expanded row. The screenshot below covers Summary, Logs, and Timeline in a single view. + +![aborted-detail-summary](../../../../assets/operations/runs/by-types/scan/aborted-detail-summary.png) + +| No. | Element | What it shows | +| --- | --- | --- | +| 1 | Tables Requested | Total containers targeted (unchanged by the abort). | +| 2 | Tables Scanned | Containers that completed before the abort. | +| 3 | Partitions Scanned | Partitions read up to the abort. | +| 4 | Records Scanned | Records processed up to the abort. | +| 5 | Anomalies Identified | Anomalies detected on the containers that completed before the abort, split into **Open** and **Archived**. | + +#### Logs block + +The **Logs** block surfaces the same messages from the row's Logs section, in a wider full-screen-friendly view. For Aborted scans, the block usually ends with an `[INFO]` line confirming the abort actor and timestamp. Each log line has the following structure. + +| Part | Element | What it shows | +| --- | --- | --- | +| Severity tag | `[INFO]`, `[WARN]`, or `[ERROR]` | Severity of the log line. Aborted operations typically emit `[INFO]` status confirmations (`"Operation aborted by "`) plus any `[WARN]`/`[ERROR]` lines accumulated before the abort. | +| Message | Human-readable text | The narrative explaining what the platform encountered. Examples: *"Container scan started: customers"*, *"Container scan completed: customers (50K records)"*, *"Operation aborted by Sarah Mitchell at 2026-06-05 22:21 UTC"*. | +| Order | Position in the block | Lines are listed in chronological order from top (earliest) to bottom (most recent), mirroring the event sequence visible on the Timeline. 
| + +#### Timeline block + +The **Timeline** is critical for Aborted: it records the exact user (or system) who stopped the run and the time at which it happened. Each entry follows the same structure. + +| Part | Element | What it shows | +| --- | --- | --- | +| Status icon | Marker on the left | A marker colored by event type. See the legend below. | +| Timestamp | Date and time | When the event was recorded, in the viewer's timezone. Sorted from most recent at the top to oldest at the bottom. | +| Event title | Short label | The event identity (for example, `Operation Started`, `Container scan completed`, `Operation Aborted by `, `Operation Aborted`). | +| Event detail | Context line | Additional information specific to the event: duration, the user who triggered the operation, the container name involved, the abort actor, or other event-specific data. | + +**Timeline icon legend:** + +- :material-check-circle:{ style="color: var(--q-positive);" } **Operation Success** terminal entry. +- :material-close-circle:{ style="color: var(--q-negative);" } **Operation Failure** terminal entry. +- :material-alert-circle:{ style="color: var(--q-warning);" } **Operation Aborted** terminal entry. +- :material-loading:{ style="color: var(--q-info);" } **Operation In Progress** (Queued or Running, animated in the UI). +- :material-play-circle:{ style="color: var(--q-brick);" } **Operation Started**. +- Per-container entries inherit the same icon and color based on each container's own outcome (Success/Failure/Aborted/Running). + +For an Aborted scan, the typical events are: + +- **`Operation Aborted`** at the top, with the abort timestamp and total elapsed time. +- **`Operation Aborted by `**, with the avatar and name of the user who stopped the run. Use this as the audit record for the abort, since the `Triggered By` field on the row only records who started the operation. +- One **`Container scan completed`** entry per container that finished before the abort. 
+- **`Operation Started`** at the bottom, with `Triggered by `. + +### Results tab + +The **Results** tab lists only the containers that completed before the abort. Each row links to the container's detail page. + +![aborted-detail-results](../../../../assets/operations/runs/by-types/scan/aborted-detail-results.png) + +#### Container sub-tabs + +Expanding a container row reveals **two sub-tabs** that drill into the per-container results captured before the abort. A **Source Records** sub-tab is not exposed for Aborted operations. + +##### Partitions + +Lists every partition read for this container before the abort, with size, read timestamp, completion time, and per-partition counters (records processed, anomalies emitted). Use this view to confirm exactly how much of each container was actually read prior to the abort. + +![aborted-detail-partitions](../../../../assets/operations/runs/by-types/scan/aborted-detail-partitions.png) + +##### Checks + +Lists every check that was asserted against this container, with check ID, name, description, target field, and pass/fail status. Filters at the top (**All / Passed / Failed**) let you focus on the checks that did not pass. For Aborted scans, this view reflects only the checks completed against the partitions read before the abort. + +![aborted-detail-checks](../../../../assets/operations/runs/by-types/scan/aborted-detail-checks.png) + +--- + +## Worked example: Run #53204 + +The scan operation `#53204` was manually aborted on the **Healthcare Analytics** datastore. It started at Jun 5 2026, 10:10 PM (BRT) and ran for 5 minutes before user **Sarah Mitchell** stopped it, after 5 of 14 containers (15 partitions, 250K records) had been processed. The Timeline records both the `Operation Started` event with the launching user, and the `Operation Aborted by ` event identifying Sarah as the abort actor. Use the **Resume** button to continue from container 6 with the same settings, or **Rerun** to start a fresh run. 
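For readers who script the same choice instead of clicking the buttons, the Runs API page added in this change documents `PUT /api/operations/run/{id}` for Resume and `PUT /api/operations/rerun/{id}` for Rerun, along with the operation types each one supports. The sketch below is a minimal illustration of that mapping for Run `#53204`, not an official client: `recovery_endpoint` is a hypothetical helper name, and it assumes the UI buttons correspond to those two endpoints. It only prints the request path, so nothing is sent to the platform.

```bash
#!/usr/bin/env bash
# Sketch only: map a recovery action to the Runs API path documented in api.md.
# `recovery_endpoint` is a hypothetical helper, not part of any Qualytics tooling.

# Usage: recovery_endpoint <resume|rerun> <operation_type> <run_id>
# Prints the request path; returns non-zero when the action is not
# supported for that operation type.
recovery_endpoint() {
  local action="$1" op_type="$2" run_id="$3"
  case "$action" in
    resume)
      # Resume is documented for Profile, Scan, and Promote only.
      case "$op_type" in
        profile|scan|promote) printf '/api/operations/run/%s\n' "$run_id" ;;
        *) return 1 ;;
      esac
      ;;
    rerun)
      # Rerun is documented for every type except External Scan
      # (`catalog` is the operation_type value for Sync).
      case "$op_type" in
        catalog|profile|scan|export|materialize|promote)
          printf '/api/operations/rerun/%s\n' "$run_id" ;;
        *) return 1 ;;
      esac
      ;;
    *) return 1 ;;
  esac
}

# Worked example: 14 containers requested, 5 scanned before the abort.
remaining=$((14 - 5))
echo "Containers left for Resume: ${remaining}"
echo "Resume path: $(recovery_endpoint resume scan 53204)"
echo "Rerun path:  $(recovery_endpoint rerun scan 53204)"

# To send the request (not executed here), the documented call shape is:
#   curl -X PUT "https://your-instance.qualytics.io$(recovery_endpoint resume scan 53204)" \
#     -H "Authorization: Bearer YOUR_API_TOKEN"
```

Keeping the supported-type lists in one function makes the unsupported combinations (for example, Resume on an Export Run) fail locally before any request is made; the API itself is documented to answer such calls with `409 Conflict`.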
+ +## See also + +
+ +- :material-check-circle:{ .lg .middle } **Success** + + --- + + The Run finished cleanly with full results. + + [:octicons-arrow-right-24: Success](success.md) + +- :material-alert-circle:{ .lg .middle } **Success with Warning** + + --- + + The Run finished cleanly but the worker recorded log entries during execution. + + [:octicons-arrow-right-24: Success with Warning](success-with-warning.md) + +- :material-close-circle:{ .lg .middle } **Failure** + + --- + + The Run stopped because of an unrecoverable error. + + [:octicons-arrow-right-24: Failure](failure.md) + +- :material-loading:{ .lg .middle } **Running** + + --- + + A worker is actively processing the Run; counters update live. + + [:octicons-arrow-right-24: Running](running.md) + + + +- :material-state-machine:{ .lg .middle } **Lifecycle** + + --- + + State diagram, transitions, and per-operation-type lifecycle. + + [:octicons-arrow-right-24: Lifecycle](../../deep-dive/lifecycle.md) + +- **Available Actions** + + --- + + Abort, Resume, Rerun, and Delete: when each one is shown. + + [:octicons-arrow-right-24: Available Actions](../../deep-dive/actions.md) + +
diff --git a/docs/operations/runs/by-types/scan/failure.md b/docs/operations/runs/by-types/scan/failure.md new file mode 100644 index 0000000000..349cd68b6c --- /dev/null +++ b/docs/operations/runs/by-types/scan/failure.md @@ -0,0 +1,263 @@ +# :material-close-circle:{ .middle style="color: var(--q-brick)" } Scan: Failure + +Failure operations stopped because of a fatal error and did not produce a complete result set. Any partial results captured before the failure are still visible; the Logs section on the detail page explains the cause. The Status badge shows the red **Failure** state. Action buttons available: Results, Resume (when applicable), Rerun, Delete. + +## Expanded row in the operations list + +### Header of the operation + +![failure-row-overview](../../../../assets/operations/runs/by-types/scan/failure-row-overview.png) + +| No. | Element | What it shows | +| --- | --- | --- | +| 1 | Operation ID and type | The unique identifier (for example `#53201`) and the operation type (Scan). The :material-arrow-top-right: icon next to the ID links to the dedicated **Overview** of the operation. | +| 2 | Status badge | The red **Failure** badge indicating the operation stopped because of a fatal error. | +| 3 | Time info | **Started At** (for example `Started at Jun 5 2026, 11:15 AM (BRT)`) and **Duration** (for example `Took 6 minutes`). | +| 4 | Progress | Containers processed against the total requested (typically `0 / 14 Tables` for early failures). | +| 5 | Triggered by | The user (avatar and name) who launched the operation, or the schedule that triggered it. | +| 6 | Schedule | The named schedule if recurring, otherwise `No schedule`. | +| 7 | Quick stats icons | A cluster of small status icons on the right of the row. Each icon shows a tooltip on hover summarizing configuration and result counters at a glance. See the **Right-side icons (No. 7) in detail** breakdown below this table. **Not** the row action buttons. 
| + +The red badge is the trigger to open the Logs: a Failure operation always has a cause recorded in the Logs block on the detail page. + +**Right-side icons (No. 7) in detail** + +For this Failure operation the cluster shows the following icons, left to right: + +| Icon | Tooltip | What this screenshot shows | +| :--: | --- | --- | +| :material-signal: | **Incremental Field** | Filled, indicating the Incremental read strategy was Enabled for this run. | +| :material-wrench: | **Remediation Strategy** | Filled, indicating a remediation strategy is set (`Overwrite` in this run). | +| :material-alert: | **Anomalies Identified** | Counter pill; for Failure scans this is typically zero because the operation stopped before reading records. Hovering surfaces the breakdown into **Open** and **Archived**. | + +The **Anomalies Auto-Resolved** indicator (:material-triangle::material-check-bold:) is **not shown** for this run because no anomalies were resolved before the failure (and Auto-Resolve only applies to Full scans). + +--- + +### Details of the operation + +Expanding the row reveals the **Settings** used for the run (read-only) and the inline action buttons (**Results**, **Resume** (when supported), **Rerun**, **Delete**). + +![failure-row-settings](../../../../assets/operations/runs/by-types/scan/failure-row-settings.png) + +| No. | Setting | What it shows | +| --- | --- | --- | +| 1 | Check Categories | Which categories were configured. | +| 2 | Incremental | Read strategy configured (Enabled/Disabled). | +| 3 | Read Record Limit | Per-container record cap (`All` if uncapped). | +| 4 | Archive Duplicate Anomalies | Setting from [Step 4](../../../scan/how-tos/scan-settings.md) (Enabled/Disabled). | +| 5 | Reactivate Recurring Anomalies | Setting from [Step 4](../../../scan/how-tos/scan-settings.md). | +| 6 | Auto Resolve Anomalies | Setting from [Step 4](../../../scan/how-tos/scan-settings.md) (Full scans only). 
| +| 7 | Maximum Record Anomalies per Check | The rollup threshold configured. | +| 8 | Maximum Source Examples per Anomaly | Source examples cap configured. | +| 9 | Remediation Strategy | The remediation strategy configured. | +| 10 | Results | Opens the Scan Results modal (often empty for early failures). | +| 11 | Rerun | Replays the operation with the same configuration after the root cause is fixed. | +| 12 | Delete | Removes the operation record from the Activity list. Anomalies and other downstream artifacts produced by the Run are preserved. | + +!!! note "Resume on recoverable failures" + When the Run can be picked up from where it stopped (for example, a transient connection error), a **Resume** button also appears alongside **Rerun** and **Delete**. Resume is hidden when the failure is not resumable. See [Available Actions](../../deep-dive/actions.md){:target="_blank"} for the full rules. + +For Failure operations, the typical workflow is: review the Logs (in the Logs block of the detail page), fix the underlying problem (credentials, connectivity, schema), then click **Rerun**. + +--- + +### Summary + +The **Summary** section reports the headline metrics for the run. + +![failure-row-summary](../../../../assets/operations/runs/by-types/scan/failure-row-summary.png) + +| No. | Element | What it shows | +| --- | --- | --- | +| 1 | Tables Requested | Total containers targeted (full requested list, unchanged by the failure). | +| 2 | Tables Scanned | How many containers were actually scanned before the failure (often `0` for early failures). | +| 3 | Partitions Scanned | Total partitions read before the failure. | +| 4 | Records Scanned | Total records processed before the failure. | +| 5 | Anomalies Identified | Anomalies detected before the failure (rarely meaningful for fatal failures). 
| + +For Failure scans the counters reflect only what was captured before the fatal error; in many cases the operation failed before reading any records, in which case all counters read `0`. + +--- + +## Operation detail page + +Clicking the row opens the dedicated operation detail page, which has two top-level tabs: **Overview** and **Results**. The Overview tab presents the operation's properties, settings, partial metrics, log entries, and chronological timeline. The Results tab drills into the containers (if any) processed before the fatal error. + +### Overview tab + +The Overview tab opens with a snapshot of the run, organized into five blocks: **Operation**, **Settings**, **Summary**, **Logs**, and **Timeline**. + +![failure-detail-overview](../../../../assets/operations/runs/by-types/scan/failure-detail-overview.png) + +#### Operation block + +The **Operation** block at the top carries the properties that summarize the run identity and outcome up to the failure. + +| No. | Element | What it shows | +| --- | --- | --- | +| 1 | Status | The state badge (Failure, red). | +| 2 | Started At | Exact start timestamp. | +| 3 | Duration | How long the run lasted before the failure. | +| 4 | Progress | Containers processed against total requested (often `0` of N). | +| 5 | Triggered By | User or schedule. | +| 6 | Schedule | The named schedule (or `No schedule`). | +| 7 | Remediation Strategy | Effective remediation strategy for the run. | + +#### Settings block + +The **Settings** block lists the scan settings used for the run (same fields as the row's Details section, minus the action buttons). + +| No. | Setting | What it shows | +| --- | --- | --- | +| 1 | Check Categories | Categories selected in [Step 2](../../../scan/how-tos/select-check-categories.md). | +| 2 | Incremental | Whether Incremental was the read strategy. | +| 3 | Read Record Limit | Per-container record cap. 
| +| 4 | Archive Duplicate Anomalies | Setting from [Step 4](../../../scan/how-tos/scan-settings.md). | +| 5 | Reactivate Recurring Anomalies | Setting from [Step 4](../../../scan/how-tos/scan-settings.md). | +| 6 | Auto-Resolve Anomalies | Setting from [Step 4](../../../scan/how-tos/scan-settings.md) (Full scans only). | +| 7 | Max Record Anomalies per Check | Rollup threshold configured. | +| 8 | Max Source Examples per Anomaly | Source examples cap configured. | + +Together, the Operation and Settings blocks act as the canonical "what happened" snapshot of the run. If the operation looks wrong, this is the first place to check the inputs. + +#### Summary block + +Scrolling down on the Overview tab shows the **Summary** block, which repeats the row's metrics. The screenshot below covers Summary, Logs, and Timeline in a single view. + +![failure-detail-summary](../../../../assets/operations/runs/by-types/scan/failure-detail-summary.png) + +| No. | Element | What it shows | +| --- | --- | --- | +| 1 | Tables Requested | Total containers targeted. | +| 2 | Tables Scanned | Containers scanned before the failure. | +| 3 | Partitions Scanned | Partitions read before the failure. | +| 4 | Records Scanned | Records processed before the failure. | +| 5 | Anomalies Identified | Anomalies detected before the failure. | + +#### Logs block + +The **Logs** block surfaces every `[ERROR]` and `[INFO]` line emitted during the run, in a wider full-screen-friendly view. For Failure operations, this is where the root-cause messages live. Each log line has the following structure. + +| Part | Element | What it shows | +| --- | --- | --- | +| Severity tag | `[ERROR]`, `[INFO]`, or `[WARN]` | Severity of the log line. Failure operations typically emit `[ERROR]` for the fatal condition that stopped the run (connection failures, schema mismatches, permission denials) and `[INFO]` for status confirmations such as "Operation aborted." 
| +| Message | Human-readable text | The narrative explaining what the platform encountered. Examples: *"Connection to Healthcare Analytics failed after 3 retries, JDBC error 28000 (auth)"*, *"Schema mismatch: column `salary` not found in source"*, *"No partitions scanned. Operation aborted."* | +| Order | Position in the block | Lines are listed in chronological order from top (earliest) to bottom (most recent), mirroring the event sequence visible on the Timeline. | + +#### Timeline block + +The **Timeline** is critical for Failure: it records every event up to the failure point. Each entry follows the same structure. + +| Part | Element | What it shows | +| --- | --- | --- | +| Status icon | Marker on the left | A marker colored by event type. See the legend below. | +| Timestamp | Date and time | When the event was recorded, in the viewer's timezone. Sorted from most recent at the top to oldest at the bottom. | +| Event title | Short label | The event identity (for example, `Operation Started`, `Operation Failed`, `Container scan completed`). | +| Event detail | Context line | Additional information specific to the event: duration, the user who triggered the operation, container name, or other event-specific data. | + +**Timeline icon legend:** + +- :material-check-circle:{ style="color: var(--q-positive);" } **Operation Success** terminal entry. +- :material-close-circle:{ style="color: var(--q-negative);" } **Operation Failure** terminal entry. +- :material-alert-circle:{ style="color: var(--q-warning);" } **Operation Aborted** terminal entry. +- :material-loading:{ style="color: var(--q-info);" } **Operation In Progress** (Queued or Running, animated in the UI). +- :material-play-circle:{ style="color: var(--q-brick);" } **Operation Started**. +- Per-container entries inherit the same icon and color based on each container's own outcome (Success/Failure/Aborted/Running). 
+ +For a Failure scan, the typical events are: + +- **`Operation Failed`** at the top, with the failure timestamp. +- **`Operation Started`** at the bottom, with `Triggered by `. + +Error messages appear in the **Logs** block (above the Timeline), not in the Timeline itself. + +### Results tab + +The **Results** tab is typically empty for early failures. For failures that occurred mid-scan, it lists only the containers processed before the fatal error. + +![failure-detail-results](../../../../assets/operations/runs/by-types/scan/failure-detail-results.png) + +#### Container sub-tabs + +Expanding a container row reveals **two sub-tabs** that drill into the per-container results captured before the failure. A **Source Records** sub-tab is not exposed for Failure operations. + +##### Partitions + +Lists every partition read for this container before the failure. + +![failure-detail-partitions](../../../../assets/operations/runs/by-types/scan/failure-detail-partitions.png) + +##### Checks + +Lists every check that was asserted against this container, with pass/fail status. For Failure scans, this is often empty if no records were read. + +![failure-detail-checks](../../../../assets/operations/runs/by-types/scan/failure-detail-checks.png) + +## Worked example: Run #53201 + +The scan operation `#53201` failed on the **Healthcare Analytics** datastore. It started at Jun 5 2026, 11:15 AM (BRT) and stopped after 6 minutes without scanning any of the 14 requested tables. The Logs block shows the cause: *"Connection to Healthcare Analytics failed after 3 retries, JDBC error 28000 (auth)"*, followed by *"No partitions scanned. Operation aborted."* This indicates an authentication failure against the source datastore. The Rerun button replays the same configuration once credentials are corrected; the Delete button removes the failed record from history. + +## See also + +
+ +- :material-check-circle:{ .lg .middle } **Success** + + --- + + The Run finished cleanly with full results. + + [:octicons-arrow-right-24: Success](success.md) + +- :material-alert-circle:{ .lg .middle } **Success with Warning** + + --- + + The Run finished cleanly but the worker recorded log entries during execution. + + [:octicons-arrow-right-24: Success with Warning](success-with-warning.md) + +- :material-alert-circle:{ .lg .middle } **Aborted** + + --- + + A user or the system stopped the Run before completion. + + [:octicons-arrow-right-24: Aborted](aborted.md) + +- :material-loading:{ .lg .middle } **Running** + + --- + + A worker is actively processing the Run; counters update live. + + [:octicons-arrow-right-24: Running](running.md) + + + +- :material-state-machine:{ .lg .middle } **Lifecycle** + + --- + + State diagram, transitions, and per-operation-type lifecycle. + + [:octicons-arrow-right-24: Lifecycle](../../deep-dive/lifecycle.md) + +- **Available Actions** + + --- + + Abort, Resume, Rerun, and Delete: when each one is shown. + + [:octicons-arrow-right-24: Available Actions](../../deep-dive/actions.md) + +
diff --git a/docs/operations/runs/by-types/scan/queued.md b/docs/operations/runs/by-types/scan/queued.md new file mode 100644 index 0000000000..e1783704c6 --- /dev/null +++ b/docs/operations/runs/by-types/scan/queued.md @@ -0,0 +1,79 @@ +# :material-circle-outline:{ .middle style="color: var(--q-brick)" } Scan: Queued + + + +## See also + +
+ +- :material-progress-clock:{ .lg .middle } **Running** + + --- + + A worker is actively processing the Run; counters update live. + + [:octicons-arrow-right-24: Running](running.md) + +- :material-check-circle-outline:{ .lg .middle } **Success** + + --- + + The Run finished cleanly with full results. + + [:octicons-arrow-right-24: Success](success.md) + +- :material-alert-circle:{ .lg .middle } **Success with Warning** + + --- + + The Run finished cleanly but the worker recorded log entries during execution. + + [:octicons-arrow-right-24: Success with Warning](success-with-warning.md) + +- :material-alert-circle-outline:{ .lg .middle } **Failure** + + --- + + The Run stopped because of an unrecoverable error. + + [:octicons-arrow-right-24: Failure](failure.md) + +- :material-stop-circle-outline:{ .lg .middle } **Aborted** + + --- + + A user or the system stopped the Run before completion. + + [:octicons-arrow-right-24: Aborted](aborted.md) + +- :material-timer-sand:{ .lg .middle } **Lifecycle** + + --- + + State diagram, transitions, and per-operation-type lifecycle. + + [:octicons-arrow-right-24: Lifecycle](../../deep-dive/lifecycle.md) + +- :material-gesture-tap:{ .lg .middle } **Available Actions** + + --- + + Abort, Resume, Rerun, and Delete: when each one is shown. + + [:octicons-arrow-right-24: Available Actions](../../deep-dive/actions.md) + +
diff --git a/docs/operations/runs/by-types/scan/running.md b/docs/operations/runs/by-types/scan/running.md new file mode 100644 index 0000000000..75ede3e78f --- /dev/null +++ b/docs/operations/runs/by-types/scan/running.md @@ -0,0 +1,247 @@ +# :material-loading:{ .middle style="color: var(--q-brick)" } Scan: Running + +Running operations are still in progress. The Status badge shows **Running** in orange, the Duration shows elapsed time, and the Progress bar shows how many containers have been processed so far. The expanded row reveals an **Abort** button that lets you stop the operation before it completes. Action buttons available: Results, **Abort** (replaces Delete while the operation is in flight). + +## Expanded row in the operations list + +### Header of the operation + +![running-row-overview](../../../../assets/operations/runs/by-types/scan/running-row-overview.png) + +| No. | Element | What it shows | +| --- | --- | --- | +| 1 | Operation ID and type | The unique identifier (for example `#53201`) and the operation type (Scan). The :material-arrow-top-right: icon next to the ID links to the dedicated **Overview** of the operation. | +| 2 | Status badge | The orange **Running** badge indicating the operation is still in progress. | +| 3 | Time info | **Started At** (for example `Started at Jun 5 2026, 3:30 PM (BRT)`) and **Duration** (the elapsed time so far, for example `Running for 1 hour`). The Duration updates live as the operation continues. | +| 4 | Progress | Containers processed so far against the total requested (`4 / 14 Tables`). The bar updates live. | +| 5 | Triggered by | The user (avatar and name) who launched the operation. | +| 6 | Schedule | The named schedule if recurring, otherwise `No schedule`. | +| 7 | Quick stats icons | A cluster of small status icons on the right of the row. Each icon shows a tooltip on hover summarizing configuration and result counters at a glance. Anomaly counters update live as containers are processed. 
See the **Right-side icons (No. 7) in detail** breakdown below this table. **Not** the row action buttons. | + +The orange Running badge confirms the operation is in flight: counters are updating live, and **Abort** is the only action that ends the run early. + +**Right-side icons (No. 7) in detail** + +The scan-row cluster contains the following icons, left to right: + +| Icon | Tooltip | Meaning | +| :--: | --- | --- | +| :material-signal: | **Incremental Field** | Whether the scan is running in Incremental mode (filled) or Full mode (grey). | +| :material-wrench: | **Remediation Strategy** | The remediation strategy in effect for this run (filled when `Append` or `Overwrite` is set, grey when `None`). | +| :material-alert: / :material-check-bold: | **Anomalies Identified** | Anomalies detected so far during the run, with **Open** and **Archived** counts in the tooltip. Updates live as containers complete. | +| :material-triangle::material-check-bold: | **Anomalies Auto-Resolved** | Anomalies the in-flight Full scan has already resolved automatically. Updates live; only appears on Full scans where Auto Resolve is enabled. | + +--- + +### Details of the operation + +Expanding the row reveals the **Settings** used for the run (read-only) and the inline action buttons. + +![running-row-settings](../../../../assets/operations/runs/by-types/scan/running-row-settings.png) + +| No. | Setting | What it shows | +| --- | --- | --- | +| 1 | Check Categories | Which categories are running. | +| 2 | Incremental | Read strategy in use (Enabled/Disabled). | +| 3 | Read Record Limit | Per-container record cap. | +| 4 | Archive Duplicate Anomalies | Setting from [Step 4](../../../scan/how-tos/scan-settings.md). | +| 5 | Reactivate Recurring Anomalies | Setting from [Step 4](../../../scan/how-tos/scan-settings.md). | +| 6 | Auto Resolve Anomalies | Setting from [Step 4](../../../scan/how-tos/scan-settings.md) (Full scans only).
| +| 7 | Maximum Record Anomalies per Check | The rollup threshold in effect. | +| 8 | Maximum Source Examples per Anomaly | Source examples cap in effect. | +| 9 | Remediation Strategy | The remediation strategy in effect. | +| 10 | Results | Opens the Scan Results modal with whatever has been processed so far. | +| 11 | Abort | Replaces Delete while the operation is in flight. Stops the operation; the row transitions to the **Aborted** state, with an `Operation Aborted by ` event in the Timeline. | + +The unique aspect of Running operations is the **Abort** button: it is the only way to stop the operation before it reaches a terminal state. + +--- + +### Summary + +![running-row-summary](../../../../assets/operations/runs/by-types/scan/running-row-summary.png) + +| No. | Metric | What it shows | +| --- | --- | --- | +| 1 | Tables Requested | Total containers targeted (fixed for the run). | +| 2 | Tables Scanned | How many containers have completed so far (updates live). | +| 3 | Partitions Scanned | Partitions read so far (updates live). | +| 4 | Records Scanned | Records processed so far (updates live). | +| 5 | Anomalies Identified | Anomalies detected so far, split into Open and Archived (updates live). | + +All counters update in real time as containers are processed. Refreshing the Activity tab pulls the latest values. + +--- + +## Operation detail page + +Clicking the row opens the dedicated operation detail page, which has two top-level tabs: **Overview** and **Results**. The Overview tab presents the operation's live properties, settings, in-flight metrics, and chronological timeline. The Results tab drills into the containers processed so far. + +### Overview tab + +The Overview tab opens with a live snapshot of the run, organized into four blocks: **Operation**, **Settings**, **Summary**, and **Timeline**. Counters update live until the operation reaches a terminal state. 
+ +![running-detail-overview](../../../../assets/operations/runs/by-types/scan/running-detail-overview.png) + +#### Operation block + +The **Operation** block at the top carries the properties that summarize the run identity and live outcome. + +| No. | Element | What it shows | +| --- | --- | --- | +| 1 | Status | The state badge (Running, orange). | +| 2 | Started At | Exact start timestamp. | +| 3 | Duration | Elapsed time so far, ticking up live (for example, `Running for 1 hour`). | +| 4 | Progress | Containers processed so far against total requested (`4 / 14 Tables`); updates live. | +| 5 | Triggered By | User who launched the operation, or schedule. | +| 6 | Schedule | The named schedule (or `No schedule`). | +| 7 | Remediation Strategy | Effective remediation strategy for the run. | + +#### Settings block + +The **Settings** block lists the scan settings in effect for the run (same fields as the row's Details section, minus the action buttons). + +| No. | Setting | What it shows | +| --- | --- | --- | +| 1 | Check Categories | Categories selected in [Step 2](../../../scan/how-tos/select-check-categories.md) (for example, `Metadata, Data_Integrity`). | +| 2 | Incremental | Whether Incremental is the read strategy (Enabled/Disabled). | +| 3 | Read Record Limit | Per-container record cap (`All` if uncapped). | +| 4 | Archive Duplicate Anomalies | Setting from [Step 4](../../../scan/how-tos/scan-settings.md) (Enabled/Disabled). | +| 5 | Reactivate Recurring Anomalies | Setting from [Step 4](../../../scan/how-tos/scan-settings.md). | +| 6 | Auto-Resolve Anomalies | Setting from [Step 4](../../../scan/how-tos/scan-settings.md) (Full scans only). | +| 7 | Max Record Anomalies per Check | Rollup threshold in effect. | +| 8 | Max Source Examples per Anomaly | Source examples cap in effect. | + +Together, the Operation and Settings blocks act as the canonical "what is happening" snapshot of the run. 
If the operation looks wrong, this is the first place to check the inputs. + +#### Summary block + +Scrolling down on the Overview tab shows the **Summary** block, which exposes live metrics that update as containers complete. + +![running-detail-summary](../../../../assets/operations/runs/by-types/scan/running-detail-summary.png) + +| No. | Element | What it shows | +| --- | --- | --- | +| 1 | Tables Requested | Total containers targeted (fixed for the run). | +| 2 | Tables Scanned | Containers that have completed so far (updates live). | +| 3 | Partitions Scanned | Partitions read so far (updates live). | +| 4 | Records Scanned | Records processed so far (updates live). | +| 5 | Anomalies Identified | Anomalies detected so far, split into **Open** and **Archived** (updates live). | + +#### Timeline block + +The **Timeline** is the most up-to-date view of progress: it streams new events as containers start and complete. Each entry follows the same structure. + +| Part | Element | What it shows | +| --- | --- | --- | +| Status icon | Marker on the left | A marker colored by event type. While the operation is running, only in-progress and container icons appear; the terminal icon (Success, Failure, or Aborted) is added once the operation completes. See the legend below. | +| Timestamp | Date and time | When the event was recorded, in the viewer's timezone. Sorted from most recent at the top to oldest at the bottom. | +| Event title | Short label | The event identity (for example, `Operation Started`, `Container scan started`, `Container scan completed`). | +| Event detail | Context line | Additional information specific to the event: the user who triggered the operation, the container name involved, partition progress, or other event-specific data. | + +**Timeline icon legend:** + +- :material-check-circle:{ style="color: var(--q-positive);" } **Operation Success** terminal entry (added when the run completes successfully). 
+- :material-close-circle:{ style="color: var(--q-negative);" } **Operation Failure** terminal entry (added on fatal error). +- :material-alert-circle:{ style="color: var(--q-warning);" } **Operation Aborted** terminal entry (added when stopped by user or system). +- :material-loading:{ style="color: var(--q-info);" } **Operation In Progress** (active while Running, animated in the UI). +- :material-play-circle:{ style="color: var(--q-brick);" } **Operation Started**. +- Per-container entries inherit the same icon and color based on each container's own outcome (Success/Failure/Aborted/Running). + +For an in-flight Running scan, the typical events are: + +- **`Container scan started`** / **`Container scan completed`** entries appearing live as each container is picked up and finishes. +- **`Operation Started`** at the bottom, with `Triggered by `. +- No terminal `Operation Success` / `Operation Failed` / `Operation Aborted` event yet; that entry is added at the top once the run finishes. + +### Results tab + +The **Results** tab lists every container that has been processed so far. Containers still in queue appear with a pending indicator. Each row links to the container's detail page. + +![running-detail-results](../../../../assets/operations/runs/by-types/scan/running-detail-results.png) + +#### Container sub-tabs + +Expanding a container row reveals **two sub-tabs** that drill into the live per-container results. A **Source Records** sub-tab is not exposed for Running operations. + +##### Partitions + +Lists every partition read for this container so far, with size, read timestamp, completion time, and per-partition counters (records processed, anomalies emitted). Useful to monitor progress on a container that is still being processed. 
+ +![running-detail-partitions](../../../../assets/operations/runs/by-types/scan/running-detail-partitions.png) + +##### Checks + +Lists every check that has been asserted against this container so far, with check ID, name, description, target field, and pass/fail status. Filters at the top (**All / Passed / Failed**) let you focus on the checks that did not pass. While the operation is running, additional checks are added to the list as the container finishes processing them. + +![running-detail-checks](../../../../assets/operations/runs/by-types/scan/running-detail-checks.png) + +--- + +## Worked example: Run #53201 + +The scan operation `#53201` is currently running against the **Healthcare Analytics** datastore. It started at Jun 5 2026, 3:30 PM (BRT) and has been processing for 1 hour so far, having scanned 4 of 14 requested tables (12 partitions, 320K records). Settings show both Metadata and Data Integrity categories selected with an Incremental read strategy; Auto Resolve is hidden because it does not apply to Incremental scans. Counters update live: use **Abort** to stop the run early, or wait for it to transition into one of the terminal states (Success, Failure, Aborted). + +## See also + +
+ +- :material-check-circle:{ .lg .middle } **Success** + + --- + + The Run finished cleanly with full results. + + [:octicons-arrow-right-24: Success](success.md) + +- :material-alert-circle:{ .lg .middle } **Success with Warning** + + --- + + The Run finished cleanly but the worker recorded log entries during execution. + + [:octicons-arrow-right-24: Success with Warning](success-with-warning.md) + +- :material-close-circle:{ .lg .middle } **Failure** + + --- + + The Run stopped because of an unrecoverable error. + + [:octicons-arrow-right-24: Failure](failure.md) + +- :material-alert-circle:{ .lg .middle } **Aborted** + + --- + + A user or the system stopped the Run before completion. + + [:octicons-arrow-right-24: Aborted](aborted.md) + + + +- :material-state-machine:{ .lg .middle } **Lifecycle** + + --- + + State diagram, transitions, and per-operation-type lifecycle. + + [:octicons-arrow-right-24: Lifecycle](../../deep-dive/lifecycle.md) + +- **Available Actions** + + --- + + Abort, Resume, Rerun, and Delete: when each one is shown. + + [:octicons-arrow-right-24: Available Actions](../../deep-dive/actions.md) + +
diff --git a/docs/operations/runs/by-types/scan/success-with-warning.md b/docs/operations/runs/by-types/scan/success-with-warning.md new file mode 100644 index 0000000000..ef40a56af5 --- /dev/null +++ b/docs/operations/runs/by-types/scan/success-with-warning.md @@ -0,0 +1,274 @@ +# :material-alert-circle:{ .middle style="color: var(--q-brick)" } Scan: Success with Warning + +Some Success operations complete cleanly **but** the worker recorded log entries during execution (skipped partitions, container errors that did not abort the operation, anomalies whose checks ran with reduced coverage). The Status badge still shows the green **Success** state. + +An additional :material-alert-circle:{ style="color: var(--q-warning);" } icon appears next to the Run ID with the tooltip *"Warning: Completed with logs"*. The underlying status remains Success, and the Run is treated as a terminal Success everywhere downstream (Auto-Resolve eligibility, summary metrics, schedules). The indicator only flags that the operation log is worth opening before relying on the result. + +Action buttons available: **Results**, **Rerun**, **Delete**. + +## Expanded row in the operations list + +### Header of the operation + +![success-with-warning-row-overview](../../../../assets/operations/runs/by-types/scan/success-with-warning-row-overview.png) + +| No. | Element | What it shows | +| --- | --- | --- | +| 1 | Operation ID and type | The unique identifier (for example `#53201`) and the operation type (Scan). The :material-arrow-top-right: icon next to the ID links to the dedicated **Overview** of the operation. | +| 2 | Status badge | The green **Success** badge. An :material-alert-circle:{ style="color: var(--q-warning);" } indicator appears next to the Run ID (in cell No. 1) with the tooltip *"Warning: Completed with logs"*, signaling that the worker recorded log entries during the run. The Status badge itself stays green. 
| +| 3 | Time info | The operation's timing: **Started At** (for example `Started at Jun 5 2026, 11:15 AM (BRT)`) and **Duration** (for example `Took 14 minutes`). | +| 4 | Progress | Containers processed against the total requested (`1 / 14 Tables`). | +| 5 | Triggered by | The user (avatar and name) who launched the operation, or the schedule that triggered it. | +| 6 | Schedule | The named schedule if recurring, otherwise `No schedule`. | +| 7 | Quick stats icons | A cluster of small status icons on the right of the row. Each icon shows a tooltip on hover summarizing configuration and result counters at a glance. See the **Right-side icons (No. 7) in detail** breakdown below this table. **Not** the row action buttons. | + +**Right-side icons (No. 7) in detail** + +For this operation the cluster shows three icons, left to right: + +| Icon | Tooltip | What this screenshot shows | +| :--: | --- | --- | +| :material-signal: | **Incremental Field** | Filled, indicating the Incremental read strategy was Enabled for this run. | +| :material-wrench: | **Remediation Strategy** | Filled, indicating a remediation strategy is set (`Overwrite` in this run). | +| :material-alert: | **Anomalies Identified** | Orange triangle with the count **27**, indicating 27 anomalies were detected. Hovering surfaces the breakdown into **Open** and **Archived**. | + +The **Anomalies Auto-Resolved** indicator (:material-triangle::material-check-bold:) is **not shown** in this screenshot because Auto-Resolve only applies to Full scans, and this operation ran in Incremental mode. + +--- + +### Details of the operation + +Expanding the row reveals the **Settings** used for the run (read-only) and the inline action buttons (**Results**, **Rerun**, **Delete**). + +![success-with-warning-row-settings](../../../../assets/operations/runs/by-types/scan/success-with-warning-row-settings.png) + +| No. | Setting | What it shows | +| --- | --- | --- | +| 1 | Check Categories | Which categories ran. 
| +| 2 | Incremental | Read strategy used (Enabled/Disabled). | +| 3 | Read Record Limit | Per-container record cap. | +| 4 | Archive Duplicate Anomalies | Setting from [Step 4](../../../scan/how-tos/scan-settings.md). | +| 5 | Reactivate Recurring Anomalies | Setting from [Step 4](../../../scan/how-tos/scan-settings.md). | +| 6 | Auto Resolve Anomalies | Setting from [Step 4](../../../scan/how-tos/scan-settings.md) (Full scans only). | +| 7 | Maximum Record Anomalies per Check | The rollup threshold used. | +| 8 | Maximum Source Examples per Anomaly | How many source records were captured per anomaly. | +| 9 | Remediation Strategy | The remediation strategy applied. | +| 10 | Results | Opens the Scan Results modal for the operation. | +| 11 | Rerun | Replays the operation with the same configuration. | +| 12 | Delete | Removes the operation record from the Activity list. Anomalies and other downstream artifacts produced by the Run are preserved. | + +For Success-with-Warning operations, the Rerun button is the typical next step after reviewing the Logs and fixing the underlying issue (for example, restoring partition key values that were null during this run). + +--- + +### Summary and Logs + +The **Summary** section reports the headline metrics for the run. + +![success-with-warning-row-summary](../../../../assets/operations/runs/by-types/scan/success-with-warning-row-summary.png) + +| No. | Element | What it shows | +| --- | --- | --- | +| 1 | Tables Requested | Total containers targeted. | +| 2 | Tables Scanned | Total containers actually scanned. For Success-with-Warning runs, this may be lower than Tables Requested if some containers were skipped. | +| 3 | Partitions Scanned | Total partitions read across all scanned containers. | +| 4 | Records Scanned | Total records processed (formatted with K/M/B suffixes when large). | +| 5 | Anomalies Identified | Total anomalies detected, split into **Open** and **Archived**. 
| +| 6 | Logs block | Inline block listing every `[WARN]` and `[INFO]` message emitted during the run, in chronological order. The same block is surfaced on the operation detail page. | + +The entries in the Logs block are what trigger the Warning indicator next to the Run ID. Typical messages include anomaly-coverage drops below 100%, skipped partitions due to null keys, and container-level errors that did not propagate to a full failure. + +Review the Logs before deciding whether to Rerun. + +--- + +## Operation detail page + +Clicking the row opens the dedicated operation detail page, which has two top-level tabs: **Overview** and **Results**. The Overview tab presents the operation's properties, settings, headline metrics, log entries, and chronological timeline. The Results tab drills into each container scanned by the run. + +### Overview tab + +The Overview tab opens with a snapshot of the run, organized into five blocks: **Operation**, **Settings**, **Summary**, **Timeline**, and **Logs**. + +![success-with-warning-detail-overview](../../../../assets/operations/runs/by-types/scan/success-with-warning-detail-overview.png) + +#### Operation block + +The **Operation** block at the top carries the properties that summarize the run identity and outcome. + +| No. | Element | What it shows | +| --- | --- | --- | +| 1 | Status | The green **Success** badge with the :material-alert-circle:{ style="color: var(--q-warning);" } indicator next to the Run ID. | +| 2 | Started At | Exact start timestamp. | +| 3 | Duration | How long the run lasted. | +| 4 | Progress | Containers processed against total requested. | +| 5 | Triggered By | User or schedule. | +| 6 | Schedule | The named schedule (or `No schedule`). | +| 7 | Remediation Strategy | Effective remediation strategy for the run.
| + +#### Settings block + +The **Settings** block lists the scan settings used for the run (same fields as the row's Details section, minus the action buttons). + +| No. | Setting | What it shows | +| --- | --- | --- | +| 1 | Check Categories | Categories selected in [Step 2](../../../scan/how-tos/select-check-categories.md) (for example, `Metadata, Data_Integrity`). | +| 2 | Incremental | Whether Incremental was the read strategy (Enabled/Disabled). | +| 3 | Read Record Limit | Per-container record cap (`All` if uncapped). | +| 4 | Archive Duplicate Anomalies | Setting from [Step 4](../../../scan/how-tos/scan-settings.md) (Enabled/Disabled). | +| 5 | Reactivate Recurring Anomalies | Setting from [Step 4](../../../scan/how-tos/scan-settings.md). | +| 6 | Auto-Resolve Anomalies | Setting from [Step 4](../../../scan/how-tos/scan-settings.md) (Full scans only). | +| 7 | Max Record Anomalies per Check | Rollup threshold used. | +| 8 | Max Source Examples per Anomaly | Source examples cap used. | + +Together, the Operation and Settings blocks act as the canonical "what happened" snapshot of the run. If the operation looks wrong, this is the first place to check the inputs. + +#### Summary block + +Scrolling down on the Overview tab shows the **Summary** block, which repeats the row's metrics. + +![success-with-warning-detail-summary](../../../../assets/operations/runs/by-types/scan/success-with-warning-detail-summary.png) + +| No. | Element | What it shows | +| --- | --- | --- | +| 1 | Tables Requested | Total containers targeted. | +| 2 | Tables Scanned | Total containers actually scanned. | +| 3 | Partitions Scanned | Total partitions read. | +| 4 | Records Scanned | Total records processed. | +| 5 | Anomalies Identified | Open and Archived counters. | + +#### Timeline block + +The **Timeline** lists every event recorded for the operation, in reverse chronological order. Each entry follows the same structure. 
+ +| Part | Element | What it shows | +| --- | --- | --- | +| Status icon | Marker on the left | A marker colored by event type. See the legend below. | +| Timestamp | Date and time | When the event was recorded, in the viewer's timezone (for example, `JUN 5 2026, 11:29 AM (BRT)`). Sorted from most recent at the top to oldest at the bottom. | +| Event title | Short label | The event identity (for example, `Operation Started`, `Operation Success`, `Container scan completed`). | +| Event detail | Context line | Additional information specific to the event: duration, the user who triggered the operation, the container name involved, or other event-specific data. | + +**Timeline icon legend:** + +- :material-check-circle:{ style="color: var(--q-positive);" } **Operation Success** terminal entry. +- :material-close-circle:{ style="color: var(--q-negative);" } **Operation Failure** terminal entry. +- :material-alert-circle:{ style="color: var(--q-warning);" } **Operation Aborted** terminal entry. +- :material-loading:{ style="color: var(--q-info);" } **Operation In Progress** (Queued or Running, animated in the UI). +- :material-play-circle:{ style="color: var(--q-brick);" } **Operation Started**. +- Per-container entries inherit the same icon and color based on each container's own outcome (Success/Failure/Aborted/Running). + +For a Success-with-Warning scan, the typical events are: + +- **`Operation Success`** at the top, with the total duration. +- **`Container scan completed`** entries for each container that finished. +- **`Operation Started`** at the bottom, with `Triggered by `. + +Warning reasons appear in the **Logs** block (below the Timeline), not in the Timeline itself. + +#### Logs block + +The **Logs** block surfaces the same messages from the row summary, in a wider full-screen-friendly view. Each log line has the following structure. 
+ +| Part | Element | What it shows | +| --- | --- | --- | +| Severity tag | `[WARN]`, `[INFO]`, or `[ERROR]` | Severity of the log line. Success-with-Warning operations typically emit `[WARN]` for the non-fatal issues that surfaced the Warning indicator, and `[INFO]` for status confirmations such as "Operation completed with logs." | +| Message | Human-readable text | The narrative explaining what the platform encountered. Examples: *"27 anomalies detected across 6 containers; coverage dropped to 87%"*, *"3 checks skipped due to NULL partition keys"*, *"Operation completed with warnings."* | +| Order | Position in the block | Lines are listed in chronological order from top (earliest) to bottom (most recent), mirroring the event sequence visible on the Timeline. | + +### Results tab + +The **Results** tab lists every container scanned by this Success-with-Warning operation. Each row links to the container's detail page. + +![success-with-warning-detail-results](../../../../assets/operations/runs/by-types/scan/success-with-warning-detail-results.png) + +#### Container sub-tabs + +Expanding a container row reveals **three sub-tabs** that drill into the per-container results. + +##### Partitions + +Lists every partition read for this container, with size, read timestamp, completion time, and per-partition counters (records processed, anomalies emitted). Use this view to confirm coverage when a container is processed across many partitions, especially after a Success-with-Warning run where the Logs may have flagged skipped partitions. + +![success-with-warning-detail-partitions](../../../../assets/operations/runs/by-types/scan/success-with-warning-detail-partitions.png) + +##### Checks + +Lists every check that was asserted against this container, with check ID, name, description, target field, and pass/fail status. Filters at the top (**All / Passed / Failed**) let you focus on the checks that did not pass. 
This is the most actionable view: it tells you exactly which expectations broke on this container during this scan. + +![success-with-warning-detail-checks](../../../../assets/operations/runs/by-types/scan/success-with-warning-detail-checks.png) + +##### Source Records + +Lists the **distinct source records identified as anomalous** for this container. Each row is a distinct record with a counter of how many anomalies are tied to it, and the same anomaly can appear on multiple records. The empty state reads *"No source records available for this container scan"* when nothing was captured. Records are bounded by the **Maximum Source Examples per Anomaly** cap set in [Step 4](../../../scan/how-tos/scan-settings.md). Use the toolbar to **Sort By** any field, toggle **Field visibility**, **Reveal masked** values (when masked fields are present and you have permission), **Refresh** the table, or **Download** the records for offline analysis. + +![success-with-warning-detail-source-records](../../../../assets/operations/runs/by-types/scan/success-with-warning-detail-source-records.png) + +--- + +## Worked example: Run #53201 + +The scan operation `#53201` completed successfully but with log entries on the **Healthcare Analytics** datastore. The badge shows green **Success** with the :material-alert-circle:{ style="color: var(--q-warning);" } Warning indicator next to the Run ID. It started at Jun 5 2026, 11:15 AM (BRT) and ran for 14 minutes, scanning 1 of 14 requested tables and processing 700K records across 43 partitions. The Logs block reports that 27 anomalies were detected across 6 containers, that 3 checks were skipped due to NULL partition keys, and that coverage dropped to 87% before the operation completed. Despite the warnings in the log, results were saved and the open anomaly queue reflects the 27 anomalies as expected. + +## See also + +
+ +- :material-check-circle:{ .lg .middle } **Success** + + --- + + The Run finished cleanly with full results. + + [:octicons-arrow-right-24: Success](success.md) + +- :material-close-circle:{ .lg .middle } **Failure** + + --- + + The Run stopped because of an unrecoverable error. + + [:octicons-arrow-right-24: Failure](failure.md) + +- :material-alert-circle:{ .lg .middle } **Aborted** + + --- + + A user or the system stopped the Run before completion. + + [:octicons-arrow-right-24: Aborted](aborted.md) + +- :material-loading:{ .lg .middle } **Running** + + --- + + A worker is actively processing the Run; counters update live. + + [:octicons-arrow-right-24: Running](running.md) + + + +- :material-state-machine:{ .lg .middle } **Lifecycle** + + --- + + State diagram, transitions, and per-operation-type lifecycle. + + [:octicons-arrow-right-24: Lifecycle](../../deep-dive/lifecycle.md) + +- **Available Actions** + + --- + + Abort, Resume, Rerun, and Delete: when each one is shown. + + [:octicons-arrow-right-24: Available Actions](../../deep-dive/actions.md) + +
diff --git a/docs/operations/runs/by-types/scan/success.md b/docs/operations/runs/by-types/scan/success.md new file mode 100644 index 0000000000..9d5842f975 --- /dev/null +++ b/docs/operations/runs/by-types/scan/success.md @@ -0,0 +1,254 @@ +# :material-check-circle:{ .middle style="color: var(--q-brick)" } Scan: Success + +A Successful Scan completed every requested check on every targeted container. Anomalies (if any) are recorded; if none were found, the operation reports zero anomalies and every check is shown as passed in the Checks sub-tab. + +## Expanded row in the operations list + +From the Activity tab, every operation appears as a row. Expanding a row reveals three parts: the **header** (identity and status), the **details** (settings used), and the **summary** (outcome metrics). + +### Header of the operation + +![success-row-overview](../../../../assets/operations/runs/by-types/scan/success-row-overview.png) + +| No. | Element | What it shows | +| --- | --- | --- | +| 1 | Operation ID and type | The unique identifier (for example `#53388`) and the operation type (Scan). Next to the ID, the **View operation** icon (:material-arrow-top-right:) opens the dedicated **Overview** page (the same page covered in *Operation detail page* below). | +| 2 | Status badge | The terminal state. For Success it shows the green **Success** badge. | +| 3 | Time info | The operation's timing: **Started At** (start timestamp, for example `Started at Jun 4 2026, 6:45 PM (BRT)`) and **Duration** (how long it ran, for example `Took 38 minutes`). | +| 4 | Progress | The number of containers processed against the total requested (for example, `14 / 14 Tables`). | +| 5 | Triggered by | The user (avatar and name) who launched the operation, or the schedule that triggered it. | +| 6 | Schedule | The named schedule if the operation was scheduled, otherwise `No schedule`. | +| 7 | Quick stats icons | A cluster of small status icons on the right of the row. 
Each icon shows a tooltip on hover summarizing configuration and result counters at a glance. See the **Right-side icons (No. 7) in detail** breakdown below this table. These are **not** the row action buttons (Results, Rerun, Delete); those live in the expanded **Details** section. | + +**Right-side icons (No. 7) in detail** + +The scan-row cluster contains the following icons, left to right: + +| Icon | Tooltip | Meaning | +| :--: | --- | --- | +| :material-signal: | **Incremental Field** | Whether the scan ran in Incremental mode (filled when active) or Full (grey when the scan was Full). | +| :material-wrench: | **Remediation Strategy** | The remediation strategy used for this run (filled when `Append` or `Overwrite` is set, grey when `None`). | +| :material-alert: / :material-check-bold: | **Anomalies Identified** | Total number of anomalies detected in this scan, with **Open** and **Archived** counts in the tooltip. The pill is orange with the count when anomalies were found, green check when zero. | +| :material-triangle::material-check-bold: | **Anomalies Auto-Resolved** | Number of previously open anomalies ([Active or Acknowledged](../../../../anomalies/status.md)) that this scan automatically resolved. Shown only on Full scans where Auto Resolve was enabled and at least one anomaly was resolved. | + +The row header is intended for quick triage: at a glance you can confirm the operation ran to completion, see how long it took, who started it, and decide whether to open the full results or remove the record. + +--- + +### Details of the operation + +Expanding the row reveals the **Settings** that were used for the run (read-only) and the inline action buttons (**Results**, **Rerun**, **Delete**) to operate on the record. This block confirms exactly how the scan was configured (including the Anomaly Options and Advanced Options chosen in [Step 4](../../../scan/how-tos/scan-settings.md)) and gives you the actions to inspect, replay, or remove the operation. 
+ +![success-row-settings](../../../../assets/operations/runs/by-types/scan/success-row-settings.png) + +| No. | Setting | What it shows | +| --- | --- | --- | +| 1 | Check Categories | Which categories ran: `Metadata`, `Data Integrity`, or both. | +| 2 | Incremental | Whether the read strategy was Incremental (Enabled) or Full (Disabled). | +| 3 | Read Record Limit | The per-container record cap from [Step 3](../../../scan/how-tos/read-settings.md) (`All` means no cap). | +| 4 | Archive Duplicate Anomalies | Whether duplicate anomalies are archived (Enabled/Disabled). | +| 5 | Reactivate Recurring Anomalies | Whether archived anomalies are reactivated when their fingerprint reappears. | +| 6 | Auto Resolve Anomalies | Whether previously open anomalies are resolved when no longer detected (Full scans only). | +| 7 | Maximum Record Anomalies per Check | The rollup threshold used for the run. | +| 8 | Maximum Source Examples per Anomaly | How many source records were captured per anomaly. | +| 9 | Remediation Strategy | The remediation strategy applied (`Append`, `Overwrite`, or `None`). | +| 10 | Results | Action button that opens the Scan Results modal for the operation. | +| 11 | Rerun | Replays the operation with the same configuration. | +| 12 | Delete | Removes the operation record from the Activity list. Anomalies and other downstream artifacts produced by the Run are preserved. | + +The settings (1-9) are read-only on the row; to change them, configure a new scan from the Scan Operation modal. The action buttons (10-12) operate on this specific record: **Results** opens the Scan Results modal, **Rerun** replays the operation with the same configuration, and **Delete** removes the operation record (downstream anomalies and artifacts are preserved). The block is most useful for confirming that a recurring schedule is still using the expected settings, and for debugging unexpected anomaly counts (for example, when Auto Resolve was unexpectedly disabled). 
+ +--- + +### Summary + +The **Summary** section reports the headline metrics for the run. These are the numbers most stakeholders care about and the first thing to share after a scan completes. + +![success-row-summary](../../../../assets/operations/runs/by-types/scan/success-row-summary.png) + +| No. | Metric | What it shows | +| --- | --- | --- | +| 1 | Tables Requested | How many containers were targeted by the run, before any per-container skip or fallback. | +| 2 | Tables Scanned | How many of those containers were actually scanned. For Success scans, this typically equals Tables Requested. | +| 3 | Partitions Scanned | Total number of partitions read across all scanned containers. | +| 4 | Records Scanned | Total number of records processed (formatted with K/M/B suffixes when large). | +| 5 | Anomalies Identified | The total count of anomalies detected during the run, with separate `Open` and `Archived` counters. | + +Compare Tables Requested versus Tables Scanned to detect silent partial failures: if they diverge, some containers were skipped (typically due to Unloadable status). The Anomalies Identified counter is split into Open and Archived because the Archive Duplicate Anomalies setting may have moved some anomalies straight to Archived. + +--- + +## Operation detail page + +Clicking the row opens the dedicated operation detail page, which has two top-level tabs: **Overview** and **Results**. The Overview tab presents the same information as the expanded row, plus a chronological timeline. The Results tab drills into each container. + +### Overview tab + +The Overview tab opens with a snapshot of the run, organized into four blocks: **Operation**, **Settings**, **Summary**, and **Timeline**. + +![success-detail-overview](../../../../assets/operations/runs/by-types/scan/success-detail-overview.png) + +#### Operation block + +The **Operation** block at the top carries the properties that summarize the run identity and outcome. + +| No. 
| Element | What it shows | +| --- | --- | --- | +| 1 | Status | The state badge (Success). | +| 2 | Started At | Exact start timestamp in the operator's timezone (for example, `Jun 4 2026, 6:45 PM (BRT)`). | +| 3 | Duration | How long the run lasted (for example, `Took 38 minutes`). | +| 4 | Progress | Containers processed against total requested (`14 / 14 Tables`). | +| 5 | Triggered By | User (avatar and name) or schedule. | +| 6 | Schedule | The named schedule (or `No schedule`). | +| 7 | Remediation Strategy | Effective remediation strategy for the run (`Overwrite`, `Append`, or `None`). | + +#### Settings block + +The **Settings** block lists the scan settings used for the run. + +| No. | Setting | What it shows | +| --- | --- | --- | +| 1 | Check Categories | Categories selected in [Step 2](../../../scan/how-tos/select-check-categories.md) (for example, `Metadata, Data_Integrity`). | +| 2 | Incremental | Whether Incremental was the read strategy (Enabled/Disabled). | +| 3 | Read Record Limit | Per-container record cap (`All` if uncapped). | +| 4 | Archive Duplicate Anomalies | Setting from [Step 4](../../../scan/how-tos/scan-settings.md) (Enabled/Disabled). | +| 5 | Reactivate Recurring Anomalies | Setting from [Step 4](../../../scan/how-tos/scan-settings.md). | +| 6 | Auto-Resolve Anomalies | Setting from [Step 4](../../../scan/how-tos/scan-settings.md) (Full scans only). | +| 7 | Max Record Anomalies per Check | Rollup threshold used for this run. | +| 8 | Max Source Examples per Anomaly | Source examples cap used. | + +Together, the Operation and Settings blocks act as the canonical "what happened" snapshot of the run. If the operation looks wrong, this is the first place to check the inputs. + +#### Summary block + +Scrolling down on the Overview tab shows the **Summary** block, which repeats the headline metrics from the expanded row. + +![success-detail-summary](../../../../assets/operations/runs/by-types/scan/success-detail-summary.png) + +| No. 
| Element | What it shows | +| --- | --- | --- | +| 1 | Tables Requested | Total containers targeted (`14`). | +| 2 | Tables Scanned | Total containers actually scanned (`14`). | +| 3 | Partitions Scanned | Total partitions read (`42`). | +| 4 | Records Scanned | Total records processed (`700K`). | +| 5 | Anomalies Identified | Open and Archived counters (`Open 0`, `Archived 0`). | + +#### Timeline block + +The **Timeline** lists every event recorded for the operation, in reverse chronological order. Each entry follows the same structure. + +| Part | Element | What it shows | +| --- | --- | --- | +| Status icon | Marker on the left | A marker colored by event type. See the legend below. | +| Timestamp | Date and time | When the event was recorded, in the viewer's timezone (for example, `JUN 4 2026, 7:23 PM (BRT)`). Sorted from most recent at the top to oldest at the bottom. | +| Event title | Short label | The event identity (for example, `Operation Started`, `Operation Success`, `Operation Aborted`, `Container scan completed`). | +| Event detail | Context line | Additional information specific to the event: duration of the run, the user who triggered the operation (with avatar), the container name involved, the abort reason, or other event-specific data. Some events have no detail line. | + +**Timeline icon legend:** + +- :material-check-circle:{ style="color: var(--q-positive);" } **Operation Success** terminal entry. +- :material-close-circle:{ style="color: var(--q-negative);" } **Operation Failure** terminal entry. +- :material-alert-circle:{ style="color: var(--q-warning);" } **Operation Aborted** terminal entry. +- :material-loading:{ style="color: var(--q-info);" } **Operation In Progress** (Queued or Running, animated in the UI). +- :material-play-circle:{ style="color: var(--q-brick);" } **Operation Started**. +- Per-container entries inherit the same icon and color based on each container's own outcome (Success/Failure/Aborted/Running).
+ +For a Success scan, the typical events shown are: + +- **`Operation Success`** at the top, with `Duration: Took N minutes`. +- One **`Container scan completed`** entry per container scanned (when the operation includes per-container events). +- **`Operation Started`** at the bottom, with `Triggered by `. + +The Timeline is the audit trail for the operation. Use it to confirm exactly when a specific container started scanning, when the first anomaly was emitted, or when the operation transitioned to a final state. For Aborted and Failure operations, the Timeline also records the user or system that triggered the abort and the timestamp. + +### Results tab + +The **Results** tab lists every container scanned by the operation, with anomaly counts and overall status. Each row links to the container's detail page. Use this view to find the containers with the most anomalies or to navigate to a specific scanned table. + +![success-detail-results](../../../../assets/operations/runs/by-types/scan/success-detail-results.png) + +#### Container sub-tabs + +Expanding a container row reveals **two sub-tabs** that drill into the per-container results. A **Source Records** sub-tab also appears when the scan captures source records for anomalies; this example produced zero anomalies, so the sub-tab is not shown here. For a worked example of Source Records with anomalies, see [Success with Warning](success-with-warning.md#source-records). + +##### Partitions + +Lists every partition read for this container, with size, read timestamp, completion time, and per-partition counters (records processed, anomalies emitted). Use this view to confirm coverage when a container is processed across many partitions.
+ +![success-detail-partitions](../../../../assets/operations/runs/by-types/scan/success-detail-partitions.png) + +##### Checks + +Lists every check that was asserted against this container, with check ID, name, description, target field, and pass/fail status. Filters at the top (**All / Passed / Failed**) let you focus on the checks that did not pass. This is the most actionable view: it tells you exactly which expectations broke on this container during this scan. + +![success-detail-checks](../../../../assets/operations/runs/by-types/scan/success-detail-checks.png) + +--- + +## Worked example: Run #53388 + +The scan operation `#53388` ran successfully against the **Healthcare Analytics** datastore. It started at Jun 4 2026, 6:45 PM (BRT) and completed in 38 minutes, scanning all 14 requested tables across 42 partitions and processing 700K records. The run used both Metadata and Data Integrity check categories, with Incremental read strategy enabled and the full set of Anomaly Options (Archive Duplicate, Reactivate Recurring, Auto Resolve) active. Zero anomalies were identified, indicating every check passed across every scanned container. + +## See also + +
+ +- :material-alert-circle:{ .lg .middle } **Success with Warning** + + --- + + The Run finished cleanly but the worker recorded log entries during execution. + + [:octicons-arrow-right-24: Success with Warning](success-with-warning.md) + +- :material-close-circle:{ .lg .middle } **Failure** + + --- + + The Run stopped because of an unrecoverable error. + + [:octicons-arrow-right-24: Failure](failure.md) + +- :material-alert-circle:{ .lg .middle } **Aborted** + + --- + + A user or the system stopped the Run before completion. + + [:octicons-arrow-right-24: Aborted](aborted.md) + +- :material-loading:{ .lg .middle } **Running** + + --- + + A worker is actively processing the Run; counters update live. + + [:octicons-arrow-right-24: Running](running.md) + + + +- :material-state-machine:{ .lg .middle } **Lifecycle** + + --- + + State diagram, transitions, and per-operation-type lifecycle. + + [:octicons-arrow-right-24: Lifecycle](../../deep-dive/lifecycle.md) + +- **Available Actions** + + --- + + Abort, Resume, Rerun, and Delete: when each one is shown. + + [:octicons-arrow-right-24: Available Actions](../../deep-dive/actions.md) + +
diff --git a/docs/operations/runs/by-types/sync/aborted.md b/docs/operations/runs/by-types/sync/aborted.md new file mode 100644 index 0000000000..cbf7a16a2f --- /dev/null +++ b/docs/operations/runs/by-types/sync/aborted.md @@ -0,0 +1,22 @@ +# :material-stop-circle-outline:{ .middle style="color: var(--q-brick)" } Sync: Aborted + + + +## See also + +- [Sync: Success](success.md) +- [Sync: Success with Warning](success-with-warning.md) +- [Sync: Failure](failure.md) +- [Sync: Running](running.md) +- [Sync: Queued](queued.md) +- [Lifecycle](../../deep-dive/lifecycle.md) +- [Available Actions](../../deep-dive/actions.md) diff --git a/docs/operations/runs/by-types/sync/failure.md b/docs/operations/runs/by-types/sync/failure.md new file mode 100644 index 0000000000..c3937a1fd5 --- /dev/null +++ b/docs/operations/runs/by-types/sync/failure.md @@ -0,0 +1,23 @@ +# :material-alert-circle-outline:{ .middle style="color: var(--q-brick)" } Sync: Failure + + + +## See also + +- [Sync: Success](success.md) +- [Sync: Success with Warning](success-with-warning.md) +- [Sync: Aborted](aborted.md) +- [Sync: Running](running.md) +- [Sync: Queued](queued.md) +- [Lifecycle](../../deep-dive/lifecycle.md) +- [Available Actions](../../deep-dive/actions.md) diff --git a/docs/operations/runs/by-types/sync/queued.md b/docs/operations/runs/by-types/sync/queued.md new file mode 100644 index 0000000000..c29ae49e72 --- /dev/null +++ b/docs/operations/runs/by-types/sync/queued.md @@ -0,0 +1,22 @@ +# :material-circle-outline:{ .middle style="color: var(--q-brick)" } Sync: Queued + + + +## See also + +- [Sync: Running](running.md) +- [Sync: Success](success.md) +- [Sync: Success with Warning](success-with-warning.md) +- [Sync: Failure](failure.md) +- [Sync: Aborted](aborted.md) +- [Lifecycle](../../deep-dive/lifecycle.md) +- [Available Actions](../../deep-dive/actions.md) diff --git a/docs/operations/runs/by-types/sync/running.md
b/docs/operations/runs/by-types/sync/running.md new file mode 100644 index 0000000000..692a3b2221 --- /dev/null +++ b/docs/operations/runs/by-types/sync/running.md @@ -0,0 +1,23 @@ +# :material-progress-clock:{ .middle style="color: var(--q-brick)" } Sync: Running + + + +## See also + +- [Sync: Success](success.md) +- [Sync: Success with Warning](success-with-warning.md) +- [Sync: Failure](failure.md) +- [Sync: Aborted](aborted.md) +- [Sync: Queued](queued.md) +- [Lifecycle](../../deep-dive/lifecycle.md) +- [Available Actions](../../deep-dive/actions.md) diff --git a/docs/operations/runs/by-types/sync/success-with-warning.md b/docs/operations/runs/by-types/sync/success-with-warning.md new file mode 100644 index 0000000000..d5922d083a --- /dev/null +++ b/docs/operations/runs/by-types/sync/success-with-warning.md @@ -0,0 +1,22 @@ +# :material-alert-circle:{ .middle style="color: var(--q-brick)" } Sync: Success with Warning + + + +## See also + +- [Sync: Success](success.md) +- [Sync: Failure](failure.md) +- [Sync: Aborted](aborted.md) +- [Sync: Running](running.md) +- [Sync: Queued](queued.md) +- [Lifecycle](../../deep-dive/lifecycle.md) +- [Available Actions](../../deep-dive/actions.md) diff --git a/docs/operations/runs/by-types/sync/success.md b/docs/operations/runs/by-types/sync/success.md new file mode 100644 index 0000000000..ec6b1c7b97 --- /dev/null +++ b/docs/operations/runs/by-types/sync/success.md @@ -0,0 +1,24 @@ +# :material-check-circle-outline:{ .middle style="color: var(--q-brick)" } Sync: Success + + + +## See also + +- [Sync: Success with Warning](success-with-warning.md) +- [Sync: Failure](failure.md) +- [Sync: Aborted](aborted.md) +- [Sync: Running](running.md) +- [Sync: Queued](queued.md) +- [Lifecycle](../../deep-dive/lifecycle.md) +- [Available Actions](../../deep-dive/actions.md) diff --git a/docs/operations/runs/deep-dive/actions.md b/docs/operations/runs/deep-dive/actions.md new file mode
100644 index 0000000000..adaab6163e --- /dev/null +++ b/docs/operations/runs/deep-dive/actions.md @@ -0,0 +1,21 @@ +# Available Actions + +Each Run exposes a set of action buttons that depend on (1) the Run's current state and (2) the type of operation. All actions are gated by the **Editor** team permission on the target datastore. See [Permissions](permissions.md) for the full matrix. + +## Support matrix + +| Icon | Action | When it is shown | Supported by | What it does | +| :--: | :---- | :---- | :---- | :---- | +| :material-stop-circle-outline:{ .lg style="color: var(--q-orange);" } | **Abort** | While the Run is **Queued** or **Running** | All operation types | Signals the worker to stop the Run as soon as possible. Partial results captured up to the stop point are preserved. The Run lands on **Aborted**. **Aborted By** shows the user, or **System** when the platform stopped the Run automatically. | +| :material-skip-next-circle-outline:{ .lg style="color: var(--q-normal);" } | **Resume** | After the Run ends in **Aborted** or **Failure** | Profile, Scan, Promote (not Sync, External Scan, Export, Materialize) | Continues the same Run from where it stopped. Only containers (or partitions) that had not finished are processed; containers already completed in the original Run are not re-read. | +| :material-backup-restore:{ .lg style="color: var(--q-normal);" } | **Rerun** | After the Run ends in any terminal state | Sync, Profile, Scan, Export, Materialize, Promote (not External Scan) | Starts a brand-new Run with the same configuration as the original. Every container is processed from scratch, regardless of whether it succeeded before. | +| :material-trash-can-outline:{ .lg style="color: var(--q-negative);" } | **Delete** | After the Run reaches a terminal state | All operation types | Removes the Run record and its summary metrics from the Activity list. 
Anomalies, computed data, and other downstream artifacts produced by the Run are **not** removed; only the Run's own record is. | + +## When to use each action + +- **Abort** is the only way to stop an active Run. There is no destructive side effect: anomalies, container metadata, and partial counters already written are kept. +- **Resume** avoids reprocessing data the system has already validated. Use it when you want to pick up exactly where the original Run stopped. The button's tooltip names the unit of work it will pick up. +- **Rerun** is the right choice when upstream data has changed, when the configuration needs to be re-validated end to end, or when Resume is not available for the operation type. +- Resume is unsupported for operations that cannot pick up from the last completed unit of work (Sync, Export, Materialize). External Scan supports neither Resume nor Rerun because it only records results from outside Qualytics; there is nothing to continue or re-execute from the platform. + +For the team-permission requirements that gate each of these actions, see [Permissions](permissions.md). diff --git a/docs/operations/runs/deep-dive/introduction.md b/docs/operations/runs/deep-dive/introduction.md new file mode 100644 index 0000000000..f2cf7ba980 --- /dev/null +++ b/docs/operations/runs/deep-dive/introduction.md @@ -0,0 +1,50 @@ +# :material-information-outline:{ .middle style="color: var(--q-brick)" } Introduction + +A **Run** is a single execution of an operation against a source datastore. Every operation that fires (manually, on a schedule, or via the API) is recorded as a Run on the **Activity** page. The Run captures who triggered the work, when it started and finished, what configuration it used, what it produced, and what its current status is. + +For per-operation walkthroughs with screenshots and per-block tables, see the [By Operation Type](../getting-started.md#by-operation-type) section on the Runs overview. 
+ +## What a Run is for + +A Run is the **unit of execution and audit** in Qualytics. It serves three purposes: + +1. **Execution.** When a user (or a schedule, or an API caller) launches an operation, a Run is created to track that specific execution. The Run holds the configuration the operation will use and the worker assignment. +2. **Observation.** While the operation is in progress, the Run exposes live counters (containers processed, anomalies identified, records scanned) and a progress bar. Once finished, it stores the summary metrics, the result payload, and a downloadable log. +3. **Audit and replay.** Every Run is permanent until explicitly deleted. The Activity tab shows the full history, and any Run can be re-launched via **Rerun** (a fresh execution with the same configuration) or **Resume** (continue from where the original stopped, when applicable). + +A Run never mutates after it ends. To "redo" a Run with the same settings, the platform creates a new Run; the original record is preserved as part of the audit trail. + +## Anatomy of a Run state + +Every state a Run can be in exposes the same set of visual and functional components. Understanding these components first makes the rest of the Deep Dive easier to read. + +| Component | What it carries | Varies by state? | +| :---- | :---- | :---- | +| **Status badge** | A colored pill with the state label (`Queued`, `Running`, `Success`, `Failure`, `Aborted`) on the Run row and the operation detail page. | Yes. Color and label change per state. | +| **State icon** | An MDI glyph that visually anchors the state. | Yes. One icon per state. | +| **Progress bar** | Containers processed against total requested (for example, `4 / 14 Tables`). | Yes. Hidden when Queued; live while Running; frozen on terminal states. | +| **Action buttons** | The set of operations the user can perform on this Run: **Abort**, **Resume**, **Rerun**, **Delete**. | Yes. Driven by state and operation type. 
See [Available Actions](actions.md). | +| **Warning indicator** | A small :material-alert-circle:{ style="color: var(--q-warning);" } icon next to the Run ID when the worker produced log entries during the Run. | Only on Success runs that produced logs (the badge stays Success). | +| **Summary metrics** | Counters specific to the operation: anomalies identified, records scanned, checks added, containers profiled, etc. | Live while Running; frozen on terminal states. Partial when Aborted/Failure. | +| **Triggered By / Aborted By** | The user, schedule, or system that initiated (or stopped) the Run. | The label changes to **Aborted By** on Aborted runs. | +| **Logs** | An inline block in the row summary plus a wider block on the operation detail page. | Active states stream lines live; terminal states freeze the log. | + +## States at a glance + +The five **canonical states** plus the Warning indicator are described below in brief. For the detailed walkthrough of each, see the [By Operation Type](../getting-started.md#by-operation-type) section on the Runs overview. + +| Icon | State | Typical action available | What it means | +| :--: | :---- | :---- | :---- | +| :material-loading:{ .lg style="color: var(--q-info);" } | Queued | Abort | The Run is registered but no worker has picked it up yet. | +| :material-loading:{ .lg style="color: var(--q-info);" } | Running | Abort | A worker is actively processing the Run; counters update live. | +| :material-check-circle:{ .lg style="color: var(--q-positive);" } | Success | Rerun, Delete | The Run finished cleanly. Summary metrics and result payload are authoritative. | +| :material-alert-circle:{ .lg style="color: var(--q-warning);" } | Success + Warning indicator | Rerun, Delete | A Success Run that recorded log entries during execution. Status stays **Success**; the warning icon only appears next to the Run ID to flag that the log is worth opening. 
| +| :material-close-circle:{ .lg style="color: var(--q-negative);" } | Failure | Resume (when supported), Rerun, Delete | The Run stopped because of an unrecoverable error. Partial results are preserved; the operation log carries the root cause. | +| :material-alert-circle:{ .lg style="color: var(--q-warning);" } | Aborted | Resume (when supported), Rerun, Delete | A user (or the system) stopped the Run before completion. Partial results are preserved. | + +The Run row badge is text-only. The glyphs above appear on the operation detail page as the timeline step icon; on the Activity list, the only icon next to a Run ID is the :material-alert-circle:{ style="color: var(--q-warning);" } warning indicator (when a Success Run produced logs). **Queued** and **Running** are shown in the UI as an animated spinner; the static loader icon in the table above represents both states. + +**Next:** + +- [Lifecycle](lifecycle.md): how Runs transition between states. +- [Available Actions](actions.md): when each action button appears. diff --git a/docs/operations/runs/deep-dive/lifecycle.md b/docs/operations/runs/deep-dive/lifecycle.md new file mode 100644 index 0000000000..1cfaf6aba1 --- /dev/null +++ b/docs/operations/runs/deep-dive/lifecycle.md @@ -0,0 +1,71 @@ +# :material-state-machine:{ .middle style="color: var(--q-brick)" } Lifecycle + +Every Run progresses through a small set of states, from the moment it is queued until it reaches a terminal outcome. This page covers the state diagram, how Promote differs from the other operations, how per-container progress relates to the Run-level state, and where logs live. + +## State diagram + +Every Run follows the same general lifecycle: an active phase (`Queued`, `Running`) followed by a terminal outcome (`Success`, `Failure`, or `Aborted`). **Promote** is the exception. It transitions directly to `Running`, skipping `Queued`. 
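As a rough illustration, the lifecycle just described can be written as a small transition table. This is a minimal sketch, not Qualytics code: `TRANSITIONS`, `initial_state`, `can_transition`, and the lowercase `promote` identifier are names assumed for this example. Only the state names and the Promote exception come from this page.

```python
# Hypothetical model of the Run lifecycle; names are illustrative only.
ACTIVE = {"Queued", "Running"}
TERMINAL = {"Success", "Failure", "Aborted"}

# Allowed next states for each state. Terminal states have no outgoing
# transitions: Resume and Rerun create a new Run instead of reopening one.
TRANSITIONS = {
    "Queued": {"Running", "Aborted"},
    "Running": {"Success", "Failure", "Aborted"},
    "Success": set(),
    "Failure": set(),
    "Aborted": set(),
}


def initial_state(operation_type: str) -> str:
    """Promote starts directly in Running; every other operation is queued first."""
    return "Running" if operation_type == "promote" else "Queued"


def can_transition(current: str, target: str) -> bool:
    """Return True when the lifecycle allows moving from current to target."""
    return target in TRANSITIONS[current]
```

Keeping terminal states without outgoing edges mirrors the rule that a finished Run is never reopened.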
+ +=== "Standard operations" + + Applies to Sync, Profile, Scan, External Scan, Export, and Materialize. + + ```mermaid + flowchart LR + Queued -->|worker picks up the Run| Running + Queued -.->|user clicks Abort| Aborted + Running -->|all containers processed| Success + Running -.->|unrecoverable error| Failure + Running -.->|user clicks Abort| Aborted + ``` + +=== "Promote" + + Promote skips `Queued` and starts directly in `Running`. + + ```mermaid + flowchart LR + Running -->|all entities processed| Success + Running -.->|unrecoverable error| Failure + Running -.->|user clicks Abort| Aborted + ``` + +Solid arrows: automatic transitions. Dotted arrows: user-initiated or error. + +!!! note "Warning indicator" + The Warning indicator is not a distinct status. A Run that ends in **Success** may show a :material-alert-circle:{ style="color: var(--q-warning);" } icon next to its ID when the worker recorded log entries during execution. The underlying status remains Success, and the Run is treated as a terminal Success everywhere downstream. See the [States at a glance](introduction.md#states-at-a-glance) table. + +A Run cannot move out of a terminal outcome, but it can be re-launched with **Resume** or **Rerun** (see [Available Actions](actions.md)). Each of those creates a brand-new Run rather than reopening the original. + +## Active vs terminal states + +| Phase | States | Action that drives the transition | +| :---- | :---- | :---- | +| Active | Queued, Running | Worker progress, unrecoverable error, or user **Abort** | +| Terminal | Success, Failure, Aborted | User clicks **Resume** or **Rerun** (creates a new Run) | + +## Per-container progress vs Run-level state + +Some operations (notably **Profile** and **Scan**) track progress at two layers: + +1. **Run-level state.** The badge on the Activity row and the terminal outcome for the whole operation. +2. 
**Container-level state.** The per-container result inside the expanded Run, with its own Success/Failure badge. + +These layers are independent. A container can land on **Failure** while the overall Run still finishes on **Success**. The failed container is recorded in the per-container Results tab, and the Run-level Success carries the Warning indicator next to the Run ID. The container-level badge uses the same component and palette as the Run-level badge. + +For **Sync**, **External Scan**, **Export**, **Materialize**, and **Promote**, all the work folds into a single Run-level state. There is no separate per-container badge. + +## Where logs live + +Two log scopes are exposed for every Run: + +- **Run-level log.** Surfaced in the expanded Run's **Logs** block and on the operation detail page. Captures setup, scheduling, worker-level events, and any unrecoverable error. +- **Container-level log.** Surfaced alongside each container row inside the expanded Run, when the operation has a per-container progress layer (Profile, Scan). Captures the per-container query and any error that prevented the container from finishing. + +Logs stream incrementally while a Run is in `Queued` or `Running`. Once the Run reaches a terminal state, the log is frozen. + +## See also + +- [Introduction](introduction.md). The purpose of Runs and the building blocks every state exposes. +- [Available Actions](actions.md). Abort, Resume, Rerun, and Delete: when each one is shown and what it does. +- For per-operation walkthroughs with screenshots and per-block tables, see the [By Operation Type](../getting-started.md#by-operation-type) section on the Runs overview. 
diff --git a/docs/operations/runs/deep-dive/permissions.md b/docs/operations/runs/deep-dive/permissions.md new file mode 100644 index 0000000000..5bcc11d5fe --- /dev/null +++ b/docs/operations/runs/deep-dive/permissions.md @@ -0,0 +1,35 @@ +# :material-shield-lock-outline:{ .middle style="color: var(--q-brick)" } Permissions + +Actions on a Run (Abort, Resume, Rerun, Delete) are gated by the **Editor** team permission on the target datastore. Without the right combination of a user role and a team permission on the datastore, the action buttons either do not appear in the expanded Run row or return a permission error when invoked. + +!!! note "Two permission systems" + Qualytics has two separate role systems, and a user needs the right combination of both to act on a Run: + + - **User Roles** (Member, Manager, Admin) control what a user can do across the entire workspace. + - **Team Permissions** (Reporter, Viewer, Drafter, Author, Editor) control what a user can do on specific datastores accessed through team membership. + +## Team permission matrix + +The same team-permission matrix applies to every action on a Run. Whether the action is supported by a specific operation is a separate question (see [Action support by operation type](#action-support-by-operation-type) below). 
+ +| Team permission | Can Abort | Can Resume | Can Rerun | Can Delete | +|---|:---:|:---:|:---:|:---:| +| Viewer | :material-close-circle-outline:{ .lg title="Not allowed" } | :material-close-circle-outline:{ .lg title="Not allowed" } | :material-close-circle-outline:{ .lg title="Not allowed" } | :material-close-circle-outline:{ .lg title="Not allowed" } | +| Reporter | :material-close-circle-outline:{ .lg title="Not allowed" } | :material-close-circle-outline:{ .lg title="Not allowed" } | :material-close-circle-outline:{ .lg title="Not allowed" } | :material-close-circle-outline:{ .lg title="Not allowed" } | +| Drafter | :material-close-circle-outline:{ .lg title="Not allowed" } | :material-close-circle-outline:{ .lg title="Not allowed" } | :material-close-circle-outline:{ .lg title="Not allowed" } | :material-close-circle-outline:{ .lg title="Not allowed" } | +| Author | :material-close-circle-outline:{ .lg title="Not allowed" } | :material-close-circle-outline:{ .lg title="Not allowed" } | :material-close-circle-outline:{ .lg title="Not allowed" } | :material-close-circle-outline:{ .lg title="Not allowed" } | +| Editor | :material-check-circle:{ .lg title="Allowed" } | :material-check-circle:{ .lg title="Allowed" } | :material-check-circle:{ .lg title="Allowed" } | :material-check-circle:{ .lg title="Allowed" } | + +The Editor team permission is granted on a per-datastore basis through team memberships. For a full reference of team permissions and how they are assigned, see [Team Permissions Overview](../../../settings/security/teams/team-permissions/overview.md){:target="_blank"}. + +!!! info "Admin bypass" + Users with the **Admin** user role bypass all team-permission checks and can act on any Run, regardless of team membership. + +## Action support by operation type + +Permission alone is not enough. The action must also be supported by the operation type. See [Available Actions](actions.md#support-matrix) for the full support matrix. 
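The two gates (permission and operation support) can be combined in one lookup. The following is a simplified sketch under stated assumptions: the function name, the lowercase operation identifiers, and the reduction of the permission rule to "Editor or Admin" are illustrative and not taken from the Qualytics codebase. The per-state actions and the unsupported combinations follow what the Runs pages describe.

```python
# Hypothetical helper; identifiers such as "external_scan" are assumed names.
ACTIONS_BY_STATE = {
    "Queued": {"Abort"},
    "Running": {"Abort"},
    "Success": {"Rerun", "Delete"},
    "Failure": {"Resume", "Rerun", "Delete"},
    "Aborted": {"Resume", "Rerun", "Delete"},
}

# Actions an operation type does not support, regardless of permission.
UNSUPPORTED_ACTIONS = {
    "sync": {"Resume"},
    "export": {"Resume"},
    "materialize": {"Resume"},
    "external_scan": {"Resume", "Rerun"},
}


def allowed_actions(state, operation_type, team_permission, user_role="Member"):
    """Return the actions a user could take on a Run in the given state."""
    # Gate 1: Editor team permission, or the Admin user role as a bypass.
    if user_role != "Admin" and team_permission != "Editor":
        return set()
    # Gate 2: drop actions the operation type does not support.
    return ACTIONS_BY_STATE[state] - UNSUPPORTED_ACTIONS.get(operation_type, set())
```

For example, `allowed_actions("Failure", "sync", "Editor")` leaves out Resume, while any non-Editor, non-Admin caller gets an empty set.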
+ +For per-operation permission details (configuring operation settings, scheduling recurring runs, and so on), see the operation's own Permissions page: + +- [Scan Permissions](../../scan/deep-dive/permissions.md) +- [Promote Permissions](../../promote/deep-dive/permissions.md) diff --git a/docs/operations/runs/faq.md b/docs/operations/runs/faq.md new file mode 100644 index 0000000000..20c99fc8a4 --- /dev/null +++ b/docs/operations/runs/faq.md @@ -0,0 +1,38 @@ +# :material-help-circle-outline:{ .middle style="color: var(--q-brick)" } FAQ + +## Why is the **Resume** button missing on my Sync, Export, or Materialize Run? + +By design. Sync, Export, and Materialize cannot pick up from where they stopped: each one operates on units of work that have to be re-applied in full to remain consistent. Aborted or failed Runs of these operations expose **Rerun** instead, which starts the operation from scratch. Profile, Scan, and Promote do support **Resume** because they track per-container progress and can continue processing only what was left. + +## What happens to partial results when a Run is **Aborted**? + +Partial results are preserved. Anomalies already written, containers already profiled, computed data already materialized, and any other work the Run completed before the abort remain in the system. The Run lands on **Aborted** with summary metrics reflecting only what was finished. **Resume** (where supported) continues with the unprocessed containers; **Rerun** starts the operation over from scratch. + +## What is the "Warning: Completed with logs" indicator next to a Run ID? + +It is not a separate status. The badge stays **Success** (green) and the Run is treated as a terminal Success everywhere it counts (Auto-Resolve, summary metrics, downstream consumers). 
 The :material-alert-circle:{ style="color: var(--q-warning);" } icon next to the Run ID only signals that the worker recorded log entries during the Run, usually because individual containers raised warnings or partial issues. Open the operation log from the expanded Run to review the entries before relying on the result. + +## Can I delete a Run while it is **Running** or **Queued**? + +No. The **Delete** button is only available after the Run reaches a terminal state (**Success**, **Failure**, or **Aborted**). To stop an in-progress Run, use **Abort**; once it lands on **Aborted**, **Delete** becomes available. + +## Does deleting a Run also remove the anomalies it created? + +No. **Delete** only removes the Run's own record from the Activity list. Anomalies, computed data, materialized rows, and any other downstream artifacts produced by the Run are kept and remain accessible from their respective pages. + +## Why can External Scan only be **Aborted**? + +External Scan records the result of a scan executed outside Qualytics (for example, from a CLI step in your own pipeline). Because the actual work happened outside the platform, there is nothing to **Resume** or **Rerun** from Qualytics's side. You can still **Abort** an External Scan that has not finished registering, and **Delete** it once it lands on a terminal state. + +## What permission do I need to take actions on a Run? + +The **Editor** team permission on the target datastore lets a user Abort, Resume, Rerun, and Delete any Run against that datastore. Viewer, Reporter, Drafter, and Author cannot act on Runs. The **Admin** user role bypasses team-permission checks. + +See also: + +- [Runs Permissions](deep-dive/permissions.md): full team-permission matrix. +- Operation-specific Permissions pages: [Scan Permissions](../scan/deep-dive/permissions.md) and [Promote Permissions](../promote/deep-dive/permissions.md), for permissions that go beyond Run actions (configuring the operation, scheduling). 
+ +## Where do I see logs for a failed Run? + +The operation log is reachable from the expanded Run row. Run-level errors (setup, scheduling, worker-level failures) live in the Run-level log; per-container errors live alongside each container row inside the expanded Run. diff --git a/docs/operations/runs/getting-started.md b/docs/operations/runs/getting-started.md new file mode 100644 index 0000000000..068e90db2e --- /dev/null +++ b/docs/operations/runs/getting-started.md @@ -0,0 +1,95 @@ +# :material-play-circle:{ .middle style="color: var(--q-brick)" } Runs + +A **Run** is a single execution of an operation against a source datastore. Every operation that fires (manually, on a schedule, or via the API) is recorded as a Run with its own status, timing, summary metrics, and downloadable result. + +![Activity tab listing recent Runs](../../assets/operations/runs/getting-started/activity-tab-runs.png) + +## Deep Dive + +
+ +- :material-information-outline:{ .lg .middle } **Introduction** + + --- + + What Runs are for, the parts a state exposes (badge, icon, action buttons, summary, logs), and the five states at a glance. + + [:octicons-arrow-right-24: Introduction](deep-dive/introduction.md) + +- :material-state-machine:{ .lg .middle } **Lifecycle** + + --- + + State diagram of how a Run moves between Queued, Running, Success, Failure, and Aborted, plus where to find the per-container progress and logs. + + [:octicons-arrow-right-24: Lifecycle](deep-dive/lifecycle.md) + +- **Available Actions** + + --- + + Abort, Resume, Rerun, and Delete: when each one is shown, what it does, and which operation types support it. + + [:octicons-arrow-right-24: Available Actions](deep-dive/actions.md) + +- :material-shield-lock-outline:{ .lg .middle } **Permissions** + + --- + + Team-permission matrix that governs which roles can Abort, Resume, Rerun, and Delete a Run. + + [:octicons-arrow-right-24: Permissions](deep-dive/permissions.md) + +
+ +## By Operation Type + +For end-to-end walkthroughs of what a Run looks like in each state on a specific operation type (with screenshots, per-block tables, and operation-specific quirks), see the dedicated page per state: + +
+ + + +- :material-database-search-outline:{ .lg .middle } **Scan** + + --- + + Walkthrough of what a Scan Run looks like in each state. + + [:octicons-arrow-right-24: Scan Runs](by-types/scan/success.md) + +
+ +## API & FAQ + +
+ +- :material-api:{ .lg .middle } **API** + + --- + + Endpoint reference for listing, fetching, and managing Runs (Abort, Resume, Rerun, Delete). + + [:octicons-arrow-right-24: API](api.md) + +- :material-help-circle-outline:{ .lg .middle } **FAQ** + + --- + + Common questions about Runs and their behavior. + + [:octicons-arrow-right-24: FAQ](faq.md) + +
diff --git a/docs/operations/scan/api.md b/docs/operations/scan/api.md new file mode 100644 index 0000000000..c7ba74c319 --- /dev/null +++ b/docs/operations/scan/api.md @@ -0,0 +1,189 @@ +# :material-api:{ .middle style="color: var(--q-brick)" } Scan API + +This page provides payload examples for running, scheduling, and checking the status of scan operations. Replace the placeholder values with data specific to your setup. + +All endpoints use the base URL of your Qualytics deployment (e.g., `https://your-instance.qualytics.io/api`). + +!!! tip "Interactive API reference" + For the full, interactive API reference (request schemas, response examples, and an in-browser request runner), visit [demo.qualytics.io/api/docs](https://demo.qualytics.io/api/docs){:target="_blank"}. + +## Running a Scan operation + +To run a scan operation, use the API payload example below and replace the placeholder values with your specific values. + +### Endpoint (Post) + +**Endpoint**: `POST /api/operations/run` + +=== "Option I: Running a Scan operation of all containers" + * **container_names:** `[]` means that it will scan all containers. + * **max_records_analyzed_per_partition:** `null` means that it will scan all records of all containers. + * **Remediation:** `append` replicates source containers using an append-first strategy. + * **auto_resolve_passed_anomalies:** `true` (default for Full scans) automatically resolves previously open anomalies when the same checks no longer detect the issue. Forced to `false` server-side when `incremental` is `true`. 
+ + ```json + { + "type":"scan", + "name":null, + "datastore_id": 42, + "container_names":[], + "remediation":"append", + "incremental":false, + "auto_resolve_passed_anomalies":true, + "max_records_analyzed_per_partition":null, + "enrichment_source_record_limit":10 + } + ``` + +=== "Option II: Running a Scan operation of specific containers" + * **container_names:** `["table_name_1", "table_name_2"]` means that it will scan only the tables table_name_1 and table_name_2. + * **max_records_analyzed_per_partition:** `1000000` means that it will scan a maximum of 1 million records per partition. + * **Remediation:** `overwrite` replicates source containers using an overwrite strategy. + + ```json + { + "type":"scan", + "name":null, + "datastore_id": 42, + "container_names":[ + "table_name_1", + "table_name_2" + ], + "remediation":"overwrite", + "max_records_analyzed_per_partition":1000000, + "enrichment_source_record_limit":10 + } + ``` + +## Scheduling a Scan operation of all containers + +To schedule a scan operation, use the API payload example below and replace the placeholder values with your specific values. 
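The run and schedule payloads on this page share the same fields, with `crontab` added for schedules. A small client-side helper can make that relationship explicit. This is only a sketch: `build_scan_payload` is a hypothetical function, not part of any Qualytics SDK, and its defaults are taken from the examples on this page, not from an API schema. It also mirrors the documented server-side rule that `auto_resolve_passed_anomalies` is forced to `false` when `incremental` is `true`.

```python
from typing import Optional


def build_scan_payload(
    datastore_id: int,
    container_names: Optional[list] = None,
    *,
    name: Optional[str] = None,
    remediation: str = "none",
    incremental: bool = False,
    auto_resolve_passed_anomalies: bool = True,
    max_records_analyzed_per_partition: Optional[int] = None,
    enrichment_source_record_limit: int = 10,
    crontab: Optional[str] = None,
) -> dict:
    """Assemble a scan request body (hypothetical helper, not an official client)."""
    payload = {
        "type": "scan",
        "name": name,
        "datastore_id": datastore_id,
        # An empty list means "scan all containers".
        "container_names": list(container_names or []),
        "remediation": remediation,
        "incremental": incremental,
        # Mirror the server: Auto-Resolve never applies to Incremental scans.
        "auto_resolve_passed_anomalies": auto_resolve_passed_anomalies and not incremental,
        # None means "scan all records".
        "max_records_analyzed_per_partition": max_records_analyzed_per_partition,
        "enrichment_source_record_limit": enrichment_source_record_limit,
    }
    if crontab is not None:
        # Only schedule requests carry a cron expression.
        payload["crontab"] = crontab
    return payload
```

Calling `build_scan_payload(42, crontab="0 0 * * *", name="My scheduled Scan operation")` yields a body shaped like the schedule example, while omitting `crontab` yields one shaped like the run examples.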
+ +### Endpoint (Post) + +**Endpoint**: `POST /api/operations/schedule` + +This payload is to run a scheduled scan operation every day at 00:00 + +=== "Scheduling a Scan operation of all containers" + + ```json + { + "type":"scan", + "name":"My scheduled Scan operation", + "datastore_id":"datastore-id", + "container_names":[], + "remediation": "overwrite", + "incremental": false, + "auto_resolve_passed_anomalies": true, + "max_records_analyzed_per_partition":null, + "enrichment_source_record_limit":10, + "crontab":"0 0 * * *" + } + ``` + +## Retrieving Scan operation information + +### Endpoint (Get) + +**Endpoint**: `GET /api/operations/{id}` + +The `status` object includes `auto_resolved_anomaly_count`, the number of previously open anomalies this scan automatically resolved (always `0` for Incremental scans and for Full scans that ran with `auto_resolve_passed_anomalies` set to `false`). + +=== "Example result response" + ```json + { + "items": [ + { + "id": 12345, + "created": "YYYY-MM-DDTHH:MM:SS.ssssssZ", + "type": "scan", + "start_time": "YYYY-MM-DDTHH:MM:SS.ssssssZ", + "end_time": "YYYY-MM-DDTHH:MM:SS.ssssssZ", + "result": "success", + "message": null, + "triggered_by": "user@example.com", + "datastore": { + "id": 101, + "name": "Datastore-Sample", + "store_type": "jdbc", + "type": "db_type", + "enrich_only": false, + "enrich_container_prefix": "data_prefix", + "favorite": false + }, + "schedule": null, + "incremental": false, + "auto_resolve_passed_anomalies": true, + "remediation": "none", + "max_records_analyzed_per_partition": -1, + "greater_than_time": null, + "greater_than_batch": null, + "high_count_rollup_threshold": 10, + "enrichment_source_record_limit": 10, + "status": { + "total_containers": 2, + "containers_analyzed": 2, + "partitions_scanned": 2, + "records_processed": 28, + "anomalies_identified": 2, + "auto_resolved_anomaly_count": 1 + }, + "containers": [ + { + "id": 234, + "name": "Container1", + "container_type": "table", + "table_type": 
"table" + }, + { + "id": 235, + "name": "Container2", + "container_type": "table", + "table_type": "table" + } + ], + "container_scans": [ + { + "id": 456, + "created": "YYYY-MM-DDTHH:MM:SS.ssssssZ", + "container": { + "id": 235, + "name": "Container2", + "container_type": "table", + "table_type": "table" + }, + "start_time": "YYYY-MM-DDTHH:MM:SS.ssssssZ", + "end_time": "YYYY-MM-DDTHH:MM:SS.ssssssZ", + "records_processed": 8, + "anomaly_count": 1, + "auto_resolved_anomaly_count": 1, + "result": "success", + "message": null + }, + { + "id": 457, + "created": "YYYY-MM-DDTHH:MM:SS.ssssssZ", + "container": { + "id": 234, + "name": "Container1", + "container_type": "table", + "table_type": "table" + }, + "start_time": "YYYY-MM-DDTHH:MM:SS.ssssssZ", + "end_time": "YYYY-MM-DDTHH:MM:SS.ssssssZ", + "records_processed": 20, + "anomaly_count": 1, + "auto_resolved_anomaly_count": 0, + "result": "success", + "message": null + } + ], + "tags": [] + } + ], + "total": 1, + "page": 1, + "size": 50, + "pages": 1 + } + ``` diff --git a/docs/operations/scan/deep-dive/permissions.md b/docs/operations/scan/deep-dive/permissions.md new file mode 100644 index 0000000000..a9f1ee9b5b --- /dev/null +++ b/docs/operations/scan/deep-dive/permissions.md @@ -0,0 +1,22 @@ +# :material-database-search:{ .middle style="color: var(--q-brick)" } Permissions + +The Scan Operation is gated by the **Editor** team permission on the target datastore. The same permission level is required to run an ad-hoc scan, schedule a recurring scan, edit a schedule, or configure any of the Scan Settings (including Auto Resolve Anomalies). + +!!! note "Two Permission Systems" + Qualytics has two separate role systems. **User Roles** (Member, Manager, Admin) control what a user can do across the entire workspace. **Team Permissions** (Reporter, Viewer, Drafter, Author, Editor) control what a user can do on specific datastores they have access to through team membership. 
 A user needs the right combination of both to run a scan. + +| Team permission | Can run a scan | Can schedule a scan | Can configure Scan Settings | +|---|:---:|:---:|:---:| +| Viewer | :material-close-circle-outline:{ .lg title="Not allowed" } | :material-close-circle-outline:{ .lg title="Not allowed" } | :material-close-circle-outline:{ .lg title="Not allowed" } | +| Reporter | :material-close-circle-outline:{ .lg title="Not allowed" } | :material-close-circle-outline:{ .lg title="Not allowed" } | :material-close-circle-outline:{ .lg title="Not allowed" } | +| Drafter | :material-close-circle-outline:{ .lg title="Not allowed" } | :material-close-circle-outline:{ .lg title="Not allowed" } | :material-close-circle-outline:{ .lg title="Not allowed" } | +| Author | :material-close-circle-outline:{ .lg title="Not allowed" } | :material-close-circle-outline:{ .lg title="Not allowed" } | :material-close-circle-outline:{ .lg title="Not allowed" } | +| Editor | :material-check-circle:{ .lg title="Allowed" } | :material-check-circle:{ .lg title="Allowed" } | :material-check-circle:{ .lg title="Allowed" } | + +Editor permission is granted on a per-datastore basis through team memberships. For a full reference of team permissions and how they are assigned, see [Team Permissions Overview](../../../settings/security/teams/team-permissions/overview.md){:target="_blank"}. + +!!! info "Admin Bypass" + Users with the **Admin** workspace role bypass all team-permission checks and can run, schedule, and configure scans on any datastore, regardless of team membership. + +!!! note "Dry Run is a separate gate" + Quality check Dry Runs are governed by the **Drafter** team permission and do not require Editor. 
diff --git a/docs/operations/scan/deep-dive/read-strategies.md b/docs/operations/scan/deep-dive/read-strategies.md new file mode 100644 index 0000000000..0f77caae0b --- /dev/null +++ b/docs/operations/scan/deep-dive/read-strategies.md @@ -0,0 +1,61 @@ +# :material-database-search:{ .middle style="color: var(--q-brick)" } Read Strategies + +A Scan operation reads data using one of two strategies: **Incremental** or **Full**. The choice affects how much data is processed, which records become candidates for anomaly detection, and whether previously open anomalies can be automatically resolved at the end of the scan. + +## Incremental + +Incremental scans process only the new or updated records since the previous scan operation. They rely on an [**incremental identifier**](../../../glossary.md#incremental) declared on the container (a timestamp column or a monotonically increasing batch column). Records whose identifier value is less than or equal to the highest value seen in the prior scan are skipped. + +!!! info "First incremental scan" + The first incremental scan against a container behaves like a Full scan, since there is no prior baseline. After it completes, Qualytics stores the highest identifier value and uses it as the [starting threshold](../faq.md#what-is-a-starting-threshold-and-when-do-i-need-one) for the next run. + +Incremental scans are designed for routine pipelines where only new or changed records need to be re-checked. They save compute and finish faster than a Full scan on the same dataset, but they cannot re-evaluate records they did not read, including records that previously caused open anomalies and that may have since been corrected upstream. + +!!! warning + If a selected container does not have an incremental identifier configured, Qualytics falls back to a Full read for that container even when the Read Strategy is set to Incremental. + +## Full + +Full scans process every record in each selected container, regardless of any prior scan. 
They are the only way to verify the entire dataset against every check in a single operation, and they are the only strategy under which Auto-Resolve is evaluated. + +Full scans are well-suited for periodic deep checks, for re-baselining a container after upstream changes, or for any scenario where Incremental cannot be used (no incremental identifier, file pattern without ordering guarantees, etc.). + +## Auto-Resolve on Full Scans + +When a Full scan completes successfully with **Auto Resolve Anomalies** enabled, Qualytics reconciles previously open anomalies against the work this scan actually did and resolves the ones that no longer apply. + +### When it runs + +Auto-Resolve runs exactly once per scan, at the end, and only when **all** of the following are true: + +- the operation is a Scan, not a Profile or Sync; +- the read strategy is **Full** (Incremental scans are excluded by design); +- the operation finished with `success` (Failure and Aborted operations do not trigger Auto-Resolve); +- the **Auto Resolve Anomalies** toggle was enabled when the scan started. + +### What gets resolved + +The candidates are anomalies that are currently in an [open status](../../../anomalies/status.md): **Active** or **Acknowledged**. Anomalies that are already in **Resolved**, **Invalid**, **Duplicate**, or **Discarded** are not re-evaluated and are not touched. + +For each candidate anomaly, Qualytics resolves it only when both conditions hold: + +1. **Every check that originally flagged the anomaly ran successfully in this scan.** If even one of those checks was not asserted (for example, the container was excluded from this run, or the check was archived), the anomaly is left as-is. +2. **None of those same checks raised the same issue against the same [fingerprint](../../../anomalies/fingerprints.md) in this scan.** If the new run produced a new anomaly for the same field combination, the previous anomaly stays open and the new one is recorded independently. 
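The eligibility rules and the two per-anomaly conditions above can be sketched as two predicates. This is an illustrative model only: the `Anomaly` class, its fields, and both function names are assumptions made for this example and do not describe the actual Qualytics implementation.

```python
from dataclasses import dataclass

# Open statuses that make an anomaly a candidate for Auto-Resolve.
OPEN_STATUSES = {"Active", "Acknowledged"}


@dataclass(frozen=True)
class Anomaly:
    """Hypothetical, minimal view of a previously recorded anomaly."""
    status: str
    fingerprint: str
    check_ids: frozenset  # checks that originally flagged the anomaly


def scan_triggers_auto_resolve(op_type, incremental, result, toggle_enabled):
    """All four scan-level conditions must hold for Auto-Resolve to run."""
    return op_type == "scan" and not incremental and result == "success" and toggle_enabled


def should_resolve(anomaly, checks_run_ok, flagged_now):
    """Decide whether one candidate anomaly is resolved by this scan.

    checks_run_ok: ids of checks that ran successfully in this scan.
    flagged_now: (check_id, fingerprint) pairs raised by this scan.
    """
    if anomaly.status not in OPEN_STATUSES:
        return False  # already archived; not re-evaluated
    if not anomaly.check_ids <= checks_run_ok:
        return False  # condition 1: every originating check must have run
    # Condition 2: none of those checks flagged the same fingerprint again.
    return not any((c, anomaly.fingerprint) in flagged_now for c in anomaly.check_ids)
```

Both predicates are deliberately conservative: any missing check or repeated fingerprint leaves the anomaly open.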
+ +When both conditions hold, the anomaly's status is set to **Resolved** and the scan that resolved it is recorded as part of the anomaly's history. + +### What the user sees + +The resolution is reflected on three surfaces: + +- **Operation summary:** the **Anomalies Auto-Resolved** count appears alongside the existing **Anomalies Identified** count once the scan finishes. See [Scan: Success](../../runs/by-types/scan/success.md). +- **Scan Results modal:** a dedicated **Auto-Resolved** tab lists the previously open anomalies this scan resolved. See the [Scan: Success](../../runs/by-types/scan/success.md) page (Results tab section). +- **Anomaly history:** each auto-resolved anomaly records an entry attributed to Qualytics, referencing the resolving scan. + +### Why Incremental scans never auto-resolve + +Auto-Resolve depends on the scan having read the records that originally flagged the anomaly. An Incremental scan, by definition, only reads records that arrived since the last scan, so it cannot confirm whether older records still violate a check. Allowing Auto-Resolve under Incremental would risk resolving anomalies on records that were never re-read in the current run. For this reason the toggle is hidden in the Scan Settings step when Incremental is selected, and any value sent through the API is forced off before the scan starts. + +### Permissions + +Auto-Resolve does not introduce a new permission. Any user with the **Editor** team permission on the target datastore (the same permission required to run or schedule a scan) can enable or disable the toggle. See [Permissions](permissions.md). 
diff --git a/docs/operations/scan/deep-dive/scan-settings.md b/docs/operations/scan/deep-dive/scan-settings.md new file mode 100644 index 0000000000..f380aeb13d --- /dev/null +++ b/docs/operations/scan/deep-dive/scan-settings.md @@ -0,0 +1,51 @@ +# :material-database-search:{ .middle style="color: var(--q-brick)" } Scan Settings + +This page is a conceptual reference for the settings exposed in the **Scan Settings** step of the scan form. For the step-by-step UI walkthrough, see [Scan Settings: Step 4 of 5](../how-tos/scan-settings.md). + +## Anomaly Options + +Anomaly options govern how anomalies are handled across consecutive scans of the same containers. They affect lifecycle decisions only, never the detection logic itself. + +### Archive Duplicate Anomalies + +When enabled, anomalies generated by the current scan that exactly match anomalies from a previous scan (same check, same [fingerprint](../../../anomalies/fingerprints.md)) are automatically archived with status **Duplicate**. This prevents the same finding from appearing as a new open anomaly every time the scan runs. The archived anomaly stays linked to the original so analysts can trace the duplicate chain. + +Use this option when you expect the same anomaly to recur across scans and you want to keep the open list focused on truly new findings. + +### Reactivate Recurring Anomalies + +When enabled, an anomaly produced by the current scan that matches an **archived** anomaly from a previous scan reactivates the original instead of creating a new one. A **Fingerprint** column is written to the [Enrichment Datastore](../../../enrichment/enrichment-tables.md#_failed_checks-table) to support this match. This is useful when the same underlying issue resurfaces after being resolved or archived, since it preserves the anomaly's existing history and tags. 
+ +### Auto Resolve Anomalies + +When enabled, previously open anomalies ([Active or Acknowledged](../../../anomalies/status.md)) are automatically resolved at the end of a Full scan whose checks no longer detect the same issue. This option is hidden and does not apply to Incremental scans. It is enabled by default for Full scans. The resolving scan is attributed in each auto-resolved anomaly's history. + +For the eligibility rules and full behavior, see [Auto-Resolve on Full Scans](read-strategies.md#auto-resolve-on-full-scans). + +## Maximum Record Anomalies per Check + +The Maximum Record Anomalies per Check setting caps the number of record-level anomalies emitted per check. Once that cap is reached, additional findings are merged into a single rolled-up [shape anomaly](../../../anomalies/types.md#shape-anomaly) that preserves the total violation count. The setting does not limit detection: every violation is still found, but results above the threshold are presented as one consolidated shape anomaly instead of individual record anomalies. Use a higher limit when you need each anomaly individually for downstream remediation; use a lower limit when you only need to know that a check failed at scale. + +The default is **10**. The maximum is **1,000**: values above 1,000 are silently capped to 1,000. + +If the field is left blank in the scan form, the value is inherited from the datastore's [Enrichment Settings](../../../source-datastore/enrichment-datastore/how-tos/link-enrichment.md#enrichment-settings). + +## Maximum Source Examples per Anomaly + +This setting controls how many **source records** are stored in the Enrichment Datastore for each detected anomaly. The available limits are **10**, **100**, **1,000**, or **10,000** records. Source records are the rows that caused the anomaly and are the only records that can be viewed or downloaded later from the anomaly details. + +The limit must be set **before** the scan runs. 
Changing it afterward does not retroactively expand the captured records. If you need more records for an existing anomaly, raise the limit and re-run the scan. + +If the field is left blank in the scan form, the value is inherited from the datastore's Enrichment Settings. + +## Field Masking and Scanning + +Scan operations run normally on [masked fields](../../../fields/field-status/concepts/field-masking.md). Masking does not affect anomaly detection or quality check execution. The platform evaluates all checks using the actual source data. + +When scan results are displayed, masked field values are obfuscated in the following surfaces: + +- **Anomaly Source Records:** values are hidden by default; users with Editor permission can reveal them per anomaly. +- **Anomaly Descriptions:** check failure messages permanently show `` in place of actual values; the original value is not stored. +- **Enrichment Datastore:** source record values written during a [Materialize operation](../../materialize-operation/materialize-operation.md#field-masking-and-materialize) are obfuscated for masked fields. + +For more details, see [Masked Fields in Source Records](../../../anomalies/deep-dive/source-record.md#masked-fields-in-source-records). diff --git a/docs/operations/scan/faq.md b/docs/operations/scan/faq.md new file mode 100644 index 0000000000..a08652f399 --- /dev/null +++ b/docs/operations/scan/faq.md @@ -0,0 +1,146 @@ +# :material-help-circle-outline:{ .middle style="color: var(--q-brick)" } Scan FAQ + +Answers to common questions about the Scan Operation, grouped by topic. For step-by-step instructions, see the [how-tos](how-tos/select-tables.md); for conceptual references, see the [deep dive](deep-dive/read-strategies.md). + +## Read strategy + +### When should I choose Incremental over Full? + +Use **Incremental** for routine scans where only new or changed records need to be re-validated. 
It reads less data, finishes faster, and is the right choice for high-frequency post-load runs. Use **Full** when you need a complete re-validation across the entire dataset, when checks were edited and you want every record re-evaluated, or when you want Auto-Resolve to clear anomalies that no longer reproduce. See [Read Strategies](deep-dive/read-strategies.md) for the conceptual reference. + +### What happens if a selected container doesn't have an incremental identifier? + +The scan falls back to a Full read on that specific container, even when the Read Strategy is set to Incremental. Containers in the same scan that do have an incremental identifier still use Incremental; the fallback is per container, not per scan. + +### Does the first Incremental scan really run a Full scan? + +Yes. The first Incremental scan has no prior baseline, so it processes every record in the container and stores the highest incremental identifier value. From the second run on, only records with a higher identifier are processed. + +### What is a Starting Threshold and when do I need one? + +The Starting Threshold is an optional value (a timestamp for time-based incremental, a numeric value for batch-based) that tells the scan where to start reading. You only need it when you want to override the automatic baseline, for example after a backfill where you want to re-scan a specific range of history. + +## Auto Resolve Anomalies + +### Why didn't Auto-Resolve run on my Incremental scan? + +Auto-Resolve only runs after a successful **Full** scan. Incremental scans, by definition, do not read every record in the container, so they cannot confirm whether records that previously caused an open anomaly still violate the check. To prevent false resolutions, the **Auto Resolve Anomalies** toggle is hidden during configuration of Incremental scans, and any value sent through the API is forced off before the scan starts. 
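
For API callers, the forced-off rule can be mirrored client-side so the payload you send already matches what the scan will use. The sketch below is a hypothetical helper, not part of any Qualytics SDK: `normalize_scan_payload` is an invented name, and only the `incremental` and `auto_resolve_passed_anomalies` field names are taken from the scan API description in this FAQ.

```python
from typing import Any, Dict


def normalize_scan_payload(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the payload with Auto-Resolve off for Incremental scans.

    Mirrors the documented rule that `auto_resolve_passed_anomalies` is forced
    to false when `incremental` is true. The input dict is not modified.
    """
    normalized = dict(payload)
    if normalized.get("incremental"):
        normalized["auto_resolve_passed_anomalies"] = False
    return normalized
```

Applying the rule before sending the request keeps logs and stored schedule definitions consistent with the behavior the platform enforces anyway.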
+ +To auto-resolve anomalies, run the scan with the **Full** read strategy. + +### Can I disable Auto Resolve Anomalies for a specific scan? + +Yes. The **Auto Resolve Anomalies** toggle in the **Scan Settings** step controls Auto-Resolve per scan run, and the same field is exposed on schedule create/update. Setting it to off keeps existing open anomalies untouched after the scan finishes. See [Scan Settings](how-tos/scan-settings.md). + +### Which anomalies are auto-resolved? + +Only anomalies currently in [**Active** or **Acknowledged**](../../anomalies/status.md) status are eligible. Anomalies already in **Resolved**, **Invalid**, **Duplicate**, or **Discarded** are left untouched. For an anomaly to be resolved, **all** the checks that originally flagged it must run successfully in this scan, and **none** of those checks may raise the same issue against the same fingerprint again. See [Auto-Resolve on Full Scans](deep-dive/read-strategies.md#auto-resolve-on-full-scans) for the full eligibility rules. + +### How is the resolving scan attributed to the anomaly? + +When a scan auto-resolves an anomaly, the resolution is recorded as a status change in the anomaly's history. The change is attributed to Qualytics with the scan operation referenced, so analysts can open the history of any auto-resolved anomaly and trace it back to the scan that produced the resolution. + +### What if a check that previously flagged an anomaly isn't included in the new scan? + +The anomaly is left as-is. Auto-Resolve requires every check that originally flagged the anomaly to run successfully in the current scan; if even one of those checks was skipped, archived, or excluded by the table or category selection, the anomaly is not resolved. This protects against accidental resolutions when the scope of a scan narrows. + +### Does Auto-Resolve change the anomaly's status to Resolved or to something else? + +Resolved. 
The auto-resolved anomaly's final status is **Resolved**, the same status used by manual resolution and Flow actions. Downstream Flows and reports that filter on Resolved status will include auto-resolved anomalies. + +## Anomaly handling + +### What's the difference between Archive Duplicate Anomalies and Reactivate Recurring Anomalies? + +Archive Duplicate sends a brand-new anomaly that matches an existing open one straight to archived status, keeping the open list focused on truly new findings. Reactivate Recurring does the opposite for archived ones: when a new anomaly matches an archived one, the original is reactivated (status moves back to Active) instead of staying archived. They cover opposite sides of the same de-duplication problem. + +### Why are anomaly counts higher than I expected after enabling Reactivate Recurring? + +Reactivate Recurring moves anomalies back from Archived to Active when the same issue resurfaces. If a check repeatedly flags the same fingerprint over time, the same anomaly will keep flipping back to Active rather than producing new entries. Check the anomaly's history to confirm whether it has been reactivated multiple times. + +### Does increasing Maximum Source Examples per Anomaly affect storage? + +Yes. Source examples are written to the [Enrichment Datastore](../../source-datastore/enrichment-datastore/getting-started.md) at scan time. A limit of 10,000 can capture up to 1,000 times as many rows per anomaly as the default of 10, and storage in the Enrichment Datastore grows accordingly. Use a higher limit only when you actually need many examples for debugging, then lower it again. + +### Can I change Maximum Source Examples after the scan has finished? + +No. The limit applies at scan time, so once the scan finishes the captured source records are fixed. To capture more, raise the limit and re-run the scan. + +### How does Maximum Record Anomalies per Check work?
+ +When a single check produces more anomalies than the configured limit in one container, the additional anomalies are rolled up into a single anomaly that preserves the total count. This keeps the open anomaly list manageable while still recording the magnitude of the violation. + +## Scheduling + +### What timezone do scheduled scans use if I don't pick one? + +**UTC** is the default for every new and existing schedule. Existing schedules created before timezone support continue to run in UTC unless explicitly edited. + +### Can a deactivated schedule be reactivated without re-entering the cron expression? + +Yes. Deactivating a schedule keeps its cron expression on file, so reactivating it later resumes the same schedule without setup. Exception: schedules deactivated before May 7, 2026 may need the cron expression re-entered once after reactivation. + +### How does Daylight Saving Time affect scheduled scans? + +Schedules in DST-observant timezones (for example, `America/New_York`) automatically shift across DST transitions. A job set to 9:00 AM in `America/New_York` runs at 9:00 AM local time year-round, regardless of whether the zone is in EST or EDT. + +### Can I run multiple schedules on the same datastore? + +Yes. Each schedule is independent, with its own cron expression, timezone, and Scan Settings. Use this to run, for example, a fast Incremental scan every hour and a Full scan with Auto-Resolve once a day on the same datastore. + +## Permissions + +### Who can run, schedule, or configure a scan? + +Users with **Editor** team permission on the target datastore can run ad-hoc scans, create and edit schedules, and configure all Scan Settings (including Auto Resolve). Viewer, Reporter, and Drafter cannot. See [Permissions](deep-dive/permissions.md) for the full matrix. + +### Are masked field values revealed during a scan? + +Scans run normally against masked fields, and masking does not affect anomaly detection or check execution. 
However, when results are displayed, masked values are obfuscated in anomaly source records, anomaly descriptions, and the Enrichment Datastore. Users with Editor permission can reveal masked source-record values per anomaly. + +### Do I need Editor on every datastore, or just one? + +You need Editor on each datastore you want to scan. Permissions are evaluated per datastore, so a user can be Editor on one datastore and Viewer on another. + +## Results and history + +### Where do I see the operation summary after the scan finishes? + +In the [Activity tab](../../explore/activity.md) of the datastore. Each row shows the operation status, duration, and inline counters (including Anomalies Identified and Anomalies Auto-Resolved when applicable). Click the row to expand the full summary card. + +### Can I rerun a scan with the same settings as a previous run? + +Yes. Open the operation from the Activity tab and click **Rerun**. The new operation inherits the source containers, check categories, read strategy, and Scan Settings of the original. + +### Can I resume a scan that was aborted? + +Yes, for Scan and Profile operations. Open the aborted operation in Activity and click **Resume**, and the system continues from where it stopped instead of restarting. + +### What do the Identified and Auto-Resolved tabs in the Scan Results modal mean? + +The **Identified** tab lists anomalies that this scan detected (whether they ended up Active, Acknowledged, or were rolled up). The **Auto-Resolved** tab lists previously open anomalies that this scan automatically resolved because the same checks ran again and no longer found the issue. The Auto-Resolved tab is hidden when the scan ran as Incremental or when Auto Resolve was disabled. + +## API + +### How do I run a scan via the API? + +POST to `/api/operations/run` with `"type": "scan"` and the datastore ID. 
See [Running a Scan operation](api.md#running-a-scan-operation) for the full payload, including the optional `auto_resolve_passed_anomalies` field. + +### Can I pass runtime variables when running a scan via the API? + +Yes. Reference variables in your check definitions with `{{ variable_name }}` and pass values in the `check_variables` field of the scan payload. See [Use Runtime Variables](how-tos/use-runtime-variables.md). + +### How do I check the status of a running scan via the API? + +GET `/api/operations/{id}` to retrieve the current state. The response includes the `status` object with all counters (records processed, anomalies identified, anomalies auto-resolved) as the scan progresses. + +### Can I schedule a scan via the API? + +Yes. POST to `/api/operations/schedule` with the scan payload plus a `crontab` field. The same `auto_resolve_passed_anomalies` rules apply: it is forced to `false` when `incremental` is `true`. + +## Related + +- [Read Strategies](deep-dive/read-strategies.md): Incremental vs Full and Auto-Resolve behavior. +- [Scan Settings (deep dive)](deep-dive/scan-settings.md): conceptual reference for every scan setting. +- [Permissions](deep-dive/permissions.md): full team-permission matrix. +- [Troubleshooting](troubleshooting.md): resolution steps for known errors. diff --git a/docs/operations/scan/getting-started.md b/docs/operations/scan/getting-started.md new file mode 100644 index 0000000000..661b3c1066 --- /dev/null +++ b/docs/operations/scan/getting-started.md @@ -0,0 +1,148 @@ +# :material-database-search-outline:{ .middle style="color: var(--q-brick)" } Scan Operation + +The Scan Operation runs a datastore's data quality checks against its containers (tables, views, or file patterns) and writes every identified anomaly to the linked Enrichment Datastore. 
Defaults for source examples and record-anomaly limits come from the datastore's [Enrichment Settings](../../source-datastore/enrichment-datastore/how-tos/link-enrichment.md#enrichment-settings) and can be overridden in the scan form. + +!!! note + The Scan Operation can only run after the **Sync** and **Profile** operations have completed for the datastore. + +A scan identifies two kinds of anomaly: + +- **Record Anomalies:** A single record (row) flagged as anomalous, with details on why. The simplest example is a row missing an expected value for a field. + +- **Shape Anomalies:** Structural issues at the column or schema level, such as missing fields or inconsistent patterns across the dataset. + +Within the wizard you can: + +- Choose between an incremental load and a full load. +- Automatically resolve previously open anomalies that no longer flag on a Full scan. +- Limit the number of records scanned. +- Pick which tables or file patterns to include. +- Schedule the scan to run later. + +To open the Scan Operation modal, navigate to a source datastore from the side menu and click the **Run** button under **Scan** in the datastore's overview tab. The modal opens at Step 1 (Select Tables) and the stepper at the top shows the full configuration flow. + +![Scan Operation modal overview](../../assets/operations/scan/getting-started/getting-started-1.png) + +## Deep Dive + +
+ +- :material-database-search:{ .lg .middle } **Read Strategies** + + --- + + Incremental vs Full and Auto-Resolve behavior. + + [:octicons-arrow-right-24: Read Strategies](deep-dive/read-strategies.md) + +- :material-database-search:{ .lg .middle } **Scan Settings** + + --- + + Conceptual reference for every setting in the scan form. + + [:octicons-arrow-right-24: Scan Settings](deep-dive/scan-settings.md) + +- :material-shield-lock-outline:{ .lg .middle } **Permissions** + + --- + + Who can run, schedule, and configure scans. + + [:octicons-arrow-right-24: Permissions](deep-dive/permissions.md) + +
+ +## How-tos + +The numbered cards walk through each step of the wizard. The two unnumbered cards cover post-scan analysis and the API helper for runtime variables. + +
+ +- :material-numeric-1-circle:{ .lg .middle } **Select Tables** + + --- + + Choose the containers to scan: All, Specific, or by Tag. + + [:octicons-arrow-right-24: Select Tables](how-tos/select-tables.md) + +- :material-numeric-2-circle:{ .lg .middle } **Select Check Categories** + + --- + + Choose Metadata, Data Integrity, or both. + + [:octicons-arrow-right-24: Select Check Categories](how-tos/select-check-categories.md) + +- :material-numeric-3-circle:{ .lg .middle } **Read Settings** + + --- + + Pick Incremental or Full, set an optional starting threshold, and the record limit. + + [:octicons-arrow-right-24: Read Settings](how-tos/read-settings.md) + +- :material-numeric-4-circle:{ .lg .middle } **Scan Settings** + + --- + + Anomaly Options (including Auto Resolve), record-anomaly limits, and source examples. + + [:octicons-arrow-right-24: Scan Settings](how-tos/scan-settings.md) + +- :material-numeric-5-circle:{ .lg .middle } **Schedule Options** + + --- + + Set up a recurring run, or skip this step and use **Run Now**. + + [:octicons-arrow-right-24: Schedule Options](how-tos/schedule-options.md) + +- :material-chart-bar:{ .lg .middle } **Interpret Scan Results** + + --- + + Walk through the Activity row, the operation detail page, and what the Run looks like in each state. + + [:octicons-arrow-right-24: Scan Runs by state](../runs/by-types/scan/success.md) + +- :material-code-tags:{ .lg .middle } **Use Runtime Variables** + + --- + + Pass check variables at scan time via the API. + + [:octicons-arrow-right-24: Use Runtime Variables](how-tos/use-runtime-variables.md) + +
+ +## Reference + +
+ +- :material-tools:{ .lg .middle } **Troubleshooting** + + --- + + Resolution steps for known errors. + + [:octicons-arrow-right-24: Troubleshooting](troubleshooting.md) + +- :material-api:{ .lg .middle } **API** + + --- + + Payload examples for run, schedule, and retrieve. + + [:octicons-arrow-right-24: API](api.md) + +- :material-help-circle-outline:{ .lg .middle } **FAQ** + + --- + + Common questions. + + [:octicons-arrow-right-24: FAQ](faq.md) + +
diff --git a/docs/operations/scan/how-tos/read-settings.md b/docs/operations/scan/how-tos/read-settings.md new file mode 100644 index 0000000000..38b0411fa4 --- /dev/null +++ b/docs/operations/scan/how-tos/read-settings.md @@ -0,0 +1,145 @@ +# :material-numeric-3-circle:{ .middle style="color: var(--q-brick)" } Read Settings + +This is **Step 3 of 5** in the Scan Operation modal. You decide **how the scan reads records**: which records to include (Incremental or Full), an optional starting point inside that range (available only with Incremental), and a per-container cap on how many records to scan. + +!!! note "Step disabled when only Metadata is selected in Step 2" + If you unchecked **Data Integrity** in [Step 2](select-check-categories.md), every field on this page is disabled and a banner reads *"This step is skipped since Metadata checks don't require data to be loaded"*. Click **Next** to advance to [Step 4: Scan Settings](scan-settings.md) without filling anything in. + +## Read Strategy + +Pick one of two radio options. The default for a new scan is **Incremental**. + +![strategy](../../../assets/operations/scan/how-tos/read-settings/step-1-strategy.png) + +### Option 1: Incremental + +Selecting **Incremental** reads only the new or updated records since the last successful run on each container. On the very first Incremental run against a container, a Full scan is performed to establish the baseline; from the second run on, only records whose [incremental identifier](../../../glossary.md#incremental) value moved forward are processed. + +For JDBC datastores, the helper text also notes: *"Tables and views without a defined incremental key will also perform a full scan"*. The fallback is **per container**, decided at scan time; the rest of the scan continues with Incremental as configured. When some of the selected containers do not have an incremental identifier, an inline warning appears above this step listing how many will be scanned in full. 
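
The per-container fallback described above can be sketched as follows. This is a simplified model under assumptions, not the scanner's actual logic: the `Container` class, its `incremental_field` and `last_watermark` attributes, and the `plan_reads` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Container:
    """Hypothetical description of a table, view, or file pattern."""
    name: str
    incremental_field: Optional[str] = None  # None: no incremental identifier
    last_watermark: Optional[int] = None     # None: no baseline recorded yet


def plan_reads(containers: List[Container], strategy: str) -> Dict[str, str]:
    """Map each container name to the read mode it would effectively use.

    `strategy` is "incremental" or "full", as chosen for the whole scan.
    """
    plan: Dict[str, str] = {}
    for container in containers:
        needs_full = (
            strategy == "full"
            or container.incremental_field is None  # no identifier: fall back
            or container.last_watermark is None     # first run: build baseline
        )
        plan[container.name] = "full" if needs_full else "incremental"
    return plan
```

Because the decision is made per container, a single Incremental scan can mix both modes, which is what the inline warning in this step reports.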
+ +For an in-depth look at how Incremental compares to Full and at the Auto-Resolve eligibility rules, see [Read Strategies](../deep-dive/read-strategies.md). + +#### Starting Threshold (optional) + +Use the **Starting Threshold** toggle when you want the Incremental scan to start from a specific point instead of relying on the automatically tracked baseline. Useful for backfills, for re-validating a corrected window, or for re-baselining after a schema change. **This toggle is only available when Incremental is selected**; switching to Full hides it and clears any values you had set. + +The toggle is **off by default**. When off, both threshold fields are cleared. When on, one or two inputs appear depending on the datastore type. + +![starting-threshold](../../../assets/operations/scan/how-tos/read-settings/step-2-starting-threshold.png) + +##### Option 1: Greater Than Time + +**Greater Than Time** applies to containers with a **time-based** incremental identifier (timestamp column). Enter the lower bound as a UTC timestamp; records whose incremental timestamp is strictly greater than this value are scanned and older records are skipped. + +This field is shown for every datastore type. + +##### Option 2: Greater Than Batch + +**Greater Than Batch** applies to containers with a **batch-based** incremental identifier (monotonically increasing integer). Enter the lower bound as a non-negative integer; records whose batch value is strictly greater than this value are scanned. + +This field is shown **only for JDBC datastores** (schema-based stores). DFS datastores do not expose it. + +!!! note "The UI does not pre-detect identifier type per container" + When both Time and Batch inputs are visible, the UI does not know which incremental strategy each individual container uses. You can set either or both: each value applies only to containers that actually use the matching identifier type. 
Containers whose identifier type does not match the value you set fall back to their automatic baseline. + +### Option 2: Full + +Selecting **Full** reads every record in each selected container, regardless of previous runs. This is the only strategy that makes [Auto-Resolve](../deep-dive/read-strategies.md#auto-resolve-on-full-scans) eligible to evaluate [previously open anomalies](../../../anomalies/status.md). + +Switching to **Full** hides the Starting Threshold section entirely and clears any threshold values you previously entered. Threshold values do not apply when reading every record, and direct API callers receive a validation error if they try to combine Full with `greater_than_time` or `greater_than_batch`; see [API](../api.md) for the supported combinations. + +### Record Limit (per container) + +The **Record Limit** input caps the number of records scanned **per container**, after the read strategy and any starting threshold have been applied. The default value is **All records** (no cap). + +There are two ways to set this value, both modifying the same field. The Record Limit input shows either the current preset label (`1M`, `10M`, `100M`, `All`) or the literal word `Custom` when you have a numeric value that does not match a preset. + +=== "Type a value" + + Type any integer between **1** and **1,000,000,000** directly into the input. Non-digit keys are blocked. When the value does not match a preset, the button label changes to **Custom**, and the helper text *"Value must be between 1 and 1,000,000,000"* appears underneath the input. + + ![record-limit-custom](../../../assets/operations/scan/how-tos/read-settings/step-3-record-limit-custom.png) + + Use this when you need a specific cap that does not match one of the four presets (for example, `5,000` to bound a partition during a re-run, or `25,000,000` for a custom sampling target). + +=== "Pick a preset" + + The button on the right of the input opens a menu with four ready-made values. 
+ + **Step 1:** Click the button on the right of the Record Limit input. + + ![record-limit-button](../../../assets/operations/scan/how-tos/read-settings/step-4-record-limit-button.png) + + **Step 2:** The menu opens with four ready-made values. Click the one you want. + + ![record-limit-menu](../../../assets/operations/scan/how-tos/read-settings/step-5-record-limit-menu.png) + + The available presets are: + + - **1M** = 1,000,000 records per container + - **10M** = 10,000,000 records per container + - **100M** = 100,000,000 records per container + - **All** = no cap (every record in the container is scanned) + + **Step 3:** The input updates immediately and the button label changes to the selected preset (`1M`, `10M`, `100M`, or `All`). + + ![record-limit-selected](../../../assets/operations/scan/how-tos/read-settings/step-6-record-limit-selected.png) + +!!! note "It is a single maximum, not a range" + Record Limit is a single per-container cap, not a range. There is no minimum-value control or window. To restrict the lower bound of an incremental scan, use the [Starting Threshold](#starting-threshold-optional) inside Option 1 instead. + +## Continue to the next step + +Click **Next** to advance to [Step 4: Scan Settings](scan-settings.md). + +The **Back** button returns you to [Select Check Categories](select-check-categories.md). + +## Examples + +**Telecom carrier: daily incremental on call records**: A mobile carrier ingests `call_records` partitioned by `event_timestamp`. The table grows by 3M rows per day on top of a 5B-row historical base. They keep **Incremental** selected with Starting Threshold off and Record Limit on **All**: the nightly scan only re-validates yesterday's calls, avoiding a full re-read of five years of history every night. + +**Media streaming: weekly Full with sampling**: A streaming service has `ad_impressions` with 8B rows partitioned by hour. 
A full nightly scan is too expensive, so they schedule a weekly Full scan on Sundays with Record Limit set to **100M**. Each container is capped at 100 million rows, giving every check a chance to fire across the full table set without paying the cost of reading everything. + +**Energy utility: backfill after a metering correction**: A power utility discovers that its smart-meter ingestion produced wrong `consumption_kwh` values between `2026-03-12 00:00` and `2026-03-14 18:00` due to a timezone bug in the upstream collector. They keep **Incremental**, turn on Starting Threshold, and set **Greater Than Time** to `2026-03-11 23:59:59 UTC`. Because the threshold is only a lower bound, the scan re-evaluates every record from the start of the affected window onward, rather than all of Q1. + +**Manufacturing: Full re-baseline after MES schema change**: A factory's Manufacturing Execution System renamed `assembly_line_id` to `production_line_id` across all OEE tables. After the migration, the data team switches to **Full** + Auto Resolve. The Full scan re-validates every record under the new schema, and Auto Resolve clears the now-irrelevant anomalies that referenced the old field name. Record Limit stays on **All** so the baseline is exhaustive. + +**Gaming platform: batch backfill with custom record limit**: A multiplayer game stores `match_events` in numbered ingestion batches. Batches 41,200 through 41,250 were reprocessed after a server bug was patched. The team picks **Incremental**, sets **Greater Than Batch** to `41199`, and types a custom Record Limit of `2,000,000` to cap each container during the re-validation, so the re-run does not exhaust cluster memory. + +## Where to go next + +
+ +- :material-numeric-4-circle:{ .lg .middle } **Scan Settings** + + --- + + Anomaly Options (including Auto Resolve), record-anomaly limits, and source examples. + + [:octicons-arrow-right-24: Scan Settings](scan-settings.md) + +- :material-numeric-5-circle:{ .lg .middle } **Schedule Options** + + --- + + Set up a recurring run, or skip this step and use **Run Now**. + + [:octicons-arrow-right-24: Schedule Options](schedule-options.md) + +- :material-numeric-1-circle:{ .lg .middle } **Select Tables** + + --- + + Choose the containers to scan: All, Specific, or by Tag. + + [:octicons-arrow-right-24: Select Tables](select-tables.md) + +- :material-numeric-2-circle:{ .lg .middle } **Select Check Categories** + + --- + + Choose Metadata, Data Integrity, or both. + + [:octicons-arrow-right-24: Select Check Categories](select-check-categories.md) + +
diff --git a/docs/operations/scan/how-tos/scan-settings.md b/docs/operations/scan/how-tos/scan-settings.md new file mode 100644 index 0000000000..c1c7ccbec9 --- /dev/null +++ b/docs/operations/scan/how-tos/scan-settings.md @@ -0,0 +1,138 @@ +# :material-numeric-4-circle:{ .middle style="color: var(--q-brick)" } Scan Settings + +This is **Step 4 of 5** in the Scan Operation modal. You configure how anomalies are handled across scans and set the per-check limits that apply to the run. + +The entire Scan Settings panel renders in two visual variants based on the **Read Strategy** picked in [Step 3](read-settings.md). Switch the tab below to see the variant matching your scan. Picking a tab changes both the Anomaly Options screenshot and the Advanced Options screenshot together. + +## Anomaly Options + +=== "Full scan" + + With Full selected, the Anomaly Options block shows **three** options: + + ![anomaly-options-full](../../../assets/operations/scan/how-tos/scan-settings/step-1-anomaly-options-full.png) + + **1. Archive Duplicate Anomalies.** When a new anomaly is a duplicate of an [Active or Acknowledged](../../../anomalies/status.md) anomaly, the new anomaly's state is set to **Duplicate**. This keeps the open anomaly queue focused on truly new findings and prevents the same issue from being counted multiple times across scans. + + **2. Reactivate Recurring Anomalies.** When a new anomaly is a duplicate of an archived anomaly, the new anomaly's state is set to **Duplicate** and the original archived anomaly is reactivated to Active. This preserves history when the same issue resurfaces after being archived, and writes a [Fingerprint](../../../enrichment/enrichment-tables.md#_failed_checks-table) column to the Enrichment Datastore so subsequent runs can match. + + **3. 
Auto Resolve Anomalies.** When the Full scan completes successfully, previously open anomalies (Active or Acknowledged) are automatically resolved if the same checks run again and no longer detect the issue. The resolving scan is recorded in each auto-resolved anomaly's history. Enabled by default for Full scans. For the full eligibility rules, see [Auto-Resolve on Full Scans](../deep-dive/read-strategies.md#auto-resolve-on-full-scans). + + **Advanced Options** + + Toggle the **Advanced Options** switch below the Anomaly Options block to expand it. Three additional fields appear: Maximum Record Anomalies per Check, Maximum Source Examples per Anomaly, and Scan Variables. The Anomaly Options block above remains visible. + + ![advanced-options-full](../../../assets/operations/scan/how-tos/scan-settings/step-3-advanced-options-full.png) + + **1. Maximum Record Anomalies per Check.** Sets the maximum number of individual record anomalies a single check can emit before remaining violations are merged into a single rolled-up [shape anomaly](../../../anomalies/types.md#shape-anomaly) that preserves the total violation count. Useful for keeping the open anomaly queue manageable when a check is expected to produce many findings on a single run. + + This setting does **not** limit how many violations Qualytics finds. All violations are still detected; the setting only changes how results are **presented**: either as individual record anomalies, or as one rolled-up shape anomaly that preserves the count. + + The default is **10**. The maximum is **1,000**: values above 1,000 are silently capped to 1,000. + + **2. Maximum Source Examples per Anomaly.** Caps how many source records are stored in the Enrichment Datastore for each detected anomaly. The captured records are the only rows you can view or download later from the anomaly details. Available presets are 10, 100, 1,000, and 10,000. 
If you need more records, increase this value **before running the scan**; changes made afterward do not affect the captured set. + + !!! warning "Disabled when the datastore has no enrichment datastore" + Maximum Source Examples is greyed out (non-interactive) when the source datastore has **no enrichment datastore configured**. The modal also displays an info alert above the Anomaly Options block reading *"An enrichment datastore has not been configured for this source datastore"*. To enable the field, associate an enrichment datastore with the source datastore from its settings page, then reopen the Scan modal. Maximum Record Anomalies per Check and Scan Variables remain editable. + + **3. Scan Variables.** Override or extend the container's default variables for this scan. Useful for checks that reference variables in double curly braces (`{{ variable_name }}`). For the full syntax and casting rules, see [Use Runtime Variables](use-runtime-variables.md). + +=== "Incremental scan" + + With Incremental selected, **Auto Resolve Anomalies is hidden**. The Anomaly Options block shows only **two** options: + + ![anomaly-options-incremental](../../../assets/operations/scan/how-tos/scan-settings/step-2-anomaly-options-incremental.png) + + **1. Archive Duplicate Anomalies.** When a new anomaly is a duplicate of an Active or Acknowledged anomaly, the new anomaly's state is set to **Duplicate**. This keeps the open anomaly queue focused on truly new findings and prevents the same issue from being counted multiple times across scans. + + **2. Reactivate Recurring Anomalies.** When a new anomaly is a duplicate of an archived anomaly, the new anomaly's state is set to **Duplicate** and the original archived anomaly is reactivated to Active. This preserves history when the same issue resurfaces after being archived, and writes a [Fingerprint](../../../enrichment/enrichment-tables.md#_failed_checks-table) column to the Enrichment Datastore so subsequent runs can match. 
+ + Auto Resolve is hidden because it only applies to Full scans. The platform needs to read every record to confirm a previously flagged anomaly no longer reproduces, which Incremental scans do not do by design. See [Auto-Resolve on Full Scans](../deep-dive/read-strategies.md#auto-resolve-on-full-scans). + + **Advanced Options** + + Toggle the **Advanced Options** switch below the Anomaly Options block to expand it. Three additional fields appear: Maximum Record Anomalies per Check, Maximum Source Examples per Anomaly, and Scan Variables. The Anomaly Options block above remains visible (without Auto Resolve). + + ![advanced-options-incremental](../../../assets/operations/scan/how-tos/scan-settings/step-4-advanced-options-incremental.png) + + **1. Maximum Record Anomalies per Check.** Sets the maximum number of individual record anomalies a single check can emit before remaining violations are merged into a single rolled-up [shape anomaly](../../../anomalies/types.md#shape-anomaly) that preserves the total violation count. Useful for keeping the open anomaly queue manageable when a check is expected to produce many findings on a single run. + + This setting does **not** limit how many violations Qualytics finds. All violations are still detected; the setting only changes how results are **presented**: either as individual record anomalies, or as one rolled-up shape anomaly that preserves the count. + + The default is **10**. The maximum is **1,000**: values above 1,000 are silently capped to 1,000. + + **2. Maximum Source Examples per Anomaly.** Caps how many source records are stored in the Enrichment Datastore for each detected anomaly. The captured records are the only rows you can view or download later from the anomaly details. Available presets are 10, 100, 1,000, and 10,000. If you need more records, increase this value **before running the scan**; changes made afterward do not affect the captured set. + + !!! 
warning "Disabled when the datastore has no enrichment datastore" + Maximum Source Examples is greyed out (non-interactive) when the source datastore has **no enrichment datastore configured**. The modal also displays an info alert above the Anomaly Options block reading *"An enrichment datastore has not been configured for this source datastore"*. To enable the field, associate an enrichment datastore with the source datastore from its settings page, then reopen the Scan modal. Maximum Record Anomalies per Check and Scan Variables remain editable. + + **3. Scan Variables.** Override or extend the container's default variables for this scan. Useful for checks that reference variables in double curly braces (`{{ variable_name }}`). For the full syntax and casting rules, see [Use Runtime Variables](use-runtime-variables.md). + +### Common configurations + +A quick reference for choosing values across both Advanced fields: + +| Use case | Maximum Record Anomalies per Check | Maximum Source Examples per Anomaly | +| --- | --- | --- | +| Demo or proof-of-value | 1 | 1,000 | +| Production (most customers) | 10 (default) | 1,000 to 10,000 | +| Full enrichment pipeline | 1 | Sized by downstream consumer | + +For a conceptual reference of every Scan Setting and the field-masking behavior, see [Scan Settings (deep dive)](../deep-dive/scan-settings.md). + +## Start the scan + +You can now either start the scan immediately or schedule it for a future run. + +- Click the **Run Now** button to perform the scan operation immediately. +- Click the **Schedule** button to configure a recurring run. See [Schedule Options](schedule-options.md). + +After the scan finishes, see [Scan — Success](../../runs/by-types/scan/success.md) (or the page matching the run's terminal state) to read the operation summary and walk through the Scan Results modal.
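
The rollup behavior of **Maximum Record Anomalies per Check** described under Advanced Options can be made concrete with a small sketch. This is an illustration only, not Qualytics code: `present_violations` and `CheckResult` are hypothetical names, and the exact boundary behavior (no rollup when the violation count is at or below the limit) is an assumption drawn from the description above. The default of 10 and the cap at 1,000 come from this page.

```python
from dataclasses import dataclass

DEFAULT_LIMIT = 10   # documented default for Maximum Record Anomalies per Check
MAX_LIMIT = 1_000    # documented ceiling; larger values are capped to this


@dataclass(frozen=True)
class CheckResult:
    """Hypothetical summary of how one check's violations are presented."""
    record_anomalies: int   # individual record anomalies emitted
    rolled_up: bool         # True when a rolled-up shape anomaly is also emitted
    total_violations: int   # every violation is still detected and counted


def present_violations(total_violations: int, limit: int = DEFAULT_LIMIT) -> CheckResult:
    """Split a check's violations into record anomalies plus an optional rollup."""
    effective_limit = min(limit, MAX_LIMIT)  # values above 1,000 are capped
    if total_violations <= effective_limit:
        # Assumption: at or below the limit, everything stays an individual anomaly.
        return CheckResult(total_violations, False, total_violations)
    # Above the limit: emit `effective_limit` record anomalies and merge the rest
    # into one shape anomaly that keeps the full count.
    return CheckResult(effective_limit, True, total_violations)
```

With the default limit, a check that finds 90,000 violations would surface 10 record anomalies plus one rolled-up shape anomaly, while the total count stays at 90,000. The setting changes presentation, not detection.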
+ +## Examples + +**Government tax agency: default production setup**: A tax collection agency processes 40M filings per fiscal year across `returns_individual`, `returns_corporate`, and `withholdings`. Their nightly Full scan keeps all three Anomaly Options on (Archive Duplicate, Reactivate Recurring, Auto Resolve), with Maximum Source Examples at 10. The agency's review team works from a clean Active queue every morning because duplicates archive themselves and resolved anomalies clear automatically. + +**Real estate marketplace: higher source examples for debugging a new check**: A real estate platform rolls out a new check that `listings.price_per_sqm` should be between p5 and p95 of the city's distribution. To inspect the outliers during the first week, they raise Maximum Source Examples per Anomaly to **1,000** before running the scan. The data team uses the broader sample to refine the percentile bounds, then lowers the setting back to 10 once the check is stable. + +**University: Auto Resolve disabled for FERPA compliance**: A university maintains `student_grades` and `enrollments`, both subject to FERPA review. Every anomaly flagged on these tables must be acknowledged manually by the Registrar's office before its status changes. The data team turns **Auto Resolve Anomalies** off on the scan that targets these containers, so no anomaly is closed automatically. + +**Apparel retail chain: relying on the rollup default for known legacy issues**: A clothing retailer keeps a legacy `historical_skus` table (frozen but kept for reporting) that contains around 90K rows with a known formatting issue. Without the rollup, the nightly scan would flood the queue with 90K individual record anomalies. 
The team keeps Maximum Record Anomalies per Check at **10** (the default): the scan emits 10 representative record anomalies for the check and consolidates the remainder into a single rolled-up shape anomaly that preserves the full 90K violation count, keeping the open anomaly list manageable. + +## Where to go next + +
+ +- :material-numeric-5-circle:{ .lg .middle } **Schedule Options** + + --- + + Set up a recurring run, or skip this step and use **Run Now**. + + [:octicons-arrow-right-24: Schedule Options](schedule-options.md) + +- :material-numeric-1-circle:{ .lg .middle } **Select Tables** + + --- + + Choose the containers to scan: All, Specific, or by Tag. + + [:octicons-arrow-right-24: Select Tables](select-tables.md) + +- :material-numeric-2-circle:{ .lg .middle } **Select Check Categories** + + --- + + Choose Metadata, Data Integrity, or both. + + [:octicons-arrow-right-24: Select Check Categories](select-check-categories.md) + +- :material-numeric-3-circle:{ .lg .middle } **Read Settings** + + --- + + Pick Incremental or Full, set an optional starting threshold, and the record limit. + + [:octicons-arrow-right-24: Read Settings](read-settings.md) + +
diff --git a/docs/operations/scan/how-tos/schedule-options.md b/docs/operations/scan/how-tos/schedule-options.md new file mode 100644 index 0000000000..b6a7bd3375 --- /dev/null +++ b/docs/operations/scan/how-tos/schedule-options.md @@ -0,0 +1,160 @@ +# :material-numeric-5-circle:{ .middle style="color: var(--q-brick)" } Schedule Options + +This is **Step 5 of 5** in the Scan Operation modal. Use the Schedule option to run scans automatically on a recurring basis. Schedules are created from the Scan form using one of four preset frequencies (Hourly, Daily, Weekly, Monthly) or a custom cron expression. + +!!! info "Timezone-aware scheduling" + Schedules can run in any IANA timezone (for example, `America/New_York`, `Europe/Paris`, `Asia/Tokyo`), and Daylight Saving Time transitions are handled automatically. **UTC is the default** for new and existing schedules. The configured timezone is shown on the schedule card as an abbreviation, such as `Schedule (UTC)` by default or `Schedule (EST)` after selecting another timezone. + +!!! info "Deactivating a schedule keeps its cron expression" + When you deactivate a schedule, its cron expression is kept. Reactivating it later resumes the same schedule without setting it up again. + + If a schedule was deactivated before May 7, 2026 and doesn't run after you reactivate it, re-enter its cron expression once to restore it. Schedules deactivated on or after that date keep working normally. + +## Open the Schedule form + +Click the **Schedule** button at the bottom of the Scan Settings step (Step 4) to open the Schedule form. The form has three configurable fields, highlighted in the screenshot below. + +![form-fields](../../../assets/operations/scan/how-tos/schedule-options/step-1-form-fields.png) + +**1. Timezone.** Pick the IANA timezone for this schedule. UTC is selected by default. To run in a different timezone, type to search by city, region, or abbreviation, and pick from the list. 
The selected timezone applies to every frequency tab below; the banner above the tabs shows the current time in that timezone. + +**2. Frequency tabs.** Pick the cadence. The form has five tabs (Hourly, Daily, Weekly, Monthly, Advanced); switch the tab to configure the one you want. See the [tabs below](#configure-the-frequency). + +**3. Schedule Name.** Give the schedule a descriptive name to identify it later in the [Activity tab](../../../explore/activity.md). The name is required. + +Once all three fields are set, click the **Schedule** button at the bottom of the form to save the schedule. The new schedule appears in the Activity tab under the Schedule sub-section. + +## Configure the frequency + +Switch the tab below to see how each frequency option is configured. + +=== "Hourly" + + Schedule the scan to run every N hours at a specified minute. Define the frequency in hours and the exact minute within the hour the scan should start. + + ![hourly](../../../assets/operations/scan/how-tos/schedule-options/step-2-hourly.png) + + **Example:** Set to **Every 1 hour(s) on minute 0** so the scan fires every hour at the top of the hour (`01:00`, `02:00`, `03:00`, …). + + **When to use:** Near real-time monitoring during ingestion windows, backfills, or incident-response situations where data quality drift must be caught within an hour. + +=== "Daily" + + Schedule the scan to run once every N days at a specific time. You specify the number of days between scans and the exact time of day in the selected timezone. + + ![daily](../../../assets/operations/scan/how-tos/schedule-options/step-3-daily.png) + + **Example:** Set to **Every 1 day(s) at 00:00** with the timezone left at UTC so the scan runs every day at midnight UTC. + + **When to use:** Standard nightly comprehensive scans across the catalog. Pairs well with Full read strategy and Auto Resolve so anomalies that no longer reproduce are cleared automatically each night. 
+ +=== "Weekly" + + Schedule the scan to run on specific days of the week at a set time. Select the days of the week and the exact time of day in the selected timezone. + + ![weekly](../../../assets/operations/scan/how-tos/schedule-options/step-4-weekly.png) + + **Example:** Configure to run on **Sunday** and **Friday** at `00:00` with the timezone set to UTC so the scan executes at midnight UTC on those two days. + + **When to use:** Deeper end-of-week or weekend scans that complement a lighter daily cadence, or business-specific schedules tied to weekly reporting cycles. + +=== "Monthly" + + Schedule the scan to run once a month on a specific day at a set time. You specify the day of the month and the time of day in the selected timezone. + + ![monthly](../../../assets/operations/scan/how-tos/schedule-options/step-5-monthly.png) + + **Example:** Set to **On the 1st day of every 1 month(s) at 00:00** with the timezone set to UTC so the scan runs on the first day of each month at midnight UTC. + + **When to use:** Archives or slowly changing reference data that only need periodic validation, such as monthly regulatory filings or quarterly reporting tables. + +=== "Advanced" + + The Advanced tab lets you set up more complex and custom schedules using **cron expressions**. Useful for defining specific times and intervals with precision that the four presets do not cover. + + ![advanced](../../../assets/operations/scan/how-tos/schedule-options/step-6-advanced.png) + + Cron expressions use five fields: + + - Minute (0 - 59) + - Hour (0 - 23) + - Day of the month (1 - 31) + - Month (1 - 12) + - Day of the week (0 - 6) (Sunday to Saturday) + + Each field can be defined using specific values, ranges, or special characters to create the desired schedule. 
+ + **Example:** The expression `0 0 * * *` schedules the scan operation to run at midnight (`00:00`) every day: + + - `0` (Minute): the 0th minute + - `0` (Hour): the 0th hour (midnight) + - `*` (Day of the month): every day + - `*` (Month): every month + - `*` (Day of the week): every day of the week + + **Other examples:** + + - `0 12 * * 1-5` runs at 12:00 PM Monday to Friday. + - `30 14 1 * *` runs at 2:30 PM on the first day of every month. + - `0 22 * * 6` runs at 10:00 PM every Saturday. + + To define a custom schedule, enter the cron expression in the **Custom Cron Schedule** field. The field label shows the abbreviation for the currently selected timezone (for example, `Custom Cron Schedule (UTC)` or `Custom Cron Schedule (EST)`), and the cron fields are interpreted in that timezone. + +!!! note "Daylight Saving Time" + When you pick a timezone that observes DST (such as `America/New_York` or `Europe/London`), the schedule automatically shifts with each transition. A job set to run at 9:00 AM in `America/New_York` runs at 9:00 AM local time year-round, regardless of whether the zone is in EST or EDT at the time. No reconfiguration is required. + +!!! note "Notification on completion" + You will receive a notification when the scan operation is completed. + +!!! note "Auto Resolve Anomalies on scheduled scans" + Scheduled scans use the same **Auto Resolve Anomalies** behavior as ad-hoc scans. The setting is configured from **Scan Settings** (Step 4) and only applies when the schedule's read strategy is **Full**. See [Scan Settings](scan-settings.md) for the toggle and [Auto-Resolve on Full Scans](../deep-dive/read-strategies.md#auto-resolve-on-full-scans) for the behavior. + +## Examples + +**Pharmaceutical R&D: nightly UTC**: A pharmaceutical R&D platform runs clinical-trial scans every night at `00:00 UTC` across `subjects`, `lab_results`, and `adverse_events`. They pick **Daily** with timezone left at UTC. 
Researchers in the US, EU, and Singapore all see the same scan results synced to a common timestamp, with no per-region offset to reconcile. + +**Stock exchange: business-hours Mon-Fri**: A regional stock exchange wants `trade_executions` and `order_book_snapshots` validated before the analytics desk arrives. They pick **Advanced** and enter `0 9 * * 1-5` with timezone `America/New_York`. The scan fires at 9:00 AM local time every business day, and the schedule shifts automatically across DST transitions without re-configuration. + +**Cybersecurity SaaS: hourly during incident response**: During an active threat investigation, a cybersecurity provider needs near real-time validation of `detected_events` and `alerts_triaged`. They pick **Hourly** with frequency `Every 1 hour(s) on minute 0` and pair with Incremental in Step 3. The scan validates only events from the last hour, alerting the on-call SOC analyst within minutes of any quality drift in the threat feed. + +**Airline operations: twice daily before flight ops**: An airline scans `flight_manifests` and `crew_assignments` twice a day, before each shift starts. They pick **Advanced** with `0 6,18 * * *` and timezone `Europe/Amsterdam`, then set the Schedule Name to `flight-ops-am-pm` so the dispatchers can find it in the Activity tab. The 06:00 run catches issues for the morning rotation; the 18:00 run covers the evening rotation. + +**Regulatory body: monthly archive validation**: A financial regulator scans `regulatory_filings_archive` once a month on the 1st at `02:00` in their reporting timezone (`Europe/London`). The archive only changes at month-end when the previous month's filings are sealed, so a monthly cadence is sufficient and avoids the cost of daily scans against a table that does not change between cycles. + +## Where to go next + +
+ +- :material-numeric-1-circle:{ .lg .middle } **Select Tables** + + --- + + Choose the containers to scan: All, Specific, or by Tag. + + [:octicons-arrow-right-24: Select Tables](select-tables.md) + +- :material-numeric-2-circle:{ .lg .middle } **Select Check Categories** + + --- + + Choose Metadata, Data Integrity, or both. + + [:octicons-arrow-right-24: Select Check Categories](select-check-categories.md) + +- :material-numeric-3-circle:{ .lg .middle } **Read Settings** + + --- + + Pick Incremental or Full, set an optional starting threshold, and the record limit. + + [:octicons-arrow-right-24: Read Settings](read-settings.md) + +- :material-numeric-4-circle:{ .lg .middle } **Scan Settings** + + --- + + Anomaly Options (including Auto Resolve), record-anomaly limits, and source examples. + + [:octicons-arrow-right-24: Scan Settings](scan-settings.md) + +
diff --git a/docs/operations/scan/how-tos/select-check-categories.md b/docs/operations/scan/how-tos/select-check-categories.md new file mode 100644 index 0000000000..a92bad90fe --- /dev/null +++ b/docs/operations/scan/how-tos/select-check-categories.md @@ -0,0 +1,133 @@ +# :material-numeric-2-circle:{ .middle style="color: var(--q-brick)" } Select Check Categories + +This is **Step 2 of 5** in the Scan Operation modal. You decide which categories of quality checks the scan should evaluate against the containers selected in Step 1. The two categories cover different layers of data quality, and how you combine them changes which subsequent steps the modal asks you to fill in. + +## Open the step + +After clicking **Next** in Step 1, the modal shows the **Choose check categories** prompt with two checkboxes. Both are checked by default, so unless you uncheck one, the scan evaluates the entire check inventory of the selected containers. + +![categories](../../../assets/operations/scan/how-tos/select-check-categories/step-1-categories.png) + +## How the selection works + +- Both options are **independent checkboxes**. You can keep both, pick only one, or pick the other, but you cannot proceed with zero categories selected. +- The **Next button is disabled** when both boxes are cleared, and an inline alert reads *"At least one check category must be selected"*. +- The default state when the modal first opens is **both categories checked**. +- The label *table* in the helper text adapts to the container type of the selected datastore: it shows *file* for DFS datastores, *view* when the container is a view, and so on. The behavior described below is identical regardless of the container type. + +### Option 1: Metadata + +Selecting **Metadata** tells the scan to evaluate checks that describe **what the dataset itself should look like**, without reading the individual records. 
Two rule families are classified as Metadata: + +- **Volumetric** checks (row counts, partition counts, expected magnitudes). +- **Freshness** checks (staleness against an expected last-update window). + +These checks compare against catalog-level statistics and the most recent table metadata gathered by [Sync](../../sync/sync.md) and [Profile](../../profile/profile.md). They do not require the platform to read the underlying records. + +#### What changes in the modal when only Metadata is selected + +Selecting **only Metadata** (unchecking Data Integrity) does not remove Step 3 from the stepper. You still navigate through it, but with two changes: + +- **The Read Settings form is disabled.** When you reach Step 3, every field is greyed out and a banner reads *"This step is skipped since Metadata checks don't require data to be loaded"*. There is nothing to fill in, so you just click **Next** to advance. +- **The Read Settings fields are reset.** The read strategy, [starting threshold](../faq.md#what-is-a-starting-threshold-and-when-do-i-need-one), and per-table record limit are cleared so they cannot accidentally affect a Metadata-only scan. + +If you later re-enable Data Integrity (still on Step 2), the Step 3 form becomes editable again and you must configure it before proceeding. + +#### When to use Metadata only + +- **Fast post-load validation** when you only need to confirm row counts and freshness after an ETL job finished, without paying for a full data scan. +- **High-frequency monitoring** on very large tables, where reading the data hourly would be too expensive but catching a sudden drop in row count or a missed refresh is critical. +- **Cheapest possible regression check** for catalog-level expectations after a schema change. + +### Option 2: Data Integrity + +Selecting **Data Integrity** tells the scan to evaluate checks that describe **what the values inside the records should look like**. 
This category covers every rule type that is **not** Metadata, including: + +- Null and presence rules (Not Null, Any Not Null, Required Values). +- Comparison rules (Equal To, Greater Than, Less Than, Between, and their field-to-field variants). +- Range and length rules (Min Value, Max Value, Min Length, Max Length). +- Uniqueness rules (single-field and composite, including [Unique](../../../data-quality-checks/unique/introduction.md)). +- Pattern and content rules (Matches Pattern, Contains Email, Contains Credit Card, Contains SSN, Contains URL, Expected Values). +- Identity and shape rules (Is Type, Is Credit Card, Is Address, Is Replica Of, Distinct Count, Time Distribution Size). +- Cross-container rules (Exists In, Not Exists In, Data Diff, Entity Resolution). +- ML and statistical rules (Predicted By, Metric, Sum). +- Date-related rules (After Date, Before Date Time, Not Future). + +For the canonical list of rule types, see [Rule Types Overview](../../../data-quality-checks/rule-types-overview.md). + +#### What happens in the modal when Data Integrity is selected + +Whenever Data Integrity is checked (alone or with Metadata), Step 3 (Read Settings) is enabled. The scan needs to read records to evaluate per-row checks, so the read strategy, starting threshold, and record limit must be configured. + +#### When to use Data Integrity + +- **Standard quality validation** across data values, covering missing fields, out-of-range values, broken patterns, and so on. +- **Targeted re-runs** after editing per-field checks or after a backfill that may have introduced bad values. +- **Compliance scans** that need to verify PII detection rules (Contains Email, Contains SSN, and similar) against actual content. 
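
The two options above amount to a simple filter over rule types, which the following sketch tries to capture. It is a minimal illustration, not the platform's implementation: `CategorySelection`, `category_of`, and the lowercase rule-family identifiers are hypothetical names chosen for this example. The rules it encodes (Volumetric and Freshness are Metadata, everything else is Data Integrity, at least one category is required, and Read Settings is only editable with Data Integrity) are taken from this page.

```python
from dataclasses import dataclass

# Rule families this page classifies as Metadata. Every other rule type is
# treated as Data Integrity. The identifiers are illustrative, not API values.
METADATA_RULE_FAMILIES = frozenset({"volumetric", "freshness"})


def category_of(rule_type: str) -> str:
    """Return the check category ("metadata" or "data_integrity") for a rule type."""
    if rule_type.lower() in METADATA_RULE_FAMILIES:
        return "metadata"
    return "data_integrity"


@dataclass(frozen=True)
class CategorySelection:
    """Hypothetical model of the two Step 2 checkboxes (both checked by default)."""
    metadata: bool = True
    data_integrity: bool = True

    def can_proceed(self) -> bool:
        """Next is disabled when no category is selected."""
        return self.metadata or self.data_integrity

    def read_settings_enabled(self) -> bool:
        """Step 3 is editable only when Data Integrity is selected."""
        return self.data_integrity

    def includes(self, rule_type: str) -> bool:
        """True when a check of this rule type would be evaluated by the scan."""
        if category_of(rule_type) == "metadata":
            return self.metadata
        return self.data_integrity
```

For example, `CategorySelection(metadata=True, data_integrity=False)` can proceed, leaves Read Settings disabled, includes a Freshness check, and skips a Not Null check.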
+ +## Both categories together (default) + +Keeping both checkboxes checked is the most thorough configuration: every check authored or [AI Managed (inferred)](../../../data-quality-checks/ai-managed/getting-started.md) against the selected containers is evaluated, regardless of rule type. This is the default for new scans and is the right choice for nightly comprehensive runs. + +When both are selected, the modal flow is the full five steps: Read Settings (Step 3) is enabled and required. + +## AI Managed (inferred) checks follow the same filter + +The category selection is applied identically to authored checks and to AI Managed (inferred) checks. There is no separate toggle for inferred checks: they are evaluated when their rule type matches one of the selected categories, and skipped when it does not. A scan with only **Metadata** selected still runs inferred Volumetric and Freshness checks; a scan with only **Data Integrity** still runs inferred value-level checks. + +## Continue to the next step + +Click **Next** to advance to [Read Settings](read-settings.md). What you do on that page depends on the current category selection: + +- **Both categories** or **Data Integrity only**: configure the read strategy, the optional starting threshold, and the per-table record limit. +- **Metadata only**: the Read Settings form is disabled and a banner says *"This step is skipped since Metadata checks don't require data to be loaded"*. Click **Next** again to skip through to [Scan Settings](scan-settings.md) without configuring anything. + +The **Back** button returns you to [Select Tables](select-tables.md), unless the modal was opened from a container's detail page (in that case, the entry point was implicitly Step 2 and Back is disabled). + +## Examples + +**Insurance company: nightly comprehensive validation**: A property insurance company scans its claims warehouse every night at 02:00 UTC. 
They keep both categories checked so that `claims`, `policies`, and `adjudications` get row-count and freshness checks (Metadata) plus content-level checks (Data Integrity), all in one operation. Pairing this with a Full read strategy and Auto Resolve clears stale anomalies that were fixed during the day. + +**Logistics company: hourly volume sanity check**: A logistics provider ingests `shipment_events` from carrier APIs every two hours. Between Data Integrity scans (which run nightly), they schedule a separate **Metadata-only** scan hourly. The scan only checks that the expected ~25K rows landed and that the freshness window is met. Each run finishes in seconds because no records are read, and the team is alerted within an hour if the ingestion pipeline stalls. + +**Healthcare reference data: content audit only**: A clinical data team manages a small reference table `icd10_codes` (~70K rows, refreshed quarterly with each WHO update). Row counts and freshness do not matter (the team controls the upload). They scan with **Data Integrity only** to validate that each `code` follows the regex pattern and that `description` is never null, ignoring volumetric checks that would not add signal. + +**Bank: HIPAA-style compliance sweep on PII tables**: A bank tagged `customer_pii` on `customers`, `cardholder_data`, and `kyc_documents`. The compliance team scans on that tag weekly with both categories checked: Metadata catches volumetric drift (an unexpected drop in `kyc_documents` could signal a deletion bug), while Data Integrity runs Contains Email, Contains SSN, and pattern checks for credit card numbers across the same tables in one run. + +## Where to go next + +
+ +- :material-numeric-3-circle:{ .lg .middle } **Read Settings** + + --- + + Pick Incremental or Full, set an optional starting threshold, and the record limit. + + [:octicons-arrow-right-24: Read Settings](read-settings.md) + +- :material-numeric-4-circle:{ .lg .middle } **Scan Settings** + + --- + + Anomaly Options (including Auto Resolve), record-anomaly limits, and source examples. + + [:octicons-arrow-right-24: Scan Settings](scan-settings.md) + +- :material-numeric-5-circle:{ .lg .middle } **Schedule Options** + + --- + + Set up a recurring run, or skip this step and use **Run Now**. + + [:octicons-arrow-right-24: Schedule Options](schedule-options.md) + +- :material-numeric-1-circle:{ .lg .middle } **Select Tables** + + --- + + Choose the containers to scan: All, Specific, or by Tag. + + [:octicons-arrow-right-24: Select Tables](select-tables.md) + +
diff --git a/docs/operations/scan/how-tos/select-tables.md b/docs/operations/scan/how-tos/select-tables.md new file mode 100644 index 0000000000..a99714b02d --- /dev/null +++ b/docs/operations/scan/how-tos/select-tables.md @@ -0,0 +1,152 @@ +# :material-numeric-1-circle:{ .middle style="color: var(--q-brick)" } Select Tables + +This is **Step 1 of 5** in the Scan Operation modal. Before any other configuration, you choose **which containers** the scan will read. The choice is made once per scan and cannot be combined: a single scan picks one of three modes (**All**, **Specific**, or **Tag**) and advances to Step 2 with that selection locked in. + +For JDBC datastores, containers are tables and views. For DFS datastores, they are file patterns. Scan also supports `.txt.gz` and `.csv.gz` files in DFS datastores. + +## Open the Scan Operation modal + +From the datastore detail page, click the **Run** button under **Scan**. The modal opens at Step 1, **Select Tables**, with the section header **Choose tables to be scanned** and three radio options below it. + +The stepper at the top of the modal shows the full configuration flow: **1. Select Tables → 2. Select Check Categories → 3. Read Settings → 4. Scan Settings → 5. Schedule Options**. Stepper navigation lets you go back to revisit a previous step, but advancing always requires a valid selection in the current one. + +## Choose tables to be scanned + +Pick exactly one of the three options. The selection determines which containers are read in this run; the same datastore can be scanned with different modes on different runs or schedules. + +!!! note "Switching options resets the previous selection" + Switching from one option to another wipes the selection of the previous one. For example, picking containers in **Specific** and then switching to **Tag** clears the container selection. Switching back to **Specific** starts with an empty selection.
+ +=== "All" + + Includes every table or file pattern currently available in the datastore. The modal shows the total count next to the option (for example, *"Includes all 16 tables currently available to scan"*), and that count reflects the latest result of a [Sync operation](../../sync/sync.md) against this datastore. + + ![all](../../../assets/operations/scan/how-tos/select-tables/step-1-all.png) + + **What gets scanned** + + - Every container of the supported type currently in the datastore (tables and views for JDBC, file patterns for DFS). The count next to the option is the live, denormalized container count that is refreshed after every Sync (and after any other change that adds or removes a container). + - Containers marked **Unloadable** are still included in the scan request. They surface the same Unloadable error per container until the status is cleared. See [Troubleshooting: Unloadable Container Error](../troubleshooting.md#unloadable-container-error) for the resolution steps. + + **Future tables are included on scheduled scans** + + When this option is selected on a **scheduled scan**, any container added to the datastore after the schedule was created is automatically included in subsequent runs. There is no need to edit the schedule when new tables arrive: the schedule stores an empty container list and re-resolves it against the live datastore at every execution. + + **When to use** + + - Nightly or weekly comprehensive scans that should cover every container. + - A datastore that grows over time and where you want new containers picked up without manual intervention. + - A baseline run after a check edit that may affect many containers. + +=== "Specific" + + Lets you hand-pick one or more containers from the datastore's current list. 
The list is searchable and paginated, and each row shows the container's last scanned timestamp, record and field counts, active check count, and an icon indicating whether the container has an [incremental identifier](../../../glossary.md#incremental) configured. + + ![specific](../../../assets/operations/scan/how-tos/select-tables/step-2-specific.png) + + **What gets scanned** + + - Only the containers checked in the list. Unchecked containers are skipped entirely, even if they currently have open anomalies. + - Use the **Search tables** input at the top of the list to filter by container name. The search is sent to the server with a 500 ms debounce. + - Pagination (`1-10 of N`) lets you navigate large datastores without overwhelming the modal. + - A page-scoped **Select all** / **Deselect all** checkbox next to the pagination toggles every visible row on the current page. Selection is page-local: switching pages keeps the selections from the previous page intact, but the toggle only acts on the page you are looking at. + - A counter chip with a **Clear selection** button appears above the list when at least one container is selected. Clicking it clears the entire selection in one step. + + **Containers that cannot be selected** + + Rows for containers whose status prevents them from being scanned (for example, **Unloadable**) render as disabled with a tooltip explaining the status. Unprofiled containers still appear in the list but show " has not been profiled" in place of the record and field metrics. + + **Selection is recorded on the operation** + + The exact list of containers is stored on the scan operation. If you schedule the scan instead of running it immediately, the schedule keeps that same list and reuses it on every execution. Containers added to the datastore **after** the schedule is created are **not** picked up automatically; if you want new containers to be included, use **All** or **Tag** instead. 
+ + **When to use** + + - Ad-hoc validation of a single new table or view. + - Targeted re-scan after editing checks on a specific container. + - Excluding containers that are known to be unstable, in maintenance, or out of scope for this run. + - Performance-sensitive runs where you want to avoid the cost of scanning everything. + +=== "Tag" + + Scans every container associated with one or more tags. Tags are applied to containers separately (from the container's detail page or via bulk-edit in the [Tags section](../../../source-datastore/tags/getting-started.md)), so this option only makes sense when your datastore is already tagged. + + ![tag](../../../assets/operations/scan/how-tos/select-tables/step-3-tag.png) + + **What gets scanned** + + - Every container associated with **at least one** of the tags you pick. The matching is OR-based: a container tagged `Production` is included if `Production` is selected, regardless of whether other tags like `HIPAA` are also selected. + - You can pick multiple tags at once; the result set is deduplicated so a container tagged with two of the selected tags is scanned exactly once. + - Each tag preserves its color in the picker (in the screenshot above: Production, HIPAA, PCI, PII, Financial) so you can recognize the policy or domain each tag represents. + + **The tag picker is workspace-wide** + + The picker lists every tag defined in the workspace. It is **not** filtered to tags currently applied in this datastore. If you select a tag that no container in this datastore carries, the Next button is still enabled (the rule only requires at least one tag to be picked), but the scan will resolve to zero containers and finish immediately. Apply tags to containers (from the container's detail page or via the Tags section) before relying on Tag-based scans. + + **Future tables are included via tags on scheduled scans** + + Like **All**, scheduled scans configured with **Tag** automatically include newly tagged containers. 
If a table is added to the datastore later and you apply the `Production` tag to it, the next scheduled run with `Production` selected picks it up without editing the schedule. + + **When to use** + + - Compliance sweeps against a tag like `HIPAA`, `PCI`, or `PII`. + - Domain-driven scans where each business area (Finance, Marketing, HR) has its own tag and is scanned on its own cadence. + - Sharing a single scan strategy across many containers without having to list each one individually. + +## Continue to the next step + +Once the option is set, click **Next** to continue to [Select Check Categories](select-check-categories.md). The Next button is enabled as soon as a valid selection exists: + +- **All** is valid by itself. +- **Specific** requires at least one container to be checked. +- **Tag** requires at least one tag to be selected. + +## Examples + +**Regional bank: nightly comprehensive scan**: A mid-size bank runs data quality every night against its core analytics warehouse. The catalog contains `transactions`, `accounts`, `customer_profiles`, `loan_originations`, plus 60+ smaller dimension tables, and the data team adds new mart tables roughly once a month. They pick **All** on the schedule so each newly synced table is included automatically on the next run, with no need to edit the schedule when the catalog grows. + +**Online retailer: ad-hoc validation of a new product table**: The data engineering team at an e-commerce company ships a new `product_recommendations` table generated by a freshly trained model. Before exposing it to downstream BI, they pick **Specific** and select only `product_recommendations`. The scan finishes in minutes instead of re-validating the 300 unrelated tables in the warehouse. + +**Health system: HIPAA compliance sweep**: A hospital network has 14 tables containing Protected Health Information (`patients`, `encounters`, `claims`, `lab_results`, and so on), all tagged with `HIPAA`. 
The compliance team scans on the `HIPAA` tag every Monday to validate Contains Email, Contains SSN, and Not Null checks on `patient_id` and `member_id`. When the data team adds a new PHI-bearing table later in the year, applying the `HIPAA` tag to it is enough for the next scheduled scan to pick it up. + +**Fintech: targeted re-scan after editing a check**: A payments fintech edits the Expected Values check on `transactions.payment_method` to add the new `pix_instant` method. They pick **Specific** and select only `transactions`, `payment_attempts`, and `refunds`. The scan validates the updated check against the three relevant tables and skips everything else. + +**B2B SaaS: domain-driven recurring scans**: A B2B SaaS company has tables labeled with three domain tags: `billing`, `product_analytics`, and `customer_success`. Each domain has its own data steward and its own cadence. They create three schedules using **Tag**: `billing` runs hourly (cash-flow critical), `product_analytics` runs daily at 02:00 UTC, and `customer_success` runs weekly on Mondays. Each schedule reuses the same tag-based selection logic without listing tables explicitly. + +## Where to go next + +
+ +- :material-numeric-2-circle:{ .lg .middle } **Select Check Categories** + + --- + + Choose Metadata, Data Integrity, or both. + + [:octicons-arrow-right-24: Select Check Categories](select-check-categories.md) + +- :material-numeric-3-circle:{ .lg .middle } **Read Settings** + + --- + + Pick Incremental or Full, set an optional starting threshold, and the record limit. + + [:octicons-arrow-right-24: Read Settings](read-settings.md) + +- :material-numeric-4-circle:{ .lg .middle } **Scan Settings** + + --- + + Anomaly Options (including Auto Resolve), record-anomaly limits, and source examples. + + [:octicons-arrow-right-24: Scan Settings](scan-settings.md) + +- :material-numeric-5-circle:{ .lg .middle } **Schedule Options** + + --- + + Set up a recurring run, or skip this step and use **Run Now**. + + [:octicons-arrow-right-24: Schedule Options](schedule-options.md) + +
diff --git a/docs/operations/scan/how-tos/use-runtime-variables.md b/docs/operations/scan/how-tos/use-runtime-variables.md new file mode 100644 index 0000000000..3fa0cee749 --- /dev/null +++ b/docs/operations/scan/how-tos/use-runtime-variables.md @@ -0,0 +1,38 @@ +# :material-code-tags:{ .middle style="color: var(--q-brick)" } Use Runtime Variables + +Some advanced use cases require options that are not yet exposed in the UI but are available through the Qualytics API. Runtime variable assignment is one of them. + +## Runtime Variable Assignment + +It is possible to reference a variable in a check definition (declared in double curly braces) and then assign that variable a value when a Scan operation is initiated. Variables are supported within any Spark SQL expression and are most commonly used in a check filter. + +If a Scan is meant to assert a check with a variable, a value for that variable must be supplied as part of the Scan operation's `check_variables` property. + +When using a variable inside a filter, the filter **must be a valid [Spark SQL WHERE](https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-where.html){:target="_blank"} expression**. + +For example, a check might include a filter: + +```sql +transaction_date = {{ checked_date }} +``` + +For the Scan operation payload, users must apply **explicit casting** inside the `check_variables` section. Since variables may represent different data types (integer, string, timestamp, etc.), each variable must be cast to the correct type to avoid parsing or evaluation errors. 
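
The explicit-casting requirement above can be sketched as a small helper that turns Python values into typed Spark SQL literal strings before they are placed in `check_variables`. This is a minimal sketch under stated assumptions: `to_spark_literal` and `build_scan_payload` are hypothetical helpers, the payload fields simply mirror the example payload in this section, and whether every literal form shown (for example `CAST(... AS BIGINT)`) is accepted by the API is an assumption to verify against the API reference.

```python
import json
from datetime import date, datetime


def to_spark_literal(value):
    """Render a Python value as an explicitly typed Spark SQL literal string."""
    # bool is checked before int, and datetime before date, because each
    # is a subclass of the type that follows it.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"CAST({value} AS BIGINT)"
    if isinstance(value, datetime):
        text = value.isoformat(sep=" ", timespec="seconds")
        return f"TIMESTAMP '{text}'"
    if isinstance(value, date):
        return f"DATE '{value.isoformat()}'"
    if isinstance(value, str):
        if "'" in value or "\\" in value:
            raise ValueError("quotes and backslashes are not handled in this sketch")
        return f"'{value}'"
    raise TypeError(f"unsupported variable type: {type(value).__name__}")


def build_scan_payload(datastore_id, container_names, variables):
    """Assemble a scan payload shaped like the example in this section."""
    return {
        "type": "scan",
        "datastore_id": datastore_id,
        "container_names": list(container_names),
        "incremental": True,
        "remediation": "none",
        "max_records_analyzed_per_partition": 0,
        "check_variables": {k: to_spark_literal(v) for k, v in variables.items()},
        "high_count_rollup_threshold": 10,
    }


payload = build_scan_payload(
    42, ["my_container"], {"checked_date": datetime(2026, 5, 15)}
)
print(json.dumps(payload, indent=2))
```

Keeping the casting in one helper means a check filter such as `transaction_date = {{ checked_date }}` always receives a value whose type is stated in the literal itself, rather than relying on implicit conversion.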
+ +In this case, that value would be assigned by passing the following payload when calling `/api/operations/run`: + +```json +{ + "type": "scan", + "datastore_id": 42, + "container_names": ["my_container"], + "incremental": true, + "remediation": "none", + "max_records_analyzed_per_partition": 0, + "check_variables": { + "checked_date": "TIMESTAMP '2026-05-15'" + }, + "high_count_rollup_threshold": 10 +} +``` + +For the full payload reference, see [API](../api.md). diff --git a/docs/operations/scan/scan.md b/docs/operations/scan/scan.md deleted file mode 100644 index 5c57636d25..0000000000 --- a/docs/operations/scan/scan.md +++ /dev/null @@ -1,675 +0,0 @@ -# Scan Operation - -The Scan Operation in Qualytics is performed on a datastore to enforce data quality checks for various data collections, such as tables, views, and files. It supports centralized configuration through the Datastore Enrichment Settings, where options like the Remediation Strategy, Source Record Limit, and Anomaly Rollup Threshold are defined. While these defaults are applied automatically during a scan, users retain the flexibility to adjust the Source Record Limit and Anomaly Rollup Threshold directly within the scan form. This operation has several key functions: - -- **Record Anomalies:** Identifies a single record (row) as anomalous and provides specific details regarding why it is considered anomalous. The simplest form of a record anomaly is a row that lacks an expected value for a field. - -- **Shape Anomalies:** Identifies structural issues within a dataset at the column or schema level. It highlights broader patterns or distributions that deviate from expected norms. If a dataset is expected to have certain fields and one or more fields are missing or contain inconsistent patterns, this would be flagged as a shape anomaly. 
- -**Anomaly Data Recording:** All identified anomalies, along with related analytical data, are recorded in the associated Enrichment Datastore for further examination. - -Additionally, the Scan Operation offers flexible options, including the ability to: - -- Perform checks on incremental loads versus full loads. -- Limit the number of records scanned. -- Run scans on a selected list of tables or files. -- Schedule scans for future execution. - -Let's get started! 🚀 - -## Navigation to Scan Operation - -**Step 1:** Select a source datastore from the side menu on which you would like to perform the scan operation. - -![side-menu](../../assets/operations/scan/step-1-side-menu.png) - -**Step 2:** Clicking on your preferred datastore will navigate you to the datastore details page. Within the overview tab (default view), click on the **Run** button under **Scan** to initiate the scan operation. - -![details-page](../../assets/operations/scan/step-2-details-page.png) - -!!! note - Scanning operation can be commenced once the sync operation and profile operation are completed. - -## Configuration - -**Step 1:** Click on the **Run** button to initiate the scan operation. - -![run](../../assets/operations/scan/step-3-run.png) - -**Step 2:** Select tables (in your JDBC datastore) or file patterns (in your DFS datastore) and tags you would like to be scanned. - -!!! note - Scan operation also supports .txt.gz and .csv.gz files in DFS datastores. - -**1. All Tables/File Patterns** - -This option includes all tables or file patterns currently available for scanning in the datastore. It means that every table or file pattern recognized in your datastore will be subjected to the defined data quality checks. Use this when you want to perform a comprehensive scan covering all the available data without any exclusions. - -![all-operation](../../assets/operations/scan/step-4-all-operation.png) - -**2. 
Specific Tables/File Patterns** - -This option allows you to manually select the individual table(s) or file pattern(s) in your datastore to scan. Upon selecting this option, all the tables or file patterns associated with your datastore will be automatically populated, allowing you to select the datasets you want to scan. - -You can also search the tables/file patterns you want to scan directly using the search bar. Use this option when you need to target particular datasets or when you want to exclude certain files from the scan for focused analysis or testing purposes. - -![specific](../../assets/operations/scan/step-5-specific.png) - -**3. Tag** - -This option enables you to automatically scan file patterns associated with the selected tags. Tags can be predefined or created to categorize and manage file patterns effectively. - -![tag](../../assets/operations/scan/step-6-tag.png) - -**Step 3:** Click on the **Next** button to Configure **Select Check Categories**. - -![next](../../assets/operations/scan/step-7-next.png) - -**Step 4:** Configure **Select Check Categories** Setting - -Users can choose one or more check categories when initiating a scan. This allows for flexible selection based on the desired scope of the operation: - -- **Metadata**: Include checks that define the expected properties of the table, such as volume. It belongs to the Volumetric rule type. - -- **Data Integrity**: Include checks that specify the expected values for the data stored in the table. It belongs to all rule types except volumetric. - -![select-check](../../assets/operations/scan/step-8-select-check.png) - -**Step 5:** Click on the **Next** button to Configure the **Read Settings**. - -![next](../../assets/operations/scan/step-9-nextt.png) - -**Step 6:** Configure Read Settings, Starting Threshold (Optional), and the Record Limit. - - 1. Select the **Read Strategy** for your scan operation. 
- -- **Incremental:** This strategy is used to scan only the new or updated records since the last scan operation. On the initial run, a full scan is conducted unless a specific starting threshold is set. For subsequent scans, only the records that have changed since the last scan are processed. If tables or views do not have a defined incremental key, a full scan will be performed. Ideal for regular scans where only changes need to be tracked, saving time and computational resources. - -!!! note - Incremental scans fully support Apache Iceberg table, significantly expanding the range of asset types eligible for incremental scanning operations. - -- **Full**: This strategy performs a comprehensive scan of all records within the specified data collections, regardless of any previous changes or scans. Every scan operation will include all records, ensuring a complete check each time. Suitable for periodic comprehensive checks or when incremental scanning is not feasible due to the nature of the data. - -![incremental](../../assets/operations/scan/step-10-incremental.png) - -!!! warning - If any selected tables do not have an incremental identifier, a full scan will be performed for those tables. - -!!! info - When running an Incremental Scan for the first time, Qualytics automatically performs a full scan, saving the incremental field for subsequent runs. - - - This ensures that the system establishes a baseline and captures all relevant data. - - - Once the initial full scan is completed, the system intelligently uses the saved incremental field to execute future Incremental Scans efficiently, focusing only on the new or updated data since the last scan. - - - This approach optimizes the scanning process while maintaining data quality and consistency. - - 2. Define the Starting Threshold **(Optional)** i.e., specify a minimum incremental identifier value to set a starting point for the scan. 
- -* **Greater Than Time:** This option applies only to tables with an incremental timestamp strategy. Users can specify a timestamp to scan records that were modified after this time. - -* **Greater Than Batch:** This option applies to tables with an incremental batch strategy. Users can set a batch value, ensuring that only records with a batch identifier greater than the specified value are scanned. - -![starting-threshold](../../assets/operations/scan/step-11-starting-threshold.png) - - 3. Define the **Record Limit** - the maximum number of records to be scanned per table after any initial filtering. This is a crucial feature for managing large datasets. - -![record-limit-line](../../assets/operations/scan/step-12-record-limit-line.png) - - You can manually enter a custom value in the text field or quickly select from a dropdown menu with commonly used limits such as 1M, 10M, 100M, and All. - - ![record-limit-options](../../assets/operations/scan/step-13-record-limit-options.png) - -!!! note - The number of records must be between 1 and 1,000,000,000. - -**Step 7:** Click on the **Next** button to Configure the **Scan Settings**. - -![next-button](../../assets/operations/scan/step-14-next-button.png) - -**Step 8:** Configure the **Scan Settings**. - -**1. Anomaly Options:** Manage duplicate anomalies efficiently by archiving duplicates or reactivating recurring ones. These settings help streamline anomaly tracking and maintain data accuracy. - -- **Archive Duplicate Anomalies:** Automatically archive duplicate anomalies from previous scans that overlap with the current scan to enhance data management efficiency. - -- **Reactivate Recurring Anomalies:** Enabling **Reactivate Recurring Anomalies** marks new anomalies as duplicates of archived ones, reactivates the original anomaly, and creates a [Fingerprint](../../enrichment/enrichment-tables.md#_failed_checks-table) column in the Enrichment Datastore. 
- -![anomaly-option](../../assets/operations/scan/step-15-anomaly-option.png) - -**2. Maximum Record Anomalies per Check:** Set the maximum number of anomalies generated per check before they are merged into a single rolled-up anomaly. This helps manage anomaly volume and simplifies review. - -![anomaly-option](../../assets/operations/scan/step-16-anomalyy.png) - -**3. Maximum Source Examples per Anomaly:** This setting decides **how many source records are kept for each anomaly**. When the scan runs and finds an anomaly, only the specified number of records are kept. These are the **only records you can view or download later**. - -![source-record-limit](../../assets/operations/scan/step-17-source-record-limit.png) - -For example, if this value is set to **10**, only **10 source records per anomaly** will be kept, even if more records caused the anomaly. - -If you need to **download more records**, increase this value **before running the scan**. Changes made after the scan finishes will not affect the results. - -![record-limit](../../assets/operations/scan/step-18-record-limit.png) - -## Field Masking and Scanning - -Scan operations run normally on [masked fields](../../fields/field-status/concepts/field-masking.md); masking does not affect anomaly detection or quality check execution. The platform evaluates all checks using the actual source data. 
- -However, when scan results are displayed, masked field values are obfuscated in the following surfaces: - -- **Anomaly Source Records**: values are hidden by default; users with Editor permission can reveal them per anomaly -- **Anomaly Descriptions**: check failure messages permanently show `` in place of actual values; the original value is not stored -- **Enrichment Datastore**: source record values written during a [Materialize operation](../../operations/materialize-operation/materialize-operation.md#field-masking-and-materialize) are obfuscated for masked fields - -For more details, see [Masked Fields in Source Records](../../anomalies/deep-dive/source-record.md#masked-fields-in-source-records). - -## Run Instantly - -Click on the **Run Now** button to perform the scan operation immediately. - -![run-now](../../assets/operations/scan/step-19-run-now.png) - -## Schedule - -!!! info "Timezone-aware scheduling" - Schedules can run in any IANA timezone (for example, `America/New_York`, `Europe/Paris`, `Asia/Tokyo`), and Daylight Saving Time transitions are handled automatically. **UTC is the default** for new and existing schedules. The configured timezone is shown on the schedule card as an abbreviation, such as `Schedule (UTC)` by default or `Schedule (EST)` after selecting another timezone. - -!!! info "Deactivating a schedule keeps its cron expression" - When you deactivate a schedule, its cron expression is kept. Reactivating it later resumes the same schedule without setting it up again. - - If a schedule was deactivated before May 7, 2026 and doesn't run after you reactivate it, re-enter its cron expression once to restore it. Schedules deactivated on or after that date keep working normally. - -**Step 1:** Click on the **Schedule** button to configure the available schedule options for your scan operation. - -![click-schedule](../../assets/operations/scan/step-20-click-schedule.png) - -**Step 2:** Choose the **Timezone** for this schedule. 
UTC is selected by default. To run in a different timezone, type to search by city, region, or abbreviation and pick an IANA timezone from the list. The selected timezone applies to every tab below, and the banner above the tabs shows the current time in that timezone. - - - -**Step 3:** Set the scheduling preferences for the scan operation. - -**1. Hourly:** This option allows you to schedule the scan to run every hour at a specified minute. You can define the frequency in hours and the exact minute within the hour the scan should start. Example: If set to **Every 1 hour(s) on minute 0,** the scan will run every hour at the top of the hour (e.g., 1:00, 2:00, 3:00). - -![hourly](../../assets/operations/scan/step-21-hourly.png) - -**2. Daily:** This option schedules the scan to run once every day at a specific time. You specify the number of days between scans and the exact time of day in the selected timezone. Example: If set to **Every 1 day(s) at 00:00** with the timezone set to UTC, the scan will run every day at midnight UTC. - -![daily](../../assets/operations/scan/step-22-daily.png) - -**3. Weekly:** This option schedules the scan to run on specific days of the week at a set time. You select the days of the week and the exact time of day in the selected timezone for the scan to run. Example: If configured to run on "Sunday" and "Friday" at 00:00 with the timezone set to UTC, the scan will execute at midnight UTC on these days. - -![weekly](../../assets/operations/scan/step-23-weekly.png) - -**4. Monthly:** This option schedules the scan to run once a month on a specific day at a set time. You specify the day of the month and the time of day in the selected timezone. If set to "On the 1st day of every 1 month(s), at 00:00" with the timezone set to UTC, the scan will run on the first day of each month at midnight UTC. - -![monthly](../../assets/operations/scan/step-24-monthly.png) - -**5. 
Advanced:** The advanced section for scheduling operations allows users to set up more complex and custom scheduling using Cron expressions. This option is particularly useful for defining specific times and intervals for scan operations with precision. - -Cron expressions are a powerful and flexible way to schedule tasks. They use a syntax that specifies the exact timing of the task based on five fields: - -- Minute (0 - 59) -- Hour (0 - 23) -- Day of the month (1 - 31) -- Month (1 - 12) -- Day of the week (0 - 6) (Sunday to Saturday) - -Each field can be defined using specific values, ranges, or special characters to create the desired schedule. - -**Example:** The Cron expression `0 0 * * *` schedules the scan operation to run at midnight (00:00) every day. Here’s a breakdown of this expression: - -- 0 (Minute) - The task will run at the 0th minute. -- 0 (Hour) - The task will run at the 0th hour (midnight). -- *(Day of the month) - The task will run every day of the month. -- *(Month) - The task will run every month. -- *(Day of the week) - The task will run every day of the week. - -Users can define other specific schedules by adjusting the Cron expression. For example: - -- 0 12 * * 1-5 - Runs at 12:00 PM from Monday to Friday. -- 30 14 1 * * - Runs at 2:30 PM on the first day of every month. -- 0 22 * * 6 - Runs at 10:00 PM every Saturday. - -To define a custom schedule, enter the appropriate Cron expression in the **Custom Cron Schedule** field before specifying the schedule name. The field label shows the abbreviation for the currently selected timezone (for example, `Custom Cron Schedule (UTC)` or `Custom Cron Schedule (EST)`), and the cron fields are interpreted in that timezone. - -![advanced](../../assets/operations/scan/step-25-advanced.png) - -**Step 4:** Define the **Schedule Name** to identify the scheduled operation at the running time. 
- -![schedule-name](../../assets/operations/scan/step-26-schedule-name.png) - -**Step 5:** Click on the **Schedule** button to schedule your scan operation. - -![schedule](../../assets/operations/scan/step-27-schedule.png) - -!!! note "Daylight Saving Time" - When you pick a timezone that observes DST (such as `America/New_York` or `Europe/London`), the schedule automatically shifts with each transition. A job set to run at 9:00 AM in `America/New_York` runs at 9:00 AM local time year-round, regardless of whether the zone is in EST or EDT at the time. No reconfiguration is required. - -!!! note - You will receive a notification when the scan operation is completed. - -## Advanced Options - -The advanced use cases described below require options that are not yet exposed in our user interface but possible through interaction with the Qualytics API. - -### Runtime Variable Assignment - -It is possible to reference a variable in a check definition (declared in double curly braces) and then assign that variable a value when a Scan operation is initiated. Variables are supported within any Spark SQL expression and are most commonly used in a check filter. - -If a Scan is meant to assert a check with a variable, a value for that variable must be supplied as part of the Scan operation's `check_variables` property. - -When using a variable inside a filter, the filter **must be a valid [Spark SQL WHERE](https://spark.apache.org/docs/latest/sql-ref.html){target="_blank"} expression**. - -For example, a check might include a filter: - -```sql -transaction_date = {{ checked_date }} -``` - -For the Scan operation payload, users must apply **explicit casting** inside the `check_variables` section. Since variables may represent different data types (integer, string, timestamp, etc.), each variable must be cast to the correct type to avoid parsing or evaluation errors. 
- -In this case, that value would be assigned by passing the following payload when calling `/api/operations/run`: - -```json -{ - "type": "scan", - "datastore_id": 42, - "container_names": ["my_container"], - "incremental": true, - "remediation": "none", - "max_records_analyzed_per_partition": 0, - "check_variables": { - "checked_date": "TIMESTAMP '2023-10-15'" - }, - "high_count_rollup_threshold": 10 -} -``` -## Operations Insights - -When the scan operation is completed, you will receive the notification and can navigate to the Activity tab for the datastore on which you triggered the Scan Operation and learn about the scan results. - -### Top Panel - -**1. Runs (Default View):** Provides insights into the operations that have been performed. - -**2. Schedule:** Provides insights into the scheduled operations. - -**3. Search:** Search for any operation (including scan) by entering the operation ID. - -**4. Sort by:** Organize the list of operations based on the **Created Date** or the **Duration**. - -**5. Filter:** Narrow down the list of operations based on: - -- Operation Type -- Operation Status -- Table - -![activity-operation](../../assets/operations/scan/step-28-activity-operation.png) - -### Activity Heatmap - -The activity heatmap shown in the snippet below represents activity levels over a period, with each square indicating a day and the color intensity representing the number of operations or activities on that day. It is useful in tracking the number of operations performed on each day within a specific timeframe. - -!!! tip - You can click on any of the squares from the Activity Heatmap to filter operations. - -![activity](../../assets/operations/scan/step-29-activity.png) - -### Operation Detail - -#### Running - -This status indicates that the scan operation is still running at the moment and is yet to be completed. 
A scan operation having a **running** status reflects the following details and actions: - -![running](../../assets/operations/scan/step-30-running.png) - -| No. | Parameter | Interpretation | -| --- | -------------------------- | ----------------------------------------------------------------------------------------------------------- | -| 1 | Operation ID and Type | Unique identifier and type of operation performed (sync, profile, or scan). | -| 2 | Timestamp | Timestamp when the operation was started. | -| 3 | Progress Bar | The progress of the operation. | -| 4 | Triggered By | The author who triggered the operation. | -| 5 | Schedule | Indicates whether the operation was scheduled or not. | -| 6 | Incremental Field | Indicates whether Incremental was enabled or disabled in the operation. | -| 7 | Remediation | Indicates whether Remediation was enabled or disabled in the operation. | -| 8 | Anomalies Identified | Provides a count of the number of anomalies detected during the running operation.| -| 9 | Read Record Limit | Defines the maximum number of records to be scanned per table after initial filtering.| -| 10 | Check Categories | Indicates which categories should be included in the scan (e.g., Metadata, Data Integrity).| -| 11 | Archive Duplicate Anomalies | Indicates whether Archive Duplicate Anomalies was enabled or disabled in the operation. -| 12 | Reactivate Recurring Anomalies | Indicates whether previously detected anomalies that reappear in subsequent scans will be reactivated. | -| 13 | Source Record Limit | Indicates the limit on records stored in the enrichment datastore for each detected anomaly.| -| 14 | Anomaly Rollup Threshold | Number of anomalies grouped together for rollup reporting.| -| 15 | Results | View the details of the ongoing scan operation. 
This includes information on which tables are currently being scanned, the anomalies identified so far (if any), and other related data collected during the active scan.| -| 16 | Abort | The Abort button enables you to stop the ongoing scan operation.| -| 17 | Summary | The summary section provides an overview of the scan operation in progress. It includes:
  • **Tables Requested**: The total number of tables that were scheduled for scanning. Click on the adjacent magnifying glass icon to view the tables requested.
  • **Tables Scanned**: The number of tables that have been scanned so far. Click on the adjacent magnifying glass icon to view the tables scanned.
  • **Partitions Scanned**: The number of partitions scanned during the ongoing operation.
  • **Records Scanned**: The total number of records processed up to this point.
  • **Anomalies Identified**: The total number of detected anomalies, with a breakdown of open and archived ones.
| - -#### Aborted - -This status indicates that the scan operation was manually stopped before it could be completed. A scan operation having an **aborted** status reflects the following details and actions: - -![aborted-operation](../../assets/operations/scan/step-31-aborted-operation.png) - -| **No.** | **Parameter** | **Interpretation** | -|---------|---------------------------|------------------------------------------------------------------------------------| -| 1 | Operation ID and Type | Unique identifier and type of operation performed (sync, profile, or scan). | -| 2 | Timestamp | Timestamp when the operation was started | -| 3 | Progress Bar | The progress of the operation | -| 4 | Aborted By | The author who triggered the operation | -| 5 | Schedule | Whether the operation was scheduled or not | -| 6 | Incremental Field | Indicates whether Incremental was enabled or disabled in the operation | -| 7 | Remediation | Indicates whether Remediation was enabled or disabled in the operation | -| 8 | Anomalies Identified | Provides a count on the number of anomalies detected before the operation was aborted| -| 9 | Read Record Limit | Defines the maximum number of records to be scanned per table after initial filtering| -| 10 | Check Categories | Indicates which categories should be included in the scan (Metadata, Data Integrity)| -| 11 | Archive Duplicate Anomalies| Indicates whether Archive Duplicate Anomalies was enabled or disabled in the operation| -| 12 | Reactivate Recurring Anomalies | Indicates whether previously detected anomalies that reappear in subsequent scans will be reactivated. 
| -| 13 | Source Record Limit | Indicates the limit on records stored in the enrichment datastore for each detected anomaly| -| 14 | Anomaly Rollup Threshold | Number of anomalies grouped together for rollup reporting.| -| 15 | Results | View the details of the scan operation that was aborted, including tables scanned and anomalies identified| -| 16 | Resume | Provides an option to continue the scan operation from where it left off | -| 17 | Rerun | The "Rerun" button allows you to start a new scan operation using the same settings as the aborted scan| -| 18 | Delete | Removes the record of the aborted scan operation from the system, permanently deleting scan results and anomalies| -| 19 | Summary | The summary section provides an overview of the scan operation up to the point it was aborted. It includes:
  • **Tables Requested**: The total number of tables that were scheduled for scanning. Click on the adjacent magnifying glass icon to view the tables requested.
  • **Tables Scanned**: The number of tables that have been scanned so far. Click on the adjacent magnifying glass icon to view the tables scanned.
  • **Partitions Scanned**: The number of partitions scanned before the operation was aborted.
  • **Records Scanned**: The total number of records processed before the scan was stopped.
  • **Anomalies Identified**: The total number of detected anomalies, with a breakdown of open and archived ones.
| - -#### Warning - -This status signals that the scan operation encountered some issues and displays the logs that facilitate improved tracking of the blockers and issue resolution. A scan operation having a **completed with warning** status reflects the following details and actions: - -![warning](../../assets/operations/scan/step-32-warning.png) - -| **No.** | **Parameter** | **Interpretation** | -|---------|---------------------------|------------------------------------------------------------------------------------| -| 1 | Operation ID and Type | Unique identifier and type of operation performed (sync, profile, or scan). | -| 2 | Timestamp | Timestamp when the operation was started | -| 3 | Progress Bar | The progress of the operation | -| 4 | Triggered By | The author who triggered the operation | -| 5 | Schedule | Whether the operation was scheduled or not | -| 6 | Incremental Field | Indicates whether Incremental was enabled or disabled in the operation | -| 7 | Remediation | Indicates whether Remediation was enabled or disabled in the operation | -| 8 | Anomalies Identified | Provides a count on the number of anomalies detected before the operation was warned.| -| 9 | Read Record Limit | Defines the maximum number of records to be scanned per table after initial filtering| -| 10 | Check Categories | Indicates which categories should be included in the scan (Metadata, Data Integrity)| -| 11 | Archive Duplicate Anomalies| Indicates whether Archive Duplicate Anomalies was enabled or disabled in the operation| -| 12 | Source Record Limit | Indicates the limit on records stored in the enrichment datastore for each detected anomaly| -| 13 | Anomaly Rollup Threshold | Number of anomalies grouped together for rollup reporting.| -| 14 | Result | View the details of the scan operation that was completed with warning, including tables scanned and anomalies identified| -| 15 | Rerun | The "Rerun" button allows you to start a new scan operation using the same 
settings as the warning scan| -| 16 | Delete | Removes the record of the warning operation from the system, permanently deleting scan results and anomalies| -| 17 |Summary | The summary section provides an overview of the scan operation, highlighting any warnings encountered. It includes:
  • **Tables Requested**: The total number of tables that were scheduled for scanning. Click on the adjacent magnifying glass icon to view the tables requested.
  • **Tables Scanned**: The number of tables that have been scanned so far. Click on the adjacent magnifying glass icon to view the tables scanned.
  • **Partitions Scanned**: The number of partitions scanned during the operation, including any partitions that triggered warnings.
  • **Records Scanned**: The total number of records processed during the scan, along with any records that raised warnings.
  • **Anomalies Identified**: The total number of detected anomalies, with a breakdown of open and archived ones.
| -| 18 |Logs | Logs include error messages, warnings, and other pertinent information that occurred during the execution of the Scan Operation.| - -#### Success - -The summary section provides an overview of the **scan** operation upon successful completion. It includes: - -![success](../../assets/operations/scan/step-33-success.png) - -| **No.** | **Parameter** | **Interpretation** | -|---------|---------------------------|------------------------------------------------------------------------------------| -| 1 | Operation ID and Type | Unique identifier and type of operation performed (sync, profile, or scan). | -| 2 | Timestamp | Timestamp when the operation was started | -| 3 | Progress Bar | The progress of the operation | -| 4 | Triggered By | The author who triggered the operation | -| 5 | Schedule | Whether the operation was scheduled or not | -| 6 | Incremental Field | Indicates whether Incremental was enabled or disabled in the operation | -| 7 | Remediation | Indicates whether Remediation was enabled or disabled in the operation | -| 8 | Anomalies Identified | Provides a count of the number of anomalies detected during the successful completion of the operation.| -| 9 | Read Record Limit | Defines the maximum number of records to be scanned per table after initial filtering| -| 10 | Check Categories | Indicates which categories should be included in the scan (Metadata, Data Integrity)| -| 11 | Archive Duplicate Anomalies| Indicates whether Archive Duplicate Anomalies was enabled or disabled in the operation| -| 12 | Source Record Limit | Indicates the limit on records stored in the enrichment datastore for each detected anomaly| -| 13 | Anomaly Rollup Threshold | Number of anomalies grouped together for rollup reporting.| -| 14 | Results | View the details of the completed scan operation. 
This includes information on which tables were scanned, the anomalies identified (if any), and other relevant data collected throughout the successful completion of the scan.| -| 15 | Rerun | The "Rerun" button allows you to start a new scan operation using the same settings as the success scan| -| 16 | Delete | Removes the record of the aborted scan operation from the system, permanently deleting scan results and anomalies| -| 17 | Summary | The summary section provides an overview of the scan operation upon successful completion. It includes:
  • **Tables Requested**: The total number of tables that were scheduled for scanning. Click on the adjacent magnifying glass icon to view the tables requested.
  • **Tables Scanned**: The number of tables that have been scanned successfully. Click on the adjacent magnifying glass icon to view the tables scanned.
  • **Partitions Scanned**: The number of partitions scanned.
  • **Records Scanned**: The total number of records processed.
  • **Anomalies Identified**: The total number of detected anomalies, with a breakdown of open and archived ones.
| - -#### Full View of Metrics in Operation Summary - -Users can now hover over abbreviated metrics to see the full value for better clarity. For demonstration purposes, we are hovering over the **Records Scanned** field to display the full value. - -![records-scan-operation](../../assets/operations/scan/step-34-records-scan-operation.png) - -#### Post Operation Details - -**Step 1:** Click on any of the successful **Scan Operations** from the list and hit the **Results** button. - -![result-scan-operation](../../assets/operations/scan/step-35-result-scan-operation.png) - -**Step 2:** The **Scan Results** modal demonstrates the highlighted anomalies (if any) identified in your datastore with the following properties: - -![result](../../assets/operations/scan/step-36-result.png) - -| Ref. | Scan Properties | Description | -|------|-----------------|------------------------| -| 1. | Table/File | The table or file where the anomaly is found.| -| 2. | Field | The field(s) where the anomaly is present. | -| 3. | Location | Fully qualified location of the anomaly.| -| 4. | Rule | AI Managed and Authored checks that failed assertions. | -| 5. | Description | Human-readable, auto-generated description of the anomaly.| -| 6. | Status | The status of the anomaly: Active, Acknowledged, Resolved, or Invalid. | -| 7. | Type | The type of anomaly (e.g., Record or Shape) | -| 8. | Date time | The date and time when the anomaly was found. | - -**Step 3:** By clicking the **dropdown** button next to the **All** button, you can filter anomalies based on their status. - -![drop-down](../../assets/operations/scan/step-37-drop-down.png) - -## API Payload Examples - -This section provides payload examples for running, scheduling, and checking the status of scan operations. Replace the placeholder values with data specific to your setup. 
- -### Running a Scan operation - -To run a scan operation, use the API payload example below and replace the placeholder values with your specific values. - -#### Endpoint (Post): - -```/api/operations/run (post)``` - -=== "Option I: Running a scan operation of all containers" - * **container_names:** `[]` means that it will scan all containers. - * **max_records_analyzed_per_partition:** `null` means that it will scan all records of all containers. - * **Remediation:** `append` replicates source containers using an append-first strategy. - - ```json - { - "type":"scan", - "name":null, - "datastore_id": datastore-id, - "container_names":[], - "remediation":"append", - "incremental":false, - "max_records_analyzed_per_partition":null, - "enrichment_source_record_limit":10 - } - ``` -=== "Option II: Running a scan operation of specific containers" - * **container_names:** `["table_name_1", "table_name_2"]` means that it will scan only the tables table_name_1 and table_name_2. - * **max_records_analyzed_per_partition:** `1000000` means that it will scan a maximum of 1 million records per partition. - * **Remediation:** `overwrite` replicates source containers using an overwrite strategy. - - ```json - { - "type":"scan", - "name":null, - "datastore_id":datastore-id, - "container_names":[ - "table_name_1", - "table_name_2" - ], - "max_records_analyzed_per_partition":1000000, - "enrichment_source_record_limit":10 - } - ``` -### Scheduling scan operation of all containers - -To schedule a scan operation, use the API payload example below and replace the placeholder values with your specific values. 
- -#### Endpoint (Post): - -```/api/operations/schedule (post)``` - -This payload is to run a scheduled scan operation every day at 00:00 - -=== "Scheduling scan operation of all containers" - - ```json - { - "type":"scan", - "name":"My scheduled Scan operation", - "datastore_id":"datastore-id", - "container_names":[], - "remediation": "overwrite", - "incremental": false, - "max_records_analyzed_per_partition":null, - "enrichment_source_record_limit":10, - "crontab":"00 00 */2 * *" - } - ``` - -### Retrieving Scan Operation Information - -#### Endpoint (Get) - -`/api/operations/{id} (get)` - -=== "Example result response" - ```json - { - "items": [ - { - "id": 12345, - "created": "YYYY-MM-DDTHH:MM:SS.ssssssZ", - "type": "scan", - "start_time": "YYYY-MM-DDTHH:MM:SS.ssssssZ", - "end_time": "YYYY-MM-DDTHH:MM:SS.ssssssZ", - "result": "success", - "message": null, - "triggered_by": "user@example.com", - "datastore": { - "id": 101, - "name": "Datastore-Sample", - "store_type": "jdbc", - "type": "db_type", - "enrich_only": false, - "enrich_container_prefix": "data_prefix", - "favorite": false - }, - "schedule": null, - "incremental": false, - "remediation": "none", - "max_records_analyzed_per_partition": -1, - "greater_than_time": null, - "greater_than_batch": null, - "high_count_rollup_threshold": 10, - "enrichment_source_record_limit": 10, - "status": { - "total_containers": 2, - "containers_analyzed": 2, - "partitions_scanned": 2, - "records_processed": 28, - "anomalies_identified": 2 - }, - "containers": [ - { - "id": 234, - "name": "Container1", - "container_type": "table", - "table_type": "table" - }, - { - "id": 235, - "name": "Container2", - "container_type": "table", - "table_type": "table" - } - ], - "container_scans": [ - { - "id": 456, - "created": "YYYY-MM-DDTHH:MM:SS.ssssssZ", - "container": { - "id": 235, - "name": "Container2", - "container_type": "table", - "table_type": "table" - }, - "start_time": "YYYY-MM-DDTHH:MM:SS.ssssssZ", - "end_time": 
"YYYY-MM-DDTHH:MM:SS.ssssssZ", - "records_processed": 8, - "anomaly_count": 1, - "result": "success", - "message": null - }, - { - "id": 457, - "created": "YYYY-MM-DDTHH:MM:SS.ssssssZ", - "container": { - "id": 234, - "name": "Container1", - "container_type": "table", - "table_type": "table" - }, - "start_time": "YYYY-MM-DDTHH:MM:SS.ssssssZ", - "end_time": "YYYY-MM-DDTHH:MM:SS.ssssssZ", - "records_processed": 20, - "anomaly_count": 1, - "result": "success", - "message": null - } - ], - "tags": [] - } - ], - "total": 1, - "page": 1, - "size": 50, - "pages": 1 - } - ``` - -## Troubleshooting - -### Unloadable Container Error - -When running a scan, a profile, or a quality check validation, you may encounter the following error for specific containers: - -``` -Container '' is marked as Unloadable. No attempt was made to load the container due to multiple consecutive failures in prior operations. -``` - -**Cause:** This error occurs when a container has failed in 3 consecutive scan or profile operations. The count combines scans and profiles chronologically. To prevent repeated failed attempts and optimize performance, Qualytics marks the container as "Unloadable" and skips it in subsequent operations. Any later attempt to use the container (running another scan or profile, or validating a quality check against it) surfaces the same error until the status is cleared. - -**Resolution:** The steps to clear the **Unloadable** status depend on the container type. - -#### Tables, views, and file patterns - -These containers are discovered from the datastore catalog, so a successful Sync operation resets their status. - -1. Navigate to your source datastore. -2. Run a [Sync Operation](../sync/sync.md) on the datastore. -3. Once the sync operation completes successfully, the container status is reset. -4. Re-run the operation that surfaced the error (scan, profile, or check validation). The previously unloadable container should now be processed normally. 
- -#### Computed assets (Computed Tables, Computed Files, Computed Joins) - -Computed assets are defined inside Qualytics, not discovered from the datastore catalog, so a Sync operation does not reset their status. Instead, you need to force an edit on the asset and re-save its definition. Re-saving runs the validation against the datastore again and clears the **Unloadable** status. - -1. Navigate to the source datastore where the computed asset lives. -2. Open the computed asset and click **Edit** to force the asset into edit mode. -3. Review the SQL query or transformation and apply any change needed to make it work again (for example, after a renamed source column, a removed table, or a schema change in an upstream container). If the definition still looks correct, you can re-save it as-is to force Qualytics to re-evaluate it. -4. Click **Validate** to confirm the query runs successfully against the datastore. -5. Click **Save**. Saving a successfully validated definition clears the **Unloadable** status, and the asset becomes available again for scans, profiles, and check validations. - -For details on editing computed assets, see [Computed Tables](../../container/computed-tables-and-files/computed-tables.md), [Computed Files](../../container/computed-tables-and-files/computed-files.md), and [Computed Joins](../../container/computed-join.md). - -!!! tip - If the container continues to fail in scans, profiles, or check validations after a Sync (tables, views, or file patterns) or after Validate + Save (computed assets), investigate the underlying cause of the failures. Common causes include: - - - Permission issues accessing the container. - - Schema changes that invalidate existing configurations. - - Network connectivity problems to the data source. - - Resource constraints or timeouts during data loading. - - For computed assets: SQL referencing a renamed, dropped, or relocated upstream object. 
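Reviewer note on the Unloadable rule quoted above: the counting described ("3 consecutive scan or profile operations", combined chronologically) can be sketched as below. This is only an illustration of the rule as the text states it, not Qualytics' implementation. The function names, the `(operation_type, succeeded)` tuple shape, and the `FAILURE_THRESHOLD` constant are all hypothetical, and the reset performed by a successful Sync or by Validate + Save is deliberately not modeled.

```python
from typing import Iterable, Tuple

# Assumption taken from the text: three consecutive failures trigger the status.
FAILURE_THRESHOLD = 3


def trailing_failures(history: Iterable[Tuple[str, bool]]) -> int:
    """Count consecutive failures at the end of a chronological history.

    Each entry is (operation_type, succeeded). Scans and profiles share one
    counter, and any successful scan or profile resets the streak to zero.
    """
    streak = 0
    for op_type, succeeded in history:
        if op_type not in ("scan", "profile"):
            continue  # other operation types are ignored in this sketch
        streak = 0 if succeeded else streak + 1
    return streak


def is_unloadable(history: Iterable[Tuple[str, bool]],
                  threshold: int = FAILURE_THRESHOLD) -> bool:
    """Return True when the trailing failure streak reaches the threshold."""
    return trailing_failures(history) >= threshold


# A failed scan, a failed profile, and another failed scan count together.
mixed = [("scan", False), ("profile", False), ("scan", False)]
print(is_unloadable(mixed))  # True

# A success in between breaks the streak, leaving only two trailing failures.
interrupted = [("scan", False), ("profile", True), ("scan", False), ("scan", False)]
print(is_unloadable(interrupted))  # False
```

The single shared counter reflects the sentence "The count combines scans and profiles chronologically"; a per-operation-type counter would not mark the first history as unloadable.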
diff --git a/docs/operations/scan/troubleshooting.md b/docs/operations/scan/troubleshooting.md new file mode 100644 index 0000000000..2ae7d8ca0a --- /dev/null +++ b/docs/operations/scan/troubleshooting.md @@ -0,0 +1,45 @@ +# :material-tools:{ .middle style="color: var(--q-brick)" } Scan Troubleshooting + +This page documents known scan-related errors and the steps to resolve them. Each section describes the error, its cause, and a step-by-step resolution. + +## Unloadable Container Error + +When running a scan, a profile, or a quality check validation, you may encounter the following error for specific containers: + +``` +Container '' is marked as Unloadable. No attempt was made to load the container due to multiple consecutive failures in prior operations. +``` + +**Cause:** This error occurs when a container has failed in 3 consecutive scan or profile operations. The count combines scans and profiles chronologically. To prevent repeated failed attempts and optimize performance, Qualytics marks the container as "Unloadable" and skips it in subsequent operations. Any later attempt to use the container (running another scan or profile, or validating a quality check against it) surfaces the same error until the status is cleared. + +**Resolution:** The steps to clear the **Unloadable** status depend on the container type. + +### Tables, views, and file patterns + +These containers are discovered from the datastore catalog, so a successful Sync operation resets their status. + +1. Navigate to your source datastore. +2. Run a [Sync Operation](../sync/sync.md) on the datastore. +3. Once the sync operation completes successfully, the container status is reset. +4. Re-run the operation that surfaced the error (scan, profile, or check validation). The previously unloadable container should now process normally. 
+ +### Computed assets ([Computed Tables](../../container/computed-tables-and-files/computed-tables.md), [Computed Files](../../container/computed-tables-and-files/computed-files.md), [Computed Joins](../../container/computed-join.md)) + +Computed assets are defined inside Qualytics, not discovered from the datastore catalog, so a Sync operation does not reset their status. Instead, you need to force an edit on the asset and re-save its definition. Re-saving runs the validation against the datastore again and clears the **Unloadable** status. + +1. Navigate to the source datastore where the computed asset lives. +2. Open the computed asset and click **Edit** to force the asset into edit mode. +3. Review the SQL query or transformation and apply any change needed to make it work again (for example, after a renamed source column, a removed table, or a schema change in an upstream container). If the definition still looks correct, you can re-save it as-is to force Qualytics to re-evaluate it. +4. Click **Validate** to confirm the query runs successfully against the datastore. +5. Click **Save**. Saving a successfully validated definition clears the **Unloadable** status, and the asset becomes available again for scans, profiles, and check validations. + +For details on editing computed assets, see [Computed Tables](../../container/computed-tables-and-files/computed-tables.md), [Computed Files](../../container/computed-tables-and-files/computed-files.md), and [Computed Joins](../../container/computed-join.md). + +!!! tip + If the container continues to fail in scans, profiles, or check validations after a Sync (tables, views, or file patterns) or after Validate + Save (computed assets), investigate the underlying cause of the failures. Common causes include: + + - Permission issues accessing the container. + - Schema changes that invalidate existing configurations. + - Network connectivity problems to the data source. 
+ - Resource constraints or timeouts during data loading. + - For computed assets: SQL referencing a renamed, dropped, or relocated upstream object. diff --git a/docs/operations/sync/sync.md b/docs/operations/sync/sync.md index 287b6045fa..7a5d45089a 100644 --- a/docs/operations/sync/sync.md +++ b/docs/operations/sync/sync.md @@ -1,4 +1,4 @@ -# Sync Operation +# :material-database-sync-outline:{ .middle style="color: var(--q-brick)" } Sync Operation !!! warning This operation was renamed from **Catalog** to **Sync** in release **2026.3.20**. @@ -40,7 +40,7 @@ After a Sync operation, each container in your datastore is assigned one of the Containers that were previously **Inaccessible** or **Unloadable** are automatically restored to **Available** or **Changed** when they are successfully analyzed in a subsequent Sync operation. !!! tip - A container is also marked **Unloadable** when it fails 3 consecutive scan or profile operations. For tables, views, and file patterns, a successful Sync operation clears the status. Computed assets (Computed Tables, Computed Files, Computed Joins) are not discovered from the datastore catalog, so a Sync does not reset them. To restore a computed asset, force an edit on it (click **Edit** to enter edit mode), then click **Validate** and **Save** to re-evaluate the definition. See [Unloadable Container Error](../scan/scan.md#unloadable-container-error) for the full resolution steps. + A container is also marked **Unloadable** when it fails 3 consecutive scan or profile operations. For tables, views, and file patterns, a successful Sync operation clears the status. Computed assets (Computed Tables, Computed Files, Computed Joins) are not discovered from the datastore catalog, so a Sync does not reset them. To restore a computed asset, force an edit on it (click **Edit** to enter edit mode), then click **Validate** and **Save** to re-evaluate the definition. 
See [Unloadable Container Error](../scan/troubleshooting.md#unloadable-container-error) for the full resolution steps. ## Initialization & Operation Options diff --git a/docs/source-datastore/data-quality-score/deep-dive/introduction.md b/docs/source-datastore/data-quality-score/deep-dive/introduction.md index 31478f9519..a379cdc800 100644 --- a/docs/source-datastore/data-quality-score/deep-dive/introduction.md +++ b/docs/source-datastore/data-quality-score/deep-dive/introduction.md @@ -141,7 +141,7 @@ This allows you to align the scoring system with your organization's data govern Quality scores are automatically recalculated when: -- A [**Scan operation**](../../../operations/scan/scan.md){:target="_blank"} completes (anomalies detected or clean scan). +- A [**Scan operation**](../../../operations/scan/getting-started.md){:target="_blank"} completes (anomalies detected or clean scan). - A [**Profile operation**](../../../operations/profile/profile.md){:target="_blank"} completes (new field statistics available). - An **anomaly status changes** (acknowledged, resolved, etc.). - A **quality check is deleted**. diff --git a/docs/source-datastore/data-quality-score/deep-dive/permissions.md b/docs/source-datastore/data-quality-score/deep-dive/permissions.md index e61d4af889..e26e7d93d6 100644 --- a/docs/source-datastore/data-quality-score/deep-dive/permissions.md +++ b/docs/source-datastore/data-quality-score/deep-dive/permissions.md @@ -50,7 +50,7 @@ To **edit quality score settings**, a user must satisfy both layers: ## Triggering Recalculations -Quality scores are recalculated automatically after Scan and Profile operations. Running these operations requires the **Editor** team permission. See the [Scan Operation](../../../operations/scan/scan.md){:target="_blank"} and [Profile Operation](../../../operations/profile/profile.md){:target="_blank"} pages for details on operation permissions. 
+Quality scores are recalculated automatically after Scan and Profile operations. Running these operations requires the **Editor** team permission. See the [Scan Operation](../../../operations/scan/getting-started.md){:target="_blank"} and [Profile Operation](../../../operations/profile/profile.md){:target="_blank"} pages for details on operation permissions. !!! info "Full Permissions Reference" For the complete permissions and roles matrix across all Qualytics features, see the [Team Permissions](../../../settings/security/teams/team-permissions/overview.md){:target="_blank"} page. diff --git a/docs/source-datastore/data-quality-score/faq.md b/docs/source-datastore/data-quality-score/faq.md index 4dca39a654..20b2866304 100644 --- a/docs/source-datastore/data-quality-score/faq.md +++ b/docs/source-datastore/data-quality-score/faq.md @@ -96,7 +96,7 @@ Yes. Weights range from **0.0 to 2.0** (in 0.1 increments). A weight of 2.0 doub Quality scores are automatically recalculated when: -- A [**Scan operation**](../../operations/scan/scan.md){:target="_blank"} completes (anomalies detected or clean scan). +- A [**Scan operation**](../../operations/scan/getting-started.md){:target="_blank"} completes (anomalies detected or clean scan). - A [**Profile operation**](../../operations/profile/profile.md){:target="_blank"} completes (new field statistics). - An **anomaly status changes** (acknowledged, resolved, etc.). - A **quality check is deleted**. 
diff --git a/docs/source-datastore/datastore/dfs/overview-of-a-dfs-datastore.md b/docs/source-datastore/datastore/dfs/overview-of-a-dfs-datastore.md index 8b08b26e50..38ad0fc1bd 100644 --- a/docs/source-datastore/datastore/dfs/overview-of-a-dfs-datastore.md +++ b/docs/source-datastore/datastore/dfs/overview-of-a-dfs-datastore.md @@ -100,7 +100,7 @@ Once a DFS datastore is added, you can run the following operations: | :--- | :--- | | [Sync](../../../operations/sync/sync.md){:target="_blank"} | Walks the directory tree, reads files with supported extensions, and creates containers based on file metadata and naming patterns. Detects new, changed, or removed files incrementally. | | [Profile](../../../operations/profile/profile.md){:target="_blank"} | Analyzes records across containers to compute statistics, detect data patterns, and automatically infer quality checks. | -| [Scan](../../../operations/scan/scan.md){:target="_blank"} | Executes quality checks against the data, measures data quality metrics, and detects anomalies at the record and schema levels. | +| [Scan](../../../operations/scan/getting-started.md){:target="_blank"} | Executes quality checks against the data, measures data quality metrics, and detects anomalies at the record and schema levels. | | [External Scan](../../../operations/external-scan/external-scan.md){:target="_blank"} | Runs scan operations using externally provided data files. | !!! 
tip diff --git a/docs/source-datastore/datastore/jdbc/athena/troubleshooting.md b/docs/source-datastore/datastore/jdbc/athena/troubleshooting.md index 0889a19237..b2d1deea2f 100644 --- a/docs/source-datastore/datastore/jdbc/athena/troubleshooting.md +++ b/docs/source-datastore/datastore/jdbc/athena/troubleshooting.md @@ -1,4 +1,4 @@ -# ![](../../../../assets/shared/connector-logos/logo-athena.svg){ width="36" style="vertical-align: middle;" } Athena Troubleshooting +# :material-tools:{ .middle style="color: var(--q-brick)" } Athena Troubleshooting Connection failures from AWS are translated by Qualytics into concise, actionable messages prefixed with `AWS connection failed (Athena):`. For the common credential, IAM Role, permission, and S3 output errors, Qualytics shows the translated message instead of the raw AWS SDK trace (canonical strings, signatures, request IDs). diff --git a/docs/source-datastore/datastore/jdbc/overview-of-a-jdbc-datastore.md b/docs/source-datastore/datastore/jdbc/overview-of-a-jdbc-datastore.md index 0415f1b996..33035ebdc3 100644 --- a/docs/source-datastore/datastore/jdbc/overview-of-a-jdbc-datastore.md +++ b/docs/source-datastore/datastore/jdbc/overview-of-a-jdbc-datastore.md @@ -84,7 +84,7 @@ Once a JDBC datastore is added, you can run the following operations to manage a | :--- | :--- | | [Sync](../../../operations/sync/sync.md){:target="_blank"} | Discovers tables, views, and fields from your database. Detects new, changed, or removed containers incrementally. This is always the first operation after adding a datastore. | | [Profile](../../../operations/profile/profile.md){:target="_blank"} | Analyzes records across containers to compute statistics, detect data patterns, and have Qualytics AI generate quality checks (governed by the **AI Effort** setting). 
| -| [Scan](../../../operations/scan/scan.md){:target="_blank"} | Executes quality checks against the data, measures data quality metrics, and detects anomalies at the record and schema levels. | +| [Scan](../../../operations/scan/getting-started.md){:target="_blank"} | Executes quality checks against the data, measures data quality metrics, and detects anomalies at the record and schema levels. | | [External Scan](../../../operations/external-scan/external-scan.md){:target="_blank"} | Runs scan operations using externally provided data files instead of reading directly from the database. | !!! tip diff --git a/docs/source-datastore/datastore/jdbc/sap-hana/troubleshooting.md b/docs/source-datastore/datastore/jdbc/sap-hana/troubleshooting.md index 3d5333845b..9e8bf0e599 100644 --- a/docs/source-datastore/datastore/jdbc/sap-hana/troubleshooting.md +++ b/docs/source-datastore/datastore/jdbc/sap-hana/troubleshooting.md @@ -1,4 +1,4 @@ -# ![](../../../../assets/shared/connector-logos/logo-sap-hana.svg){ width="36" style="vertical-align: middle;" } SAP HANA Troubleshooting +# :material-tools:{ .middle style="color: var(--q-brick)" } SAP HANA Troubleshooting Most failures while creating or running a SAP HANA datastore fall into three categories: network, credentials, and privileges. The table below maps each common error message returned by the SAP HANA JDBC driver to its likely cause and a concrete fix. diff --git a/docs/source-datastore/datastore/overview-of-a-datastore.md b/docs/source-datastore/datastore/overview-of-a-datastore.md index 3682a07c15..71d89a4291 100644 --- a/docs/source-datastore/datastore/overview-of-a-datastore.md +++ b/docs/source-datastore/datastore/overview-of-a-datastore.md @@ -323,7 +323,7 @@ Once a datastore is added in Qualytics, you can perform three key operations to Finally, the Scan Operation enforces data quality checks on the collections. 
It identifies anomalies at the record and schema levels, highlights structural issues, and records all findings for further analysis. Flexible options allow for incremental scans, specific table/file scans, and scheduling future scans. - For more details about the scan operation, refer to the "[**Scan Operation**](../../operations/scan/scan.md)" document. + For more details about the scan operation, refer to the "[**Scan Operation**](../../operations/scan/getting-started.md)" document. By performing these operations sequentially, you can efficiently manage and ensure the quality of your data in Qualytics. diff --git a/docs/source-datastore/tags/deep-dive/introduction.md b/docs/source-datastore/tags/deep-dive/introduction.md index 24361fcd9f..42aa4d0644 100644 --- a/docs/source-datastore/tags/deep-dive/introduction.md +++ b/docs/source-datastore/tags/deep-dive/introduction.md @@ -46,7 +46,7 @@ Tags can be used to filter which containers are included in Profile and Scan ope - This is especially useful for large datastores where you want to focus quality checks on specific subsets of data. !!! tip "Configuring Tag Filters in Operations" - When running a Profile or Scan operation, select the **Tag** option in the container selection step to filter by tags. See the [Scan Operation](../../../operations/scan/scan.md){:target="_blank"} or [Profile Operation](../../../operations/profile/profile.md){:target="_blank"} documentation for step-by-step instructions. + When running a Profile or Scan operation, select the **Tag** option in the container selection step to filter by tags. See the [Scan Operation](../../../operations/scan/getting-started.md){:target="_blank"} or [Profile Operation](../../../operations/profile/profile.md){:target="_blank"} documentation for step-by-step instructions. !!! note Tags and container selection are mutually exclusive in operations — you can filter by tags **or** by specific container names, but not both at the same time.
diff --git a/docs/stylesheets/extra.css b/docs/stylesheets/extra.css index 7814607179..3b1ac8c3f9 100644 --- a/docs/stylesheets/extra.css +++ b/docs/stylesheets/extra.css @@ -243,6 +243,13 @@ --q-orange: #F96719; --q-invalid: #5A678F; + /* Frontend Palette Sync (Quasar tokens used in the app) */ + --q-tertiary: #5B86AA; + --q-info: #2163D0; + --q-cyan: #33CFFF; + --q-navy: #001A49; + --q-amber: #F59E0B; + /* MkDocs Variables */ --md-default-fg-color: var(--q-black); --md-default-fg-color--light: var(--q-black); @@ -560,6 +567,13 @@ --q-orange: #F96719; --q-invalid: #828db0; + /* Frontend Palette Sync (Quasar tokens used in the app) */ + --q-tertiary: #5B86AA; + --q-info: #2163D0; + --q-cyan: #33CFFF; + --q-navy: #001A49; + --q-amber: #F59E0B; + /* MkDocs Variables */ --md-default-fg-color: var(--q-white); --md-default-fg-color--light: var(--q-white); @@ -1602,3 +1616,39 @@ div.highlight { color: var(--q-negative) !important; fill: var(--q-negative) !important; } + +/* Stacked icon: filled triangle base with a smaller check-bold overlay. + Used to document the Anomalies Auto-Resolved indicator in Scan operations. 
*/ +.md-typeset .qua-icon-auto-resolved { + position: relative; + display: inline-block; + width: 1.125em; + height: 1.125em; + vertical-align: middle; +} +.md-typeset .qua-icon-auto-resolved .twemoji { + position: absolute; + display: flex; + align-items: center; + justify-content: center; + height: auto; +} +.md-typeset .qua-icon-auto-resolved .twemoji:first-child { + inset: 0; +} +.md-typeset .qua-icon-auto-resolved .twemoji:first-child svg { + width: 1.125em; + height: 1.125em; + fill: currentColor; +} +.md-typeset .qua-icon-auto-resolved .twemoji:last-child { + top: 60%; + left: 50%; + transform: translate(-50%, -50%); + z-index: 1; +} +.md-typeset .qua-icon-auto-resolved .twemoji:last-child svg { + width: 0.55em; + height: 0.55em; + fill: var(--md-default-bg-color); +} diff --git a/mkdocs.yml b/mkdocs.yml index 3d814c3bac..2771135598 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -256,12 +256,58 @@ nav: - Computed Fields Details: fields/computed-fields/computed-fields-details.md - Add Computed Fields: fields/computed-fields/add-computed-fields.md - Operations: + - Overview: operations/overview.md + - Runs: + - Getting Started: operations/runs/getting-started.md + - Deep Dive: + - Introduction: operations/runs/deep-dive/introduction.md + - Lifecycle: operations/runs/deep-dive/lifecycle.md + - Available Actions: operations/runs/deep-dive/actions.md + - Permissions: operations/runs/deep-dive/permissions.md + - By Types: + # - Sync: + # - Success: operations/runs/by-types/sync/success.md + # - Success with Warning: operations/runs/by-types/sync/success-with-warning.md + # - Failure: operations/runs/by-types/sync/failure.md + # - Aborted: operations/runs/by-types/sync/aborted.md + # - Running: operations/runs/by-types/sync/running.md + # - Queued: operations/runs/by-types/sync/queued.md + # - Profile: + # - Success: operations/runs/by-types/profile/success.md + # - Success with Warning: operations/runs/by-types/profile/success-with-warning.md + # - Failure: 
operations/runs/by-types/profile/failure.md + # - Aborted: operations/runs/by-types/profile/aborted.md + # - Running: operations/runs/by-types/profile/running.md + # - Queued: operations/runs/by-types/profile/queued.md + - Scan: + - Success: operations/runs/by-types/scan/success.md + - Success with Warning: operations/runs/by-types/scan/success-with-warning.md + - Failure: operations/runs/by-types/scan/failure.md + - Aborted: operations/runs/by-types/scan/aborted.md + - Running: operations/runs/by-types/scan/running.md + # - Queued: operations/runs/by-types/scan/queued.md + - API: operations/runs/api.md + - FAQ: operations/runs/faq.md - Sync: - Sync: operations/sync/sync.md - Profile: - Profile: operations/profile/profile.md - Scan: - - Scan: operations/scan/scan.md + - Getting Started: operations/scan/getting-started.md + - Deep Dive: + - Read Strategies: operations/scan/deep-dive/read-strategies.md + - Scan Settings: operations/scan/deep-dive/scan-settings.md + - Permissions: operations/scan/deep-dive/permissions.md + - How-tos: + - 1. Select Tables: operations/scan/how-tos/select-tables.md + - 2. Select Check Categories: operations/scan/how-tos/select-check-categories.md + - 3. Read Settings: operations/scan/how-tos/read-settings.md + - 4. Scan Settings: operations/scan/how-tos/scan-settings.md + - 5. 
Schedule Options: operations/scan/how-tos/schedule-options.md + - Use Runtime Variables: operations/scan/how-tos/use-runtime-variables.md + - Troubleshooting: operations/scan/troubleshooting.md + - API: operations/scan/api.md + - FAQ: operations/scan/faq.md - External Scan: - External Scan: operations/external-scan/external-scan.md - Export Operation: @@ -987,7 +1033,7 @@ plugins: 'source-datastore/catalog.md': 'operations/sync/sync.md' 'source-datastore/operations/catalog.md': 'operations/sync/sync.md' 'source-datastore/profile.md': 'operations/profile/profile.md' - 'source-datastore/scan.md': 'operations/scan/scan.md' + 'source-datastore/scan.md': 'operations/scan/getting-started.md' 'source-datastore/external-scan.md': 'operations/external-scan/external-scan.md' 'source-datastore/right-click-options.md': 'source-datastore/tips-and-tricks/right-click-options.md' 'source-datastore/assign-tags.md': 'source-datastore/tags/how-tos/assign-tags.md' @@ -1134,7 +1180,7 @@ plugins: # Operations consolidated into top-level Operations section 'source-datastore/operations/sync.md': 'operations/sync/sync.md' 'source-datastore/operations/profile.md': 'operations/profile/profile.md' - 'source-datastore/operations/scan.md': 'operations/scan/scan.md' + 'source-datastore/operations/scan.md': 'operations/scan/getting-started.md' 'source-datastore/operations/external-scan.md': 'operations/external-scan/external-scan.md' 'container/operations/export-operation.md': 'operations/export-operation/export-operation.md' 'container/operations/materialize-operation.md': 'operations/materialize-operation/materialize-operation.md' @@ -1274,6 +1320,8 @@ plugins: 'container/operations/promote-computed-fields.md': 'operations/promote/managing-promotions/promote-computed-fields.md' 'source-datastore/operations/promote-computed-tables.md': 'operations/promote/managing-promotions/promote-computed-tables.md' 'source-datastore/operations/promote-computed-files.md': 
'operations/promote/managing-promotions/promote-computed-files.md' + # Scan operation split into multi-page section + 'operations/scan/scan.md': 'operations/scan/getting-started.md' - print-site: add_to_navigation: false print_page_title: 'Print Site'