
QUA-1024: Restructure Entity Resolution check docs #1139

Open
RafaelOsiro wants to merge 2 commits into main from qua-1024-improve-check-rule-type-user-guide


@RafaelOsiro


Overview

Restructures the Entity Resolution check page into the five-page set introduced for Unique (introduction, how-it-works, examples, api, faq) and aligns the content with the current Multi-Field Entity Resolution shape on develop.

Key Changes

  • introduction: definition, multi-field overview, distinction-field accepted types, target-field type matrix, filter-only General Properties, shape-only Anomaly Types, Next Steps grid.
  • how-it-works: 5-step evaluation flow, target field types (String/Numeric/Datetime) with per-type match_type tables, three optional knobs on fuzzy strings, weighted composite formula, composite match threshold, filter behavior, cluster identifier _qualytics_entity_id, Shape Anomaly message template, source-records behavior, performance considerations, relationship with Unique/Not Null/Satisfies Expression.
  • examples: three production scenarios (customer master dedup, business consolidation with phonetic and substring overrides, tenant-scoped resolution with a blocking field). Each scenario shows the actual Source Records the platform renders.
  • api: endpoints, full payload example with upickle_type discriminator, top-level field notes, per-target-field-type tables (String/Numeric/Datetime) each with a upickle_type row and correct match_type values.
  • faq: 12 questions covering behavior, anomaly reporting, and configuration.
  • mkdocs.yml: nav expanded into the five-page set; redirects from checks/entity-resolution.md and data-quality-checks/entity-resolution.md to the new introduction so existing bookmarks keep working.
  • overview-of-a-check.md and rule-types-overview.md: link targets updated to entity-resolution/introduction.md with descriptions matching the current rule shape.
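
The weighted composite formula and composite match threshold listed for how-it-works can be pictured with a small sketch. The PR text does not give the formula, so this assumes the composite is a weight-normalized average of per-field similarity scores in [0, 1]; `composite_score`, `is_match`, and the sample field names and weights are hypothetical and are not the platform's implementation.

```python
def composite_score(similarities, weights):
    """Weight-normalized average of per-field similarity scores.

    Assumption: each similarity is already in [0, 1]; the real per-type
    scoring (String/Numeric/Datetime) is not reproduced here.
    """
    total_weight = sum(weights[name] for name in similarities)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(similarities[name] * weights[name] for name in similarities)
    return weighted_sum / total_weight


def is_match(similarities, weights, composite_match_threshold):
    """Two records count as a match when the composite reaches the threshold."""
    return composite_score(similarities, weights) >= composite_match_threshold


# Hypothetical pair of records compared on two target fields.
similarities = {"name": 0.9, "email": 1.0}
weights = {"name": 2.0, "email": 1.0}

score = composite_score(similarities, weights)  # (0.9*2 + 1.0*1) / 3
print(round(score, 4))
print(is_match(similarities, weights, 0.9))
```

Under this assumption, raising a field's weight pulls the composite toward that field's similarity, which is why threshold tuning and weights interact.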

Pages to Test

…xamples/api/faq

Mirrors the page set introduced for the Unique check (QUA-1806) and updates
content to match the current Multi-Field Entity Resolution shape on develop:

- Shape Anomaly (not Record), target_fields array with per-field match types,
  composite_match_threshold, weighted composite scoring, transitive grouping,
  blocking via match_type: exact, and Source Records that show one example
  row per distinct distinction-field value within each non-compliant cluster.
- Target field shapes documented with the upickle_type discriminator
  (StringTargetField, NumericTargetField, DateTimeTargetField) so payloads
  copied from the docs deserialize correctly.
- mkdocs.yml: nav expanded into the five-page set; redirects from
  checks/entity-resolution.md and data-quality-checks/entity-resolution.md
  to the new introduction so existing bookmarks keep working.
- overview-of-a-check.md and rule-types-overview.md: link targets updated
  and descriptions revised to reflect the current rule shape.
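
The transitive grouping mentioned in the commit message above can be sketched with a union-find structure: if record A matches B and B matches C, all three land in one cluster even when A and C were never matched directly. This is a minimal illustration under that reading of "transitive grouping"; `cluster_pairs` and the sample IDs are hypothetical and do not reflect how the platform assigns `_qualytics_entity_id`.

```python
def cluster_pairs(record_ids, matched_pairs):
    """Group records into clusters by transitively merging matched pairs."""
    parent = {record_id: record_id for record_id in record_ids}

    def find(x):
        # Walk up to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for record_id in record_ids:
        clusters.setdefault(find(record_id), []).append(record_id)
    return list(clusters.values())


# Records 1-2 and 2-3 matched pairwise; record 4 matched nothing.
print(cluster_pairs([1, 2, 3, 4], [(1, 2), (2, 3)]))  # [[1, 2, 3], [4]]
```
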
@RafaelOsiro added the documentation label (Improvements or additions to documentation) Jun 10, 2026
@RafaelOsiro self-assigned this Jun 10, 2026
@greptile-apps

greptile-apps Bot commented Jun 10, 2026


Greptile Summary

This PR replaces the single-page Entity Resolution doc with a five-page set (Introduction, How It Works, Examples, API, FAQ) aligned with the current multi-field rule shape, and wires up redirects so existing bookmarks continue to work.

  • Behavioral change surfaced in docs: the old page used the record-only anomaly-type include; the new introduction correctly uses shape-only, reflecting that Entity Resolution emits only a Shape Anomaly at the cluster level, not per-row Record Anomalies.
  • API shape updated: the new API page documents the target_fields array with per-entry upickle_type discriminators (StringTargetField / NumericTargetField / DateTimeTargetField), removes the old top-level pair_substrings/pair_homophones/spelling_similarity_threshold properties, and correctly notes that coverage and anomaly_message_field are not applicable. Both legacy URL paths (checks/entity-resolution.md and data-quality-checks/entity-resolution.md) now redirect to the new introduction via the mkdocs-redirects plugin.
  • Minor discrepancy: the PR description and testing checklist reference "12 questions" in the FAQ, but the page as written contains 13 Q&As (an extra question under the Configuration section — "Can the same field appear as both a blocking field and a fuzzy field?"). This does not affect correctness, but the checklist count may need a small update.
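
The API shape bullet above hinges on every `target_fields` entry carrying a valid `upickle_type`. A hedged sketch of a pre-flight check follows. Only `target_fields`, `upickle_type`, `match_type`, `composite_match_threshold`, the value `exact`, and the three discriminator names come from this PR; `field_name`, `weight`, the threshold value, the flat payload layout, and `check_target_fields` are hypothetical and may not match the real payload.

```python
ALLOWED_TARGET_FIELD_TYPES = {
    "StringTargetField",
    "NumericTargetField",
    "DateTimeTargetField",
}


def check_target_fields(payload):
    """Return a list of problems with the upickle_type discriminators."""
    problems = []
    for index, entry in enumerate(payload.get("target_fields", [])):
        kind = entry.get("upickle_type")
        if kind is None:
            problems.append(f"target_fields[{index}]: missing upickle_type")
        elif kind not in ALLOWED_TARGET_FIELD_TYPES:
            problems.append(f"target_fields[{index}]: unknown upickle_type {kind!r}")
    return problems


# Illustrative payload fragment; key names other than those listed in the
# lead-in are assumptions, not the documented API.
payload = {
    "composite_match_threshold": 0.85,
    "target_fields": [
        {"upickle_type": "StringTargetField", "field_name": "tenant_id", "match_type": "exact"},
        {"upickle_type": "NumericTargetField", "field_name": "revenue", "weight": 1.0},
        {"upickle_type": "DateTimeTargetField", "field_name": "signup_date", "weight": 1.0},
    ],
}

print(check_target_fields(payload))  # []
```

A check like this catches the failure mode the commit message calls out: a payload copied without its discriminator would not deserialize.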

Confidence Score: 5/5

Documentation-only restructuring; no executable code changed. All internal cross-links resolve to existing files, include-markdown markers are present in the component files, and both legacy redirects are correctly wired in mkdocs.yml.

The five new pages are internally consistent, cross-links to Unique/Not Null/Satisfies Expression all resolve, the include-markdown paths follow the same pattern used by the working Unique introduction page, and the redirect entries cover both legacy URL paths. The only discrepancy is the PR checklist calling out 12 questions while the FAQ contains 13 — a counting error in the PR description rather than a content defect.

No files require special attention. The mkdocs.yml redirect entries and the introduction.md include-markdown markers are the two places most likely to cause a silent build failure, and both are correct.
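
Because the review singles out the redirect entries as a likely source of silent build failures, a quick sanity check can help. The sketch below mirrors the two legacy paths and the new introduction path named in this PR as a plain dictionary shaped like the mkdocs-redirects `redirect_maps` option; `missing_redirect_targets` and the `existing_pages` set are hypothetical helpers, not part of MkDocs or the plugin.

```python
def missing_redirect_targets(redirect_maps, existing_pages):
    """Return the redirects whose target page is not in the docs tree."""
    return {
        old_path: new_path
        for old_path, new_path in redirect_maps.items()
        if new_path not in existing_pages
    }


INTRO = "data-quality-checks/entity-resolution/introduction.md"

# Paths are relative to the docs directory, as in this PR.
redirect_maps = {
    "checks/entity-resolution.md": INTRO,
    "data-quality-checks/entity-resolution.md": INTRO,
}

# In practice this set would be built by walking the docs directory.
existing_pages = {
    INTRO,
    "data-quality-checks/entity-resolution/how-it-works.md",
    "data-quality-checks/entity-resolution/examples.md",
    "data-quality-checks/entity-resolution/api.md",
    "data-quality-checks/entity-resolution/faq.md",
}

print(missing_redirect_targets(redirect_maps, existing_pages))  # {}
```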

Important Files Changed

| Filename | Overview |
| --- | --- |
| docs/data-quality-checks/entity-resolution/introduction.md | New intro page: definition, multi-field overview, distinction-field type matrix, correct filter-only general-props include, shape-only anomaly-types include (changed from old record-only), and 4-card Next Steps grid. |
| docs/data-quality-checks/entity-resolution/how-it-works.md | New detailed reference page: 5-step evaluation flow, per-type match_type tables (String/Numeric/Datetime), weighted composite formula, threshold tuning, filter behavior, cluster identifier, Shape Anomaly message template, source-records rules, performance tips, and cross-links to Unique/Not Null/Satisfies Expression (all verified to exist). |
| docs/data-quality-checks/entity-resolution/examples.md | New examples page: three tabbed scenarios (customer dedup, business consolidation with homophones, tenant-scoped with blocking field); each has payload, source-records table with _qualytics_entity_id, anomaly message, and flowchart; all upickle_type discriminators are present and correct. |
| docs/data-quality-checks/entity-resolution/api.md | New API reference page: endpoints table, full payload with upickle_type discriminator, top-level field notes (correctly omits max_distinct_records, correctly notes anomaly_message_field is silently ignored, correctly notes coverage is not accepted), and per-type target-field tables with correct match_type values. |
| docs/data-quality-checks/entity-resolution/faq.md | New FAQ page: 13 Q&As across Behavior, Anomaly Reporting, and Configuration sections; content is internally consistent with how-it-works.md on NULL handling, filter order, source-records rules, and shape-only anomaly type. |
| docs/data-quality-checks/entity-resolution.md | Old single-page doc deleted; a redirect entry in mkdocs.yml maps the old URL to the new introduction page, so bookmarks remain intact. |
| mkdocs.yml | Nav expanded to 5-page sub-tree under Entity Resolution; two redirect entries added (checks/entity-resolution.md updated to new intro, data-quality-checks/entity-resolution.md newly added) so both legacy URL paths keep working. |
| docs/data-quality-checks/overview-of-a-check.md | Entity Resolution row link updated to entity-resolution/introduction.md with a description that reflects the new multi-field shape; no other rows changed. |
| docs/data-quality-checks/rule-types-overview.md | Entity Resolution row link updated to entity-resolution/introduction.md with matching description; no other rows changed. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    OLD["data-quality-checks/entity-resolution.md\n(deleted)"] -- "redirect" --> INTRO
    LEGACY["checks/entity-resolution.md\n(redirect)"] -- "redirect" --> INTRO
    INTRO["entity-resolution/introduction.md\n• Definition\n• Field Scope\n• General Properties (filter-only)\n• Anomaly Types (shape-only)\n• Next Steps grid"]
    INTRO --> HIW
    INTRO --> EX
    INTRO --> API
    INTRO --> FAQ
    HIW["how-it-works.md\n• 5-step evaluation flow\n• match_type tables\n• Composite formula\n• Shape Anomaly template"]
    EX["examples.md\n• Customer dedup\n• Business homophones\n• Tenant-scoped blocking"]
    API["api.md\n• Endpoints\n• Full payload\n• Target-field tables"]
    FAQ["faq.md\n• 13 Q&As"]
    OVR["overview-of-a-check.md"] -- "link updated" --> INTRO
    RTO["rule-types-overview.md"] -- "link updated" --> INTRO

Reviews (1): Last reviewed commit: "docs(entity-resolution): restructure int..."

…s tables

Replaces the legacy text-negative spans (red text only) in the three
Source Records tables across entity-resolution/examples.md with the new
.anomalous-cell utility class, which renders each anomalous cell with
an orange outline and warning-tinted background. This mirrors the
source-records-container.vue treatment in the Qualytics frontend and
keeps the Sample Data visual treatment consistent with the new pattern
introduced for Expected Values.

The .anomalous-cell class is also added to docs/stylesheets/extra.css
on this branch. The same block is in the Expected Values restructure
PR, so once one of the two PRs merges, the other only has to resolve a
trivial duplicate-class conflict.

@ets left a comment


Approving — greptile-apps scored this 5/5.


Generated by Claude Code
