fix: overhaul duplicate detection scoring, add address matching, trigger after imports #22
Open
bashar-qassis wants to merge 4 commits into main
…ger after imports

The duplicate detection worker had several bugs preventing it from catching obvious duplicates:

- Scoring formula (name*0.4 + email*0.35 + phone*0.25 with threshold 0.4) meant contacts sharing the same email but with different names scored 0.35, below the threshold, so they were silently missed.
- Email comparison was case-sensitive.
- Only one side of email/phone field pairs had its type verified.
- Address data was completely ignored.
- No import worker triggered duplicate detection after completion.

Fixes:

- Replace additive scoring with max-signal + bonus approach where each signal independently qualifies (email=0.85, phone=0.75, address=0.60, name=similarity)
- Add case-insensitive email matching via LOWER() fragments
- Filter both cf1 and cf2 contact_field_types in email/phone queries
- Use LIKE 'mailto%' pattern to handle protocol colon inconsistency
- Add address matching on normalized line1 + postal_code
- Enqueue DuplicateDetectionWorker after successful completion in all three import workers (MonicaApiCrawlWorker, ImportSourceWorker, ImportWorker)
- Add comprehensive test suite (20 tests) for the detection worker
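To make the scoring change concrete, here is a minimal sketch of the old additive formula next to a max-signal + bonus variant, with the case-insensitive email and normalized address comparisons as plain functions. It is not the worker's actual code: the module name, function names, the bonus size (0.05), the rule for when a bonus applies, and reusing the 0.4 threshold for the new score are all assumptions. Only the weights and per-signal scores come from the commit message.

```elixir
defmodule DuplicateScoreSketch do
  # Threshold quoted for the old formula; reusing it for the new score is an assumption.
  @threshold 0.4
  # Per-signal scores quoted in the commit message.
  @signal_scores %{email: 0.85, phone: 0.75, address: 0.60}
  # Hypothetical bonus size: the commit message does not state it.
  @bonus 0.05

  # Old additive formula: an email-only match tops out at 0.35.
  def additive(name_sim, email?, phone?) do
    name_sim * 0.4 + flag(email?) * 0.35 + flag(phone?) * 0.25
  end

  # Max-signal + bonus: the strongest signal sets the score, and each
  # additional matched signal adds a small bonus, capped at 1.0.
  def max_signal(name_sim, matched_signals) do
    signal_scores = Enum.map(matched_signals, &Map.fetch!(@signal_scores, &1))
    base = Enum.max([name_sim | signal_scores])
    extras = max(length(matched_signals) - 1, 0)
    min(1.0, base + extras * @bonus)
  end

  def candidate?(score), do: score >= @threshold

  # Case-insensitive email comparison, mirroring the LOWER() fragments.
  def same_email?(a, b), do: String.downcase(a) == String.downcase(b)

  # Address key built from line1 + postal_code, trimmed and lowercased.
  def address_key(line1, postal_code) do
    {normalize(line1), normalize(postal_code)}
  end

  defp normalize(value), do: value |> String.trim() |> String.downcase()

  defp flag(true), do: 1.0
  defp flag(false), do: 0.0
end
```

With these definitions, `additive(0.0, true, false)` stays below the threshold, while `max_signal(0.0, [:email])` returns the email score on its own, which is the case the commit message describes as previously missed.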
Force-pushed from 5850c3f to 38cadb8.
list_candidates now takes limit/offset opts (default 20 per page). The LiveView loads one page at a time with a "Load more" button. Dismiss removes the candidate from the current list without reloading.
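The paging behaviour described here can be sketched with an in-memory list standing in for the database query. This is an illustration only: the module, the `load_more/3` and `dismiss/2` helpers, and the map shape `%{id: ...}` are hypothetical, and the real `list_candidates` presumably runs a query rather than slicing a list.

```elixir
defmodule CandidatePagingSketch do
  # Default page size quoted in the commit message.
  @default_limit 20

  # In-memory stand-in for a query that accepts limit/offset opts.
  def list_candidates(all, opts \\ []) do
    limit = Keyword.get(opts, :limit, @default_limit)
    offset = Keyword.get(opts, :offset, 0)

    all
    |> Enum.drop(offset)
    |> Enum.take(limit)
  end

  # "Load more": fetch the page that starts where the loaded list ends
  # and append it to what is already on screen.
  def load_more(all, loaded, opts \\ []) do
    next_page = list_candidates(all, Keyword.put(opts, :offset, length(loaded)))
    loaded ++ next_page
  end

  # Optimistic dismiss: drop the candidate from the loaded list
  # without fetching the list again.
  def dismiss(loaded, id), do: Enum.reject(loaded, &(&1.id == id))
end
```

How the next offset should account for a dismissed candidate depends on whether dismissed rows still appear in the underlying query, which the commit message does not say, so the sketch leaves that out.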
The /contacts/duplicates route uses ContactLive.Index, not the standalone Duplicates LiveView. Added limit/offset pagination with Load more button and optimistic dismiss (no full re-query) to match the standalone page.
Photos with the same content_hash on both contacts caused a unique constraint violation during merge. Now deletes duplicate photos from the non-survivor before transferring the rest, matching the pattern used for contact_tags and activity_contacts.

Also collapsed the merge flow from 4 steps to 3 by combining the preview and confirm steps into a single "Review & merge" step. From the duplicates page (contact preselected), merge is now 2 clicks instead of 3.
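The photo fix boils down to splitting the non-survivor's photos into those whose content_hash already exists on the survivor (delete) and the rest (transfer). A minimal sketch over plain maps, assuming each photo has `id` and `content_hash` keys; the module and function names are hypothetical and the real merge presumably does this with database queries:

```elixir
defmodule PhotoMergeSketch do
  # Decide what to do with the non-survivor's photos before the merge:
  # photos whose content_hash the survivor already has are deleted,
  # everything else is transferred to the survivor.
  def plan(survivor_photos, non_survivor_photos) do
    taken = MapSet.new(survivor_photos, & &1.content_hash)

    {to_delete, to_transfer} =
      Enum.split_with(non_survivor_photos, &MapSet.member?(taken, &1.content_hash))

    %{delete: to_delete, transfer: to_transfer}
  end
end
```

Deleting first means the transfer step never tries to attach a second photo with the same content_hash to the survivor, which is what triggered the constraint violation.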
Summary
- Address matching on `line1 + postal_code` (case-insensitive, trimmed).
- Email field types matched with a `LIKE 'mailto%'` pattern, handling the colon inconsistency between seeded and custom-created field types.
- Import workers enqueue `DuplicateDetectionWorker` on successful completion.

Test plan
- `mix compile --warnings-as-errors`: clean
- `mix test`: 1035 tests, 0 failures
- `mix quality`: format, credo, sobelow, dialyzer all pass