fix: add fingerprint and commit-hash dedup #133
Adds two new dedup checks before vector similarity in QdrantService::upsert():
- Fingerprint tag matching: entries with `fingerprint:{hash}` tags are checked
against existing entries via a Qdrant scroll filter
- Title+commit hash matching: same title + same commit hash = same CI event,
preventing duplicate snapshots from test runs
- Stores `commit` field in payload for future dedup checks
Includes 4 new unit tests covering both rejection and pass-through cases.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
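A minimal Python sketch of the check order described above. It assumes entries are plain dicts with `title`, `tags` and optional `commit` keys, and it uses an in-memory list in place of Qdrant. The helper names mirror `extractFingerprint`, `findByFingerprint` and `findByTitleAndCommit` from the walkthrough, but they are hypothetical stand-ins, not the PHP implementation.

```python
from typing import Optional


class DuplicateEntryException(Exception):
    """Hypothetical stand-in for the PHP exception of the same name."""


FINGERPRINT_PREFIX = "fingerprint:"


def extract_fingerprint(tags: list[str]) -> Optional[str]:
    """Return the hash from the first `fingerprint:{hash}` tag, or None."""
    for tag in tags:
        if tag.startswith(FINGERPRINT_PREFIX):
            return tag[len(FINGERPRINT_PREFIX):]
    return None


def find_by_fingerprint(store: list[dict], fingerprint: str, project: str) -> Optional[dict]:
    # Stand-in for the Qdrant scroll filter: any entry in the same project carrying the tag.
    tag = FINGERPRINT_PREFIX + fingerprint
    return next((e for e in store if e["project"] == project and tag in e.get("tags", [])), None)


def find_by_title_and_commit(store: list[dict], title: str, commit: str, project: str) -> Optional[dict]:
    # Same title + same commit hash in the same project is treated as the same CI event.
    return next(
        (e for e in store
         if e["project"] == project and e["title"] == title and e.get("commit") == commit),
        None,
    )


def upsert(store: list[dict], entry: dict, project: str, check_duplicates: bool = True) -> bool:
    if check_duplicates:
        fp = extract_fingerprint(entry.get("tags", []))
        if fp is not None and find_by_fingerprint(store, fp, project):
            raise DuplicateEntryException(f"fingerprint {fp} already stored")
        commit = entry.get("commit")
        if commit and find_by_title_and_commit(store, entry["title"], commit, project):
            raise DuplicateEntryException("same title and commit already stored")
        # The existing content-hash and vector-similarity checks would run here (not modelled).
    # Copy the entry into the store; any `commit` key is kept in the stored payload.
    store.append({**entry, "project": project})
    return True
```

Exact-match duplicates are rejected before any similarity search. Entries without a fingerprint tag or commit hash fall through to the existing checks unchanged.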
Caution: Review failed. The pull request is closed.
Files selected for processing: 2
📝 Walkthrough
Implements enhanced duplicate detection in QdrantService with fingerprint-based and title+commit-based dedup alongside the existing content-hash dedup. Adds three private helper methods for the new dedup logic and stores the commit field in the payload during upsert. Includes unit tests covering the new dedup scenarios.
Sequence Diagram
sequenceDiagram
actor Caller
participant QdrantService
participant Database as Database/Qdrant
Caller->>QdrantService: upsert(entry, project, checkDuplicates=true)
alt checkDuplicates enabled
QdrantService->>QdrantService: extractFingerprint(tags)
alt fingerprint exists
QdrantService->>Database: findByFingerprint(fingerprint, project)
Database-->>QdrantService: match found
QdrantService->>Caller: throw DuplicateEntryException ❌
end
alt commit hash provided
QdrantService->>Database: findByTitleAndCommit(title, commit, project)
Database-->>QdrantService: match found
QdrantService->>Caller: throw DuplicateEntryException ❌
end
end
QdrantService->>QdrantService: computeContentHash(entry)
QdrantService->>Database: upsertPoints(points, collection)
Database-->>QdrantService: success
QdrantService->>Caller: return true ✓
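The two lookups in the diagram correspond to Qdrant scroll requests with a payload filter. The minimal Python sketch below only builds the REST path and JSON body; no HTTP client is included. It assumes payload keys named `tags`, `title` and `commit`. The PR confirms `commit`, but the other two key names are assumptions. The PR does not show how `project` maps to a collection, so the sketch takes the collection name directly. The function names are hypothetical.

```python
import json


def scroll_request(collection: str, must: list[dict]) -> tuple[str, dict]:
    """Build the path and JSON body for Qdrant's REST scroll endpoint.

    limit=1 because the dedup check only needs to know whether any match exists;
    with_payload=False because the payload itself is not needed for a yes/no answer.
    """
    path = f"/collections/{collection}/points/scroll"
    body = {"filter": {"must": must}, "limit": 1, "with_payload": False}
    return path, body


def fingerprint_query(collection: str, fingerprint: str) -> tuple[str, dict]:
    # A match condition on an array field is satisfied if any element equals the value,
    # so this finds points whose `tags` list contains the fingerprint tag.
    return scroll_request(collection, [
        {"key": "tags", "match": {"value": f"fingerprint:{fingerprint}"}},
    ])


def title_commit_query(collection: str, title: str, commit: str) -> tuple[str, dict]:
    # Both conditions must hold: same title AND same commit hash.
    return scroll_request(collection, [
        {"key": "title", "match": {"value": title}},
        {"key": "commit", "match": {"value": commit}},
    ])


if __name__ == "__main__":
    path, body = fingerprint_query("default", "abc123")
    print("POST", path)
    print(json.dumps(body, indent=2))
```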
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🏆 Synapse Sentinel Gate: Sentinel Certified ✅ (Tests & Coverage: 0 tests passed)
Summary
- `fingerprint:{hash}` tags are matched against existing entries via a Qdrant scroll filter before the vector similarity check
- The `commit` field is stored in the Qdrant payload for future dedup lookups
Context
The `default` project has 4,549 entries, many of them identical CI test snapshots. This change prevents future flooding by catching duplicates at two new layers before the existing vector similarity check.
Test plan
- Fingerprint match throws `DuplicateEntryException`
- Title + commit match throws `DuplicateEntryException`
🤖 Generated with Claude Code