feat(#782): sk learn --dedupe — pre-write FTS5 similarity check #788

Merged
magicpro97 merged 4 commits into main from feat/i782-learn-dedupe on May 31, 2026

@magicpro97
Owner

Implements #782. Adds an FTS5 BM25 similarity check before INSERT in learn.py. --dedupe warn/block/off. --merge <id>. Closes #782
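
For readers who want to see how the two flags could fit together, here is a minimal sketch using `argparse`. It is not the code from learn.py: `build_parser`, `decide_action`, the positional `text` argument, the integer type for `--merge`, and the choice to let `--merge` bypass the dedupe check are all assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser; only --dedupe and --merge come from the PR text."""
    parser = argparse.ArgumentParser(prog="sk learn")
    parser.add_argument("text", help="entry text to store (assumed positional)")
    parser.add_argument(
        "--dedupe",
        choices=("warn", "block", "off"),
        default="warn",  # the PR states that warn is the default
        help="what to do when a near-duplicate is found",
    )
    parser.add_argument(
        "--merge",
        metavar="ID",
        type=int,  # assumption: entry ids are integers
        default=None,
        help="UPDATE this existing entry instead of INSERTing a new one",
    )
    return parser


def decide_action(mode, similar_ids, merge_id=None):
    """Return (action, warn) where action is 'insert', 'update' or 'skip'."""
    if merge_id is not None:
        # Assumption: an explicit merge target takes precedence over dedupe.
        return ("update", False)
    if mode == "off" or not similar_ids:
        return ("insert", False)
    if mode == "block":
        return ("skip", True)
    # mode == "warn": report the near-duplicates but still write.
    return ("insert", True)
```

Keeping the decision in a small pure function like this makes each mode testable without touching the database.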

…ear-duplicates

- _find_similar_entries(): BM25 query against knowledge_fts, threshold -3.0
- --dedupe warn (default): print warning, allow write
- --dedupe block: prevent write if near-duplicate found
- --dedupe off: skip check (for batch ingest)
- --merge <id>: UPDATE existing entry instead of INSERT
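
The similarity check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the `_find_similar_entries()` from the PR: the `title` column, the OR-joined quoted-token query, and the `limit` parameter are hypothetical; only the `knowledge_fts` table name and the -3.0 threshold come from the commit message. FTS5's `bm25()` returns negative scores where a more negative value means a better match, so "similar" is taken here to mean `score <= threshold`.

```python
import re
import sqlite3


def find_similar_entries(db, text, threshold=-3.0, limit=5):
    """Return (rowid, title, score) rows whose BM25 score is at or below threshold."""
    # Quote each distinct token so punctuation in the input cannot be
    # interpreted as FTS5 query syntax (assumed query-building strategy).
    tokens = list(dict.fromkeys(re.findall(r"\w+", text.lower())))
    if not tokens:
        return []
    match_query = " OR ".join(f'"{token}"' for token in tokens)
    rows = db.execute(
        "SELECT rowid, title, bm25(knowledge_fts) AS score "
        "FROM knowledge_fts WHERE knowledge_fts MATCH ? "
        "ORDER BY score LIMIT ?",
        (match_query, limit),
    ).fetchall()
    # More negative means more relevant, so keep scores <= threshold.
    return [row for row in rows if row[2] <= threshold]
```

BM25 magnitudes depend on corpus size and document length, so a fixed threshold such as -3.0 would likely need tuning against the real knowledge base.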

Closes #782

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 31, 2026 05:14

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Linh Ngo and others added 3 commits May 31, 2026 13:19
The dedupe check opened a get_db() connection and explicitly closed it
before _write_learn_entry. Tests patch get_db() to return a single
shared connection, so closing it made the later json_mode db.execute()
fail with 'Cannot operate on a closed database'.

Use del to drop the local reference instead; in production CPython the
connection is released immediately (refcount → 0), while in tests the
shared connection stays alive because the test closure still holds it.
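
The failure mode in this commit message can be reproduced in a few lines. The sketch below is an illustration only: `check_with_close` and `check_with_del` are hypothetical stand-ins for the dedupe check, and the `get_db` parameter stands in for a patched `get_db()` that hands every caller the same connection.

```python
import sqlite3


def check_with_close(get_db):
    """Stand-in for the original dedupe check: closes what it was given."""
    db = get_db()
    db.execute("SELECT 1")
    db.close()  # also closes the connection for every other holder


def check_with_del(get_db):
    """Stand-in for the fixed dedupe check: only drops the local name."""
    db = get_db()
    db.execute("SELECT 1")
    del db  # the connection survives while another reference exists


def still_usable(conn):
    """Report whether the connection can still run a query."""
    try:
        conn.execute("SELECT 1")
        return True
    except sqlite3.ProgrammingError:
        # Raised with 'Cannot operate on a closed database.'
        return False
```

With a shared connection, `check_with_del` leaves it usable and `check_with_close` does not. Note that releasing an unshared connection through `del` depends on CPython reference counting, as the commit message itself points out.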

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Tests that call learn.main() directly without patching get_db() would
hit the real get_db() → sys.exit(1) when no knowledge.db exists in CI.
Wrap the dedupe DB open in try/except SystemExit so that a missing DB
skips the check (treated as no similar entries) rather than aborting the write.
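
A minimal sketch of that guard, assuming a `get_db()` that calls `sys.exit(1)` when the database file is missing. The helper names `open_db_for_dedupe` and `get_db_missing` are hypothetical and exist only for this example.

```python
import sys


def get_db_missing():
    """Stand-in for a get_db() that exits when knowledge.db is absent."""
    sys.exit(1)


def open_db_for_dedupe(get_db):
    """Return a connection, or None when get_db() tries to exit."""
    try:
        return get_db()
    except SystemExit:
        # No database means nothing to compare against: skip the check
        # and let the caller proceed as if no similar entries were found.
        return None
```

`SystemExit` derives from `BaseException`, so a plain `except Exception` would not catch it; it has to be named explicitly as above.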

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@magicpro97 magicpro97 merged commit e0c4217 into main May 31, 2026
34 checks passed

Development

Successfully merging this pull request may close these issues.

feat(learn): --dedupe pre-write FTS5 similarity check — warn on near-duplicate entries before INSERT

2 participants