Clarify DuckDB BM25 index lifecycle #75

Merged
t-kalinowski merged 6 commits into main from fix/issue-74-build-index-errors
Apr 27, 2026
@t-kalinowski

Member

User Facing Changes

  • Update the README quickstart example to include the required store.build_index() call.

  • retrieve() and retrieve_bm25() with a DuckDB store now raise a clearer error when retrieval is called before the BM25 index has been built, or after writes have made the BM25 index stale.

For example, this now fails with an actionable error:

from raghilda.store import DuckDBStore

store = DuckDBStore.create("raghilda.db", embed=None)

# ... upsert chunked documents ...

store.retrieve_bm25("query", top_k=5)
RuntimeError: DuckDBStore retrieval requires a current BM25 index. Call `store.build_index("bm25")` after inserting or updating documents and before calling `retrieve_bm25()` or `retrieve()`.

Internal Changes

  • Track BM25 freshness in DuckDB store metadata with bm25_index_is_current.
  • Mark BM25 stale after upsert().
  • Mark BM25 current after build_index("bm25").
  • Restore BM25 state on reconnect without adding a metadata query to the retrieval hot path.
  • Keep legacy stores backward-compatible by trusting an existing DuckDB FTS index when the new metadata column is missing.
  • Leave HNSW behavior unchanged because DuckDB maintains HNSW indexes across writes, while DuckDB FTS/BM25 requires rebuilding.
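
The mark-stale / mark-current lifecycle described in this list can be sketched with a toy in-memory store. This is only an illustration under stated assumptions: `ToyBM25Store` is a hypothetical class, not raghilda's `DuckDBStore`; the flag lives in a plain attribute rather than DuckDB metadata; and scoring is simple term overlap rather than real BM25.

```python
class ToyBM25Store:
    """Hypothetical stand-in used to illustrate the freshness flag only."""

    def __init__(self):
        self._docs = {}    # doc_id -> text
        self._index = {}   # term -> set of doc_ids (toy inverted index)
        self._bm25_index_is_current = False

    def upsert(self, doc_id, text):
        self._docs[doc_id] = text
        # Any write invalidates the index until it is rebuilt.
        self._bm25_index_is_current = False

    def build_index(self, kind="bm25"):
        if kind != "bm25":
            raise ValueError(f"unsupported index kind in this sketch: {kind!r}")
        index = {}
        for doc_id, text in self._docs.items():
            for term in set(text.lower().split()):
                index.setdefault(term, set()).add(doc_id)
        self._index = index
        self._bm25_index_is_current = True

    def retrieve_bm25(self, query, top_k=5):
        # Fail early with an actionable message instead of querying a
        # missing or stale index.
        if not self._bm25_index_is_current:
            raise RuntimeError(
                "retrieval requires a current BM25 index; "
                'call build_index("bm25") after inserting or updating documents'
            )
        scores = {}
        for term in query.lower().split():
            for doc_id in self._index.get(term, ()):
                scores[doc_id] = scores.get(doc_id, 0) + 1  # term overlap, not BM25
        ranked = sorted(scores, key=lambda d: (-scores[d], d))
        return ranked[:top_k]
```

In this sketch, calling `retrieve_bm25()` straight after `upsert()` raises, while the same call after `build_index("bm25")` succeeds; a later `upsert()` makes retrieval raise again until the index is rebuilt.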

BM25 freshness is tracked per store handle. The implementation intentionally avoids re-reading metadata on every retrieval; multiple live DuckDBStore handles pointing at the same database file are unsupported.
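
To illustrate why several live handles on one database are unsupported, here is a rough sketch of a per-handle cached flag. Everything in it is an assumption made for illustration: `ToyHandle` is hypothetical, a shared dict stands in for the database file, and the `has_fts_index` key is an invented stand-in for "an FTS index already exists" in a legacy store.

```python
class ToyHandle:
    """Hypothetical handle that caches BM25 freshness at connect time."""

    def __init__(self, db):
        self._db = db  # shared dict standing in for the database file
        if "bm25_index_is_current" in db:
            # Restore the persisted state once, on connect.
            self._is_current = db["bm25_index_is_current"]
        else:
            # Legacy store without the metadata: trust an existing index.
            self._is_current = db.get("has_fts_index", False)

    def upsert(self):
        self._db["bm25_index_is_current"] = False
        self._is_current = False

    def build_index(self):
        self._db["has_fts_index"] = True
        self._db["bm25_index_is_current"] = True
        self._is_current = True

    def can_retrieve(self):
        # Hot path: consult only the cached flag, never the shared metadata.
        return self._is_current


db = {}
first = ToyHandle(db)
first.upsert()
first.build_index()

second = ToyHandle(db)  # sees the persisted "current" state on connect
first.upsert()          # marks the index stale, but only `first` notices

fresh = ToyHandle(db)   # a new connection re-reads the persisted flag
```

After the last `upsert()`, `first.can_retrieve()` and `fresh.can_retrieve()` are `False`, but `second.can_retrieve()` is still `True` because its cached flag was never refreshed. That divergence is the cost of keeping the metadata read off the retrieval path in this sketch.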

Closes #74

@t-kalinowski t-kalinowski force-pushed the fix/issue-74-build-index-errors branch from 0ca5619 to 15220a2 Compare April 24, 2026 12:33
@t-kalinowski t-kalinowski marked this pull request as ready for review April 24, 2026 13:21
@t-kalinowski t-kalinowski requested a review from dfalbel April 24, 2026 13:21
@t-kalinowski t-kalinowski merged commit e0349e4 into main Apr 27, 2026
4 of 5 checks passed
Development

Successfully merging this pull request may close these issues.

CatalogException: Scalar Function with name match_bm25 does not exist when using DuckDBStore.retrieve_bm25()
