Merged
4 changes: 4 additions & 0 deletions docs/indexing/fts-index.mdx
@@ -34,6 +34,8 @@ Check FTS index status using the API:
</CodeBlock>
</CodeGroup>

`wait_for_index(...)` waits until the named FTS index exists and `index_stats(...)` reports `num_unindexed_rows == 0`. It can time out if writes keep adding rows faster than the index catches up. If a table has multiple FTS indexes, specify the target text column when querying instead of relying on implicit selection.
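
The waiting behavior described above can be approximated with a small polling loop. This is a minimal sketch, not the SDK's implementation: `wait_until_indexed` is a hypothetical helper, `table` is any object exposing `index_stats(name)`, and treating a `None` result as "index not created yet" is an assumption.

```python
import time
from datetime import timedelta


def wait_until_indexed(table, index_name, timeout=timedelta(seconds=30), poll_interval=0.5):
    """Poll index_stats(index_name) until the index exists and has no unindexed rows.

    Hypothetical helper for illustration. Assumes `index_stats` returns None
    while the index does not exist yet, and otherwise an object with a
    `num_unindexed_rows` attribute.
    """
    deadline = time.monotonic() + timeout.total_seconds()
    while True:
        stats = table.index_stats(index_name)
        if stats is not None and stats.num_unindexed_rows == 0:
            return stats
        # Under continuous writes the count may never reach zero, so give up
        # once the deadline passes instead of looping forever.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"index {index_name!r} not fully built within {timeout}")
        time.sleep(poll_interval)
```

The timeout branch is the case the paragraph warns about: if writers add rows faster than the index absorbs them, the "zero unindexed rows" condition is never observed and the call fails rather than blocking indefinitely.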

### Asynchronous API

When using async connections (`connect_async`), use `create_index` with the `FTS` configuration:
@@ -48,6 +50,8 @@ When using async connections (`connect_async`), use `create_index` with the `FTS
The `create_fts_index` method is not available on `AsyncTable`. Use `create_index` with `FTS` config instead.
</Note>

In TypeScript, create an FTS index with `table.createIndex("text", { config: lancedb.Index.fts() })` and query it with `table.query().nearestToText(...)`.

## Configuration Options

### FTS Parameters
3 changes: 2 additions & 1 deletion docs/indexing/gpu-indexing.mdx
@@ -26,6 +26,8 @@ into a synchronous process by waiting until the index is built.

</Info>

`wait_for_index(...)` waits for the index to exist and for `index_stats(...)` to report `num_unindexed_rows == 0`. It can time out if the table is receiving continuous writes while the build is trying to catch up.

## Manual GPU indexing in LanceDB OSS

You can use the Python SDK to manually create the `IVF_PQ` index on a GPU. You'll need
@@ -62,4 +64,3 @@ to enable GPU training on your device.

If you encounter the error `AssertionError: Torch not compiled with CUDA enabled`,
you need to [install PyTorch with CUDA support](https://pytorch.org/get-started/locally/).

6 changes: 6 additions & 0 deletions docs/indexing/index.mdx
@@ -39,6 +39,12 @@ LanceDB provides a comprehensive suite of indexing strategies for different data
TypeScript currently doesn't support `IvfSq` (IVF with Scalar Quantization).
</Note>

<Info>
**Operational checks**

For vector indexes, use the same distance metric when creating the index and searching it. After appends or other writes, use `optimize()` to fold new rows into existing indexes, then check `index_stats(...)` or `wait_for_index(...)` if you need to confirm the index has caught up. `wait_for_index(...)` waits until the named indexes exist and report `num_unindexed_rows == 0`; it can time out if writes keep adding unindexed rows.
</Info>

### Quantization Types

Vector indexes can use different quantization methods to compress vectors and improve search performance:
3 changes: 3 additions & 0 deletions docs/indexing/quantization.mdx
@@ -15,12 +15,15 @@ Use quantization when:

LanceDB currently exposes multiple quantized vector index types, including:
- `IVF_PQ` -- Inverted File index with Product Quantization (default). See the [vector indexing guide](/indexing/vector-index) for `IVF_PQ` examples.
- `IVF_SQ` -- Inverted File index with Scalar Quantization. This is available in Python and Rust; TypeScript does not currently expose `IvfSq`.
- `IVF_RQ` -- Inverted File index with **RaBitQ** quantization (binary, 1 bit per dimension). Requires vector dimensions divisible by `8`. See [below](#rabitq-quantization) for details.
- `IVF_HNSW_SQ` -- IVF partitions with an **HNSW graph per partition** plus **Scalar Quantization**. Strong recall/latency/size trade-off for most workloads.
- `IVF_HNSW_PQ` -- IVF partitions with an **HNSW graph per partition** plus **Product Quantization**. Prefer when PQ-level compression matters and you still want HNSW-style in-partition search.

Two axes are being combined here: whether partitions are searched flatly or via an HNSW graph (`IVF_*` vs. `IVF_HNSW_*`), and which quantizer compresses the vectors (`PQ`, `RQ`, or `SQ`). `IVF_PQ` is the default and works well in many cases. For more drastic compression, RaBitQ (`IVF_RQ`) is a reasonable option. For higher recall at low latency, the HNSW-backed variants are usually the right pick. The ["Choose the Right Index"](/indexing/vector-index#choose-the-right-index) table on the vector indexing page is the canonical decision tool.

Use the same distance metric when training the index and running queries against it. For IVF-based indexes, `num_partitions` controls the number of groups and `sample_rate` controls how many training vectors are sampled per partition, so the training sample is roughly `sample_rate * num_partitions`.
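
The training-sample arithmetic above can be checked with a few lines. This is a rough sketch: `ivf_training_sample_size` is a hypothetical helper, and capping the result at the table's row count is an assumption (a sample cannot exceed the rows that exist), not documented behavior.

```python
def ivf_training_sample_size(num_partitions: int, sample_rate: int, num_rows: int) -> int:
    """Approximate number of vectors sampled to train IVF partitions.

    Follows the rule of thumb sample_rate * num_partitions; the cap at
    num_rows is an assumption made for this illustration.
    """
    return min(sample_rate * num_partitions, num_rows)


# 256 partitions sampled at 256 vectors each on a 1M-row table.
print(ivf_training_sample_size(num_partitions=256, sample_rate=256, num_rows=1_000_000))  # 65536
```

With small tables the cap dominates, which is one reason to scale `num_partitions` to the amount of data rather than reusing a fixed value.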

## RaBitQ quantization

RaBitQ is a binary quantization method that represents each normalized embedding using **1 bit per dimension**, plus a couple of small corrective scalars. In practice, a 1,024-dimensional `float32` vector that would normally take 4 KB can be compressed to roughly a few hundred bytes with RaBitQ, while still maintaining reasonable recall.
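
The size comparison can be worked through as back-of-the-envelope arithmetic. This is only a sketch: it assumes two `float32` corrective scalars per vector, and it ignores any other per-vector metadata the index stores, so the real footprint will be larger than the figure computed here.

```python
def float32_vector_bytes(dim: int) -> int:
    """Raw storage for a float32 vector: 4 bytes per dimension."""
    return dim * 4


def rabitq_code_bytes(dim: int, num_scalars: int = 2, scalar_bytes: int = 4) -> int:
    """Storage for a 1-bit-per-dimension code plus corrective scalars.

    num_scalars and scalar_bytes are assumptions for illustration only.
    """
    if dim % 8 != 0:
        raise ValueError("dimension must be divisible by 8 to pack bits into bytes")
    return dim // 8 + num_scalars * scalar_bytes


print(float32_vector_bytes(1024))  # 4096
print(rabitq_code_bytes(1024))     # 136
```

The bit code itself is exactly `dim / 8` bytes, which is also why the dimension must be divisible by 8.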
5 changes: 2 additions & 3 deletions docs/indexing/reindexing.mdx
@@ -23,7 +23,7 @@ Table optimization performs three maintenance operations:

1. **Compaction**: merges small fragments into larger ones to improve read performance
2. **Pruning/Cleanup**: removes files from versions older than a retention window (7 days by default)
3. **Index update**: adds newly-ingested data to existing indexes
3. **Index update**: adds newly-ingested data to existing vector, scalar, and FTS indexes

<CodeGroup>
<CodeBlock filename="Python" language="Python" icon="python">
@@ -36,7 +36,7 @@ Table optimization performs three maintenance operations:
LanceDB Enterprise supports incremental reindexing through an automated background process. When new data is added to a table, the system automatically triggers a new index build. As the dataset grows, indexes are asynchronously updated in the background.

- While indexes are being rebuilt, queries use brute force methods on unindexed rows, which may temporarily increase latency. To avoid this, set `fast_search=True` to search only indexed data.
- Use `index_stats()` to view the number of unindexed rows. This will be zero when indexes are fully up-to-date.
- Use `index_stats()` to view the number of unindexed rows. This will be zero when indexes are fully up-to-date. If you call `wait_for_index(...)`, it polls the same status and can time out while continuous writes keep adding unindexed rows.

The benefit of using LanceDB Enterprise is that it automates the reindexing process
and operates continuously in the background, minimizing the impact on latency under high loads.
@@ -57,4 +57,3 @@ If you need to reclaim space more aggressively in OSS, use a shorter retention w
```
</CodeGroup>


3 changes: 2 additions & 1 deletion docs/indexing/scalar-index.mdx
@@ -57,6 +57,8 @@ If you are using LanceDB Enterprise, the `create_scalar_index` API returns immed
</CodeBlock>
</CodeGroup>

`wait_for_index(...)` waits until the named scalar indexes exist and `index_stats(...)` reports `num_unindexed_rows == 0`. If a table is receiving steady writes, that fully indexed state may not stabilize before the timeout.

### 3. Update the Index

Updating the table data (adding, deleting, or modifying records) requires that you also update the scalar index. This can be done by calling `optimize`, which will trigger an update to the existing scalar index.
@@ -139,4 +141,3 @@ LanceDB supports scalar indexes on UUID columns (stored as `FixedSizeBinary(16)`
{ScalarIndexUuidUpsert}
</CodeBlock>
</CodeGroup>

2 changes: 2 additions & 0 deletions docs/indexing/vector-index.mdx
@@ -53,6 +53,8 @@ You can call `create_index()` with different parameters to create a new index --
Although the `create_index` API returns immediately, the building of the vector index is asynchronous. To wait until all data is fully indexed, you can specify the `wait_timeout` parameter.
</Note>

Use the same distance metric for index creation and search. Once a vector index exists, queries use the metric stored with that index. If you need to confirm an async build or refresh is finished, `wait_for_index(...)` waits for the named index to exist and for `index_stats(...)` to report `num_unindexed_rows == 0`; it can time out if new writes keep arriving.

## Choose the Right Index

Use this table as a quick starting point for choosing the right index type and quantization method for your use case:
6 changes: 5 additions & 1 deletion docs/reranking/custom-reranker.mdx
@@ -16,14 +16,18 @@ cover, and only override the ones you need. The base class leaves `rerank_vector
overridden raises `NotImplementedError` rather than silently returning unsorted results. That's a
useful guard, but worth knowing about before you wire up a query path you didn't plan for.

The Python base class exposes hybrid, vector-only, and FTS-only rerank hooks. TypeScript and Rust
currently expose the custom reranker interface for hybrid reranking.

## Interface

The `Reranker` base interface comes with a `merge_results()` method that can be used to combine the
results of semantic and full-text search. This is a vanilla merging algorithm that simply concatenates
the results and removes the duplicates without taking the scores into consideration. It only keeps the
first copy of the row encountered. This works well in cases that don't require the scores of semantic
and full-text search to combine the results. If you want to use the scores or want to support
`return_score="all"`, you'll need to implement your own merging algorithm.
`return_score="all"`, you'll need to implement your own merging algorithm. The base
`return_score` option accepts only `"relevance"` and `"all"`.
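
The merge behavior described above (concatenate, then keep the first copy of each row) can be sketched on plain Python data. This is an illustration of the idea rather than the library's code: `merge_keep_first` is a hypothetical name, lists of dicts stand in for the result tables, and using `_rowid` as the identity key is an assumption.

```python
def merge_keep_first(vector_rows, fts_rows, key="_rowid"):
    """Concatenate two result lists and drop later duplicates by `key`.

    Scores are ignored entirely: whichever copy of a row appears first
    (here, the vector-search copy) is the one that survives.
    """
    seen = set()
    merged = []
    for row in [*vector_rows, *fts_rows]:
        if row[key] in seen:
            continue
        seen.add(row[key])
        merged.append(row)
    return merged


vector_rows = [{"_rowid": 1, "_distance": 0.1}, {"_rowid": 2, "_distance": 0.4}]
fts_rows = [{"_rowid": 2, "_score": 7.5}, {"_rowid": 3, "_score": 3.2}]
print([r["_rowid"] for r in merge_keep_first(vector_rows, fts_rows)])  # [1, 2, 3]
```

Row 2 keeps only its `_distance` here and loses its `_score`, which is why a merge like this cannot support `return_score="all"` without a custom implementation.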

Whichever methods you override, your reranker has one job on the way out: attach a
`_relevance_score` column with the most relevant rows at the top. LanceDB will reject the result
5 changes: 5 additions & 0 deletions docs/reranking/eval.mdx
@@ -26,6 +26,11 @@ score-based path most readers encounter first; `LinearCombinationReranker` is an
score-based strategy you opt into explicitly.
</Info>

By default, rerankers return `_relevance_score`. Pass `return_score="all"` when a reranker
supports it and you also need the original vector or FTS scores for debugging.
Evaluation code can rely on returned rows being ordered by descending `_relevance_score`. Empty
reranked result sets still include the `_relevance_score` column.

The hybrid `rerank(...)` method also accepts a `normalize` argument that controls how the raw
vector and FTS scores are made comparable before reranking:

16 changes: 15 additions & 1 deletion docs/reranking/index.mdx
@@ -14,6 +14,9 @@ with models from Cohere, Sentence-Transformers, and more.

To use a reranker, you perform a search and then pass the results to the `rerank()` method.

Note that `CohereReranker()` requires the `cohere` package and either
`COHERE_API_KEY` in the environment or an `api_key` argument.

<CodeGroup>
```python Python icon="python"
import lancedb
@@ -42,6 +45,17 @@ LanceDB supports several rerankers out of the box. Here are a few examples:

You can find more details about these and other rerankers in the [integrations](/integrations/reranking) section.

Python also includes score-based rerankers such as `RRFReranker`, `LinearCombinationReranker`,
and `MRRReranker`, plus provider rerankers for OpenAI, Jina, Voyage AI, Answer.AI, and Cohere.
Provider rerankers usually need the provider package installed and either an API key argument or
the provider-specific environment variable.

Rerankers add `_relevance_score` and return rows ordered by descending relevance. Python rerankers
accept `return_score="relevance"` or `return_score="all"`; use `"all"` when you want to keep the
original vector distance or FTS score columns for debugging. Model-based rerankers read from
`column="text"` by default, so either return that column in the search results or pass a different
column.
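
To make the score-based family concrete, here is a minimal sketch of reciprocal rank fusion, the technique behind rerankers such as `RRFReranker`. It is not LanceDB's implementation: `rrf_scores` is a hypothetical function, ranks are assumed to start at 1, and `k=60` is the constant commonly used with RRF.

```python
def rrf_scores(rankings, k=60):
    """Fuse several ranked lists of row ids with reciprocal rank fusion.

    Each appearance of a row at 1-based position `rank` contributes
    1 / (k + rank). Returns (row_id, score) pairs, highest score first.
    """
    scores = {}
    for ranking in rankings:
        for rank, row_id in enumerate(ranking, start=1):
            scores[row_id] = scores.get(row_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vector_hits = ["a", "b", "c"]   # ordered by vector distance
fts_hits = ["b", "d"]           # ordered by FTS score
print([row_id for row_id, _ in rrf_scores([vector_hits, fts_hits])])  # ['b', 'a', 'd', 'c']
```

Row `b` wins because it appears in both lists, even though it is not first in the vector ranking. Only ranks are used, so the raw vector and FTS scores never need to be on a comparable scale.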

<Note>
**SDK coverage differs across languages**

@@ -84,4 +98,4 @@ the `deduplicate` flag.

LanceDB also allows you to create custom rerankers by extending the base `Reranker` class. The custom reranker
should implement the `rerank` method that takes a list of search results and returns a reranked list of
search results. This is covered in more detail in the [creating custom rerankers](/reranking/custom-reranker/) section.
4 changes: 3 additions & 1 deletion docs/search/filtering.mdx
@@ -12,6 +12,8 @@ with filtering capabilities even on datasets containing billions of records.

**Pre-filtering** means LanceDB applies the metadata `where(...)` condition before running vector search, so the search only considers rows that already match the filter. **Post-filtering** means LanceDB runs vector search first and only then filters the returned candidates. Pre-filtering is enabled by default. In practice, pre-filtering is better when the filter is part of the result contract; post-filtering can be lower-latency for expensive or non-indexable filters, but it can return fewer than `limit` rows, or even zero, if the nearest neighbors do not pass the filter.

On hybrid queries, the same `where(...)` filter is applied to both the vector and full-text halves of the query. The prefilter or postfilter choice controls whether that happens before each subquery scores candidates or after the subquery top-k is produced.
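
The difference between the two modes is easiest to see on a toy dataset. The sketch below uses a brute-force nearest-neighbor search over plain Python dicts. It does not call LanceDB, and all function names are hypothetical.

```python
import math


def knn(rows, query, limit):
    """Brute-force nearest neighbors by Euclidean distance."""
    return sorted(rows, key=lambda r: math.dist(r["vector"], query))[:limit]


def prefilter_search(rows, query, predicate, limit):
    # Filter first, then search only the rows that match.
    return knn([r for r in rows if predicate(r)], query, limit)


def postfilter_search(rows, query, predicate, limit):
    # Search first, then drop candidates that fail the filter.
    return [r for r in knn(rows, query, limit) if predicate(r)]


rows = [
    {"id": 1, "vector": [0.0, 0.0], "category": "a"},
    {"id": 2, "vector": [0.1, 0.0], "category": "a"},
    {"id": 3, "vector": [5.0, 5.0], "category": "b"},
    {"id": 4, "vector": [6.0, 6.0], "category": "b"},
]


def is_b(row):
    return row["category"] == "b"


print([r["id"] for r in prefilter_search(rows, [0.0, 0.0], is_b, limit=2)])   # [3, 4]
print([r["id"] for r in postfilter_search(rows, [0.0, 0.0], is_b, limit=2)])  # []
```

Post-filtering returns nothing here because both of the two nearest neighbors belong to category `a`. This is the "fewer than `limit` rows, or even zero" case.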

## Example: Metadata Filtering

To illustrate filtering capabilities, let's try four data points with combinations of vectors and metadata:
@@ -228,4 +230,4 @@ For a column of type LIST(T), you can use `LABEL_LIST` to create a scalar index.

Both **pre-filtering** and **post-filtering** can yield false positives. For pre-filtering, if the filter is too selective, it might eliminate relevant items that the vector search would have otherwise identified as a good match. In this case, increasing the `nprobes` parameter will help reduce such false positives. It is recommended to call `bypass_vector_index()` if you know that the filter is highly selective.

Similarly, a highly selective post-filter can lead to false positives. Increasing both `nprobes` and `refine_factor` can mitigate this issue. When deciding between pre-filtering and post-filtering, pre-filtering is generally the safer choice if you're uncertain.
2 changes: 2 additions & 0 deletions docs/search/full-text-search.mdx
@@ -130,6 +130,8 @@ If you want to specify which columns to search use `fts_columns="text"`
LanceDB automatically searches on the existing FTS index if the input to the search is of type `str`. If you provide a vector as input, LanceDB will search the ANN index instead.
</Note>

If a table has more than one FTS index, specify the indexed text column in the query. In Python you can use `fts_columns` or the query builder's `nearest_to_text(..., columns=...)`; in TypeScript, use `query().nearestToText(..., columns)`. The newer Lance-native FTS does not accept legacy Tantivy-only index parameters.

### Keeping the index up to date

Rows you add after building an FTS index aren't part of the index until you optimize the table. Until then, queries fall back to a flat scan over the unindexed fragments to keep results complete, which slows them down as the unindexed tail grows. Call `table.optimize()` to fold new rows into the existing index — it's the same operation used for vector indexes:
4 changes: 4 additions & 0 deletions docs/search/hybrid-search.mdx
@@ -244,6 +244,10 @@ text_query = "flower moon"

Hybrid queries inherit the same builder API as vector and FTS queries, so the same knobs for filtering, distance bounds, and row identity apply. These compose with `.rerank(...)` and the explicit `.vector()` / `.text()` form shown above.

<Info>
Always set `.limit(...)` on production hybrid queries. Without an explicit cap, the query builder does not give you a useful top-k contract to tune, and it may materialize more rows than you intended before reranking.
</Info>

### Returning row IDs

Pass `with_row_id(True)` (Python) or `withRowId()` (TypeScript) to include the internal `_rowid` column in the results. This is useful for joining hybrid results back to a primary table, or for deduping across multiple queries:
7 changes: 7 additions & 0 deletions docs/search/index.mdx
@@ -13,3 +13,10 @@ icon: "list"
| [Hybrid Search](/search/hybrid-search/) | Combines vector and full-text search with reranking |
| [Filtering](/search/filtering/) | Filter results based on metadata fields |
| [SQL Queries](/search/sql/index) | SQL query capabilities for data exploration and analytics |

## Before you search

- Vector search can run without an ANN index as an exhaustive scan. That's useful while prototyping, but build a vector index before relying on low-latency searches over larger tables.
- Full-text and hybrid text search require an FTS index on the text column you query. If a table has multiple FTS indexes, specify the target column. FTS also supports phrase, boolean, boosted, multi-match, and fuzzy query forms when you need more than plain terms.
- Multivector search currently uses cosine similarity and accepts either one query vector or a matrix of query vectors; every query vector must match the inner dimension of the multivector column.
- Set an explicit `.limit(...)` for production queries. Query builders also support controls such as prefilter/postfilter, distance ranges, row-id inclusion, offset pagination, and Arrow/Pandas/list result materialization.
2 changes: 2 additions & 0 deletions docs/search/multivector-search.mdx
@@ -19,6 +19,8 @@ Each item in your dataset can have a column containing multiple vectors, which L
Currently, only the `cosine` metric is supported for multivector search. The vector value type can be `float16`, `float32`, or `float64`.
</Warning>

Each query vector must match the inner vector dimension in the multivector column. This applies to both single-vector queries and multi-vector query matrices.
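
A client-side check along these lines can catch dimension mismatches before a query is sent. This is a hypothetical helper for illustration, not part of the SDK. It treats a flat list as a single query vector and a list of lists as a query matrix.

```python
def check_query_dims(query, inner_dim):
    """Validate a single query vector or a matrix of query vectors.

    Returns the number of query vectors; raises ValueError if any vector's
    length differs from the multivector column's inner dimension.
    """
    is_matrix = bool(query) and isinstance(query[0], (list, tuple))
    vectors = query if is_matrix else [query]
    for i, vec in enumerate(vectors):
        if len(vec) != inner_dim:
            raise ValueError(
                f"query vector {i} has dimension {len(vec)}, expected {inner_dim}"
            )
    return len(vectors)


print(check_query_dims([0.1, 0.2, 0.3, 0.4], inner_dim=4))        # 1
print(check_query_dims([[0.1] * 4, [0.2] * 4, [0.3] * 4], 4))     # 3
```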

## Computing Similarity

MaxSim (Maximum Similarity) is a key concept in late-interaction models that:
5 changes: 5 additions & 0 deletions docs/search/optimize-queries.mdx
@@ -33,6 +33,11 @@ Executes the query and provides detailed runtime metrics, including:

Together, these tools offer a comprehensive view of query performance, from planning to execution. Use `explain_plan` to verify your query structure and `analyze_plan` to measure and optimize actual performance.

Metadata filters are prefiltered by default, which usually shows the filter pushed into the
`LanceScan` or index scan. If you set `prefilter=False`, expect a separate `FilterExec` after
search instead; that can be useful for some expensive filters, but it changes both latency and
the number of rows available after filtering.

## Reading the Execution Plan

To demonstrate query performance analysis, we'll use a table containing 1.2M rows sampled from the [Wikipedia dataset](https://huggingface.co/datasets/wikimedia/wikipedia). Initially, the table has no indices, allowing us to observe the impact of optimization.
2 changes: 2 additions & 0 deletions docs/search/sql/fts-sql.mdx
@@ -15,6 +15,8 @@ thoroughly and being prepared to update your queries as newer versions of LanceD

LanceDB provides support for full-text search via SQL queries using the `fts()` User-Defined Table Function (UDTF). This allows you to incorporate keyword-based search (based on BM25) in your SQL queries for powerful text retrieval.

The SQL `fts()` table function expects exactly two string literals: the table name and the JSON FTS query. Build the JSON query in your application, pass it as a SQL string literal, and keep filtering, grouping, or joining in the surrounding SQL.
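
One way to follow that advice is to serialize the query with `json.dumps` and quote it as a SQL string literal. This is a sketch under assumptions: `build_fts_sql` is a hypothetical helper, the `{"match": {"column": ..., "terms": ...}}` shape is assumed for illustration, and single quotes are escaped by doubling them as in standard SQL.

```python
import json


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_fts_sql(table_name: str, fts_query: dict) -> str:
    """Build a SELECT over fts(<table literal>, <JSON query literal>)."""
    query_json = json.dumps(fts_query)
    return f"SELECT * FROM fts({sql_literal(table_name)}, {sql_literal(query_json)})"


# Assumed query shape, for illustration only.
query = {"match": {"column": "text", "terms": "dog's toy"}}
print(build_fts_sql("my_table", query))
```

Filtering, grouping, or joins then wrap this statement in ordinary SQL, for example by using it as a subquery, rather than being encoded in the JSON.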

## Table Setup

First, set up your FlightSQL client connection. See [SQL Queries documentation](/search/sql) for detailed client setup instructions.