From 3fb506fe33fb35de1a2425733d55bd4d2ad83e71 Mon Sep 17 00:00:00 2001
From: "mintlify[bot]" <109931778+mintlify[bot]@users.noreply.github.com>
Date: Tue, 19 May 2026 15:56:26 +0000
Subject: [PATCH] docs: document nested field paths for scalar, vector, and FTS indexes

---
 docs/indexing/fts-index.mdx    | 19 +++++++++++++++++++
 docs/indexing/scalar-index.mdx | 16 ++++++++++++++++
 docs/indexing/vector-index.mdx | 25 +++++++++++++++++++++++++
 3 files changed, 60 insertions(+)

diff --git a/docs/indexing/fts-index.mdx b/docs/indexing/fts-index.mdx
index 64fac40..b7068e1 100644
--- a/docs/indexing/fts-index.mdx
+++ b/docs/indexing/fts-index.mdx
@@ -83,3 +83,22 @@ Enable phrase queries by setting:
 |:----------|:---------------|:--------|
 | `with_position` | `True` | Track token positions for phrase matching |
 | `remove_stop_words` | `False` | Preserve stop words for exact phrase matching |
+
+## Indexing nested string fields
+
+You can build an FTS index on a string field inside a struct by passing its full dotted path, like `nested.text`. The same path is used when you query the index through `fts_columns`, and the indexed column is reported back as the full path from `list_indices()`.
+
+```python
+# Schema: pa.struct([pa.field("text", pa.string())]) stored under the `nested` column.
+table.create_fts_index("nested.text")
+
+results = (
+    table.search("puppy", query_type="fts", fts_columns="nested.text")
+    .limit(5)
+    .to_list()
+)
+```
+
+
+Use the canonical Lance path: dot-separate each struct field from root to leaf (for example, `metadata.author.name`). The same convention applies to scalar and vector indexes.
+
diff --git a/docs/indexing/scalar-index.mdx b/docs/indexing/scalar-index.mdx
index e41980b..d557685 100644
--- a/docs/indexing/scalar-index.mdx
+++ b/docs/indexing/scalar-index.mdx
@@ -89,6 +89,22 @@ Scalar indexes can also speed up scans containing a vector search or full text s
+## Indexing nested fields
+
+Scalar indexes can target a scalar field inside a struct by passing its full dotted path. The path is preserved end to end: it's the value you pass to `create_scalar_index`, it's what `list_indices()` reports under `columns`, and it's the column reference you use in filter predicates.
+
+```python
+# Schema: pa.struct([pa.field("user_id", pa.int32())]) stored under the `metadata` column.
+table.create_scalar_index("metadata.user_id", name="metadata_user_id_idx")
+
+# The same dotted path works in WHERE clauses.
+table.search().where("metadata.user_id = 42").limit(1).to_list()
+```
+
+
+Nested paths follow Lance field-path semantics: dot-separate each struct field from root to leaf (for example, `metadata.author.name`). The same convention applies to FTS and vector indexes.
+
+
 ## Index UUID Columns
 
 LanceDB supports scalar indexes on UUID columns (stored as `FixedSizeBinary(16)`), enabling efficient lookups and filtering on UUID-based primary keys.
diff --git a/docs/indexing/vector-index.mdx b/docs/indexing/vector-index.mdx
index 6054595..a837070 100644
--- a/docs/indexing/vector-index.mdx
+++ b/docs/indexing/vector-index.mdx
@@ -137,6 +137,31 @@ Create an `IVF_PQ` index with `cosine` similarity. Specify `vector_column_name`
+#### Indexing nested vector fields
+
+If your vector column lives inside a struct, pass its full dotted path as `vector_column_name`. The same path is used at query time and is what `list_indices()` reports under `columns`:
+
+```python
+# Schema: pa.struct([pa.field("embedding", pa.list_(pa.float32(), 2))])
+# stored under the `image` column.
+table.create_index(
+    vector_column_name="image.embedding",
+    num_partitions=1,
+    num_sub_vectors=1,
+    name="image_embedding_idx",
+)
+
+results = (
+    table.search([0.0, 1.0], vector_column_name="image.embedding")
+    .limit(1)
+    .to_list()
+)
+```
+
+
+Nested paths follow Lance field-path semantics: dot-separate each struct field from root to leaf (for example, `image.thumbnail.embedding`). The same convention applies to FTS and scalar indexes.
+
+
 ### Async API and Config Objects
 
 With asynchronous Python connections, create vector indexes with `await table.create_index("vector", config=...)`. The `config` object carries the same index choices you configure in the synchronous API, such as distance metric, partition count, and quantization settings: