From 8a2549b47e23d56ca1f486e9264400a6e93ca870 Mon Sep 17 00:00:00 2001 From: pradhanashutosh Date: Tue, 20 Jan 2026 11:18:23 -0800 Subject: [PATCH 1/6] Secondary index --- docs/defradb/Concepts/secondary-index.md | 422 ++++++++++++++++++ docs/defradb/How-to Guides/secondary-index.md | 255 ----------- .../How-to Guides/seconday-index-how-to.md | 290 ++++++++++++ 3 files changed, 712 insertions(+), 255 deletions(-) create mode 100644 docs/defradb/Concepts/secondary-index.md delete mode 100644 docs/defradb/How-to Guides/secondary-index.md create mode 100644 docs/defradb/How-to Guides/seconday-index-how-to.md diff --git a/docs/defradb/Concepts/secondary-index.md b/docs/defradb/Concepts/secondary-index.md new file mode 100644 index 0000000..6e89c20 --- /dev/null +++ b/docs/defradb/Concepts/secondary-index.md @@ -0,0 +1,422 @@ +--- +sidebar_label: Secondary index +sidebar_position: 10 +--- + +# Secondary indexes + +## Overview + +Secondary indexes in DefraDB enable efficient document lookups by creating optimized data structures that map field values to documents. Instead of scanning entire collections, indexes allow DefraDB to quickly locate documents matching specific criteria. + +**Key Points** + +DefraDB's secondary indexing system uses the `@index` directive on GraphQL schema fields to create indexes that **significantly improve query performance on filtered queries**. + +**Core capabilities:** + +- **Field-level indexes** – Index individual fields for fast lookups +- **Composite indexes** – Index multiple fields together for complex queries +- **Unique constraints** – Enforce uniqueness at the index level +- **Relationship indexes** – Index foreign key relationships between documents +- **JSON field indexes** – Index nested paths within JSON fields using inverted indexes + +**Performance trade-off:** Indexes improve read performance but add write overhead, as each document update must also update all relevant indexes. 
+ +**Best practices:** Index frequently filtered fields, avoid indexing rarely queried fields, and plan indexes based on your application's query patterns. + +## How indexes work + +### Basic concept + +An index is a data structure that maps field values to document identifiers. Instead of scanning every document in a collection (a "table scan"), DefraDB can use the index to directly locate matching documents. + +**Without an index:** +``` +Query: Find users with age = 30 +Process: Scan all user documents → Check each age field → Return matches +Cost: O(n) where n = total documents +``` + +**With an index on age:** +``` +Query: Find users with age = 30 +Process: Look up "30" in age index → Return matching document IDs +Cost: O(log n) for lookup + O(m) for retrieval where m = matching documents +``` + +### Index structure + +DefraDB stores indexes as sorted key-value pairs where: +- **Key**: The indexed field value(s) +- **Value**: Document identifier (_key) + +For a User collection with an indexed `name` field: +``` +Index entries: +"Alice" → [doc_id_1] +"Bob" → [doc_id_2, doc_id_3] +"Charlie" → [doc_id_4] +``` + +When you query for `name = "Bob"`, DefraDB looks up "Bob" in the index and immediately retrieves `doc_id_2` and `doc_id_3`. + +## Index types + +### Single-field indexes + +The simplest form of index covers a single field: + +```graphql +type User { + name: String @index + email: String @index(unique: true) +} +``` + +Each indexed field creates a separate index structure. The `unique: true` parameter adds a constraint ensuring no duplicate values. 
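The scan-versus-lookup comparison under "How indexes work" above can be sketched in a few lines of Python. This is only an illustration of the idea and not DefraDB's implementation: the `users` data, `scan`, `build_index`, and `lookup` are hypothetical names, and a sorted in-memory list stands in for the index.

```python
import bisect

# Hypothetical in-memory "collection": document ID -> document.
users = {
    "doc_id_1": {"name": "Alice", "age": 30},
    "doc_id_2": {"name": "Bob", "age": 25},
    "doc_id_3": {"name": "Bob", "age": 30},
    "doc_id_4": {"name": "Charlie", "age": 41},
}


def scan(field, value):
    """Full scan: inspect every document, so the cost grows with the collection."""
    return sorted(doc_id for doc_id, doc in users.items() if doc.get(field) == value)


def build_index(field):
    """Build a sorted list of (field value, document ID) pairs."""
    return sorted((doc[field], doc_id) for doc_id, doc in users.items() if field in doc)


def lookup(index, value):
    """Binary-search to the first entry for `value`, then read the run of matches."""
    i = bisect.bisect_left(index, (value,))
    matches = []
    while i < len(index) and index[i][0] == value:
        matches.append(index[i][1])
        i += 1
    return matches


name_index = build_index("name")
bob_ids = lookup(name_index, "Bob")
```

The scan touches all four documents, while the lookup only visits the two `"Bob"` entries after a binary search. Each separately indexed field would get its own such structure, which is why every additional index adds work on writes.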
+ +### Composite indexes + +Composite indexes span multiple fields and are optimized for queries filtering on those fields together: + +```graphql +type Article @index(includes: [ + {field: "status"}, + {field: "publishedAt"} +]) { + status: String + publishedAt: DateTime +} +``` + +**Index structure:** +``` +("published", "2024-01-15") → [doc_id_1] +("published", "2024-01-16") → [doc_id_2, doc_id_3] +("draft", "2024-01-15") → [doc_id_4] +``` + +Composite indexes are efficient for queries like: +```graphql +filter: { + status: {_eq: "published"} + publishedAt: {_gt: "2024-01-01"} +} +``` + +But less efficient for queries filtering only on the second field (`publishedAt` alone). + +### Unique indexes + +Unique indexes enforce uniqueness constraints at the database level: + +```graphql +type User { + email: String @index(unique: true) +} +``` + +When you try to create a document with a duplicate email, DefraDB will reject it. This is more efficient than manually checking for duplicates in your application code. + +**Performance impact:** Unique indexes add validation overhead because DefraDB must check for existing values before every insert or update. + +## Relationship indexing + +### How relationship indexes work + +When you index a relationship field, DefraDB creates an index on the foreign key reference: + +```graphql +type User { + address: Address @primary @index +} + +type Address { + city: String @index +} +``` + +This creates two indexes: +1. User → Address foreign key index +2. Address city field index + +### Query optimization with relationship indexes + +Consider this query: +```graphql +User(filter: {address: {city: {_eq: "Montreal"}}}) +``` + +**Without indexes:** +1. Scan all User documents +2. For each User, fetch the related Address +3. Check if city matches "Montreal" +4. Return matching Users + +**With indexes:** +1. Look up "Montreal" in the Address city index → Get Address IDs +2. 
Look up those Address IDs in the User→Address relationship index → Get User IDs +3. Retrieve those User documents + +The indexed approach avoids scanning the entire User collection and performs direct lookups instead. + +### Enforcing relationship cardinality + +Unique relationship indexes enforce one-to-one relationships: + +```graphql +type User { + address: Address @primary @index(unique: true) +} +``` + +Without the unique constraint, the relationship defaults to one-to-many (multiple Users could reference the same Address). The unique index ensures exactly one User per Address. + +## JSON field indexing + +JSON fields present unique indexing challenges because they're hierarchical and semi-structured. DefraDB uses a specialized approach to handle them efficiently. + +### Path-aware indexing + +Unlike scalar fields (String, Int, Bool), JSON fields contain nested structures. DefraDB indexes every leaf node in the JSON tree along with its complete path: + +**Example document:** +```json +{ + "user": { + "device": { + "model": "iPhone", + "version": "15" + }, + "location": { + "city": "Montreal" + } + } +} +``` + +**Index entries created:** +``` +["user", "device", "model", "iPhone"] → doc_id_1 +["user", "device", "version", "15"] → doc_id_1 +["user", "location", "city", "Montreal"] → doc_id_1 +``` + +Each entry includes the full path to the value, ensuring DefraDB knows not just what the value is, but where it exists within the document structure. + +### Inverted indexes for JSON + +DefraDB uses **inverted indexes** for JSON fields. Traditional indexes map documents to values; inverted indexes map values to documents. 
+ +**Traditional index (document → value):** +``` +doc_id_1 → {"user": {"device": {"model": "iPhone"}}} +doc_id_2 → {"user": {"device": {"model": "Android"}}} +``` + +**Inverted index (value → documents):** +``` +["user", "device", "model", "iPhone"] → [doc_id_1, doc_id_3, doc_id_7] +["user", "device", "model", "Android"] → [doc_id_2, doc_id_5] +``` + +When you query for a specific path and value, DefraDB directly looks it up in the inverted index and retrieves all matching documents. + +### Value normalization + +DefraDB normalizes JSON leaf values to ensure consistent ordering and comparisons: + +- **Type normalization**: Numbers are normalized to `int64` or `float64` +- **Path storage**: Each value stores its complete path +- **Sorting consistency**: Normalized values ensure predictable sort order + +This normalization allows DefraDB to: +- Compare values across documents reliably +- Sort results consistently +- Filter efficiently regardless of how values were originally stored + +### Query execution with JSON indexes + +**Query:** +```graphql +Collection(filter: { + jsonField: { + user: { + device: { + model: {_eq: "iPhone"} + } + } + } +}) +``` + +**Without index:** +1. Scan all documents +2. Parse each JSON field +3. Navigate to `user.device.model` +4. Compare value to "iPhone" +5. Return matches + +**With index:** +1. Look up `["user", "device", "model", "iPhone"]` in inverted index +2. Retrieve matching document IDs +3. Return those documents + +The indexed approach avoids JSON parsing and navigation during query execution. + +### Key format for JSON indexes + +DefraDB uses a hierarchical key format for JSON index entries: + +``` +//// +``` + +Example: +``` +users_col/idx_123/user/device/model/iPhone/doc_456 +users_col/idx_123/user/location/city/Montreal/doc_789 +``` + +This format allows efficient prefix scanning for partial path matches and supports complex queries on nested JSON structures. 
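The path-aware, inverted indexing of JSON leaves described above can be sketched as follows. This is a minimal illustration under stated assumptions, not DefraDB's code: `leaf_entries` and the `inverted` mapping are hypothetical names, arrays inside the JSON are not handled, and no value normalization is applied.

```python
from collections import defaultdict


def leaf_entries(node, path=()):
    """Yield (path, value) for every leaf of a nested JSON-like dict."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from leaf_entries(child, path + (key,))
    else:
        yield path, node


# Hypothetical documents, reusing the example values from the text.
docs = {
    "doc_id_1": {
        "user": {
            "device": {"model": "iPhone", "version": "15"},
            "location": {"city": "Montreal"},
        }
    },
    "doc_id_2": {"user": {"device": {"model": "Android"}}},
}

# Inverted index: (path, value) -> IDs of the documents containing that leaf.
inverted = defaultdict(list)
for doc_id, body in docs.items():
    for path, value in leaf_entries(body):
        inverted[(path, value)].append(doc_id)

iphone_docs = inverted.get((("user", "device", "model"), "iPhone"), [])
```

One document produces one entry per leaf, so a deeply nested JSON field can generate many index entries. The query side is then a single lookup on the `(path, value)` pair, with no JSON parsing at query time.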
+ +## Performance considerations + +### Read vs write trade-off + +Every index improves read performance but degrades write performance: + +**Reads (queries):** +- Index lookups are O(log n) instead of O(n) table scans +- Queries on indexed fields are significantly faster +- Complex filters benefit from composite indexes + +**Writes (inserts/updates):** +- Each indexed field adds overhead to document creation +- Updates require updating the document AND all relevant indexes +- More indexes = slower writes + +### When to use indexes + +**Good candidates for indexing:** +- Fields frequently used in query filters +- Foreign key relationships queried often +- Fields requiring uniqueness constraints +- Fields used for sorting + +**Poor candidates for indexing:** +- Fields rarely or never queried +- Fields that change frequently but are seldom filtered +- High-cardinality fields with mostly unique values (unless uniqueness is required) + +### Composite vs multiple single-field indexes + +**Composite index:** +```graphql +@index(includes: [{field: "status"}, {field: "date"}]) +``` + +**Multiple single-field indexes:** +```graphql +status: String @index +date: DateTime @index +``` + +**When to use composite:** +- Queries frequently filter on both fields together +- Order matters (queries filter on status, then date) +- Want optimal performance for that specific query pattern + +**When to use multiple single-field:** +- Queries filter on either field independently +- Need flexibility for different query combinations +- Don't mind slightly slower performance for multi-field queries + +### Index maintenance overhead + +Indexes aren't free: + +**Storage:** Each index consumes disk space proportional to the number of documents and indexed field size. + +**Memory:** Active indexes may be cached in memory for faster access. + +**CPU:** Index updates require computation during writes. 
+ +**Balance these costs** against the query performance improvements to determine the optimal indexing strategy. + +## Direction and ordering + +### Index direction + +Indexes can be ordered ascending (ASC) or descending (DESC): + +```graphql +type Article { + publishedAt: DateTime @index(direction: DESC) +} +``` + +**Why direction matters:** + +Descending indexes are optimized for queries that want the most recent items first: +```graphql +Article(order: {publishedAt: DESC}, limit: 10) +``` + +If your index direction matches your query's sort order, DefraDB can use the index directly without additional sorting. + +### Composite index direction + +Each field in a composite index can have its own direction: + +```graphql +@index(includes: [ + {field: "status", direction: ASC}, + {field: "publishedAt", direction: DESC} +]) +``` + +This optimizes queries that sort by status ascending, then by publishedAt descending. + +## Benefits of DefraDB's indexing system + +### Efficient queries + +Indexes transform slow table scans into fast direct lookups, making queries scale logarithmically instead of linearly with dataset size. + +### Precise path tracking (JSON) + +JSON indexes maintain full path information, allowing accurate indexing and retrieval of deeply nested structures without ambiguity. + +### Scalable structure + +DefraDB's indexing system handles: +- Simple scalar fields efficiently +- Complex composite indexes for multi-field queries +- Deeply nested JSON with minimal overhead +- Large datasets with predictable performance characteristics + +### Flexible constraints + +Unique indexes provide database-level data integrity without requiring application-level validation. 
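To make the unique-constraint behaviour concrete, here is a small Python sketch of how a uniqueness check might sit in front of a write. It is an assumption-laden illustration, not DefraDB's implementation: `UniqueIndex`, `UniqueIndexError`, and `put` are hypothetical names, and a dictionary stands in for the index storage.

```python
class UniqueIndexError(Exception):
    """Raised when a write would duplicate an indexed value (hypothetical name)."""


class UniqueIndex:
    def __init__(self, field):
        self.field = field
        self.entries = {}  # indexed value -> document ID
        self.by_doc = {}   # document ID -> currently indexed value

    def put(self, doc_id, doc):
        value = doc[self.field]
        # The extra read that a unique index adds to every insert or update.
        owner = self.entries.get(value)
        if owner is not None and owner != doc_id:
            raise UniqueIndexError(f"{self.field}={value!r} already used by {owner}")
        # On update, release the value this document held before.
        old = self.by_doc.get(doc_id)
        if old is not None and old != value:
            del self.entries[old]
        self.entries[value] = doc_id
        self.by_doc[doc_id] = value


emails = UniqueIndex("email")
emails.put("doc_id_1", {"email": "alice@example.com"})
```

A second document with the same email would raise `UniqueIndexError`, which mirrors the database rejecting the write. The lookup before each write is the validation overhead mentioned in the text.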
+ +## Limitations and considerations + +### Index creation constraints + +- Indexes must be defined in the schema; you cannot add them to existing collections without schema migration +- All index definitions must be present when creating the schema +- Changing indexes requires schema migration + +### Query pattern dependency + +Indexes only help queries that use the indexed fields. If your query patterns change, you may need to adjust your indexing strategy. + +### Write amplification + +Heavy indexing can significantly slow down write operations. Monitor write performance and adjust indexes if writes become a bottleneck. + +### Storage overhead + +Large collections with many indexes can consume significant disk space. Plan storage capacity accordingly. diff --git a/docs/defradb/How-to Guides/secondary-index.md b/docs/defradb/How-to Guides/secondary-index.md deleted file mode 100644 index 18d2d1f..0000000 --- a/docs/defradb/How-to Guides/secondary-index.md +++ /dev/null @@ -1,255 +0,0 @@ ---- -sidebar_label: Secondary Indexes -sidebar_position: 60 ---- -# Seconday Indexes - -:::tip[Key Points] - -DefraDB's secondary indexing system enables efficient document lookups using the `@index` directive on GraphQL schema fields. Indexes trade write overhead for significantly faster read performance on filtered queries. - -**Best practices:** Index frequently filtered fields, avoid indexing rarely queried fields, and use unique indexes sparingly (they add validation overhead). Plan indexes based on query patterns to balance read/write performance. - -::: - -## Introduction - -DefraDB provides a powerful and flexible secondary indexing system that enables efficient document lookups and queries. - -## Usage - -The `@index` directive can be used on GraphQL schema objects and field definitions to configure indexes. - -```graphql -@index(name: String, unique: Bool, direction: ORDERING, includes: [{ field: String, direction: ORDERING }]) -``` - -### `name` -Sets the index name. 
Defaults to concatenated field names with direction. - -### `unique` -Makes the index unique. Defaults to false. - -### `direction` -Sets the default index direction for all fields. Can be one of ASC (ascending) or DESC (descending). Defaults to ASC. - -If a field in the includes list does not specify a direction the default direction from this value will be used instead. - -### `includes` -Sets the fields the index is created on. - -When used on a field definition and the field is not in the includes list it will be implicitly added as the first entry. - -## Examples - -### Field level usage - -Creates an index on the User name field with DESC direction. - -```graphql -type User { - name: String @index(direction: DESC) -} -``` - -### Schema level usage - -Creates an index on the User name field with default direction (ASC). - -```graphql -type User @index(includes: {field: "name"}) { - name: String - age: Int -} -``` - -### Unique index - -Creates a unique index on the User name field with default direction (ASC). - -```graphql -type User { - name: String @index(unique: true) -} -``` - -### Composite index - -Creates a composite index on the User name and age fields with default direction (ASC). - -```graphql -type User @index(includes: [{field: "name"}, {field: "age"}]) { - name: String - age: Int -} -``` - -### Relationship index - -Creates a unique index on the User relationship to Address. The unique index constraint ensures that no two Users can reference the same Address document. - -```graphql -type User { - name: String - age: Int - address: Address @primary @index(unique: true) -} - -type Address { - user: User - city: String - street: String -} -``` - -## Performance considerations - -Indexes can greatly improve query performance, but they also impact system performance during writes. Each index adds write overhead since every document update must also update the relevant indexes. 
Despite this, the boost in read performance for indexed queries usually makes this trade-off worthwhile. - -#### To optimize performance: - -- Choose indexes based on your query patterns. Focus on fields frequently used in query filters to maximize efficiency. -- Avoid indexing rarely queried fields. Doing so adds unnecessary overhead. -- Be cautious with unique indexes. These require extra validation, making their performance impact more significant. - -Plan your indexes carefully to balance read and write performance. - -### Indexing related objects - -DefraDB supports indexing relationships between documents, allowing for efficient queries across related data. - -#### Example schema: Users and addresses - -```graphql -type User { - name: String - age: Int - address: Address @primary @index -} - -type Address { - user: User - city: String @index - street: String -} -``` - -Key indexes in this schema: - -- **City field in address:** Indexed to enable efficient queries by city. -- **Relationship between user and address**: Indexed to support fast lookups based on relationships. - -#### Query example - -The following query retrieves all users living in Montreal: - -```graphql -query { - User(filter: { - address: {city: {_eq: "Montreal"}} - }) { - name - } -} -``` - -#### How indexing improves efficiency - -**Without indexes:** -- Fetch all user documents. -- For each user, retrieve the corresponding Address. This approach becomes slow with large datasets. - -**With indexes:** -- Fetch address documents matching the city value directly. -- Retrieve the corresponding User documents. This method is much faster because indexes enable direct lookups. - -### Enforcing unique relationships -Indexes can also enforce one-to-one relationships. 
For instance, to ensure each User has exactly one unique Address: - -```graphql -type User { - name: String - age: Int - address: Address @primary @index(unique: true) -} - -type Address { - user: User - city: String @index - street: String -} -``` - -Here, the @index(unique: true) constraint ensures no two Users can share the same Address. Without it, the relationship defaults to one-to-many, allowing multiple Users to reference a single Address. - -By combining relationship indexing with cardinality constraints, you can create highly efficient and logically consistent data structures. - -## JSON field indexing - -DefraDB offers a specialized indexing system for JSON fields, designed to handle their hierarchical structure efficiently. - -### Overview - -JSON fields differ from other field types (e.g., Int, String, Bool) because they are semi-structured and encoded. DefraDB uses a path-aware system to manage these complexities, enabling traversal and indexing of all leaf nodes in a JSON document. - -### Example - -```json -{ - "user": { - "device": { - "model": "iPhone" - } - } -} -``` - -Here, the `iPhone` value is represented with its complete path: [`user`, `device`, `model`]. This path-aware representation ensures that the system knows not just the value, but where it resides within the document. - -Retrieve documents where the model is "iPhone": - -```graphql -query { - Collection(filter: { - jsonField: { - user: { - device: { - model: {_eq: "iPhone"} - } - } - } - }) -} -``` - -With indexes, the system directly retrieves matching documents, avoiding the need to scan and parse the JSON during queries. - -### How it works - -#### Inverted Indexes for JSON -DefraDB uses inverted indexes for JSON fields. These indexes reverse the traditional "document-to-value" relationship by starting with a value and quickly locating all documents containing that value. - -- Regular fields map to a single index entry. 
-- JSON fields generate multiple entries—one for each leaf node, incorporating both the path and the value. - -During indexing, the system traverses the entire JSON structure, creating these detailed index entries. - -#### Value normalization in JSON -DefraDB normalizes JSON leaf values to ensure consistency in ordering and comparisons. For example: - -- JSON values include their normalized value and path information. -- Scalar types (e.g., integers) are normalized to a standard type, such as `int64`. - -This ensures that operations like filtering and sorting are reliable and efficient. - -#### How indexing works -When indexing a document with JSON fields, the system: - -1. Traverses the JSON structure using the JSON interface. -1. Generates index entries for every leaf node, combining path and normalized value. -1. Stores entries efficiently, enabling direct querying. - -#### Benefits of JSON field indexing -- **Efficient queries**: Leverages inverted indexes for fast lookups, even in deeply nested structures. -- **Precise path tracking**: Maintains path information for accurate indexing and retrieval. -- **Scalable structure**: Handles complex JSON documents with minimal performance overhead. diff --git a/docs/defradb/How-to Guides/seconday-index-how-to.md b/docs/defradb/How-to Guides/seconday-index-how-to.md new file mode 100644 index 0000000..c35572a --- /dev/null +++ b/docs/defradb/How-to Guides/seconday-index-how-to.md @@ -0,0 +1,290 @@ +--- +sidebar_label: Secondary index +sidebar_position: 10 +--- + + +# Secondary indexes + +This guide provides step-by-step instructions for creating and using secondary indexes in DefraDB to improve query performance. + +:::tip[Key Points] + +DefraDB's secondary indexing system enables efficient document lookups using the `@index` directive on GraphQL schema fields. Indexes trade write overhead for significantly faster read performance on filtered queries. 
+ +**Best practices:** Index frequently filtered fields, avoid indexing rarely queried fields, and use unique indexes sparingly (they add validation overhead). Plan indexes based on query patterns to balance read/write performance. + +::: + +## Prerequisites + +Before following this guide, ensure you have: + +- DefraDB installed and running +- A defined schema for your collections +- Understanding of [secondary index concepts](/defradb/next/Concepts/secondary-indexes) + +## Create a basic index + +Add the `@index` directive to a field in your schema to create an index. + +### Index a single field + +```graphql +type User { + name: String @index + age: Int +} +``` + +This creates an ascending (ASC) index on the `name` field. + +### Specify index direction + +```graphql +type User { + name: String @index(direction: DESC) + age: Int +} +``` + +Use `direction: DESC` for descending order or `direction: ASC` (default) for ascending order. + +### Add the schema + +```bash +defradb client schema add -f schema.graphql +``` + +## Create a unique index + +Unique indexes ensure no two documents have the same value for the indexed field. + +```graphql +type User { + email: String @index(unique: true) + name: String +} +``` + +This prevents duplicate email addresses in your User collection. + +## Create a composite index + +Composite indexes span multiple fields, useful for queries filtering on multiple fields simultaneously. + +### Using schema-level directive + +```graphql +type User @index(includes: [{field: "name"}, {field: "age"}]) { + name: String + age: Int + email: String +} +``` + +### Specify different directions per field + +```graphql +type User @index(includes: [ + {field: "name", direction: ASC}, + {field: "age", direction: DESC} +]) { + name: String + age: Int +} +``` + +## Index relationships + +Index relationship fields to improve query performance across related documents. 
+ +### Basic relationship index + +```graphql +type User { + name: String + age: Int + address: Address @primary @index +} + +type Address { + user: User + city: String @index + street: String +} +``` + +This indexes both: +- The relationship between User and Address +- The city field in Address + +### Query with relationship index + +```graphql +query { + User(filter: { + address: {city: {_eq: "Montreal"}} + }) { + name + } +} +``` + +With the indexes, DefraDB: +1. Quickly finds Address documents with `city = "Montreal"` +2. Retrieves the related User documents efficiently + +### Enforce unique relationships + +Use a unique index to enforce one-to-one relationships: + +```graphql +type User { + name: String + age: Int + address: Address @primary @index(unique: true) +} + +type Address { + user: User + city: String + street: String +} +``` + +This ensures no two Users can reference the same Address document. + +## Index JSON fields + +DefraDB supports indexing JSON fields for efficient queries on nested data. + +### Define a schema with JSON field + +```graphql +type Product { + name: String + metadata: JSON @index +} +``` + +### Query nested JSON paths + +```graphql +query { + Product(filter: { + metadata: { + user: { + device: { + model: {_eq: "iPhone"} + } + } + } + }) { + name + } +} +``` + +The index enables direct lookup of documents matching the nested path and value. + +## Name your indexes + +Assign custom names to indexes for easier identification. + +```graphql +type User { + name: String @index(name: "user_name_idx") + email: String @index(name: "user_email_unique_idx", unique: true) +} +``` + +Default names are auto-generated from field names and direction. 
+ +## Query patterns for best performance + +### Index frequently filtered fields + +```graphql +type Article { + title: String + content: String + status: String @index # Frequently filtered + publishedAt: DateTime @index # Frequently filtered + author: String +} +``` + +Index fields commonly used in `filter` clauses. + +### Use composite indexes for multi-field filters + +```graphql +type Article @index(includes: [ + {field: "status"}, + {field: "publishedAt"} +]) { + title: String + status: String + publishedAt: DateTime +} +``` + +```graphql +query { + Article(filter: { + status: {_eq: "published"} + publishedAt: {_gt: "2024-01-01"} + }) { + title + } +} +``` + +This composite index efficiently handles queries filtering on both fields. + +### Avoid over-indexing + +Don't index fields that are rarely queried: + +```graphql +type User { + name: String @index # Good - frequently queried + email: String @index # Good - frequently queried + middleName: String # No index - rarely queried + internalNote: String # No index - rarely queried +} +``` + +Every index adds write overhead, so only index what you need. + +## Performance optimization tips + +- **Index your query patterns**: Analyze your application's queries and index the fields used in filters +- **Use unique indexes sparingly**: They add validation overhead on writes +- **Consider composite indexes**: More efficient than multiple single-field indexes for multi-field queries +- **Test query performance**: Use the [explain systems](/defradb/next/How-to%20Guides/explain-systems-how-to) to analyze query execution + +## Troubleshooting + +### Queries still slow after adding indexes + +**Issue**: Query performance hasn't improved after adding indexes. 
+ +**Solutions**: +- Verify the index was created successfully +- Ensure your query filter uses the indexed field +- Check if you're querying in the reverse direction of a relationship (may need to index the other side) +- Use composite indexes if filtering on multiple fields + +### Unique constraint violations + +**Issue**: Cannot insert documents due to unique index constraint. + +**Solution**: Check for existing documents with the same value. Unique indexes prevent duplicates, so you must either update the existing document or use a different value. + +### Write performance degraded + +**Issue**: Document creation/updates are slower after adding indexes. + +**Solution**: This is expected behavior. Indexes trade write performance for read performance. Review your indexes and remove any that aren't essential for your query patterns. From e250c94d8dcdc0ce208c3d1321883bbd7b29514f Mon Sep 17 00:00:00 2001 From: pradhanashutosh Date: Tue, 17 Feb 2026 10:29:38 -0800 Subject: [PATCH 2/6] update --- .../{seconday-index-how-to.md => secondary-index-how-to.md} | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename docs/defradb/How-to Guides/{seconday-index-how-to.md => secondary-index-how-to.md} (99%) diff --git a/docs/defradb/How-to Guides/seconday-index-how-to.md b/docs/defradb/How-to Guides/secondary-index-how-to.md similarity index 99% rename from docs/defradb/How-to Guides/seconday-index-how-to.md rename to docs/defradb/How-to Guides/secondary-index-how-to.md index c35572a..57ff0ef 100644 --- a/docs/defradb/How-to Guides/seconday-index-how-to.md +++ b/docs/defradb/How-to Guides/secondary-index-how-to.md @@ -22,7 +22,7 @@ Before following this guide, ensure you have: - DefraDB installed and running - A defined schema for your collections -- Understanding of [secondary index concepts](/defradb/next/Concepts/secondary-indexes) +- Understanding of [secondary index concepts](/defradb/next/Concepts/secondary-index) ## Create a basic index From 
011596bbe1312946f1251f7d591bbd29ee1dca7a Mon Sep 17 00:00:00 2001 From: pradhanashutosh Date: Tue, 17 Feb 2026 11:00:41 -0800 Subject: [PATCH 3/6] Update --- docs/defradb/Concepts/secondary-index.md | 231 +++++++----------- .../How-to Guides/secondary-index-how-to.md | 69 ++++-- 2 files changed, 128 insertions(+), 172 deletions(-) diff --git a/docs/defradb/Concepts/secondary-index.md b/docs/defradb/Concepts/secondary-index.md index 6e89c20..00ca8bb 100644 --- a/docs/defradb/Concepts/secondary-index.md +++ b/docs/defradb/Concepts/secondary-index.md @@ -20,8 +20,9 @@ DefraDB's secondary indexing system uses the `@index` directive on GraphQL schem - **Unique constraints** – Enforce uniqueness at the index level - **Relationship indexes** – Index foreign key relationships between documents - **JSON field indexes** – Index nested paths within JSON fields using inverted indexes +- **Array field indexes** – Index values within array fields -**Performance trade-off:** Indexes improve read performance but add write overhead, as each document update must also update all relevant indexes. +**Performance trade-off:** Indexes improve read performance but add write overhead, as each document update must also update all relevant indexes. Indexing arrays and JSON fields can fill up storage quickly with large data. **Best practices:** Index frequently filtered fields, avoid indexing rarely queried fields, and plan indexes based on your application's query patterns. @@ -32,34 +33,46 @@ DefraDB's secondary indexing system uses the `@index` directive on GraphQL schem An index is a data structure that maps field values to document identifiers. Instead of scanning every document in a collection (a "table scan"), DefraDB can use the index to directly locate matching documents. 
**Without an index:** -``` + +```bash Query: Find users with age = 30 Process: Scan all user documents → Check each age field → Return matches Cost: O(n) where n = total documents ``` **With an index on age:** -``` + +```bash Query: Find users with age = 30 Process: Look up "30" in age index → Return matching document IDs -Cost: O(log n) for lookup + O(m) for retrieval where m = matching documents +Cost: O(1) for lookup + O(m) for retrieval where m = matching documents ``` ### Index structure -DefraDB stores indexes as sorted key-value pairs where: -- **Key**: The indexed field value(s) -- **Value**: Document identifier (_key) +For regular indexes, DefraDB stores index entries as key-value pairs where the document ID is part of the key and the value is empty: -For a User collection with an indexed `name` field: +```bash +/col_id/ind_id/field_values/_docID → {} ``` + +For unique indexes, the document ID is stored as the value instead: + +```bash +/col_id/ind_id/field_values → _docID +``` + +For a User collection with an indexed `name` field, the entries look like: + +```bash Index entries: -"Alice" → [doc_id_1] -"Bob" → [doc_id_2, doc_id_3] -"Charlie" → [doc_id_4] +"Alice/doc_id_1" → {} +"Bob/doc_id_2" → {} +"Bob/doc_id_3" → {} +"Charlie/doc_id_4" → {} ``` -When you query for `name = "Bob"`, DefraDB looks up "Bob" in the index and immediately retrieves `doc_id_2` and `doc_id_3`. +When you query for `name = "Bob"`, DefraDB looks up "Bob" in the index and retrieves matching documents one by one (e.g., `doc_id_2`, then `doc_id_3`). If a `limit: 1` is applied, only the first match is fetched. 
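The regular-index key layout above (`/col_id/ind_id/field_values/_docID`) and the one-by-one retrieval with `limit` can be sketched as a prefix scan over sorted keys. This is a simplified illustration, not DefraDB's implementation: `regular_key` and `prefix_scan` are hypothetical helpers, field values are written as plain text rather than in any real encoding, and a sorted Python list stands in for the datastore.

```python
import bisect


def regular_key(col_id, ind_id, value, doc_id):
    """Regular index entry: the document ID is the last key segment, the value is empty."""
    return f"/{col_id}/{ind_id}/{value}/{doc_id}"


# Hypothetical sorted key space for a User collection with an index on `name`.
keys = sorted([
    regular_key(1, 1, "Alice", "doc_id_1"),
    regular_key(1, 1, "Bob", "doc_id_2"),
    regular_key(1, 1, "Bob", "doc_id_3"),
    regular_key(1, 1, "Charlie", "doc_id_4"),
])


def prefix_scan(prefix, limit=None):
    """Return document IDs for keys under `prefix`, in key order, stopping at `limit`."""
    i = bisect.bisect_left(keys, prefix)
    found = []
    while i < len(keys) and keys[i].startswith(prefix):
        if limit is not None and len(found) >= limit:
            break
        found.append(keys[i].rsplit("/", 1)[1])
        i += 1
    return found


# The trailing slash keeps "Bob" from also matching a value such as "Bobby".
bob_prefix = "/1/1/Bob/"
```

Calling `prefix_scan(bob_prefix)` walks both `Bob` entries, while `prefix_scan(bob_prefix, limit=1)` stops after the first key. Because the document ID is part of the key, several documents can share a field value without the entries colliding.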
## Index types @@ -91,21 +104,26 @@ type Article @index(includes: [ ``` **Index structure:** -``` -("published", "2024-01-15") → [doc_id_1] -("published", "2024-01-16") → [doc_id_2, doc_id_3] -("draft", "2024-01-15") → [doc_id_4] + +```bash +published/2024-01-15/doc_id_1 → {} +published/2024-01-16/doc_id_2 → {} +published/2024-01-16/doc_id_3 → {} +draft/2024-01-15/doc_id_4 → {} ``` +(Note: `col_id` and `index_id` are always prefixed but omitted here for clarity.) + Composite indexes are efficient for queries like: + ```graphql filter: { status: {_eq: "published"} - publishedAt: {_gt: "2024-01-01"} + publishedAt: {_gt: "2017-07-23T03:46:56-05:00"} } ``` -But less efficient for queries filtering only on the second field (`publishedAt` alone). +Queries filtering only on the second field (`publishedAt` alone) will not use this index at all. ### Unique indexes @@ -119,7 +137,7 @@ type User { When you try to create a document with a duplicate email, DefraDB will reject it. This is more efficient than manually checking for duplicates in your application code. -**Performance impact:** Unique indexes add validation overhead because DefraDB must check for existing values before every insert or update. +**Performance impact:** Unique indexes require an additional read operation on every insert or update to check for existing values. ## Relationship indexing @@ -138,23 +156,27 @@ type Address { ``` This creates two indexes: + 1. User → Address foreign key index 2. Address city field index ### Query optimization with relationship indexes Consider this query: + ```graphql User(filter: {address: {city: {_eq: "Montreal"}}}) ``` **Without indexes:** + 1. Scan all User documents 2. For each User, fetch the related Address 3. Check if city matches "Montreal" 4. Return matching Users **With indexes:** + 1. Look up "Montreal" in the Address city index → Get Address IDs 2. Look up those Address IDs in the User→Address relationship index → Get User IDs 3. 
Retrieve those User documents @@ -173,15 +195,20 @@ type User { Without the unique constraint, the relationship defaults to one-to-many (multiple Users could reference the same Address). The unique index ensures exactly one User per Address. +Note: 1-to-2-sided relations are automatically constrained by a unique index to enforce the 1-to-1 invariant. + ## JSON field indexing JSON fields present unique indexing challenges because they're hierarchical and semi-structured. DefraDB uses a specialized approach to handle them efficiently. +> **Storage warning:** Indexing JSON fields can consume significant disk space with large data, as every leaf node at every path is indexed separately. + ### Path-aware indexing Unlike scalar fields (String, Int, Bool), JSON fields contain nested structures. DefraDB indexes every leaf node in the JSON tree along with its complete path: **Example document:** + ```json { "user": { @@ -196,49 +223,39 @@ Unlike scalar fields (String, Int, Bool), JSON fields contain nested structures. } ``` -**Index entries created:** -``` -["user", "device", "model", "iPhone"] → doc_id_1 -["user", "device", "version", "15"] → doc_id_1 -["user", "location", "city", "Montreal"] → doc_id_1 +**Index entries created** (using `/col_id/ind_id/` prefix, JSON path parts separated by `.`): + +```bash +/1/1/user.device.model/iPhone/doc_id_1 → {} +/1/1/user.device.version/15/doc_id_1 → {} +/1/1/user.location.city/Montreal/doc_id_1 → {} ``` Each entry includes the full path to the value, ensuring DefraDB knows not just what the value is, but where it exists within the document structure. ### Inverted indexes for JSON -DefraDB uses **inverted indexes** for JSON fields. Traditional indexes map documents to values; inverted indexes map values to documents. +DefraDB uses **inverted indexes** for JSON fields. The whole idea is to tokenize key-value pairs that form a path, mapping values back to the documents that contain them. 
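The path-aware entries described above can be sketched as a small recursive walk over a JSON document. This is a minimal illustration, not DefraDB's implementation: the function name `json_leaf_entries`, the default `col_id` and `ind_id` of `1`, and the string key format are assumptions, and the sketch only handles nested objects with scalar leaves.

```python
def json_leaf_entries(doc, doc_id, col_id=1, ind_id=1):
    """Return one index key per leaf value, each carrying its full dotted path."""
    entries = []

    def walk(node, path):
        if isinstance(node, dict):
            # Descend into nested objects, extending the path with each key.
            for key, child in node.items():
                walk(child, path + [key])
        else:
            # Leaf value: emit /col_id/ind_id/<dotted.path>/<value>/<doc_id>.
            dotted = ".".join(path)
            entries.append(f"/{col_id}/{ind_id}/{dotted}/{node}/{doc_id}")

    walk(doc, [])
    return entries

# Hypothetical document shaped like the example in this section.
sample = {
    "user": {
        "device": {"model": "iPhone", "version": "15"},
        "location": {"city": "Montreal"},
    }
}

for entry in json_leaf_entries(sample, "doc_id_1"):
    print(entry)
```

Because every leaf produces its own entry, the number of index keys grows with the number of leaves per document, which is the reason for the storage warning above.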
-**Traditional index (document → value):** -``` -doc_id_1 → {"user": {"device": {"model": "iPhone"}}} -doc_id_2 → {"user": {"device": {"model": "Android"}}} -``` +For context, a primary (non-inverted) index might look like: -**Inverted index (value → documents):** -``` -["user", "device", "model", "iPhone"] → [doc_id_1, doc_id_3, doc_id_7] -["user", "device", "model", "Android"] → [doc_id_2, doc_id_5] +```bash +/1/1/iPhone → {"user": {"device": {"model": "iPhone"}}} ``` -When you query for a specific path and value, DefraDB directly looks it up in the inverted index and retrieves all matching documents. +The inverted secondary index instead maps paths and values to document IDs: -### Value normalization - -DefraDB normalizes JSON leaf values to ensure consistent ordering and comparisons: - -- **Type normalization**: Numbers are normalized to `int64` or `float64` -- **Path storage**: Each value stores its complete path -- **Sorting consistency**: Normalized values ensure predictable sort order +```bash +/1/1/user.device.model/iPhone/doc_id_1 → {} +/1/1/user.device.model/Android/doc_id_2 → {} +``` -This normalization allows DefraDB to: -- Compare values across documents reliably -- Sort results consistently -- Filter efficiently regardless of how values were originally stored +When you query for a specific path and value, DefraDB directly looks it up in the inverted index and retrieves all matching documents. For more on inverted indexes, see the [CockroachDB RFC on inverted indexes](https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20171020_inverted_indexes.md). ### Query execution with JSON indexes **Query:** + ```graphql Collection(filter: { jsonField: { @@ -252,6 +269,7 @@ Collection(filter: { ``` **Without index:** + 1. Scan all documents 2. Parse each JSON field 3. Navigate to `user.device.model` @@ -259,7 +277,8 @@ Collection(filter: { 5. Return matches **With index:** -1. Look up `["user", "device", "model", "iPhone"]` in inverted index + +1. 
Look up `/user.device.model/iPhone` in inverted index 2. Retrieve matching document IDs 3. Return those documents @@ -269,14 +288,15 @@ The indexed approach avoids JSON parsing and navigation during query execution. DefraDB uses a hierarchical key format for JSON index entries: -``` +```bash //// ``` Example: -``` -users_col/idx_123/user/device/model/iPhone/doc_456 -users_col/idx_123/user/location/city/Montreal/doc_789 + +```bash +users_col/idx_123/user.device.model/iPhone/doc_456 +users_col/idx_123/user.location.city/Montreal/doc_789 ``` This format allows efficient prefix scanning for partial path matches and supports complex queries on nested JSON structures. @@ -285,71 +305,21 @@ This format allows efficient prefix scanning for partial path matches and suppor ### Read vs write trade-off -Every index improves read performance but degrades write performance: - -**Reads (queries):** -- Index lookups are O(log n) instead of O(n) table scans -- Queries on indexed fields are significantly faster -- Complex filters benefit from composite indexes - -**Writes (inserts/updates):** -- Each indexed field adds overhead to document creation -- Updates require updating the document AND all relevant indexes -- More indexes = slower writes +Every index improves read performance but adds write overhead. On reads, an `_eq` filter on an indexed field is O(1) for the lookup, plus O(m) to retrieve the m matching documents. On writes, each indexed field requires updating the index in addition to the document itself — so more indexes means slower writes. 
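To make the write overhead concrete, the following Python sketch counts key-value writes per document insert. It is a toy model under stated assumptions: the `Collection` class and its `kv_writes` counter are hypothetical, and a real store would batch or encode writes differently.

```python
class Collection:
    """Toy collection that counts key-value writes (hypothetical, for illustration)."""

    def __init__(self, indexed_fields=()):
        self.indexed_fields = list(indexed_fields)
        self.docs = {}
        self.index_entries = set()
        self.kv_writes = 0

    def insert(self, doc_id, doc):
        # One write for the document itself.
        self.docs[doc_id] = doc
        self.kv_writes += 1
        # One additional write per indexed field present in the document.
        for field in self.indexed_fields:
            if field in doc:
                self.index_entries.add((field, doc[field], doc_id))
                self.kv_writes += 1


plain = Collection()
indexed = Collection(indexed_fields=["name", "email"])
doc = {"name": "Alice", "email": "alice@example.com", "age": 30}
plain.insert("doc_id_1", doc)
indexed.insert("doc_id_1", doc)
print(plain.kv_writes, indexed.kv_writes)
```

In this model an insert into the collection with two indexed fields performs three writes instead of one, which is the cost to weigh against faster filtered reads.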
### When to use indexes -**Good candidates for indexing:** -- Fields frequently used in query filters -- Foreign key relationships queried often -- Fields requiring uniqueness constraints -- Fields used for sorting - -**Poor candidates for indexing:** -- Fields rarely or never queried -- Fields that change frequently but are seldom filtered -- High-cardinality fields with mostly unique values (unless uniqueness is required) +Fields that are frequently used in query filters, foreign key relationships, or uniqueness constraints are good candidates. Fields that are rarely queried, change frequently without being filtered, or are in large JSON/array structures with big data volumes are generally poor candidates. ### Composite vs multiple single-field indexes -**Composite index:** -```graphql -@index(includes: [{field: "status"}, {field: "date"}]) -``` - -**Multiple single-field indexes:** -```graphql -status: String @index -date: DateTime @index -``` - -**When to use composite:** -- Queries frequently filter on both fields together -- Order matters (queries filter on status, then date) -- Want optimal performance for that specific query pattern - -**When to use multiple single-field:** -- Queries filter on either field independently -- Need flexibility for different query combinations -- Don't mind slightly slower performance for multi-field queries - -### Index maintenance overhead - -Indexes aren't free: - -**Storage:** Each index consumes disk space proportional to the number of documents and indexed field size. - -**Memory:** Active indexes may be cached in memory for faster access. - -**CPU:** Index updates require computation during writes. - -**Balance these costs** against the query performance improvements to determine the optimal indexing strategy. +A composite index like `@index(includes: [{field: "status"}, {field: "date"}])` is best when queries regularly filter on both fields together. 
Multiple single-field indexes offer more flexibility when queries filter on either field independently, at the cost of slightly slower multi-field queries. ## Direction and ordering -### Index direction +Index direction (ASC or DESC) plays a significant role primarily for **composite indexes**. For single-field indexes, the index fetcher can traverse entries in reverse order just as efficiently as the default order, so direction has minimal practical impact there. -Indexes can be ordered ascending (ASC) or descending (DESC): +For composite indexes, specifying direction can matter: ```graphql type Article { @@ -357,17 +327,6 @@ type Article { } ``` -**Why direction matters:** - -Descending indexes are optimized for queries that want the most recent items first: -```graphql -Article(order: {publishedAt: DESC}, limit: 10) -``` - -If your index direction matches your query's sort order, DefraDB can use the index directly without additional sorting. - -### Composite index direction - Each field in a composite index can have its own direction: ```graphql @@ -377,46 +336,20 @@ Each field in a composite index can have its own direction: ]) ``` -This optimizes queries that sort by status ascending, then by publishedAt descending. - -## Benefits of DefraDB's indexing system - -### Efficient queries +When the index direction matches the query's sort order, DefraDB can use the index directly without a separate sorting step. -Indexes transform slow table scans into fast direct lookups, making queries scale logarithmically instead of linearly with dataset size. +## Managing indexes -### Precise path tracking (JSON) +Indexes can be added or deleted at any time using CLI commands or the embedded client. GraphQL-based index management is not yet available. -JSON indexes maintain full path information, allowing accurate indexing and retrieval of deeply nested structures without ambiguity. 
- -### Scalable structure - -DefraDB's indexing system handles: -- Simple scalar fields efficiently -- Complex composite indexes for multi-field queries -- Deeply nested JSON with minimal overhead -- Large datasets with predictable performance characteristics - -### Flexible constraints - -Unique indexes provide database-level data integrity without requiring application-level validation. +Refer to the CLI reference for commands to create and drop indexes on existing collections. ## Limitations and considerations -### Index creation constraints - -- Indexes must be defined in the schema; you cannot add them to existing collections without schema migration -- All index definitions must be present when creating the schema -- Changing indexes requires schema migration - ### Query pattern dependency Indexes only help queries that use the indexed fields. If your query patterns change, you may need to adjust your indexing strategy. -### Write amplification - -Heavy indexing can significantly slow down write operations. Monitor write performance and adjust indexes if writes become a bottleneck. - -### Storage overhead +### Write amplification and Storage overhead -Large collections with many indexes can consume significant disk space. Plan storage capacity accordingly. +Heavy indexing can significantly slow down write operations. Monitor write performance and adjust your indexing strategy if writes become a bottleneck. Large collections with many indexes — especially on JSON or array fields — can consume significant disk space. Plan storage capacity accordingly. 
diff --git a/docs/defradb/How-to Guides/secondary-index-how-to.md b/docs/defradb/How-to Guides/secondary-index-how-to.md index 57ff0ef..6620e5f 100644 --- a/docs/defradb/How-to Guides/secondary-index-how-to.md +++ b/docs/defradb/How-to Guides/secondary-index-how-to.md @@ -12,7 +12,7 @@ This guide provides step-by-step instructions for creating and using secondary i DefraDB's secondary indexing system enables efficient document lookups using the `@index` directive on GraphQL schema fields. Indexes trade write overhead for significantly faster read performance on filtered queries. -**Best practices:** Index frequently filtered fields, avoid indexing rarely queried fields, and use unique indexes sparingly (they add validation overhead). Plan indexes based on query patterns to balance read/write performance. +**Best practices:** Index frequently filtered fields, avoid indexing rarely queried fields, and use unique indexes sparingly (they add an additional read operation on every write). Plan indexes based on query patterns to balance read/write performance. ::: @@ -50,12 +50,38 @@ type User { Use `direction: DESC` for descending order or `direction: ASC` (default) for ascending order. +:::note +Direction plays a significant role only for composite indexes. For single-field indexes, the fetcher can traverse entries in either direction equally efficiently. +::: + ### Add the schema ```bash defradb client schema add -f schema.graphql ``` +## Manage indexes with the CLI + +Indexes can be added or deleted at any time using CLI commands — you do not need to redefine the schema from scratch. 
+ +```bash +# Create an index on an existing collection +defradb client index create --collection User --fields name + +# Create a unique index +defradb client index create --collection User --fields email --unique + +# Drop an index +defradb client index drop --collection User --name + +# List indexes on a collection +defradb client index list --collection User +``` + +:::note +GraphQL-based index management is not yet available. Use the CLI or embedded client. +::: + ## Create a unique index Unique indexes ensure no two documents have the same value for the indexed field. @@ -95,6 +121,10 @@ type User @index(includes: [ } ``` +:::note +A composite index is only used when the query filters on the leading field(s) of the index. Filtering on only a non-leading field (e.g. `age` alone in the example above) will not use this index at all. +::: + ## Index relationships Index relationship fields to improve query performance across related documents. @@ -116,6 +146,7 @@ type Address { ``` This indexes both: + - The relationship between User and Address - The city field in Address @@ -132,6 +163,7 @@ query { ``` With the indexes, DefraDB: + 1. Quickly finds Address documents with `city = "Montreal"` 2. Retrieves the related User documents efficiently @@ -153,12 +185,14 @@ type Address { } ``` -This ensures no two Users can reference the same Address document. +This ensures no two Users can reference the same Address document. Note that 1-to-2-sided relations are automatically constrained by a unique index to enforce the 1-to-1 invariant. ## Index JSON fields DefraDB supports indexing JSON fields for efficient queries on nested data. +> **Storage warning:** Indexing JSON fields can consume significant disk space with large datasets, as every leaf node at every path is indexed separately. 
+ ### Define a schema with JSON field ```graphql @@ -234,36 +268,24 @@ type Article @index(includes: [ query { Article(filter: { status: {_eq: "published"} - publishedAt: {_gt: "2024-01-01"} + publishedAt: {_gt: "2017-07-23T03:46:56-05:00"} }) { title } } ``` -This composite index efficiently handles queries filtering on both fields. +This composite index efficiently handles queries filtering on both `status` and `publishedAt` together. If you only filter on `publishedAt` alone, this index won't be used — add a separate single-field index on `publishedAt` if that query pattern is also common. ### Avoid over-indexing -Don't index fields that are rarely queried: +Every index adds write overhead, so only index fields that are actually queried. Fields like `middleName` or `internalNote` that are rarely used in filters don't need indexes. -```graphql -type User { - name: String @index # Good - frequently queried - email: String @index # Good - frequently queried - middleName: String # No index - rarely queried - internalNote: String # No index - rarely queried -} -``` +## Performance considerations -Every index adds write overhead, so only index what you need. +Analyze your application's queries and index the fields used in filters. Use the [explain systems](/defradb/next/How-to%20Guides/explain-systems-how-to) to verify that indexes are being used as expected. -## Performance optimization tips - -- **Index your query patterns**: Analyze your application's queries and index the fields used in filters -- **Use unique indexes sparingly**: They add validation overhead on writes -- **Consider composite indexes**: More efficient than multiple single-field indexes for multi-field queries -- **Test query performance**: Use the [explain systems](/defradb/next/How-to%20Guides/explain-systems-how-to) to analyze query execution +Unique indexes should be used only when uniqueness is a hard requirement — they require an additional read on every insert and update. 
For JSON and array fields, be mindful that indexing large datasets can consume significant disk space. ## Troubleshooting @@ -272,10 +294,11 @@ Every index adds write overhead, so only index what you need. **Issue**: Query performance hasn't improved after adding indexes. **Solutions**: -- Verify the index was created successfully + +- Verify the index was created successfully using `defradb client index list` - Ensure your query filter uses the indexed field +- For composite indexes, confirm you are filtering on the leading field - Check if you're querying in the reverse direction of a relationship (may need to index the other side) -- Use composite indexes if filtering on multiple fields ### Unique constraint violations @@ -287,4 +310,4 @@ Every index adds write overhead, so only index what you need. **Issue**: Document creation/updates are slower after adding indexes. -**Solution**: This is expected behavior. Indexes trade write performance for read performance. Review your indexes and remove any that aren't essential for your query patterns. +**Solution**: This is expected — indexes trade write performance for read performance. Review your indexes and remove any that aren't serving active query patterns. From 3367dc0e5fb5885c3a6499c18b044724f3baf5d2 Mon Sep 17 00:00:00 2001 From: pradhanashutosh Date: Tue, 17 Feb 2026 16:59:10 -0800 Subject: [PATCH 4/6] update --- docs/defradb/Concepts/secondary-index.md | 36 +++++++++++++----------- 1 file changed, 20 insertions(+), 16 deletions(-) diff --git a/docs/defradb/Concepts/secondary-index.md b/docs/defradb/Concepts/secondary-index.md index 00ca8bb..68e5a7e 100644 --- a/docs/defradb/Concepts/secondary-index.md +++ b/docs/defradb/Concepts/secondary-index.md @@ -34,7 +34,7 @@ An index is a data structure that maps field values to document identifiers. 
Ins **Without an index:** -```bash +``` Query: Find users with age = 30 Process: Scan all user documents → Check each age field → Return matches Cost: O(n) where n = total documents @@ -42,7 +42,7 @@ Cost: O(n) where n = total documents **With an index on age:** -```bash +``` Query: Find users with age = 30 Process: Look up "30" in age index → Return matching document IDs Cost: O(1) for lookup + O(m) for retrieval where m = matching documents @@ -52,19 +52,19 @@ Cost: O(1) for lookup + O(m) for retrieval where m = matching documents For regular indexes, DefraDB stores index entries as key-value pairs where the document ID is part of the key and the value is empty: -```bash +``` /col_id/ind_id/field_values/_docID → {} ``` For unique indexes, the document ID is stored as the value instead: -```bash +``` /col_id/ind_id/field_values → _docID ``` For a User collection with an indexed `name` field, the entries look like: -```bash +``` Index entries: "Alice/doc_id_1" → {} "Bob/doc_id_2" → {} @@ -105,7 +105,7 @@ type Article @index(includes: [ **Index structure:** -```bash +``` published/2024-01-15/doc_id_1 → {} published/2024-01-16/doc_id_2 → {} published/2024-01-16/doc_id_3 → {} @@ -225,7 +225,7 @@ Unlike scalar fields (String, Int, Bool), JSON fields contain nested structures. **Index entries created** (using `/col_id/ind_id/` prefix, JSON path parts separated by `.`): -```bash +``` /1/1/user.device.model/iPhone/doc_id_1 → {} /1/1/user.device.version/15/doc_id_1 → {} /1/1/user.location.city/Montreal/doc_id_1 → {} @@ -239,13 +239,13 @@ DefraDB uses **inverted indexes** for JSON fields. 
The whole idea is to tokenize For context, a primary (non-inverted) index might look like: -```bash +``` /1/1/iPhone → {"user": {"device": {"model": "iPhone"}}} ``` The inverted secondary index instead maps paths and values to document IDs: -```bash +``` /1/1/user.device.model/iPhone/doc_id_1 → {} /1/1/user.device.model/Android/doc_id_2 → {} ``` @@ -288,15 +288,15 @@ The indexed approach avoids JSON parsing and navigation during query execution. DefraDB uses a hierarchical key format for JSON index entries: -```bash +``` //// ``` -Example: +Example (using numeric collection ID `1` and index ID `1`): -```bash -users_col/idx_123/user.device.model/iPhone/doc_456 -users_col/idx_123/user.location.city/Montreal/doc_789 +``` +/1/1/user.device.model/iPhone/doc_id_1 +/1/1/user.location.city/Montreal/doc_id_1 ``` This format allows efficient prefix scanning for partial path matches and supports complex queries on nested JSON structures. @@ -350,6 +350,10 @@ Refer to the CLI reference for commands to create and drop indexes on existing c Indexes only help queries that use the indexed fields. If your query patterns change, you may need to adjust your indexing strategy. -### Write amplification and Storage overhead +### Write amplification + +Heavy indexing can significantly slow down write operations. Monitor write performance and adjust your indexing strategy if writes become a bottleneck. + +### Storage overhead -Heavy indexing can significantly slow down write operations. Monitor write performance and adjust your indexing strategy if writes become a bottleneck. Large collections with many indexes — especially on JSON or array fields — can consume significant disk space. Plan storage capacity accordingly. +Large collections with many indexes — especially on JSON or array fields — can consume significant disk space. Plan storage capacity accordingly. 
From 7daa8435f9c5b1ef0538856f02c841a6cb341d85 Mon Sep 17 00:00:00 2001 From: pradhanashutosh Date: Tue, 17 Feb 2026 17:15:50 -0800 Subject: [PATCH 5/6] update --- docs/defradb/How-to Guides/secondary-index-how-to.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/defradb/How-to Guides/secondary-index-how-to.md b/docs/defradb/How-to Guides/secondary-index-how-to.md index 6620e5f..fbf3758 100644 --- a/docs/defradb/How-to Guides/secondary-index-how-to.md +++ b/docs/defradb/How-to Guides/secondary-index-how-to.md @@ -12,7 +12,7 @@ This guide provides step-by-step instructions for creating and using secondary i DefraDB's secondary indexing system enables efficient document lookups using the `@index` directive on GraphQL schema fields. Indexes trade write overhead for significantly faster read performance on filtered queries. -**Best practices:** Index frequently filtered fields, avoid indexing rarely queried fields, and use unique indexes sparingly (they add an additional read operation on every write). Plan indexes based on query patterns to balance read/write performance. +**Best practices:** Index frequently filtered fields, avoid indexing rarely queried fields, and use unique indexes sparingly (they add a read operation on every write). Plan indexes based on query patterns to balance read/write performance. ::: From 07faa419fa6a92f3cec1fbf5ec95bc4c49fa4b3c Mon Sep 17 00:00:00 2001 From: pradhanashutosh Date: Tue, 3 Mar 2026 10:20:14 -0800 Subject: [PATCH 6/6] Update --- docs/defradb/How-to Guides/secondary-index-how-to.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/defradb/How-to Guides/secondary-index-how-to.md b/docs/defradb/How-to Guides/secondary-index-how-to.md index fbf3758..81990f3 100644 --- a/docs/defradb/How-to Guides/secondary-index-how-to.md +++ b/docs/defradb/How-to Guides/secondary-index-how-to.md @@ -283,7 +283,7 @@ Every index adds write overhead, so only index fields that are actually queried. 
## Performance considerations -Analyze your application's queries and index the fields used in filters. Use the [explain systems](/defradb/next/How-to%20Guides/explain-systems-how-to) to verify that indexes are being used as expected. +Analyze your application's queries and index the fields used in filters. Use the [explain systems](explain-systems-how-to.md) to verify that indexes are being used as expected. Unique indexes should be used only when uniqueness is a hard requirement — they require an additional read on every insert and update. For JSON and array fields, be mindful that indexing large datasets can consume significant disk space.