feat: dynamic schema with policy-driven field handling #280
Merged
Add DynamicFieldPolicy (Strict / Dynamic / Ignore) to control how
undeclared fields are treated at ingest time. Dynamic (new default)
infers a type per undeclared field and adds it to the schema; Strict
rejects the document; Ignore drops the field silently.
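The three policies above can be sketched as a single dispatch on an undeclared field. This is a minimal sketch, not the engine's code: `FieldType`, `FieldOutcome`, `handle_field`, and the `BTreeMap` schema are hypothetical stand-ins; only the `DynamicFieldPolicy` variant names and the `Dynamic` default come from this PR.

```rust
use std::collections::BTreeMap;

/// Policy for undeclared fields; variant names and default follow the PR text.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum DynamicFieldPolicy {
    Strict,
    #[default]
    Dynamic,
    Ignore,
}

/// Hypothetical stand-in for the engine's field types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FieldType {
    Text,
    Integer,
}

/// Hypothetical outcome of checking one document field against the schema.
#[derive(Debug, PartialEq, Eq)]
pub enum FieldOutcome {
    /// The field was already declared; nothing changes.
    Declared,
    /// Dynamic: the field was added to the schema with the inferred type.
    Added(FieldType),
    /// Ignore: the field is dropped and the rest of the document is kept.
    Dropped,
}

/// Applies the policy to a single field. An `Err` means the whole document
/// is rejected (Strict).
pub fn handle_field(
    policy: DynamicFieldPolicy,
    schema: &mut BTreeMap<String, FieldType>,
    name: &str,
    inferred: FieldType,
) -> Result<FieldOutcome, String> {
    if schema.contains_key(name) {
        return Ok(FieldOutcome::Declared);
    }
    match policy {
        DynamicFieldPolicy::Strict => Err(format!("undeclared field `{name}`")),
        DynamicFieldPolicy::Dynamic => {
            schema.insert(name.to_string(), inferred);
            Ok(FieldOutcome::Added(inferred))
        }
        DynamicFieldPolicy::Ignore => Ok(FieldOutcome::Dropped),
    }
}
```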
Core additions:
- DynamicFieldPolicy enum + SchemaBuilder::dynamic_field_policy and
try_build; Schema gains a dynamic_field_policy field (serde default)
- Type inference engine for text / integer / float / bool / geo
(object shape {lat|latitude, lon|lng|longitude} with range checks)
- Type coercion layer with documented lossy conversions (Integer
fields silently truncate incoming floats: 3.14 -> 3; Text fields
stringify any scalar); Ignore policy drops non-coercible fields
- Reserved field namespace: names starting with `_` are engine-owned
and rejected at schema build, add_field, and document ingest;
`_id` remains the only allow-listed system field
- Unified query parser rejects `field:value` clauses that reference
undeclared fields at parse time (typo detection)
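As an illustration of the inference rules listed above, the sketch below infers a type from a small JSON-like value, including the geo object shape. The `Value` and `InferredType` enums and the `infer` function are hypothetical, and the latitude / longitude bounds (±90 / ±180) are an assumption; the PR only says that range checks exist.

```rust
use std::collections::BTreeMap;

/// Hypothetical JSON-like value; a stand-in for the engine's own data type.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Bool(bool),
    Integer(i64),
    Float(f64),
    Text(String),
    Object(BTreeMap<String, Value>),
}

/// The five inferable types named in the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InferredType {
    Text,
    Integer,
    Float,
    Bool,
    Geo,
}

/// Returns a numeric value as f64, or None for non-numeric values.
fn as_f64(v: &Value) -> Option<f64> {
    match v {
        Value::Integer(i) => Some(*i as f64),
        Value::Float(f) => Some(*f),
        _ => None,
    }
}

/// Finds the first key from `keys` present in `obj` and reads it as a number.
fn number_at(obj: &BTreeMap<String, Value>, keys: &[&str]) -> Option<f64> {
    keys.iter().find_map(|k| obj.get(*k)).and_then(as_f64)
}

/// Infers a field type from one value. Objects are accepted only when they
/// match the geo shape {lat|latitude, lon|lng|longitude}.
pub fn infer(value: &Value) -> Result<InferredType, String> {
    match value {
        Value::Bool(_) => Ok(InferredType::Bool),
        Value::Integer(_) => Ok(InferredType::Integer),
        Value::Float(_) => Ok(InferredType::Float),
        Value::Text(_) => Ok(InferredType::Text),
        Value::Object(obj) => {
            let lat = number_at(obj, &["lat", "latitude"]);
            let lon = number_at(obj, &["lon", "lng", "longitude"]);
            match (lat, lon) {
                // Assumed bounds: latitude in [-90, 90], longitude in [-180, 180].
                (Some(lat), Some(lon))
                    if (-90.0..=90.0).contains(&lat) && (-180.0..=180.0).contains(&lon) =>
                {
                    Ok(InferredType::Geo)
                }
                (Some(_), Some(_)) => Err("geo coordinates out of range".to_string()),
                _ => Err("object is not a recognised geo shape".to_string()),
            }
        }
    }
}
```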
Protocol / server:
- Proto Schema gains dynamic_field_policy (field 5) and DynamicFieldPolicy
enum; UNSPECIFIED maps to Dynamic for forward compatibility
- REST gateway accepts `"dynamic_field_policy"` JSON key with string
values "strict" / "dynamic" / "ignore"
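A rough sketch of the UNSPECIFIED-to-Dynamic mapping on the proto side follows. The wire numbers are hypothetical (only a zero-valued UNSPECIFIED follows the usual proto3 convention), and rejecting unknown numbers is an assumption; the generated proto types are not shown in this PR text.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DynamicFieldPolicy {
    Strict,
    Dynamic,
    Ignore,
}

// Hypothetical wire values. Only UNSPECIFIED = 0 follows the proto3
// convention; the other numbers are made up for this sketch.
pub const POLICY_UNSPECIFIED: i32 = 0;
pub const POLICY_STRICT: i32 = 1;
pub const POLICY_DYNAMIC: i32 = 2;
pub const POLICY_IGNORE: i32 = 3;

/// Converts the raw proto enum value into the core policy.
/// A message that never set the field decodes as UNSPECIFIED and
/// therefore gets the Dynamic default.
pub fn policy_from_proto(raw: i32) -> Result<DynamicFieldPolicy, String> {
    match raw {
        POLICY_UNSPECIFIED | POLICY_DYNAMIC => Ok(DynamicFieldPolicy::Dynamic),
        POLICY_STRICT => Ok(DynamicFieldPolicy::Strict),
        POLICY_IGNORE => Ok(DynamicFieldPolicy::Ignore),
        // Assumption: unknown numbers are an error rather than a silent default.
        other => Err(format!("unknown DynamicFieldPolicy value {other}")),
    }
}
```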
Bindings (policy getter + setter on Schema):
- laurus-python: set_dynamic_field_policy / dynamic_field_policy
- laurus-nodejs: setDynamicFieldPolicy / dynamicFieldPolicy
- laurus-wasm: setDynamicFieldPolicy / dynamicFieldPolicy
- laurus-ruby: set_dynamic_field_policy / dynamic_field_policy
- laurus-php: setDynamicFieldPolicy / dynamicFieldPolicy
- laurus-mcp: create_index tool description documents the new key
- Shared parser: impl FromStr for DynamicFieldPolicy in the core crate
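The shared parser could look roughly like the following. This is a sketch of an `impl FromStr` over the three documented string values, not the core crate's actual code; the `String` error type and the exact-lowercase matching are assumptions.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DynamicFieldPolicy {
    Strict,
    Dynamic,
    Ignore,
}

impl FromStr for DynamicFieldPolicy {
    // Assumption: a plain String error; the real crate may use its own error type.
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Assumption: only the exact lowercase spellings are accepted.
        match s {
            "strict" => Ok(Self::Strict),
            "dynamic" => Ok(Self::Dynamic),
            "ignore" => Ok(Self::Ignore),
            other => Err(format!("invalid dynamic_field_policy `{other}`")),
        }
    }
}
```

With one parser in the core crate, each binding can forward the user's string to `str::parse` instead of carrying its own match table.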
Out of scope (tracked in ~/.claude/tasks/laurus/TODO.md):
- Multi-valued numeric fields (Int64Array / Float64Array) - requires
lexical store multi-value support; Dynamic inference returns a
"not yet supported" error for numeric JSON arrays
- Integer -> Float field promotion on type conflict; values are
truncated instead
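To make the truncation behaviour concrete, here is a minimal coercion sketch. `Scalar`, `coerce_to_integer`, and `coerce_to_text` are hypothetical names; returning `None` for non-numeric input to an Integer field is an assumption about what "non-coercible" means.

```rust
/// Hypothetical scalar value arriving in a document.
#[derive(Debug, Clone, PartialEq)]
pub enum Scalar {
    Bool(bool),
    Integer(i64),
    Float(f64),
    Text(String),
}

/// Coerces a value for an Integer field. Floats are truncated toward zero,
/// so the fractional part is lost (3.14 -> 3). Integer -> Float promotion of
/// the field itself is out of scope, so the value is adapted instead.
pub fn coerce_to_integer(v: &Scalar) -> Option<i64> {
    match v {
        Scalar::Integer(i) => Some(*i),
        // `as i64` truncates toward zero and saturates at the i64 bounds.
        Scalar::Float(f) => Some(*f as i64),
        // Assumption: anything else is treated as non-coercible.
        _ => None,
    }
}

/// Coerces a value for a Text field: any scalar is stringified.
pub fn coerce_to_text(v: &Scalar) -> String {
    match v {
        Scalar::Bool(b) => b.to_string(),
        Scalar::Integer(i) => i.to_string(),
        Scalar::Float(f) => f.to_string(),
        Scalar::Text(s) => s.clone(),
    }
}
```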
BREAKING: Default behaviour for undeclared fields changes from
"silent drop" to "auto-add". Existing code that relied on the old
behaviour should set DynamicFieldPolicy::Ignore explicitly. The
Schema struct literal now requires dynamic_field_policy.
Tests: 32 new unit tests across schema / type_inference /
type_coercion / query / convert::schema; 9 end-to-end integration
tests in dynamic_schema_test.rs covering all three policies, the
`_` prefix guard, truncation, and DSL typo detection.
Docs (EN + JA): concepts/schema_and_fields, concepts/query_dsl,
laurus-cli/schema_format, laurus-server/{http_gateway,grpc_api},
laurus-{python,nodejs,wasm,ruby,php}/api_reference,
laurus-mcp/tools.
CI runs clippy with Rust 1.95, which introduces two new warn-by-default
lints that the prior codebase did not meet. These are not related to the
dynamic-schema feature but block the PR's CI check.
Changes:
- Replace `sort_by(|a, b| b.field.cmp(&a.field))` with
`sort_by_key(|x| std::cmp::Reverse(x.field))` in facet.rs (3 sites),
spelling/dictionary.rs, and the flat / hnsw / ivf segment managers.
- Replace `sort_by(|a, b| a.path.depth().cmp(&b.path.depth()))` with
`sort_by_key(|c| c.path.depth())` in facet.rs.
- Collapse the nested `if` inside a `DataValue::Bytes(_, mime)` match
arm into the arm's guard in vector/store.rs and
vector/store/embedding_writer.rs, fixing clippy::collapsible_match.
No behaviour change.
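The two rewrites can be illustrated on toy types. `Facet`, `sort_desc_by_count`, `is_image`, and this two-variant `DataValue` are hypothetical and only mirror the shape of the changes; they are not the code touched by the commit.

```rust
use std::cmp::Reverse;

/// Hypothetical facet entry, used only to show the sort rewrite.
#[derive(Debug, Clone, PartialEq)]
pub struct Facet {
    pub label: String,
    pub count: u64,
}

/// Sorts facets by descending count.
pub fn sort_desc_by_count(facets: &mut [Facet]) {
    // Before: facets.sort_by(|a, b| b.count.cmp(&a.count));
    // Both forms are stable sorts, so ties keep their original order.
    facets.sort_by_key(|f| Reverse(f.count));
}

/// Hypothetical value type with a bytes-plus-MIME variant.
pub enum DataValue {
    Bytes(Vec<u8>, String),
    Text(String),
}

/// Returns true for byte payloads whose MIME type starts with "image/".
pub fn is_image(value: &DataValue) -> bool {
    match value {
        // Before: `DataValue::Bytes(_, mime) => { if mime.starts_with("image/") { .. } }`
        // After: the condition moves into the arm's guard.
        DataValue::Bytes(_, mime) if mime.starts_with("image/") => true,
        _ => false,
    }
}
```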
This was referenced Apr 25, 2026
Summary
- Add `DynamicFieldPolicy` (`Strict` / `Dynamic` / `Ignore`) so the schema can reject, infer-and-add, or silently drop undeclared document fields at ingest time; the new default is `Dynamic`.
- Wire type inference, type coercion, the reserved field namespace (`_`-prefix), and Query DSL typo detection end-to-end, including updates to the gRPC proto, REST gateway, and every language binding (Python / Node.js / WASM / Ruby / PHP / MCP).
- Integer fields silently truncate incoming floats under `Dynamic` (`3.14` → `3`).
Behaviour matrix
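| | `Strict` | `Dynamic` (default) | `Ignore` |
|---|---|---|---|
| Undeclared field at ingest | Document rejected | Type inferred, field added to schema | Field dropped silently |
Key additions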
- `laurus/src/engine/schema.rs`: `DynamicFieldPolicy` enum, `impl FromStr`, `validate_field_name`, extended `SchemaBuilder`
- `laurus/src/engine/type_inference.rs`: `infer_option_from_data_value`, JSON-based `infer_from_json` (public API for transport layers)
- `laurus/src/engine/type_coercion.rs`: documented lossy conversions
- `laurus/src/engine.rs`: `apply_dynamic_schema` wired into `put_document` / `add_document`
- `laurus/src/engine/query.rs`: `UnifiedQueryParser::with_known_fields` + parse-time field validation
- `laurus-server/proto/laurus/v1/index.proto`: new `DynamicFieldPolicy` enum + `Schema.dynamic_field_policy` (field 5)
- `laurus-server/src/{convert/schema.rs, gateway/convert.rs}`: proto / JSON conversion, with round-trip tests
- `laurus-{python,nodejs,wasm,ruby,php}/src/schema.rs`: `setDynamicFieldPolicy` / `dynamicFieldPolicy` accessors
- `laurus-mcp/src/server.rs`: `create_index` tool description and JSON sample updated
Breaking change (pre-release)
`Schema` now has a mandatory `dynamic_field_policy` field. Default behaviour for undeclared fields changes from "silent drop" to "auto-add". Existing integrations that relied on the old behaviour should explicitly set `DynamicFieldPolicy::Ignore` (the test `schema_lexical_test.rs` is updated to do so).
Out of scope (tracked in ~/.claude/tasks/laurus/TODO.md)
- Multi-valued numeric fields (`Int64Array` / `Float64Array`): requires lexical-store multi-value support; for now the Dynamic path returns a "not yet supported" error for JSON numeric arrays.
- Wiring `infer_from_json` into the gateway's document-ingest path.
Test plan
- `cargo fmt --check`
- `cargo clippy -p laurus -p laurus-server -p laurus-cli -p laurus-mcp -p laurus-python -p laurus-nodejs -p laurus-wasm --all-targets -- -D warnings`
- `cargo test -p laurus -p laurus-server -p laurus-cli -p laurus-mcp`: 687 unit + 32 new unit + 9 end-to-end integration tests, all passing
- `markdownlint-cli2 "docs/src/**/*.md" "docs/ja/src/**/*.md"`: 0 errors
- `mdbook build docs` and `mdbook build docs/ja` succeed
Documentation
Updated pages (EN + JA):
- `concepts/schema_and_fields.md`: reserved fields, Dynamic Schema section, inference rules, type-conflict matrix, silent truncation warning
- `concepts/query_dsl.md`: "Field validation" note
- `laurus-cli/schema_format.md`: `dynamic_field_policy` TOML key, `_` prefix reservation
- `laurus-server/http_gateway.md`: REST JSON sample with the new key
- `laurus-server/grpc_api.md`: proto Schema definition
- `laurus-{python,nodejs,wasm,ruby,php}/api_reference.md`: `setDynamicFieldPolicy` / `dynamicFieldPolicy`
- `laurus-mcp/tools.md`: `create_index` schema JSON sample