feat: dynamic schema with policy-driven field handling #280

Merged
mosuka merged 3 commits into main from feat/dynamic-schema
Apr 24, 2026

@mosuka mosuka commented Apr 24, 2026

Summary

  • Introduce DynamicFieldPolicy (Strict / Dynamic / Ignore) so the schema can reject, infer-and-add, or silently drop undeclared document fields at ingest time; the new default is Dynamic.
  • Ship the policy, type inference, type coercion, reserved-field namespace (_-prefix), and Query DSL typo detection end-to-end, including updates to the gRPC proto, REST gateway, and every language binding (Python / Node.js / WASM / Ruby / PHP / MCP).
  • Document the feature thoroughly in both English and Japanese, with an explicit warning about silent truncation when an integer field receives a float under Dynamic (3.14 → 3).
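The truncation warning in the last bullet can be made concrete with a small sketch. The `Value` and `FieldType` enums and the `coerce` function below are hypothetical stand-ins, not the types in laurus/src/engine/type_coercion.rs; the integer-to-float widening and the rejection of non-finite floats are assumptions added for completeness.

```rust
/// Hypothetical value model; the real laurus data types may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Integer(i64),
    Float(f64),
    Bool(bool),
    Text(String),
}

/// Hypothetical declared field types.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FieldType {
    Integer,
    Float,
    Text,
}

/// Coerce `value` to the declared `target` type, or report a mismatch.
pub fn coerce(value: Value, target: FieldType) -> Result<Value, String> {
    match (target, value) {
        // Lossy: `as` truncates toward zero, so 3.14 becomes 3.
        // Assumption: NaN and infinities are rejected rather than cast.
        (FieldType::Integer, Value::Float(f)) if f.is_finite() => Ok(Value::Integer(f as i64)),
        (FieldType::Integer, v @ Value::Integer(_)) => Ok(v),
        // Assumption: integers widen to floats for Float fields.
        (FieldType::Float, Value::Integer(i)) => Ok(Value::Float(i as f64)),
        (FieldType::Float, v @ Value::Float(_)) => Ok(v),
        // Text fields stringify any scalar.
        (FieldType::Text, Value::Text(s)) => Ok(Value::Text(s)),
        (FieldType::Text, Value::Integer(i)) => Ok(Value::Text(i.to_string())),
        (FieldType::Text, Value::Float(f)) => Ok(Value::Text(f.to_string())),
        (FieldType::Text, Value::Bool(b)) => Ok(Value::Text(b.to_string())),
        // Everything else is a true mismatch.
        (t, v) => Err(format!("cannot coerce {v:?} to {t:?}")),
    }
}
```

Under these assumptions, `coerce(Value::Float(3.14), FieldType::Integer)` yields `Ok(Value::Integer(3))`: the fractional part is discarded without an error, which is the behaviour the documentation warns about.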

Behaviour matrix

| Policy | Undeclared field | Type-conflict coercion |
| --- | --- | --- |
| Strict | reject document | propagate coercion error |
| Dynamic (default) | infer type + auto-add | try conversion, error on true mismatch |
| Ignore | drop silently | drop the field, keep the rest |
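The "Undeclared field" column of the matrix can be sketched as a single dispatch on the policy. This is a minimal illustration: the `Schema` struct, its `fields` map, and `admit_field` are hypothetical simplifications, and only the three variant names and the Dynamic default come from this PR.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum DynamicFieldPolicy {
    Strict,
    #[default]
    Dynamic,
    Ignore,
}

/// Hypothetical simplified schema: field name -> inferred type name.
#[derive(Debug, Default)]
pub struct Schema {
    pub fields: BTreeMap<String, &'static str>,
    pub policy: DynamicFieldPolicy,
}

/// Decide what happens to one incoming field.
/// Returns Ok(true) to keep the field, Ok(false) to drop it,
/// and Err to reject the whole document.
pub fn admit_field(
    schema: &mut Schema,
    name: &str,
    inferred: &'static str,
) -> Result<bool, String> {
    // Declared fields are always accepted, whatever the policy.
    if schema.fields.contains_key(name) {
        return Ok(true);
    }
    match schema.policy {
        DynamicFieldPolicy::Strict => Err(format!("undeclared field `{name}`")),
        DynamicFieldPolicy::Dynamic => {
            // Auto-add the field with its inferred type.
            schema.fields.insert(name.to_string(), inferred);
            Ok(true)
        }
        DynamicFieldPolicy::Ignore => Ok(false),
    }
}
```

Returning a tri-state (`Ok(true)`, `Ok(false)`, `Err`) keeps "drop this field" distinct from "reject the document", which is the distinction between the Ignore and Strict rows.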

Key additions

  • laurus/src/engine/schema.rs: DynamicFieldPolicy enum, impl FromStr, validate_field_name, extended SchemaBuilder
  • laurus/src/engine/type_inference.rs: infer_option_from_data_value, JSON-based infer_from_json (public API for transport layers)
  • laurus/src/engine/type_coercion.rs: documented lossy conversions
  • laurus/src/engine.rs: apply_dynamic_schema wired into put_document / add_document
  • laurus/src/engine/query.rs: UnifiedQueryParser::with_known_fields + parse-time field validation
  • laurus-server/proto/laurus/v1/index.proto: new DynamicFieldPolicy enum + Schema.dynamic_field_policy (field 5)
  • laurus-server/src/{convert/schema.rs, gateway/convert.rs}: proto / JSON conversion, with round-trip tests
  • laurus-{python,nodejs,wasm,ruby,php}/src/schema.rs: setDynamicFieldPolicy / dynamicFieldPolicy accessors
  • laurus-mcp/src/server.rs: create_index tool description and JSON sample updated
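As a rough picture of what the reserved-namespace check in the first item might involve, here is a sketch of a name validator. The signature, the error type, and the empty-name check are assumptions; only the rule itself (names starting with `_` are reserved, `_id` is allow-listed) comes from this PR.

```rust
/// Reject engine-reserved field names.
/// Assumed signature; the real `validate_field_name` may differ.
pub fn validate_field_name(name: &str) -> Result<(), String> {
    // Assumption: empty names are invalid as well.
    if name.is_empty() {
        return Err("field name must not be empty".to_string());
    }
    // Names starting with `_` are engine-owned; `_id` is the only exception.
    if name.starts_with('_') && name != "_id" {
        return Err(format!(
            "field name `{name}` is reserved (leading underscore)"
        ));
    }
    Ok(())
}
```

With this sketch, `title` and `_id` pass while `_score` is rejected. Running the same check at schema build, add_field, and document ingest would keep the namespace closed on every path by which a field name can enter the schema.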

Breaking change (pre-release)

Schema now has a mandatory dynamic_field_policy field. Default behaviour for undeclared fields changes from "silent drop" to "auto-add". Existing integrations that relied on the old behaviour should explicitly set DynamicFieldPolicy::Ignore (the test schema_lexical_test.rs is updated to do so).

Out of scope (tracked in ~/.claude/tasks/laurus/TODO.md)

  • Multi-valued numeric fields (Int64Array / Float64Array) — requires lexical-store multi-value support; for now the Dynamic path returns a "not yet supported" error for JSON numeric arrays.
  • Integer → Float field promotion on type conflict (currently truncates).
  • Migrating infer_from_json into the gateway's document-ingest path.
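To show where the numeric-array limitation sits relative to the supported shapes, the following sketch infers a type from a JSON-like value. The `Json` and `Inferred` enums and the `infer` function are hypothetical (a real transport layer would likely use a JSON library's value type). The handling of null and of non-numeric arrays is an assumption, as are the exact latitude and longitude bounds, since the PR only says "range checks".

```rust
/// Minimal stand-in for a JSON value (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Inferred {
    Text,
    Integer,
    Float,
    Bool,
    Geo,
}

fn as_f64(v: &Json) -> Option<f64> {
    match v {
        Json::Int(i) => Some(*i as f64),
        Json::Float(f) => Some(*f),
        _ => None,
    }
}

/// Numeric value of the first entry whose key is one of `keys`.
fn number_at(obj: &[(String, Json)], keys: &[&str]) -> Option<f64> {
    obj.iter()
        .find(|(k, _)| keys.contains(&k.as_str()))
        .and_then(|(_, v)| as_f64(v))
}

pub fn infer(v: &Json) -> Result<Option<Inferred>, String> {
    match v {
        Json::Null => Ok(None), // assumption: nothing to infer from null
        Json::Bool(_) => Ok(Some(Inferred::Bool)),
        Json::Int(_) => Ok(Some(Inferred::Integer)),
        Json::Float(_) => Ok(Some(Inferred::Float)),
        Json::Str(_) => Ok(Some(Inferred::Text)),
        // Multi-valued numeric fields are out of scope for now.
        Json::Array(items) if items.iter().any(|i| as_f64(i).is_some()) => {
            Err("numeric arrays are not yet supported".to_string())
        }
        Json::Array(_) => Err("unsupported array shape".to_string()),
        Json::Object(obj) => {
            let lat = number_at(obj, &["lat", "latitude"]);
            let lon = number_at(obj, &["lon", "lng", "longitude"]);
            match (lat, lon) {
                // Assumed ranges: latitude in [-90, 90], longitude in [-180, 180].
                (Some(lat), Some(lon))
                    if (-90.0..=90.0).contains(&lat) && (-180.0..=180.0).contains(&lon) =>
                {
                    Ok(Some(Inferred::Geo))
                }
                _ => Err("object is not a valid geo point".to_string()),
            }
        }
    }
}
```

Returning `Result<Option<_>, _>` separates "no type can be inferred" from "this shape is an error", so the caller can apply the policy to each case independently.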

Test plan

  • cargo fmt --check
  • cargo clippy -p laurus -p laurus-server -p laurus-cli -p laurus-mcp -p laurus-python -p laurus-nodejs -p laurus-wasm --all-targets -- -D warnings
  • cargo test -p laurus -p laurus-server -p laurus-cli -p laurus-mcp — 687 unit + 32 new unit + 9 end-to-end integration tests, all passing
  • markdownlint-cli2 "docs/src/**/*.md" "docs/ja/src/**/*.md" — 0 errors
  • mdbook build docs and mdbook build docs/ja succeed
  • CI matrix (verify on merge)

Documentation

Updated pages (EN + JA):

  • concepts/schema_and_fields.md: reserved fields, Dynamic Schema section, inference rules, type-conflict matrix, silent truncation warning
  • concepts/query_dsl.md: "Field validation" note
  • laurus-cli/schema_format.md: dynamic_field_policy TOML key, _ prefix reservation
  • laurus-server/http_gateway.md: REST JSON sample with the new key
  • laurus-server/grpc_api.md: proto Schema definition
  • laurus-{python,nodejs,wasm,ruby,php}/api_reference.md: setDynamicFieldPolicy / dynamicFieldPolicy
  • laurus-mcp/tools.md: create_index schema JSON sample

mosuka added 3 commits April 25, 2026 00:31
Add DynamicFieldPolicy (Strict / Dynamic / Ignore) to control how
undeclared fields are treated at ingest time. Dynamic (new default)
infers a type per undeclared field and adds it to the schema; Strict
rejects the document; Ignore drops the field silently.

Core additions:

- DynamicFieldPolicy enum + SchemaBuilder::dynamic_field_policy and
  try_build; Schema gains a dynamic_field_policy field (serde default)
- Type inference engine for text / integer / float / bool / geo
  (object shape {lat|latitude, lon|lng|longitude} with range checks)
- Type coercion layer with documented lossy conversions (Integer
  fields silently truncate incoming floats: 3.14 -> 3; Text fields
  stringify any scalar); Ignore policy drops non-coercible fields
- Reserved field namespace: names starting with `_` are engine-owned
  and rejected at schema build, add_field, and document ingest;
  `_id` remains the only allow-listed system field
- Unified query parser rejects `field:value` clauses that reference
  undeclared fields at parse time (typo detection)
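The parse-time typo detection in the last item can be approximated with a naive check over whitespace-separated tokens. This is only a sketch: `check_fields` is a hypothetical helper, not the UnifiedQueryParser API, and it ignores quoting, grouping, and ranges that a real DSL parser would handle.

```rust
use std::collections::HashSet;

/// Report the first `field:value` clause whose field is not declared.
pub fn check_fields(query: &str, known: &HashSet<&str>) -> Result<(), String> {
    for token in query.split_whitespace() {
        // Only `field:value` tokens name a field; bare terms are skipped.
        if let Some((field, _value)) = token.split_once(':') {
            // Strip leading must / must-not markers (assumed `+` and `-`).
            let field = field.trim_start_matches(|c: char| c == '+' || c == '-');
            if !field.is_empty() && !known.contains(field) {
                return Err(format!("unknown field `{field}` in query"));
            }
        }
    }
    Ok(())
}
```

With `title` and `body` declared, `title:rust` passes while `titel:rust` fails at parse time instead of silently matching nothing.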

Protocol / server:

- Proto Schema gains dynamic_field_policy (field 5) and DynamicFieldPolicy
  enum; UNSPECIFIED maps to Dynamic for forward compatibility
- REST gateway accepts `"dynamic_field_policy"` JSON key with string
  values "strict" / "dynamic" / "ignore"
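Parsing the three string values could look roughly like the sketch below. The variant names and the accepted strings come from this PR; the `String` error type and the tolerance for surrounding whitespace and mixed case are assumptions.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DynamicFieldPolicy {
    Strict,
    Dynamic,
    Ignore,
}

impl FromStr for DynamicFieldPolicy {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Assumption: input is trimmed and matched case-insensitively.
        match s.trim().to_ascii_lowercase().as_str() {
            "strict" => Ok(Self::Strict),
            "dynamic" => Ok(Self::Dynamic),
            "ignore" => Ok(Self::Ignore),
            other => Err(format!("unknown dynamic field policy: {other}")),
        }
    }
}
```

Keeping one `FromStr` implementation in the core crate lets the gateway and every binding share the same accepted spellings, for example `"strict".parse::<DynamicFieldPolicy>()`.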

Bindings (policy getter + setter on Schema):

- laurus-python: set_dynamic_field_policy / dynamic_field_policy
- laurus-nodejs: setDynamicFieldPolicy / dynamicFieldPolicy
- laurus-wasm:   setDynamicFieldPolicy / dynamicFieldPolicy
- laurus-ruby:   set_dynamic_field_policy / dynamic_field_policy
- laurus-php:    setDynamicFieldPolicy / dynamicFieldPolicy
- laurus-mcp: create_index tool description documents the new key
- Shared parser: impl FromStr for DynamicFieldPolicy in the core crate

Out of scope (tracked in ~/.claude/tasks/laurus/TODO.md):

- Multi-valued numeric fields (Int64Array / Float64Array) - requires
  lexical store multi-value support; Dynamic inference returns a
  "not yet supported" error for numeric JSON arrays
- Integer -> Float field promotion on type conflict; values are
  truncated instead

BREAKING: Default behaviour for undeclared fields changes from
"silent drop" to "auto-add". Existing code that relied on the old
behaviour should set DynamicFieldPolicy::Ignore explicitly. The
Schema struct literal now requires dynamic_field_policy.

Tests: 32 new unit tests across schema / type_inference /
type_coercion / query / convert::schema; 9 end-to-end integration
tests in dynamic_schema_test.rs covering all three policies, the
`_` prefix guard, truncation, and DSL typo detection.

Docs (EN + JA): concepts/schema_and_fields, concepts/query_dsl,
laurus-cli/schema_format, laurus-server/{http_gateway,grpc_api},
laurus-{python,nodejs,wasm,ruby,php}/api_reference,
laurus-mcp/tools.

CI runs clippy with Rust 1.95, which introduces two new warn-by-default
lints that the existing code does not satisfy. They are unrelated to
the dynamic-schema feature but block the PR's CI check.

Changes:
- Replace `sort_by(|a, b| b.field.cmp(&a.field))` with
  `sort_by_key(|x| std::cmp::Reverse(x.field))` in facet.rs (3 sites),
  spelling/dictionary.rs, and the flat / hnsw / ivf segment managers.
- Replace `sort_by(|a, b| a.path.depth().cmp(&b.path.depth()))` with
  `sort_by_key(|c| c.path.depth())` in facet.rs.
- Collapse the nested `if` inside a `DataValue::Bytes(_, mime)` match
  arm into the arm's guard in vector/store.rs and
  vector/store/embedding_writer.rs, fixing clippy::collapsible_match.

No behaviour change.