Fix Cypher label disjunction panic and improve schema handling by milliondreams · Pull Request #64 · rustic-ai/uni-db

milliondreams · 2026-05-04T15:54:21Z

No description provided.

Closes rustic-ai#62. `MATCH (n:A|B)` and the `WHERE n:A OR n:B` rewrite both lower to a `Union` of per-label `Scan` operators (rustic-ai#56). When the labels carry different property sets the per-label `Scan` outputs have different Arrow widths and DataFusion's `union_schema` panics with `index out of bounds` — the panic escapes the await as a process abort. The existing rustic-ai#56 fixtures all used uniform `name: String` schemas, so the hazard stayed hidden. Three layers fix it: - `function_props`: register `ID`, `ELEMENTID`, `TYPE` as `entity_arg_only`. The unknown-function fallback was marking entities as `*` (full-entity wildcard) for plain `id(b)` access, which forced `df_planner::resolve_properties` to expand per-label schemas under each Union branch. Cures the most common trigger. - `planner`: add `label_branches_share_property_schema` and route both `plan_unbound_node` and `replace_scan_all_with_label_union` through a heterogeneous fallback. When branch schemas would diverge (different property names/types, or any schemaless label in the mix), every branch becomes a *single-label* `ScanMainByLabels` instead of a per-label `Scan`. The per-branch Union shape is kept (multi-label `ScanMainByLabels` has AND/intersection semantics — wrong for disjunction), but each branch now emits a uniform schemaless schema so the Union is safe. - `df_planner::plan_union`: defense-in-depth schema-mismatch check before `UnionExec::try_new`. Any future logical-Union mismatch now becomes a typed `Plan` error instead of a DataFusion panic. Tests in `cypher_label_disjunction_support.rs` add a heterogeneous fixture (DestA: 2 props, DestB: 4 props) and cover the inline `(:A|B)`, common-property projection, and `WHERE`-form variants.

Restores the [patch.crates-io] override pointing at ../uni-xervo, mirroring the pattern used for 0.9.0 prior to commit 668844e. The patch can be removed once 0.10.0 is published to crates.io.

Closes rustic-ai#63. Re-applying the same schema (the documented "register on every KB-open" pattern) was duplicating index entries in schema.indexes super-linearly: explicit append in apply()'s AddIndex arm, plus the rebuild loop calling create_*_index whose epilogue re-appends every existing entry on top. Production catalogs hit 60K+ duplicates within ~15 reopens, dragging KB-open into multi-minute territory. Three layered fixes: - uni-common SchemaManager::add_index is now upsert-by-name. The trailing add_index in every IndexManager::create_*_index becomes idempotent for free, so neither apply()'s rebuild loop nor Cypher CREATE INDEX (which depends on that epilogue) needs plumbing changes. Index names are stable per (label, property), so name is a correct upsert key. - apply()'s AddIndex arm short-circuits the synchronous Lance rebuild when the index is already registered with identical config. This is what closes the "KB open takes minutes" gap — the post-fix repro goes from 15.7 s to 0.020 s for 10 applies. - SchemaManager::load_from_store self-heals legacy bloated catalogs by collapsing duplicate entries by name (last writer wins, matching the upsert semantics). Emits a tracing warning once per bloated load. No immediate save() on load — the cleanup persists on the next mutation that calls save(), keeping load() free of surprise I/O. Tests in schema_apply_idempotency.rs cover the issue's three repros plus a bloated-catalog migration case. Unit tests in schema.rs cover upsert and load-dedup semantics directly.

The defense-in-depth guard added in 3f71cd4 (issue rustic-ai#62) compared both field names and types per position before UnionExec::try_new, which over-rejected the user-facing Cypher UNION clause. UNION routinely produces branches whose internal columns are namespaced by different pattern variables (e.g. `MATCH (a:A) RETURN a UNION MATCH (b:B) RETURN b` yields `a._vid` vs `b._vid` at position 1). DataFusion's union_schema handles name divergence by adopting left names; only width/type mismatches are the panic source. Caught by uni-tck Union1[4] / Union2[4] which went from FAIL to PASS after relaxing to type-only comparison. Also folds in cargo fmt for two long let-bindings introduced in 4786f4a / 3f71cd4 that the CI fmt check flagged.

uni-xervo 0.11.0 ships two reranker-relevant fixes: - Cross-encoder loader auto-detects token_type_ids from session.inputs() (commit d08411c). Pre-0.11.0 the loader hardcoded the input, which errored at ORT load for BAAI/bge-reranker-base (its ONNX export doesn't declare that input). So BGE reranker support was effectively blocked on this bump. - Generative-style reranker dispatch for Qwen3-Reranker family (commit 21642b5). Selected via `options.style = "generative"` on the ModelAliasSpec; default `cross-encoder` is unchanged so existing catalogs are safe. The bump is non-breaking on the consumer surface (runtime.reranker(alias) and RerankerModel::rerank signatures unchanged), so no uni-db code edits are forced. Two new #[ignore] integration tests in reranker_integration.rs lock in real-ONNX coverage for both families through uni-db's alias-based rerank pipeline: - test_real_onnx_bge_reranker_reranks_by_relevance — BAAI/bge-reranker-base via the default cross-encoder path. Validates the token_type_ids auto-detection lands. - test_real_onnx_qwen3_reranker_reranks_by_relevance — onnx-community/ Qwen3-Reranker-0.6B-ONNX (quantized model_q4 ~500 MB) via generative dispatch. Pins rerank_score in [0, 1] so any future regression to raw logits surfaces in the assertion rather than as a silent score-shape drift. Both new tests use query_with(...).timeout(...).fetch_all() with extended timeouts to absorb first-run model download cost; the default 30 s session timeout is too tight for the larger models. Run via the existing CI step: `cargo nextest run -p uni-db --features provider-onnx --test reranker_integration --run-ignored all`. Verified: 29/29 reranker tests pass (25 mock + 4 real-ONNX). Workspace 3281/3281 pass with no regressions.

milliondreams added 5 commits April 30, 2026 23:16

chore(deps): bump uni-xervo to 0.10.0 with local-path patch

93223e2

Restores the [patch.crates-io] override pointing at ../uni-xervo, mirroring the pattern used for 0.9.0 prior to commit 668844e. The patch can be removed once 0.10.0 is published to crates.io.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix Cypher label disjunction panic and improve schema handling#64

Fix Cypher label disjunction panic and improve schema handling#64
milliondreams wants to merge 5 commits intorustic-ai:mainfrom
milliondreams:main

milliondreams commented May 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

milliondreams commented May 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant