fix(query): cross-ledger GRAPH queries over indexed data (#1405 bugs 2+3)#1425
Draft
Jackamus29 wants to merge 2 commits into
Draft
fix(query): cross-ledger GRAPH queries over indexed data (#1405 bugs 2+3)#1425Jackamus29 wants to merge 2 commits into
Jackamus29 wants to merge 2 commits into
Conversation
…1405) Two composed failures prevented property paths + cross-graph joins from working over an indexed, multi-ledger dataset: Bug 2 — a GRAPH-scoped query over an INDEXED multi-ledger dataset hit an internal invariant ("EncodedSid/EncodedPid reached stamp_provenance"). Root cause: `DatasetOperator` computed `multi_ledger` from the *active* graphs only, so a single default graph (alongside named graphs from other ledgers) was treated as single-ledger — the binary store stayed enabled and its scans emitted late `Binding::EncodedSid`, which then seeded a GRAPH block / crossed a boundary and reached provenance stamping, which cannot decode them. Fix: also treat the scan as multi-ledger when the whole dataset spans ledgers, forcing full `Binding::Sid` materialization (which stamps to `IriMatch`). Bug 3 — a cross-`GRAPH` join or path over DIVERGENT namespace codes silently returned []. The materialization fix above resolves the join/seed cases (bound keys now cross as `IriMatch` and re-encode). For property paths specifically, the operator also matched pattern *constants* (endpoints) and *predicates* against the primary/lowering snapshot's codes rather than the per-GRAPH graph's, so a divergent-namespace endpoint/predicate found nothing. Fix: re-encode path predicate and constant-endpoint SIDs into the active graph's dict (`reencode_pred` + the `Ref::Sid` arm of `resolve_sid`), reusing the same decode-primary/encode-target idiom as `binary_scan`'s `reencode_sid`. Single-ledger and single-graph queries are unchanged (re-encode round-trips to the same SID; materialization only kicks in for multi-ledger datasets). The union-path guard (failure 1 in #1405) is intentionally left in place. Adds tests/it_multi_graph_property_path.rs: the q1-q3d repro plus a usage matrix (A1-A8) covering object-position joins, multi-value completeness, precision, strict paths, three-ledger chains, single-ledger regression, FILTER EXISTS, and unbounded closures — all over indexed, divergent-namespace ledgers. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Review feedback: - Trim the engine-side comments to the load-bearing invariant and drop #1405 references from implementation code (db engineers know the mechanics; issue refs belong in the regression tests, which keep them). - Rewrite the test data from generic `ex:thing`/`ex:category`/`narrow`/`mid`/ `top` into a concrete library/subjects domain with distinct per-ledger prefixes, matching the house style of the sibling cross-ledger tests: `lib:book1` (library.example) references a `subj:` subject taxonomy (jazz ⊂ music ⊂ arts via `subj:broader`, subject.example). The own-prefix-first / shared-via-ref seeding that produces namespace-code divergence is preserved, so the aligned (q3d warm-up) vs divergent contrast still holds. No behavior change; 14/14 in the suite, fmt + clippy clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Partially addresses #1405 — makes the documented
GRAPH-scoped workaround actually work over indexed, multi-ledger, divergent-namespace data. Fixes failures 2 and 3 from the issue. Failure 1 (lifting the union-path guard via cross-snapshot BFS) is a committed follow-up.Failures fixed
Bug 2 — indexed GRAPH path → internal error. A
GRAPH-scoped path over an indexed multi-ledger dataset hitEncodedSid/EncodedPid reached stamp_provenance.DatasetOperatorcomputedmulti_ledgerfrom the active graphs only, so a single default graph (alongside named graphs from other ledgers) was treated as single-ledger — the binary store stayed on, its scans emitted lateEncodedSid, and those seeded a GRAPH block / crossed a boundary into provenance stamping, which can't decode them. Fix: treat the scan as multi-ledger when the whole dataset spans ledgers, forcing fullBinding::Sidmaterialization (which stamps toIriMatch).Bug 3 — divergent-namespace cross-graph join/path → silent
[]. The materialization fix resolves the join/seed cases (bound keys now cross asIriMatchand re-encode). Property paths additionally matched pattern constants (endpoints) and predicates against the primary snapshot's codes rather than the per-GRAPHgraph's. Fix: re-encode path predicate + constant-endpoint SIDs into the active graph's dict (reencode_pred+ theRef::Sidarm ofresolve_sid), reusingbinary_scan's decode-primary/encode-target idiom.Scope / safety
q2characterization test asserts it still fires.Tests —
tests/it_multi_graph_property_path.rsThe
q1–q3drepro (adopted from the issue) plus a usage-pattern matrix, all indexed with divergent namespace codes and exact-value assertions:p+strict path + join (bug 2 + 3 combined)FILTER EXISTSacross a GRAPH boundary (covered by the root-cause fix)Verification: 14/14 in the new suite;
fluree-db-query1170+;grp_query299,grp_query_sparql258,grp_misc237; fmt + clippy clean.Follow-up (separate PR)
Failure 1 — lift the guard with a cross-snapshot BFS so a property path runs directly over a multi-graph union (
p+across ledgers), honoring the "query your datasets as if they were one materialized dataset" model.🤖 Generated with Claude Code