fix(query): cross-ledger GRAPH queries over indexed data (#1405 bugs 2+3) #1425

Draft
Jackamus29 wants to merge 2 commits into main from fix/1405-cross-ledger-graph

@Jackamus29

Contributor

Partially addresses #1405 — makes the documented GRAPH-scoped workaround actually work over indexed, multi-ledger, divergent-namespace data. Fixes failures 2 and 3 from the issue. Failure 1 (lifting the union-path guard via cross-snapshot BFS) is a committed follow-up.

Failures fixed

Bug 2: indexed GRAPH path → internal error. A GRAPH-scoped path over an indexed multi-ledger dataset hit the internal invariant "EncodedSid/EncodedPid reached stamp_provenance". DatasetOperator computed multi_ledger from the active graphs only, so a single default graph (alongside named graphs from other ledgers) was treated as single-ledger. The binary store therefore stayed enabled, its scans emitted late EncodedSid bindings, and those bindings seeded a GRAPH block or crossed a boundary into provenance stamping, which cannot decode them. Fix: also treat the scan as multi-ledger when the whole dataset spans ledgers, forcing full Binding::Sid materialization (which stamps to IriMatch).
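The difference between the two multi_ledger computations can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `GraphRef`, `Dataset`, and the function names are hypothetical stand-ins, not the actual `DatasetOperator` types or code.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a graph in a dataset; only the owning ledger matters here.
#[derive(Debug, Clone)]
struct GraphRef {
    ledger_id: String,
}

/// Hypothetical dataset: default graphs plus named graphs.
struct Dataset {
    default_graphs: Vec<GraphRef>,
    named_graphs: Vec<GraphRef>,
}

/// True when the given graphs belong to more than one ledger.
fn spans_ledgers<'a>(graphs: impl IntoIterator<Item = &'a GraphRef>) -> bool {
    let ledgers: HashSet<&str> = graphs.into_iter().map(|g| g.ledger_id.as_str()).collect();
    ledgers.len() > 1
}

/// Pre-fix shape: only the graphs active for this scan are considered.
fn multi_ledger_active_only(active: &[GraphRef]) -> bool {
    spans_ledgers(active)
}

/// Post-fix shape: the scan also counts as multi-ledger when the whole dataset spans ledgers.
fn multi_ledger_whole_dataset(dataset: &Dataset, active: &[GraphRef]) -> bool {
    spans_ledgers(active)
        || spans_ledgers(dataset.default_graphs.iter().chain(dataset.named_graphs.iter()))
}
```

With one default graph from one ledger and a named graph from another, the active-only check reports single-ledger for a default-graph scan, while the whole-dataset check reports multi-ledger, which is the condition that forces materialization.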

Bug 3: divergent-namespace cross-graph join/path → silent []. The materialization fix resolves the join/seed cases (bound keys now cross as IriMatch and re-encode). Property paths additionally matched pattern constants (endpoints) and predicates against the primary snapshot's codes rather than the per-GRAPH graph's, so a divergent-namespace endpoint or predicate found nothing. Fix: re-encode path predicate and constant-endpoint SIDs into the active graph's dict (reencode_pred + the Ref::Sid arm of resolve_sid), reusing binary_scan's decode-primary/encode-target idiom.
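The decode-primary/encode-target idiom can be sketched as follows. This is a simplified illustration under stated assumptions: `NamespaceDict`, `Sid`, and `reencode` are hypothetical and do not reproduce the real `reencode_pred`, `resolve_sid`, or `reencode_sid` signatures.

```rust
use std::collections::HashMap;

/// Hypothetical per-snapshot namespace dictionary: namespace IRI <-> numeric code.
#[derive(Default)]
struct NamespaceDict {
    code_by_iri: HashMap<String, u16>,
    iri_by_code: HashMap<u16, String>,
}

impl NamespaceDict {
    fn insert(&mut self, iri: &str, code: u16) {
        self.code_by_iri.insert(iri.to_string(), code);
        self.iri_by_code.insert(code, iri.to_string());
    }
}

/// Hypothetical encoded ID: a namespace code plus a local name.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Sid {
    ns_code: u16,
    local: String,
}

/// Decode `sid` against the primary dict, then encode it against the target dict.
/// Returns None when either dict lacks the namespace, in which case the term
/// cannot match anything in the target graph.
fn reencode(sid: &Sid, primary: &NamespaceDict, target: &NamespaceDict) -> Option<Sid> {
    let ns_iri = primary.iri_by_code.get(&sid.ns_code)?;
    let target_code = *target.code_by_iri.get(ns_iri)?;
    Some(Sid { ns_code: target_code, local: sid.local.clone() })
}
```

When primary and target are the same dictionary, the round trip returns the same SID, which is why single-ledger queries are unaffected by this kind of re-encoding.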

Scope / safety

  • Single-ledger and single-graph queries unchanged (re-encode round-trips to the same SID; materialization only engages for multi-ledger datasets).
  • The union-path guard (failure 1) stays in place — its q2 characterization test asserts it still fires.
  • Tradeoff: multi-ledger datasets now eagerly materialize even a single default-graph scan (a scoped cost), which is the price of correct cross-boundary provenance — consistent with the existing multi-ledger disable.

Tests — tests/it_multi_graph_property_path.rs

The q1-q3d repro (adopted from the issue) plus a usage-pattern matrix, all indexed with divergent namespace codes and exact-value assertions:

| test | pins |
| --- | --- |
| q3a / q3c | bug 2 (indexed GRAPH path) / bug 3 (divergent join) |
| A1 | object-position cross-graph join |
| A2 / A3 | multi-value completeness / precision (include match, exclude non-match) |
| A4 | p+ strict path + join (bug 2 + 3 combined) |
| A5 | three-ledger chained join (re-encode composes) |
| A6 | single-ledger GRAPH path unchanged (regression) |
| A7 | FILTER EXISTS across a GRAPH boundary (covered by the root-cause fix) |
| A8 | unbounded closure with a divergent-code predicate |

Verification: 14/14 in the new suite; fluree-db-query 1170+; grp_query 299, grp_query_sparql 258, grp_misc 237; fmt + clippy clean.

Follow-up (separate PR)

Failure 1 — lift the guard with a cross-snapshot BFS so a property path runs directly over a multi-graph union (p+ across ledgers), honoring the "query your datasets as if they were one materialized dataset" model.
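As a rough picture of what "one materialized dataset" semantics means for p+, the sketch below runs a breadth-first search whose frontier is kept as decoded IRIs, so each hop may be satisfied by any graph. It is purely illustrative: the `Edges` type and `reachable_plus` are hypothetical, the edges are assumed to be already filtered to one predicate, and nothing here reflects how the follow-up will actually be implemented.

```rust
use std::collections::{BTreeSet, HashSet, VecDeque};

/// Hypothetical decoded edge list for one graph: (subject IRI, object IRI)
/// pairs for a single predicate.
type Edges = Vec<(String, String)>;

/// Nodes reachable from `start` via one or more hops (p+), following edges
/// from every graph in the union.
fn reachable_plus(graphs: &[Edges], start: &str) -> BTreeSet<String> {
    let mut seen: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<String> = VecDeque::new();
    queue.push_back(start.to_string());
    while let Some(node) = queue.pop_front() {
        for edges in graphs {
            for (s, o) in edges {
                // Only enqueue objects not reached before, so cycles terminate.
                if *s == node && seen.insert(o.clone()) {
                    queue.push_back(o.clone());
                }
            }
        }
    }
    seen.into_iter().collect()
}
```

Keeping the frontier at the IRI level sidesteps divergent namespace codes in this toy version; a real implementation would presumably re-encode per snapshot instead of decoding every edge.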

🤖 Generated with Claude Code

…1405)

Two composed failures prevented property paths + cross-graph joins from
working over an indexed, multi-ledger dataset:

Bug 2 — a GRAPH-scoped query over an INDEXED multi-ledger dataset hit an
internal invariant ("EncodedSid/EncodedPid reached stamp_provenance"). Root
cause: `DatasetOperator` computed `multi_ledger` from the *active* graphs only,
so a single default graph (alongside named graphs from other ledgers) was
treated as single-ledger — the binary store stayed enabled and its scans
emitted late `Binding::EncodedSid`, which then seeded a GRAPH block / crossed a
boundary and reached provenance stamping, which cannot decode them. Fix: also
treat the scan as multi-ledger when the whole dataset spans ledgers, forcing
full `Binding::Sid` materialization (which stamps to `IriMatch`).

Bug 3 — a cross-`GRAPH` join or path over DIVERGENT namespace codes silently
returned []. The materialization fix above resolves the join/seed cases (bound
keys now cross as `IriMatch` and re-encode). For property paths specifically,
the operator also matched pattern *constants* (endpoints) and *predicates*
against the primary/lowering snapshot's codes rather than the per-GRAPH graph's,
so a divergent-namespace endpoint/predicate found nothing. Fix: re-encode path
predicate and constant-endpoint SIDs into the active graph's dict
(`reencode_pred` + the `Ref::Sid` arm of `resolve_sid`), reusing the same
decode-primary/encode-target idiom as `binary_scan`'s `reencode_sid`.

Single-ledger and single-graph queries are unchanged (re-encode round-trips to
the same SID; materialization only kicks in for multi-ledger datasets). The
union-path guard (failure 1 in #1405) is intentionally left in place.

Adds tests/it_multi_graph_property_path.rs: the q1-q3d repro plus a usage matrix
(A1-A8) covering object-position joins, multi-value completeness, precision,
strict paths, three-ledger chains, single-ledger regression, FILTER EXISTS, and
unbounded closures — all over indexed, divergent-namespace ledgers.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@Jackamus29 marked this pull request as draft July 3, 2026 01:12
Review feedback:
- Trim the engine-side comments to the load-bearing invariant and drop #1405
  references from implementation code (db engineers know the mechanics; issue
  refs belong in the regression tests, which keep them).
- Rewrite the test data from generic `ex:thing`/`ex:category`/`narrow`/`mid`/
  `top` into a concrete library/subjects domain with distinct per-ledger
  prefixes, matching the house style of the sibling cross-ledger tests:
  `lib:book1` (library.example) references a `subj:` subject taxonomy
  (jazz ⊂ music ⊂ arts via `subj:broader`, subject.example). The
  own-prefix-first / shared-via-ref seeding that produces namespace-code
  divergence is preserved, so the aligned (q3d warm-up) vs divergent contrast
  still holds.
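
Why own-prefix-first seeding yields divergent codes can be seen in a toy model where each ledger hands out namespace codes in first-seen order. This is an assumption-laden sketch: `NamespaceCodes` is hypothetical, the namespace IRIs are illustrative spellings of the library.example and subject.example prefixes, and real code allocation (reserved ranges, starting values) will differ.

```rust
use std::collections::HashMap;

/// Hypothetical allocator: one per ledger, assigning codes in first-seen order.
#[derive(Default)]
struct NamespaceCodes {
    codes: HashMap<String, u16>,
}

impl NamespaceCodes {
    /// Return the code for `ns`, allocating the next free one on first use.
    fn intern(&mut self, ns: &str) -> u16 {
        let next = self.codes.len() as u16;
        *self.codes.entry(ns.to_string()).or_insert(next)
    }
}
```

If the library ledger sees its own prefix first and the subject prefix second, while the subjects ledger sees the subject prefix first, the same namespace ends up with a different code in each ledger, which is the divergent case the tests exercise.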

No behavior change; 14/14 in the suite, fmt + clippy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>