feat: always-on RDFS entailment for SHACL and policy enforcement by bplatz · Pull Request #1426 · fluree/db

bplatz · 2026-07-03T10:54:16Z

Adds always-on RDFS entailment (subclass + subproperty) to SHACL and policy enforcement, matching what query reasoning already did — a Manager-typed record no longer slips past an Employee-targeting shape or an f:onClass ex:Employee policy. Built on the validate branch; richer entailment regimes (OWL modes) stay opt-in future work behind the same seam.

The bug this fixes

Commit-time SHACL expanded sh:targetClass over subclasses using snapshot.schema_hierarchy() — frozen at the last index build. Any rdfs:subClassOf committed since reindex was invisible to enforcement, so subclass-typed records skipped their shapes. Policy targeting had no entailment at all.

Mechanism: epoch counters + shared caches (zero cost on the common path)

Two counters on Novelty, bumped inside the per-flake routing loop apply_commit already runs (one integer compare per flake, same pattern as the existing f:reifies* check):

schema_epoch — bumps only when a commit asserts/retracts rdfs:subClassOf / rdfs:subPropertyOf.
shacl_epoch — bumps only on SHACL-affecting flakes (any sh:* predicate, or rdf:type edges to SHACL types / rdfs:Class / owl:Class).
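A minimal sketch of the per-flake epoch bump described above. It assumes flakes carry full predicate/object IRIs as strings; the real Novelty routes encoded IDs, and `Flake`, `EpochCounters`, and the classifier helpers here are hypothetical. Each flake costs a couple of comparisons, and a commit bumps each counter at most once. The "SHACL type" test is approximated as "object IRI in the sh: namespace".

```rust
const RDF_TYPE: &str = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
const RDFS_SUBCLASS_OF: &str = "http://www.w3.org/2000/01/rdf-schema#subClassOf";
const RDFS_SUBPROPERTY_OF: &str = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf";
const RDFS_CLASS: &str = "http://www.w3.org/2000/01/rdf-schema#Class";
const OWL_CLASS: &str = "http://www.w3.org/2002/07/owl#Class";
const SH: &str = "http://www.w3.org/ns/shacl#";

/// Hypothetical flake: subject, predicate, object as full IRIs.
pub struct Flake<'a> {
    pub s: &'a str,
    pub p: &'a str,
    pub o: &'a str,
}

#[derive(Default, Debug)]
pub struct EpochCounters {
    pub schema_epoch: u64,
    pub shacl_epoch: u64,
}

/// Schema-touching: asserts/retracts rdfs:subClassOf or rdfs:subPropertyOf.
fn is_schema_flake(f: &Flake) -> bool {
    f.p == RDFS_SUBCLASS_OF || f.p == RDFS_SUBPROPERTY_OF
}

/// SHACL-affecting: any sh:* predicate, or rdf:type to a SHACL type / rdfs:Class / owl:Class.
fn is_shacl_flake(f: &Flake) -> bool {
    f.p.starts_with(SH)
        || (f.p == RDF_TYPE && (f.o.starts_with(SH) || f.o == RDFS_CLASS || f.o == OWL_CLASS))
}

impl EpochCounters {
    /// Mirrors "inside the existing routing loop": flags are set per flake,
    /// counters are bumped once at the end of the commit.
    pub fn apply_commit(&mut self, flakes: &[Flake]) {
        let mut schema_touched = false;
        let mut shacl_touched = false;
        for f in flakes {
            // ... existing per-flake routing would happen here ...
            schema_touched |= is_schema_flake(f);
            shacl_touched |= is_shacl_flake(f);
        }
        if schema_touched {
            self.schema_epoch += 1;
        }
        if shacl_touched {
            self.shacl_epoch += 1;
        }
    }
}
```

Data-only commits leave both counters untouched, which is what lets the caches keyed on them stay warm.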

Two Arc-shared caches on LedgerState, carried across commits:

SchemaHierarchyCache (core) — the current subclass/subproperty closure, keyed by (indexed schema t, schema_epoch); rebuilt lazily (two bound-predicate scans) only after a schema-touching commit. Consumed by transaction SHACL, the validate façade, and policy-context construction.
Compiled-SHACL slot (type-erased on the ledger crate, following the TypeErasedStore precedent) — data-only transactions reuse the previous transaction's compiled shapes via Arc<ShaclCache> and skip ShapeCompiler's ~40 predicate scans entirely. Any shape/schema change or f:shapesSource re-point recompiles; inline opts.shapes and cross-ledger wires bypass.
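The lazy, key-checked rebuild of SchemaHierarchyCache can be sketched as follows. This is an illustration under assumptions, not the crate's API: the `current` signature, the `Hierarchy` alias, and the use of `std::sync::Mutex` are all hypothetical, and the `build` closure stands in for the two bound-predicate scans plus closure.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Arc, Mutex};

/// Class -> all transitive subclasses (a stand-in for the real hierarchy type).
pub type Hierarchy = HashMap<String, HashSet<String>>;

/// Hypothetical cache: one entry keyed by (indexed schema t, schema_epoch).
#[derive(Default)]
pub struct SchemaHierarchyCache {
    entry: Mutex<Option<((u64, u64), Arc<Hierarchy>)>>,
}

impl SchemaHierarchyCache {
    /// Returns the cached hierarchy when the key matches; otherwise runs `build`
    /// and stores the result. The lock is held while building, so concurrent
    /// callers wait instead of building twice.
    pub fn current<F>(&self, indexed_t: u64, schema_epoch: u64, build: F) -> Arc<Hierarchy>
    where
        F: FnOnce() -> Hierarchy,
    {
        let key = (indexed_t, schema_epoch);
        let mut slot = self.entry.lock().unwrap();
        if let Some((cached_key, hierarchy)) = &*slot {
            if *cached_key == key {
                // Warm path: every commit that did not touch the schema lands here.
                return Arc::clone(hierarchy);
            }
        }
        // Cold or stale: rebuild lazily, on first use after the key changed.
        let fresh = Arc::new(build());
        *slot = Some((key, Arc::clone(&fresh)));
        fresh
    }
}
```

Because the cache itself sits behind an `Arc` on the ledger state, carrying it across commits is a pointer copy; only a change in indexed t or `schema_epoch` forces the rebuild.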

What entails now

SHACL targets: sh:targetClass fires for subclass instances (current hierarchy, not index-time); sh:targetSubjectsOf / targetObjectsOf match subjects/objects of subproperties.
SHACL paths: a constraint on sh:path schema:name governs values asserted via any subproperty — through the simple-predicate fast paths, sequence/inverse/alternative/transitive steps, and pair-constraint target loads. (This exposed a wiring gap: the staged-validation engine was rebuilt hierarchy-less across the transact crate boundary; validate_view_with_shacl now receives the shared Arc<ShaclCache> + hierarchy, which also removes a per-transaction deep clone of the compiled shapes.)
Policy: f:onClass policies govern subclass instances, f:onProperty policies govern subproperties — expanded once at policy-set build time, so evaluation is untouched.
Cross-ledger ontologies: when f:reasoningDefaults / f:schemaSource points at a model ledger, its subclass/subproperty edges merge into both SHACL and policy enforcement. Resolution is t-cached in the existing GovernanceCache — an unchanged model head is one nameservice t-read plus an Arc clone, never a re-query.
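To make the entailment concrete, here is a minimal sketch of a subclass closure computed over local edges merged with a model ledger's edges, and the membership check a class-targeting shape or policy would use. The function names are hypothetical and the real hierarchy also covers subproperties; the point is that `Manager` falls under an `Employee` target purely via an edge that lives in the model ledger.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Class -> every transitive subclass, from `(sub, super)` rdfs:subClassOf edges.
pub fn subclass_closure(edges: &[(&str, &str)]) -> HashMap<String, HashSet<String>> {
    // Direct children per superclass.
    let mut children: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(sub, sup) in edges {
        children.entry(sup).or_default().push(sub);
    }
    let mut closure = HashMap::new();
    for (&sup, direct) in &children {
        // Breadth-first walk down subclass edges; `seen` also stops cycles.
        let mut seen: HashSet<String> = HashSet::new();
        let mut queue: VecDeque<&str> = direct.iter().copied().collect();
        while let Some(class) = queue.pop_front() {
            if seen.insert(class.to_string()) {
                if let Some(grandchildren) = children.get(class) {
                    queue.extend(grandchildren.iter().copied());
                }
            }
        }
        closure.insert(sup.to_string(), seen);
    }
    closure
}

/// Merge the data ledger's edges with a model ledger's edges, then close.
pub fn merged_closure<'a>(
    local: &[(&'a str, &'a str)],
    model: &[(&'a str, &'a str)],
) -> HashMap<String, HashSet<String>> {
    let mut all = local.to_vec();
    all.extend_from_slice(model);
    subclass_closure(&all)
}

/// Does a node typed `class` fall under a target of `target`?
pub fn is_instance_of(closure: &HashMap<String, HashSet<String>>, class: &str, target: &str) -> bool {
    class == target || closure.get(target).map_or(false, |subs| subs.contains(class))
}
```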

Semantics rule (deliberate)

Enforcement uses the committed hierarchy only: schema asserted in the same transaction as data does not entail for that transaction (pinned by test). The workaround is the natural one — two transactions, schema first — and matches how shape compilation already behaves.

Tests

Each behavior is pinned end-to-end: the unindexed-subclass rejection repro, the same-transaction rule, compile-cache invalidation on new shapes, policy subclass/subproperty governance (both directions fail without the change), SHACL subproperty paths and predicate targets, and two-ledger topologies for the cross-ledger cases (shape rejection at transaction time; policy visibility on the query path). Docs updated in the SHACL cookbook and policy model.

…ch cache)

SHACL subclass targeting used snapshot.schema_hierarchy(), which is frozen at the last index build, so a shape targeting Employee silently skipped Manager-typed records when 'Manager rdfs:subClassOf Employee' was committed but not yet indexed.

Mechanism (zero cost on the common path):

- Novelty::schema_epoch bumps only when a commit asserts or retracts rdfs:subClassOf / rdfs:subPropertyOf (one integer + bounded name compare per flake, inside the existing routing loop).
- SchemaHierarchyCache (core) holds the current hierarchy keyed by (indexed schema t, novelty schema epoch). It is Arc-shared on LedgerState and carried across commits, so non-schema commits keep it warm, and rebuilds (two bound-predicate scans + closure) happen only after a schema-touching commit, lazily on the next consumer.
- Transaction-time SHACL and the validate facade construct the engine with the cached hierarchy (ShaclEngine::from_dbs_with_hierarchy).
- compute_schema_hierarchy_with_overlay moves to fluree-db-core so enforcement layers below the query crate can share it.

Per the enforcement rule, the hierarchy is committed-state only: schema asserted in the same transaction as the data does not entail for that transaction (two-transaction workflow; pinned by test).

Every transaction recompiled all shapes from scratch (~40 bound-predicate scans in ShapeCompiler). Same epoch pattern as the RDFS hierarchy cache:

- Novelty::shacl_epoch bumps only when a commit carries a SHACL-affecting flake: any sh:* predicate, or an rdf:type edge to a SHACL type / rdfs:Class / owl:Class (shape registration and implicit class targets both change compile output). One integer compare per flake in the existing routing loop.
- ShaclEngine holds Arc<ShaclCache>; a type-erased slot on LedgerState (carried across commits, mirrored on LedgerView) stores the entry keyed by (indexed snapshot t, shacl epoch, schema epoch, shape-source graph ids). Data-only transactions reuse the previous compile via ShaclEngine::from_shared_cache; any shape/schema change or shapesSource re-point recompiles. Inline opts.shapes and cross-ledger shape wires bypass the cache (per-transaction by nature). The ledger crate stays SHACL-agnostic (Any-erased slot, downcast in the API enforcement layer, following the TypeErasedStore precedent).
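A sketch of the type-erased slot pattern: the "ledger" side stores `Arc<dyn Any + Send + Sync>` under a composite key and never names a SHACL type, while the "API" side downcasts back to its compiled-shapes type. `ErasedSlot`, `CompileKey`, `CompiledShapes`, and `compiled_shapes` are hypothetical names; only `Arc::downcast` on `Arc<dyn Any + Send + Sync>` is standard library.

```rust
use std::any::Any;
use std::sync::{Arc, Mutex};

/// Key: (indexed snapshot t, shacl epoch, schema epoch, shape-source graph ids).
#[derive(Clone, PartialEq, Eq, Debug)]
pub struct CompileKey {
    pub indexed_t: u64,
    pub shacl_epoch: u64,
    pub schema_epoch: u64,
    pub shape_graphs: Vec<u32>,
}

/// Ledger-layer slot: knows nothing about SHACL.
#[derive(Default)]
pub struct ErasedSlot {
    inner: Mutex<Option<(CompileKey, Arc<dyn Any + Send + Sync>)>>,
}

impl ErasedSlot {
    pub fn get(&self, key: &CompileKey) -> Option<Arc<dyn Any + Send + Sync>> {
        let guard = self.inner.lock().unwrap();
        match &*guard {
            Some((k, v)) if k == key => Some(Arc::clone(v)),
            _ => None,
        }
    }

    pub fn put(&self, key: CompileKey, value: Arc<dyn Any + Send + Sync>) {
        *self.inner.lock().unwrap() = Some((key, value));
    }
}

/// API-layer stand-in for the compiled shapes.
pub struct CompiledShapes {
    pub shape_count: usize,
}

/// Reuse the stored compile when the key matches and the downcast succeeds;
/// otherwise compile and store.
pub fn compiled_shapes(
    slot: &ErasedSlot,
    key: CompileKey,
    compile: impl FnOnce() -> CompiledShapes,
) -> Arc<CompiledShapes> {
    if let Some(any) = slot.get(&key) {
        if let Ok(shapes) = any.downcast::<CompiledShapes>() {
            return shapes;
        }
    }
    let shapes = Arc::new(compile());
    slot.put(key, shapes.clone());
    shapes
}
```

Putting every invalidation dimension in the key means no explicit "clear" call is needed: a stale entry simply never matches.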

Class policies (f:onClass) now govern instances of subclasses, and property policies (f:onProperty) govern subproperties. Expansion happens once at policy-set build time — for_classes gains the subclass closure and property targets gain the subproperty closure — so the class→property union, class_check_needed computation, and all four evaluation-time class checks see the expanded sets with no per-flake cost. The hierarchy is the current (novelty-aware) one, computed in build_policy_context alongside the existing stats assembly; the cross-ledger policy wire path keeps its pre-entailment behavior (no hierarchy handle there yet).
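The "expand once at build time" idea can be sketched like this, assuming precomputed descendant maps (class to subclasses, property to subproperties). `Restriction` here is a simplified, hypothetical shape rather than the real policy type; the point is that evaluation stays a plain set lookup with no hierarchy walk per flake.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified policy restriction.
pub struct Restriction {
    pub for_classes: HashSet<String>,
    pub properties: HashSet<String>,
}

/// Add every descendant of each member of `set`, using a precomputed closure.
fn close_over(set: &mut HashSet<String>, descendants: &HashMap<String, HashSet<String>>) {
    let extra: Vec<String> = set
        .iter()
        .filter_map(|item| descendants.get(item))
        .flatten()
        .cloned()
        .collect();
    set.extend(extra);
}

/// Runs once when the policy set is built.
pub fn expand(
    r: &mut Restriction,
    sub_classes: &HashMap<String, HashSet<String>>,
    sub_props: &HashMap<String, HashSet<String>>,
) {
    close_over(&mut r.for_classes, sub_classes);
    close_over(&mut r.properties, sub_props);
}

/// Evaluation-time checks are unchanged set lookups.
pub fn governs_class(r: &Restriction, class: &str) -> bool {
    r.for_classes.contains(class)
}

pub fn governs_property(r: &Restriction, p: &str) -> bool {
    r.properties.contains(p)
}
```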

Always-on entailment for enforcement: a property step over p also traverses every q rdfs:subPropertyOf p. This applies everywhere SHACL reads data by predicate: path evaluation (forward/inverse steps and the simple-predicate fast paths in both the top-level and nested property loops), pair-constraint target-property loads, and predicate targeting (sh:targetSubjectsOf / sh:targetObjectsOf in focus discovery, the per-node applicability probes, and the literal-object collection).

The hierarchy threads through ClassMembershipCtx (already flowing through the validation call tree) and, closing the wiring gap this exposed, through validate_view_with_shacl into the staged-validation engine. Previously the tx path handed only the ShaclCache across the crate boundary and stage.rs rebuilt a hierarchy-less engine, so engine-level entailment silently vanished at transaction time. The staged path now receives the shared Arc<ShaclCache> plus the hierarchy (also dropping a per-transaction deep clone of the compiled shapes). W3C SHACL core suite unchanged at 82.7% (no suite test exercises subPropertyOf; ours pin the behavior).
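A minimal sketch of a path step that honors subproperties, with triples held in a slice instead of the real bound-predicate scans. The helper names are hypothetical; what it shows is that both the forward and the inverse step read the union of `p` and every transitive subproperty of `p`.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Predicates a step over `p` must read: `p` plus every transitive subproperty.
fn step_predicates(p: &str, sub_props: &HashMap<String, HashSet<String>>) -> HashSet<String> {
    let mut preds = sub_props.get(p).cloned().unwrap_or_default();
    preds.insert(p.to_string());
    preds
}

/// Forward step: objects reachable from `focus` via `p` or a subproperty.
pub fn forward_step(
    triples: &[(&str, &str, &str)],
    focus: &str,
    p: &str,
    sub_props: &HashMap<String, HashSet<String>>,
) -> BTreeSet<String> {
    let preds = step_predicates(p, sub_props);
    triples
        .iter()
        .filter(|(s, pred, _)| *s == focus && preds.contains(*pred))
        .map(|(_, _, o)| o.to_string())
        .collect()
}

/// Inverse step: subjects that point at `focus` via `p` or a subproperty.
pub fn inverse_step(
    triples: &[(&str, &str, &str)],
    focus: &str,
    p: &str,
    sub_props: &HashMap<String, HashSet<String>>,
) -> BTreeSet<String> {
    let preds = step_predicates(p, sub_props);
    triples
        .iter()
        .filter(|(_, pred, o)| *o == focus && preds.contains(*pred))
        .map(|(s, _, _)| s.to_string())
        .collect()
}
```

With `ex:legalName rdfs:subPropertyOf schema:name`, a constraint on `sh:path schema:name` sees values asserted via `ex:legalName`; with an empty hierarchy it sees nothing for that node.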

Closes the last cell of the entailment matrix: when the class/property hierarchy lives in a model ledger (config f:reasoningDefaults / f:schemaSource with f:ledger), SHACL and policy enforcement now merge its subclass/subproperty edges, matching what query reasoning already did. A shared cross_ledger::resolve_schema_closure_bundle resolves the SchemaClosure wire (t-cached in GovernanceCache: an unchanged model head is an Arc clone, not a re-query) and translates it against the data ledger's snapshot.

Consumers:

- Transaction SHACL: the bundle threads through StagedShaclContext; the enforcement hierarchy computes over it composed on novelty (bypassing the local hierarchy cache and the compiled-shape reuse, whose keys have no model-t dimension).
- Policy: build_policy_context composes the bundle into its hierarchy on the query path (both branches), the transact policy path, and the cross-ledger policy path; a from_opts_with_schema wrapper keeps the seven existing plain callers unchanged (local-only hierarchy).

follow-owl-imports with a cross-ledger schema source fails closed, mirroring the query path.

Tests: Manager-typed records governed by an Employee shape (tx rejection) and an f:onClass Employee policy (query visibility) purely via M's rdfs:subClassOf edge.
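The t-cached resolution ("an unchanged model head is an Arc clone, not a re-query") can be sketched as a per-ledger map from head t to a shared bundle. `GovernanceCache::resolve_schema`, `SchemaBundle`, and the `query_model` closure are hypothetical stand-ins; the caller is assumed to have already done the nameservice t-read and passes it in as `head_t`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Stand-in for the resolved SchemaClosure wire: (sub, super) edges.
pub struct SchemaBundle {
    pub edges: Vec<(String, String)>,
}

#[derive(Default)]
pub struct GovernanceCache {
    by_ledger: Mutex<HashMap<String, (u64, Arc<SchemaBundle>)>>,
}

impl GovernanceCache {
    /// `head_t` comes from a cheap nameservice read; `query_model` is the
    /// expensive path and only runs when the model ledger's head moved.
    pub fn resolve_schema(
        &self,
        ledger: &str,
        head_t: u64,
        query_model: impl FnOnce(u64) -> SchemaBundle,
    ) -> Arc<SchemaBundle> {
        let mut map = self.by_ledger.lock().unwrap();
        if let Some((t, bundle)) = map.get(ledger) {
            if *t == head_t {
                return Arc::clone(bundle); // unchanged head: pointer copy only
            }
        }
        let bundle = Arc::new(query_model(head_t));
        map.insert(ledger.to_string(), (head_t, Arc::clone(&bundle)));
        bundle
    }
}
```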

…-ledger schemaSource

The enforcement-entailment resolver hard-failed on f:followOwlImports + cross-ledger f:schemaSource, mirroring the query path's fail-closed guard. But this resolver runs inside every transaction's enforcement setup, so once such a config committed, every subsequent write on the ledger was rejected (including the config repair itself), and the pre-existing it_schema_cross_ledger fails-closed tests (which pin that inserts succeed and the rejection surfaces at reasoning-query time) broke.

Skip the cross-ledger bundle instead (warn + local-only hierarchy): enforcement falls back to exactly the pre-feature behavior for this unsupported combination, while the loud rejection stays on the reasoning-query path (resolve_configured_schema_bundle), where an incomplete import closure would actually change entailment results.
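The split between "fail closed on the query path" and "warn and fall back on the write path" can be sketched as two entry points over one check. `SchemaConfig`, `Resolution`, and both functions are hypothetical; in this sketch the unsupported combination is the only error, so the enforcement path maps every error to the local-only hierarchy, whereas real code would presumably skip only that specific case.

```rust
/// Hypothetical slice of the ledger config relevant here.
pub struct SchemaConfig {
    pub cross_ledger_source: Option<String>,
    pub follow_owl_imports: bool,
}

#[derive(Debug, PartialEq)]
pub enum Resolution {
    LocalOnly,
    CrossLedger(String),
}

/// Reasoning-query path: reject the unsupported combination loudly.
pub fn resolve_for_query(cfg: &SchemaConfig) -> Result<Resolution, String> {
    match &cfg.cross_ledger_source {
        Some(_) if cfg.follow_owl_imports => {
            Err("followOwlImports with a cross-ledger schemaSource is unsupported".to_string())
        }
        Some(ledger) => Ok(Resolution::CrossLedger(ledger.clone())),
        None => Ok(Resolution::LocalOnly),
    }
}

/// Enforcement path (runs on every transaction): never block the write;
/// warn and fall back to the pre-feature, local-only hierarchy.
pub fn resolve_for_enforcement(cfg: &SchemaConfig) -> Resolution {
    match resolve_for_query(cfg) {
        Ok(resolution) => resolution,
        Err(e) => {
            eprintln!("warning: {e}; using local-only hierarchy for enforcement");
            Resolution::LocalOnly
        }
    }
}
```

This keeps the config repair itself writable: a transaction that fixes the bad config goes through the enforcement path, not the query path.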

aaj3f

This is nice and nice to get SHACL working w/ ontology characteristics 👍

aaj3f · 2026-07-05T00:55:17Z

+    // Current RDFS hierarchy (always-on entailment for enforcement): class
+    // policies govern subclass instances, property policies govern
+    // subproperties. Two bound-predicate scans + closure — small next to the
+    // stats assembly and policy parsing above.
+    let hierarchy = match &cross_ledger_schema {
+        // Cross-ledger ontology: compose the model ledger's schema bundle
+        // over the local overlay so its subclass/subproperty edges merge in.
+        Some(bundle) => {
+            let composed = fluree_db_query::schema_bundle::SchemaBundleOverlay::new(
+                overlay,
+                std::sync::Arc::clone(bundle),
+            );
+            fluree_db_core::compute_schema_hierarchy_with_overlay(snapshot, &composed, to_t).await
+        }
+        None => {
+            fluree_db_core::compute_schema_hierarchy_with_overlay(snapshot, overlay, to_t).await
+        }
+    }
+    .map_err(|e| ApiError::internal(format!("policy hierarchy computation failed: {e}")))?;


I originally thought the perf hit I'm about to describe scaled with instance-data size, but it's actually bounded by schema size, so it's more minor than I first thought. I'll include the note anyway, corrected for schema bounds rather than instance bounds:

Every policy-enforced query now recomputes the current schema hierarchy here, unconditionally and without the epoch cache. compute_schema_hierarchy_with_overlay clones snapshot.schema (IndexSchema.pred.vals — one entry per predicate/class), builds a HashMap over all of it, and sorts — O(ontology size) allocations plus two bound-predicate range scans, on the per-query policy path. The tx path routes through the epoch-invalidated SchemaHierarchyCache::current(...) (tx.rs:671, validate.rs:434), and that cache is warm across commits (finalize_state_with_base, commit.rs:910); this read path does not use it, because build_policy_context_from_opts_inner operates on &LedgerSnapshot/&dyn OverlayProvider and never receives the LedgerState cache handle. Pre-PR, build_policy_set did zero hierarchy work.

Why minor, not major (verification): the cost is bounded by ontology size, not instance-data size, and this same function already runs a novelty-aware assemble_full_stats per query (policy_builder.rs:356-365). The whole PolicyContext is rebuilt per query and uncached, so the hierarchy is one more step of comparable magnitude, not a newly expensive path. The tx-caches/read-doesn't contrast is also imperfect: the cross-ledger tx branch (tx.rs:662-668) recomputes uncached too. Real and worth fixing, but not merge-blocking. There are two independent mitigations; the first is the high-value one:

Skip the computation when no restriction can use it (identity-only / default-allow / OnSubject policies pay nothing):

    let needs_hierarchy = restrictions.iter().any(|r| {
        !r.for_classes.is_empty() || r.target_mode == TargetMode::OnProperty
    });
    let hierarchy = if !needs_hierarchy {
        None
    } else {
        // existing match on &cross_ledger_schema, including its .map_err(...)?
        Some(match &cross_ledger_schema { /* existing match */ })
    };

For the class/property case, thread the ledger's Arc<SchemaHierarchyCache> into this builder (add it to GraphDb and pass it down like tx.rs does) so repeated policy queries reuse one entry instead of rebuilding per call. As written, a large-ontology ledger pays a full schema clone + sort on every governed read.

Addressed mitigation #1 in 64699d8: the hierarchy computation is now guarded on a needs_hierarchy check (any restriction with for_classes non-empty or target_mode == OnProperty), so identity-only / default-allow / OnSubject policies skip it entirely. Left mitigation #2 (threading the ledger's SchemaHierarchyCache into this read path) as a follow-up — it's the larger plumbing change and only matters for large-ontology ledgers under governed read load.

aaj3f · 2026-07-05T00:55:54Z

+    // Same boundary for the cross-ledger ontology: when
+    // f:reasoningDefaults/f:schemaSource points at M, resolve the schema
+    // wire (t-cached) so the enforcement hierarchy can merge M's
+    // subclass/subproperty edges.
+    let cross_ledger_schema = resolve_cross_ledger_schema_for_tx(&ledger, resolve_ctx).await?;


resolve_cross_ledger_schema_for_tx re-runs resolve_ledger_config (config-graph type scan + parse) even though resolve_cross_ledger_shapes_for_tx (L839) loaded the identical config one line earlier. On any config-bearing ledger this doubles config resolution per transaction. The empty-config guard makes it free on plain ledgers, but for configured ledgers it is redundant work on the write path. Suggest resolving the config once in stage_with_config_shacl and passing &config (or Option<&LedgerConfig>) into both resolvers.

    // resolve config once, then:
    let cross_ledger_shapes =
        resolve_cross_ledger_shapes_for_tx(&ledger, &config, resolve_ctx).await?;
    let cross_ledger_schema =
        resolve_cross_ledger_schema_for_tx(&ledger, &config, resolve_ctx).await?;

Addressed in 17ae7eb

…s it

build_policy_context_from_opts_inner computed the schema hierarchy on every governed query, uncached: an O(ontology-size) schema clone + sort plus two bound-predicate scans. Only OnClass (subclass-instance) and OnProperty (subproperty) restrictions consult it; identity-only, default-allow, and OnSubject policies don't. Guard the computation on a needs_hierarchy check so those paths pay nothing, and a class/property-governed ledger only pays when a restriction can actually use the result. Behavior unchanged: subclass/subproperty entailment still fires (covered by policy_onclass_governs_subclass_instances / _via_cross_ledger_schema and policy_onproperty_governs_subproperties).

resolve_cross_ledger_shapes_for_tx and resolve_cross_ledger_schema_for_tx each called resolve_ledger_config independently, so a config-bearing ledger ran the config-graph type scan + parse twice per transaction. Resolve it once in stage_with_config_shacl and pass Option<&LedgerConfig> into both. The shapes resolver no longer needs the LedgerState handle at all; the schema resolver keeps it for resolve_schema_closure_bundle. Plain (config-less) ledgers are unaffected — a None config short-circuits both to no dispatch.

Four conflicts, all where main's config-resolution-once work overlaps this branch's RDFS enforcement:

- novelty/lib.rs: union the Novelty init (this branch's schema_epoch / shacl_epoch plus main's config_write_t).
- policy_view.rs: keep this branch's build_policy_context_from_opts_with_schema (threads cross_ledger_schema); drop the redundant policy_graphs resolve (already resolved before the no-inputs shortcut).
- tx.rs: keep this branch's fail-loud resolve_ledger_config + cross-ledger shapes AND schema resolvers; bridge to main's apply_shacl config-dedup by deriving tx_config = config.clone().map(Arc::new) for the per-graph SHACL pass.
- .fluree-memory/repo.ttl: union both branches' memory records.

Verified: workspace builds --all-features --all-targets, fmt + clippy clean, suites green (novelty/policy/shacl 153, api policy/reasoning/graphsource 258, transact 159).

bplatz added 8 commits July 2, 2026 21:19

docs: RDFS entailment in SHACL and policy enforcement

074a7f7

chore: update lockfiles for parking_lot additions

e269c82

memory

13a1a8a

bplatz requested review from aaj3f and zonotope July 3, 2026 10:54

aaj3f approved these changes Jul 5, 2026


bplatz added 2 commits July 5, 2026 09:13

Base automatically changed from feature/shacl-validate-cli to main July 5, 2026 14:58

bplatz merged commit 109d146 into main Jul 5, 2026
13 checks passed

bplatz deleted the feature/rdfs-enforcement-entailment branch July 5, 2026 15:11

bplatz merged 12 commits into main from feature/rdfs-enforcement-entailment
