feat: always-on RDFS entailment for SHACL and policy enforcement #1426
…ch cache)

SHACL subclass targeting used snapshot.schema_hierarchy(), which is frozen at the last index build, so a shape targeting Employee silently skipped Manager-typed records when 'Manager rdfs:subClassOf Employee' was committed but not yet indexed.

Mechanism (zero cost on the common path):
- Novelty::schema_epoch bumps only when a commit asserts or retracts rdfs:subClassOf / rdfs:subPropertyOf (one integer plus a bounded name compare per flake, inside the existing routing loop).
- SchemaHierarchyCache (core) holds the current hierarchy keyed by (indexed schema t, novelty schema epoch). It is Arc-shared on LedgerState and carried across commits, so non-schema commits keep it warm, and rebuilds (two bound-predicate scans + closure) happen only after a schema-touching commit, lazily on the next consumer.
- Transaction-time SHACL and the validate facade construct the engine with the cached hierarchy (ShaclEngine::from_dbs_with_hierarchy).
- compute_schema_hierarchy_with_overlay moves to fluree-db-core so enforcement layers below the query crate can share it.

Per the enforcement rule, the hierarchy is committed-state only: schema asserted in the same transaction as the data does not entail for that transaction (two-transaction workflow; pinned by test).
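The epoch-keyed cache described above can be sketched as follows. This is a minimal sketch assuming a single-entry cache behind a `Mutex`; `HierarchyKey`, `HierarchyCache`, `bump_schema_epoch`, and the `Hierarchy` map shape are hypothetical stand-ins rather than the actual `SchemaHierarchyCache` API, and the builder closure stands in for the two bound-predicate scans plus closure.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Arc, Mutex};

/// Hypothetical key: (t of the last indexed schema, novelty schema epoch).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct HierarchyKey {
    pub indexed_t: i64,
    pub schema_epoch: u64,
}

/// Hypothetical hierarchy shape: class -> its transitive subclasses.
pub type Hierarchy = HashMap<String, HashSet<String>>;

/// Bumps the epoch at most once per commit, and only when the commit touches
/// rdfs:subClassOf / rdfs:subPropertyOf (per-commit granularity is an assumption).
pub fn bump_schema_epoch(epoch: &mut u64, commit_predicates: &[&str]) {
    let touches_schema = commit_predicates
        .iter()
        .any(|p| *p == "rdfs:subClassOf" || *p == "rdfs:subPropertyOf");
    if touches_schema {
        *epoch += 1;
    }
}

/// Single-entry, Arc-shared cache: a hit is an Arc clone, a miss rebuilds.
#[derive(Default)]
pub struct HierarchyCache {
    slot: Mutex<Option<(HierarchyKey, Arc<Hierarchy>)>>,
}

impl HierarchyCache {
    /// Returns the cached hierarchy when `key` matches; otherwise runs `build`
    /// (standing in for the scans + closure) and stores the result under `key`.
    /// The lock is held during `build`, so concurrent misses rebuild only once.
    pub fn current<F: FnOnce() -> Hierarchy>(&self, key: HierarchyKey, build: F) -> Arc<Hierarchy> {
        let mut slot = self.slot.lock().unwrap();
        if let Some((cached_key, hierarchy)) = slot.as_ref() {
            if *cached_key == key {
                return Arc::clone(hierarchy);
            }
        }
        let fresh = Arc::new(build());
        *slot = Some((key, Arc::clone(&fresh)));
        fresh
    }
}
```

A data-only commit leaves the key unchanged, so the next consumer gets an `Arc` clone; only a schema-touching commit changes `schema_epoch` and forces a rebuild.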
Every transaction recompiled all shapes from scratch (~40 bound-predicate scans in ShapeCompiler). Same epoch pattern as the RDFS hierarchy cache:
- Novelty::shacl_epoch bumps only when a commit carries a SHACL-affecting flake: any sh:* predicate, or an rdf:type edge to a SHACL type / rdfs:Class / owl:Class (shape registration and implicit class targets both change compile output). One integer compare per flake in the existing routing loop.
- ShaclEngine holds Arc<ShaclCache>; a type-erased slot on LedgerState (carried across commits, mirrored on LedgerView) stores the entry keyed by (indexed snapshot t, shacl epoch, schema epoch, shape-source graph ids). Data-only transactions reuse the previous compile via ShaclEngine::from_shared_cache; any shape/schema change or shapesSource re-point recompiles. Inline opts.shapes and cross-ledger shape wires bypass the cache (per-transaction by nature). The ledger crate stays SHACL-agnostic (Any-erased slot, downcast in the API enforcement layer, following the TypeErasedStore precedent).
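A rough sketch of the SHACL-affecting check and the compiled-shape cache key, assuming compact-IRI strings for flakes. `Flake`, `is_shacl_affecting`, `bump_shacl_epoch`, `ShaclCacheKey`, and the `SHACL_TYPES` list are hypothetical illustrations; the real check runs inside the existing routing loop over internal flake types.

```rust
/// Hypothetical, simplified flake: predicate and object as compact IRIs.
pub struct Flake<'a> {
    pub predicate: &'a str,
    pub object: &'a str,
}

/// Assumed, non-exhaustive list of SHACL shape types.
const SHACL_TYPES: &[&str] = &["sh:NodeShape", "sh:PropertyShape"];

/// True if the flake can change compiled shapes: any sh:* predicate, or an
/// rdf:type edge to a SHACL type, rdfs:Class, or owl:Class.
pub fn is_shacl_affecting(f: &Flake) -> bool {
    if f.predicate.starts_with("sh:") {
        return true;
    }
    f.predicate == "rdf:type"
        && (SHACL_TYPES.iter().any(|t| *t == f.object)
            || f.object == "rdfs:Class"
            || f.object == "owl:Class")
}

/// Bumps at most once per commit (granularity is an assumption).
pub fn bump_shacl_epoch(epoch: &mut u64, commit: &[Flake]) {
    if commit.iter().any(is_shacl_affecting) {
        *epoch += 1;
    }
}

/// Hypothetical compiled-shape cache key; any field change forces a recompile.
#[derive(Clone, PartialEq, Eq, Debug)]
pub struct ShaclCacheKey {
    pub indexed_t: i64,
    pub shacl_epoch: u64,
    pub schema_epoch: u64,
    pub shape_source_graphs: Vec<u32>,
}
```

Re-pointing the shapes source changes `shape_source_graphs`, so the key misses even when neither epoch moved.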
Class policies (f:onClass) now govern instances of subclasses, and property policies (f:onProperty) govern subproperties. Expansion happens once at policy-set build time — for_classes gains the subclass closure and property targets gain the subproperty closure — so the class→property union, class_check_needed computation, and all four evaluation-time class checks see the expanded sets with no per-flake cost. The hierarchy is the current (novelty-aware) one, computed in build_policy_context alongside the existing stats assembly; the cross-ledger policy wire path keeps its pre-entailment behavior (no hierarchy handle there yet).
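Build-time expansion can be pictured as a breadth-first closure over direct hierarchy edges. In this sketch (hypothetical names `DirectEdges` and `expand_with_descendants`, string IRIs assumed), the same function serves `for_classes` with subclass edges and property targets with subproperty edges; after expansion, an evaluation-time check is a plain set lookup with no per-flake hierarchy walk.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Direct edges: parent -> its direct children (for example ex:Employee -> [ex:Manager]).
pub type DirectEdges = HashMap<String, Vec<String>>;

/// Returns `targets` plus every transitive descendant reachable through `edges`.
pub fn expand_with_descendants(targets: &HashSet<String>, edges: &DirectEdges) -> HashSet<String> {
    let mut out = targets.clone();
    let mut queue: VecDeque<String> = targets.iter().cloned().collect();
    while let Some(node) = queue.pop_front() {
        if let Some(children) = edges.get(&node) {
            for child in children {
                // `insert` returns false for already-seen nodes, which also stops cycles.
                if out.insert(child.clone()) {
                    queue.push_back(child.clone());
                }
            }
        }
    }
    out
}
```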
Always-on entailment for enforcement: a property step over p also traverses every q rdfs:subPropertyOf p. Applied everywhere SHACL reads data by predicate: path evaluation (forward/inverse steps and the simple-predicate fast paths in both the top-level and nested property loops), pair-constraint target-property loads, and predicate targeting (sh:targetSubjectsOf / sh:targetObjectsOf in focus discovery, the per-node applicability probes, and the literal-object collection). The hierarchy threads through ClassMembershipCtx (already flowing through the validation call tree) and, closing the wiring gap this exposed, through validate_view_with_shacl into the staged-validation engine: the tx path handed only the ShaclCache across the crate boundary and stage.rs rebuilt a hierarchy-less engine, so engine-level entailment silently vanished at transaction time. The staged path now receives the shared Arc<ShaclCache> plus the hierarchy (also dropping a per-transaction deep clone of the compiled shapes). W3C SHACL core unchanged at 82.7% (no suite test exercises subPropertyOf; ours pin the behavior).
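A toy model of subproperty-aware path steps, assuming an in-memory triple list and a precomputed closure map from each property to its subproperties. `Triples`, `SubPropClosure`, `step_forward`, and `step_inverse` are hypothetical; they illustrate the semantics (a step over p also matches every q rdfs:subPropertyOf p), not the engine's range-scan implementation.

```rust
use std::collections::{HashMap, HashSet};

/// Minimal triple store stand-in: (subject, predicate, object).
pub struct Triples(pub Vec<(String, String, String)>);

/// p -> every q with q rdfs:subPropertyOf* p (p itself is added at lookup time).
pub type SubPropClosure = HashMap<String, HashSet<String>>;

/// The predicate set a step over `p` must match: `p` plus its subproperties.
fn effective_predicates<'a>(closure: &'a SubPropClosure, p: &'a str) -> HashSet<&'a str> {
    let mut set = HashSet::new();
    set.insert(p);
    if let Some(subs) = closure.get(p) {
        set.extend(subs.iter().map(|s| s.as_str()));
    }
    set
}

/// Forward step: objects reachable from `subject` via `p` or any subproperty.
pub fn step_forward(store: &Triples, closure: &SubPropClosure, subject: &str, p: &str) -> Vec<String> {
    let preds = effective_predicates(closure, p);
    store
        .0
        .iter()
        .filter(|(s, pred, _)| s == subject && preds.contains(pred.as_str()))
        .map(|(_, _, o)| o.clone())
        .collect()
}

/// Inverse step: subjects that reach `object` via `p` or any subproperty.
pub fn step_inverse(store: &Triples, closure: &SubPropClosure, object: &str, p: &str) -> Vec<String> {
    let preds = effective_predicates(closure, p);
    store
        .0
        .iter()
        .filter(|(_, pred, o)| o == object && preds.contains(pred.as_str()))
        .map(|(s, _, _)| s.clone())
        .collect()
}
```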
Closes the last cell of the entailment matrix: when the class/property hierarchy lives in a model ledger (config f:reasoningDefaults / f:schemaSource with f:ledger), SHACL and policy enforcement now merge its subclass/subproperty edges, matching what query reasoning already did. A shared cross_ledger::resolve_schema_closure_bundle resolves the SchemaClosure wire (t-cached in GovernanceCache: an unchanged model head is an Arc clone, not a re-query) and translates it against the data ledger's snapshot.

Consumers:
- Transaction SHACL: the bundle threads through StagedShaclContext; the enforcement hierarchy computes over it composed on novelty (bypassing the local hierarchy cache and the compiled-shape reuse, whose keys have no model-t dimension).
- Policy: build_policy_context composes the bundle into its hierarchy on the query path (both branches), the transact policy path, and the cross-ledger policy path; a from_opts_with_schema wrapper keeps the seven existing plain callers unchanged (local-only hierarchy).

follow-owl-imports with a cross-ledger schema source fails closed, mirroring the query path.

Tests: Manager-typed records governed by an Employee shape (tx rejection) and an f:onClass Employee policy (query visibility) purely via model ledger M's rdfs:subClassOf edge.
…-ledger schemaSource

The enforcement-entailment resolver hard-failed on f:followOwlImports + cross-ledger f:schemaSource, mirroring the query path's fail-closed guard. But this resolver runs inside every transaction's enforcement setup, so once such a config committed, every subsequent write on the ledger was rejected (including the config repair itself), and the pre-existing it_schema_cross_ledger fails-closed tests (which pin that inserts succeed and the rejection surfaces at reasoning-query time) broke. Skip the cross-ledger bundle instead (warn + local-only hierarchy): enforcement falls back to exactly the pre-feature behavior for this unsupported combination, while the loud rejection stays on the reasoning-query path (resolve_configured_schema_bundle), where an incomplete import closure would actually change entailment results.
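The asymmetry between the two paths might be summarized like this: enforcement degrades, the reasoning query fails closed. `SchemaConfig`, `SchemaChoice`, `enforcement_schema`, and `query_schema` are hypothetical, and `eprintln!` stands in for whatever warning mechanism the codebase actually uses.

```rust
/// Hypothetical summary of the schema-related config.
pub struct SchemaConfig {
    /// Model ledger named by f:schemaSource with f:ledger, if any.
    pub cross_ledger_source: Option<String>,
    pub follow_owl_imports: bool,
}

#[derive(Debug, PartialEq)]
pub enum SchemaChoice {
    LocalOnly,
    WithModel(String),
}

/// Enforcement path: never rejects the write; the unsupported combination
/// degrades to the local-only hierarchy (pre-feature behavior) with a warning.
pub fn enforcement_schema(cfg: &SchemaConfig) -> SchemaChoice {
    match &cfg.cross_ledger_source {
        Some(_) if cfg.follow_owl_imports => {
            eprintln!("warning: followOwlImports with a cross-ledger schemaSource; using local hierarchy");
            SchemaChoice::LocalOnly
        }
        Some(model) => SchemaChoice::WithModel(model.clone()),
        None => SchemaChoice::LocalOnly,
    }
}

/// Reasoning-query path: the same combination fails closed.
pub fn query_schema(cfg: &SchemaConfig) -> Result<SchemaChoice, String> {
    if cfg.cross_ledger_source.is_some() && cfg.follow_owl_imports {
        return Err("followOwlImports with a cross-ledger schemaSource is unsupported".to_string());
    }
    Ok(enforcement_schema(cfg))
}
```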
aaj3f left a comment
This is nice, and it's nice to get SHACL working with ontology characteristics 👍
// Current RDFS hierarchy (always-on entailment for enforcement): class
// policies govern subclass instances, property policies govern
// subproperties. Two bound-predicate scans + closure — small next to the
// stats assembly and policy parsing above.
let hierarchy = match &cross_ledger_schema {
    // Cross-ledger ontology: compose the model ledger's schema bundle
    // over the local overlay so its subclass/subproperty edges merge in.
    Some(bundle) => {
        let composed = fluree_db_query::schema_bundle::SchemaBundleOverlay::new(
            overlay,
            std::sync::Arc::clone(bundle),
        );
        fluree_db_core::compute_schema_hierarchy_with_overlay(snapshot, &composed, to_t).await
    }
    None => {
        fluree_db_core::compute_schema_hierarchy_with_overlay(snapshot, overlay, to_t).await
    }
}
.map_err(|e| ApiError::internal(format!("policy hierarchy computation failed: {e}")))?;
I originally thought the perf hit I'm about to describe scaled with instance-data size, but it's actually bounded by schema size, so it's much more minor than I first thought. I'm including the note anyway (corrected for schema bounds rather than instance bounds):
Every policy-enforced query now recomputes the current schema hierarchy here, unconditionally and without the epoch cache. compute_schema_hierarchy_with_overlay clones snapshot.schema (IndexSchema.pred.vals — one entry per predicate/class), builds a HashMap over all of it, and sorts — O(ontology size) allocations plus two bound-predicate range scans, on the per-query policy path. The tx path routes through the epoch-invalidated SchemaHierarchyCache::current(...) (tx.rs:671, validate.rs:434), and that cache is warm across commits (finalize_state_with_base, commit.rs:910); this read path does not use it, because build_policy_context_from_opts_inner operates on &LedgerSnapshot/&dyn OverlayProvider and never receives the LedgerState cache handle. Pre-PR, build_policy_set did zero hierarchy work.
Why minor, not major (verification): the cost is bounded by ontology size, not instance-data size, and this same function already runs a novelty-aware assemble_full_stats per query (policy_builder.rs:356-365). The whole PolicyContext is rebuilt per query and uncached, so the hierarchy is one more comparable-magnitude step, not a newly expensive path. The tx-caches/read-doesn't contrast is also imperfect: the cross-ledger tx branch (tx.rs:662-668) recomputes uncached too. Real and worth fixing, but not merge-blocking. There are two independent mitigations; the first is the high-value one:
- Skip the computation when no restriction can use it (identity-only / default-allow / OnSubject policies pay nothing):

  let needs_hierarchy = restrictions.iter().any(|r| {
      !r.for_classes.is_empty() || r.target_mode == TargetMode::OnProperty
  });
  let hierarchy = if !needs_hierarchy {
      None
  } else {
      match &cross_ledger_schema { /* existing match */ }
  };

- For the class/property case, thread the ledger's Arc<SchemaHierarchyCache> into this builder (add it to GraphDb and pass it down like tx.rs does) so repeated policy queries reuse one entry instead of rebuilding per call. As written, a large-ontology ledger pays a full schema clone + sort on every governed read.
Addressed mitigation #1 in 64699d8: the hierarchy computation is now guarded on a needs_hierarchy check (any restriction with for_classes non-empty or target_mode == OnProperty), so identity-only / default-allow / OnSubject policies skip it entirely. Left mitigation #2 (threading the ledger's SchemaHierarchyCache into this read path) as a follow-up — it's the larger plumbing change and only matters for large-ontology ledgers under governed read load.
// Same boundary for the cross-ledger ontology: when
// f:reasoningDefaults/f:schemaSource points at M, resolve the schema
// wire (t-cached) so the enforcement hierarchy can merge M's
// subclass/subproperty edges.
let cross_ledger_schema = resolve_cross_ledger_schema_for_tx(&ledger, resolve_ctx).await?;
resolve_cross_ledger_schema_for_tx re-runs resolve_ledger_config (config-graph type scan + parse) even though resolve_cross_ledger_shapes_for_tx (L839) loaded the identical config one line earlier. On any config-bearing ledger this doubles config resolution per transaction. The empty-config guard makes it free on plain ledgers, but for configured ledgers it is redundant work on the write path. Suggest resolving the config once in stage_with_config_shacl and passing &config (or Option<&LedgerConfig>) into both resolvers.
// resolve config once, then:
let cross_ledger_shapes = resolve_cross_ledger_shapes_for_tx(&ledger, &config, resolve_ctx).await?;
let cross_ledger_schema = resolve_cross_ledger_schema_for_tx(&ledger, &config, resolve_ctx).await?;

…s it

build_policy_context_from_opts_inner computed the schema hierarchy on every governed query, uncached: an O(ontology-size) schema clone + sort plus two bound-predicate scans. Only OnClass (subclass-instance) and OnProperty (subproperty) restrictions consult it; identity-only, default-allow, and OnSubject policies don't. Guard the computation on a needs_hierarchy check so those paths pay nothing, and a class/property-governed ledger only pays when a restriction can actually use the result. Behavior unchanged: subclass/subproperty entailment still fires (covered by policy_onclass_governs_subclass_instances / _via_cross_ledger_schema and policy_onproperty_governs_subproperties).
resolve_cross_ledger_shapes_for_tx and resolve_cross_ledger_schema_for_tx each called resolve_ledger_config independently, so a config-bearing ledger ran the config-graph type scan + parse twice per transaction. Resolve it once in stage_with_config_shacl and pass Option<&LedgerConfig> into both. The shapes resolver no longer needs the LedgerState handle at all; the schema resolver keeps it for resolve_schema_closure_bundle. Plain (config-less) ledgers are unaffected — a None config short-circuits both to no dispatch.
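The resolve-once refactor reduces to sharing one borrowed `Option<&LedgerConfig>` between both resolvers. In this synchronous sketch, `LedgerConfig`, `resolve_ledger_config`, `resolve_shapes`, `resolve_schema`, and `stage` are simplified hypothetical stand-ins (the real resolvers are async, and the schema resolver also takes the `LedgerState`); a `Cell` counter shows that the config scan runs once per transaction.

```rust
use std::cell::Cell;

/// Hypothetical parsed ledger config.
pub struct LedgerConfig {
    pub shapes_source: Option<String>,
    pub schema_source: Option<String>,
}

/// Stand-in for the config-graph type scan + parse; `calls` counts invocations.
pub fn resolve_ledger_config(has_config: bool, calls: &Cell<u32>) -> Option<LedgerConfig> {
    calls.set(calls.get() + 1);
    if !has_config {
        return None; // plain ledger: nothing to resolve
    }
    Some(LedgerConfig {
        shapes_source: Some("model".to_string()),
        schema_source: Some("model".to_string()),
    })
}

/// Both resolvers take the already-parsed config; `None` short-circuits.
pub fn resolve_shapes(config: Option<&LedgerConfig>) -> Option<String> {
    config.and_then(|c| c.shapes_source.clone())
}

pub fn resolve_schema(config: Option<&LedgerConfig>) -> Option<String> {
    config.and_then(|c| c.schema_source.clone())
}

/// Resolve once per transaction, then lend the result to both resolvers.
pub fn stage(has_config: bool, calls: &Cell<u32>) -> (Option<String>, Option<String>) {
    let config = resolve_ledger_config(has_config, calls);
    (resolve_shapes(config.as_ref()), resolve_schema(config.as_ref()))
}
```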
Four conflicts, all where main's config-resolution-once work overlaps this branch's RDFS enforcement:
- novelty/lib.rs: union the Novelty init: this branch's schema_epoch / shacl_epoch plus main's config_write_t.
- policy_view.rs: keep this branch's build_policy_context_from_opts_with_schema (threads cross_ledger_schema); drop the redundant policy_graphs resolve (already resolved before the no-inputs shortcut).
- tx.rs: keep this branch's fail-loud resolve_ledger_config + cross-ledger shapes AND schema resolvers; bridge to main's apply_shacl config-dedup by deriving tx_config = config.clone().map(Arc::new) for the per-graph SHACL pass.
- .fluree-memory/repo.ttl: union both branches' memory records.

Verified: workspace builds --all-features --all-targets, fmt + clippy clean, suites green (novelty/policy/shacl 153, api policy/reasoning/graphsource 258, transact 159).
Adds always-on RDFS entailment (subclass + subproperty) to SHACL and policy enforcement, matching what query reasoning already did: a Manager-typed record no longer slips past an Employee-targeting shape or an f:onClass ex:Employee policy. Built on the validate branch; richer entailment regimes (OWL modes) stay opt-in future work behind the same seam.

The bug this fixes
Commit-time SHACL expanded sh:targetClass over subclasses using snapshot.schema_hierarchy(), which is frozen at the last index build. Any rdfs:subClassOf committed since reindex was invisible to enforcement, so subclass-typed records skipped their shapes. Policy targeting had no entailment at all.

Mechanism: epoch counters + shared caches (zero cost on the common path)
Two counters on Novelty, bumped inside the per-flake routing loop apply_commit already runs (one integer compare per flake, same pattern as the existing f:reifies* check):
- schema_epoch: bumps only when a commit asserts/retracts rdfs:subClassOf / rdfs:subPropertyOf.
- shacl_epoch: bumps only on SHACL-affecting flakes (any sh:* predicate, or rdf:type edges to SHACL types / rdfs:Class / owl:Class).

Two Arc-shared caches on LedgerState, carried across commits:
- SchemaHierarchyCache (core): the current subclass/subproperty closure, keyed by (indexed schema t, schema_epoch); rebuilt lazily (two bound-predicate scans) only after a schema-touching commit. Consumed by transaction SHACL, the validate façade, and policy-context construction.
- ShaclCache (type-erased slot on LedgerState, following the TypeErasedStore precedent): data-only transactions reuse the previous transaction's compiled shapes via Arc<ShaclCache> and skip ShapeCompiler's ~40 predicate scans entirely. Any shape/schema change or f:shapesSource re-point recompiles; inline opts.shapes and cross-ledger wires bypass.

What entails now
- sh:targetClass fires for subclass instances (current hierarchy, not index-time); sh:targetSubjectsOf / targetObjectsOf match subjects/objects of subproperties.
- sh:path schema:name governs values asserted via any subproperty, through the simple-predicate fast paths, sequence/inverse/alternative/transitive steps, and pair-constraint target loads. (This exposed a wiring gap: the staged-validation engine was rebuilt hierarchy-less across the transact crate boundary; validate_view_with_shacl now receives the shared Arc<ShaclCache> + hierarchy, which also removes a per-transaction deep clone of the compiled shapes.)
- f:onClass policies govern subclass instances and f:onProperty policies govern subproperties; both are expanded once at policy-set build time, so evaluation is untouched.
- When f:reasoningDefaults / f:schemaSource points at a model ledger, its subclass/subproperty edges merge into both SHACL and policy enforcement. Resolution is t-cached in the existing GovernanceCache: an unchanged model head is one nameservice t-read plus an Arc clone, never a re-query.

Semantics rule (deliberate)
Enforcement uses the committed hierarchy only: schema asserted in the same transaction as data does not entail for that transaction (pinned by test). The workaround is the natural one — two transactions, schema first — and matches how shape compilation already behaves.
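The committed-only rule can be illustrated with a subclass check that reads committed edges and deliberately ignores edges staged in the current transaction. `Edge`, `is_subclass_of`, and `shape_applies` are hypothetical names in this sketch, which models only the rule, not how the engine stores or stages flakes.

```rust
use std::collections::HashSet;

/// Hypothetical (child, parent) rdfs:subClassOf edge.
pub type Edge = (&'static str, &'static str);

/// Is `class` equal to, or a transitive subclass of, `target` under `edges`?
pub fn is_subclass_of(edges: &[Edge], class: &str, target: &str) -> bool {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut stack: Vec<&str> = vec![class];
    while let Some(c) = stack.pop() {
        if c == target {
            return true;
        }
        if !seen.insert(c) {
            continue; // already expanded; guards against cycles
        }
        for &(child, parent) in edges {
            if child == c {
                stack.push(parent);
            }
        }
    }
    false
}

/// Enforcement consults committed edges only; `_staged` edges from the
/// current transaction are intentionally not part of the hierarchy.
pub fn shape_applies(committed: &[Edge], _staged: &[Edge], record_class: &str, shape_target: &str) -> bool {
    is_subclass_of(committed, record_class, shape_target)
}
```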
Tests
Each behavior is pinned end-to-end: the unindexed-subclass rejection repro, the same-transaction rule, compile-cache invalidation on new shapes, policy subclass/subproperty governance (both directions fail without the change), SHACL subproperty paths and predicate targets, and two-ledger topologies for the cross-ledger cases (shape rejection at transaction time; policy visibility on the query path). Docs updated in the SHACL cookbook and policy model.