feat: always-on RDFS entailment for SHACL and policy enforcement #1426

Merged
bplatz merged 12 commits into main from feature/rdfs-enforcement-entailment
Jul 5, 2026

@bplatz bplatz commented Jul 3, 2026

Contributor

Adds always-on RDFS entailment (subclass + subproperty) to SHACL and policy enforcement, matching what query reasoning already did — a Manager-typed record no longer slips past an Employee-targeting shape or an f:onClass ex:Employee policy. Built on the validate branch; richer entailment regimes (OWL modes) stay opt-in future work behind the same seam.

The bug this fixes

Commit-time SHACL expanded sh:targetClass over subclasses using snapshot.schema_hierarchy(), which is frozen at the last index build. Any rdfs:subClassOf committed since the last reindex was invisible to enforcement, so subclass-typed records skipped their shapes. Policy targeting had no entailment at all.

Mechanism: epoch counters + shared caches (zero cost on the common path)

Two counters on Novelty, bumped inside the per-flake routing loop apply_commit already runs (one integer compare per flake, same pattern as the existing f:reifies* check):

  • schema_epoch — bumps only when a commit asserts/retracts rdfs:subClassOf / rdfs:subPropertyOf.
  • shacl_epoch — bumps only on SHACL-affecting flakes (any sh:* predicate, or rdf:type edges to SHACL types / rdfs:Class / owl:Class).
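
A minimal sketch of the epoch-bump idea, assuming predicates and classes are interned to integer ids. Only the `schema_epoch` / `shacl_epoch` names come from the description above; the `Flake` and `Novelty` shapes, the id constants, and the "one contiguous range for sh:* terms" convention are hypothetical stand-ins, not the actual fluree-db types.

```rust
/// Hypothetical interned predicate/class ids; the values are illustrative only.
pub const RDF_TYPE: u32 = 0;
pub const RDFS_SUBCLASS_OF: u32 = 1;
pub const RDFS_SUBPROPERTY_OF: u32 = 2;
pub const RDFS_CLASS: u32 = 3;
pub const OWL_CLASS: u32 = 4;
/// Assumption for this sketch: all sh:* terms are interned in one contiguous id range.
pub const SH_MIN: u32 = 100;
pub const SH_MAX: u32 = 199;

#[derive(Clone, Copy, Debug)]
pub struct Flake {
    pub predicate: u32,
    pub object: u32,
}

#[derive(Debug, Default)]
pub struct Novelty {
    pub schema_epoch: u64,
    pub shacl_epoch: u64,
}

fn is_sh(id: u32) -> bool {
    (SH_MIN..=SH_MAX).contains(&id)
}

impl Novelty {
    /// Scan the commit's flakes once and bump each epoch only if it was touched.
    pub fn apply_commit(&mut self, flakes: &[Flake]) {
        let (mut schema_touched, mut shacl_touched) = (false, false);
        for f in flakes {
            // ... the existing per-flake routing would run here ...
            schema_touched |=
                f.predicate == RDFS_SUBCLASS_OF || f.predicate == RDFS_SUBPROPERTY_OF;
            shacl_touched |= is_sh(f.predicate)
                || (f.predicate == RDF_TYPE
                    && (is_sh(f.object) || f.object == RDFS_CLASS || f.object == OWL_CLASS));
        }
        // Assumption: bump at most once per commit; asserts and retracts are treated alike.
        if schema_touched {
            self.schema_epoch += 1;
        }
        if shacl_touched {
            self.shacl_epoch += 1;
        }
    }
}
```

In this sketch a data-only commit leaves both counters unchanged, which is what lets downstream caches stay warm.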

Two Arc-shared caches on LedgerState, carried across commits:

  • SchemaHierarchyCache (core) — the current subclass/subproperty closure, keyed by (indexed schema t, schema_epoch); rebuilt lazily (two bound-predicate scans) only after a schema-touching commit. Consumed by transaction SHACL, the validate façade, and policy-context construction.
  • Compiled-SHACL slot (type-erased on the ledger crate, following the TypeErasedStore precedent) — data-only transactions reuse the previous transaction's compiled shapes via Arc<ShaclCache> and skip ShapeCompiler's ~40 predicate scans entirely. Any shape/schema change or f:shapesSource re-point recompiles; inline opts.shapes and cross-ledger wires bypass.
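
The lazy, epoch-keyed rebuild could look roughly like the sketch below. It assumes a single-slot cache behind a `Mutex`; the `SchemaHierarchyCacheSketch` type, its `current` signature, and the `Hierarchy` alias are hypothetical and only mirror the (indexed schema t, schema_epoch) key described above.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Arc, Mutex};

/// Hypothetical closure type: class id -> all transitive subclass ids.
pub type Hierarchy = HashMap<u32, HashSet<u32>>;

/// Cache key: (indexed schema t, schema epoch).
type Key = (u64, u64);

#[derive(Default)]
pub struct SchemaHierarchyCacheSketch {
    slot: Mutex<Option<(Key, Arc<Hierarchy>)>>,
}

impl SchemaHierarchyCacheSketch {
    /// Return the cached hierarchy if the key still matches; otherwise run
    /// `rebuild` (standing in for the two bound-predicate scans) and store it.
    pub fn current<F>(&self, indexed_t: u64, schema_epoch: u64, rebuild: F) -> Arc<Hierarchy>
    where
        F: FnOnce() -> Hierarchy,
    {
        let key = (indexed_t, schema_epoch);
        let mut slot = self.slot.lock().unwrap();
        if let Some((cached_key, hierarchy)) = slot.as_ref() {
            if *cached_key == key {
                // Warm path: no scans, just an Arc clone.
                return Arc::clone(hierarchy);
            }
        }
        let fresh = Arc::new(rebuild());
        *slot = Some((key, Arc::clone(&fresh)));
        fresh
    }
}
```

Because the rebuild closure only runs on a key mismatch, a run of data-only commits pays nothing beyond the key comparison.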

What entails now

  • SHACL targets: sh:targetClass fires for subclass instances (current hierarchy, not index-time); sh:targetSubjectsOf / targetObjectsOf match subjects/objects of subproperties.
  • SHACL paths: a constraint on sh:path schema:name governs values asserted via any subproperty — through the simple-predicate fast paths, sequence/inverse/alternative/transitive steps, and pair-constraint target loads. (This exposed a wiring gap: the staged-validation engine was rebuilt hierarchy-less across the transact crate boundary; validate_view_with_shacl now receives the shared Arc<ShaclCache> + hierarchy, which also removes a per-transaction deep clone of the compiled shapes.)
  • Policy: f:onClass policies govern subclass instances, f:onProperty policies govern subproperties — expanded once at policy-set build time, so evaluation is untouched.
  • Cross-ledger ontologies: when f:reasoningDefaults / f:schemaSource points at a model ledger, its subclass/subproperty edges merge into both SHACL and policy enforcement. Resolution is t-cached in the existing GovernanceCache — an unchanged model head is one nameservice t-read plus an Arc clone, never a re-query.
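
As an illustration of "expanded once at policy-set build time", the sketch below computes a transitive subclass closure from direct rdfs:subClassOf edges and uses it to widen a policy's class targets. The function names and the string-based representation are assumptions made for the example; the real code works on interned ids and its own closure type.

```rust
use std::collections::{HashMap, HashSet};

/// Given direct `(child, parent)` rdfs:subClassOf edges, map each parent
/// class to all of its transitive subclasses.
pub fn descendants<'a>(edges: &[(&'a str, &'a str)]) -> HashMap<String, HashSet<String>> {
    // parent -> direct children
    let mut children: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(child, parent) in edges {
        children.entry(parent).or_default().push(child);
    }
    let mut out = HashMap::new();
    for &parent in children.keys() {
        let mut seen: HashSet<String> = HashSet::new();
        let mut stack: Vec<&str> = vec![parent];
        while let Some(class) = stack.pop() {
            if let Some(kids) = children.get(class) {
                for &kid in kids {
                    // `seen` doubles as the visited set, so cycles terminate.
                    if kid != parent && seen.insert(kid.to_string()) {
                        stack.push(kid);
                    }
                }
            }
        }
        out.insert(parent.to_string(), seen);
    }
    out
}

/// Widen a policy's class targets with every subclass, once, at build time.
pub fn expand_targets(
    targets: &[&str],
    closure: &HashMap<String, HashSet<String>>,
) -> HashSet<String> {
    let mut out: HashSet<String> = targets.iter().map(|t| t.to_string()).collect();
    for t in targets {
        if let Some(subs) = closure.get(*t) {
            out.extend(subs.iter().cloned());
        }
    }
    out
}
```

With the targets pre-expanded, evaluation can keep doing a plain set-membership check per flake, which is why the description says evaluation is untouched.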

Semantics rule (deliberate)

Enforcement uses the committed hierarchy only: schema asserted in the same transaction as data does not entail for that transaction (pinned by test). The workaround is the natural one — two transactions, schema first — and matches how shape compilation already behaves.
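
A toy model of the committed-hierarchy rule, under heavy simplification: one shape, direct subclass edges only, and a boolean standing in for "this record satisfies the shape". `LedgerSketch`, `Tx`, and `transact` are hypothetical names for the illustration, not the fluree-db API.

```rust
use std::collections::HashSet;

#[derive(Default)]
pub struct LedgerSketch {
    /// Committed `(child, parent)` rdfs:subClassOf edges.
    committed_subclass: HashSet<(String, String)>,
}

pub struct Tx<'a> {
    /// Schema edges asserted by this transaction.
    pub new_subclass_edges: Vec<(&'a str, &'a str)>,
    /// (rdf:type of the record, whether it satisfies the shape's constraint)
    pub records: Vec<(&'a str, bool)>,
}

impl LedgerSketch {
    /// Direct-edge check only, to keep the sketch short (no transitive closure).
    fn is_a(&self, class: &str, target: &str) -> bool {
        class == target
            || self
                .committed_subclass
                .contains(&(class.to_string(), target.to_string()))
    }

    pub fn transact(&mut self, tx: &Tx, shape_target: &str) -> Result<(), String> {
        // Validate against the hierarchy as committed BEFORE this transaction.
        for &(class, satisfies_shape) in &tx.records {
            if self.is_a(class, shape_target) && !satisfies_shape {
                return Err(format!("{class} record violates shape on {shape_target}"));
            }
        }
        // Only now do this transaction's schema edges become committed state.
        for &(child, parent) in &tx.new_subclass_edges {
            self.committed_subclass
                .insert((child.to_string(), parent.to_string()));
        }
        Ok(())
    }
}
```

In this model, a transaction that asserts `ex:Manager rdfs:subClassOf ex:Employee` together with a non-conforming Manager record is accepted, while the same record in a later transaction is rejected, which is the two-transaction workflow described above.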

Tests

Each behavior is pinned end-to-end: the unindexed-subclass rejection repro, the same-transaction rule, compile-cache invalidation on new shapes, policy subclass/subproperty governance (both directions fail without the change), SHACL subproperty paths and predicate targets, and two-ledger topologies for the cross-ledger cases (shape rejection at transaction time; policy visibility on the query path). Docs updated in the SHACL cookbook and policy model.

bplatz added 8 commits July 2, 2026 21:19
…ch cache)

SHACL subclass targeting used snapshot.schema_hierarchy() — frozen at
the last index build — so a shape targeting Employee silently skipped
Manager-typed records when 'Manager rdfs:subClassOf Employee' was
committed but not yet indexed.

Mechanism (zero cost on the common path):
- Novelty::schema_epoch bumps only when a commit asserts or retracts
  rdfs:subClassOf / rdfs:subPropertyOf (one integer + bounded name
  compare per flake, inside the existing routing loop).
- SchemaHierarchyCache (core) holds the current hierarchy keyed by
  (indexed schema t, novelty schema epoch); Arc-shared on LedgerState
  and carried across commits, so non-schema commits keep it warm and
  rebuilds (two bound-predicate scans + closure) happen only after a
  schema-touching commit, lazily on the next consumer.
- Transaction-time SHACL and the validate facade construct the engine
  with the cached hierarchy (ShaclEngine::from_dbs_with_hierarchy).
- compute_schema_hierarchy_with_overlay moves to fluree-db-core so
  enforcement layers below the query crate can share it.

Per the enforcement rule, the hierarchy is committed-state only:
schema asserted in the same transaction as the data does not entail
for that transaction (two-transaction workflow; pinned by test).

Every transaction recompiled all shapes from scratch (~40 bound-
predicate scans in ShapeCompiler). Same epoch pattern as the RDFS
hierarchy cache:

- Novelty::shacl_epoch bumps only when a commit carries a SHACL-
  affecting flake: any sh:* predicate, or an rdf:type edge to a SHACL
  type / rdfs:Class / owl:Class (shape registration and implicit class
  targets both change compile output). One integer compare per flake
  in the existing routing loop.
- ShaclEngine holds Arc<ShaclCache>; a type-erased slot on LedgerState
  (carried across commits, mirrored on LedgerView) stores the entry
  keyed by (indexed snapshot t, shacl epoch, schema epoch, shape-
  source graph ids). Data-only transactions reuse the previous
  compile via ShaclEngine::from_shared_cache; any shape/schema change
  or shapesSource re-point recompiles. Inline opts.shapes and cross-
  ledger shape wires bypass the cache (per-transaction by nature).

The ledger crate stays SHACL-agnostic (Any-erased slot, downcast in
the API enforcement layer, following the TypeErasedStore precedent).

Class policies (f:onClass) now govern instances of subclasses, and
property policies (f:onProperty) govern subproperties. Expansion
happens once at policy-set build time — for_classes gains the subclass
closure and property targets gain the subproperty closure — so the
class→property union, class_check_needed computation, and all four
evaluation-time class checks see the expanded sets with no per-flake
cost.

The hierarchy is the current (novelty-aware) one, computed in
build_policy_context alongside the existing stats assembly; the
cross-ledger policy wire path keeps its pre-entailment behavior (no
hierarchy handle there yet).

Always-on entailment for enforcement: a property step over p also
traverses every q rdfs:subPropertyOf p. Applied everywhere SHACL reads
data by predicate: path evaluation (forward/inverse steps and the
simple-predicate fast paths in both the top-level and nested property
loops), pair-constraint target-property loads, and predicate targeting
(sh:targetSubjectsOf / sh:targetObjectsOf in focus discovery, the
per-node applicability probes, and the literal-object collection).
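
The "property step over p also traverses every q rdfs:subPropertyOf p" rule can be sketched as below. It assumes the subproperty closure is already computed and stored as a map from a property to its transitive subproperties; `Triple`, `predicates_for`, and `values_of` are illustrative names, not functions from the SHACL engine.

```rust
use std::collections::{HashMap, HashSet};

pub struct Triple<'a> {
    pub s: &'a str,
    pub p: &'a str,
    pub o: &'a str,
}

/// `p` itself plus every subproperty of `p` (closure assumed precomputed).
pub fn predicates_for<'a>(
    p: &'a str,
    sub_props: &HashMap<&'a str, HashSet<&'a str>>,
) -> Vec<&'a str> {
    let mut preds = vec![p];
    if let Some(subs) = sub_props.get(p) {
        preds.extend(subs.iter().copied());
    }
    preds
}

/// Forward property step: values of `subject` reachable via `p` or any subproperty.
pub fn values_of<'a>(
    data: &[Triple<'a>],
    subject: &str,
    p: &'a str,
    sub_props: &HashMap<&'a str, HashSet<&'a str>>,
) -> Vec<&'a str> {
    let preds = predicates_for(p, sub_props);
    data.iter()
        .filter(|t| t.s == subject && preds.contains(&t.p))
        .map(|t| t.o)
        .collect()
}
```

A constraint on `schema:name` would then also see a value asserted via a hypothetical `ex:legalName rdfs:subPropertyOf schema:name`.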

The hierarchy threads through ClassMembershipCtx (already flowing
through the validation call tree) and — the wiring gap this exposed —
through validate_view_with_shacl into the staged-validation engine:
the tx path handed only the ShaclCache across the crate boundary and
stage.rs rebuilt a hierarchy-less engine, so engine-level entailment
silently vanished at transaction time. The staged path now receives
the shared Arc<ShaclCache> plus the hierarchy (also dropping a per-
transaction deep clone of the compiled shapes).

W3C SHACL core unchanged at 82.7% (no suite test exercises
subPropertyOf; ours pin the behavior).

Closes the last cell of the entailment matrix: when the class/property
hierarchy lives in a model ledger (config f:reasoningDefaults /
f:schemaSource with f:ledger), SHACL and policy enforcement now merge
its subclass/subproperty edges — matching what query reasoning already
did.

A shared cross_ledger::resolve_schema_closure_bundle resolves the
SchemaClosure wire (t-cached in GovernanceCache: an unchanged model
head is an Arc clone, not a re-query) and translates it against the
data ledger's snapshot. Consumers:

- Transaction SHACL: the bundle threads through StagedShaclContext;
  the enforcement hierarchy computes over it composed on novelty
  (bypassing the local hierarchy cache and the compiled-shape reuse,
  whose keys have no model-t dimension).
- Policy: build_policy_context composes the bundle into its hierarchy
  on the query path (both branches), the transact policy path, and
  the cross-ledger policy path; a from_opts_with_schema wrapper keeps
  the seven existing plain callers unchanged (local-only hierarchy).

follow-owl-imports with a cross-ledger schema source fails closed,
mirroring the query path. Tests: Manager-typed records governed by an
Employee shape (tx rejection) and an f:onClass Employee policy (query
visibility) purely via M's rdfs:subClassOf edge.

@bplatz bplatz requested review from aaj3f and zonotope July 3, 2026 10:54

…-ledger schemaSource

The enforcement-entailment resolver hard-failed on f:followOwlImports +
cross-ledger f:schemaSource, mirroring the query path's fail-closed
guard. But this resolver runs inside every transaction's enforcement
setup, so once such a config committed, every subsequent write on the
ledger was rejected -- including the config repair itself -- and the
pre-existing it_schema_cross_ledger fails-closed tests (which pin that
inserts succeed and the rejection surfaces at reasoning-query time)
broke.

Skip the cross-ledger bundle instead (warn + local-only hierarchy):
enforcement falls back to exactly the pre-feature behavior for this
unsupported combination, while the loud rejection stays on the
reasoning-query path (resolve_configured_schema_bundle), where an
incomplete import closure would actually change entailment results.
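
The split between the two paths might be sketched as follows: the enforcement resolver degrades to a local-only hierarchy with a warning, while the reasoning-query resolver keeps rejecting. All types and function names here are hypothetical; only the behavior (skip on the write path, fail closed on the query path) is taken from the commit message.

```rust
#[derive(Debug, PartialEq)]
pub enum SchemaSource {
    Local,
    CrossLedger(String),
}

pub struct ReasoningConfig {
    pub follow_owl_imports: bool,
    pub schema_source: SchemaSource,
}

#[derive(Debug, PartialEq)]
pub enum EnforcementHierarchy {
    LocalOnly,
    WithModel(String),
}

/// Enforcement path: never fails on the unsupported combination, so writes
/// (including the config repair itself) keep working.
pub fn resolve_for_enforcement(cfg: &ReasoningConfig) -> EnforcementHierarchy {
    match &cfg.schema_source {
        SchemaSource::CrossLedger(_) if cfg.follow_owl_imports => {
            eprintln!("warn: followOwlImports with cross-ledger schemaSource is unsupported; using local-only hierarchy");
            EnforcementHierarchy::LocalOnly
        }
        SchemaSource::CrossLedger(model) => EnforcementHierarchy::WithModel(model.clone()),
        SchemaSource::Local => EnforcementHierarchy::LocalOnly,
    }
}

/// Reasoning-query path: stays fail-closed for the same combination.
pub fn resolve_for_query(cfg: &ReasoningConfig) -> Result<EnforcementHierarchy, String> {
    match &cfg.schema_source {
        SchemaSource::CrossLedger(_) if cfg.follow_owl_imports => {
            Err("followOwlImports with cross-ledger schemaSource is unsupported".to_string())
        }
        _ => Ok(resolve_for_enforcement(cfg)),
    }
}
```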

@aaj3f aaj3f left a comment

Contributor

This is nice, and it's good to get SHACL working w/ ontology characteristics 👍

Comment thread fluree-db-api/src/policy_builder.rs Outdated
Comment on lines +370 to +388
// Current RDFS hierarchy (always-on entailment for enforcement): class
// policies govern subclass instances, property policies govern
// subproperties. Two bound-predicate scans + closure — small next to the
// stats assembly and policy parsing above.
let hierarchy = match &cross_ledger_schema {
    // Cross-ledger ontology: compose the model ledger's schema bundle
    // over the local overlay so its subclass/subproperty edges merge in.
    Some(bundle) => {
        let composed = fluree_db_query::schema_bundle::SchemaBundleOverlay::new(
            overlay,
            std::sync::Arc::clone(bundle),
        );
        fluree_db_core::compute_schema_hierarchy_with_overlay(snapshot, &composed, to_t).await
    }
    None => {
        fluree_db_core::compute_schema_hierarchy_with_overlay(snapshot, overlay, to_t).await
    }
}
.map_err(|e| ApiError::internal(format!("policy hierarchy computation failed: {e}")))?;

Contributor

I originally thought the perf hit described below scaled w/ instance data size, but it's actually bounded by schema size, so it's more minor than I first thought. I'll include the note anyway, corrected for schema bounds rather than instance bounds:

Every policy-enforced query now recomputes the current schema hierarchy here, unconditionally and without the epoch cache. compute_schema_hierarchy_with_overlay clones snapshot.schema (IndexSchema.pred.vals — one entry per predicate/class), builds a HashMap over all of it, and sorts — O(ontology size) allocations plus two bound-predicate range scans, on the per-query policy path. The tx path routes through the epoch-invalidated SchemaHierarchyCache::current(...) (tx.rs:671, validate.rs:434), and that cache is warm across commits (finalize_state_with_base, commit.rs:910); this read path does not use it, because build_policy_context_from_opts_inner operates on &LedgerSnapshot/&dyn OverlayProvider and never receives the LedgerState cache handle. Pre-PR, build_policy_set did zero hierarchy work.

Why minor, not major (verification): the cost is bounded by ontology size, not instance-data size, and this same function already runs a novelty-aware assemble_full_stats per query (policy_builder.rs:356-365). The whole PolicyContext is rebuilt per query and uncached, so the hierarchy is one more step of comparable magnitude, not a newly expensive path. The contrast between a cached tx path and an uncached read path is also imperfect: the cross-ledger tx branch (tx.rs:662-668) recomputes uncached too. Real and worth fixing, but not merge-blocking. There are two independent mitigations; the first is the high-value one:

  1. Skip the computation when no restriction can use it (identity-only / default-allow / OnSubject policies pay nothing):

    let needs_hierarchy = restrictions.iter().any(|r| {
        !r.for_classes.is_empty() || r.target_mode == TargetMode::OnProperty
    });
    let hierarchy = if !needs_hierarchy {
        None
    } else {
        // Wrap in Some so both branches have the same Option type.
        Some(match &cross_ledger_schema { /* existing match */ })
    };

  2. For the class/property case, thread the ledger's Arc<SchemaHierarchyCache> into this builder (add it to GraphDb and pass it down like tx.rs does) so repeated policy queries reuse one entry instead of rebuilding per call. As written, a large-ontology ledger pays a full schema clone + sort on every governed read.

Contributor Author

Addressed mitigation #1 in 64699d8: the hierarchy computation is now guarded on a needs_hierarchy check (any restriction with for_classes non-empty or target_mode == OnProperty), so identity-only / default-allow / OnSubject policies skip it entirely. Left mitigation #2 (threading the ledger's SchemaHierarchyCache into this read path) as a follow-up — it's the larger plumbing change and only matters for large-ontology ledgers under governed read load.

Comment thread fluree-db-api/src/tx.rs Outdated
Comment on lines +840 to +844
// Same boundary for the cross-ledger ontology: when
// f:reasoningDefaults/f:schemaSource points at M, resolve the schema
// wire (t-cached) so the enforcement hierarchy can merge M's
// subclass/subproperty edges.
let cross_ledger_schema = resolve_cross_ledger_schema_for_tx(&ledger, resolve_ctx).await?;

Contributor

resolve_cross_ledger_schema_for_tx re-runs resolve_ledger_config (config-graph type scan + parse) even though resolve_cross_ledger_shapes_for_tx (L839) loaded the identical config one line earlier. On any config-bearing ledger this doubles config resolution per transaction. The empty-config guard makes it free on plain ledgers, but for configured ledgers it is redundant work on the write path. Suggest resolving the config once in stage_with_config_shacl and passing &config (or Option<&LedgerConfig>) into both resolvers.

// resolve config once, then:
let cross_ledger_shapes = resolve_cross_ledger_shapes_for_tx(&ledger, &config, resolve_ctx).await?;
let cross_ledger_schema = resolve_cross_ledger_schema_for_tx(&ledger, &config, resolve_ctx).await?;

Contributor Author

Addressed in 17ae7eb

bplatz added 2 commits July 5, 2026 09:13
…s it

build_policy_context_from_opts_inner computed the schema hierarchy on every
governed query, uncached — an O(ontology-size) schema clone + sort plus two
bound-predicate scans. Only OnClass (subclass-instance) and OnProperty
(subproperty) restrictions consult it; identity-only, default-allow, and
OnSubject policies don't. Guard the computation on a needs_hierarchy check so
those paths pay nothing, and a class/property-governed ledger only pays when a
restriction can actually use the result.

Behavior unchanged: subclass/subproperty entailment still fires (covered by
policy_onclass_governs_subclass_instances / _via_cross_ledger_schema and
policy_onproperty_governs_subproperties).

resolve_cross_ledger_shapes_for_tx and resolve_cross_ledger_schema_for_tx each
called resolve_ledger_config independently, so a config-bearing ledger ran the
config-graph type scan + parse twice per transaction. Resolve it once in
stage_with_config_shacl and pass Option<&LedgerConfig> into both. The shapes
resolver no longer needs the LedgerState handle at all; the schema resolver
keeps it for resolve_schema_closure_bundle. Plain (config-less) ledgers are
unaffected — a None config short-circuits both to no dispatch.

Base automatically changed from feature/shacl-validate-cli to main July 5, 2026 14:58

Four conflicts, all where main's config-resolution-once work overlaps this
branch's RDFS enforcement:

- novelty/lib.rs: union the Novelty init — this branch's schema_epoch /
  shacl_epoch plus main's config_write_t.
- policy_view.rs: keep this branch's build_policy_context_from_opts_with_schema
  (threads cross_ledger_schema); drop the redundant policy_graphs resolve
  (already resolved before the no-inputs shortcut).
- tx.rs: keep this branch's fail-loud resolve_ledger_config + cross-ledger
  shapes AND schema resolvers; bridge to main's apply_shacl config-dedup by
  deriving tx_config = config.clone().map(Arc::new) for the per-graph SHACL
  pass.
- .fluree-memory/repo.ttl: union both branches' memory records.

Verified: workspace builds --all-features --all-targets, fmt + clippy clean,
suites green (novelty/policy/shacl 153, api policy/reasoning/graphsource 258,
transact 159).
@bplatz bplatz merged commit 109d146 into main Jul 5, 2026
13 checks passed
@bplatz bplatz deleted the feature/rdfs-enforcement-entailment branch July 5, 2026 15:11
2 participants