1 change: 1 addition & 0 deletions Cargo.lock


25 changes: 22 additions & 3 deletions docs/contributing/shacl-implementation.md
@@ -167,14 +167,31 @@ Cost is bounded by the number of predicate-targeted shapes in the cache, not by
- For each subject: fetch `rdf:type` flakes, then call `engine.validate_node(db, subject, &types)`.
- Tag each returned `ValidationResult` with `graph_id = Some(g_id)` so the caller can partition reject vs warn.

### `sh:class` value membership and value-sets across graphs

`sh:class` is validated by `validate_class_constraint` in `fluree-db-shacl/src/validate.rs`. There are **two distinct graph contexts** at play:

- **Compilation graph** — where the shapes themselves are read from (`f:shapesSource`; see [Shape compilation from multiple graphs](#shape-compilation-from-multiple-graphs)).
- **Membership graph(s)** — where a value's `rdf:type` (and any `rdfs:subClassOf` hierarchy) is looked up to decide `sh:class` conformance.

By default, membership is resolved against the **focus node's own data graph** (plus `g_id=0` for the `subClassOf` walk). That breaks the "shared value-set" pattern — e.g. a controlled list of US states (`ex:illinois a ex:USState`) referenced by records living in a different graph. To support it, `validate_view_with_shacl` receives the `f:shapesSource` graph ids as `membership_g_ids` and the engine unions them into the lookup: a value's `rdf:type` is read across `{focus data graph} ∪ {membership graphs}`, so the value-set vocabulary can live alongside the shapes. When `f:shapesSource` is unset (shapes in `g_id=0`), this degenerates to the historical behaviour.
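
The union described above can be sketched as follows. This is a minimal illustration, not the engine's code: `TypeStore`, `assert_type`, and `value_is_member` are hypothetical stand-ins, and the sketch checks direct `rdf:type` assertions only (no `subClassOf` walk).

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the engine's graph id type (illustration only).
type GraphId = u16;

/// Toy store of `rdf:type` assertions: (graph, subject IRI, class IRI).
pub struct TypeStore {
    types: HashSet<(GraphId, String, String)>,
}

impl TypeStore {
    pub fn new() -> Self {
        Self { types: HashSet::new() }
    }

    /// Record `subject a class` in graph `g`.
    pub fn assert_type(&mut self, g: GraphId, subject: &str, class: &str) {
        self.types.insert((g, subject.to_string(), class.to_string()));
    }

    fn has_type(&self, g: GraphId, subject: &str, class: &str) -> bool {
        self.types
            .contains(&(g, subject.to_string(), class.to_string()))
    }
}

/// True if `value` is typed as `class` in the focus node's data graph or in
/// any of the membership (vocabulary) graphs.
pub fn value_is_member(
    store: &TypeStore,
    focus_g: GraphId,
    membership_g_ids: &[GraphId],
    value: &str,
    class: &str,
) -> bool {
    std::iter::once(focus_g)
        .chain(membership_g_ids.iter().copied())
        .any(|g| store.has_type(g, value, class))
}
```

With the vocabulary asserted in graph 7 and a focus node in graph 3, the lookup succeeds only when 7 is passed as a membership graph.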

A **per-transaction memo** on the `ShaclEngine` (`class_cache`, keyed `(value, class, focus g_id)`) collapses repeated checks — inserting 100 records that all reference `ex:illinois` performs a single membership lookup. The engine is built fresh per transaction (`validate_view_with_shacl`) and shared across all focus nodes, so the memo is scoped to exactly one validation pass. Cache hits skip the range scan **and** its fuel charge, so per-transaction fuel depends on intra-transaction value repetition.
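
A rough sketch of how such a memo behaves, assuming string IRIs in place of Sids. `ClassMemo`, `conforms`, and the `lookups` counter are hypothetical names for illustration; the real `class_cache` lives on `ShaclEngine` and its exact shape is not shown in this diff.

```rust
use std::collections::HashMap;

/// Hypothetical memo key: (value IRI, class IRI, focus graph id).
type MemoKey = (String, String, u16);

/// Toy per-transaction memo wrapping an expensive membership lookup.
pub struct ClassMemo<F: FnMut(&str, &str, u16) -> bool> {
    cache: HashMap<MemoKey, bool>,
    lookup: F,
    /// Number of times the underlying lookup actually ran.
    pub lookups: usize,
}

impl<F: FnMut(&str, &str, u16) -> bool> ClassMemo<F> {
    pub fn new(lookup: F) -> Self {
        Self { cache: HashMap::new(), lookup, lookups: 0 }
    }

    /// Return the cached verdict if present; otherwise run the lookup once
    /// and remember the result (positive or negative).
    pub fn conforms(&mut self, value: &str, class: &str, focus_g: u16) -> bool {
        let key = (value.to_string(), class.to_string(), focus_g);
        if let Some(&verdict) = self.cache.get(&key) {
            // Cache hit: the underlying scan (and its cost) is skipped.
            return verdict;
        }
        self.lookups += 1;
        let verdict = (self.lookup)(value, class, focus_g);
        self.cache.insert(key, verdict);
        verdict
    }
}
```

Because a fresh memo is created for each validation pass, nothing is carried over between transactions in this sketch.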

**Cross-ledger value-sets.** When `f:shapesSource` is cross-ledger (`f:ledger`), the controlled vocabulary lives in the *model ledger* M alongside the shapes. Because M's ABox (`ex:illinois a ex:USState`) is *not* carried in the shapes wire (which projects only SHACL predicates + `rdf:list` internals), membership is resolved by **querying M live**: `stage_with_config_shacl` opens M at the resolved `t` (`load_graph_db_at_t`) and threads a `CrossLedgerMembership { model_db, data_ns_map }` down to `validate_class_constraint`. On a local miss, `value_conforms_cross_ledger` decodes the value/class Sids to IRIs via D's staged namespace map (`data_ns_map` — the base snapshot alone can't decode namespaces staged this txn), re-encodes them against M (whose split mode may differ), then does the `rdf:type` + `subClassOf` lookup in M's term space. Well-known predicates (`rdf:type`, `rdfs:subClassOf`) share global namespace codes, so only the user IRIs are translated. The per-txn memo covers cross-ledger verdicts too. M is pinned at the resolved `t` (latest at tx time), consistent with cross-ledger shapes.
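
The decode and re-encode step can be illustrated with a simplified model. Everything here is an assumption made for the sketch: `Sid` is reduced to a namespace code plus local name, and `encode_against_model` picks the longest matching prefix, which may not be how M's real encoder splits IRIs.

```rust
use std::collections::HashMap;

/// Hypothetical simplified Sid: a namespace code plus a local name.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Sid {
    pub ns_code: u16,
    pub name: String,
}

/// Decode a data-ledger Sid to a full IRI using D's staged namespace map.
pub fn decode_iri(sid: &Sid, data_ns_map: &HashMap<u16, String>) -> Option<String> {
    data_ns_map
        .get(&sid.ns_code)
        .map(|prefix| format!("{}{}", prefix, sid.name))
}

/// Re-encode an IRI against the model ledger's namespace table.
/// Assumption: the longest registered prefix wins.
pub fn encode_against_model(iri: &str, model_ns: &HashMap<u16, String>) -> Option<Sid> {
    model_ns
        .iter()
        .filter(|(_, prefix)| iri.starts_with(prefix.as_str()))
        .max_by_key(|(_, prefix)| prefix.len())
        .map(|(&code, prefix)| Sid {
            ns_code: code,
            name: iri[prefix.len()..].to_string(),
        })
}

/// Translate a D-term Sid into M's term space; `None` if either side
/// does not know the namespace.
pub fn translate(
    sid: &Sid,
    data_ns_map: &HashMap<u16, String>,
    model_ns: &HashMap<u16, String>,
) -> Option<Sid> {
    let iri = decode_iri(sid, data_ns_map)?;
    encode_against_model(&iri, model_ns)
}
```

The same IRI can end up with a different code and a different local name in M, which is why Sids from D cannot be compared against M's flakes directly.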

Scope limits (as of this writing):
- **Top-level property shapes only.** `sh:class` reached via a referenced/nested shape (`sh:and`/`or`/`xone`/`node` referencing a shape by id) passes `None` for the context and keeps the legacy data-graph lookup (no vocabulary union, no cross-ledger, no memo).
- **`f:atT` / trust dimensions** on the source are still rejected globally, so a cross-ledger value-set tracks M's latest committed state.

### RDFS subclass fallback (`is_subclass_of`)

When the indexed `SchemaHierarchy` doesn't know about a `rdfs:subClassOf` edge (e.g. asserted in the same or a recent unindexed transaction), `validate_class_constraint` calls `is_subclass_of(db, start, target)` which walks `rdfs:subClassOf` upward via BFS.
When the indexed `SchemaHierarchy` doesn't know about a `rdfs:subClassOf` edge (e.g. asserted in the same or a recent unindexed transaction), `validate_class_constraint` (via `value_conforms_to_class`) calls `is_subclass_of(db, membership_g_ids, start, target)` which walks `rdfs:subClassOf` upward via BFS.

Two invariants in that walk:

- **Always scope to `g_id=0`** via `rescope_to_schema_graph(db)` — schema lives in the default graph, matching how `SchemaHierarchy::from_db_root_schema` is built. Subject may be in graph G but the `subClassOf` edge must be looked up in the schema graph.
- **Preserve tracker + other `GraphDbRef` fields** — `rescope_to_schema_graph` uses `db` copy + `g_id = 0` mutation rather than `GraphDbRef::new(..)`, which would reset `tracker`, `runtime_small_dicts`, and `eager`. There's a unit test pinning this (`rescope_to_schema_graph_preserves_tracker_and_other_fields`).
- **Scope to `g_id=0` unioned with the membership graphs** via `rescope_to_graph(db, g)` — schema lives in the default graph (matching how `SchemaHierarchy::from_db_root_schema` is built), while a value-set vocabulary configured via `f:shapesSource` may declare a small class hierarchy in its own graph that must also be honoured. Subject may be in graph G but the `subClassOf` edge is looked up in the schema/vocabulary graphs.
- **Preserve tracker + other `GraphDbRef` fields** — `rescope_to_graph` uses `db` copy + `g_id` mutation rather than `GraphDbRef::new(..)`, which would reset `tracker`, `runtime_small_dicts`, and `eager`. There's a unit test pinning this (`rescope_to_graph_preserves_tracker_and_other_fields`).
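
A minimal sketch of the upward BFS over `g_id=0` plus the membership graphs, assuming an in-memory edge map instead of range scans. `SubclassEdges` is a hypothetical structure, and treating `start == target` as a match is an assumption of this sketch rather than a statement about the real `is_subclass_of`.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical stand-in for the engine's graph id type.
type GraphId = u16;

/// Toy edge store: per graph, class IRI -> direct superclass IRIs.
pub type SubclassEdges = HashMap<GraphId, HashMap<String, Vec<String>>>;

/// Walk `rdfs:subClassOf` upward from `start`, consulting the default graph
/// (0) and every membership graph, until `target` is found or the frontier
/// is exhausted.
pub fn is_subclass_of(
    edges: &SubclassEdges,
    membership_g_ids: &[GraphId],
    start: &str,
    target: &str,
) -> bool {
    // Graphs to consult: schema graph 0 plus membership graphs, deduplicated.
    let mut graphs: Vec<GraphId> = vec![0];
    for &g in membership_g_ids {
        if !graphs.contains(&g) {
            graphs.push(g);
        }
    }

    let mut seen: HashSet<&str> = HashSet::new();
    let mut queue: VecDeque<&str> = VecDeque::new();
    seen.insert(start);
    queue.push_back(start);

    while let Some(class) = queue.pop_front() {
        if class == target {
            return true;
        }
        for g in &graphs {
            let supers = edges.get(g).and_then(|per_graph| per_graph.get(class));
            for parent in supers.into_iter().flatten() {
                // `seen` guards against cycles in the hierarchy.
                if seen.insert(parent.as_str()) {
                    queue.push_back(parent.as_str());
                }
            }
        }
    }
    false
}
```

In this sketch a chain that starts in a vocabulary graph and continues in the default graph is only followed when the vocabulary graph is listed in `membership_g_ids`.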

## Adding a new constraint

@@ -250,6 +267,7 @@ fluree.upsert(ledger, &valid_data).await.expect("must pass");
See `fluree-db-api/tests/it_config_graph.rs` for patterns that write config via TriG into the config graph, then stage transactions across multiple graphs. Examples:

- `shacl_shapes_source_points_to_named_graph` — `f:shapesSource` wiring.
- `shacl_class_value_set_in_shapes_graph` — `sh:class` value-set defined in the `f:shapesSource` graph, referenced by data in another graph (cross-graph membership union + per-txn memo).
- `shacl_per_graph_disable_honored` — per-graph `shaclEnabled: false`.
- `shacl_per_graph_mode_warn_vs_reject` — mixed modes across graphs.
- `shacl_target_subjects_of_fires_on_base_state_edge` — base-state predicate-target discovery.
@@ -270,6 +288,7 @@ This is how we guard against tests that pass trivially but don't actually exerci
- **`sh:uniqueLang`, `sh:languageIn`** — parsed but not evaluated. Needs language-tag metadata on flakes, which isn't yet threaded through the validation path.
- **`sh:qualifiedValueShape` (+ `sh:qualifiedMinCount` / `sh:qualifiedMaxCount`)** — parsed but not evaluated. Needs recursive nested-shape counting.
- **Cross-transaction shape cache** — every call to `from_dbs_with_overlay` recompiles from scratch. `ShaclCacheKey` has a `schema_epoch` field that's ready to drive a shared `Arc<ShaclCache>` cache on the connection, but nothing populates it yet. Low priority until perf regressions are observed.
- **`sh:class` in referenced/nested shapes** — value-membership context (vocabulary graphs, cross-ledger model, per-txn memo) isn't threaded through the `sh:and`/`or`/`xone`/`node` referenced-shape path; those keep the legacy data-graph lookup.

## Where to look in the code

29 changes: 29 additions & 0 deletions docs/guides/cookbook-shacl.md
@@ -341,6 +341,35 @@ Semantics:
- Use `f:graphSelector f:defaultGraph` to explicitly point at the default graph (same as omitting `f:shapesSource`).
- `f:shapesSource` also supports **cross-ledger references** — set `f:ledger` on the inner `f:graphSource` to compile shapes from a different ledger at validation time. See [Cross-ledger governance — Cross-ledger SHACL shapes](../security/cross-ledger-policy.md#cross-ledger-shacl-shapes) for the end-to-end pattern.

## Shared value-sets with `sh:class`

`sh:class` is the natural way to model a **controlled value-set** — e.g. a fixed list of US states — as an *extensible* enumeration: each allowed value is an instance of a class, and adding a new value means inserting one triple rather than editing the shape (contrast [`sh:in`](#enumerated-values), which bakes the list into the shape). Put the value-set vocabulary **in the same graph as the shapes** (`f:shapesSource`), and it is honoured even when the referencing records live in a different graph:

```trig
@prefix f: <https://ns.flur.ee/db#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex: <http://example.org/> .

GRAPH <http://example.org/shapes> {
ex:PersonShape a sh:NodeShape ;
sh:targetClass ex:Person ;
sh:property [ sh:path ex:homeState ; sh:class ex:USState ] .

# The value-set vocabulary lives alongside the shapes.
ex:illinois a ex:USState .
ex:iowa a ex:USState .
}
```

With `f:shapesSource` pointing at `<http://example.org/shapes>`, a record written to any graph passes validation when its `ex:homeState` is one of the declared states, and is rejected otherwise. Adding a state later (`ex:ohio a ex:USState`) requires no shape change.

Semantics and limits:

- **Membership graph = focus data graph ∪ the `f:shapesSource` graph.** A value is an instance of the class if it is typed so in either the record's own graph or the vocabulary graph. When shapes live in the default graph, this is just an ordinary local lookup.
- **Per-transaction caching.** Repeated references to the same value within one transaction are memoized — bulk-inserting many records that share a state pays the membership lookup once.
- **Cross-ledger too.** When `f:shapesSource` is cross-ledger (`f:ledger`), the controlled vocabulary lives in the model ledger alongside the shapes, and membership is resolved by querying that model ledger live (pinned at its latest committed state at transaction time). So a shared governance model can hold shapes *and* the value-sets they reference, and many data ledgers can point at it.

## Inline shapes per transaction

In addition to shapes stored in a ledger, a transaction can supply
1 change: 1 addition & 0 deletions fluree-db-api/src/commit_transfer.rs
@@ -467,6 +467,7 @@ impl Fluree {
// construct; commit replay carries no
// `opts` payload.
inline_shape_bundle: None,
cross_ledger_membership: None,
},
)
.await
93 changes: 93 additions & 0 deletions fluree-db-api/src/tx.rs
@@ -276,6 +276,14 @@ pub(crate) struct StagedShaclContext<'a> {
/// additively. Inline shapes do not persist into the ledger.
pub inline_shape_bundle:
Option<std::sync::Arc<fluree_db_query::schema_bundle::SchemaBundleFlakes>>,

/// Live model-ledger membership source for cross-ledger `sh:class`
/// value-sets. Present only when `f:shapesSource` is cross-ledger
/// (`f:ledger` set). Carries a `GraphDbRef` into M's value-set graph at the
/// resolved `t` plus D's staged namespace map (needed to translate D-term
/// Sids into M's term space); `validate_class_constraint` consults it on
/// demand after a local miss.
pub cross_ledger_membership: Option<fluree_db_shacl::CrossLedgerMembership<'a>>,
}

/// Inspect the data ledger's resolved config and, when
@@ -509,6 +517,13 @@ pub(crate) async fn apply_shacl_policy_to_staged_view(
let mut cl_overlay_holder = None;
#[allow(unused_assignments)]
let mut inline_overlay_holder = None;
// Graphs consulted for `sh:class` value membership at validation time. The
// focus node's own data graph is always consulted; these are the extra
// `f:shapesSource` vocabulary graph(s) unioned in, so a shared value-set
// (e.g. a list of US states) can live alongside the shapes rather than in
// every data graph. Cross-ledger value-sets are resolved separately via
// `ctx.cross_ledger_membership`, so that branch passes only the default graph.
let membership_g_ids: Vec<fluree_db_core::GraphId>;
let mut shape_dbs: Vec<fluree_db_core::GraphDbRef<'_>> =
if let (Some(wire), Some(staged_ns)) = (ctx.cross_ledger_shapes, ctx.staged_ns) {
let bundle = wire
@@ -522,6 +537,7 @@ pub(crate) async fn apply_shacl_policy_to_staged_view(
base.novelty.as_ref(),
bundle,
));
membership_g_ids = vec![0];
vec![fluree_db_core::GraphDbRef::new(
&base.snapshot,
0u16,
@@ -532,6 +548,7 @@ pub(crate) async fn apply_shacl_policy_to_staged_view(
// 4b. Same-ledger path. Resolve `f:shapesSource` into
// concrete graph IDs; default to `[0]` when unset.
let shapes_g_ids = resolve_shapes_source_g_ids(config.as_deref(), &base.snapshot)?;
membership_g_ids = shapes_g_ids.clone();
shapes_g_ids
.iter()
.map(|g_id| base.as_graph_db_ref(*g_id))
@@ -578,6 +595,8 @@ pub(crate) async fn apply_shacl_policy_to_staged_view(
ctx.graph_sids,
ctx.tracker,
per_graph_policy.as_ref(),
&membership_g_ids,
ctx.cross_ledger_membership,
)
.await?;

@@ -692,6 +711,78 @@ async fn stage_with_config_shacl(
.map(|(&g_id, iri)| (g_id, ns_registry.sid_for_iri(iri)))
.collect();

// Cross-ledger `sh:class` value-sets: when f:shapesSource is cross-ledger,
// open the model ledger M live at the resolved t and expose a GraphDbRef
// into its value-set graph (the shapes-source graph, where the controlled
// vocabulary lives alongside the shapes). `validate_class_constraint`
// consults it on demand — memoized — after a local membership miss. The
// owned handle and D's namespace map (for term translation) are held across
// the validation await below.
let cross_ledger_model_db = match cross_ledger_shapes.as_deref() {
Some(resolved) => Some(
resolve_ctx
.fluree
.load_graph_db_at_t(&resolved.model_ledger_id, resolved.resolved_t)
.await
.map_err(|e| {
fluree_db_transact::TransactError::Parse(format!(
"failed to open cross-ledger value-set model {} at t={}: {e}",
resolved.model_ledger_id, resolved.resolved_t
))
})?,
),
None => None,
};
// D's namespace codes → IRI prefixes (base + this transaction's staged
// allocations). Needed to decode a staged value Sid to its IRI before
// re-encoding against M — the staged base snapshot alone can't decode a
// namespace introduced this transaction.
let cross_ledger_data_ns_map: Option<HashMap<u16, String>> =
cross_ledger_model_db.as_ref().map(|_| {
ns_registry
.all_codes()
.into_iter()
.filter_map(|code| ns_registry.get_prefix(code).map(|p| (code, p.to_string())))
.collect()
});
let cross_ledger_membership = match (
&cross_ledger_model_db,
&cross_ledger_data_ns_map,
cross_ledger_shapes.as_deref(),
) {
(Some(m_db), Some(ns_map), Some(resolved)) => {
// The value-set vocabulary lives in the same M graph the shapes were
// compiled from, so a miss here should be unreachable. Error loudly
// rather than silently dropping membership — a silent `None` would
// make every M-only value fall through to "not a member" and reject
// the write with spurious `sh:class` violations.
let g_id =
crate::cross_ledger::resolve_selector_g_id(&m_db.snapshot, &resolved.graph_iri)
.map_err(|e| {
fluree_db_transact::TransactError::Parse(format!(
"cross-ledger value-set graph resolution failed: {e}"
))
})?
.ok_or_else(|| {
fluree_db_transact::TransactError::Parse(format!(
"cross-ledger value-set graph {} not present in model ledger {} at t={} \
(shapes resolved but vocabulary graph missing)",
resolved.graph_iri, resolved.model_ledger_id, resolved.resolved_t
))
})?;
Some(fluree_db_shacl::CrossLedgerMembership {
model_db: fluree_db_core::GraphDbRef::new(
&m_db.snapshot,
g_id,
m_db.overlay.as_ref(),
m_db.t,
),
data_ns_map: ns_map,
})
}
_ => None,
};
Comment on lines +748 to +784

Contributor


When resolve_selector_g_id returns Ok(None) (the shapes-source graph IRI is absent from M's registry at the resolved t), .map(..) yields None, silently disabling cross-ledger membership — every cross-ledger value then falls through to "not a member" and the write is rejected with false ShaclViolations. Contrast the Err case one line up, which is propagated with context, and the ReservedGraphSelected error inside the resolver. In practice shape resolution already guarantees the graph exists (shapes were loaded from it), so this branch should be unreachable — which is exactly why a silent None here is a latent trap. Prefer an explicit error (or at least a debug_assert!) so a future divergence between shape-graph and value-set-graph resolution surfaces loudly rather than as spurious violations:

.map_err(|e| { /* ... */ })?
.ok_or_else(|| fluree_db_transact::TransactError::Parse(format!(
    "cross-ledger value-set graph {} not present in model ledger {} at t={} \
     (shapes resolved but vocabulary graph missing)",
    resolved.graph_iri, resolved.model_ledger_id, resolved.resolved_t
)))
.map(|g_id| /* CrossLedgerMembership { .. } */ )

(Adjust the surrounding match so this arm yields Some(..).) If the silent fallback is intentional, add a one-line comment saying so and why false violations are acceptable there.
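
The `Result<Option<_>, _>` handling suggested in this comment can be sketched in isolation. `ParseError`, `resolve_graph`, and `require_graph` are hypothetical stand-ins for `TransactError::Parse` and `resolve_selector_g_id`; only the `map_err` then `ok_or_else` chaining is the point.

```rust
/// Hypothetical error type standing in for `TransactError::Parse`.
#[derive(Debug, PartialEq)]
pub struct ParseError(pub String);

/// Hypothetical resolver: `Ok(Some(id))` when the graph is registered,
/// `Ok(None)` when it is absent, `Err` when resolution itself fails.
pub fn resolve_graph(registry: &[(&str, u16)], iri: &str) -> Result<Option<u16>, String> {
    if iri.is_empty() {
        return Err("empty graph IRI".to_string());
    }
    Ok(registry
        .iter()
        .find(|(known, _)| *known == iri)
        .map(|&(_, id)| id))
}

/// Turn both failure modes into explicit errors so that an absent graph is
/// never silently treated as "no membership source".
pub fn require_graph(registry: &[(&str, u16)], iri: &str) -> Result<u16, ParseError> {
    resolve_graph(registry, iri)
        .map_err(|e| ParseError(format!("graph resolution failed: {e}")))?
        .ok_or_else(|| ParseError(format!("graph {iri} not present in model ledger")))
}
```

The `?` after `map_err` unwraps the outer `Result`, leaving an `Option` for `ok_or_else` to convert, so the caller sees one error type for both cases.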

Contributor Author


Addressed in a0e1109


apply_shacl_policy_to_staged_view(
&view,
StagedShaclContext {
@@ -706,6 +797,7 @@ async fn stage_with_config_shacl(
}),
staged_ns: cross_ledger_shapes.as_deref().map(|_| &ns_registry),
inline_shape_bundle,
cross_ledger_membership,
},
)
.await?;
@@ -2301,6 +2393,7 @@ impl crate::Fluree {
// today — inline SHACL flows in over the JSON
// transaction path. Wireable later if needed.
inline_shape_bundle: None,
cross_ledger_membership: None,
},
)
.await