Problem
There is no mechanism in boring-semantic-layer to express constraints between dimensions and measures at the model definition level. When a semantic model is exposed to an LLM agent (e.g. via MCP), the agent can construct queries that are syntactically valid (correct dimension/measure names) but semantically invalid, producing silently wrong results instead of a clear error.
This is particularly dangerous in AI-agent scenarios because:
- Silent failures produce wrong data: the query executes successfully but returns incorrect numbers (e.g. duplicated sums). The LLM has no way to detect this and presents wrong results to the user with full confidence.
- Error messages enable self-correction: if the model raises a clear `ValueError` explaining why the query is invalid and what to do instead, the LLM can reformulate its query and retry successfully.
Our Use Case
We expose BSL semantic models via a Model Context Protocol (MCP) server to LLM agents. Our data has a structural constraint: sales figures are duplicated across category rows in the source table. Querying measures without pinning the right dimensions produces inflated numbers.
We need to tell the model: "if the agent requests these measures, it MUST include (or filter by) these dimensions; otherwise reject the query with a helpful error."
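To make the hazard concrete, here is a toy sketch (hypothetical rows and column names, not our real schema): a sale that belongs to two categories appears twice in the source table, so summing without pinning a category double-counts it.

```python
# Toy data illustrating the duplication problem: sale 1 belongs to two
# categories, so it appears on two rows with the same amount.
rows = [
    {"sale_id": 1, "category": "kitchen", "amount_eur": 100.0},
    {"sale_id": 1, "category": "storage", "amount_eur": 100.0},  # duplicate of sale 1
    {"sale_id": 2, "category": "kitchen", "amount_eur": 50.0},
]

# Naive SUM over all rows double-counts sale 1.
naive_total = sum(r["amount_eur"] for r in rows)  # inflated

# Pinning the category (filtering to exactly one value) gives a correct total.
pinned_total = sum(r["amount_eur"] for r in rows if r["category"] == "kitchen")
```

The query executes successfully either way; only the pinned version is semantically correct, which is exactly the class of error the validator is meant to catch.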
Current Workaround
We monkey-patch a `with_validator()` method onto `SemanticModel`:

```python
from boring_semantic_layer import SemanticModel

def _with_validator(self, validator):
    # Bypass the model's immutable-attribute handling to attach the callable.
    object.__setattr__(self, "_validate_query", validator)
    return self

SemanticModel.with_validator = _with_validator
```
The key design choice: constraint logic lives in the model definition, and the execution layer simply calls whatever validator is attached. This keeps model definitions self-contained and declarative.
Model definition (where constraints are declared)
```python
from boring_semantic_layer import to_semantic_table
from boring_semantic_layer.ops import Dimension, Measure

_TOTAL_MEASURES = frozenset({
    "total_sales_eur",
    "total_transaction_count",
    "fs_share_of_sales",  # uses total in denominator
})

def _dimension_is_pinned(dim, dimensions, filters):
    """Check if a dimension is in GROUP BY or filtered to one value."""
    if dim in dimensions:
        return True
    eq_filters = [
        f for f in (filters or [])
        if f.get("field") == dim and f.get("operator") in ("=", "==")
    ]
    return len(eq_filters) == 1

def _make_total_constraint_validator(model_name, required_dims):
    """Create a validator enforcing dimension requirements for total measures."""
    def validator(dimensions, measures, filters):
        requested_total = set(measures) & _TOTAL_MEASURES
        if not requested_total:
            return
        for dim in required_dims:
            if not _dimension_is_pinned(dim, dimensions, filters):
                raise ValueError(
                    f"Query validation failed for model '{model_name}': "
                    f"total measures ({sorted(requested_total)}) require dimension '{dim}' "
                    f"to be either included in dimensions (GROUP BY) or filtered to "
                    f"exactly one value. This prevents incorrect aggregation of "
                    f"duplicated data."
                )
    return validator

# Model definition (constraint is co-located with the model)
model = (
    to_semantic_table(filtered_table, name="fs_sales_product_hfb", description="...")
    .with_dimensions(
        event_date=Dimension(...),
        hfb_no=Dimension(...),
        fs_product_int_sk=Dimension(...),
    )
    .with_measures(
        total_sales_eur=Measure(...),
        fs_sales_eur=Measure(...),
    )
    .with_validator(
        _make_total_constraint_validator("fs_sales_product_hfb", ["hfb_no", "fs_product_int_sk"])
    )
)
```
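The pinning rule is the subtle part, so here is a standalone sketch of the helper (mirroring `_dimension_is_pinned` above, renamed so it can be run in isolation) exercised against the three cases it distinguishes:

```python
def dimension_is_pinned(dim, dimensions, filters):
    """A dimension is 'pinned' if it is in GROUP BY, or equality-filtered to one value."""
    if dim in dimensions:
        return True
    eq_filters = [
        f for f in (filters or [])
        if f.get("field") == dim and f.get("operator") in ("=", "==")
    ]
    return len(eq_filters) == 1

# Grouped by: pinned.
assert dimension_is_pinned("hfb_no", ["hfb_no"], None)
# Equality-filtered to a single value: pinned.
assert dimension_is_pinned("hfb_no", [], [{"field": "hfb_no", "operator": "=", "value": 5}])
# Neither grouped nor equality-filtered (e.g. an IN filter): not pinned.
assert not dimension_is_pinned("hfb_no", [], [{"field": "hfb_no", "operator": "in", "values": [1, 2]}])
```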
Query execution (where validation is invoked)
```python
def query_model(self, model, dimensions, measures, filters=None):
    # ... name/type validation ...

    # Call model-defined validator (if any)
    validator = getattr(model, "_validate_query", None)
    if validator is not None:
        validator(dimensions, measures, filters)  # raises ValueError on bad queries

    # ... execute query ...
```
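The reject-then-retry flow can be sketched end to end without BSL (stub model and validator below are illustrative stand-ins, not the library's API):

```python
# Stand-in for a semantic model that carries an attached validator.
class StubModel:
    pass

def make_validator(required_dim):
    def validator(dimensions, measures, filters):
        if "total_sales_eur" in measures and required_dim not in dimensions:
            raise ValueError(
                f"total measures require dimension '{required_dim}' "
                "to be grouped by or filtered to one value"
            )
    return validator

model = StubModel()
model._validate_query = make_validator("fs_product_int_sk")

def query_model(model, dimensions, measures, filters=None):
    validator = getattr(model, "_validate_query", None)
    if validator is not None:
        validator(dimensions, measures, filters)  # raises ValueError on bad queries
    return "ok"  # stand-in for actual query execution

# First attempt: rejected with an actionable message the agent can act on.
try:
    query_model(model, dimensions=[], measures=["total_sales_eur"])
except ValueError as exc:
    print(exc)

# Retry with the required dimension pinned: succeeds.
print(query_model(model, dimensions=["fs_product_int_sk"], measures=["total_sales_eur"]))
```

The execution layer stays generic: it only knows how to call whatever validator the model carries.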
What This Achieves
| Without validation | With validation |
| --- | --- |
| Agent queries `total_sales_eur` without filtering by product -> gets a 5x inflated number | Agent gets: "total measures require 'fs_product_int_sk' to be filtered. Use dimension values endpoint to find valid IDs." |
| Agent presents wrong data to user with confidence | Agent retries with correct filter -> returns accurate data |
| No way to detect the error after the fact | Clear, actionable error at query time |
Proposed Native API
A `with_validator()` (or `with_constraints()`) method on `SemanticModel` that:
- Accepts a callable `(dimensions: list[str], measures: list[str], filters: list[dict] | None) -> None` that raises `ValueError` on invalid queries.
- Stores it on the model (respecting immutability via the same pattern as other `with_*` methods).
- Optionally exposes it in `json_definition` so external tools can introspect constraints without executing them.
```python
# Proposed API
model = (
    to_semantic_table(table, name="my_model")
    .with_dimensions(...)
    .with_measures(...)
    .with_validator(my_validator_fn)
)

# Access
model.validate_query(dimensions=["country"], measures=["sales"], filters=[...])
# or
model.json_definition["validator"]  # metadata about constraints (optional)
```
Why a callable (not declarative constraints)
We considered a declarative approach (e.g. `measure_requires_dimensions={"total_sales": ["product_id"]}`), but real-world constraints are more nuanced:
- A dimension can be "pinned" either by being in GROUP BY or by being filtered to exactly one value.
- Some constraints only apply to subsets of measures.
- Future constraints may involve filter value combinations.
A callable gives model authors full flexibility while keeping the execution layer generic. A declarative format could be added later as syntactic sugar that generates a validator callable internally.
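One possible shape for that sugar (hypothetical names throughout, not a proposed final API): a measure-to-required-dimensions mapping compiled into a validator callable of the proposed signature.

```python
# Compile a declarative requirements mapping into a validator callable.
# `measure_requires_dimensions` maps measure name -> dimensions that must be
# pinned (grouped by, or equality-filtered) whenever that measure is queried.
def validator_from_requirements(measure_requires_dimensions):
    def validator(dimensions, measures, filters):
        for measure in measures:
            for dim in measure_requires_dimensions.get(measure, ()):
                pinned = dim in dimensions or any(
                    f.get("field") == dim and f.get("operator") in ("=", "==")
                    for f in (filters or [])
                )
                if not pinned:
                    raise ValueError(
                        f"measure '{measure}' requires dimension '{dim}' "
                        "to be grouped by or equality-filtered"
                    )
    return validator

validator = validator_from_requirements({"total_sales": ["product_id"]})
validator(["product_id"], ["total_sales"], None)  # passes silently
```

Because the sugar bottoms out in the same callable contract, the execution layer needs no knowledge of the declarative format.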
Summary
This feature would let BSL models be self-validating, expressing not just what can be queried, but which combinations are valid. This is critical for AI-agent use cases where the consumer cannot reason about data semantics and relies on clear error signals to self-correct.