
Feature Request: Native Query Validation on SemanticModel #258

@Luiscri

Description

Problem

There is no mechanism in boring-semantic-layer to express constraints between dimensions and measures at the model definition level. When a semantic model is exposed to an LLM agent (e.g. via MCP), the agent can construct queries that are syntactically valid (correct dimension/measure names) but semantically invalid, producing silently wrong results instead of a clear error.

This is particularly dangerous in AI-agent scenarios because:

  1. Silent failures produce wrong data: the query executes successfully but returns incorrect numbers (e.g. duplicated sums). The LLM has no way to detect this and presents wrong results to the user with full confidence.
  2. Error messages enable self-correction: if the model raises a clear ValueError explaining why the query is invalid and what to do instead, the LLM can reformulate its query and retry successfully.

Our Use Case

We expose BSL semantic models via a Model Context Protocol (MCP) server to LLM agents. Our data has a structural constraint: sales figures are duplicated across category rows in the source table. Querying measures without pinning the right dimensions produces inflated numbers.

We need to tell the model: "if the agent requests these measures, it MUST include (or filter by) these dimensions; otherwise reject the query with a helpful error."

Current Workaround

We monkey-patch a with_validator() method onto SemanticModel:

from boring_semantic_layer import SemanticModel

def _with_validator(self, validator):
    # SemanticModel instances are immutable, so bypass the frozen __setattr__
    object.__setattr__(self, "_validate_query", validator)
    return self

SemanticModel.with_validator = _with_validator

The key design choice: constraint logic lives in the model definition, and the execution layer simply calls whatever validator is attached. This keeps model definitions self-contained and declarative.

Model definition (where constraints are declared)

from boring_semantic_layer import to_semantic_table
from boring_semantic_layer.ops import Dimension, Measure

_TOTAL_MEASURES = frozenset({
    "total_sales_eur",
    "total_transaction_count",
    "fs_share_of_sales",  # Uses total in denominator
})

def _dimension_is_pinned(dim, dimensions, filters):
    """Check if a dimension is in GROUP BY or filtered to one value."""
    if dim in dimensions:
        return True
    eq_filters = [f for f in (filters or []) if f.get("field") == dim and f.get("operator") in ("=", "==")]
    return len(eq_filters) == 1

def _make_total_constraint_validator(model_name, required_dims):
    """Create a validator enforcing dimension requirements for total measures."""
    def validator(dimensions, measures, filters):
        requested_total = set(measures) & _TOTAL_MEASURES
        if not requested_total:
            return
        for dim in required_dims:
            if not _dimension_is_pinned(dim, dimensions, filters):
                raise ValueError(
                    f"Query validation failed for model '{model_name}': "
                    f"total measures ({sorted(requested_total)}) require dimension '{dim}' "
                    f"to be either included in dimensions (GROUP BY) or filtered to "
                    f"exactly one value. This prevents incorrect aggregation of "
                    f"duplicated data."
                )
    return validator

# Model definition (constraint is co-located with the model)
model = (
    to_semantic_table(filtered_table, name="fs_sales_product_hfb", description="...")
    .with_dimensions(
        event_date=Dimension(...),
        hfb_no=Dimension(...),
        fs_product_int_sk=Dimension(...),
    )
    .with_measures(
        total_sales_eur=Measure(...),
        fs_sales_eur=Measure(...),
    )
    .with_validator(
        _make_total_constraint_validator("fs_sales_product_hfb", ["hfb_no", "fs_product_int_sk"])
    )
)

Query execution (where validation is invoked)

def query_model(self, model, dimensions, measures, filters=None):
    # ... name/type validation ...

    # Call model-defined validator (if any)
    validator = getattr(model, "_validate_query", None)
    if validator is not None:
        validator(dimensions, measures, filters)  # raises ValueError on bad queries

    # ... execute query ...

What This Achieves

| Without validation | With validation |
| --- | --- |
| Agent queries total_sales_eur without filtering by product -> gets a 5x inflated number | Agent gets: "total measures require 'fs_product_int_sk' to be filtered. Use the dimension values endpoint to find valid IDs." |
| Agent presents wrong data to the user with confidence | Agent retries with the correct filter -> returns accurate data |
| No way to detect the error after the fact | Clear, actionable error at query time |
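
The retry behavior described above can be sketched as a toy loop. Here query_model is a stand-in, not the real BSL entry point: the first query is rejected with an actionable message, and the retry with the required dimension succeeds.

```python
# Toy stand-in for query execution with a hard-coded constraint check.
def query_model(dimensions, measures, filters=None):
    if "total_sales_eur" in measures and "fs_product_int_sk" not in dimensions:
        raise ValueError("total measures require dimension 'fs_product_int_sk'")
    return {"status": "ok", "rows": []}  # placeholder for real execution

attempts = [
    (["event_date"], ["total_sales_eur"]),                      # invalid: dim missing
    (["event_date", "fs_product_int_sk"], ["total_sales_eur"]), # agent's reformulation
]
result = None
for dims, measures in attempts:
    try:
        result = query_model(dims, measures)
        break
    except ValueError as exc:
        print("agent sees:", exc)  # the error message drives the retry
```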

Proposed Native API

A with_validator() (or with_constraints()) method on SemanticModel that:

  1. Accepts a callable (dimensions: list[str], measures: list[str], filters: list[dict] | None) -> None that raises ValueError on invalid queries.
  2. Stores it on the model (respecting immutability via the same pattern as other with_* methods).
  3. Optionally exposes it in json_definition so external tools can introspect constraints without executing them.
# Proposed API
model = (
    to_semantic_table(table, name="my_model")
    .with_dimensions(...)
    .with_measures(...)
    .with_validator(my_validator_fn)
)

# Access
model.validate_query(dimensions=["country"], measures=["sales"], filters=[...])
# or
model.json_definition["validator"]  # metadata about constraints (optional)
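
One way the immutability requirement in point 2 could be satisfied is the usual frozen-dataclass "copy with one field changed" pattern. This is a hypothetical sketch, not BSL's actual internals; SemanticModelSketch and its fields are illustrative names:

```python
# Hypothetical sketch of an immutable with_validator, assuming a
# frozen-dataclass style consistent with the other with_* methods.
from dataclasses import dataclass, replace
from typing import Callable, Optional

Validator = Callable[[list, list, Optional[list]], None]

@dataclass(frozen=True)
class SemanticModelSketch:
    name: str
    validator: Optional[Validator] = None

    def with_validator(self, fn: Validator) -> "SemanticModelSketch":
        # Return a new instance rather than mutating in place.
        return replace(self, validator=fn)

    def validate_query(self, dimensions, measures, filters=None):
        if self.validator is not None:
            self.validator(dimensions, measures, filters)

m = SemanticModelSketch("my_model")
m2 = m.with_validator(lambda dims, measures, filters: None)
# The original model is untouched; only the copy carries the validator.
```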

Why a callable (not declarative constraints)

We considered a declarative approach (e.g. measure_requires_dimensions={"total_sales": ["product_id"]}), but real-world constraints are more nuanced:

  • A dimension can be "pinned" either by being in GROUP BY or by being filtered to exactly one value.
  • Some constraints only apply to subsets of measures.
  • Future constraints may involve filter value combinations.

A callable gives model authors full flexibility while keeping the execution layer generic. A declarative format could be added later as syntactic sugar that generates a validator callable internally.
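
As a sketch of that sugar: a declarative mapping like the one considered above could be compiled into a validator callable. The make_declarative_validator name and the mapping's semantics are hypothetical, not an existing BSL API:

```python
# Hypothetical compiler from a declarative measure->required-dimensions
# mapping into a validator callable with the same pinning semantics as
# the handwritten validator (grouped, or filtered to exactly one value).
def make_declarative_validator(measure_requires_dimensions):
    def validator(dimensions, measures, filters):
        for measure in measures:
            for dim in measure_requires_dimensions.get(measure, ()):
                eq_filters = [
                    f for f in (filters or [])
                    if f.get("field") == dim and f.get("operator") in ("=", "==")
                ]
                if dim not in dimensions and len(eq_filters) != 1:
                    raise ValueError(
                        f"measure '{measure}' requires dimension '{dim}' to be "
                        f"grouped or filtered to exactly one value"
                    )
    return validator

check = make_declarative_validator({"total_sales": ["product_id"]})
check(["product_id"], ["total_sales"], None)  # passes: dimension is grouped
```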

Summary

This feature would let BSL models be self-validating, expressing not just what can be queried, but which combinations are valid. This is critical for AI-agent use cases where the consumer cannot reason about data semantics and relies on clear error signals to self-correct.
