
Feature Request: Native Query Validation on SemanticModel #258

@Luiscri

Description

Problem

There is no mechanism in boring-semantic-layer to express constraints between dimensions and measures at the model definition level. When a semantic model is exposed to an LLM agent (e.g. via MCP), the agent can construct queries that are syntactically valid (correct dimension/measure names) but semantically invalid, producing silently wrong results instead of a clear error.

This is particularly dangerous in AI-agent scenarios because:

  1. Silent failures produce wrong data: the query executes successfully but returns incorrect numbers (e.g. duplicated sums). The LLM has no way to detect this and presents wrong results to the user with full confidence.
  2. Error messages enable self-correction: if the model raises a clear ValueError explaining why the query is invalid and what to do instead, the LLM can reformulate its query and retry successfully.

Our Use Case

We expose BSL semantic models via a Model Context Protocol (MCP) server to LLM agents. Our data has a structural constraint: sales figures are duplicated across category rows in the source table. Querying measures without pinning the right dimensions produces inflated numbers.

We need to tell the model: "if the agent requests these measures, it MUST include (or filter by) these dimensions; otherwise reject the query with a helpful error."

Current Workaround

We monkey-patch a with_validator() method onto SemanticModel:

from boring_semantic_layer import SemanticModel

def _with_validator(self, validator):
    # SemanticModel instances are immutable, so bypass the frozen __setattr__
    object.__setattr__(self, "_validate_query", validator)
    return self

SemanticModel.with_validator = _with_validator

The key design choice: constraint logic lives in the model definition, and the execution layer simply calls whatever validator is attached. This keeps model definitions self-contained and declarative.

Model definition (where constraints are declared)

from boring_semantic_layer import to_semantic_table
from boring_semantic_layer.ops import Dimension, Measure

_TOTAL_MEASURES = frozenset({
    "total_sales_eur",
    "total_transaction_count",
    "fs_share_of_sales",  # Uses total in denominator
})

def _dimension_is_pinned(dim, dimensions, filters):
    """Check if a dimension is in GROUP BY or filtered to one value."""
    if dim in dimensions:
        return True
    eq_filters = [f for f in (filters or []) if f.get("field") == dim and f.get("operator") in ("=", "==")]
    return len(eq_filters) == 1

def _make_total_constraint_validator(model_name, required_dims):
    """Create a validator enforcing dimension requirements for total measures."""
    def validator(dimensions, measures, filters):
        requested_total = set(measures) & _TOTAL_MEASURES
        if not requested_total:
            return
        for dim in required_dims:
            if not _dimension_is_pinned(dim, dimensions, filters):
                raise ValueError(
                    f"Query validation failed for model '{model_name}': "
                    f"total measures ({sorted(requested_total)}) require dimension '{dim}' "
                    f"to be either included in dimensions (GROUP BY) or filtered to "
                    f"exactly one value. This prevents incorrect aggregation of "
                    f"duplicated data."
                )
    return validator

# Model definition (constraint is co-located with the model)
model = (
    to_semantic_table(filtered_table, name="fs_sales_product_hfb", description="...")
    .with_dimensions(
        event_date=Dimension(...),
        hfb_no=Dimension(...),
        fs_product_int_sk=Dimension(...),
    )
    .with_measures(
        total_sales_eur=Measure(...),
        fs_sales_eur=Measure(...),
    )
    .with_validator(
        _make_total_constraint_validator("fs_sales_product_hfb", ["hfb_no", "fs_product_int_sk"])
    )
)

Query execution (where validation is invoked)

def query_model(self, model, dimensions, measures, filters=None):
    # ... name/type validation ...

    # Call model-defined validator (if any)
    validator = getattr(model, "_validate_query", None)
    if validator is not None:
        validator(dimensions, measures, filters)  # raises ValueError on bad queries

    # ... execute query ...

What This Achieves

| Without validation | With validation |
| --- | --- |
| Agent queries total_sales_eur without filtering by product -> gets a 5x inflated number | Agent gets: "total measures require 'fs_product_int_sk' to be filtered. Use the dimension values endpoint to find valid IDs." |
| Agent presents wrong data to the user with confidence | Agent retries with the correct filter -> returns accurate data |
| No way to detect the error after the fact | Clear, actionable error at query time |
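
The retry behavior described above can be sketched as a toy loop. Here query_model is a stand-in, not the real BSL entry point: the first query is rejected with an actionable message, and the retry with the required dimension succeeds.

```python
# Toy stand-in for query execution with a hard-coded constraint check.
def query_model(dimensions, measures, filters=None):
    if "total_sales_eur" in measures and "fs_product_int_sk" not in dimensions:
        raise ValueError("total measures require dimension 'fs_product_int_sk'")
    return {"status": "ok", "rows": []}  # placeholder for real execution

attempts = [
    (["event_date"], ["total_sales_eur"]),                      # invalid: dim missing
    (["event_date", "fs_product_int_sk"], ["total_sales_eur"]), # agent's reformulation
]
result = None
for dims, measures in attempts:
    try:
        result = query_model(dims, measures)
        break
    except ValueError as exc:
        print("agent sees:", exc)  # the error message drives the retry
```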

Proposed Native API

A with_validator() (or with_constraints()) method on SemanticModel that:

  1. Accepts a callable (dimensions: list[str], measures: list[str], filters: list[dict] | None) -> None that raises ValueError on invalid queries.
  2. Stores it on the model (respecting immutability via the same pattern as other with_* methods).
  3. Optionally exposes it in json_definition so external tools can introspect constraints without executing them.
# Proposed API
model = (
    to_semantic_table(table, name="my_model")
    .with_dimensions(...)
    .with_measures(...)
    .with_validator(my_validator_fn)
)

# Access
model.validate_query(dimensions=["country"], measures=["sales"], filters=[...])
# or
model.json_definition["validator"]  # metadata about constraints (optional)
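
One way the immutability requirement in point 2 could be satisfied is the usual frozen-dataclass "copy with one field changed" pattern. This is a hypothetical sketch, not BSL's actual internals; SemanticModelSketch and its fields are illustrative names:

```python
# Hypothetical sketch of an immutable with_validator, assuming a
# frozen-dataclass style consistent with the other with_* methods.
from dataclasses import dataclass, replace
from typing import Callable, Optional

Validator = Callable[[list, list, Optional[list]], None]

@dataclass(frozen=True)
class SemanticModelSketch:
    name: str
    validator: Optional[Validator] = None

    def with_validator(self, fn: Validator) -> "SemanticModelSketch":
        # Return a new instance rather than mutating in place.
        return replace(self, validator=fn)

    def validate_query(self, dimensions, measures, filters=None):
        if self.validator is not None:
            self.validator(dimensions, measures, filters)

m = SemanticModelSketch("my_model")
m2 = m.with_validator(lambda dims, measures, filters: None)
# The original model is untouched; only the copy carries the validator.
```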

Why a callable (not declarative constraints)

We considered a declarative approach (e.g. measure_requires_dimensions={"total_sales": ["product_id"]}), but real-world constraints are more nuanced:

  • A dimension can be "pinned" either by being in GROUP BY or by being filtered to exactly one value.
  • Some constraints only apply to subsets of measures.
  • Future constraints may involve filter value combinations.

A callable gives model authors full flexibility while keeping the execution layer generic. A declarative format could be added later as syntactic sugar that generates a validator callable internally.
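
As a sketch of that sugar: a declarative mapping like the one considered above could be compiled into a validator callable. The make_declarative_validator name and the mapping's semantics are hypothetical, not an existing BSL API:

```python
# Hypothetical compiler from a declarative measure->required-dimensions
# mapping into a validator callable with the same pinning semantics as
# the handwritten validator (grouped, or filtered to exactly one value).
def make_declarative_validator(measure_requires_dimensions):
    def validator(dimensions, measures, filters):
        for measure in measures:
            for dim in measure_requires_dimensions.get(measure, ()):
                eq_filters = [
                    f for f in (filters or [])
                    if f.get("field") == dim and f.get("operator") in ("=", "==")
                ]
                if dim not in dimensions and len(eq_filters) != 1:
                    raise ValueError(
                        f"measure '{measure}' requires dimension '{dim}' to be "
                        f"grouped or filtered to exactly one value"
                    )
    return validator

check = make_declarative_validator({"total_sales": ["product_id"]})
check(["product_id"], ["total_sales"], None)  # passes: dimension is grouped
```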

Summary

This feature would let BSL models be self-validating, expressing not just what can be queried, but which combinations are valid. This is critical for AI-agent use cases where the consumer cannot reason about data semantics and relies on clear error signals to self-correct.
