Add validation rules for semantic model names, relationship columns, extensions, and dialects#97
Add validation rules for semantic model names, relationship columns, extensions, and dialects#97kevinwangz wants to merge 1 commit into
Conversation
kevinwangz
commented
Mar 24, 2026
- Check for duplicate semantic model names across the file
- Validate that from_columns and to_columns have equal length in relationships
- Validate that custom extension 'data' fields contain valid JSON, checking all levels (model, dataset, field, metric, relationship)
- Detect duplicate dialect entries within a single expression's dialects array for both fields and metrics
…extensions, and dialects - Check for duplicate semantic model names across the file - Validate that from_columns and to_columns have equal length in relationships - Validate that custom extension 'data' fields contain valid JSON, checking all levels (model, dataset, field, metric, relationship) - Detect duplicate dialect entries within a single expression's dialects array for both fields and metrics
|
@kevinwangz Thanks for this PR. But I realized that the original validation script is missing tests. Do you mind bootstrapping the tests for this code ? Else, I am happy to add the tests for this. |
jbonofre
left a comment
There was a problem hiding this comment.
Overall good, I just have a question about json.loads semantics.
| ext_data = ext.get("data") | ||
| if ext_data is not None: | ||
| try: | ||
| json.loads(ext_data) |
There was a problem hiding this comment.
The data field from YAML will already be parsed as a Python object (dist/list/int), not a JSON string.
json.loads() expects a string. If someone writes:
custom_extensions:
- vendor_name: acme
data: '{"key": "value"}' # string - works
- vendor_name: acme
data: # YAML dict - TypeError from json.loads
key: value
The TypeError is caught, but the error message would be misleading. It would say "invalid JSON" when really the data is valid structured data, just not a JSON string.
The validation should clarify what it actually expects: is data supposed to be JSON-encoded string, or a structured object ?
If either is acceptable, the check should only run json.loads when ext_data is a str.