Skip to content

Add validation rules for semantic model names, relationship columns, extensions, and dialects#97

Open
kevinwangz wants to merge 1 commit into
mainfrom
kwang-add-more-validation
Open

Add validation rules for semantic model names, relationship columns, extensions, and dialects#97
kevinwangz wants to merge 1 commit into
mainfrom
kwang-add-more-validation

Conversation

@kevinwangz
Copy link
Copy Markdown
Collaborator

  • Check for duplicate semantic model names across the file
  • Validate that from_columns and to_columns have equal length in relationships
  • Validate that custom extension 'data' fields contain valid JSON, checking all levels (model, dataset, field, metric, relationship)
  • Detect duplicate dialect entries within a single expression's dialects array for both fields and metrics

…extensions, and dialects

- Check for duplicate semantic model names across the file
- Validate that from_columns and to_columns have equal length in
  relationships
- Validate that custom extension 'data' fields contain valid JSON,
  checking all levels (model, dataset, field, metric, relationship)
- Detect duplicate dialect entries within a single expression's
  dialects array for both fields and metrics
@kevinwangz kevinwangz requested a review from khush-bhatia March 24, 2026 16:03
@jbonofre jbonofre self-requested a review March 25, 2026 12:20
@khush-bhatia
Copy link
Copy Markdown
Member

@kevinwangz Thanks for this PR. But I realized that the original validation script is missing tests. Do you mind bootstrapping the tests for this code ? Else, I am happy to add the tests for this.

Copy link
Copy Markdown
Collaborator

@jbonofre jbonofre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall good, I just have a question about json.loads semantics.

Comment thread validation/validate.py
ext_data = ext.get("data")
if ext_data is not None:
try:
json.loads(ext_data)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data field from YAML will already be parsed as a Python object (dist/list/int), not a JSON string.

json.loads() expects a string. If someone writes:

custom_extensions:
  - vendor_name: acme
     data: '{"key": "value"}'              # string - works
  - vendor_name: acme
     data:                                            # YAML dict - TypeError from json.loads
         key: value

The TypeError is caught, but the error message would be misleading. It would say "invalid JSON" when really the data is valid structured data, just not a JSON string.

The validation should clarify what it actually expects: is data supposed to be JSON-encoded string, or a structured object ?

If either is acceptable, the check should only run json.loads when ext_data is a str.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants