Add bidirectional Honeydew ↔ OSI converter#136
Open
baruchoxman wants to merge 13 commits into
Open
Conversation
Implements OSI → Honeydew and Honeydew → OSI conversion with full round-trip fidelity. Relationship names are stored natively on the relation, and relations are always placed on the many side. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix empty-string expression bypassing None guard - Restore calculated_attribute routing via HONEYDEW type hint - Preserve bool datatype through OSI round-trip - Store string metric ai_context in osi metadata for recovery - Warn on duplicate metric names instead of silently overwriting - Restore connection_expr from HONEYDEW custom_extension on OSI→Honeydew - Warn on malformed JSON in _read_osi_metadata instead of silent drop - Remove filter limitation from README (not applicable) - Update Honeydew docs link to honeydew.ai/docs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ssion guards - Restore Honeydew-specific fields (display_name, hidden, format_string, timegrain, owner, folder) in OSI→Honeydew direction by reading them back from the HONEYDEW custom_extension on both entity and attribute objects - Add path traversal guard in main() using os.path.normpath + startswith check - Guard against whitespace-only expressions (not expr or not expr.strip()) - Warn when field expression is a non-dict value instead of silently dropping it - Simplify elif not from_cols: to else: in _osi_relation_to_honeydew - Strengthen test_ai_context_string_preserved to assert the string value is recoverable in description - Add 5 new tests: display_name/format round-trip, calc attr Honeydew fields, entity owner, whitespace expression skipped, non-dict expression warns Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Rewrite entire test file from classes to module-level pytest functions; parametrize repeated cases (is_simple_identifier, parse_osi_source, field datatypes, entity honeydew fields, dataset/calc attr fields, empty/whitespace expressions, path traversal guard, check_safe_path) - Extract _check_safe_path into a named helper in the converter (was inlined in main()) so path traversal logic is independently testable - Add parametrized test_check_safe_path covering ../evil.yml and ../../etc/passwd rejection and legitimate nested paths - Add test_empty_or_whitespace_metric_expression_skipped to cover the OSI→Honeydew whitespace guard on metrics (was previously untested) - Parametrize entity/attr/calc Honeydew field round-trip tests (owner, display_name, hidden, folder each verified independently) - Update _write_workspace helper to pass through entity-level fields (owner, display_name, hidden, folder) so round-trip tests use the standard _honeydew_roundtrip() helper instead of manual workspace setup - Promote _HD_ATTR_KEYS tuple to module-level constant (was defined inside the field loop on every iteration) - Drop from __future__ import annotations — requires Python 3.12+ - Add pyproject.toml with requires-python = ">=3.12" - Add Python 3.12+ requirement section to README Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove requirements.txt (superseded by pyproject.toml) - Extract _fields_to_honeydew() from _dataset_to_files(): field classification is now a named, independently testable function; four direct unit tests added - Warn when a relationship has neither from_columns nor connection_expr so callers discover the incomplete join before it reaches Honeydew - Round-trip the vendors list: non-HONEYDEW vendors are stored in the workspace osi metadata on OSI→Honeydew and merged back on the return trip (HONEYDEW always appears first) - Add main() CLI smoke tests via subprocess: osi-to-honeydew writes the expected workspace.yml, honeydew-to-osi writes a parseable OSI YAML, and path traversal in an entity name is rejected with exit code 1 - Update README setup instructions to use pip install . / pip install -e . Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Collapse 13 individual OSI→Honeydew tests, 7 Honeydew→OSI tests, 10 OSI round-trip tests, and 13 Honeydew round-trip tests into four @pytest.mark.parametrize blocks. Every assertion now compares the entire output dict rather than cherry-picked fields. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Docstring: relationship name is mapped directly to Honeydew's relation name field (not osi metadata); ai_context is mapped natively to description/labels/AI metadata, not treated as having no equivalent - README mapping table: rename rows to use Honeydew's canonical terms (Source Attribute, Calculated Attribute) and add missing ai_context row - README limitations: rewrite the confusing "One dataset per entity" bullet to clarify that OSI dataset = one table/query, Honeydew supports multiple dataset files per entity but the converter generates exactly one Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
jbonofre
approved these changes
Jun 2, 2026
Collaborator
jbonofre
left a comment
There was a problem hiding this comment.
I also suggest to add HONEYDEW in the list of supported vendors (in the converters/index.md and core-spec).
Overall good, but worth to clarify the comments.
- Store OSI `label` in osi metadata when field has label but no dict ai_context, so the Honeydew→OSI path can distinguish OSI-originated labels from native Honeydew labels and avoid injecting spurious ai_context.synonyms on round-trip - Add `label` parameter to `_build_osi_metadata` and corresponding read support in `_read_osi_metadata` - Warn when an entity has more than one dataset file during Honeydew→OSI conversion (only the primary dataset is converted) - Update test expectations to reflect correct idempotent behavior Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nonyms When a field has ai_context.synonyms but no label, the osi metadata section contains ai_context but no label key. Guard the label fallback on the absence of osi_meta ai_context so that synonyms don't get promoted to OSI label on round-trip. Verified with TPC-DS: zero semantic differences over OSI→HD→OSI→HD. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
converters/honeydew/package with a bidirectional converter between OSI semantic model YAML and Honeydew workspace directoriesosi-to-honeydewandhoneydew-to-osivia a CLI and Python APIai_context,unique_keys, non-Honeydewcustom_extensions) are stored in Honeydewmetadata; Honeydew-native fields (owner,display_name,hidden,format_string,timegrain,labels) are preserved in aHONEYDEWcustom extensionWhat's included
src/honeydew_osi_converter.py— ~1000-line converter covering datasets, fields, calculated attributes, metrics, relationships, and workspace-level metadatatests/test_honeydew_osi_converter.py— 127 parametrized tests with full-dict output verification (no cherry-picked field assertions)pyproject.toml— package config, Python 3.12+ requirement, pytest setupREADME.md— mapping tables, usage, limitationsTest plan
python -m pytest tests/passes (127 tests, ~1s)python src/honeydew_osi_converter.py osi-to-honeydew -i <file> -o <dir>produces a valid Honeydew workspacepython src/honeydew_osi_converter.py honeydew-to-osi -i <dir> -o <file>produces a valid OSI YAMLexamples/tpcds_semantic_model.yaml— 5 datasets, 31 fields, 4 relationships, 5 metrics round-trip with zero semantic differences