Skip to content

Add bidirectional Honeydew ↔ OSI converter#136

Open
baruchoxman wants to merge 13 commits into
open-semantic-interchange:mainfrom
honeydew-ai:honeydew-converter
Open

Add bidirectional Honeydew ↔ OSI converter#136
baruchoxman wants to merge 13 commits into
open-semantic-interchange:mainfrom
honeydew-ai:honeydew-converter

Conversation

@baruchoxman
Copy link
Copy Markdown

@baruchoxman baruchoxman commented May 27, 2026

Summary

  • Adds a new converters/honeydew/ package with a bidirectional converter between OSI semantic model YAML and Honeydew workspace directories
  • Supports both directions: osi-to-honeydew and honeydew-to-osi via a CLI and Python API
  • Full round-trip fidelity: OSI-only fields (ai_context, unique_keys, non-Honeydew custom_extensions) are stored in Honeydew metadata; Honeydew-native fields (owner, display_name, hidden, format_string, timegrain, labels) are preserved in a HONEYDEW custom extension

What's included

  • src/honeydew_osi_converter.py — ~1000-line converter covering datasets, fields, calculated attributes, metrics, relationships, and workspace-level metadata
  • tests/test_honeydew_osi_converter.py — 127 parametrized tests with full-dict output verification (no cherry-picked field assertions)
  • pyproject.toml — package config, Python 3.12+ requirement, pytest setup
  • README.md — mapping tables, usage, limitations

Test plan

  • python -m pytest tests/ passes (127 tests, ~1s)
  • python src/honeydew_osi_converter.py osi-to-honeydew -i <file> -o <dir> produces a valid Honeydew workspace
  • python src/honeydew_osi_converter.py honeydew-to-osi -i <dir> -o <file> produces a valid OSI YAML
  • Round-trip tested against the TPC-DS example in examples/tpcds_semantic_model.yaml — 5 datasets, 31 fields, 4 relationships, 5 metrics round-trip with zero semantic differences

baruchoxman and others added 8 commits May 27, 2026 15:42
Implements OSI → Honeydew and Honeydew → OSI conversion with full
round-trip fidelity. Relationship names are stored natively on the
relation, and relations are always placed on the many side.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix empty-string expression bypassing None guard
- Restore calculated_attribute routing via HONEYDEW type hint
- Preserve bool datatype through OSI round-trip
- Store string metric ai_context in osi metadata for recovery
- Warn on duplicate metric names instead of silently overwriting
- Restore connection_expr from HONEYDEW custom_extension on OSI→Honeydew
- Warn on malformed JSON in _read_osi_metadata instead of silent drop
- Remove filter limitation from README (not applicable)
- Update Honeydew docs link to honeydew.ai/docs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ssion guards

- Restore Honeydew-specific fields (display_name, hidden, format_string,
  timegrain, owner, folder) in OSI→Honeydew direction by reading them back
  from the HONEYDEW custom_extension on both entity and attribute objects
- Add path traversal guard in main() using os.path.normpath + startswith check
- Guard against whitespace-only expressions (not expr or not expr.strip())
- Warn when field expression is a non-dict value instead of silently dropping it
- Simplify elif not from_cols: to else: in _osi_relation_to_honeydew
- Strengthen test_ai_context_string_preserved to assert the string value is
  recoverable in description
- Add 5 new tests: display_name/format round-trip, calc attr Honeydew fields,
  entity owner, whitespace expression skipped, non-dict expression warns

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Rewrite entire test file from classes to module-level pytest functions;
  parametrize repeated cases (is_simple_identifier, parse_osi_source,
  field datatypes, entity honeydew fields, dataset/calc attr fields,
  empty/whitespace expressions, path traversal guard, check_safe_path)
- Extract _check_safe_path into a named helper in the converter (was
  inlined in main()) so path traversal logic is independently testable
- Add parametrized test_check_safe_path covering ../evil.yml and
  ../../etc/passwd rejection and legitimate nested paths
- Add test_empty_or_whitespace_metric_expression_skipped to cover the
  OSI→Honeydew whitespace guard on metrics (was previously untested)
- Parametrize entity/attr/calc Honeydew field round-trip tests
  (owner, display_name, hidden, folder each verified independently)
- Update _write_workspace helper to pass through entity-level fields
  (owner, display_name, hidden, folder) so round-trip tests use the
  standard _honeydew_roundtrip() helper instead of manual workspace setup
- Promote _HD_ATTR_KEYS tuple to module-level constant (was defined
  inside the field loop on every iteration)
- Drop from __future__ import annotations — requires Python 3.12+
- Add pyproject.toml with requires-python = ">=3.12"
- Add Python 3.12+ requirement section to README

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove requirements.txt (superseded by pyproject.toml)
- Extract _fields_to_honeydew() from _dataset_to_files(): field
  classification is now a named, independently testable function; four
  direct unit tests added
- Warn when a relationship has neither from_columns nor connection_expr
  so callers discover the incomplete join before it reaches Honeydew
- Round-trip the vendors list: non-HONEYDEW vendors are stored in the
  workspace osi metadata on OSI→Honeydew and merged back on the return
  trip (HONEYDEW always appears first)
- Add main() CLI smoke tests via subprocess: osi-to-honeydew writes the
  expected workspace.yml, honeydew-to-osi writes a parseable OSI YAML,
  and path traversal in an entity name is rejected with exit code 1
- Update README setup instructions to use pip install . / pip install -e .

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Collapse 13 individual OSI→Honeydew tests, 7 Honeydew→OSI tests,
10 OSI round-trip tests, and 13 Honeydew round-trip tests into four
@pytest.mark.parametrize blocks. Every assertion now compares the
entire output dict rather than cherry-picked fields.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Docstring: relationship name is mapped directly to Honeydew's relation
  name field (not osi metadata); ai_context is mapped natively to
  description/labels/AI metadata, not treated as having no equivalent
- README mapping table: rename rows to use Honeydew's canonical terms
  (Source Attribute, Calculated Attribute) and add missing ai_context row
- README limitations: rewrite the confusing "One dataset per entity" bullet
  to clarify that OSI dataset = one table/query, Honeydew supports multiple
  dataset files per entity but the converter generates exactly one

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@jbonofre jbonofre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also suggest to add HONEYDEW in the list of supported vendors (in the converters/index.md and core-spec).

Overall good, but worth to clarify the comments.

Comment thread converters/honeydew/src/honeydew_osi_converter.py
Comment thread converters/honeydew/src/honeydew_osi_converter.py
baruchoxman and others added 5 commits June 2, 2026 09:52
- Store OSI `label` in osi metadata when field has label but no dict
  ai_context, so the Honeydew→OSI path can distinguish OSI-originated
  labels from native Honeydew labels and avoid injecting spurious
  ai_context.synonyms on round-trip
- Add `label` parameter to `_build_osi_metadata` and corresponding
  read support in `_read_osi_metadata`
- Warn when an entity has more than one dataset file during
  Honeydew→OSI conversion (only the primary dataset is converted)
- Update test expectations to reflect correct idempotent behavior

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nonyms

When a field has ai_context.synonyms but no label, the osi metadata
section contains ai_context but no label key. Guard the label fallback
on the absence of osi_meta ai_context so that synonyms don't get
promoted to OSI label on round-trip.

Verified with TPC-DS: zero semantic differences over OSI→HD→OSI→HD.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants