
feat(ontology): add PGE Evaluator stage to owl-generator #83

Open
FiifiB wants to merge 1 commit into databrickslabs:master from FiifiB:feat/agents-ontology-pge

@FiifiB FiifiB commented Jun 25, 2026


Summary

Turns OWL ontology generation into a real Planner→Generator→Evaluator (PGE)
loop. Today agent_owl_generator does single-shot generation plus a pitfall-tool
fix loop, but has no deterministic Evaluator — so hard structural defects can
survive into the delivered ontology. This PR adds a Stage-1 evaluator that scores
the generated ontology against the source metadata and feeds concrete retry-hints
back to the generator on Tier-1 defects, bounded by a hard cap.
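The control flow described above can be sketched roughly as follows. This is an illustration only, not the code in engine.py: `run_pge_loop`, `generate`, and `evaluate` are hypothetical names, and the two cap values are placeholders because this description does not state them.

```python
from typing import Callable, Optional

# Placeholder values: the PR names these constants but this description
# does not say what they are set to.
MAX_ITERATIONS = 5
MAX_OWL_EVAL_ROUNDS = 2


def run_pge_loop(
    generate: Callable[[Optional[str]], str],
    evaluate: Callable[[str], Optional[str]],
) -> str:
    """Generate an ontology, retrying on evaluator hints within both caps."""
    hint: Optional[str] = None
    eval_rounds = 0
    ontology = ""
    for iteration in range(MAX_ITERATIONS):
        ontology = generate(hint)
        # None means "no hard defect found" or "evaluator failed open".
        hint = evaluate(ontology)
        if hint is None:
            break
        has_iteration_left = iteration + 1 < MAX_ITERATIONS
        if eval_rounds >= MAX_OWL_EVAL_ROUNDS or not has_iteration_left:
            # Deliver the last ontology instead of discarding it.
            break
        eval_rounds += 1
    return ontology
```

With an evaluator that never returns None, this sketch calls `generate` at most MAX_OWL_EVAL_ROUNDS + 1 times and still returns the last ontology it produced.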

This is the first of two independent PGE PRs (ontology generation; entity/
relationship mapping). It is additive and self-contained.

What changed

  • agent_owl_generator/engine.py
    • _evaluate_ontology_stage() — parses the Turtle, runs deterministic Tier-1
      checks (orphan classes, dangling rdfs:domain/rdfs:range, naming violations,
      duplicate classes) and returns a retry-hint on hard defects. Fails open:
      any parse/dependency error returns None, so the evaluator never blocks
      delivery.
    • Evaluator wired into the loop after the pitfall loop, bounded by
      MAX_OWL_EVAL_ROUNDS, and only retries while an iteration remains so a usable
      ontology is never discarded by exhausting MAX_ITERATIONS.
    • MAX_OUTPUT_TOKENS = 16000 so exhaustive attribute coverage isn't silently
      truncated past the old 4096 ceiling.
    • Stronger prompt: # ATTRIBUTE COVERAGE section + get_table_detail-per-table
      workflow step → exhaustive (not curated) datatype-property coverage.
  • New agents/pge_eval/ slice (gold-free, intrinsic, computed from the
    ontology + source schema only): normalize.py + ontology_metrics.evaluate_ontology.
    The package root is intentionally minimal; the full scorecard/CLI lands in a
    separate change, so importers depend on the concrete submodule.
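
For reviewers, here is a minimal sketch of what the four Tier-1 checks and the fail-open wrapper could look like. It is not the PR's implementation: the real stage parses Turtle, whereas this sketch takes already-parsed triples as plain tuples of prefixed names so that it runs on the standard library alone. The function names, the PascalCase naming rule, the definition of "orphan" (a class never used in rdfs:domain, rdfs:range, or rdfs:subClassOf), and the duplicate rule (names equal once case and underscores are ignored) are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple

PASCAL_CASE = re.compile(r"^[A-Z][A-Za-z0-9]*$")  # assumed naming rule


def tier1_defects(triples: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Return human-readable Tier-1 defects for (subject, predicate, object) triples."""
    triples = list(triples)
    classes = {s for s, p, o in triples if p == "rdf:type" and o == "owl:Class"}
    defects = []

    # Dangling rdfs:domain / rdfs:range: the target is neither a declared
    # class nor an xsd: datatype.
    for s, p, o in triples:
        if p in ("rdfs:domain", "rdfs:range"):
            if o not in classes and not o.startswith("xsd:"):
                defects.append(f"dangling {p} on {s}: {o} is not a declared class")

    # Orphan classes: declared but never linked to anything.
    linked = set()
    for s, p, o in triples:
        if p in ("rdfs:domain", "rdfs:range"):
            linked.add(o)
        elif p == "rdfs:subClassOf":
            linked.update((s, o))
    for c in sorted(classes - linked):
        defects.append(f"orphan class {c}")

    # Naming violations: class local names must be PascalCase.
    for c in sorted(classes):
        if not PASCAL_CASE.match(c.split(":")[-1]):
            defects.append(f"naming violation: {c}")

    # Duplicate classes: same name after dropping case and underscores.
    groups = defaultdict(list)
    for c in sorted(classes):
        groups[c.split(":")[-1].replace("_", "").lower()].append(c)
    for names in groups.values():
        if len(names) > 1:
            defects.append("duplicate classes: " + ", ".join(names))
    return defects


def evaluate_stage(triples) -> Optional[str]:
    """Return a retry-hint on hard defects, else None. Fails open on any error."""
    try:
        defects = tier1_defects(triples)
    except Exception:
        return None  # never block delivery because the evaluator itself broke
    if not defects:
        return None
    return "Fix these structural defects:\n" + "\n".join(f"- {d}" for d in defects)
```

Returning None both for "clean" and for "evaluator error" keeps the caller simple: any non-None value is a hint to retry with, and anything else means deliver what you have.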

Testing

uv run pytest tests/units/pge_eval/test_ontology_metrics.py tests/units/pge_eval/test_owl_evaluator_stage.py tests/units/ontology/test_owl_generator.py -q → 39 passed.

Broader sanity: tests/units/{agents,ontology,pge_eval} → 565 passed, 11 skipped.

This pull request and its description were written by Isaac.

Turn owl-generation into a real Planner→Generator→Evaluator loop. After the
pitfall-tool fix loop settles, a deterministic Stage-1 evaluator scores the
ontology against source metadata and feeds concrete retry-hints back to the
generator on Tier-1 structural defects (orphan classes, dangling domain/range,
naming violations, duplicate classes), bounded by MAX_OWL_EVAL_ROUNDS.

- engine.py: _evaluate_ontology_stage() + loop wiring (fails open; never
  discards a usable ontology), MAX_OUTPUT_TOKENS=16000, exhaustive
  ATTRIBUTE COVERAGE prompt + get_table_detail workflow step.
- New agents/pge_eval slice: normalize.py + ontology_metrics.evaluate_ontology
  (gold-free, intrinsic; minimal package root to avoid coupling).
- Tests: ontology_metrics + owl_evaluator_stage (39 targeted, 565 unit green).

Co-authored-by: Isaac
@FiifiB FiifiB requested a review from a team as a code owner June 25, 2026 11:20
@CLAassistant


CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Fiifi Botchway does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
