feat(mapping): add PGE loop for entity/relationship mapping #84

Open
FiifiB wants to merge 2 commits into databrickslabs:master from FiifiB:feat/agents-mapping-pge

FiifiB commented Jun 25, 2026

Summary

Introduces agent_mapping_pge, a Planner→Generator→Evaluator (PGE) engine
for entity/relationship mapping, as an additive alternative to the existing
single-agent agent_auto_assignment. The PGE engine plans a source model,
generates entity and relationship SQL for each ontology item, and gates each
result with a deterministic evaluator plus an independent semantic critic. This
replaces the "implementer marks its own homework" pattern by separating creator
from critic.

This is the second of two independent PGE PRs (ontology generation;
entity/relationship mapping) and is self-contained.
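
For readers new to the pattern, the plan, generate, and gate flow described above can be sketched roughly as follows. This is a minimal illustration and not the code in engine.py: run_pge, Verdict, the callable parameters, and the retry-with-feedback behaviour (max_attempts) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    """Outcome of one gate (hypothetical shape, not the package's report type)."""
    passed: bool
    reason: str = ""


def run_pge(items, plan, generate, deterministic_check, critic, max_attempts=2):
    """Plan once, then generate and gate SQL for every ontology item.

    All callables are injected, so the creator (generate) never grades
    its own output; the two gates do.
    """
    source_model = plan(items)
    results = {}
    for item in items:
        feedback = ""
        for _ in range(max_attempts):
            sql = generate(item, source_model, feedback)
            # Cheap deterministic gate first; the semantic critic only
            # runs on SQL that already passed it.
            verdict = deterministic_check(item, sql)
            if verdict.passed:
                verdict = critic(item, sql)
            if verdict.passed:
                break
            # Assumption: a failed gate feeds its reason into the next attempt.
            feedback = verdict.reason
        results[item] = (sql, verdict)
    return results
```

Every item ends up in the result with its last SQL and verdict, whether or not it passed, so a failure is recorded rather than silently dropped.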

Why additive (both engines retained)

agent_auto_assignment is kept and still reachable via
AgentClient.run_auto_assignment. A new AgentClient.run_mapping_pge gateway
exposes the PGE engine. This lets a downstream orchestrator choose which engine
to run based on source/ontology complexity (a follow-up change).
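
As a rough idea of what such an orchestrator could look like, the sketch below dispatches between the two gateways. Only the method names run_auto_assignment and run_mapping_pge come from this PR; their real signatures are not shown here, so the example forwards keyword arguments untouched. The choose_engine heuristic, its threshold, and the MappingClient protocol are hypothetical.

```python
from typing import Any, Protocol


class MappingClient(Protocol):
    """Stand-in for AgentClient; the real signatures may differ."""

    def run_auto_assignment(self, **kwargs: Any) -> Any: ...

    def run_mapping_pge(self, **kwargs: Any) -> Any: ...


def choose_engine(n_source_tables: int, n_ontology_items: int, threshold: int = 25) -> str:
    """Hypothetical complexity heuristic: larger jobs go to the PGE engine."""
    if n_source_tables + n_ontology_items > threshold:
        return "pge"
    return "auto_assignment"


def run_mapping(client: MappingClient, n_source_tables: int,
                n_ontology_items: int, **kwargs: Any) -> Any:
    """Call whichever gateway the heuristic selects, forwarding kwargs as-is."""
    if choose_engine(n_source_tables, n_ontology_items) == "pge":
        return client.run_mapping_pge(**kwargs)
    return client.run_auto_assignment(**kwargs)
```

Because both gateways stay reachable, the selection rule can change later without touching either engine.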

What changed

  • NEW agents/agent_mapping_pge/ — Planner (planner.py), generators
    (generators/{entity,relationship}.py), evaluator
    (evaluator/{deterministic,critic,report}.py), engine orchestrator
    (engine.py), contracts.py, and coverage.py.
    • Engine-enforced coverage: computed from the ontology, not LLM discretion;
      skip[] is advisory and never removes an item.
    • Abstract-superclass UNION derivation + synthetic-endpoint fallback so
      a single failed hub entity can't cascade to drop all relationships.
    • Bounded ThreadPool walk with monotonic progress.
  • NEW agents/tools/{planner,evaluation}.py — planner/evaluation terminal
    tools (submit_source_model, submit_evaluation, normalized_value_overlap).
  • agents/tools/context.py — adds source_model + semantic_eval_report
    (forward-ref typed); warehouse_id and all existing fields preserved.
  • back/objects/mapping/Mapping.py — runs the PGE engine in the auto-assign
    flow and accumulates PGE extras (source_model, mapping_evaluations,
    mapping_run_log); save_mappings_to_session gains three optional params
    (default None, so the legacy path is unaffected). The upstream
    _canonicalize_imported_uris helper is preserved.
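
The engine-enforced coverage rule listed above (required items come from the ontology, and skip[] is advisory only) can be illustrated with a small sketch. The function names and the note text are hypothetical and do not mirror coverage.py.

```python
def required_items(ontology_items, planner_skip):
    """Return (items_to_map, advisory_notes).

    The required list is derived from the ontology alone. A planner
    skip entry only attaches a note; it never removes an item.
    """
    skip = set(planner_skip)
    required = list(ontology_items)
    notes = {item: "planner suggested skipping" for item in required if item in skip}
    return required, notes


def coverage(required, mapped):
    """Fraction of required items that ended up with a mapping."""
    if not required:
        return 1.0
    done = set(mapped)
    return sum(1 for item in required if item in done) / len(required)
```

For example, with ontology items A, B, C and a planner skip list containing B, all three items remain required and B simply carries a note.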

Testing

  • uv run pytest tests/agents/agent_mapping_pge -q → 90 passed.
  • uv run pytest tests/units/agents tests/units/mapping -q → 208 passed.
  • New-package imports resolve on the upstream base (v0.5.2).

This pull request and its description were written by Isaac.

Introduce agent_mapping_pge — a Planner→Generator→Evaluator mapping engine —
additively. Plans a source-model, generates entity + relationship SQL per
ontology item, and gates each with a deterministic evaluator + a semantic
critic. Coverage is engine-enforced from the ontology (abstract-superclass
UNION derivation + synthetic-endpoint fallback so one failed hub can't drop
all relationships).
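
One way to picture the abstract-superclass UNION derivation mentioned above: when an abstract class has no SQL of its own, its mapping is assembled from the subclasses that were mapped. The sketch below is an assumption about the general idea, not the package's implementation. derive_entity_sql and its parameters are hypothetical, it assumes subclass queries project compatible columns, and it does not cover the synthetic-endpoint fallback.

```python
def derive_entity_sql(entity, sql_by_entity, subclasses):
    """Return SQL for an entity, deriving an unmapped superclass by UNION.

    sql_by_entity: SQL already generated per entity name.
    subclasses:    direct subclass names per superclass name.
    Returns None when neither the entity nor any subclass has SQL.
    """
    if entity in sql_by_entity:
        return sql_by_entity[entity]
    parts = [sql_by_entity[child]
             for child in subclasses.get(entity, [])
             if child in sql_by_entity]
    if not parts:
        return None
    # Assumption: every subclass query projects the same columns.
    return "\nUNION\n".join(parts)
```

A subclass that failed to map is left out of the UNION rather than invalidating the whole superclass.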

Additive: agent_auto_assignment is retained and still reachable via
AgentClient.run_auto_assignment; a new AgentClient.run_mapping_pge gateway
exposes the PGE engine, so an orchestrator can choose between them. Upstream
features preserved (ToolContext.warehouse_id, Mapping._canonicalize_imported_uris).

- NEW agent_mapping_pge package + tools/{planner,evaluation}.py
- context.py: +source_model/+semantic_eval_report fields (warehouse_id kept)
- Mapping.py: run PGE + accumulate source_model/evaluations/run_log;
  save_mappings_to_session gains 3 optional params (legacy path unaffected)
- Tests: 90 in tests/agents/agent_mapping_pge; 208 across units/{agents,mapping}

Co-authored-by: Isaac
FiifiB requested a review from a team as a code owner June 25, 2026 11:32
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Fiifi Botchway does not appear to be a GitHub user. You need a GitHub account to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

Satisfy the AI-feature lifecycle gate for the new agent: documented contract
(purpose, tool surface, eval dimensions, failure modes) + a 20-example baseline
eval dataset spanning single-source, multi-source cross-trust reconciliation,
and degenerate inputs.

Co-authored-by: Isaac