A configurable, deterministic similarity score for structured (JSON-like) data, and a drop-in, model-free reward for LLM prompt optimization.
LLMs are increasingly asked to emit JSON conforming to a fixed schema for information extraction, tool calling, agentic planning, and knowledge-graph construction. Measuring how close such an output is to a gold reference is awkward: exact match is brittle, text similarity ignores structure, and an LLM judge (powerful and flexible, but costlier to run and harder to reproduce) is not always the right fit when you need a fast, deterministic, auditable score. Object Aligner offers a complementary alternative for that case.
Object Aligner (OA) scores two JSON objects by recursively aligning their trees (the Hungarian algorithm for unordered collections, sequence alignment for ordered ones) and awarding partial credit at the granularity the schema declares. It is configured entirely through a compact set of JSON Schema extensions, so adapting it to a new task means annotating a schema, not writing code.
A primary use is prompt optimization: OA's deterministic, decomposable score makes a ready reward signal for optimizers such as GEPA or DSPy. Because the same alignment localizes every mismatch, it also emits ranked, natural-language feedback for their reflection slots, with no extra model call.
- Schema-driven recursive alignment: one deterministic score in [0, 1], with partial credit at every node, for arbitrarily nested objects, lists, and primitives.
- Referential alignment for (hyper)graphs: score cross-referenced records up to identifier renumbering. OA infers a bijection between gold and candidate ids and scores every reference through it.
- Per-list sequence semantics: choose, per list, between order-agnostic matching, an order-sensitive monotone regime (insertions/deletions) for ranking & planning, and positional tuples / prefixes whose slots carry position-specific meaning.
- Deterministic ranked feedback: the same alignment that produces the score also pinpoints where the candidate departs from gold and emits ranked repair operations, each scored by the exact amount of score it recovers, with no LLM call.
- Drop-in optimizer reward: plug OA into prompt optimizers like DSPy or GEPA as a reproducible, auditable, model-free reward (and reflection signal).
- Semantic string similarity (optional extension): score text fields by meaning rather than character overlap, using OpenAI (or any OpenAI-compatible) embeddings, with built-in caching and batching.
- Deterministic & decomposable: same inputs → same number; the top-level score is an explicit weighted aggregate of child scores, which is what makes attribution and feedback exact.
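To make the "explicit weighted aggregate" idea concrete, here is a minimal sketch in plain Python. It is not OA's implementation: the `aggregate` helper, the weighted-mean formula, and the example weights are assumptions chosen for illustration. It shows why a weighted mean is decomposable: each child's share of the deficit can be read off exactly, and the shares sum to `1 - score`.

```python
def aggregate(children):
    """Weighted mean of child scores plus an exact per-child deficit split.

    `children` is a list of (path, weight, score) triples. This helper is
    hypothetical and only illustrates the decomposition idea.
    """
    total_weight = sum(weight for _, weight, _ in children)
    score = sum(weight * s for _, weight, s in children) / total_weight
    # Each child loses weight * (1 - score) out of total_weight.
    deficits = sorted(
        ((path, weight * (1.0 - s) / total_weight) for path, weight, s in children),
        key=lambda item: item[1],
        reverse=True,
    )
    return score, deficits


# Illustrative values only: "name" counts double, "age" is entirely wrong.
children = [("name", 2.0, 0.75), ("role", 1.0, 1.0), ("age", 1.0, 0.0)]
score, deficits = aggregate(children)
print(score)     # 0.625
print(deficits)  # [('age', 0.25), ('name', 0.125), ('role', 0.0)]
```

Sorting the deficits gives a ranking of where the score was lost, which is the kind of information a ranked feedback list needs.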
Not on PyPI yet; install straight from GitHub (like other AIC tools, e.g. aic-nlp-utils):
pip install git+https://github.com/aic-factcheck/object_aligner.git
Or with uv:
uv add git+https://github.com/aic-factcheck/object_aligner.git
Optional extras (embedding-based semantic string similarity via an OpenAI-compatible API):
pip install "object-aligner[semantic-openai] @ git+https://github.com/aic-factcheck/object_aligner.git"
Requires Python 3.13+.
from object_aligner import ObjectAligner
schema = {"type": "string", "score": "jaro"}
aligner = ObjectAligner(schema)
print(aligner.metric("hello", "hallo")) # {'score': 0.8667}
Score a nested object and ask for human-readable feedback in one call:
result = aligner.metric(gold, pred, generate_feedback=True)
print(result["score"])
print(result["feedback"]) # ranked, prescriptive fix list; deterministic, no LLM
OA takes a gold object g, a candidate object p, and a schema S, and returns score(g, p | S) ∈ [0, 1]. Scoring at every internal node runs in two phases:
- Alignment: fix a correspondence between the children of g and p (Hungarian assignment for unordered collections / maps; a sequence-alignment dynamic program for ordered lists).
- Scoring: aggregate the per-pair child scores over that correspondence into a single number, weighted as the schema declares.
Both branches recurse, so any nesting depth works naturally. Primitives are scored directly
by a configurable comparator; empty values (null/None) are handled explicitly.
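The two phases can be sketched for a single unordered list as follows. This is an illustration under stated assumptions, not OA's code: a brute-force search over permutations stands in for the Hungarian algorithm (it finds the same optimal assignment but is only practical for tiny lists), the `exact` comparator and `align_unordered` helper are hypothetical, and dividing by the longer list's length is an assumed normalization.

```python
from itertools import permutations


def exact(gold_item, pred_item):
    """Hypothetical leaf comparator: 1.0 on equality, else 0.0."""
    return 1.0 if gold_item == pred_item else 0.0


def align_unordered(gold, pred, sim):
    """Phase 1: pick the best one-to-one correspondence. Phase 2: aggregate."""
    n = max(len(gold), len(pred))
    if n == 0:
        return [], 1.0
    # Pad the shorter side with None so every item has a (possibly empty) partner.
    g = list(gold) + [None] * (n - len(gold))
    p = list(pred) + [None] * (n - len(pred))

    def pair_score(a, b):
        return 0.0 if a is None or b is None else sim(a, b)

    best_total, best_perm = -1.0, None
    for perm in permutations(range(n)):  # brute-force stand-in for Hungarian
        total = sum(pair_score(g[i], p[j]) for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_perm = total, perm
    # Keep only pairs where both sides are real items.
    pairs = [(i, j) for i, j in enumerate(best_perm) if i < len(gold) and j < len(pred)]
    return pairs, best_total / n  # assumed normalization: longer list length


pairs, score = align_unordered(["apple", "banana", "cherry"], ["cherry", "apple"], exact)
print(pairs)  # [(0, 1), (2, 0)]
print(score)  # 2 of 3 gold items recovered
```

In a recursive setting, `sim` would itself be the score of two aligned subtrees, which is how partial credit propagates upward.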
| Area | What you get |
|---|---|
| Primitives | Strings (exact, jaro, jaro_winkler, levenshtein, damerau_levenshtein, osa, indel, lcsseq), numbers (exact, invdiff, relative), per-field thresholds, and custom metric callables. See primitives. |
| Lists & sequences | order:"fixed" (positional), order:"align" (order-agnostic Hungarian), monotone order-sensitive alignment, prefixItems/prefixWeights tuples, and ignoreExcess/ignoreMissing. See lists. |
| Maps / objects | Keys matched by label only (Hungarian), then values graded recursively; tune with keyImportance, valueImportance, valueWeight. See dicts. |
| Referential alignment | idScope / ref declare primary/foreign-key-style links; OA scores references invariant to id relabeling, with 1-WL tie-breaking for property-identical twins. See referential. |
| Feedback | feedback(): top-K ranked repair string for optimizer reflection slots (GEPA/DSPy/TextGrad). See feedback. |
| Attribution & repair | attribute() decomposes the deficit into ranked per-path contributions; repair() emits RFC-6902-style ops with exact score deltas and apply_to(). See attribution, repair. |
| Describe | describe(): deterministic plain-English walk of the alignment tree. See describe. |
| Null handling | Per-field nullScore for asymmetric null/value mismatches. See null handling. |
| Confidence | Opt-in per-pair stability scores harvested from each Hungarian matrix. See confidence. |
| Semantic similarity | Opt-in embedding-based string metric with caching, batching, and OpenAI-compatible transport. See semantic. |
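The monotone, order-sensitive alignment listed under lists and sequences can be pictured as a classic sequence-alignment dynamic program. The sketch below is an assumption-laden illustration rather than OA's implementation: `align_monotone` is a hypothetical helper, unmatched items (insertions or deletions) earn no credit, and the total is divided by the longer list's length.

```python
def align_monotone(gold, pred, sim):
    """Best order-preserving alignment score between two lists.

    best[i][j] holds the maximum total similarity achievable when aligning
    gold[:i] with pred[:j] without crossing pairs.
    """
    n, m = len(gold), len(pred)
    if max(n, m) == 0:
        return 1.0
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j],  # gold[i-1] left unmatched (deletion)
                best[i][j - 1],  # pred[j-1] left unmatched (insertion)
                best[i - 1][j - 1] + sim(gold[i - 1], pred[j - 1]),  # pair them
            )
    return best[n][m] / max(n, m)  # assumed normalization


def same(a, b):
    """Hypothetical leaf comparator: exact equality."""
    return 1.0 if a == b else 0.0


plan = ["boil", "pour", "stir", "serve"]
swapped = align_monotone(plan, ["boil", "stir", "pour", "serve"], same)
identical = align_monotone(plan, list(plan), same)
print(swapped)    # 0.75: the two swapped steps cannot both be matched in order
print(identical)  # 1.0
```

An order-agnostic assignment would give the swapped plan full marks, which is why the choice of regime per list matters for ranking and planning tasks.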
Complex structured data is rarely a flat tree: cross-references between records make it a
graph or hypergraph, which no prior similarity metric scores once identifiers are
arbitrary. Mark one primitive as an identifier (idScope) and others as references (ref):
schema = {
"type": "object",
"properties": {
"people": {
"type": "array", "order": "align",
"items": {"type": "object", "properties": {
"id": {"type": "integer", "idScope": "person"},
"name": {"type": "string", "score": "exact", "valueWeight": 2.0},
"role": {"type": "string", "score": "exact"},
}},
},
"mentorships": {
"type": "array", "order": "align", "ignoreExcess": True,
"items": {"type": "object", "properties": {
"mentor": {"type": "integer", "ref": "person"},
"mentee": {"type": "integer", "ref": "person"},
}},
},
},
}
OA infers the gold-to-candidate id bijection (by everything except the masked id field), breaks remaining ties by graph structure with Weisfeiler-Leman color refinement, and scores every reference through the bijection, so two correct extractions that renumber and reorder their records still match. Recovering the bijection exactly is a graph-isomorphism problem, which OA approximates in near-linear time.
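A heavily simplified sketch of "score every reference through the bijection" follows. It assumes records can be paired by exact equality of their non-id fields, and it omits the Weisfeiler-Leman tie-breaking entirely, so property-identical twins are paired arbitrarily here. The helpers `infer_bijection` and `reference_score` are hypothetical and are not part of OA's API.

```python
def infer_bijection(gold_people, pred_people):
    """Map gold ids to candidate ids by matching records on all non-id fields."""
    def key(record):
        return tuple(sorted((k, v) for k, v in record.items() if k != "id"))

    pred_ids_by_key = {}
    for record in pred_people:
        pred_ids_by_key.setdefault(key(record), []).append(record["id"])

    mapping = {}
    for record in gold_people:
        candidates = pred_ids_by_key.get(key(record), [])
        if candidates:
            mapping[record["id"]] = candidates.pop(0)  # each candidate id used once
    return mapping


def reference_score(gold_edges, pred_edges, mapping):
    """Fraction of gold mentorships found in the candidate after id translation."""
    if not gold_edges:
        return 1.0
    pred_set = {(e["mentor"], e["mentee"]) for e in pred_edges}
    hits = sum(
        (mapping.get(e["mentor"]), mapping.get(e["mentee"])) in pred_set
        for e in gold_edges
    )
    return hits / len(gold_edges)


gold = {
    "people": [{"id": 1, "name": "Ada", "role": "lead"},
               {"id": 2, "name": "Bob", "role": "dev"}],
    "mentorships": [{"mentor": 1, "mentee": 2}],
}
# Same content, but records are reordered and ids renumbered.
pred = {
    "people": [{"id": 10, "name": "Bob", "role": "dev"},
               {"id": 20, "name": "Ada", "role": "lead"}],
    "mentorships": [{"mentor": 20, "mentee": 10}],
}
mapping = infer_bijection(gold["people"], pred["people"])
ref_score = reference_score(gold["mentorships"], pred["mentorships"], mapping)
print(mapping)    # {1: 20, 2: 10}
print(ref_score)  # 1.0
```

Comparing raw ids would score this candidate's mentorship as wrong even though it describes the same relationship.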
OA is a deterministic, decomposable structural reward: cheap to evaluate at scale, reproducible, and easy to audit. It complements LLM-as-judge rewards: use a judge for open-ended semantic grading, and OA when the answer has a known schema (the two can also be combined). Used as the reward inside GEPA across synthetic and real-world datasets, OA produced consistent gains and never a significant loss. The same alignment also supplies the natural-language reflection signal, so one call returns both how well a candidate did and what to change.
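One way to wire a structural score into an optimizer loop is a small adapter that turns raw model text into a number. The sketch below is optimizer-agnostic and makes several assumptions: `make_reward` and `key_overlap` are hypothetical names, unparseable JSON is scored as 0.0 by choice, and a toy scorer stands in for OA so the snippet runs on its own. No GEPA or DSPy API is shown.

```python
import json
from typing import Any, Callable


def make_reward(gold: Any, scorer: Callable[[Any, Any], float]) -> Callable[[str], float]:
    """Wrap a (gold, pred) -> float scorer as a reward over raw model output."""
    def reward(model_output: str) -> float:
        try:
            pred = json.loads(model_output)
        except json.JSONDecodeError:
            return 0.0  # assumed policy: invalid JSON earns no reward
        return scorer(gold, pred)
    return reward


def key_overlap(gold: dict, pred: Any) -> float:
    """Toy stand-in scorer: fraction of gold keys whose values match exactly."""
    if not isinstance(pred, dict) or not gold:
        return 0.0
    return sum(pred.get(k) == v for k, v in gold.items()) / len(gold)


# With OA installed, the scorer could instead be something like
#   lambda g, p: aligner.metric(g, p)["score"]
# based on the metric() call shown earlier in this README.
reward = make_reward({"name": "Ada", "role": "lead"}, key_overlap)
partial = reward('{"name": "Ada", "role": "dev"}')
invalid = reward("not json")
print(partial)  # 0.5
print(invalid)  # 0.0
```

Because the scorer is deterministic, repeated evaluations of the same candidate return the same reward, which keeps optimizer runs reproducible.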
OA is configured with a small set of keywords layered on top of JSON Schema:
| Keyword | Applies to | Purpose |
|---|---|---|
| score | string / number / integer | Leaf comparator (built-in name or custom metric) |
| threshold | string / number / integer | Floor below which a leaf scores 0 |
| order | array | "fixed" (positional) or "align" (order-agnostic) |
| ignoreExcess / ignoreMissing | array | Drop unmatched candidate / gold items from the denominator |
| prefixItems / prefixWeights | array | Positional tuple head with per-slot weights |
| keyImportance / valueImportance | object | Weight of the key term vs. the value term |
| valueWeight | object property | Per-property weight in the value aggregate |
| idScope | primitive (in an array) | Declare an identifier scope (primary key) |
| ref | primitive | Reference into a named scope (foreign key) |
| nullScore | any node | Score for an asymmetric null/value mismatch |
Full reference: docs/schema_reference.md.
Start at docs/index.md. Chapters:
uv sync # install dependencies
uv run pytest # run the test suite
The repository ships a comprehensive pytest suite under tests/ covering primitives, lists,
dicts, nesting, referential alignment, feedback, repair, attribution, and edge cases.
If you use Object Aligner in academic work, please cite the paper (in preparation):
@misc{drchal2026objectaligner,
title = {Object Aligner: A Configurable JSON Schema Similarity Score for Graphs,
Applied to LLM Prompt Optimization},
author = {Drchal, Jan},
year = {2026},
note = {Reference implementation: https://github.com/aic-factcheck/object_aligner}
}
This is a cleaned-up, standalone version of the Object Aligner originally developed as part of the PromptOpt prompt-optimization framework. The original implementation can be found in the first commit of PromptOpt (Dec 20, 2024).