Skip to content

Closes: #387 - Schema comparator tool#452

Open
bctiemann wants to merge 11 commits into386-schema-exportfrom
387-schema-comparator
Open

Closes: #387 - Schema comparator tool#452
bctiemann wants to merge 11 commits into386-schema-exportfrom
387-schema-comparator

Conversation

@bctiemann
Copy link
Copy Markdown
Contributor

@bctiemann bctiemann commented Apr 7, 2026

Closes: #387

Implements the COT schema comparator ("diff engine"): a pure-read public API that compares an incoming schema document against the live DB state of each referenced CustomObjectType and returns a structured diff that the upgrade executor (#389, PR #453) can consume.

Public API — comparator.py

diff_cot(type_def)        → COTDiff         # diff one COT definition dict
diff_document(schema_doc) → list[COTDiff]   # diff all COTs in a document

Data model

COTDiff

Attribute Type Description
name str COT name from schema
slug str Lookup key (COT matched by slug)
is_new bool True if COT does not yet exist in DB
cot_changes dict[str, tuple] {attr: (db_val, schema_val)} for top-level COT attribute differences
field_changes list[FieldChange] Per-field operations
warnings list[str] Non-fatal issues

Convenience properties: has_changes, has_destructive_changes, adds, removes, alters.

FieldChange

Attribute Type Description
op FieldOp ADD, REMOVE, or ALTER
schema_id int Stable field identifier
db_name str | None Current DB field name (None for ADD)
schema_def dict Raw schema field dict
changed_attrs dict[str, tuple] {attr: (db_val, schema_val)} — populated for ALTER; includes "name" if renamed, "type" if type differs

Properties: is_rename, is_type_change.

Matching and tombstone rules

  • Fields are matched by schema_id. DB fields without a schema_id are not tracked — they generate a warning, not a REMOVE.
  • A REMOVE is only emitted when the field's schema_id appears in schema.removed_fields (tombstone list). A field absent from both schema.fields and schema.removed_fields is ambiguous and generates a warning.
  • related_object_type values are compared in their encoded schema form ("app_label/model" or "custom-objects/<slug>"), making diffs round-trip compatible with the schema format.

Tests — test_comparator.py (37 tests, 9 classes)

Class What it covers
ComparatorCleanDiffTestCase Export → diff round-trip produces no changes
ComparatorNewCOTTestCase COT absent from DB → all fields are ADD, is_new=True
ComparatorCOTAttrsTestCase Top-level attribute diffs (name, version, description, etc.)
ComparatorFieldAddTestCase Schema field with no matching DB schema_idADD
ComparatorFieldRemoveTestCase DB field tombstoned in schema → REMOVE; untombstoned absence → warning
ComparatorFieldAlterTestCase Rename, type change, scalar attrs, choice_set, related_object_type, related_object_filter
ComparatorWarningsTestCase Untracked DB fields (no schema_id) and non-tombstoned absent fields produce warnings
ComparatorMultiCOTTestCase Multi-COT document returns one diff per type
ComparatorHelperPropertiesTestCase has_changes, has_destructive_changes, adds, removes, alters

@bctiemann
Copy link
Copy Markdown
Contributor Author

@claude /review

@claude
Copy link
Copy Markdown

claude Bot commented Apr 9, 2026

Claude finished @bctiemann's task in 1m 53s —— View job


Code Review: PR #452 — Schema Comparator Tool

  • Read changed files
  • Analyze comparator.py
  • Analyze test_comparator.py
  • Post review feedback

Overview

This PR adds a clean, well-architected read-only diff engine that sits between the schema exporter (#386) and the upcoming upgrade executor (#389). The design is solid: the data model is expressive, the tombstone rules are well-documented, and the test coverage is comprehensive.


Strengths

  • Excellent documentation: The module docstring is a model of clarity — it documents the public API, data model, matching semantics, and tombstone rules before any code appears.
  • Query efficiency: select_related("choice_set", "related_object_type") avoids N+1 queries, and the cot_slug_cache pre-fetch consolidates custom-COT lookups into a single DB round-trip.
  • Tombstone design: The three-way treatment (field in schema → ADD/ALTER, field tombstoned → REMOVE, field absent from both → warning) correctly handles the "edited outside the workflow" case without silent data loss.
  • Test coverage: 37 tests across 9 classes covering every meaningful code path including the round-trip clean diff.

Issues

Medium — Type-specific attrs compared against schema type, not DB type

comparator.py:211

type_specific = FIELD_TYPE_ATTRS.get(schema_type, set())

When a field's type changes (e.g., textinteger), type_specific is derived from the incoming schema type, not the DB type. This means old-type-specific attributes (e.g., validation_regex on a text field being converted to integer) won't appear in changed_attrs at all — the diff will only note the type change, silently omitting the stale validation_regex on the DB field.

This is probably fine since the executor is responsible for schema changes, but the module docstring says "Type changes are included in changed_attrs but are not validated here; the executor decides whether to allow or reject them." — it could be worth adding: "Type-specific attrs from the old DB type are not included in the diff when the type changes."

Fix this →


Minor — re.IGNORECASE is unnecessary on the model name regex

comparator.py:55

_TABLE_MODEL_RE = re.compile(r'^table(\d+)model$', re.IGNORECASE)

Django generates model names as lowercase (e.g., table42model). re.IGNORECASE is unnecessary here and could mask a future naming change. Use re.ASCII instead if you need to constrain the charset, or omit the flag entirely.

Fix this →


Minor — _encode_rot warning path has no test coverage

comparator.py:173-179

if slug is None:
    warnings.append(
        f"CustomObjectType pk={cot_id} is referenced by a field but "
        "no longer exists in the DB. ..."
    )
    return f"{CUSTOM_OBJECTS_APP_LABEL_SLUG}/<deleted:{cot_id}>"

The "referenced COT was deleted" path is exercised by no test. This is the only warning path in comparator.py without test coverage.


Minor — has_changes is False for a new COT with no fields

comparator.py:132-133

@property
def has_changes(self) -> bool:
    return bool(self.cot_changes or self.field_changes)

A newly-created COT with no fields produces is_new=True, cot_changes={}, field_changes=[] — so has_changes is False. The executor would need to check is_new independently to decide whether to create the COT. The current contract is consistent (no changes to apply), but it may be surprising to callers. Consider documenting this in the docstring.


Minor — Silent no-op in _alter_field test helper

test_comparator.py:246-253

def _alter_field(self, field_name, **overrides):
    type_def = export_cot(self.cot)
    for sf in type_def.get("fields", []):
        if sf["name"] == field_name:
            sf.update(overrides)
            break
    return diff_cot(type_def)

If field_name doesn't match any exported field (e.g., due to a typo or the field not being exported), the loop exits without updating anything and diff_cot is called on an unmodified type_def — the test will then likely pass vacuously. Adding a else: raise AssertionError(f"field {field_name!r} not found in export") on the for loop would make failures explicit.

Fix this →


Minor — Missing schema key raises unhelpful KeyError

comparator.py:284-285

slug = type_def["slug"]
name = type_def["name"]

If a caller passes a malformed schema dict (missing "slug" or "name"), a bare KeyError is raised with no context. Since this is a public API, a guard like:

if "slug" not in type_def or "name" not in type_def:
    raise ValueError(f"type_def must contain 'slug' and 'name' keys; got: {list(type_def)}")

…would produce a much more actionable error message. (Alternatively, document that the input must be a valid schema dict from the exporter.)


Nits

  • comparator.py:157 — The function name _encode_rot uses the abbreviation rot (related object type) without definition. Consider _encode_related_object_type or at least a one-line note in the docstring linking it to the exporter's _encode_related_object_type (which the docstring already mentions it mirrors).
  • comparator.py:392-399diff_document processes COTs sequentially. Fine for now, but worth a comment if performance for large schemas ever becomes a concern (e.g., each COT issues several DB queries).

Summary

The implementation is clean and well-tested. The tombstone/warning design in particular is thoughtful. The only real behavioral question worth a comment is the type-specific-attrs-on-type-change edge case. Everything else is minor polish. ✓ Ready to merge once the team is satisfied with the edge case documentation.

@bctiemann bctiemann requested review from arthanson and jnovinger April 9, 2026 00:49
@bctiemann
Copy link
Copy Markdown
Contributor Author

All issues identified by the agent review have been addressed.

Copy link
Copy Markdown
Contributor

@arthanson arthanson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocked by #449

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants