Skip to content

Closes: #391 (Phase 2) - post_migrate auto-heal for mixin column drift#481

Open
bctiemann wants to merge 3 commits into391-base-column-snapshotfrom
391-post-migrate-heal
Open

Closes: #391 (Phase 2) - post_migrate auto-heal for mixin column drift#481
bctiemann wants to merge 3 commits into391-base-column-snapshotfrom
391-post-migrate-heal

Conversation

@bctiemann
Copy link
Copy Markdown
Contributor

Summary

Implements automatic detection and repair of mixin column drift for Custom Object Type tables (Phase 2 of #391). When NetBox is upgraded and a mixin (e.g. ChangeLoggingMixin) gains a new concrete column, existing COT tables would previously be missing that column silently. This PR closes that gap.

Depends on PR #480 (Phase 1 -- base column snapshot).

New module: mixin_migration.py

  • _expected_base_fields(cot) -- resolves the full set of base columns from the live CustomObject model, excluding user-defined fields
  • _can_auto_add(field) -- determines whether a missing column can be safely added (ADD COLUMN) without breaking existing rows (nullable or has a default)
  • heal_cot(cot, verbosity, dry_run) -- detects drift for a single COT, adds safe columns via schema_editor.add_field(), warns on unsafe ones (NOT NULL/no default, type changes, removals), never auto-drops, refreshes schema_document["base_columns"] after successful additions
  • heal_all_cots(verbosity, dry_run) -- iterates all COTs, returns {total, healed, warnings} summary

Signal wiring (__init__.py)

_heal_mixin_columns() is connected to post_migrate in ready(). It fires automatically after every manage.py migrate run. A module-level _heal_ran flag ensures the work happens only once per process even though post_migrate fires once per installed app. Excluded during makemigrations and collectstatic.

Management command: upgrade_custom_objects

Explicit escape hatch for inspection and manual repair:

manage.py upgrade_custom_objects               # check and fix all COTs
manage.py upgrade_custom_objects --dry-run     # preview without DB changes
manage.py upgrade_custom_objects --cot mytype  # single COT by name or ID

Safety rules

Scenario Action
New nullable column Auto-ADD
New column with default Auto-ADD
New NOT NULL column, no default Warn only
Column type changed Warn only
Column removed from mixin Warn only, never DROP

Test plan

  • CanAutoAddTestCase -- nullable, defaulted, both, neither
  • ExpectedBaseFieldsTestCase -- returns id/created/last_updated, excludes user fields, returns real Django Field instances
  • HealCotTestCase -- no-drift returns empty; nullable column is added to DB; non-nullable warns; snapshot updated after addition; removed column warns but not dropped; dry_run reports without touching DB; dry_run does not update snapshot
  • HealAllCotsTestCase -- total matches COT count, healed=0 when no drift, correct summary keys
  • UpgradeCustomObjectsCommandTestCase -- runs without error, --dry-run flag, --cot by name, --cot by ID, unknown COT raises CommandError

Generated with Claude Code

Add netbox_custom_objects/mixin_migration.py with:
  - _expected_base_fields(): resolves base columns from live CustomObject model
  - _can_auto_add(): determines if a missing column can be safely ADD COLUMNed
  - heal_cot(): detects drift for one COT, adds safe columns, warns on unsafe
    ones, never auto-drops; refreshes schema_document snapshot on success
  - heal_all_cots(): iterates all COTs, returns {total, healed, warnings}

Wire _heal_mixin_columns() post_migrate signal handler in __init__.py:ready()
so the heal runs automatically after every 'manage.py migrate'. A module-level
_heal_ran flag deduplicates the call (post_migrate fires once per installed
app). makemigrations and collectstatic are excluded.

Add management command 'upgrade_custom_objects' with --dry-run and --cot flags
for explicit inspection and repair outside the migrate flow.

Tests cover: _can_auto_add() unit tests, _expected_base_fields() with/without
user fields, heal_cot() adding nullable columns, warning on non-nullable, not
dropping removed columns, snapshot refresh, dry_run correctness,
heal_all_cots() summary counts, and all management command flags.
@bctiemann bctiemann force-pushed the 391-post-migrate-heal branch from 61b9fe2 to f446e28 Compare April 23, 2026 10:11
- Snapshot refresh: derive updated base_columns by merging new fields into the
  existing snapshot dict rather than re-running _store_base_column_snapshot(model),
  which re-derives from model._meta and misses columns added directly to the DB
  by this heal pass (schema_editor.add_field does not update _meta).
- removed-column test: actually add ghost_col to the DB table so the
  'if col_name in actual_names' guard in heal_cot() fires correctly.
@bctiemann bctiemann requested a review from arthanson April 23, 2026 22:35
@bctiemann
Copy link
Copy Markdown
Contributor Author

@claude /review

@claude
Copy link
Copy Markdown

claude Bot commented Apr 23, 2026

Claude finished @bctiemann's task in 2m 11s —— View job


Code Review — PR #481: Phase 2 post_migrate auto-heal for mixin column drift

  • Gather context and read changed files
  • Analyze code quality and correctness
  • Review test coverage
  • Post review feedback

Overview

This is a well-structured implementation of automatic mixin column drift detection and repair. The design is sound: post-migrate signal wiring, a module-level deduplication flag, a management command escape hatch, and clear safety rules about what can/cannot be auto-fixed. Overall the code is clean and the test coverage is thorough. A few issues are worth addressing before merge.


Issues Found

Bug: Field name vs. column name mismatch

mixin_migration.py:43-45 and mixin_migration.py:112

_expected_base_fields keys the result by f.name, but _actual_column_names returns DB introspection column names. These diverge whenever a model field uses db_column= to override the column name. The comparison at line 112 (if field_name in actual_names) would then fail to detect an existing column, causing heal_cot to try adding a column that already exists (which would fail or succeed depending on DB behavior).

# Current: uses f.name (Python attribute name)
return {f.name: f for f in model._meta.concrete_fields if ...}

# Should use f.attname or f.column for FK fields and db_column overrides:
return {f.column: f for f in model._meta.concrete_fields if ...}

Also, the user-field exclusion at line 41 checks cot.fields.values_list("name", flat=True) (Python field names), which also may not match f.column. These need to be consistent.

Fix this →


Missing implementation: type-change detection

mixin_migration.py docstring line 19 says "Warn only: column type appears to have changed" but there is no code that actually performs this check. The function detects missing columns and removed columns, but never compares the type of columns that exist in both expected and actual. This gap means silently incorrect column types go undetected.

Fix this →


Design concern: logging vs. stdout in the management command

upgrade_custom_objects.py uses self.stdout.write for high-level status and logger.info (inside heal_cot) for per-column details. In practice, users running manage.py upgrade_custom_objects --dry-run -v 2 won't see the "would ADD COLUMN" messages in their terminal unless Django logging is explicitly configured to print to console. This makes dry-run mode less useful interactively.

Consider having heal_cot accept an optional output callable (like self.stdout.write) so the command can route per-column messages to stdout, or have the command post-process the returned result dict to emit those details.


Minor: _heal_ran never resets — could confuse tests indirectly

__init__.py:37,59 — The module-level _heal_ran flag is set on first call and never cleared. This is by design for production (one heal per process), but if a test indirectly triggers _heal_mixin_columns (e.g., via a real post_migrate firing during TransactionTestCase setup), subsequent calls in the same test process are silently skipped. The tests currently call heal_cot and heal_all_cots directly so this doesn't bite them, but it's worth noting for future test authors.


Minor: double cot.get_model() call in hot path

mixin_migration.py:108 and 140_expected_base_fields(cot) calls cot.get_model() internally, and then cot.get_model() is called again in the add_field block. Consider storing the result in a local variable and passing it to _expected_base_fields, or at minimum retrieving the model once in heal_cot:

model = cot.get_model()
expected = _expected_base_fields(model, user_field_names)  # refactored signature

Test Coverage Assessment

The tests are comprehensive and well-structured. Highlights:

  • CanAutoAddTestCase — clean unit tests, all four combinations covered
  • HealCotTestCase — good integration tests against a real DB; dry-run, snapshot update, and removed-column cases all covered
  • HealAllCotsTestCase and UpgradeCustomObjectsCommandTestCase — solid smoke tests

One gap: test_dry_run_flag_accepted only asserts "DRY RUN" appears in stdout (the top-level handle() message). The per-field "would ADD COLUMN" messages from heal_cot go to logger.info, not self.stdout, so there's no test that verifies the dry-run per-column output path is actually reachable from the management command. This is a minor gap but aligns with the stdout/logger mixing issue above.


Summary

Issue Severity
f.name vs f.column in drift comparison Medium — would produce incorrect results for db_column-overriding fields
Missing type-change detection Low — documented in docstring but unimplemented
Logger vs. stdout mixing in command Low — UX/DX concern
Double get_model() call Negligible

The core architecture is solid. The f.name vs f.column issue is the most important to address before merge.

- Key by f.column (not f.name) in _expected_base_fields for correct FK field comparison
- Accept optional model arg in _expected_base_fields to avoid duplicate get_model()
- Add type-change detection: warn when a present base column field class differs from snapshot
- Route per-column output to self.stdout by having command iterate COTs directly
- Add test for type-change detection warning

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@bctiemann
Copy link
Copy Markdown
Contributor Author

All review items addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant