ref(seer): Drop legacy night-shift columns and kind db_default#114828
Open
ref(seer): Drop legacy night-shift columns and kind db_default#114828
Conversation
Renames SeerNightShiftRunIssue → SeerNightShiftRunResult (state-only; db_table preserved) and adds a `kind` discriminator + per-row `extras` JSON to support nightly per-org work beyond agentic triage. Triage's existing `action` column is moved into extras["action"] via a deploy-time backfill. Legacy columns (action, triage_strategy, error_message) are marked SafeRemoveField MOVE_TO_PENDING; their physical drop ships in a follow-up.
Without this, runs that had an error_message recorded against the legacy column would return errorMessage: null from the API after the column is removed from model state via SafeRemoveField(MOVE_TO_PENDING). Adds a second RunPython op (run before the error_message SafeRemoveField) and extends the migration test to cover both the existing action backfill and the new error_message backfill.
Mypy flagged the missing annotation on the migration test's setup_before_migration override.
Dropping the db_default in the same migration that adds the NOT NULL column breaks rolling deploys: old replicas still INSERTing without kind would fail with a NOT NULL violation between when the migration applies and when those replicas are replaced. Defer the db_default drop to the follow-up PR2 (the same one that runs SafeRemoveField DELETE for the legacy columns), which already requires PR1 to be fully deployed.
Pre-migration the triage_strategy column was a required CharField, always set to "agentic_triage" — including for runs that produced no result rows (failed quota check, no eligible projects, dry runs). My earlier serializer change derived the legacy alias from result-row presence, silently flipping triageStrategy to null for those runs. Hardcode it back to "agentic_triage" so the API surface matches the old behavior. Multi-kind serializer logic ships with the feature PR.
The migration has 12 operations including a CONCURRENTLY index, so TestMigrations.setUp's roll-back/forward cycle was hovering at or over CI's 120s fail-slow budget per test. The migration itself is already exercised by: - Direct application against dev DB with real data (47 result rows backfilled correctly). - The full night-shift test suite (49 tests) running against the post-migration schema with --create-db. - The two RunPython callbacks are simple idempotent loops; the cost of an integration test has stopped paying for itself.
This reverts commit f7cd5c3.
Drive-by edit from the original PR1 commit. No reason for it.
Follow-up to the schema-genericization PR. The previous PR moved action, triage_strategy, and error_message to MOVE_TO_PENDING and left a db_default on the new kind column to keep mid-rolling-deploy INSERTs from old replicas valid. Now that the previous PR is fully deployed, this PR: - SafeRemoveField(DELETE) for action, triage_strategy, error_message to physically drop the columns from Postgres. - Drops the db_default on kind so callers must specify it explicitly (already the case in all live code).
Contributor
|
This PR has a migration; here is the generated SQL for for --
-- Custom state/database change combination
--
-- (no-op)
--
-- Alter field run on seernightshiftrunresult
--
-- (no-op)
--
-- Add field kind to seernightshiftrunresult
--
ALTER TABLE "seer_nightshiftrunissue" ADD COLUMN "kind" varchar(256) DEFAULT 'agentic_triage' NOT NULL;
--
-- Add field extras to seernightshiftrunresult
--
ALTER TABLE "seer_nightshiftrunissue" ADD COLUMN "extras" jsonb DEFAULT '{}'::jsonb NOT NULL;
--
-- Alter field group on seernightshiftrunresult
--
ALTER TABLE "seer_nightshiftrunissue" ALTER COLUMN "group_id" DROP NOT NULL;
--
-- Alter field action on seernightshiftrunresult
--
ALTER TABLE "seer_nightshiftrunissue" ALTER COLUMN "action" DROP NOT NULL;
--
-- Alter field triage_strategy on seernightshiftrun
--
ALTER TABLE "seer_nightshiftrun" ALTER COLUMN "triage_strategy" DROP NOT NULL;
--
-- Create index seer_nights_run_id_d5406e_idx on field(s) run, kind of model seernightshiftrunresult
--
CREATE INDEX CONCURRENTLY "seer_nights_run_id_d5406e_idx" ON "seer_nightshiftrunissue" ("run_id", "kind");
--
-- Raw Python operation
--
-- THIS OPERATION CANNOT BE WRITTEN AS SQL
--
-- Moved seernightshiftrunresult.action field to pending deletion state
--
-- (no-op)
--
-- Moved seernightshiftrun.triage_strategy field to pending deletion state
--
-- (no-op)
--
-- Raw Python operation
--
-- THIS OPERATION CANNOT BE WRITTEN AS SQL
--
-- Moved seernightshiftrun.error_message field to pending deletion state
--
-- (no-op)for --
-- Remove field action from seernightshiftrunresult
--
ALTER TABLE "seer_nightshiftrunissue" DROP COLUMN "action" CASCADE;
--
-- Remove field triage_strategy from seernightshiftrun
--
ALTER TABLE "seer_nightshiftrun" DROP COLUMN "triage_strategy" CASCADE;
--
-- Remove field error_message from seernightshiftrun
--
ALTER TABLE "seer_nightshiftrun" DROP COLUMN "error_message" CASCADE;
--
-- Alter field kind on seernightshiftrunresult
--
ALTER TABLE "seer_nightshiftrunissue" ALTER COLUMN "kind" DROP DEFAULT; |
…-column-deletion # Conflicts: # migrations_lockfile.txt # src/sentry/seer/models/night_shift.py
Contributor
|
This PR has a migration; here is the generated SQL for for --
-- Remove field action from seernightshiftrunresult
--
ALTER TABLE "seer_nightshiftrunissue" DROP COLUMN "action" CASCADE;
--
-- Remove field triage_strategy from seernightshiftrun
--
ALTER TABLE "seer_nightshiftrun" DROP COLUMN "triage_strategy" CASCADE;
--
-- Remove field error_message from seernightshiftrun
--
ALTER TABLE "seer_nightshiftrun" DROP COLUMN "error_message" CASCADE;
--
-- Alter field kind on seernightshiftrunresult
--
ALTER TABLE "seer_nightshiftrunissue" ALTER COLUMN "kind" DROP DEFAULT; |
The test passes but exceeds the global 120s budget because Django's migration framework has to roll back/forward 12 operations (including a CONCURRENTLY index) every time. CI attributes this work to the "call" phase, so pytest-fail-slow trips. Lifting the per-test budget to 4m gives stable headroom; the cost is fundamentally in the framework, not the test logic.
The migration shipped, the data backfilled correctly in prod, and the test was the only one in the codebase needing a fail_slow override — 130s of CI time on every backend PR for marginal post-merge value. Squash will eventually elide the migration anyway (it's marked elidable=True). Cleaner to remove now along with the column-deletion migration this PR ships.
wedamija
approved these changes
May 5, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Stacked on #114790 — review and land that one first, then wait for full deploy before merging this.
Drops the legacy night-shift columns now that #114790 has fully deployed and all live code has stopped reading/writing them.
SafeRemoveField(DELETE)forseer_nightshiftrunissue.action,seer_nightshiftrun.triage_strategy, andseer_nightshiftrun.error_message— physically drops the columns from Postgres.db_defaultonkind(kept in ref(seer): Genericize night shift result table #114790 to keep INSERTs from old replicas valid mid-rolling-deploy). All live code now specifieskindexplicitly, so the safety net is no longer needed.