Skip to content

Conversation

@dbernheisel
Copy link
Contributor

No description provided.

Comment on lines +95 to +99
If you're using Phoenix and PhoenixEcto, you will likely appreciate disabling
the migration lock in the CheckRepoStatus plug during dev to avoid hitting and
waiting on the advisory lock with concurrent web processes. You can do this by
adding `migration_lock: false` to the CheckRepoStatus plug in your
`MyAppWeb.Endpoint`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If you're using Phoenix and PhoenixEcto, you will likely appreciate disabling
the migration lock in the CheckRepoStatus plug during dev to avoid hitting and
waiting on the advisory lock with concurrent web processes. You can do this by
adding `migration_lock: false` to the CheckRepoStatus plug in your
`MyAppWeb.Endpoint`.
If you're using Phoenix, you will likely need to disable
the migration lock in the CheckRepoStatus plug during dev to avoid hitting and
waiting on the advisory lock with concurrent web processes. You can do this by
adding `migration_lock: false` to the CheckRepoStatus plug in your
`MyAppWeb.Endpoint`.


Note: we cannot use `Ecto.Migration.modify/3` as it will include updating the column type as
well unnecessarily, causing Postgres to rewrite the table. For more information,
[see this example](https://github.com/fly-apps/safe-ecto-migrations/issues/10).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we link to an external repo? Perhaps we remove the link?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the issue (hah) leaving the link, it's mostly a great issue explaining and reproducing the problem for those that want more info.

Removing the link is fine though


The issue is that we cannot use `Ecto.Migration.modify/3` as it will include updating the column type as
well unnecessarily, causing Postgres to rewrite the table. For more information,
[see this example](https://github.com/fly-apps/safe-ecto-migrations/issues/10).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Link here too!


### Good

**Strategy 1**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should call these **Option 1** and so on for consistency with the rest of the guide?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively rename Option 1 to Strategy. :)


## Adding a value to a PostgreSQL enum

Adding enum values inside a transaction fails in PostgreSQL < 12.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PostgreSQL 12 is already EOLed, so we can remove it.


Here's how we'll manage the backfill:

1. Create a "temporary" table. In this example, we're creating a real table that we'll drop at the end of the data migration. In Postgres, there are [actual temporary tables](https://www.postgresql.org/docs/12/sql-createtable.html) that are discarded after the session is over; we're not using those because we need resiliency in case the data migration encounters an error. The error would cause the session to be over, and therefore the temporary table tracking progress would be lost. Real tables don't have this problem. Likewise, we don't want to store IDs in application memory during the migration for the same reason.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of a temporary table, couldn't we use a temporary column? Or is the issue that removing the column later would be expensive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It assumes the backfill is about one column, and if so that could be another option, but it doesn't have to be: the user might be doing multiple fixes and write to multiple columns.

The point is that we need to store a list of records that a generic operation needs to run on.

Copy link
Member

@greg-rychlewski greg-rychlewski Jan 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Jose means you add a column to say "fix applied: yes or no". Then when you iterate through your table doing the updates you use that column to prevent yourself from applying the fix more than once in case of restarts. And whatever filter you used to populate your temporary table can be used to stop the iterating.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I understand now. Yeah that could be a good approach but yes also have to consider the same gotchas with adding columns with defaults on large existing tables.

I find a separate temporary table safer and less coupled. Less likely to fool with application logic

@@ -0,0 +1,365 @@
# Backfilling Data

When I say "backfilling data", I mean that as any attempt to change data in bulk. This can happen in code through migrations, application code, UIs that allow multiple selections and updates, or in a console connected to a running application. Since bulk changes affect a lot of data, it's always a good idea to have the code reviewed before it runs. You also want to check that it runs efficiently and does not overwhelm the database. Ideally, it's nice when the code is written to be safe to re-run. For these reasons, please don't change data in bulk through a console!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the only place we use first person in the guides, so we should make it consistent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I kept looking at this phrase and didn't like it. I can rephrase

@josevalim
Copy link
Member

Thank you @dbernheisel! This looks excellent and I do have some quick feedback:

  1. We should probably check the EOL version of the databases and remove advice that applies only to old versions (e.g. before PostgreSQL 12)

  2. Some part of the guides are too specific to PostgreSQL and I think we should either make them consistent for all databases or remove the PostgreSQL. My suggestion, to make this easier on everyone, is to remove the PG specifics bit for, pretty much the reference manual on Safe Migrations and some of the locking bits in the anatomy of a migration. If you agree, I can push those changes. Can I go ahead?

@dbernheisel
Copy link
Contributor Author

Absolutely. I'll go through feedback today

@dbernheisel
Copy link
Contributor Author

The gotchas are important however for each flavor.

The MyXQL adapter uses advisory locks so there are less issues with transactions.

The Postgres adapter does not default to advisory locks (maybe it should?) so that it can avoid some gotchas

MSSQL I have less experience so more verification is needed

Sqlite3 is different enough where I don't suspect these same gotchas apply, but again more verification is needed.

I think the recipes could be formatted better to have callouts per adapter? It does cover MySQL and Postgres, but is silent on the others.

@josevalim
Copy link
Member

The gotchas are important however for each flavor.

Yeah, I was not asking to remove those, those are really great! It was mostly the debugging locks bits and the reference material, so I pushed my changes already. We could have a separate document on debugging PG locks but as I said, easier to do in a future PR. :)

From my point of view, nothing else needs to removed. There are still some PG specific commands still but it is very easy for someone to consult those for other databases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants