feat(db): apply file-based Drizzle migrations programmatically at startup#25
cooper (czxtm) merged 3 commits into `main`
Replaces the manual `drizzle-kit push` / `bun run db:push` flow with
file-based migrations that ship with each deploy and run themselves on
the first request to a new isolate.
What changed
- `packages/db/drizzle.config.ts` switches `out` from `./src/migrations`
to `./drizzle` and stops importing `@gen/env/web` (drizzle-kit
generate doesn't need a real URL, so the config falls back to a stub if
neither POSTGRES_URL nor DATABASE_URL is set).
- `packages/db/scripts/bundle-migrations.ts` reads `drizzle/` and writes
`packages/db/src/migrations-bundle.generated.ts` so SQL ships inlined
with the JS bundle (Cloudflare Workers can't read `node:fs` for SQL
files; this avoids any bundler-specific magic like
`import.meta.glob('?raw')`).
- `packages/db/src/migrate.ts` exports `runMigrations(db)` — a custom
migrator that takes a `pg_advisory_lock`, ensures
`__drizzle_migrations` exists, and applies each pending bundled entry
in idx order. Per-isolate, the in-flight Promise is cached so
concurrent callers reuse the same migrate run.
- `packages/auth/src/index.ts` awaits `runMigrations(db)` at
module-evaluation time before constructing the better-auth instance,
guarded on a configured connection string so vitest/typecheck
contexts don't crash at import.
- Removes `db:push` from root `package.json`, `packages/db/package.json`,
`turbo.json`, and `.stack/config.nix`. `db:generate` now chains
`drizzle-kit generate && bun run db:bundle`. `db:migrate` is kept for
local ad-hoc use.
- `docs/adr/0002-runtime-startup-migrations.md` records the decision,
rationale, alternatives considered, and follow-ups (CI bundle-drift
check, down migrations, Effect-native variant).
- README/AGENTS.md/CLAUDE.md/WARP.md updated to describe the new flow.
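The bundling step is described only in prose. A minimal sketch of what such a script could look like, assuming drizzle-kit's usual output layout (`meta/_journal.json` listing entries with `idx` and `tag`, plus one `<tag>.sql` file per entry). The `BundledMigration` shape, the `readMigrations`/`renderBundle`/`main` names, and the generated module's contents are hypothetical, not the PR's actual `bundle-migrations.ts`:

```typescript
import { readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Only the journal fields this sketch uses (assumed drizzle-kit layout).
interface JournalEntry { idx: number; tag: string; when: number }
interface Journal { entries: JournalEntry[] }

// Hypothetical shape of one inlined migration.
export interface BundledMigration { idx: number; tag: string; when: number; sql: string }

// Read the journal plus each `<tag>.sql` file, ordered by idx.
export function readMigrations(dir: string): BundledMigration[] {
  const journal: Journal = JSON.parse(readFileSync(join(dir, "meta", "_journal.json"), "utf8"));
  return [...journal.entries]
    .sort((a, b) => a.idx - b.idx)
    .map((e) => ({
      idx: e.idx,
      tag: e.tag,
      when: e.when,
      sql: readFileSync(join(dir, `${e.tag}.sql`), "utf8"),
    }));
}

// Emit a TS module. JSON.stringify escapes quotes, backslashes and newlines,
// so the SQL survives as an ordinary string literal.
export function renderBundle(migrations: BundledMigration[]): string {
  const body = migrations.map((m) => `  ${JSON.stringify(m)},`).join("\n");
  return [
    "// AUTO-GENERATED by bundle-migrations. Do not edit.",
    "export interface BundledMigration { idx: number; tag: string; when: number; sql: string }",
    "export const migrations: readonly BundledMigration[] = [",
    body,
    "];",
    "",
  ].join("\n");
}

// Hypothetical entry point; default paths mirror the PR description.
export function main(drizzleDir = "drizzle", outFile = "src/migrations-bundle.generated.ts"): void {
  writeFileSync(outFile, renderBundle(readMigrations(drizzleDir)));
}
```

Because the output is a plain TypeScript module, no bundler loader or `node:fs` access is needed at runtime to get at the SQL.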
Why
PR #24's preview consistently 500'd on `waitlist.join` because the
per-PR Neon project was created empty and `db:push` was never run
against it. With this change, a fresh isolate booting against an empty
Neon DB applies the bundled migrations transparently — preview, staging,
and production all converge on the same flow with zero manual steps.
Baseline migration `drizzle/0000_init.sql` covers all 12 tables
(account, session, user, verification, invitation, member, organization,
organization_dek, organization_state, polar_event, user_subscription,
beta_waitlist) including the previously-missing `beta_waitlist`.
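To confirm the baseline landed after a first boot, a check along these lines could compare `information_schema.tables` against the 12 names listed above. `EXPECTED_TABLES`, `missingTables`, `assertBaselineApplied`, and the `Queryable` interface are hypothetical helpers for illustration; `Queryable` is a minimal slice of a node-postgres-style client:

```typescript
// Minimal slice of a node-postgres-style client (assumption).
interface Queryable {
  query(text: string, params?: unknown[]): Promise<{ rows: Record<string, unknown>[] }>;
}

// The 12 tables the PR says 0000_init.sql creates.
export const EXPECTED_TABLES = [
  "account", "session", "user", "verification", "invitation", "member",
  "organization", "organization_dek", "organization_state", "polar_event",
  "user_subscription", "beta_waitlist",
] as const;

// Pure diff, so it can be tested without a database.
export function missingTables(present: Iterable<string>): string[] {
  const have = new Set(present);
  return EXPECTED_TABLES.filter((t) => !have.has(t));
}

// Throws if any expected table is absent from the public schema.
export async function assertBaselineApplied(client: Queryable): Promise<void> {
  const { rows } = await client.query(
    "SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'",
  );
  const missing = missingTables(rows.map((r) => String(r.table_name)));
  if (missing.length > 0) throw new Error(`baseline incomplete, missing: ${missing.join(", ")}`);
}
```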
Cursor Bugbot reviewed commit d0f214d and rated this PR High Risk.
Adds a one-time safety net to runMigrations(): if `__drizzle_migrations` doesn't exist but the public schema already has tables (the legacy `db:push` flow's footprint), assume the schema was managed externally and pre-populate `__drizzle_migrations` with every bundled tag instead of trying to re-run the baseline `CREATE TABLE` statements. Without this, the first deploy after this change against production or staging (where the schema already exists from prior `db:push` runs) would crash on `relation "account" already exists`. With this, the first boot is a one-shot fast-forward; subsequent boots see the table and the normal diff-and-apply flow takes over. Fresh databases (PR previews, new dev installs) skip this branch — no existing tables means no inference, and the baseline migration runs normally.
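A sketch of the legacy-schema fast-forward described in this commit. It assumes a node-postgres-style client, that `__drizzle_migrations` lives in `public`, and hypothetical `tag`/`created_at` columns; the real tracker schema must match whatever the migrator itself creates. It should run while holding the migration lock so two isolates cannot both fast-forward:

```typescript
// Minimal slice of a node-postgres-style client (assumption).
interface Queryable {
  query(text: string, params?: unknown[]): Promise<{ rows: Record<string, unknown>[] }>;
}

// Returns true if it fast-forwarded a legacy `db:push` database, false otherwise.
// Intended to run while the caller holds the migration advisory lock.
export async function baselineLegacySchema(
  client: Queryable,
  bundledTags: readonly string[],
): Promise<boolean> {
  // Tracker already present: the normal diff-and-apply flow takes over.
  const tracker = await client.query(
    `SELECT to_regclass('public.__drizzle_migrations') IS NOT NULL AS present`,
  );
  if (tracker.rows[0]?.present === true) return false;

  // No tracker and no tables: a fresh database, so let the baseline run normally.
  const tables = await client.query(
    `SELECT count(*)::int AS n FROM information_schema.tables
     WHERE table_schema = 'public' AND table_type = 'BASE TABLE'`,
  );
  if (Number(tables.rows[0]?.n ?? 0) === 0) return false;

  // Tables but no tracker: assume `db:push` managed the schema and mark every
  // bundled migration as applied instead of re-running its CREATE TABLE statements.
  await client.query(
    `CREATE TABLE "__drizzle_migrations" (id serial PRIMARY KEY, tag text NOT NULL UNIQUE, created_at bigint NOT NULL)`,
  );
  for (const tag of bundledTags) {
    await client.query(
      `INSERT INTO "__drizzle_migrations" (tag, created_at) VALUES ($1, $2)`,
      [tag, Date.now()],
    );
  }
  return true;
}
```

Checking `to_regclass` first keeps the steady-state path to a single query once the tracker table exists.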
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d0f214d83e
```ts
if (process.env.DATABASE_URL || process.env.POSTGRES_URL) {
  await runMigrations(db);
}
```

Gate startup migration on DATABASE_URL only

This guard now runs migrations when only `POSTGRES_URL` is present, but `runMigrations(db)` uses the `db` proxy, which resolves through `getDb()` and throws unless `DATABASE_URL` is set. In environments that expose `POSTGRES_URL` without `DATABASE_URL`, importing `@stackpanel/auth` will now fail at module load before serving requests, even though the code comment says this guard avoids import-time crashes.
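One way to address this review comment is to resolve the connection string in a single place so the import-time guard and the lazily created client always agree. `resolveConnectionString`, `shouldMigrateAtImport`, and `createGetDb` are hypothetical names, and the `DATABASE_URL`-over-`POSTGRES_URL` precedence is an assumption:

```typescript
type Env = Record<string, string | undefined>;

// Single source of truth for the connection string.
// Assumption: DATABASE_URL wins when both are set.
export function resolveConnectionString(env: Env): string | undefined {
  // Blank values count as unset, so an empty secret is not mistaken for a configured DB.
  return [env.DATABASE_URL, env.POSTGRES_URL].find((v) => v !== undefined && v.trim() !== "");
}

// Import-time guard: migrate only when the client below will be able to connect.
export function shouldMigrateAtImport(env: Env): boolean {
  return resolveConnectionString(env) !== undefined;
}

// Lazily create and cache a client from the same resolved string the guard used.
export function createGetDb<T>(env: Env, connect: (url: string) => T): () => T {
  let db: T | undefined;
  return () => {
    if (db !== undefined) return db;
    const url = resolveConnectionString(env);
    if (url === undefined) throw new Error("Set DATABASE_URL or POSTGRES_URL");
    db = connect(url);
    return db;
  };
}
```

With both paths reading from `resolveConnectionString`, a guard written as `if (shouldMigrateAtImport(process.env)) await runMigrations(db);` cannot pass while `getDb()` still throws.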
```ts
async function applyMigrations(db: Db): Promise<void> {
  await db.execute(sql`SELECT pg_advisory_lock(${MIGRATION_LOCK_KEY})`);
```

Hold advisory lock on a single DB session

`pg_advisory_lock` is session-scoped, but this code acquires it via `db.execute(...)` on a pooled Drizzle client and then performs migration queries through separate `db.execute(...)` calls. With a pool, those calls are not guaranteed to use the same underlying connection, so concurrent cold starts can both proceed past the intended mutex and race on DDL/insert into `__drizzle_migrations`, causing startup failures. Use one dedicated client/session (or a transaction-scoped advisory lock in a single transaction) for the whole migration critical section.
The previous `pg_advisory_lock()` call was session-scoped, and pg.Pool returns the underlying connection (with the lock still held) to the pool once `db.execute` completes. Postgres lets a session re-acquire an advisory lock it already holds, so the hang came from the other pooled connections: each later `pg_advisory_lock` call routed to a different connection waited forever on a lock that the idle connection never released. This manifested as Cloudflare 522 timeouts on every request to the PR-25 preview after deploy. This commit switches to `pg_advisory_xact_lock` inside a per-migration transaction, so the lock auto-releases at COMMIT/ROLLBACK regardless of pool lifecycle, and re-checks `__drizzle_migrations` membership inside the lock to handle the case where another isolate raced ahead and applied the same migration first. It also drops the JS bigint lock-key constant in favour of a plain number: drizzle's `sql` template binds bigint via the pg `int8` type parser, which has had subtle compatibility issues across pg versions, and the lock key fits comfortably in int4.
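The final design can be sketched against a node-postgres-style `Pool`/`PoolClient` rather than Drizzle's `db.transaction`: one checked-out client for the whole run, a transaction per migration, `pg_advisory_xact_lock` taken inside each transaction, and a re-check of `__drizzle_migrations` under the lock, plus the per-isolate promise cache that is cleared on failure. The tracker columns, splitting on drizzle-kit's `--> statement-breakpoint` marker, and the `runMigrations(pool, bundle)` signature are assumptions for illustration, not the PR's code:

```typescript
// Minimal slices of node-postgres' Pool / PoolClient used by this sketch.
interface Client {
  query(text: string, params?: unknown[]): Promise<{ rows: Record<string, unknown>[] }>;
  release(): void;
}
interface Pool {
  connect(): Promise<Client>;
}

export interface BundledMigration {
  tag: string;
  sql: string;
}

// 0x4d495252 = 1296651858, which fits in int4, so it binds as a plain number.
const MIGRATION_LOCK_KEY = 0x4d495252;

// Run fn inside BEGIN/COMMIT while holding a transaction-scoped advisory lock.
// COMMIT or ROLLBACK releases the lock, whatever the pool later does with the connection.
async function withXactLock(client: Client, fn: () => Promise<void>): Promise<void> {
  await client.query("BEGIN");
  try {
    await client.query("SELECT pg_advisory_xact_lock($1)", [MIGRATION_LOCK_KEY]);
    await fn();
    await client.query("COMMIT");
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}

async function applyMigrations(pool: Pool, bundle: readonly BundledMigration[]): Promise<void> {
  const client = await pool.connect(); // one session for every statement below
  try {
    await withXactLock(client, async () => {
      await client.query(
        `CREATE TABLE IF NOT EXISTS "__drizzle_migrations" (id serial PRIMARY KEY, tag text NOT NULL UNIQUE, created_at bigint NOT NULL)`,
      );
    });
    for (const m of bundle) {
      await withXactLock(client, async () => {
        // Re-check under the lock: another isolate may have applied this tag already.
        const { rows } = await client.query(`SELECT 1 FROM "__drizzle_migrations" WHERE tag = $1`, [m.tag]);
        if (rows.length > 0) return;
        for (const stmt of m.sql.split("--> statement-breakpoint")) {
          if (stmt.trim() !== "") await client.query(stmt);
        }
        await client.query(
          `INSERT INTO "__drizzle_migrations" (tag, created_at) VALUES ($1, $2)`,
          [m.tag, Date.now()],
        );
      });
    }
  } finally {
    client.release();
  }
}

// Per-isolate cache: concurrent callers share one run; a failure clears it so the next call retries.
let inflight: Promise<void> | undefined;

export function runMigrations(pool: Pool, bundle: readonly BundledMigration[]): Promise<void> {
  inflight ??= applyMigrations(pool, bundle).catch((err: unknown) => {
    inflight = undefined;
    throw err;
  });
  return inflight;
}
```

Creating the tracker table inside a locked transaction as well avoids the case where two sessions run `CREATE TABLE IF NOT EXISTS` at the same moment and one of them can still fail on a system-catalog unique constraint.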
End-to-end verification on the preview ✅ / 🟡
1. The original failure is fixed. The reproducer for the PR #24 bug:

```bash
curl -s -m 30 -X POST "https://local.pr-25.stackpanel.com/api/trpc/waitlist.join?batch=1" \
  -H "Content-Type: application/json" \
  -d '{"0":{"json":{"email":"test+migrations@example.invalid"}}}'
```

This request now responds in ~3.6s (no more Cloudflare 522 worker timeout, no more missing-table 500). The runtime migrator built the …

2. The previous-commit deadlock is also fixed. The interim 522 timeouts were caused by …

3. There's a follow-up env issue blocking full success. The waitlist response is still a 500, but for an entirely different and pre-existing reason: every tRPC request constructs a Better Auth context (…):

```json
"BETTER_AUTH_SECRET": "",
"BETTER_AUTH_URL": "",
"POSTGRES_URL": "ENC[AES256_GCM,...]",
```

Better Auth's "you are using the default secret" guard then throws an …

That issue includes the curl reproducer, the encrypted-payload extract, and the full acceptance criteria for closing the loop on …

TL;DR: This PR delivers what the brief asks for: file-based migrations applied programmatically at startup, deadlock-free under Cloudflare's pooled connection model, idempotent on existing schemas, with an ADR. The original "table doesn't exist" failure mode is dead. The remaining 500 is a secrets-pipeline problem orthogonal to the migration work.
Summary
Replaces the manual `drizzle-kit push` / `bun run db:push` flow with file-based migrations that ship with each deploy and run themselves on the first request to a new isolate.

The trigger: PR #24's preview consistently 500'd on the waitlist signup endpoint with `Failed query: select "id" from "beta_waitlist"` because the per-PR Neon project was created empty and `db:push` was never run against it. With this change, a fresh isolate booting against an empty Neon DB applies the bundled migrations transparently; preview, staging, and production all converge on the same flow with zero manual steps.

Decision and trade-offs are documented in `docs/adr/0002-runtime-startup-migrations.md`.

What changed
Drizzle config & generation
- `packages/db/drizzle.config.ts`: switches `out` from `./src/migrations` to `./drizzle` and drops the `@gen/env/web` import (drizzle-kit `generate` doesn't need a real URL; the config falls back to a stub if neither `POSTGRES_URL` nor `DATABASE_URL` is set).
- `packages/db/scripts/bundle-migrations.ts`: reads `drizzle/` and writes `packages/db/src/migrations-bundle.generated.ts` so SQL ships inlined with the JS bundle (Cloudflare Workers can't read `node:fs` for SQL files; this avoids any bundler-specific magic like `import.meta.glob('?raw')`).
- `packages/db/drizzle/0000_init.sql`: baseline migration covering all 12 tables (account, session, user, verification, invitation, member, organization, organization_dek, organization_state, polar_event, user_subscription, `beta_waitlist`).

Runtime migrator
- `packages/db/src/migrate.ts`: exports `runMigrations(db)`, a custom migrator that takes a `pg_advisory_lock`, ensures `__drizzle_migrations` exists, and applies each pending bundled entry in `idx` order. Per isolate, the in-flight `Promise` is cached so concurrent callers reuse the same migrate run; on failure the cache is cleared so the next request retries.
- `packages/db/src/index.ts`: re-exports `runMigrations` and a `Db` type for callers.

Wire-in
- `packages/auth/src/index.ts`: `await runMigrations(db)` at module-evaluation time before constructing `betterAuth({...})`. Guarded on a configured connection string so vitest/typecheck contexts don't crash at import. Anything downstream that imports `auth` (the tRPC handler, route middleware, background jobs) inherits the dependency naturally.

Cleanup
- Removes `db:push` from root `package.json`, `packages/db/package.json`, `turbo.json`, and `.stack/config.nix`. `db:generate` now chains `drizzle-kit generate && bun run db:bundle`. `db:migrate` is kept for local ad-hoc use.

Docs
- `docs/adr/0002-runtime-startup-migrations.md`: full ADR (context, decision, alternatives considered, follow-ups).
- `README.md`-equivalent sections in `AGENTS.md`, `CLAUDE.md`, `WARP.md` updated to describe the new flow.

Test plan
- `bunx tsc --noEmit` clean in `packages/db` and `packages/auth` (only pre-existing third-party errors in `node_modules/.bun/alchemy@*` remain).
- `bunx oxlint` clean on every changed/new TS file.
- Ran `drizzle-kit generate --name init` and bundled successfully.
- Against `local.<stage>.stackpanel.com`:

```bash
curl -s -m 30 -X POST "https://local.<stage>.stackpanel.com/api/trpc/waitlist.join?batch=1" \
  -H "Content-Type: application/json" \
  -d '{"0":{"json":{"email":"test+migrations@example.invalid"}}}'
```

Expect `{"result":{"data":{"json":{"ok":true,"alreadyOnList":false}}}}` instead of the prior `Failed query: select "id" from "beta_waitlist"` 500.
- `Deploy Web` and `verify` workflows on the PR remain green.

Migration plan / rollout
This is the first migration the `__drizzle_migrations` table will see in every environment. On the first request to a freshly deployed isolate the migrator will:

1. Take `pg_advisory_lock(0x4d495252)` to serialise concurrent isolate boots.
2. Run `CREATE TABLE IF NOT EXISTS "__drizzle_migrations"` (idempotent).
3. If `0000_init` is already applied (the production case, where the schema already exists), skip it; otherwise run its SQL.
4. Record the tag in `__drizzle_migrations` and release the lock.

For environments that already have the schema (production, the parent `dev` Neon project), step 3's `CREATE TABLE` statements will fail with "relation already exists" and the migration will abort. For the production rollout, the operator should pre-populate `__drizzle_migrations` with the `0000_init` tag before the first deploy lands:

(Empty preview Neon projects don't need this; they'll happily run the baseline against a blank schema.)