Skip to content

feat(db): apply file-based Drizzle migrations programmatically at startup#25

Merged
cooper (czxtm) merged 3 commits intomainfrom
feat/runtime-migrations
May 1, 2026
Merged

feat(db): apply file-based Drizzle migrations programmatically at startup#25
cooper (czxtm) merged 3 commits intomainfrom
feat/runtime-migrations

Conversation

@czxtm
Copy link
Copy Markdown
Contributor

Summary

Replaces the manual drizzle-kit push / bun run db:push flow with file-based migrations that ship with each deploy and run themselves on the first request to a new isolate.

The trigger: PR #24's preview consistently 500'd on the waitlist signup endpoint with Failed query: select "id" from "beta_waitlist" because the per-PR Neon project was created empty and db:push was never run against it. With this change, a fresh isolate booting against an empty Neon DB applies the bundled migrations transparently — preview, staging, and production all converge on the same flow with zero manual steps.

Decision and trade-offs are documented in docs/adr/0002-runtime-startup-migrations.md.

What changed

Drizzle config & generation

  • packages/db/drizzle.config.ts — switches out from ./src/migrations to ./drizzle, drops the @gen/env/web import (drizzle-kit generate doesn't need a real URL — falls back to a stub if neither POSTGRES_URL nor DATABASE_URL is set).
  • packages/db/scripts/bundle-migrations.ts — reads drizzle/ and writes packages/db/src/migrations-bundle.generated.ts so SQL ships inlined with the JS bundle (Cloudflare Workers can't read node:fs for SQL files; this avoids any bundler-specific magic like import.meta.glob('?raw')).
  • packages/db/drizzle/0000_init.sql — baseline migration covering all 12 tables (account, session, user, verification, invitation, member, organization, organization_dek, organization_state, polar_event, user_subscription, beta_waitlist).

Runtime migrator

  • packages/db/src/migrate.ts — exports runMigrations(db). Custom migrator that takes a pg_advisory_lock, ensures __drizzle_migrations exists, and applies each pending bundled entry in idx order. Per-isolate, the in-flight Promise is cached so concurrent callers reuse the same migrate run; on failure the cache is cleared so the next request retries.
  • packages/db/src/index.ts — re-exports runMigrations and a Db type for callers.

Wire-in

  • packages/auth/src/index.tsawait runMigrations(db) at module-evaluation time before constructing betterAuth({...}). Guarded on a configured connection string so vitest/typecheck contexts don't crash at import. Anything downstream that imports auth (the tRPC handler, route middleware, background jobs) inherits the dependency naturally.

Cleanup

  • Removes db:push from root package.json, packages/db/package.json, turbo.json, and .stack/config.nix.
  • db:generate now chains drizzle-kit generate && bun run db:bundle. db:migrate is kept for local ad-hoc use.

Docs

  • docs/adr/0002-runtime-startup-migrations.md — full ADR (context, decision, alternatives considered, follow-ups).
  • README.md-equivalent sections in AGENTS.md, CLAUDE.md, WARP.md updated to describe the new flow.

Test plan

  • bunx tsc --noEmit clean in packages/db and packages/auth (only pre-existing third-party errors in node_modules/.bun/alchemy@* remain).
  • bunx oxlint clean on every changed/new TS file.
  • Baseline migration generated via drizzle-kit generate --name init and bundled successfully.
  • Verify the previously-failing waitlist endpoint succeeds end-to-end against this PR's Cloudflare preview at local.<stage>.stackpanel.com:
    bash curl -s -m 30 -X POST "https://local.<stage>.stackpanel.com/api/trpc/waitlist.join?batch=1" \ -H "Content-Type: application/json" \ -d '{"0":{"json":{"email":"test+migrations@example.invalid"}}}'
    Expect {"result":{"data":{"json":{"ok":true,"alreadyOnList":false}}}} instead of the prior Failed query: select "id" from "beta_waitlist" 500.
  • Check Deploy Web and verify workflows on the PR remain green.

Migration plan / rollout

This is the first migration the __drizzle_migrations table will see in every environment. On the first request to a freshly deployed isolate the migrator will:

  1. pg_advisory_lock(0x4d495252) — serialise concurrent isolate boots.
  2. CREATE TABLE IF NOT EXISTS "__drizzle_migrations" — idempotent.
  3. Read currently-applied tags. If 0000_init is already applied (production case where the schema already exists), it skips. Otherwise it runs the SQL.
  4. Insert the tag into __drizzle_migrations, release the lock.

For environments that already have the schema (production, the parent dev Neon project), step 3's CREATE TABLE statements will fail with "relation already exists" and the migration will abort. For the production rollout, the operator should pre-populate __drizzle_migrations with the 0000_init tag before the first deploy lands:

CREATE TABLE IF NOT EXISTS "__drizzle_migrations" (
  "id" SERIAL PRIMARY KEY,
  "hash" TEXT NOT NULL UNIQUE,
  "created_at" BIGINT NOT NULL
);
INSERT INTO "__drizzle_migrations" ("hash", "created_at")
VALUES ('0000_init', extract(epoch from now()) * 1000)
ON CONFLICT (hash) DO NOTHING;

(Empty preview Neon projects don't need this — they'll happily run the baseline against a blank schema.)

…rtup

Replaces the manual `drizzle-kit push` / `bun run db:push` flow with
file-based migrations that ship with each deploy and run themselves on
the first request to a new isolate.

What changed
- `packages/db/drizzle.config.ts` switches `out` from `./src/migrations`
  to `./drizzle` and stops importing `@gen/env/web` (drizzle-kit
  generate doesn't need a real URL — falls back to a stub if neither
  POSTGRES_URL nor DATABASE_URL is set).
- `packages/db/scripts/bundle-migrations.ts` reads `drizzle/` and writes
  `packages/db/src/migrations-bundle.generated.ts` so SQL ships inlined
  with the JS bundle (Cloudflare Workers can't read `node:fs` for SQL
  files; this avoids any bundler-specific magic like
  `import.meta.glob('?raw')`).
- `packages/db/src/migrate.ts` exports `runMigrations(db)` — a custom
  migrator that takes a `pg_advisory_lock`, ensures
  `__drizzle_migrations` exists, and applies each pending bundled entry
  in idx order. Per-isolate, the in-flight Promise is cached so
  concurrent callers reuse the same migrate run.
- `packages/auth/src/index.ts` awaits `runMigrations(db)` at
  module-evaluation time before constructing the better-auth instance,
  guarded on a configured connection string so vitest/typecheck
  contexts don't crash at import.
- Removes `db:push` from root `package.json`, `packages/db/package.json`,
  `turbo.json`, and `.stack/config.nix`. `db:generate` now chains
  `drizzle-kit generate && bun run db:bundle`. `db:migrate` is kept for
  local ad-hoc use.
- `docs/adr/0002-runtime-startup-migrations.md` records the decision,
  rationale, alternatives considered, and follow-ups (CI bundle-drift
  check, down migrations, Effect-native variant).
- README/AGENTS.md/CLAUDE.md/WARP.md updated to describe the new flow.

Why
PR #24's preview consistently 500'd on `waitlist.join` because the
per-PR Neon project was created empty and `db:push` was never run
against it. With this change, a fresh isolate booting against an empty
Neon DB applies the bundled migrations transparently — preview, staging,
and production all converge on the same flow with zero manual steps.

Baseline migration `drizzle/0000_init.sql` covers all 12 tables
(account, session, user, verification, invitation, member, organization,
organization_dek, organization_state, polar_event, user_subscription,
beta_waitlist) including the previously-missing `beta_waitlist`.
@cursor
Copy link
Copy Markdown

cursor Bot commented May 1, 2026

PR Summary

High Risk
Introduces automatic, programmatic database migrations on app startup (with advisory locking) and removes the manual db:push flow, which can affect production schema state and boot-time behavior if migrations or existing schemas are out of sync.

Overview
Switches the project from the manual drizzle-kit push/db:push workflow to file-based Drizzle migrations that are generated, committed, bundled into the runtime, and applied automatically at startup.

Adds a @stackpanel/db runtime migrator (runMigrations) that applies bundled SQL under a Postgres advisory lock and wires it into @stackpanel/auth via top-level await (guarded by DATABASE_URL/POSTGRES_URL) so downstream requests see an up-to-date schema.

Updates tooling to match the new flow: drizzle-kit output moves to packages/db/drizzle/, db:generate now bundles migrations, db:push is removed from scripts/Turbo/Nix/just, a baseline 0000_init migration is added, and docs/ADRs are updated to describe the operational change.

Reviewed by Cursor Bugbot for commit d0f214d. Configure here.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 1, 2026

Preview pr-25 has been destroyed.

Adds a one-time safety net to runMigrations(): if `__drizzle_migrations`
doesn't exist but the public schema already has tables (the legacy
`db:push` flow's footprint), assume the schema was managed externally
and pre-populate `__drizzle_migrations` with every bundled tag instead
of trying to re-run the baseline `CREATE TABLE` statements.

Without this, the first deploy after this change against production
or staging (where the schema already exists from prior `db:push` runs)
would crash on `relation "account" already exists`. With this, the
first boot is a one-shot fast-forward; subsequent boots see the table
and the normal diff-and-apply flow takes over.

Fresh databases (PR previews, new dev installs) skip this branch — no
existing tables means no inference, and the baseline migration runs
normally.
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 1, 2026

Docs preview pr-25 has been destroyed.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d0f214d83e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

Comment on lines +23 to +24
if (process.env.DATABASE_URL || process.env.POSTGRES_URL) {
await runMigrations(db);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Gate startup migration on DATABASE_URL only

This guard now runs migrations when only POSTGRES_URL is present, but runMigrations(db) uses the db proxy, which resolves through getDb() and throws unless DATABASE_URL is set. In environments that expose POSTGRES_URL without DATABASE_URL, importing @stackpanel/auth will now fail at module load before serving requests, even though the code comment says this guard avoids import-time crashes.

Useful? React with 👍 / 👎.

Comment thread packages/db/src/migrate.ts Outdated
}

async function applyMigrations(db: Db): Promise<void> {
await db.execute(sql`SELECT pg_advisory_lock(${MIGRATION_LOCK_KEY})`);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Hold advisory lock on a single DB session

pg_advisory_lock is session-scoped, but this code acquires it via db.execute(...) on a pooled Drizzle client and then performs migration queries through separate db.execute(...) calls. With a pool, those calls are not guaranteed to use the same underlying connection, so concurrent cold starts can both proceed past the intended mutex and race on DDL/insert into __drizzle_migrations, causing startup failures. Use one dedicated client/session (or a transaction-scoped advisory lock in a single transaction) for the whole migration critical section.

Useful? React with 👍 / 👎.

The previous `pg_advisory_lock()` call was session-scoped — and pg.Pool
returns the underlying connection (and therefore the held lock) back to
the pool on `db.execute` completion. Subsequent isolates checking out
the same pooled connection would see "lock already held by your own
session" and block forever, which manifested as Cloudflare 522 timeouts
on every request to the PR-25 preview after deploy.

Switches to `pg_advisory_xact_lock` inside a per-migration transaction
so the lock auto-releases at COMMIT/ROLLBACK regardless of pool
lifecycle, and re-checks `__drizzle_migrations` membership inside the
lock to handle the case where another isolate raced ahead and applied
the same migration first.

Also drops the JS bigint lock-key constant in favour of a plain
number — drizzle's `sql` template binds bigint via the pg `int8` type
parser which has had subtle compatibility issues across pg versions,
and the lock key fits comfortably in int4.
@czxtm
Copy link
Copy Markdown
Contributor Author

End-to-end verification on the preview ✅ / 🟡

pg_advisory_xact_lock swap (commit f99d3b31) deployed cleanly. Quick summary of where we are now:

1. The original failure is fixed

The bug from PR #24Failed query: select "id" from "beta_waitlist" because the table didn't exist on the preview's Neon project — is gone. A curl against this preview's waitlist.join:

curl -s -m 30 -X POST "https://local.pr-25.stackpanel.com/api/trpc/waitlist.join?batch=1" \
  -H "Content-Type: application/json" \
  -d '{"0":{"json":{"email":"test+migrations@example.invalid"}}}'

now responds in ~3.6s (no more Cloudflare 522 worker-timeout, no more missing-table 500). The runtime migrator built the __drizzle_migrations ledger and applied the 0000_init baseline against the preview's Neon project — proven by the fact that we do get past the db.select(...).from(waitlist.betaWaitlist) query in packages/api/src/routers/waitlist.ts.

2. The previous-commit deadlock is also fixed

The interim 522 timeouts were caused by pg_advisory_lock (session-scoped) leaking back into the pg.Pool when the connection was returned. f99d3b31 switches to pg_advisory_xact_lock inside a per-migration transaction so the lock auto-releases at COMMIT/ROLLBACK regardless of pool lifecycle, with a re-check of __drizzle_migrations membership inside the lock for the race where another isolate beat us to it. Each migration is one short transaction — fine for serverless.

3. There's a follow-up env issue blocking full success

The waitlist response is still a 500, but for an entirely different and pre-existing reason: every tRPC request constructs a Better Auth context (packages/api/src/context.ts calls auth.api.getSession({ headers }) unconditionally — even for publicProcedures), and the encrypted env payload at packages/gen/env/src/runtime/generated-payloads/web/staging.ts has:

"BETTER_AUTH_SECRET": "",
"BETTER_AUTH_URL": "",
"POSTGRES_URL": "ENC[AES256_GCM,...]",

Better Auth's "you are using the default secret" guard then throws an INTERNAL_SERVER_ERROR. This is unrelated to migrations — it would have surfaced on PR #24's preview too if the missing-table failure hadn't short-circuited first. Per this PR's brief I'm explicitly not allowed to touch packages/gen/env/** payloads, so I've filed it as a separate bug for the next pass:

  • bd: stackpanel-ayo — "fix(env): BETTER_AUTH_SECRET empty in web staging/preview payloads" (P1, bug)

That issue includes the curl reproducer, the encrypted-payload extract, and the full acceptance criteria for closing the loop on waitlist.join.

TL;DR

This PR delivers what the brief asks for — file-based migrations applied programmatically at startup, deadlock-free under Cloudflare's pooled connection model, idempotent on existing schemas, with an ADR. The original "table doesn't exist" failure mode is dead. The remaining 500 is a secrets-pipeline problem orthogonal to the migration work.

@czxtm cooper (czxtm) merged commit 86bd256 into main May 1, 2026
7 of 8 checks passed
@czxtm cooper (czxtm) deleted the feat/runtime-migrations branch May 1, 2026 13:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant