Fix PostgreSQL CREATE INDEX race condition on container startup (v2.5.62) #34

Merged
ttlequals0 merged 2 commits into main from fix/advisory-lock-migration
Feb 15, 2026

@ttlequals0
Owner

Summary

  • Replace fcntl.flock file lock with PostgreSQL advisory lock (pg_advisory_lock) for migration coordination across containers
  • File locks only work within a single container's /tmp filesystem, so the app and celery-worker containers raced on every restart, causing ~8 pg_class_relname_nsp_index duplicate key errors per restart event
  • Advisory locks work across all connections to the same PostgreSQL database, fully preventing the race condition
  • Falls back to uncoordinated execution for non-PG databases (each DDL statement already has its own idempotency handling)
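
The coordination described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name, the `MIGRATION_LOCK_ID` value, and the `run_migrations` callable are hypothetical, and the log messages only echo the phrases quoted in the test plan. It assumes `conn` is a dedicated DB-API connection to PostgreSQL in autocommit mode (for example from psycopg2), so that a failed migration does not leave the lock connection in an aborted transaction.

```python
import logging
from contextlib import closing

logger = logging.getLogger(__name__)

# Hypothetical constant: any fixed signed 64-bit integer that every
# container agrees on. The real value used by the project is not shown here.
MIGRATION_LOCK_ID = 728_193_455


def run_migrations_with_advisory_lock(conn, run_migrations, lock_id=MIGRATION_LOCK_ID):
    """Run migrations in one process; other processes wait, then skip.

    Returns True if this process ran the migrations, False if it only waited.
    """
    with closing(conn.cursor()) as cur:
        # Non-blocking attempt: exactly one session gets True.
        cur.execute("SELECT pg_try_advisory_lock(%s)", (lock_id,))
        won = bool(cur.fetchone()[0])

        if won:
            logger.info("Acquired PostgreSQL advisory lock")
            try:
                run_migrations()
            finally:
                # Release on success and on failure so waiters are not stuck.
                cur.execute("SELECT pg_advisory_unlock(%s)", (lock_id,))
            return True

        # Waiter path: block until the winner releases the lock,
        # then release immediately and skip the migrations.
        logger.info("Another process holds the lock, waiting for completion")
        cur.execute("SELECT pg_advisory_lock(%s)", (lock_id,))
        cur.execute("SELECT pg_advisory_unlock(%s)", (lock_id,))
        return False
```

Session-level advisory locks are also released automatically when the holding connection closes, so a crashed winner does not block the waiters forever.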

Test plan

  • All 5 new migration lock tests pass (constant stability, fallback, lock release on success, lock release on failure, waiter path)
  • Full test suite passes (209 passed, 8 skipped)
  • Docker image built and pushed as ttlequals0/pixelprobe:2.5.62
  • Deploy to production and verify no pg_class_relname_nsp_index errors on container restart
  • Verify logs show "Acquired PostgreSQL advisory lock" from one process and "waiting for completion" from others
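
The "constant stability" check matters because every container must compute the same lock key. The PR may simply hard-code an integer; as one possible alternative, the sketch below derives a key from a name. The function name `advisory_lock_key` and the lock name string are hypothetical. It relies on two facts: `pg_advisory_lock` takes a signed 64-bit `bigint`, and Python's built-in `hash()` for strings is randomized per process by default, so it is unsuitable for a key shared across containers.

```python
import hashlib
import struct


def advisory_lock_key(name: str) -> int:
    """Map a lock name to a stable signed 64-bit integer.

    sha256 gives the same digest in every process and on every host,
    unlike the built-in hash(), which is salted per interpreter run.
    """
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # ">q" decodes the first 8 bytes as a big-endian signed 64-bit integer,
    # which always fits PostgreSQL's bigint range.
    (key,) = struct.unpack(">q", digest[:8])
    return key


# Hypothetical lock name, used only for illustration.
MIGRATION_LOCK_KEY = advisory_lock_key("pixelprobe:migrations")
```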

….62)

Replace fcntl.flock file lock with PostgreSQL advisory lock for migration
coordination. File locks only work within a single container's /tmp filesystem,
so the app and celery-worker containers raced against each other on every restart,
causing about 8 "duplicate key value violates unique constraint
pg_class_relname_nsp_index" errors per restart event.

Advisory locks work across all connections to the same PostgreSQL database.
The process that wins the lock runs the migrations while the others block until
it finishes, then skip them. Falls back to uncoordinated execution if the advisory
lock fails (each DDL statement already has its own idempotency handling).

Schedule CRUD endpoints caught only ImportError from reload_schedules_task.delay(),
but when the app module is imported first (resolving a circular import), the Celery
task loads successfully and .delay() fails with a Redis ConnectionError instead.
Widen the handler to catch Exception so schedule operations succeed regardless of
Celery/Redis availability. Also refactor the migration lock tests to avoid importing
the app module directly (prevents Celery initialization side effects).
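
The widened exception handling could look something like the sketch below. It is an illustration under assumptions, not the endpoint code from this PR: `trigger_schedule_reload` and `get_reload_task` are hypothetical names, with `get_reload_task` standing in for whatever lazily imports `reload_schedules_task`.

```python
import logging

logger = logging.getLogger(__name__)


def trigger_schedule_reload(get_reload_task):
    """Best-effort reload notification that never fails the calling request.

    `get_reload_task` is a zero-argument callable that imports and returns
    the Celery task. Returns True if the task was queued, False otherwise.
    """
    try:
        # The import may raise ImportError; .delay() may raise a broker
        # ConnectionError when Redis is unreachable.
        get_reload_task().delay()
        return True
    except Exception as exc:
        # Deliberately broad: the schedule change is already saved, so a
        # failed reload notification should only be logged.
        logger.warning("Could not queue schedule reload: %s", exc)
        return False
```

Catching `Exception` trades precision for robustness here; logging the error keeps broker outages visible even though the request itself succeeds.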
@ttlequals0 ttlequals0 merged commit 8ba3261 into main Feb 15, 2026
6 checks passed
@ttlequals0 ttlequals0 deleted the fix/advisory-lock-migration branch February 15, 2026 04:53