perf(migrations): bulk backfill in 0268 release_authorization_to_pro #15044

Open
Maffooch wants to merge 1 commit into bugfix from perf/0268-bulk-backfill-authorized-users
@Maffooch

Contributor

Problem

A customer with 15k users / 17k products reported that
dojo/db_migrations/0268_release_authorization_to_pro.py took many hours to run.

The slow part is backfill_authorized_users, which translates RBAC rows into the
legacy authorized_users M2M:

  • Direct members: one get_or_create per Product_Member / Product_Type_Member row — a SELECT + conditional INSERT each.
  • Group flattening (the killer): a nested loop running a fresh Dojo_Group_Member query for every Product_Group / Product_Type_Group row, then a get_or_create per member — O(groups × members) sequential round-trips.
  • No logging, so a multi-hour run looked like a hang.

At customer scale this is ~1.9M sequential queries, which on a remote/managed Postgres
(with pghistory audit triggers per INSERT) reaches multiple hours.
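The round-trip growth described above can be illustrated with a toy model. This is a minimal sketch in plain Python, not the migration's code: `CountingDB`, `old_style_backfill`, and the sample sizes are hypothetical stand-ins that only count how many queries a per-row pattern would issue.

```python
class CountingDB:
    """Hypothetical in-memory stand-in for a database that counts round-trips."""

    def __init__(self, group_members):
        self.group_members = group_members  # list of (group_id, user_id)
        self.through = set()                # (product_id, user_id) rows
        self.queries = 0

    def members_of(self, group_id):
        self.queries += 1  # one SELECT per call
        return [u for g, u in self.group_members if g == group_id]

    def get_or_create(self, product_id, user_id):
        self.queries += 1  # the SELECT half of get_or_create
        if (product_id, user_id) not in self.through:
            self.queries += 1  # the conditional INSERT
            self.through.add((product_id, user_id))


def old_style_backfill(db, product_groups):
    # One member query per grant row, then one get_or_create per member.
    for product_id, group_id in product_groups:
        for user_id in db.members_of(group_id):
            db.get_or_create(product_id, user_id)


# 10 groups with 20 distinct members each; 50 products, each granted to every group.
group_members = [(g, g * 20 + u) for g in range(10) for u in range(20)]
product_groups = [(p, g) for p in range(50) for g in range(10)]

db = CountingDB(group_members)
old_style_backfill(db, product_groups)
print(db.queries, len(db.through))  # 20500 queries for 10000 pairs
```

In this model the query count is about two per pair plus one per grant row, which is in line with the roughly 2 queries per pair reported for the old path.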

Fix

  • One pass over Dojo_Group_Member into an in-memory group_id → [user_id] map, reused by the group-grant expansion and the Global_Role flag updates.
  • Accumulate (obj_id, user_id) pairs into a set (in-memory dedup), then bulk_create(batch_size=1000, ignore_conflicts=True) per through table. ignore_conflicts + the unique constraint preserves the get_or_create idempotency.
  • Global_Role is_superuser / is_staff updates stay bulk .update() and reuse the map.
  • logger.info for detection, per-batch progress, flag counts, and completion (matches the 0082 / 0201 migration pattern).
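The first two steps of the fix can be sketched in plain Python with no Django dependency. The function names (`build_group_map`, `collect_pairs`, `batched`) and the sample IDs are illustrative assumptions, not the migration's actual code; in the real callable each batch would presumably be turned into through-model instances and passed to `bulk_create(..., batch_size=1000, ignore_conflicts=True)`.

```python
from collections import defaultdict
from itertools import islice


def build_group_map(group_member_rows):
    """Single pass over (group_id, user_id) rows into group_id -> [user_id]."""
    group_map = defaultdict(list)
    for group_id, user_id in group_member_rows:
        group_map[group_id].append(user_id)
    return group_map


def collect_pairs(direct_rows, group_grant_rows, group_map):
    """Merge direct members and flattened group members; the set deduplicates."""
    pairs = set(direct_rows)
    for obj_id, group_id in group_grant_rows:
        for user_id in group_map.get(group_id, ()):
            pairs.add((obj_id, user_id))
    return pairs


def batched(items, batch_size=1000):
    """Yield lists of at most batch_size items."""
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch


# Hypothetical sample data: user 10 is both a direct member and a group member.
group_member_rows = [(1, 10), (1, 11), (2, 11)]
direct_rows = [(100, 10)]
group_grant_rows = [(100, 1), (100, 2), (200, 2)]

group_map = build_group_map(group_member_rows)
pairs = collect_pairs(direct_rows, group_grant_rows, group_map)
batches = list(batched(sorted(pairs), batch_size=2))
print(batches)  # [[(100, 10), (100, 11)], [(200, 11)]]
```

Deduplicating in a set before inserting means overlapping direct and group grants cost nothing extra at the database, and the group lookup becomes a dictionary access instead of a query.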

The migration's operations list is unchanged (makemigrations --check reports no changes) — only the RunPython callable internals changed.

Benchmarks (dockerized Postgres, results identical at every scale)

| Scale (users / products) | OLD time | OLD queries | NEW time | NEW queries |
|---|---|---|---|---|
| 5k / 2.5k (~86k pairs) | 54.10 s | 175,438 | 2.92 s | 92 |
| 15k / 17k (~954k pairs) | ~10 min (extrapolated) | ~1.94M (extrapolated) | 36.45 s | 959 |

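~1,900–2,000× fewer queries. Query count is the load-independent measure here: it does not
depend on network latency or database load. The old path is latency-bound on sequential
round-trips; the new path's query count grows only with the number of 1,000-row batches
(92 and 959 queries in the runs above).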
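As a rough cross-check of the figures above, the arithmetic can be worked through in a few lines. This is a sketch under the assumption that the new path issues about one INSERT per 1,000-pair batch; the pair counts are the approximate values from the benchmark, and the helper name is hypothetical.

```python
import math


def estimated_insert_batches(pairs, batch_size=1000):
    """Number of bulk INSERT statements if each carries up to batch_size rows."""
    return math.ceil(pairs / batch_size)


small_batches = estimated_insert_batches(86_000)   # 5k / 2.5k scale
large_batches = estimated_insert_batches(954_000)  # 15k / 17k scale

# Reduction factors computed from the reported query counts.
ratio_small = 175_438 / 92
ratio_large = 1_940_000 / 959

print(small_batches, large_batches)           # 86 954
print(round(ratio_small), round(ratio_large)) # 1907 2023
```

Under that assumption, the reported 92 and 959 queries are each only a few above the estimated batch counts, which would be consistent with a small fixed number of reads on top of the batched inserts. The PR does not break the query counts down, so this is an inference rather than a measured split.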

Verification

  • Functional test of the real edited function: direct + group-flattened pairs correct (with dedup), is_superuser/is_staff flags correct.
  • Idempotent on a second run (no duplicates, no error).
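The idempotency check can be illustrated with an analogous setup outside Django. This is a sketch using the standard-library `sqlite3` module, where `INSERT OR IGNORE` against a unique constraint plays the role that `ignore_conflicts=True` plays in `bulk_create`; the table name, column names, and sample pairs are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_authorized_users ("
    " product_id INTEGER NOT NULL,"
    " user_id INTEGER NOT NULL,"
    " UNIQUE (product_id, user_id))"
)


def backfill(conn, pairs):
    # Rows that already exist are skipped silently thanks to the unique constraint.
    conn.executemany(
        "INSERT OR IGNORE INTO product_authorized_users (product_id, user_id)"
        " VALUES (?, ?)",
        sorted(pairs),
    )
    conn.commit()


def row_count(conn):
    return conn.execute(
        "SELECT COUNT(*) FROM product_authorized_users"
    ).fetchone()[0]


pairs = {(100, 10), (100, 11), (200, 11)}

backfill(conn, pairs)
first_run = row_count(conn)

backfill(conn, pairs)  # second run: no duplicates, no error
second_run = row_count(conn)

print(first_run, second_run)  # 3 3
```

The same property is what lets the migration be re-run safely: a repeated insert of an existing pair is a no-op rather than an integrity error.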

🤖 Generated with Claude Code

Replace the row-by-row get_or_create in backfill_authorized_users with a
set-based, in-memory-deduplicated, batched bulk_create, and add progress
logging.

The old backfill issued ~2 queries per membership pair, and the group
flattening ran a Dojo_Group_Member query per Product_Group /
Product_Type_Group row plus a get_or_create per member -- O(groups x
members) sequential round-trips. On a 15k-user / 17k-product instance this
was ~1.9M queries and took many hours, with no log output to show progress.

Now: one pass over Dojo_Group_Member into an in-memory group->users map
(reused by both the group-grant expansion and the Global_Role flag
updates), pairs deduplicated in a set, then bulk_create(batch_size=1000,
ignore_conflicts=True) per through table. ignore_conflicts + the through
table's unique constraint preserves the idempotency get_or_create provided.
logger.info now reports detection, per-batch progress, flag counts, and
completion.

Benchmarked against a 15k/17k dataset: ~1,900-2,000x fewer queries
(e.g. 175,438 -> 92 at 86k pairs); results verified identical and
idempotent on re-run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Maffooch requested a review from mtesauro as a code owner on June 18, 2026 18:22
github-actions bot added the "New Migration" label (Adding a new migration file. Take care when merging.) on Jun 18, 2026

@mtesauro mtesauro left a comment

Contributor

Approved

Maffooch added this to the 3.0.100 milestone on Jun 19, 2026