perf(expand): prefetch descendant grants to eliminate per-principal SQL queries by arreyder · Pull Request #863 · ConductorOne/baton-sdk

arreyder · 2026-05-24T02:54:51Z

Summary

Replace O(N) per-principal ListGrantsForEntitlement SQL queries in runAction with a single bulk pre-fetch into an in-memory map
For a large tenant (180K users, 139K groups, 72 GB c1z), a single expansion action was taking 38 minutes — 180K individual SQLite queries through pure-Go modernc.org/sqlite
After: one paginated scan of the descendant entitlement + O(1) map lookups per source grant → expected seconds instead of minutes

Problem

Profile evidence from be-temporal-sync prod (1h aggregate):

100% of CPU in dotc1z.listGrantsGeneric → ListGrantsForEntitlement → database/sql.Rows.Next → modernc.org/sqlite._sqlite3VdbeExec
Block profile: 0 blocked samples — CPU-bound, not I/O-bound
277 pending grant-expansion actions × ~38 min each = days to complete one sync

The inner loop in runAction (previously lines 278-350) called ListGrantsForEntitlement with a specific PrincipalId for each source grant. Even though the compound index (entitlement_id, principal_resource_type_id, principal_resource_id) makes each a point lookup returning 0-1 rows, the per-query overhead in pure-Go SQLite (~0.1-1ms per call) compounds to 38 minutes at 180K source grants.

Fix

Pre-fetch ALL grants on the descendant entitlement into map[principalKey][]*v2.Grant before entering the source grants loop. Each source grant then does a map lookup (O(1)) instead of a SQL query.

The pre-fetch is per-runAction call, not cached across actions. This ensures correctness in diamond graphs: if action A→C creates grants that action B→C must see, the B→C call does a fresh pre-fetch that includes the writes from A→C.

Memory tradeoff

For the worst case (descendant with 180K grants × ~2KB each = ~360MB peak), this is within budget for worker pods (32GB RAM). Most entitlements are much smaller.

Test plan

go test ./pkg/sync/expand/... — all pass (including diamond graph, multi-level, mixed directness, cycle tests)
go test ./pkg/sync/... — all pass
go test -short ./pkg/dotc1z/... — all pass
go test -bench=BenchmarkExpand ./pkg/sync/expand/... — 70% improvement

Benchmark

                              main (baseline)      branch (prefetch)     delta
BenchmarkExpandSmall            89.0s                26.4s              -70.3%
BenchmarkExpandSmallMedium     113.5s                34.4s              -69.7%
allocs/op (Small)              299M                 129M                -57.0%
B/op (Small)                  14.8 GB               7.2 GB             -51.4%

🤖 Generated with Claude Code

…QL queries The grant expansion inner loop previously issued one ListGrantsForEntitlement SQL query per source grant to check whether the principal already holds a grant on the descendant entitlement. For large tenants (Eli Lilly: 180K users, 139K groups), a single expansion action issued 180K individual SQLite queries — each cheap in isolation but taking 38 minutes total due to pure-Go SQLite per-query overhead (modernc.org/sqlite). Replace the per-principal inner loop with a single bulk pre-fetch of all grants on the descendant entitlement into an in-memory map keyed by (principal_resource_type_id, principal_resource_id). The source grant loop then does O(1) map lookups instead of O(N) SQL queries. Expected impact for Lilly's GLB Entra connector: Before: 180K SQL queries per action → ~38 minutes After: 1 full scan of descendant (paginated) + map lookups → seconds The pre-fetch is per-runAction call (not cached across actions) to ensure correctness in diamond graphs where action A→C creates grants that action B→C must see as pre-existing. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

github-actions · 2026-05-24T02:58:21Z

+		descendantGrants := descendantByPrincipal[key]

-			resp, err := e.store.ListGrantsForEntitlement(ctx, req)
+		if len(descendantGrants) == 0 {
+			descendantGrant, err := newExpandedGrant(descendantEntitlement.GetEntitlement(), sourceGrant.GetPrincipal(), action.SourceEntitlementID, isSourceDirect)
 			if err != nil {
-				l.Error("runAction: error fetching descendant grants", zap.Error(err))
-				return "", fmt.Errorf("runAction: error fetching descendant grants: %w", err)
+				l.Error("runAction: error creating new grant", zap.Error(err))
+				return "", fmt.Errorf("runAction: error creating new grant: %w", err)
 			}
-			descendantGrants := resp.GetList()
-
-			// If we have no grants for the principal in the descendant entitlement, make one.
-			if pageToken == "" && resp.GetNextPageToken() == "" && len(descendantGrants) == 0 {
-				descendantGrant, err := newExpandedGrant(descendantEntitlement.GetEntitlement(), sourceGrant.GetPrincipal(), action.SourceEntitlementID, isSourceDirect)
-				if err != nil {
-					l.Error("runAction: error creating new grant", zap.Error(err))
-					return "", fmt.Errorf("runAction: error creating new grant: %w", err)
-				}
-				newGrants = append(newGrants, descendantGrant)
-				newGrants, err = PutGrantsInChunks(ctx, e.store, newGrants, 10000)
-				if err != nil {
-					l.Error("runAction: error updating descendant grants", zap.Error(err))
-					return "", fmt.Errorf("runAction: error updating descendant grants: %w", err)
-				}
-				break
-			}
-
-			// Add the source entitlement as a source to all descendant grants.
-			grantsToUpdate := make([]*v2.Grant, 0)
+			newGrants = append(newGrants, descendantGrant)


🟡 Suggestion: The descendantByPrincipal map is built once before the loop but never updated after creating new grants. If two source grants in the same page share the same principal and no pre-existing descendant grant exists, the second iteration will also see len(descendantGrants) == 0 and create a duplicate grant (same deterministic ID) instead of updating the first grant's sources. Consider inserting the newly created grant into the map:

Suggested change

descendantGrants := descendantByPrincipal[key]

resp, err := e.store.ListGrantsForEntitlement(ctx, req)

if len(descendantGrants) == 0 {

descendantGrant, err := newExpandedGrant(descendantEntitlement.GetEntitlement(), sourceGrant.GetPrincipal(), action.SourceEntitlementID, isSourceDirect)

if err != nil {

l.Error("runAction: error fetching descendant grants", zap.Error(err))

return "", fmt.Errorf("runAction: error fetching descendant grants: %w", err)

l.Error("runAction: error creating new grant", zap.Error(err))

return "", fmt.Errorf("runAction: error creating new grant: %w", err)

}

descendantGrants := resp.GetList()

// If we have no grants for the principal in the descendant entitlement, make one.

if pageToken == "" && resp.GetNextPageToken() == "" && len(descendantGrants) == 0 {

descendantGrant, err := newExpandedGrant(descendantEntitlement.GetEntitlement(), sourceGrant.GetPrincipal(), action.SourceEntitlementID, isSourceDirect)

if err != nil {

l.Error("runAction: error creating new grant", zap.Error(err))

return "", fmt.Errorf("runAction: error creating new grant: %w", err)

}

newGrants = append(newGrants, descendantGrant)

newGrants, err = PutGrantsInChunks(ctx, e.store, newGrants, 10000)

if err != nil {

l.Error("runAction: error updating descendant grants", zap.Error(err))

return "", fmt.Errorf("runAction: error updating descendant grants: %w", err)

}

break

}

// Add the source entitlement as a source to all descendant grants.

grantsToUpdate := make([]*v2.Grant, 0)

newGrants = append(newGrants, descendantGrant)

descendantGrant, err := newExpandedGrant(descendantEntitlement.GetEntitlement(), sourceGrant.GetPrincipal(), action.SourceEntitlementID, isSourceDirect)

if err != nil {

l.Error("runAction: error creating new grant", zap.Error(err))

return "", fmt.Errorf("runAction: error creating new grant: %w", err)

}

newGrants = append(newGrants, descendantGrant)

descendantByPrincipal[key] = append(descendantByPrincipal[key], descendantGrant)

github-actions · 2026-05-24T02:58:39Z

General PR Review: perf(expand): prefetch descendant grants to eliminate per-principal SQL queries

Blocking Issues: 0 | Suggestions: 0 | Threads Resolved: 0
Review mode: incremental since 3627df6
View review run

Review Summary

The new commit removes the Expander-level descendant grant cache (prefetchedDescendantID / prefetchedDescendants fields) and converts the cached getDescendantGrants method into a stateless prefetchDescendantGrants function that takes the store as a parameter. This directly addresses the previous stale-cache suggestion: since there is no longer any caching, each runAction invocation does a fresh prefetch from the store, eliminating the risk of cross-page duplicate grants. The change also aligns with alan-lee-12's observation that production creates a new Expander per RunSingleStep, meaning the instance-level cache was never actually reused across pages in production. No new issues found.

Security Issues

None found.

Correctness Issues

None found.

Suggestions

None.

github-actions

No blocking issues found.

…principal guard Addresses code review findings on PR #863: 1. Cache descendant prefetch across pages of the same action (High): the same descendant entitlement is now fetched once and reused for all source-grant pages within a single action. Cache invalidated when the action completes and moves to the next edge. 2. Context cancellation check in prefetch loop (Medium): ctx.Err() is checked at the top of each page iteration so the loop exits promptly on timeout or shutdown. 3. Page-count safety valve (Critical → defense-in-depth): maxPrefetchPages (10,000) bounds the loop against infinite pagination from a buggy store. At 10K pages × 10K page size = 100M grants, well beyond any realistic entitlement. 4. Nil-principal skip (Low → defensive): source grants with nil principals are skipped rather than producing a key collision on "\x00". Matches newExpandedGrant's nil rejection. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

github-actions · 2026-05-24T03:07:01Z

+	if e.prefetchedDescendantID == entID && e.prefetchedDescendants != nil {
+		return e.prefetchedDescendants, nil


🟡 Suggestion: Cross-page stale cache — RunSingleStep calls runAction repeatedly (once per source-grants page) for the same action, and the cache is only cleared when the action completes. Grants created by page N's PutGrantsInChunks are written to the store but invisible to page N+1 because the cached map is returned here. If a principal appears in source grants across two pages, page N+1 will see an empty slice and create a duplicate expanded grant instead of updating the one from page N. The pre-caching code re-fetched from the DB each page, so it saw prior writes. Consider invalidating the cache after PutGrantsInChunks flushes (e.g. e.prefetchedDescendants = nil), or inserting newly created grants into the cached map.

github-actions

No blocking issues found.

arreyder · 2026-05-24T03:26:41Z

Benchmark results

Ran on the existing BenchmarkExpand{Small,SmallMedium} fixtures (232-node graph, 787 edges, real c1z file). These use full-blob mode (slim writer OFF) — same configuration as Lilly's prod connector.

goos: linux
goarch: arm64
cpu: Neoverse-V2

                              main (baseline)      branch (prefetch)     delta
─────────────────────────────────────────────────────────────────────────────────
BenchmarkExpandSmall            89.0s                26.4s              -70.3%
BenchmarkExpandSmallMedium     113.5s                34.4s              -69.7%

                              main                 branch               delta
─────────────────────────────────────────────────────────────────────────────────
allocs/op (Small)              299M                 129M                -57.0%
allocs/op (SmallMedium)        384M                 170M                -55.8%
B/op (Small)                  14.8 GB               7.2 GB             -51.4%
B/op (SmallMedium)            19.0 GB               9.5 GB             -50.0%

70% wall-clock improvement, 57% fewer allocations, 51% less memory allocated.

The benchmark fixture is a 232-node graph — much smaller than Lilly's 139K-group tenant. The improvement scales with graph density: the more grants per entitlement, the more per-principal SQL queries the old code issued. Lilly's 180K-member "all employees" group (the action that takes 38 minutes today) should see an even larger relative speedup because:

The fixed per-query overhead (statement prepare, cursor open/close) dominates at high N
The prefetch cache (getDescendantGrants) amortizes across all 18 pages of source grants within a single action

alan-lee-12 · 2026-05-24T04:13:37Z

re. the benchmark above:

my 🤖 pointed out that it used one Expander for the whole run, while production syncer uses a new Expander per RunSingleStep in syncer.go (line 2555), so the cache does not persist across pages.

Since each source-grant page is one step, production will prefetch the descendant
entitlement once per source page, not once per action.

The changes in this PR may still be much faster than 180,000 separate point queries.
But it means the claimed performance model is not quite true in production, and the
benchmark is likely too optimistic because it uses expander.Run(ctx), which keeps the
same Expander alive across pages.

I’d make the cache lifetime explicit and owned by the syncer, not by a throwaway Expander.

kans · 2026-05-24T04:23:02Z

+
+	result := make(map[string][]*v2.Grant)
+	pageToken := ""
+	for page := 0; page < maxPrefetchPages; page++ {


What happens if we hit maxPrefetchPages?

Addresses alan-lee-12's review: the syncer creates a new Expander per RunSingleStep (syncer.go:2555), so struct-level cache fields don't persist across source-grant pages in production. The benchmark (expander.Run) was overly optimistic because it reuses one Expander. Remove the prefetchedDescendantID/prefetchedDescendants fields and the cache-invalidation logic. The core optimization is unchanged: each runAction call does ONE bulk scan of the descendant entitlement (prefetchDescendantGrants) instead of N per-principal SQL queries. This is the source of the 70% benchmark win — the cross-page cache was bonus amortization that only helped the Run() path. Production behavior per page: Before: 10K individual point-lookup SQL queries After: 1 bulk scan + map lookups The cross-page staleness concern is also resolved: since there's no cache, each page always sees the latest store state. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

github-actions

No blocking issues found.

arreyder requested a review from a team May 24, 2026 02:54

github-actions Bot reviewed May 24, 2026

View reviewed changes

arreyder mentioned this pull request May 24, 2026

perf(dotc1z): add prepared statement cache for repeated SQL queries #864

Open

3 tasks

kans reviewed May 24, 2026

View reviewed changes

github-actions Bot reviewed May 24, 2026

View reviewed changes

btipling mentioned this pull request May 24, 2026

ci(pr-leakage): add reusable workflow that scans PRs for customer-data leaks ConductorOne/github-workflows#85

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(expand): prefetch descendant grants to eliminate per-principal SQL queries#863

perf(expand): prefetch descendant grants to eliminate per-principal SQL queries#863
arreyder wants to merge 3 commits into
mainfrom
arreyder/perf-prefetch-descendant-grants

arreyder commented May 24, 2026 •

edited

Loading

Uh oh!

github-actions Bot May 24, 2026

Uh oh!

github-actions Bot commented May 24, 2026 •

edited

Loading

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot May 24, 2026

Uh oh!

github-actions Bot left a comment

Uh oh!

arreyder commented May 24, 2026

Uh oh!

alan-lee-12 commented May 24, 2026

Uh oh!

kans May 24, 2026

Uh oh!

github-actions Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		if e.prefetchedDescendantID == entID && e.prefetchedDescendants != nil {
		return e.prefetchedDescendants, nil

Conversation

arreyder commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Fix

Memory tradeoff

Test plan

Benchmark

Uh oh!

github-actions Bot May 24, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

General PR Review: perf(expand): prefetch descendant grants to eliminate per-principal SQL queries

Review Summary

Security Issues

Correctness Issues

Suggestions

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 24, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

arreyder commented May 24, 2026

Benchmark results

Uh oh!

alan-lee-12 commented May 24, 2026

Uh oh!

kans May 24, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

arreyder commented May 24, 2026 •

edited

Loading

github-actions Bot commented May 24, 2026 •

edited

Loading