perf(db): Phase 1 - indexes, batched cleanup, and distributed cleanup lock #164
Open
appleboy wants to merge 4 commits into main from worktree-DB

Commits
6d71baf perf(db): add Phase 1 indexes, batched cleanup, and distributed clean… (appleboy)
3c74ea3 fix(db): align gorm index names with migration SQL and honor audit cl… (appleboy)
d8fe321 fix(db): guard cleanup batch helper against misconfiguration and docu… (appleboy)
ef52a80 fix(db): harden cleanup config validation and remove redundant index (appleboy)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,73 @@ | ||
| # AuthGate production environment template for multi-pod deployments | ||
| # (20k+ users, 5+ replicas, PostgreSQL + Redis). | ||
| # | ||
| # Copy to .env or inject via your secrets manager / Helm values, then override | ||
| # secrets (JWT_SECRET, SESSION_SECRET, DATABASE_DSN, REDIS_ADDR, etc.) to match | ||
| # your infrastructure. Adjust cache/pool sizes only after observing real | ||
| # traffic metrics (cache hit rate, DB CPU, connection count). | ||
| # ============================================================================= | ||
| | ||
| ENVIRONMENT=production | ||
| | ||
| # ---- Secrets (REQUIRED: generate fresh values with `openssl rand -hex 32`) --- | ||
| JWT_SECRET=CHANGE_ME | ||
| SESSION_SECRET=CHANGE_ME | ||
| | ||
| # ---- Database --------------------------------------------------------------- | ||
| DATABASE_DRIVER=postgres | ||
| DATABASE_DSN=host=postgres user=authgate password=CHANGE_ME dbname=authgate port=5432 sslmode=require | ||
| | ||
| # Connection pool: 5 pods × 25 conns = 125; ensure PG max_connections >= 200. | ||
| DB_MAX_OPEN_CONNS=25 | ||
| DB_MAX_IDLE_CONNS=10 | ||
| DB_CONN_MAX_LIFETIME=5m | ||
| DB_CONN_MAX_IDLE_TIME=10m | ||
| | ||
| # ---- Redis (shared cache + rate limit + cleanup lock) ----------------------- | ||
| REDIS_ADDR=redis:6379 | ||
| # REDIS_PASSWORD= | ||
| # REDIS_DB=0 | ||
| | ||
| # ---- Token verification cache (major DB-load reducer) ----------------------- | ||
| # Off by default in .env.example; production should enable this. | ||
| TOKEN_CACHE_ENABLED=true | ||
| TOKEN_CACHE_TYPE=redis-aside | ||
| TOKEN_CACHE_TTL=10h | ||
| TOKEN_CACHE_CLIENT_TTL=1h | ||
| # If pod memory is tight, drop this to 16 (MB) per connection. | ||
| TOKEN_CACHE_SIZE_PER_CONN=32 | ||
| | ||
| # ---- Client / User / Metrics cache (shared across pods) --------------------- | ||
| CLIENT_CACHE_TYPE=redis-aside | ||
| CLIENT_COUNT_CACHE_TYPE=redis-aside | ||
| USER_CACHE_TYPE=redis-aside | ||
| METRICS_CACHE_TYPE=redis-aside | ||
| | ||
| # ---- Expired token / device code cleanup ------------------------------------ | ||
| # All pods may enable this: a Redis-backed distributed lock (below) prevents | ||
| # concurrent runs — only one pod does the DELETE each interval. | ||
| ENABLE_EXPIRED_TOKEN_CLEANUP=true | ||
| EXPIRED_TOKEN_CLEANUP_INTERVAL=30m | ||
| | ||
| # Distributed cleanup lock via rueidislock. Required for multi-pod. | ||
| ENABLE_CLEANUP_LOCK=true | ||
| CLEANUP_LOCK_KEY_VALIDITY=5m | ||
| | ||
| # ---- Audit logging ---------------------------------------------------------- | ||
| ENABLE_AUDIT_LOGGING=true | ||
| AUDIT_LOG_RETENTION=2160h # 90 days | ||
| | ||
| # ---- Rate limiting (distributed) -------------------------------------------- | ||
| ENABLE_RATE_LIMIT=true | ||
| RATE_LIMIT_STORE=redis | ||
| | ||
| # ---- Metrics ---------------------------------------------------------------- | ||
| METRICS_ENABLED=true | ||
| # Gauge updates query global counts; if every pod runs them you get duplicated | ||
| # values across the fleet. Default this template to false so copying the file | ||
| # into all pods is safe. On ONE dedicated replica set METRICS_GAUGE_UPDATE_ENABLED=true | ||
| # (or set it true on all pods and aggregate with avg()/max() in PromQL). | ||
| METRICS_GAUGE_UPDATE_ENABLED=false | ||
| | ||
| # ---- Sessions --------------------------------------------------------------- | ||
| SESSION_FINGERPRINT=true |
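
The pool comment in the template (5 pods × 25 conns = 125, with `max_connections >= 200`) and the `2160h # 90 days` retention value can be sanity-checked with a few lines of Go. This is a minimal sketch, not code from the PR: `poolBudget` and `retentionDays` are hypothetical helper names, and it assumes duration values are parsed with the standard library's `time.ParseDuration`, which would explain why retention is written in hours (that function has no day unit).

```go
package main

import (
	"fmt"
	"time"
)

// poolBudget (hypothetical helper) returns the worst-case number of
// PostgreSQL connections the fleet can open, and how many connections
// remain below max_connections for migrations, admin sessions, and so on.
func poolBudget(pods, maxOpenPerPod, pgMaxConnections int) (total, headroom int) {
	total = pods * maxOpenPerPod
	return total, pgMaxConnections - total
}

// retentionDays (hypothetical helper) converts a Go duration string such as
// "2160h" into whole days. Assumption: the application parses durations with
// time.ParseDuration, whose largest unit is "h".
func retentionDays(value string) (int, error) {
	d, err := time.ParseDuration(value)
	if err != nil {
		return 0, err
	}
	return int(d.Hours() / 24), nil
}

func main() {
	total, headroom := poolBudget(5, 25, 200)
	fmt.Printf("fleet max open conns: %d, headroom: %d\n", total, headroom)

	days, err := retentionDays("2160h")
	if err != nil {
		fmt.Println("invalid retention:", err)
		return
	}
	fmt.Printf("audit log retention: %d days\n", days)

	// A day suffix is rejected by time.ParseDuration.
	if _, err := retentionDays("90d"); err != nil {
		fmt.Println("90d rejected:", err)
	}
}
```

With the template's values the fleet can open at most 125 connections, which leaves 75 below a `max_connections` of 200. Adding replicas shrinks that headroom by `DB_MAX_OPEN_CONNS` per pod, so the limit should be revisited whenever the replica count changes.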
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,80 @@ | ||
| package bootstrap | ||
| | ||
| import ( | ||
| "context" | ||
| "errors" | ||
| "fmt" | ||
| "log" | ||
| | ||
| "github.com/redis/rueidis" | ||
| "github.com/redis/rueidis/rueidislock" | ||
| | ||
| "github.com/go-authgate/authgate/internal/config" | ||
| ) | ||
| | ||
| // Lock names for the distributed cleanup jobs. Keep in sync with docs/runbooks | ||
| // that may reference these keys for debugging stuck cleanups. | ||
| const ( | ||
| cleanupLockAuditLogs = "cleanup:audit-logs" | ||
| cleanupLockExpiredTokens = "cleanup:expired-tokens" | ||
| ) | ||
| | ||
| // initializeCleanupLocker builds a Redis-backed distributed locker that | ||
| // serializes periodic cleanup jobs across multi-pod deployments. Returns | ||
| // (nil, nil) when cleanup lock is disabled; callers treat a nil locker as | ||
| // "run unconditionally" (single-instance mode). | ||
| // | ||
| // KeyMajority is 1 (single Redis target) rather than a Redlock quorum. A | ||
| // Redis failover window could allow two pods to hold the lock simultaneously, | ||
| // but cleanup DELETEs are idempotent (the inner SELECT finds no matching rows | ||
| // on the second pod), so this is safe — the worst case is transient double | ||
| // work, never data loss or corruption. | ||
| func initializeCleanupLocker(cfg *config.Config) (rueidislock.Locker, error) { | ||
| if !cfg.EnableCleanupLock { | ||
| return nil, nil //nolint:nilnil // locker not needed when feature is disabled | ||
| } | ||
| if cfg.RedisAddr == "" { | ||
| return nil, errors.New("ENABLE_CLEANUP_LOCK requires REDIS_ADDR to be set") | ||
| } | ||
| | ||
| locker, err := rueidislock.NewLocker(rueidislock.LockerOption{ | ||
| ClientOption: rueidis.ClientOption{ | ||
| InitAddress: []string{cfg.RedisAddr}, | ||
| Password: cfg.RedisPassword, | ||
| SelectDB: cfg.RedisDB, | ||
| }, | ||
| KeyPrefix: "authgate:lock", | ||
| KeyMajority: 1, | ||
| KeyValidity: cfg.CleanupLockKeyValidity, | ||
| }) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("failed to create cleanup locker: %w", err) | ||
| } | ||
| | ||
| log.Printf("Cleanup lock initialized (validity: %v)", cfg.CleanupLockKeyValidity) | ||
| return locker, nil | ||
| } | ||
| | ||
| // runWithCleanupLock executes fn while holding the named distributed lock. | ||
| // When locker is nil (single-instance mode) fn runs unconditionally. | ||
| // When another pod currently holds the lock, fn is skipped silently and | ||
| // nil is returned — the next tick will try again. | ||
| func runWithCleanupLock( | ||
| ctx context.Context, | ||
| locker rueidislock.Locker, | ||
| name string, | ||
| fn func(context.Context) error, | ||
| ) error { | ||
| if locker == nil { | ||
| return fn(ctx) | ||
| } | ||
| lockCtx, cancel, err := locker.TryWithContext(ctx, name) | ||
| if err != nil { | ||
| if errors.Is(err, rueidislock.ErrNotLocked) { | ||
| return nil | ||
| } | ||
| return fmt.Errorf("acquire cleanup lock %q: %w", name, err) | ||
| } | ||
| defer cancel() | ||
| return fn(lockCtx) | ||
| } |