Skip to content

API: add opt-in per-token auth identity cache#1719

Open
MrDirkelz wants to merge 3 commits into
mainfrom
1692-api-token-based-auth-caching-cache-resolved-identity-accessmap-per-token
Open

API: add opt-in per-token auth identity cache#1719
MrDirkelz wants to merge 3 commits into
mainfrom
1692-api-token-based-auth-caching-cache-resolved-identity-accessmap-per-token

Conversation

@MrDirkelz

@MrDirkelz MrDirkelz commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

Every REST request through AuthGuard re-runs the full identity pipeline (RS256 verify + 3-4 Mango user lookups + a lastLogin write + accessMap projection). The socket handshake amortizes this per connection; the REST path does not, so a busy authenticated client hammers resolveIdentity and the DB.

Add IdentityCacheService, a transport-agnostic cache in front of AuthIdentityService.resolveOrDefault, used by both the REST guard and the socket handshake. It caches the resolved membership + identity keyed by a sha256 of the token and re-derives the accessMap live from PermissionSystem on each hit, so an ACL (groupUpdate) change needs no invalidation.

Design notes:

  • TTL = min(token exp, IDENTITY_CACHE_TTL_MS): a hit never serves a token past expiry; a different/expired token misses and is fully verified.
  • Bounded by a maxEntries cap + lazy/sweep eviction, no timers (BoundedTtlCache, reusing the strikeLimiter pattern).
  • Invalidate on User memberOf removal (PermissionChange DeleteCmd), AuthProvider/AutoGroupMappings changes, and DB disconnect. Not on groupUpdate (accessMap is re-projected) or plain User updates, so lastLogin writes never evict.
  • lastLogin now updates once per TTL per instance instead of per request.
  • Ships disabled (IDENTITY_CACHE_ENABLED=false); enable per environment.
  • Keeps all auth resolution behind one seam, easing a future SSE transport.

NOTE ON THE FOLLOWING DIAGRAM:
The accessMap is not cached in the current code, this part of the plan is just if we decide we want to deduplicate accessMaps that can be the same like super-admin access

flowchart TB
    A(["/query arrives with token"]) --> C{"IDENTITY<br>lookup token?"}
    C -- if found --> D["got mergedGroups<br>skip verify + DB + lastLogin"]
    C -- else --> E["resolveIdentity:<br>verify, DB lookups, lastLogin<br>store in identity cache under token"]
    D --> F["key2 = hash sorted mergedGroups"]
    E --> F
    F --> G{"Access map caching<br>lookup key2?"}
    G -- if found --> H["got shared accessMap<br>skip projection"]
    G -- else --> I["getAccessMap mergedGroups<br>store under key2<br>link key1 → key2"]
    H --> J["QueryService injects filters<br>run query"]
    I --> J
    J --> K(["return results"])
    E1["event: User.memberOf change"] --> A1@{ label: "Evict that identity's Layer I entry<br>NOTE: ignore lastLogin-only changes" }
    E2["event: AutoGroupMappings /<br>AuthProvider change"] --> A2["Full flush of Identity Cache"]
    E3["event: groupUpdate"] --> A3["Recompute each distinct projection<br>diff vs cached, update in place<br>linked tokens auto-update"]
    E4["event: DB disconnect"] --> A4["Clear BOTH caches"]
    A1 --> L1[("Identity Cache<br>token → membership")]
    A2 --> L1
    A4 --> L1 & L2[("Access map cache<br>membership-set → accessMap")]
    A3 --> L2

    A1@{ shape: rect}
Loading

Every REST request through AuthGuard re-runs the full identity pipeline
(RS256 verify + 3-4 Mango user lookups + a lastLogin write + accessMap
projection). The socket handshake amortizes this per connection; the REST
path does not, so a busy authenticated client hammers resolveIdentity and
the DB.

Add IdentityCacheService, a transport-agnostic cache in front of
AuthIdentityService.resolveOrDefault, used by both the REST guard and the
socket handshake. It caches the resolved membership + identity keyed by a
sha256 of the token and re-derives the accessMap live from PermissionSystem
on each hit, so an ACL (groupUpdate) change needs no invalidation.

Design notes:
- TTL = min(token exp, IDENTITY_CACHE_TTL_MS): a hit never serves a token
  past expiry; a different/expired token misses and is fully verified.
- Bounded by a maxEntries cap + lazy/sweep eviction, no timers
  (BoundedTtlCache, reusing the strikeLimiter pattern).
- Invalidate on User memberOf removal (PermissionChange DeleteCmd),
  AuthProvider/AutoGroupMappings changes, and DB disconnect. Not on
  groupUpdate (accessMap is re-projected) or plain User updates, so
  lastLogin writes never evict.
- lastLogin now updates once per TTL per instance instead of per request.
- Ships disabled (IDENTITY_CACHE_ENABLED=false); enable per environment.
- Keeps all auth resolution behind one seam, easing a future SSE transport.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@MrDirkelz

Copy link
Copy Markdown
Collaborator Author

@claude review this PR

@MrDirkelz MrDirkelz self-assigned this Jun 19, 2026
MrDirkelz and others added 2 commits June 19, 2026 10:30
Follow-up to the opt-in identity cache, addressing review findings:

- Group additions no longer lag: DbService.upsertDoc emits a write-side
  `permissionChange` event on any non-Group memberOf change (add or
  remove); IdentityCacheService clears on it for User docs. Pure additions
  produce no DeleteCmd, so the change feed alone couldn't distinguish them
  from a lastLogin-style update — the cache keeps ignoring plain User
  updates, so lastLogin writes never churn it.
- Close the user-deletion revocation gap: the Deleted DeleteCmd handler
  now also covers DocType.User, so a deleted user's still-valid JWT stops
  resolving from cache immediately instead of after the TTL.
- Make the clock consistent: IdentityCacheService takes one injectable
  `now`, shared with BoundedTtlCache and used in computeTtlMs (was
  Date.now() directly).
- Note in .env.example that membership/deletion invalidate promptly and
  residual staleness (IdP revocation, provider reassignment) is TTL-bounded.

Adds tests for permissionChange (User vs non-User), the Deleted-User
revocation, and injected-clock TTL/expiry. lastLogin write cadence is
intentionally left unchanged (write-on-miss) and decoupled from
invalidation, pending a separate decision.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Follow-up to the opt-in identity cache, addressing review findings:

- Group additions no longer lag: DbService.upsertDoc emits a write-side
  `permissionChange` event on any non-Group memberOf change (add or
  remove); IdentityCacheService clears on it for User docs. A pure
  addition produces no DeleteCmd, so the change feed alone couldn't
  distinguish it from a lastLogin-style update — the cache keeps ignoring
  plain User updates, so lastLogin writes never churn it.
- Close the user-deletion revocation gap: the Deleted DeleteCmd handler
  now also covers DocType.User, so a deleted user's still-valid JWT stops
  resolving from cache immediately instead of after the TTL.
- Make the clock consistent: IdentityCacheService takes one injectable
  `now`, shared with BoundedTtlCache and used in computeTtlMs (was
  Date.now() directly). Marked `@Optional()` so Nest DI leaves it
  unresolved and the Date.now default applies — without it the provider
  fails to instantiate (app boot and every DB-backed test module).
- Note in .env.example that membership/deletion invalidate promptly and
  residual staleness (IdP revocation, provider reassignment) is
  TTL-bounded.

Adds tests for permissionChange (User vs non-User), the Deleted-User
revocation, and injected-clock TTL/expiry. lastLogin write cadence is
intentionally left unchanged (write-on-miss) and decoupled from
invalidation, pending a separate decision.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

API: Token-based auth caching (cache resolved identity + accessMap per token)

1 participant