Skip to content

fix: Harden SessionManager + KeycloakSessionHandler against cross-DB JWT reuse, race conditions and stale role/session caches#601

Merged
yamelsenih merged 4 commits into
solop-develop:developfrom
EdwinBetanc0urt:bugfix/keycloak-jwt-session-hardening
Jun 2, 2026
Merged

fix: Harden SessionManager + KeycloakSessionHandler against cross-DB JWT reuse, race conditions and stale role/session caches#601
yamelsenih merged 4 commits into
solop-develop:developfrom
EdwinBetanc0urt:bugfix/keycloak-jwt-session-hardening

Conversation

@EdwinBetanc0urt
Copy link
Copy Markdown
Member

@EdwinBetanc0urt EdwinBetanc0urt commented May 15, 2026

fix: Harden SessionManager + KeycloakSessionHandler against cross-DB JWT reuse, race conditions and stale role/session caches

Brings into the library a bundle of hardening fixes that have been living as local overlays in adempiere-grpc-server, adempiere-report-engine-service and sabana for the past weeks. With this merge those consumers can drop their patched copies and inherit the same behaviour from the JAR.

The bundle closes four classes of problems reported under issue #2389 and its follow-ups.

SessionManager

JWT can no longer be reused across databases (opt-in)

  • HMAC signing key derivation: when ECA52_JWT_BIND_TO_DATABASE=Y is set in AD_SysConfig (default N), the signing key becomes HmacSHA256(configuredSecret, dbName) instead of the configured secret directly. Two installations sharing ECA52_JWT_SECRET_KEY no longer produce the same effective key.
  • New opaque tenant_id claim (SHA-256(dbName) in hex) emitted alongside the existing claims, also gated on the flag. getSessionFromToken rejects tokens whose tenant_id does not match the current installation before any DB access.
  • The flag is opt-in / default OFF so deployments with several services on different release cadences can adopt the new code without breaking JWT interop. Operators flip the flag in AD_SysConfig once every connected service is on the new build; existing JWTs in circulation get rejected after the flip, users re-log once.

Multi-tenant integrations can plug their own DB-name resolver

  • New public setDatabaseNameResolver(Supplier<String>). Single-tenant services keep the default (CConnection.get().getDbName()); multi-tenant services that route per request (e.g. Sabana) register a per-thread resolver that returns the current tenant's catalog. Avoids the JVM-wide singleton drifting under concurrent requests for different tenants.

Cross-validation of JWT claims against the AD_Session row

  • After loading the AD_Session from DB, getSessionFromToken checks:
    • JWT.AD_User_ID == session.getCreatedBy(),
    • JWT.AD_Client_ID == session.getAD_Client_ID(),
    • session.isProcessed() == false.
  • Mismatch or processed session → @Invalid@ @AD_Session_ID@. Blocks JWT replay after server-side logout and AD_Session_ID collisions across installations.
  • These checks are always on — no flag needed; they read claims that have always been in the JWT, so they interoperate with both old and new tokens.

New helper: createTokenFromSession(MSession, boolean)

  • Lets callers build the MSession externally (e.g. change-role / change-organization flows that need the instance before issuing the token) and mint the JWT for it without re-creating the session. createSessionAndGetToken now delegates to it — DRY.

KeycloakSessionHandler

Refuse rows whose WebSession doesn't belong to the Keycloak flow

  • loadExistingSessionContext rejects an AD_Session whose WebSession does not start with "keycloak:". Protects against AD_Session_ID collisions with rows created by other flows.

Force-reload of MRole when resolving a Keycloak session

  • After setting #AD_Role_ID / #AD_Client_ID / #AD_Org_ID on the context, evict the cached MRole and reload with reload=true. Rebuilds access SQL with the current tenant context — protects against stale role instances cached under a previous #AD_Client_ID.

Per-sid lock prevents duplicate AD_Session rows under concurrency

  • New ConcurrentMap<String, Object> sidLocks. The slow path (DB-fallback + create) runs inside synchronized (sidLocks.computeIfAbsent(sid, …)) with double-checked cache and DB lookup inside the lock. The winner creates; every loser observes the winner's session. The fast path (cache hit) stays outside the lock.

Downstream coordination

Each consumer (adempiere-grpc-server, adempiere-report-engine-service, sabana) currently ships a byte-identical copy of these two files in src/patches/.../authentication/ (or equivalent). Once this PR releases:

  1. Cut a new adempiere-base version and publish.
  2. Open follow-up PRs in each consumer that:
    • Bump the dependency.
    • Delete the local patched copies of SessionManager.java and KeycloakSessionHandler.java.
  3. Sabana's BaseInit.loadInitValues already calls SessionManager.setDatabaseNameResolver(...) — no change needed there once the new lib is on the classpath.

Until step 2 lands in each consumer, the local patches will continue to shadow the JAR; consumers running mixed versions will see no behavioural change but also no benefit from the new lib.

Test

  • Boot a single backend connected to one DB, log in, hit endpoints → expected: behaviour unchanged (flag default OFF).
  • Set ECA52_JWT_BIND_TO_DATABASE=Y in AD_SysConfig, restart → existing JWTs rejected (signature mismatch), new logins issue tokens with the tenant_id claim, cross-DB JWT replay attempts rejected with @Invalid@ @AD_Session_ID@.
  • N parallel requests with the same fresh Keycloak sid (no AD_Session yet) → exactly one row in AD_Session with WebSession='keycloak:<sid>' (was 1–N before).
  • After explicit logout, replay the same JWT → rejected.

Additional context

ref https://github.com/solop-develop/adempiere-solop/issues/2389
ref https://github.com/solop-develop/adempiere-solop/issues/1356

When two JVMs share the same database (e.g. `adempiere-grpc-server` + `adempiere-report-engine-service`), their in-memory caches diverge after writes performed by one of them: the second JVM keeps serving the previous state until TTL. This is most visible after `change-role` — reports run with the wrong tenant/org filter and return empty.

## Changes

- `MSession.afterSave(...)`: new override that refreshes `s_sessions` with the current instance on every successful save. Keeps the cache aligned with the row's actual state (including `Processed`) without an extra DB hit on read. Processed sessions stay visible — callers that need to reject them (e.g. `KeycloakSessionHandler.loadExistingSessionContext`) check `isProcessed()` themselves; callers that need them (audit log, changelog, keepalive on closing sessions) still get the row.
- `MSession.logout()`: dropped the redundant explicit `s_sessions.remove(...)` — `afterSave` already refreshes the entry after `setProcessed(true)` + `saveEx()`.
- `MRole.removeFromCache(int roleId, int userId)`: new static helper to evict a `(role, user)` cache entry so the next `get()` rebuilds the access SQL. Required when the user's effective tenant/org changes.

## Test

Login as System → change role to a company → run a report whose query goes through `MRole.addAccessSQL`. SQL filter must contain the company's `AD_Client_ID` / `AD_Org_ID`, not `0, 0`. Full cross-JVM coverage still requires a distributed cache (out of scope here).
Closes session-propagation gaps in `SessionManager` (JWT path) and `KeycloakSessionHandler` (Keycloak path) that allowed JWTs from one database to validate against another, stale sessions to be reused after logout, and stale roles to be applied to fresh requests.

## `SessionManager` (JWT path)

- Bind the HMAC signing key to the current DB via `HmacSHA256(configuredSecret, dbName)`. Two installations sharing the same `ECA52_JWT_SECRET_KEY` no longer produce the same effective key.
- New `tenant_id` claim (SHA-256 of the DB fingerprint); mismatch is rejected before any DB lookup.
- After loading `AD_Session`, cross-validate `JWT.AD_User_ID == session.getCreatedBy()` and `JWT.AD_Client_ID == session.getAD_Client_ID()`; reject on mismatch.
- Reject JWTs whose `AD_Session` is already `Processed='Y'` (server-side logout completed).

## `KeycloakSessionHandler`

- Refuse to reuse an `AD_Session` whose `WebSession` does not start with `"keycloak:"` — protects against collisions with sessions created by other flows.
- Force-reload `MRole` (`removeFromCache` + `get(..., reload=true)`) when resolving an existing session, so the access SQL is regenerated with the current tenant context.

## Test

- JWT issued against DB A presented to DB B → rejected with `@Invalid@ @AD_Session_ID@`.
- Replay a JWT after server-side logout → rejected.
- Change role under Keycloak → subsequent requests use the new role.
@EdwinBetanc0urt EdwinBetanc0urt marked this pull request as draft May 15, 2026 22:07
…JWT reuse, race conditions and stale role/session caches

Brings into the library a bundle of hardening fixes that have been living as local overlays in `adempiere-grpc-server`, `adempiere-report-engine-service` and `sabana` for the past weeks. With this merge those consumers can drop their patched copies and inherit the same behaviour from the JAR.

The bundle closes four classes of problems reported under issue #2389 and its follow-ups.

## `SessionManager`

### JWT can no longer be reused across databases (opt-in)

- HMAC signing key derivation: when `ECA52_JWT_BIND_TO_DATABASE=Y` is set in `AD_SysConfig` (default `N`), the signing key becomes `HmacSHA256(configuredSecret, dbName)` instead of the configured secret directly. Two installations sharing `ECA52_JWT_SECRET_KEY` no longer produce the same effective key.
- New opaque `tenant_id` claim (`SHA-256(dbName)` in hex) emitted alongside the existing claims, also gated on the flag. `getSessionFromToken` rejects tokens whose `tenant_id` does not match the current installation **before** any DB access.
- The flag is **opt-in / default OFF** so deployments with several services on different release cadences can adopt the new code without breaking JWT interop. Operators flip the flag in `AD_SysConfig` once every connected service is on the new build; existing JWTs in circulation get rejected after the flip, users re-log once.

### Multi-tenant integrations can plug their own DB-name resolver

- New public `setDatabaseNameResolver(Supplier<String>)`. Single-tenant services keep the default (`CConnection.get().getDbName()`); multi-tenant services that route per request (e.g. Sabana) register a per-thread resolver that returns the current tenant's catalog. Avoids the JVM-wide singleton drifting under concurrent requests for different tenants.

### Cross-validation of JWT claims against the AD_Session row

- After loading the `AD_Session` from DB, `getSessionFromToken` checks:
    - `JWT.AD_User_ID == session.getCreatedBy()`,
    - `JWT.AD_Client_ID == session.getAD_Client_ID()`,
    - `session.isProcessed() == false`.
- Mismatch or processed session → `@Invalid@ @AD_Session_ID@`. Blocks JWT replay after server-side logout and AD_Session_ID collisions across installations.
- These checks are **always on** — no flag needed; they read claims that have always been in the JWT, so they interoperate with both old and new tokens.

### New helper: `createTokenFromSession(MSession, boolean)`

- Lets callers build the `MSession` externally (e.g. change-role / change-organization flows that need the instance before issuing the token) and mint the JWT for it without re-creating the session. `createSessionAndGetToken` now delegates to it — DRY.

## `KeycloakSessionHandler`

### Refuse rows whose `WebSession` doesn't belong to the Keycloak flow

- `loadExistingSessionContext` rejects an `AD_Session` whose `WebSession` does not start with `"keycloak:"`. Protects against `AD_Session_ID` collisions with rows created by other flows.

### Force-reload of `MRole` when resolving a Keycloak session

- After setting `#AD_Role_ID` / `#AD_Client_ID` / `#AD_Org_ID` on the context, evict the cached `MRole` and reload with `reload=true`. Rebuilds access SQL with the current tenant context — protects against stale role instances cached under a previous `#AD_Client_ID`.

### Per-sid lock prevents duplicate `AD_Session` rows under concurrency

- New `ConcurrentMap<String, Object> sidLocks`. The slow path (DB-fallback + create) runs inside `synchronized (sidLocks.computeIfAbsent(sid, …))` with **double-checked cache and DB lookup** inside the lock. The winner creates; every loser observes the winner's session. The fast path (cache hit) stays outside the lock.

## Downstream coordination

Each consumer (`adempiere-grpc-server`, `adempiere-report-engine-service`, `sabana`) currently ships a byte-identical copy of these two files in `src/patches/.../authentication/` (or equivalent). Once this PR releases:

1. Cut a new `adempiere-base` version and publish.
2. Open follow-up PRs in each consumer that:
    - Bump the dependency.
    - **Delete** the local patched copies of `SessionManager.java` and `KeycloakSessionHandler.java`.
3. Sabana's `BaseInit.loadInitValues` already calls `SessionManager.setDatabaseNameResolver(...)` — no change needed there once the new lib is on the classpath.

Until step 2 lands in each consumer, the local patches will continue to shadow the JAR; consumers running mixed versions will see no behavioural change but also no benefit from the new lib.

## Test

- Boot a single backend connected to one DB, log in, hit endpoints → expected: behaviour unchanged (flag default OFF).
- Set `ECA52_JWT_BIND_TO_DATABASE=Y` in `AD_SysConfig`, restart → existing JWTs rejected (signature mismatch), new logins issue tokens with the `tenant_id` claim, cross-DB JWT replay attempts rejected with `@Invalid@ @AD_Session_ID@`.
- N parallel requests with the same fresh Keycloak `sid` (no `AD_Session` yet) → exactly **one** row in `AD_Session` with `WebSession='keycloak:<sid>'` (was 1–N before).
- After explicit logout, replay the same JWT → rejected.

### Additional context
ref solop-develop/adempiere-solop#2389
ref solop-develop/adempiere-solop#1356
@EdwinBetanc0urt EdwinBetanc0urt changed the title fix: Harden Keycloak/JWT session resolution fix: Harden SessionManager + KeycloakSessionHandler against cross-DB JWT reuse, race conditions and stale role/session caches May 21, 2026
@EdwinBetanc0urt EdwinBetanc0urt marked this pull request as ready for review June 2, 2026 18:38
@yamelsenih yamelsenih merged commit 1e237f7 into solop-develop:develop Jun 2, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants