
pgconn: add optional SCRAM-SHA-256 SaltedPassword cache #2523

Open
mattrobenolt wants to merge 1 commit into jackc:master from mattrobenolt:master

Conversation

@mattrobenolt

I wasn't sure if this warranted an issue or discussion first, but I was writing this as an experiment before I considered upstreaming, so here we are.

I'm happy to talk through anything here as far as implementation details or how this can best be leveraged.

My intention is that this is disabled by default, with documentation on how to enable the performance boost.

Eventually I'd like this to graduate to default-on behavior, since there are nearly no downsides, only positives IMO.

One thing to note: there really isn't much precedent for this in the Postgres ecosystem, so I think it's worth setting one. This is a free win for client-side connection poolers specifically.

Thanks!

RFC 5802 §5.1 and RFC 7677 §4 both permit caching derived SCRAM
key material to avoid repeated PBKDF2 computation:

  RFC 5802 §5.1:
    "a client implementation MAY cache ClientKey&ServerKey
     (or just SaltedPassword) for later reauthentication to
     the same service, as it is likely that the server is
     going to advertise the same salt value upon
     reauthentication."

  RFC 7677 §4:
    "This computational cost can be avoided by caching the
     ClientKey (assuming the Salt and hash iteration-count
     is stable)."

Add a ScramDeriveCache interface on Config that, when set,
caches the 32-byte SaltedPassword keyed by an opaque SHA-256
fingerprint of (password, salt, iterations). On subsequent
connections with the same verifier, PBKDF2 is skipped entirely.

Cache entries are invalidated on authentication failure when
cached material was used, so a password change or salt rotation
cannot produce stale hits.

Includes SimpleScramDeriveCache, a mutex+map implementation
suitable for most use cases. The interface allows callers to
provide their own LRU or sharded implementations.
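A rough sketch of the interface and caching path described above (the method names and signatures here are my guesses for illustration, not necessarily the PR's exact API):

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sync"
)

// ScramDeriveCache caches the 32-byte SaltedPassword keyed by an
// opaque SHA-256 fingerprint of (password, salt, iterations).
type ScramDeriveCache interface {
	Get(fingerprint [32]byte) (saltedPassword []byte, ok bool)
	Put(fingerprint [32]byte, saltedPassword []byte)
	Invalidate(fingerprint [32]byte) // called on auth failure after a cached hit
}

// SimpleScramDeriveCache is a mutex+map implementation.
type SimpleScramDeriveCache struct {
	mu sync.Mutex
	m  map[[32]byte][]byte
}

func NewSimpleScramDeriveCache() *SimpleScramDeriveCache {
	return &SimpleScramDeriveCache{m: make(map[[32]byte][]byte)}
}

func (c *SimpleScramDeriveCache) Get(fp [32]byte) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	sp, ok := c.m[fp]
	return sp, ok
}

func (c *SimpleScramDeriveCache) Put(fp [32]byte, sp []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[fp] = sp
}

func (c *SimpleScramDeriveCache) Invalidate(fp [32]byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.m, fp)
}

// fingerprint hashes the 3-tuple so the cache key itself never
// contains the plaintext password or the raw salt.
func fingerprint(password string, salt []byte, iterations int) [32]byte {
	h := sha256.New()
	h.Write([]byte(password))
	h.Write(salt)
	var n [4]byte
	binary.BigEndian.PutUint32(n[:], uint32(iterations))
	h.Write(n[:])
	var fp [32]byte
	h.Sum(fp[:0])
	return fp
}

func main() {
	c := NewSimpleScramDeriveCache()
	fp := fingerprint("secret", []byte("server-salt"), 4096)
	if _, ok := c.Get(fp); !ok {
		// Cache miss: this is where PBKDF2 would run, then the
		// result is stored for subsequent connections.
		c.Put(fp, []byte("...32-byte SaltedPassword..."))
	}
	sp, ok := c.Get(fp)
	fmt.Println(ok, len(sp) > 0)
}
```

On a hit, the connection path skips PBKDF2 entirely; on an authentication failure that used cached material, `Invalidate` removes the stale entry so a password change or salt rotation self-heals on the next attempt.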
@jackc
Owner

jackc commented Mar 21, 2026

This is getting beyond my expertise. But from what I could gather from the code, this caches something to establish connections faster?

  • Have you quantified the improvement?
  • Is there precedent for this approach outside of PostgreSQL?
  • What are the security implications?

@mattrobenolt
Author

Yes, great questions!

Have you quantified the improvement?

Yes, I can share some benchmarks, but they are highly dependent on a lot of variables, including the number of iterations in the PBKDF2.

Is there precedent for this approach outside of PostgreSQL?

Honestly, I'm not sure if there are any other real uses outside of Postgres, but my focus has been on improving this for Postgres specifically, and the RFC explicitly permits it.

What are the security implications?

None, this is considered safe in the RFC.

I explained more of this in my commit message, not in the PR body, including citing the appropriate RFC sections.

But tl;dr, there is a plaintext password the user supplies in their config. This is already known by pgx and in memory.

The material we are caching is called the SaltedPassword.

To derive the salted password, you combine the plaintext password with the server salt + iteration count.

This derivation is intentionally computationally expensive, in the same sense that bcrypt or scrypt bake in iterations of a derivation function to mitigate brute forcing.

The salt and iteration count are public information. The analogy isn't exact, but it's akin to a public key exchange: they are constants and do not change for the same server.

The expensive part is calculating the SaltedPassword from the plaintext password plus the salt and iteration count. Thus the 3-tuple of (password, salt, iterations) is the unique input.

From the SaltedPassword, two pieces of cryptographic material are then derived, the ClientKey and ServerKey, but those are fast/cheap to calculate and are simply derived from the SaltedPassword with constant inputs.

So in this context, I chose to cache the intermediate SaltedPassword instead of the ClientKey/ServerKey.

@mattrobenolt
Author

Anything I can do here to help push this along? Happy to work through all the CI stuff, but mostly want to get agreement on the implementation, and I want you to be comfortable knowing how this works.

@abrightwell
Contributor

Overall, it makes sense to me and seems like a valid approach. I especially like having an interface for it too, +1.

What are the security implications?

None, this is considered safe in the RFC.

The RFC explicitly states it's acceptable to cache, but it doesn't directly address security concerns. That said, as you point out, I don't think this is any different from storing the Password in the config. I do think it would be good to document its sensitive nature, though.

Have you quantified the improvement?

Yes, I can share some benchmarks, but they are highly dependent on a lot of variables, including the number of iterations in the PBKDF2.

Perhaps a simple comparison with iterations like: 4096 (obviously), 10k, 100k, 600k[1] would suffice? I think that would be the key variable given it's the motivation for having such a cache?

Is there precedent for this approach outside of PostgreSQL?

Honestly, I'm not sure if there are any other real uses outside of Postgres, but my focus has been on improving this for Postgres specifically, and the RFC explicitly permits it.

My understanding is that there are a few places that utilize such a cache. Based on a quick search, the most prevalent seems to be the MongoDB driver[2], but it is also supported by xdg-go/scram[3] and some others. Granted, it doesn't seem widely implemented, but it definitely has precedent. And I could imagine that at scale it could have a dramatic impact.

I am curious about the SimpleScramDeriveCache implementation. Specifically, is a map implementation potentially overkill? It seems that the majority use case would be a single cached value, right?

Looking at the Mongo implementation[4], which takes a single-slot approach, it perhaps seems reasonable to keep this one truly 'simple' and leave more complex implementations, via the interface, to those that need them?

[1] https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html#pbkdf2
[2] https://github.com/mongodb/mongo-java-driver/blob/8731b19d932c4ae4fc7c61996c3a8226f0d5caf5/driver-core/src/main/com/mongodb/internal/connection/ScramShaAuthenticator.java#L236-L244
[3] https://github.com/xdg-go/scram/blob/b6d6a0b27c123984bef7d14cdb7f487bdbdd68d2/client.go#L46
[4] https://github.com/mongodb/mongo-java-driver/blob/8731b19d932c4ae4fc7c61996c3a8226f0d5caf5/driver-core/src/main/com/mongodb/internal/connection/MongoCredentialWithCache.java#L87-L111

@jackc
Owner

jackc commented Mar 28, 2026

I've now had a chance to review this more carefully. I'm satisfied with the security implications (or lack thereof) and am in favor of the feature.

However, I have a few thoughts on the implementation.

  1. Since there is apparently no negative security implication I don't see why this would ever be disabled. Or configured for that matter. Could this be entirely internal?
  2. The cache can grow without bound. I think there should be some sort of bound and eviction policy on the cache, especially if this feature is always on as suggested above.
  3. Why is iterations changed from int to uint64? It needs to be converted back to int for the call to pbkdf2.Key? This seems like it could cause unexpected behavior with a value that caused an overflow.

@mattrobenolt
Author

Perhaps a simple comparison with iterations like: 4096 (obviously), 10k, 100k, 600k[1] would suffice? I think that would be the key variable given it's the motivation for having such a cache?

Yeah, I'll write some up.

I am curious about the SimpleScramDeriveCache implementation. Specifically, is a map implementation potentially overkill? It seems that the majority use case would be a single cached value, right?

Likely, but that's part of what I wanted to bring up for discussion, and why I left this as an interface; I wanted this to serve just as an example. I figured a more robust LRU would be ideal, but I didn't want to build that as part of adding the ability to cache SCRAM material.

Looking at the Mongo implementation[4], which takes a single-slot approach, it perhaps seems reasonable to keep this one truly 'simple' and leave more complex implementations, via the interface, to those that need them?

I think this is very fair. I do think multiple entries are pretty common within a single application, at least in Postgres. I have next to zero context on how this is used in the Mongo community, or whether that choice was based on any measurements. My bias would be towards even a small array that keeps a history of N entries, just so it's not unbounded in the pathological case.
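A bounded, fixed-size variant along those lines might look like the following (a hypothetical sketch with invented names, not code from the PR): a small slice of N entries that overwrites the oldest slot on overflow, sitting between the single-slot Mongo approach and an unbounded map.

```go
package main

import (
	"fmt"
	"sync"
)

type entry struct {
	key [32]byte // fingerprint of (password, salt, iterations)
	val []byte   // cached 32-byte SaltedPassword
}

// BoundedScramDeriveCache keeps at most N entries, evicting the
// oldest slot once full (hypothetical name and shape).
type BoundedScramDeriveCache struct {
	mu      sync.Mutex
	entries []entry
	next    int // next slot to overwrite once at capacity
}

func NewBoundedScramDeriveCache(n int) *BoundedScramDeriveCache {
	return &BoundedScramDeriveCache{entries: make([]entry, 0, n)}
}

func (c *BoundedScramDeriveCache) Get(key [32]byte) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, e := range c.entries {
		if e.key == key {
			return e.val, true
		}
	}
	return nil, false
}

func (c *BoundedScramDeriveCache) Put(key [32]byte, val []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for i := range c.entries {
		if c.entries[i].key == key {
			c.entries[i].val = val // refresh existing entry
			return
		}
	}
	if len(c.entries) < cap(c.entries) {
		c.entries = append(c.entries, entry{key, val})
		return
	}
	c.entries[c.next] = entry{key, val} // overwrite oldest slot
	c.next = (c.next + 1) % cap(c.entries)
}

func main() {
	c := NewBoundedScramDeriveCache(2)
	c.Put([32]byte{1}, []byte("a"))
	c.Put([32]byte{2}, []byte("b"))
	c.Put([32]byte{3}, []byte("c")) // third entry evicts key {1}
	_, ok := c.Get([32]byte{1})
	fmt.Println(ok) // false: oldest entry evicted
}
```

A linear scan over a handful of 32-byte keys is cheap, and the fixed capacity addresses the unbounded-growth concern raised below.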

Since there is apparently no negative security implication I don't see why this would ever be disabled. Or configured for that matter. Could this be entirely internal?

Yes. I didn't do that in my initial proposal since I wanted to show that it could be opt-in and configurable if there were any objections, but I'd be down to make it totally internal once we agree on the implementation.

The cache can grow without bound. I think there should be some sort of bound and eviction policy on the cache, especially if this feature is always on as suggested above.

Yes, see the replies above.

Why is iterations changed from int to uint64? It needs to be converted back to int for the call to pbkdf2.Key? This seems like it could cause unexpected behavior with a value that caused an overflow.

I personally prefer working with correctly sized types, and a uint better represents something that can't go negative. I'm fine going back to an int, but suggesting unexpected behavior from an overflow is a bit of a stretch. An iteration count large enough to overflow a platform-specific int size would be insane and just wouldn't work in practice anyway.
