Make password history replication-aware via a custom WAL-rmgr #68

Merged
darold merged 1 commit into HexaCluster:master from Giperboloid:replication_aware_pgph
May 20, 2026
Conversation

@Giperboloid

Contributor

Problem --
The password history file is credcheck's persistent record of previously used password hashes. Before this patch, every write to it was a direct local file operation with no WAL coverage. Standbys never received those changes — meaning that in any streaming-replication setup, the password history was silently inconsistent between the primary and its replicas. After a failover, the promoted standby would enforce a completely different (or empty) password history, undermining the entire point of the password_reuse_history / password_reuse_interval policy.

Approach --
PostgreSQL 15 introduced the Custom WAL Resource Manager API (RegisterCustomRmgr), which allows extensions to plug into the WAL machinery with their own record types and redo routines. This patch uses that API to emit a WAL record for every mutation of the in-memory password history hash, so that standbys can replay those mutations and stay in sync.

The feature is compiled only on PG ≥ 15 and is entirely gated by #if PG_VERSION_NUM >= 150000. On older versions the code builds and behaves exactly as before.
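
As a minimal sketch of the gating pattern described above (not the patch's actual code), the stand-alone fragment below shows how one source file can build both ways. `PG_VERSION_NUM` normally comes from PostgreSQL's `pg_config.h`. The fallback definition and the helper name `credcheck_has_wal_rmgr` are assumptions, added only so the fragment compiles outside a PostgreSQL build tree.

```c
#include <stdbool.h>

/* Normally provided by pg_config.h. Defined here only so the sketch
 * compiles on its own (assumption, not part of the patch). */
#ifndef PG_VERSION_NUM
#define PG_VERSION_NUM 160000
#endif

/* Hypothetical helper: reports whether the WAL-aware paths were compiled in. */
bool
credcheck_has_wal_rmgr(void)
{
#if PG_VERSION_NUM >= 150000
    return true;    /* custom rmgr code is built */
#else
    return false;   /* pre-15: history stays a purely local file */
#endif
}
```

Because the decision is made by the preprocessor, builds against PostgreSQL 14 and older contain none of the new WAL code.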

Implementation details --
A new resource manager credcheck_rmgr is registered under RM_EXPERIMENTAL_ID — a stable ID will be requested from the PostgreSQL project when the patch matures. It exposes the three required callbacks: credcheck_rmgr_redo for replaying records during recovery, credcheck_rmgr_desc for producing human-readable descriptions for pg_waldump, and credcheck_rmgr_identify for mapping opcodes to symbolic names.
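
To illustrate why a shared experimental ID is only a stopgap, here is a toy registry written without PostgreSQL headers. The three ID constants mirror `access/rmgr.h` as of PostgreSQL 15. `ToyRmgr` and `toy_register_custom_rmgr` are hypothetical simplifications: the real `RmgrData` has more members and different callback signatures, and the real `RegisterCustomRmgr()` raises an error instead of returning `false`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror PostgreSQL's access/rmgr.h (PG 15). */
#define RM_MIN_CUSTOM_ID    128
#define RM_MAX_CUSTOM_ID    UINT8_MAX
#define RM_EXPERIMENTAL_ID  128

/* Simplified stand-in for RmgrData: only the three callbacks named above. */
typedef struct ToyRmgr
{
    const char *rm_name;
    void        (*rm_redo)(const void *record);
    void        (*rm_desc)(char *buf, size_t buflen, const void *record);
    const char *(*rm_identify)(uint8_t info);
} ToyRmgr;

/* One slot per possible resource manager ID. */
static const ToyRmgr *toy_rmgr_table[RM_MAX_CUSTOM_ID + 1];

/* Toy analogue of RegisterCustomRmgr(): rejects IDs outside the custom
 * range, unnamed managers, and IDs that are already taken. */
bool
toy_register_custom_rmgr(int rmid, const ToyRmgr *rmgr)
{
    if (rmid < RM_MIN_CUSTOM_ID || rmid > RM_MAX_CUSTOM_ID)
        return false;
    if (rmgr == NULL || rmgr->rm_name == NULL)
        return false;
    if (toy_rmgr_table[rmid] != NULL)
        return false;   /* ID collision with another extension */

    toy_rmgr_table[rmid] = rmgr;
    return true;
}
```

In this model, two extensions that both pick `RM_EXPERIMENTAL_ID` cannot be loaded together, which is the practical reason for reserving a dedicated ID later.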

Six WAL record types cover every operation that mutates the history. XLOG_CREDCHECK_PWD_ADD is emitted when a new password hash is appended on CREATE/ALTER ROLE ... PASSWORD. XLOG_CREDCHECK_PWD_REMOVE fires when the oldest entry is evicted to stay within the history limit. XLOG_CREDCHECK_PWD_REMOVE_USER covers DROP ROLE, removing all entries for a given role at once. XLOG_CREDCHECK_PWD_RENAME handles ALTER ROLE ... RENAME TO. XLOG_CREDCHECK_PWD_RESET is written when history is truncated via pg_password_history_reset(), optionally scoped to one user. Finally, XLOG_CREDCHECK_PWD_TIMESTAMP is emitted when a password date is updated during password_reuse_interval enforcement.
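
The opcode-to-name mapping performed by an identify callback can be sketched as follows. `XLR_INFO_MASK` is PostgreSQL's real mask for the low four bits of the info byte, which are reserved for core use. The numeric opcode values, the returned strings, and the function name `credcheck_identify_sketch` are illustrative assumptions and may differ from what the patch defines.

```c
#include <stddef.h>
#include <stdint.h>

/* Real PostgreSQL constant: the low 4 bits of a record's info byte are
 * reserved for core use, so rmgr opcodes live in the high 4 bits. */
#define XLR_INFO_MASK 0x0F

/* Opcode values are assumptions for illustration, not the patch's. */
#define XLOG_CREDCHECK_PWD_ADD          0x00
#define XLOG_CREDCHECK_PWD_REMOVE       0x10
#define XLOG_CREDCHECK_PWD_REMOVE_USER  0x20
#define XLOG_CREDCHECK_PWD_RENAME       0x30
#define XLOG_CREDCHECK_PWD_RESET        0x40
#define XLOG_CREDCHECK_PWD_TIMESTAMP    0x50

/* Maps an info byte to a symbolic name; NULL means "unknown opcode". */
const char *
credcheck_identify_sketch(uint8_t info)
{
    switch (info & ~XLR_INFO_MASK)
    {
        case XLOG_CREDCHECK_PWD_ADD:
            return "PWD_ADD";
        case XLOG_CREDCHECK_PWD_REMOVE:
            return "PWD_REMOVE";
        case XLOG_CREDCHECK_PWD_REMOVE_USER:
            return "PWD_REMOVE_USER";
        case XLOG_CREDCHECK_PWD_RENAME:
            return "PWD_RENAME";
        case XLOG_CREDCHECK_PWD_RESET:
            return "PWD_RESET";
        case XLOG_CREDCHECK_PWD_TIMESTAMP:
            return "PWD_TIMESTAMP";
        default:
            return NULL;
    }
}
```

Masking before the switch means the name stays the same whatever core flags happen to be set in the low bits.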

RESET and TIMESTAMP records are immediately followed by XLogFlush(), which forces them to the primary's disk so that they can be streamed to the standby right away instead of at the next regular WAL flush. Note that XLogFlush() itself does not wait for the standby to receive them. The remaining record types rely on normal WAL flushing behavior, which is sufficient for history consistency.

credcheck_rmgr_redo() mirrors the primary's in-memory hash operations: it inserts, removes, renames, or resets entries in pgph_hash and then calls the existing pgph_write() helper to persist the result to the history file in the standby's data directory.
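
The replay logic described above can be modeled with a toy history store. This is a sketch under stated assumptions, not the patch's redo routine. All `Toy*` types and functions are hypothetical, a fixed array stands in for pgph_hash, and a counter stands in for the call to pgph_write().

```c
#include <stdint.h>
#include <string.h>

#define TOY_MAX_ENTRIES 64
#define TOY_NAME_LEN    64

typedef enum ToyOp
{
    TOY_ADD, TOY_REMOVE, TOY_REMOVE_USER, TOY_RENAME, TOY_RESET, TOY_TIMESTAMP
} ToyOp;

typedef struct ToyEntry
{
    char    rolename[TOY_NAME_LEN];
    char    hash[TOY_NAME_LEN];
    int64_t ts;
} ToyEntry;

typedef struct ToyHistory
{
    ToyEntry entries[TOY_MAX_ENTRIES];
    int      count;
    int      persisted;     /* times the "file" was rewritten */
} ToyHistory;

typedef struct ToyRecord
{
    ToyOp   op;
    char    rolename[TOY_NAME_LEN]; /* empty on RESET means all roles */
    char    other[TOY_NAME_LEN];    /* hash, or new name for RENAME */
    int64_t ts;
} ToyRecord;

/* Drop entries matching rolename (empty = any) and hash (NULL = any). */
static void
toy_delete(ToyHistory *h, const char *rolename, const char *hash)
{
    int kept = 0;

    for (int i = 0; i < h->count; i++)
    {
        int match = (rolename[0] == '\0' ||
                     strcmp(h->entries[i].rolename, rolename) == 0) &&
                    (hash == NULL || strcmp(h->entries[i].hash, hash) == 0);

        if (!match)
            h->entries[kept++] = h->entries[i];
    }
    h->count = kept;
}

/* Apply one replayed record, then "persist" the result. */
void
toy_redo(ToyHistory *h, const ToyRecord *r)
{
    switch (r->op)
    {
        case TOY_ADD:
            if (h->count < TOY_MAX_ENTRIES)
            {
                ToyEntry *e = &h->entries[h->count++];

                memset(e, 0, sizeof(*e));
                strncpy(e->rolename, r->rolename, TOY_NAME_LEN - 1);
                strncpy(e->hash, r->other, TOY_NAME_LEN - 1);
                e->ts = r->ts;
            }
            break;
        case TOY_REMOVE:
            toy_delete(h, r->rolename, r->other);
            break;
        case TOY_REMOVE_USER:
        case TOY_RESET:
            toy_delete(h, r->rolename, NULL);
            break;
        case TOY_RENAME:
        case TOY_TIMESTAMP:
            for (int i = 0; i < h->count; i++)
            {
                ToyEntry *e = &h->entries[i];

                if (strcmp(e->rolename, r->rolename) != 0)
                    continue;
                if (r->op == TOY_TIMESTAMP)
                {
                    if (strcmp(e->hash, r->other) == 0)
                        e->ts = r->ts;
                    continue;
                }
                memset(e->rolename, 0, TOY_NAME_LEN);
                strncpy(e->rolename, r->other, TOY_NAME_LEN - 1);
            }
            break;
    }
    h->persisted++;     /* stand-in for pgph_write() */
}
```

Feeding the same sequence of records to two `ToyHistory` instances leaves them identical, which is the property the standby relies on.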

One operational requirement: credcheck must be listed in shared_preload_libraries on the standby so that RegisterCustomRmgr() is called before recovery begins reading WAL records with RM_CREDCHECK_ID. This is the standard requirement for any extension using Custom WAL RMgrs.

This patch introduces replication support only. The underlying extension logic still does not provide transactional behavior for password history writes.

Testing --
A new TAP test, 001_history_replication.pl, exercises the full end-to-end replication path. It sets up a primary and a streaming standby with credcheck preloaded on both, then verifies that credcheck appears in pg_get_wal_resource_managers() on both nodes. It creates roles, generates several password changes, and confirms that the standby's pg_password_history view matches the primary after wait_for_catchup. The rename, drop, and explicit reset operations are each tested individually. The standby is then restarted to verify that the history file written during redo is correctly reloaded from disk. Finally, the standby is promoted, and the test confirms that the replicated history survives promotion: a reused password is rejected and a fresh one is accepted.

@Giperboloid

Contributor Author

Hi!

I'd be glad to get any thoughts about the patch.

Regards

@Giperboloid Giperboloid force-pushed the replication_aware_pgph branch from 3d92d0f to 3f03c31 Compare May 19, 2026 16:38
Prior to this patch, the password history file was a purely local,
in-memory/on-disk structure with no WAL coverage.  On a standby the
file was never updated, so any replication-capable setup had an
inconsistent password history between primary and replicas.

This patch introduces a custom WAL resource manager for credcheck
(registered under RM_EXPERIMENTAL_ID until a stable ID is reserved on
the PostgreSQL wiki's Custom WAL Resource Managers page), available on
PostgreSQL 15 and later, where the Custom WAL Resource Manager API was
introduced.

Six WAL record types are defined:

  XLOG_CREDCHECK_PWD_ADD         – a password entry was appended
  XLOG_CREDCHECK_PWD_REMOVE      – a single entry was removed
  XLOG_CREDCHECK_PWD_REMOVE_USER – all entries for a role were removed
  XLOG_CREDCHECK_PWD_RENAME      – a role was renamed
  XLOG_CREDCHECK_PWD_RESET       – history was truncated (optionally per-user)
  XLOG_CREDCHECK_PWD_TIMESTAMP   – a password date was updated

Every mutation of the password history now emits a corresponding WAL
record.  RESET and TIMESTAMP records are immediately flushed with
XLogFlush(), so they reach the primary's disk and can be streamed to
the standby right away; XLogFlush() itself does not wait for the
standby to receive them.  The redo routine
credcheck_rmgr_redo() replays each record type by applying the same
operation to the standby's local password history file.

On PostgreSQL versions prior to 15 the code compiles and behaves
exactly as before; the new paths are guarded by
PG_VERSION_NUM >= 150000 preprocessor checks.

A TAP regression test (t/001_history_replication.pl) is added to
verify end-to-end replication of all password history operations across
a primary/standby pair.
@Giperboloid Giperboloid force-pushed the replication_aware_pgph branch from 3f03c31 to 052dd5e Compare May 19, 2026 16:40
@Giperboloid

Contributor Author

Hi!
I've updated my branch with two minor fixes for issues highlighted by CI:

  1. The const qualifier has been removed from the credcheck_rmgr struct definition.
  2. The TAP test directory t/ has been moved to the project root, as required by PGXS.

CI should not report any problems now.

@darold

darold commented May 20, 2026

Contributor

Hi Aidar,

Yes, I had already found that the t/ directory must be at the top level of the project, and I have reviewed the patch. I also tested it on PG versions 16 to 18. It is excellent work, thank you very much! Your help is much appreciated. I'm applying the PR. The installcheck works well on my computer too. I think the CI file might need to be fixed for the TAP test, but your patch works as expected. I will have a look today.

Thanks again!

@darold darold merged commit 675a944 into HexaCluster:master May 20, 2026
0 of 6 checks passed