
PostgreSQL PANIC due to SMgrRelation hashtable corruption under pgactive active-active replication #310

@umutoguz

Description


When running pgactive in an active-active topology, PostgreSQL can crash with a PANIC during TRUNCATE TABLE or other DDL-like operations.
The crash follows an internal error about corrupted relation state. The server log reports:
ERROR: SMgrRelation hashtable corrupted
PANIC: cannot abort transaction <xid>, it was already committed
This issue has been observed on PostgreSQL 17 and PostgreSQL 18, and only when pgactive is enabled. The same operations do not cause crashes on the same PostgreSQL versions when pgactive is not in use.

Steps to reproduce

1- Set up a multi-node PostgreSQL cluster (4 nodes in our case) using pgactive active-active replication.

2- Ensure pgactive background workers and logical decoding slots are running normally.

3- On one node, execute a TRUNCATE TABLE statement (or other DDL in PostgreSQL 17) on a replicated table, for example:

TRUNCATE TABLE inventory.test_table;

4- Observe that the PostgreSQL backend immediately crashes and triggers a full server restart.

Note: This reproduces even in a low-load test environment with no external concurrency.
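
A minimal sketch of how step 3 could be scripted so that each run is classified automatically. It assumes `psql` is on the PATH and that the replicated table already exists. The function names and the connection string are hypothetical. The exit-status mapping relies on psql's documented behaviour (0 for success, 2 when the connection to the server went bad in a non-interactive session, 1 or 3 for other errors). A lost connection is only consistent with a backend crash, so it should be confirmed against the server log.

```python
import subprocess


def truncate_command(dsn, table="inventory.test_table"):
    """Build the psql invocation for step 3.

    dsn is a placeholder connection string; the table name mirrors the
    example in this report.
    """
    return ["psql", "-X", "-v", "ON_ERROR_STOP=1", "-d", dsn,
            "-c", f"TRUNCATE TABLE {table};"]


def classify_exit(code):
    """Map psql's exit status onto the outcomes discussed in this report."""
    if code == 0:
        return "completed"
    if code == 2:
        # Connection went bad mid-session: consistent with a backend crash,
        # but not proof of one. Check the server log.
        return "connection lost"
    # 1 or 3: an SQL error or a client-side failure.
    return "regular error"


def run_step3(dsn):
    """Run the TRUNCATE once and return (outcome, stderr text)."""
    result = subprocess.run(truncate_command(dsn),
                            capture_output=True, text=True)
    return classify_exit(result.returncode), result.stderr.strip()
```

Calling `run_step3("host=node1 dbname=app")` (a made-up connection string) on an affected node would be expected to return "connection lost", whereas either "completed" or "regular error" would match the expected outcome.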

Expected outcome

  • The TRUNCATE TABLE (or DDL) operation should either:
    • Complete successfully, or
    • Fail with a regular SQL error.
  • PostgreSQL should never reach a PANIC state or crash due to such operations.
  • pgactive should not corrupt internal PostgreSQL relation or transaction state.

Actual outcome

  • PostgreSQL reports:

ERROR: SMgrRelation hashtable corrupted
WARNING: AbortTransaction while in COMMIT state
PANIC: cannot abort transaction , it was already committed

  • The backend process is terminated with SIGABRT.
  • PostgreSQL forcibly terminates all remaining backends.
  • Crash recovery is initiated and the database restarts.
  • pgactive supervisor and workers are restarted after recovery.

This results in temporary database unavailability and poses a serious risk in active-active setups.

Analysis

Several observations suggest this is a pgactive-related bug, not a PostgreSQL core issue:

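  • The error reports corruption of the SMgrRelation hash table, the storage manager's per-backend cache of open relations. In our understanding this is more typically a sign of extension-level misuse or a race condition than of a core defect.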
  • The crash occurs only when pgactive background workers are active and logical decoding is running.
  • The issue reproduces across:
    • PostgreSQL 17 (with DDL present)
    • PostgreSQL 18 (with TRUNCATE TABLE only, even when DDL replication is disabled)
  • pgactive.skip_ddl_replication = on does not prevent the crash.

This points to a potential race condition between pgactive background workers and DDL/TRUNCATE execution. One possibility is that TRUNCATE assigns the table a new relfilenode while a stale reference to the old relation is still held, so cleanup during commit fails and the backend then tries to abort a transaction that has already committed.

Logs

ERROR: SMgrRelation hashtable corrupted
STATEMENT: truncate table inventory.test_table;
WARNING: AbortTransaction while in COMMIT state
PANIC: cannot abort transaction 109231405, it was already committed
LOG: client backend was terminated by signal 6: Aborted
LOG: terminating any other active server processes
LOG: database system was not properly shut down; automatic recovery in progress
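
For checking other nodes or older log files for the same failure, the sequence above can be matched mechanically. This is a rough sketch, not part of pgactive or PostgreSQL: the function name and the returned dictionary are hypothetical, and it assumes the three messages appear in order in a single log stream (lines interleaved from other backends are ignored rather than attributed to a process).

```python
import re

PANIC_RE = re.compile(
    r"PANIC:\s+cannot abort transaction (\d+), it was already committed")

# Excerpt copied from the log section of this report.
SAMPLE_LOG = """\
ERROR: SMgrRelation hashtable corrupted
STATEMENT: truncate table inventory.test_table;
WARNING: AbortTransaction while in COMMIT state
PANIC: cannot abort transaction 109231405, it was already committed
LOG: client backend was terminated by signal 6: Aborted
"""


def find_crash_signature(log_text):
    """Return details of the first ERROR -> WARNING -> PANIC sequence, or None."""
    saw_error = saw_warning = False
    statement = None
    for line in log_text.splitlines():
        if "ERROR:" in line and "SMgrRelation hashtable corrupted" in line:
            # Start (or restart) tracking at the hashtable error.
            saw_error, saw_warning, statement = True, False, None
        elif saw_error and statement is None and "STATEMENT:" in line:
            statement = line.split("STATEMENT:", 1)[1].strip()
        elif saw_error and "AbortTransaction while in COMMIT state" in line:
            saw_warning = True
        elif saw_error and saw_warning:
            match = PANIC_RE.search(line)
            if match:
                return {"xid": int(match.group(1)), "statement": statement}
    return None


print(find_crash_signature(SAMPLE_LOG))
```

Requiring all three messages in order keeps unrelated PANICs from being counted as this bug.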

Labels: bug
