
Add citus_block_writes_for_backup UDFs for LTR backup #8543

Draft
codeforall wants to merge 1 commit into main from muusama/ltr_block

Conversation

@codeforall
Contributor

Citus uses Two-Phase Commit (2PC) for distributed write transactions. When taking cluster-wide backups, independent per-node snapshots can capture mid-flight 2PC transactions in inconsistent states — e.g. a coordinator with the commit record but a worker missing the prepared transaction — leading to irrecoverable data inconsistency on restore.

To solve this, we temporarily block distributed 2PC commit decisions and schema/topology changes across the entire cluster while disk snapshots are taken. Acquiring ExclusiveLock on pg_dist_transaction guarantees that all in-flight 2PC transactions have fully completed (including COMMIT PREPARED on every worker) before the lock is granted, and no new commit decisions can begin while it is held. This creates a clean partition: everything before the lock is fully committed on all nodes; everything after hasn't started. Read queries and single-shard writes are never blocked.

New UDFs:

citus_block_writes_for_backup(timeout_ms int DEFAULT 300000)
  Spawns a dedicated background worker that acquires ExclusiveLock on
  pg_dist_transaction, pg_dist_partition, and pg_dist_node on the
  coordinator, and sends LOCK TABLE ... IN EXCLUSIVE MODE on
  pg_dist_transaction and pg_dist_partition to all metadata workers.
  Returns true when all locks are held cluster-wide.

citus_unblock_writes_for_backup()
  Signals the background worker to release all locks and exit. Can be
  called from any session. Returns true on success.

citus_backup_block_status()
  Returns a single row with the current block state, worker PID,
  requestor PID, block start time, timeout, and locked node count.

Architecture:

A dedicated background worker holds a single transaction whose
lifetime equals the lock-holding period. This ensures locks survive
the caller's session disconnect. All termination paths (explicit
release, timeout expiry, SIGTERM, error, postmaster death) end the
transaction, guaranteeing locks cannot leak.

The worker uses PostgreSQL's WaitEventSet API to simultaneously
monitor its latch (for release/SIGTERM), the postmaster, and every
remote connection socket. If any worker-node connection drops, the
failure is detected immediately via PQconsumeInput/PQstatus and the
block is released with an error.

Shared memory (BackupBlockControlData, protected by an LWLock)
coordinates state between the block/unblock/status UDFs and the
background worker. Stale state from a crashed worker is auto-cleaned
by the status UDF via kill(pid, 0) liveness checks.

Remote lock acquisition is bounded by SET LOCAL statement_timeout on
each worker connection, preventing indefinite hangs if a remote node
is unresponsive during the LOCK TABLE phase.

The BLOCK_DISTRIBUTED_WRITES_COMMAND macro (shared with
citus_create_restore_point) is defined in backup_block.h to avoid
duplication.

@codecov

codecov bot commented Apr 15, 2026

Codecov Report

❌ Patch coverage is 9.06433% with 311 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.38%. Comparing base (42edfe8) to head (d685dea).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8543      +/-   ##
==========================================
- Coverage   88.88%   88.38%   -0.50%     
==========================================
  Files         286      287       +1     
  Lines       63763    64115     +352     
  Branches     8017     8056      +39     
==========================================
- Hits        56678    56671       -7     
- Misses       4776     5100     +324     
- Partials     2309     2344      +35     