Skip to content

fix bug #8470 citus_activate_node with secondary nodes is not activate#8535

Open
akalend wants to merge 8 commits intocitusdata:mainfrom
akalend:akalend/activate_secondary_node
Open

fix bug #8470 citus_activate_node with secondary nodes is not activate#8535
akalend wants to merge 8 commits intocitusdata:mainfrom
akalend:akalend/activate_secondary_node

Conversation

@akalend
Copy link
Copy Markdown

@akalend akalend commented Apr 5, 2026

DESCRIPTION: bug #8470 The citus_activate_node() does not activate secondary nodes. This action is necessary when failover is performed from the secondary node to the primary. You have to do this manually with the UPDATE pg_dist_node command. This patch adds the citus_activate_node command to secondary nodes.

@microsoft-github-policy-service agree

@codeforall
Copy link
Copy Markdown
Contributor

Thanks for the fix. The bug analysis is correct. I just digged into the original cause of this issue and it seems like a regression from #6728 . Before that refactor, the old ActivateNodeList had explicit NodeIsPrimary guards that skipped metadata sync for secondaries while still setting isactive=true via SetWorkerColumnLocalOnly, and then SetNodeState → SetWorkerColumn propagated the change to other metadata-holding primaries. The refactor lost those guards, so ActivateNodeList now tries to sync metadata/objects to the secondary (a read-only replica), which fails.

The fix direction is right (early return for secondaries), but the new ActivateNode() function reimplements what SetWorkerColumn already does, and IMHO the entire fix can be simplified to something like

if (NodeIsSecondary(workerNode))
{
    EnsureTransactionalMetadataSyncMode();
    SetWorkerColumn(workerNode, Anum_pg_dist_node_isactive, BoolGetDatum(true));
    TransactionModifiedNodeMetadata = true;
    PG_RETURN_INT32(workerNode->nodeId);
}

This would eliminate the new function, and quite a few lines of duplicated code.

# test that no tests leaked intermediate results. This should always be last
test: ensure_no_intermediate_data_leak
test: check_mx
test: check_activate_secondary_node
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new test is added after ensure_no_intermediate_data_leak, which has the comment "This should always be last."
Please move it before that line (e.g., before ensure_no_intermediate_data_leak, after multi_add_node_from_backup_sync_replica).


-- error this operation cannot be completed in nontransactional metadata sync mode
-- if the GUC citus.metadata_sync_mode set to 'nontransactional'
SELECT 1 FROM citus_activate_node('localhost', :follower_worker_2_port);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test adds a secondary node and activates it but doesn't clean up. Consider adding cleanup at the end to avoid polluting state for any future tests that might be added after this one in the schedule:
May be:
-- cleanup
SELECT citus_remove_node('localhost', :follower_worker_2_port);

@akalend
Copy link
Copy Markdown
Author

akalend commented Apr 15, 2026

@codeforall , Thanks for review. Your patch is more usable.
All code fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants