fix bug #8470 citus_activate_node with secondary nodes is not activate#8535
fix bug #8470 citus_activate_node with secondary nodes is not activate#8535akalend wants to merge 8 commits intocitusdata:mainfrom
Conversation
|
Thanks for the fix. The bug analysis is correct. I just digged into the original cause of this issue and it seems like a regression from #6728 . Before that refactor, the old ActivateNodeList had explicit NodeIsPrimary guards that skipped metadata sync for secondaries while still setting isactive=true via SetWorkerColumnLocalOnly, and then SetNodeState → SetWorkerColumn propagated the change to other metadata-holding primaries. The refactor lost those guards, so ActivateNodeList now tries to sync metadata/objects to the secondary (a read-only replica), which fails. The fix direction is right (early return for secondaries), but the new ActivateNode() function reimplements what SetWorkerColumn already does, and IMHO the entire fix can be simplified to something like This would eliminate the new function, and quite a few lines of duplicated code. |
| # test that no tests leaked intermediate results. This should always be last | ||
| test: ensure_no_intermediate_data_leak | ||
| test: check_mx | ||
| test: check_activate_secondary_node |
There was a problem hiding this comment.
The new test is added after ensure_no_intermediate_data_leak, which has the comment "This should always be last."
Please move it before that line (e.g., before ensure_no_intermediate_data_leak, after multi_add_node_from_backup_sync_replica).
|
|
||
| -- error this operation cannot be completed in nontransactional metadata sync mode | ||
| -- if the GUC citus.metadata_sync_mode set to 'nontransactional' | ||
| SELECT 1 FROM citus_activate_node('localhost', :follower_worker_2_port); |
There was a problem hiding this comment.
The test adds a secondary node and activates it but doesn't clean up. Consider adding cleanup at the end to avoid polluting state for any future tests that might be added after this one in the schedule:
May be:
-- cleanup
SELECT citus_remove_node('localhost', :follower_worker_2_port);
|
@codeforall , Thanks for review. Your patch is more usable. |
DESCRIPTION: bug #8470 The citus_activate_node() does not activate secondary nodes. This action is necessary when failover is performed from the secondary node to the primary. You have to do this manually with the UPDATE pg_dist_node command. This patch adds the citus_activate_node command to secondary nodes.
@microsoft-github-policy-service agree