Fix galera bootstrap deadlock when all pods are killed simultaneously #499
Open
lmiccini wants to merge 1 commit into
Three issues combined to delay galera cluster recovery by 12-54+ minutes when all pods were killed at once.

1. Status update race: injectGcommURI() stored gcomm state in Status.Attributes. The deferred PatchInstance (JSON merge patch) wrote this back, overwriting pod-pushed attributes (including ContainerIDs) with stale data from the start of the reconcile.
   Fix: move gcomm injection tracking to an in-memory map (gcommState) on the reconciler. The operator no longer modifies Status.Attributes, so pod-pushed data is preserved.

2. Unnecessary ContainerID check in findBestCandidate(): the function required all replicas to have attributes with ContainerIDs matching the currently running containers. But pods restart faster than the reconcile cycle, so CIDs never match. This check is unnecessary during bootstrap recovery because no pod has started mysqld (pods are blocked waiting for gcomm_uri), so the seqno on the persistent volume cannot change between container restarts.
   Fix: remove the CID comparison from findBestCandidate(). Only require that all replicas have pushed attributes with valid seqnos. Log CID mismatches for observability without blocking the decision.

3. Spurious joiner push: when all pods were killed, the StatefulSet's AvailableReplicas could remain > 0 briefly (stale status), causing Bootstrapped to be true. The operator pushed joiner gcomm URIs to pods, making them start mysqld against dead peers and wasting a restart cycle.
   Fix: skip joiner gcomm push when Bootstrapped is set but no pods are actually Ready.

Also add a periodic 10s requeue when not bootstrapped, ensuring the reconciler retries even when no external events trigger a reconcile.
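The fixes above can be illustrated with a small Go sketch. It is a simplified model under stated assumptions, not the operator's code: the PodAttrs type and its fields, treating a negative seqno as "not yet known", picking the highest seqno with ties broken by pod name, and the helper names shouldPushJoiners and requeueAfter are all hypothetical. It shows gcomm injections tracked in a mutex-guarded in-memory map instead of Status.Attributes, candidate selection that waits for a valid seqno from every replica but only logs ContainerID mismatches, and the guards for the stale-Bootstrapped case and the 10s requeue.

```go
package main

import (
	"fmt"
	"log"
	"sort"
	"sync"
	"time"
)

// PodAttrs is a hypothetical, simplified version of the attributes a pod
// pushes to the Galera status; the operator's real type differs.
type PodAttrs struct {
	Seqno       int    // assumption: a negative value means "not yet known"
	ContainerID string // CID of the container that pushed these attributes
}

// GaleraReconciler keeps gcomm injections in memory, so the deferred status
// patch can no longer write stale tracking data over pod-pushed attributes.
type GaleraReconciler struct {
	mu         sync.Mutex        // reconciles may run concurrently
	gcommState map[string]string // pod name -> injected gcomm URI (simplified key)
}

// injectGcommURI only records the injection; delivering the URI to the pod
// is out of scope for this sketch.
func (r *GaleraReconciler) injectGcommURI(pod, uri string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.gcommState == nil {
		r.gcommState = make(map[string]string)
	}
	r.gcommState[pod] = uri
}

// isBootstrapInProgress is simplified to "at least one pod got a gcomm URI".
func (r *GaleraReconciler) isBootstrapInProgress() bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.gcommState) > 0
}

// findBestCandidate waits until every replica has pushed a valid seqno, then
// picks the highest one (ties broken by pod name, an assumption). CID
// mismatches are logged but no longer block the decision.
func findBestCandidate(replicas []string, attrs map[string]PodAttrs, running map[string]string) (string, bool) {
	if len(replicas) == 0 {
		return "", false
	}
	for _, pod := range replicas {
		a, ok := attrs[pod]
		if !ok || a.Seqno < 0 {
			return "", false // keep waiting for this replica
		}
		if a.ContainerID != running[pod] {
			log.Printf("%s: attribute CID %q != running CID %q (ignored)", pod, a.ContainerID, running[pod])
		}
	}
	sorted := append([]string(nil), replicas...)
	sort.Slice(sorted, func(i, j int) bool {
		si, sj := attrs[sorted[i]].Seqno, attrs[sorted[j]].Seqno
		if si != sj {
			return si > sj
		}
		return sorted[i] < sorted[j]
	})
	return sorted[0], true
}

// shouldPushJoiners ignores a Bootstrapped flag derived from stale
// AvailableReplicas when no pod is actually Ready.
func shouldPushJoiners(bootstrapped bool, readyPods int) bool {
	return bootstrapped && readyPods > 0
}

// requeueAfter returns the periodic retry interval while not bootstrapped.
func requeueAfter(bootstrapped bool) time.Duration {
	if bootstrapped {
		return 0
	}
	return 10 * time.Second
}

func main() {
	r := &GaleraReconciler{}
	replicas := []string{"galera-0", "galera-1", "galera-2"}
	attrs := map[string]PodAttrs{
		"galera-0": {Seqno: 41, ContainerID: "old-0"},
		"galera-1": {Seqno: 42, ContainerID: "old-1"},
		"galera-2": {Seqno: 42, ContainerID: "cid-2"},
	}
	running := map[string]string{"galera-0": "new-0", "galera-1": "new-1", "galera-2": "cid-2"}
	if pod, ok := findBestCandidate(replicas, attrs, running); ok {
		r.injectGcommURI(pod, "gcomm://") // an empty gcomm:// bootstraps a new cluster
		fmt.Println("bootstrap from", pod)
	}
	fmt.Println(r.isBootstrapInProgress(), shouldPushJoiners(true, 0), requeueAfter(false))
}
```

In this sketch galera-0 and galera-1 carry stale CIDs from before the restart; they are logged, and galera-1 is still chosen because its seqno is the highest. The mutex matters only if reconciles run concurrently (for example with MaxConcurrentReconciles > 1). In a controller-runtime reconciler, the retry interval would typically be returned as ctrl.Result{RequeueAfter: ...}.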
Functions converted to GaleraReconciler methods:
- injectGcommURI: tracks injection in gcommState instead of attr.Gcomm
- isBootstrapInProgress: checks gcommState instead of attr.Gcomm
- getPodsWaitingForGcomm: checks gcommState instead of attr.Gcomm

Generated-By: claude-opus-4-6
Signed-off-by: Luca Miccini <lmiccini@redhat.com>
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: lmiccini
Jira: https://redhat.atlassian.net/browse/OSPRH-32408