fix: pods rotation #2394

Open
rugggger wants to merge 2 commits into main from 03-24-fix_pods_rotation

@rugggger rugggger commented Mar 25, 2026

TL;DR

Introduces cluster configuration hash tracking so that pods are rotated when the cluster spec changes, not only when the image changes. This enables proper rolling updates for configuration-only changes.

What changed?

  • Added a PodConfigVersion environment variable setting and a WekaRuntimeVersion constant to track runtime changes
  • Implemented a clusterConfigHash() function that generates a SHA-256 hash from the cluster image, version numbers, and tracked spec fields (network, tolerations, hugepages, etc.)
  • Modified the upgrade controller to accept a target config hash alongside the target image, with new logic that checks both image and config alignment
  • Added a TargetClusterSpecHash field to container specs and a ClusterSpecHashAnnotation to pod annotations for tracking
  • Updated container reconciliation to trigger pod rotation when either the image or the cluster config hash changes
  • Refactored spec propagation into reusable ApplyUpdatableSpecToContainer() and PropagateSpecToContainer() functions
  • Enhanced the upgrade flows to handle both image upgrades and configuration-only changes
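
To make the hashing idea above concrete, here is a minimal sketch, not the code in this PR. The `trackedFields` struct, its field set, and the JSON serialization are assumptions chosen for illustration; only the name `clusterConfigHash`, the use of SHA-256, and the kinds of inputs (image, version markers, tracked spec fields) come from the description.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// trackedFields is a hypothetical subset of the hash inputs the PR
// describes: image, version markers, and a few tracked spec fields.
type trackedFields struct {
	Image              string   `json:"image"`
	PodConfigVersion   string   `json:"podConfigVersion"`
	WekaRuntimeVersion string   `json:"wekaRuntimeVersion"`
	CpuPolicy          string   `json:"cpuPolicy"`
	Tolerations        []string `json:"tolerations"`
	ComputeExtraCores  int      `json:"computeExtraCores"`
}

// clusterConfigHash serializes the tracked fields and returns the
// hex-encoded SHA-256 digest. encoding/json emits struct fields in
// declaration order, so equal inputs produce equal hashes.
func clusterConfigHash(f trackedFields) (string, error) {
	raw, err := json.Marshal(f)
	if err != nil {
		return "", fmt.Errorf("marshal tracked fields: %w", err)
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	before := trackedFields{Image: "example/image:1.0", CpuPolicy: "auto"}
	after := before
	after.ComputeExtraCores = 2 // config-only change, same image

	h1, _ := clusterConfigHash(before)
	h2, _ := clusterConfigHash(after)
	fmt.Println("hash changed:", h1 != h2)
}
```

One caveat with this approach: a nil slice and an empty slice serialize differently in JSON (`null` versus `[]`), so a real implementation would probably normalize such fields before hashing to avoid spurious rotations.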

How to test?

  1. Deploy a WekaCluster and verify initial pod creation includes config hash annotations
  2. Change cluster spec fields like network configuration, tolerations, or hugepages settings without changing image
  3. Verify containers get updated with new TargetClusterSpecHash and pods are rotated to apply changes
  4. Test image-only upgrades still work as expected
  5. Test combined image + config changes trigger proper rolling updates

Why make this change?

Previously, only image changes would trigger pod rotation during upgrades. Configuration changes to cluster specs (network settings, resource allocations, tolerations, etc.) would update container specs but not rotate pods, leaving running pods with stale configurations. This change ensures any meaningful cluster configuration change triggers proper pod rotation to apply the new settings.
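
The rotation rule described above (rotate when either the image or the config hash differs) could look roughly like the sketch below. The types, the annotation key, and the choice to skip the hash check when no target hash is set are all assumptions for illustration; the PR's actual reconciliation code may differ.

```go
package main

import "fmt"

// clusterSpecHashAnnotation is a hypothetical annotation key; the real
// value of ClusterSpecHashAnnotation is not shown in this PR page.
const clusterSpecHashAnnotation = "example.invalid/cluster-spec-hash"

// podState is a simplified stand-in for the running pod.
type podState struct {
	Image       string
	Annotations map[string]string
}

// containerTarget is a simplified stand-in for the desired container spec.
type containerTarget struct {
	Image                 string
	TargetClusterSpecHash string
}

// needsRotation reports whether the pod must be replaced and why.
// Assumption: an empty target hash means "hash not tracked yet", so it
// never triggers a rotation on its own.
func needsRotation(pod podState, target containerTarget) (bool, string) {
	if pod.Image != target.Image {
		return true, "image changed"
	}
	if target.TargetClusterSpecHash != "" &&
		pod.Annotations[clusterSpecHashAnnotation] != target.TargetClusterSpecHash {
		return true, "cluster config hash changed"
	}
	return false, ""
}

func main() {
	pod := podState{
		Image:       "example/image:1.0",
		Annotations: map[string]string{clusterSpecHashAnnotation: "aaa"},
	}
	target := containerTarget{Image: "example/image:1.0", TargetClusterSpecHash: "bbb"}

	rotate, reason := needsRotation(pod, target)
	fmt.Println(rotate, reason)
}
```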

github-actions bot commented Mar 25, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA 7fb8bbd.
Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

Scanned Files

None

@rugggger rugggger marked this pull request as ready for review March 25, 2026 09:01
@rugggger rugggger requested a review from a team as a code owner March 25, 2026 09:01
graphite-app bot commented Mar 25, 2026

Graphite Automations

"Add anton/matt/sergey/kristina as reviwers on operator PRs" took an action on this PR • (03/25/26)

3 reviewers were added to this PR based on Anton Bykov's automation.

		CpuPolicy: spec.CpuPolicy,
	}
	if spec.Dynamic != nil {
		fields.ComputeExtraCores = spec.Dynamic.ComputeExtraCores
Collaborator

What if ComputeExtraCores is changed, but the wekacontainer is a drive container?
Does that mean the hash changes and the pod gets rotated anyway?

Contributor Author

Yes, a change to any tracked field causes a rotation similar to an upgrade (all containers, in the same order, one by one, but without the image-specific operations).

Collaborator

it seems like this hash should be calculated per role so that we do not rotate pods that won't actually get any updates

Contributor Author

yeah, sounds right
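
As an illustration of the per-role idea discussed in this thread, the sketch below hashes only the fields that apply to a given role. Everything here is hypothetical: the `role` type, the `roleFields` struct, and the assumption that CpuPolicy affects every role while ComputeExtraCores affects only compute containers. It is not the implementation the author later pushed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

type role string

const (
	roleCompute role = "compute"
	roleDrive   role = "drive"
)

// clusterSpec is a hypothetical, trimmed-down cluster spec.
type clusterSpec struct {
	CpuPolicy         string
	ComputeExtraCores int
}

// roleFields holds only the inputs relevant to one role. The pointer
// with omitempty keeps compute-only settings out of other roles' hashes.
type roleFields struct {
	Role              role   `json:"role"`
	CpuPolicy         string `json:"cpuPolicy"`
	ComputeExtraCores *int   `json:"computeExtraCores,omitempty"`
}

// roleConfigHash returns a hex SHA-256 over the fields relevant to r.
func roleConfigHash(spec clusterSpec, r role) (string, error) {
	f := roleFields{Role: r, CpuPolicy: spec.CpuPolicy}
	if r == roleCompute {
		cores := spec.ComputeExtraCores
		f.ComputeExtraCores = &cores
	}
	raw, err := json.Marshal(f)
	if err != nil {
		return "", fmt.Errorf("marshal role fields: %w", err)
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	before := clusterSpec{CpuPolicy: "auto", ComputeExtraCores: 0}
	after := clusterSpec{CpuPolicy: "auto", ComputeExtraCores: 2}

	for _, r := range []role{roleCompute, roleDrive} {
		h1, _ := roleConfigHash(before, r)
		h2, _ := roleConfigHash(after, r)
		fmt.Printf("%s needs rotation: %v\n", r, h1 != h2)
	}
}
```

With this split, changing ComputeExtraCores alters only the compute hash, so drive pods would be left alone.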

@rugggger rugggger force-pushed the 03-24-fix_pods_rotation branch from 14174a7 to e6b40b8 Compare March 26, 2026 15:51