Skip to content

Skip SecretHashes update on ansibleLimit-scoped dataplane deployments#1781

Open
lmiccini wants to merge 1 commit intoopenstack-k8s-operators:mainfrom
lmiccini:nodeset_rmqu_finalizer_configmap
Open

Skip SecretHashes update on ansibleLimit-scoped dataplane deployments#1781
lmiccini wants to merge 1 commit intoopenstack-k8s-operators:mainfrom
lmiccini:nodeset_rmqu_finalizer_configmap

Conversation

@lmiccini
Copy link
Copy Markdown
Contributor

@lmiccini lmiccini commented Jan 27, 2026

For servicesOverride deployments, merge SecretHashes into the existing map without resetting it, since all nodes were touched.
For ansibleLimit-scoped deployments, skip the update entirely as not all nodes received the secrets.

@softwarefactory-project-zuul
Copy link
Copy Markdown

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/56ac80bd0e7547ad88350eb0206886b5

✔️ openstack-k8s-operators-content-provider SUCCESS in 3h 18m 47s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 23m 38s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 37m 31s
adoption-standalone-to-crc-ceph-provider FAILURE in 3h 01m 55s
✔️ openstack-operator-tempest-multinode SUCCESS in 1h 51m 23s
openstack-operator-docs-preview POST_FAILURE in 2m 32s

@stuggi stuggi requested a review from slagle January 28, 2026 08:13
Comment thread api/core/v1beta1/openstackcontrolplane_webhook.go Outdated
@softwarefactory-project-zuul
Copy link
Copy Markdown

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/db62c9cd33b34a538c7eccf243769b6a

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 02m 26s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 20m 56s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 36m 03s
adoption-standalone-to-crc-ceph-provider FAILURE in 1h 46m 57s
✔️ openstack-operator-tempest-multinode SUCCESS in 1h 34m 08s
openstack-operator-docs-preview POST_FAILURE in 3m 15s

@lmiccini lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch 2 times, most recently from 3885c4a to c1fe8f8 Compare February 7, 2026 18:56
@softwarefactory-project-zuul
Copy link
Copy Markdown

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/b5d3972863e64857b2da5055f867ef55

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 20m 43s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 21m 41s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 36m 22s
adoption-standalone-to-crc-ceph-provider FAILURE in 2h 05m 30s
✔️ openstack-operator-tempest-multinode SUCCESS in 1h 43m 01s
✔️ openstack-operator-docs-preview SUCCESS in 3m 14s

@lmiccini
Copy link
Copy Markdown
Contributor Author

lmiccini commented Feb 8, 2026

/retest

@lmiccini
Copy link
Copy Markdown
Contributor Author

lmiccini commented Feb 8, 2026

recheck

@lmiccini
Copy link
Copy Markdown
Contributor Author

lmiccini commented Feb 8, 2026

/test openstack-operator-build-deploy-kuttl-4-18

@lmiccini lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch 2 times, most recently from cbfbb7c to f52529a Compare February 8, 2026 15:01
@lmiccini lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch from f52529a to 017d2ca Compare February 10, 2026 06:41
@lmiccini
Copy link
Copy Markdown
Contributor Author

/test functional

@lmiccini lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch 2 times, most recently from 97bb482 to b1d9350 Compare February 12, 2026 15:46
@lmiccini
Copy link
Copy Markdown
Contributor Author

/test openstack-operator-build-deploy-kuttl-4-18

@lmiccini lmiccini requested a review from slagle April 27, 2026 07:33
@lmiccini lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch from 187eecc to 9f7cae7 Compare April 27, 2026 07:38
@lmiccini
Copy link
Copy Markdown
Contributor Author

/test openstack-operator-build-deploy-kuttl-4-18

1 similar comment
@lmiccini
Copy link
Copy Markdown
Contributor Author

/test openstack-operator-build-deploy-kuttl-4-18

@rabi
Copy link
Copy Markdown
Contributor

rabi commented Apr 28, 2026

Why can't infra-operator block deletion of there is mismatch between latest deployment and nodeset secret hashes? I am not sure if we should go this path as it is othrogonal to the original design. Also, users should be discouraged to do secret rotations using ansible_limits. It is there for scenarios where nodes are not reachable for some reason and users are expected to do full deployment across all nodes of the nodeset eventually to keep things in sync.

@lmiccini
Copy link
Copy Markdown
Contributor Author

Why can't infra-operator block deletion of there is mismatch between latest deployment and nodeset secret hashes? I am not sure if we should go this path as it is othrogonal to the original design. Also, users should be discouraged to do secret rotations using ansible_limits. It is there for scenarios where nodes are not reachable for some reason and users are expected to do full deployment across all nodes of the nodeset eventually to keep things in sync.

Thanks @rabi . One of the challenges with that approach is that the last deployment may not be representative of the overall status of the nodeset. AFAICU we would have to introspect the deployments to check whether ansible_limits was used and/or if only a subset of the services were deployed, and repeat this pattern not just in infra but other operators that need to know if the dataplane is in sync (keystone for appcred, possibly others?).
I thought having a generic "am I up to date" status field would be in line with the loosely coupled approach we have for the dataplane, plus we would be able to use this for credentials rotation (rabbitmq, appcred, etc) and also to facilitate the cleanup after we migrate from mirrored to quorum queues among others. I am sure there are more use cases we could use this for.

@rabi
Copy link
Copy Markdown
Contributor

rabi commented Apr 30, 2026

One of the challenges with that approach is that the last deployment may not be representative of the overall status of the nodeset

I am not sure if we need to solve that by adding per-node tracking or by introspecting all deployments. That would start to break the current NodeSet design and make it overly complex IMO.

For this requirement, I think we can keep it simple:

  • do not update NodeSet.Status.SecretHashes when ansibleLimit is set, since that deployment did not fully update the NodeSet

  • with servicesOverride, the last deployment SecretHashes map is only partial, so it should not fully match the NodeSet SecretHashes. This can be false negative but conservative.

  • block deletion whenever NodeSet.Status.SecretHashes and Deployment.Status.SecretHashes differ

With this approach, any scoped deployment, either via ansibleLimit or servicesOverride, keeps deletion blocked. Only a full unscoped deployment across all nodes brings the NodeSet back in sync.

We can add the ansibleLimit guard where SecretHashes are copied back to the NodeSet in internal/controller/dataplane/openstackdataplanenodeset_controller.go.

@lmiccini lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch from 9f7cae7 to e816aa3 Compare April 30, 2026 07:58
@lmiccini lmiccini changed the title Add secret rotation tracking with composite hash drift detection Skip SecretHashes update on scoped dataplane deployments Apr 30, 2026
@lmiccini lmiccini requested a review from rabi April 30, 2026 08:00
@lmiccini
Copy link
Copy Markdown
Contributor Author

/test functional

@centosinfra-prod-github-app
Copy link
Copy Markdown

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdo/buildset/44b91fa09db54b80829d1eb5022cd8ba

✔️ openstack-k8s-operators-content-provider SUCCESS in 12m 54s
podified-multinode-edpm-deployment-crc RETRY_LIMIT in 30s
cifmw-crc-podified-edpm-baremetal RETRY_LIMIT in 28s
openstack-operator-tempest-multinode RETRY_LIMIT in 30s
openstack-operator-edpm-baremetal-minor-update RETRY_LIMIT in 28s

@lmiccini lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch from e816aa3 to 13d1c14 Compare April 30, 2026 12:11
@lmiccini lmiccini changed the title Skip SecretHashes update on scoped dataplane deployments Skip SecretHashes update on ansibleLimit-scoped dataplane deployments Apr 30, 2026
@lmiccini lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch from 13d1c14 to 5761424 Compare April 30, 2026 12:28
@rabi
Copy link
Copy Markdown
Contributor

rabi commented May 4, 2026

/lgtm

@openshift-ci openshift-ci Bot added the lgtm label May 4, 2026
@lmiccini
Copy link
Copy Markdown
Contributor Author

lmiccini commented May 4, 2026

/test openstack-operator-build-deploy-kuttl-4-18

@lmiccini
Copy link
Copy Markdown
Contributor Author

lmiccini commented May 4, 2026

recheck

For servicesOverride deployments, merge SecretHashes into the existing
map without resetting it, since all nodes were touched. For
ansibleLimit-scoped deployments, skip the update entirely as not all
nodes received the secrets.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@lmiccini lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch from 5761424 to b043596 Compare May 5, 2026 05:39
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 5, 2026

New changes are detected. LGTM label has been removed.

@openshift-ci openshift-ci Bot removed the lgtm label May 5, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 5, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: lmiccini
Once this PR has been reviewed and has the lgtm label, please ask for approval from rabi. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@centosinfra-prod-github-app
Copy link
Copy Markdown

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 5, 2026

@lmiccini: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/openstack-operator-build-deploy-kuttl 1698305 link true /test openstack-operator-build-deploy-kuttl
ci/prow/openstack-operator-build-deploy-kuttl-4-18 b043596 link true /test openstack-operator-build-deploy-kuttl-4-18

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants