
Probes configuration and override support#610

Merged
openshift-merge-bot[bot] merged 4 commits into openstack-k8s-operators:main from fmount:probes
May 5, 2026

Conversation

@fmount
Contributor

@fmount fmount commented Feb 18, 2026

Add ProbeOverrides interface and CreateProbeSet() function from lib-common for unified probe management across Cinder services. Enable probe customization through CRD overrides and remove code duplication. Updates all services (API, Scheduler, Volume, Backup) to use the new pattern with proper scheme handling and consistent defaults.
In addition, webhook validation for probes has been introduced.

Depends-On: openstack-k8s-operators/lib-common#673
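
A minimal sketch of the override idea described above, assuming hypothetical `ProbeConf` and `ProbeOverride` types and an `applyOverride` helper. These names are illustrative only and are not the lib-common `ProbeOverrides` or `CreateProbeSet()` API, whose signatures are not shown in this conversation. Fields left unset in the override keep the operator default; the numeric values are placeholders.

```go
package main

import "fmt"

// ProbeConf is a hypothetical, simplified stand-in for the probe settings
// discussed in this PR; it is not the lib-common type.
type ProbeConf struct {
	TimeoutSeconds      int32
	PeriodSeconds       int32
	InitialDelaySeconds int32
	FailureThreshold    int32
}

// ProbeOverride (also hypothetical) carries optional user-supplied values.
// A nil field means "keep the operator default".
type ProbeOverride struct {
	TimeoutSeconds      *int32
	PeriodSeconds       *int32
	InitialDelaySeconds *int32
	FailureThreshold    *int32
}

// applyOverride returns a copy of def with every non-nil override applied.
func applyOverride(def ProbeConf, o *ProbeOverride) ProbeConf {
	if o == nil {
		return def
	}
	out := def
	if o.TimeoutSeconds != nil {
		out.TimeoutSeconds = *o.TimeoutSeconds
	}
	if o.PeriodSeconds != nil {
		out.PeriodSeconds = *o.PeriodSeconds
	}
	if o.InitialDelaySeconds != nil {
		out.InitialDelaySeconds = *o.InitialDelaySeconds
	}
	if o.FailureThreshold != nil {
		out.FailureThreshold = *o.FailureThreshold
	}
	return out
}

func main() {
	// Placeholder defaults, chosen only for illustration.
	def := ProbeConf{TimeoutSeconds: 5, PeriodSeconds: 20, InitialDelaySeconds: 15}
	period := int32(30)
	// Only PeriodSeconds is overridden; the other fields keep their defaults.
	got := applyOverride(def, &ProbeOverride{PeriodSeconds: &period})
	fmt.Printf("%+v\n", got)
}
```

Using pointers for the override fields lets the sketch tell "not set" apart from an explicit zero.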

@fmount fmount requested a review from stuggi February 18, 2026 20:21
@openshift-ci openshift-ci Bot requested a review from eharney February 18, 2026 20:21
@fmount fmount changed the title Probe configuration with override support Probes configuration and override support Feb 18, 2026
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/2813b48c394243ce9a2e8ef372592913

openstack-k8s-operators-content-provider FAILURE in 9m 19s
⚠️ cinder-operator-kuttl SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cinder-operator-tempest SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/2896e982a2c54980ac9db89bd06ae45a

openstack-k8s-operators-content-provider FAILURE in 11m 52s
⚠️ cinder-operator-kuttl SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cinder-operator-tempest SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/fcdd543dbfb848aab77b2a7d913569b8

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 03m 39s
cinder-operator-kuttl FAILURE in 34m 31s
✔️ cinder-operator-tempest SUCCESS in 1h 44m 57s

Comment thread internal/cinder/funcs.go
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/b2cd3b0a467e469493dfd89037ba89ad

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 01m 18s
cinder-operator-kuttl FAILURE in 36m 02s
✔️ cinder-operator-tempest SUCCESS in 1h 42m 23s

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/28e584974ac74471995b2d48929d570d

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 14m 03s
cinder-operator-kuttl FAILURE in 43m 02s
✔️ cinder-operator-tempest SUCCESS in 1h 47m 53s

@fmount
Contributor Author

fmount commented Apr 10, 2026

recheck

@fmount fmount force-pushed the probes branch 2 times, most recently from b368ecf to 6d648b3 Compare April 20, 2026 10:05
@fmount
Contributor Author

fmount commented Apr 24, 2026

Here's a quick comparison between the defaults we had before this patch and the ones we have now (regardless of whether they are used or not):


  ┌───────────┬─────────────────────┬─────────────────┬────────────────┬────────┐
  │   Probe   │        Field        │ Old (hardcoded) │ New (computed) │ Change │
  ├───────────┼─────────────────────┼─────────────────┼────────────────┼────────┤
  │ Liveness  │ TimeoutSeconds      │ 5               │ 5              │ same   │
  ├───────────┼─────────────────────┼─────────────────┼────────────────┼────────┤
  │ Liveness  │ PeriodSeconds       │ 3               │ 20             │ +17    │
  ├───────────┼─────────────────────┼─────────────────┼────────────────┼────────┤
  │ Liveness  │ InitialDelaySeconds │ 3               │ 15             │ +12    │
  ├───────────┼─────────────────────┼─────────────────┼────────────────┼────────┤
  │ Readiness │ TimeoutSeconds      │ --              │ 5              │ new    │
  ├───────────┼─────────────────────┼─────────────────┼────────────────┼────────┤
  │ Readiness │ PeriodSeconds       │ --              │ 20             │ new    │
  ├───────────┼─────────────────────┼─────────────────┼────────────────┼────────┤
  │ Readiness │ InitialDelaySeconds │ --              │ 15             │ new    │
  ├───────────┼─────────────────────┼─────────────────┼────────────────┼────────┤
  │ Startup   │ TimeoutSeconds      │ 5               │ 5              │ same   │
  ├───────────┼─────────────────────┼─────────────────┼────────────────┼────────┤
  │ Startup   │ PeriodSeconds       │ 5               │ 10             │ +5     │
  ├───────────┼─────────────────────┼─────────────────┼────────────────┼────────┤
  │ Startup   │ InitialDelaySeconds │ 5               │ 20             │ +15    │
  ├───────────┼─────────────────────┼─────────────────┼────────────────┼────────┤
  │ Startup   │ FailureThreshold    │ 12              │ 12             │ same   │
  └───────────┴─────────────────────┴─────────────────┴────────────────┴────────┘

@TristanCacqueray

recheck

@centosinfra-prod-github-app

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdo/buildset/0bfe4cda658848298ae0d1af2cea59e5

✔️ openstack-k8s-operators-content-provider SUCCESS in 21m 04s
cinder-operator-kuttl RETRY_LIMIT in 29s
cinder-operator-tempest RETRY_LIMIT in 30s

@centosinfra-prod-github-app

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdo/buildset/6da5f313602949f89d590c1c4fb221ba

✔️ openstack-k8s-operators-content-provider SUCCESS in 19m 34s
cinder-operator-kuttl RETRY_LIMIT in 31s
cinder-operator-tempest RETRY_LIMIT in 29s

@TristanCacqueray

recheck

@centosinfra-prod-github-app

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdo/buildset/63ed3ad49c4e48cb9eacdd55a1ac2873

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 11m 20s
cinder-operator-kuttl FAILURE in 21m 44s
✔️ cinder-operator-tempest SUCCESS in 1h 53m 49s

InitialDelaySeconds: 5,

// Note that by default we create probes with the same URIScheme and port
apiProbes, err := probes.CreateProbeSet(
Contributor Author

@abays this is exactly the same patch we did in manila, can you take a look and see if we can land this as well?

Comment thread api/v1beta1/cinderbackup_types.go Outdated
})
})

When("Cinder CR instance is built with custom probes", func() {
Contributor

Is there any particular reason CinderBackup isn't covered here as well?

Contributor Author

The reason I think is that we currently do not test cinderBackup in envTests, and we should! Let me try to add cbak basic coverage for probes, and we can keep any refactoring for follow up patches.

tl;dr we don't test cinder-backup at all with envTest, but I'm going to enable it as part of an additional commit.

Contributor Author

Ok I added basic support for CinderBackup. I think it deserves a dedicated PR to go through the interfaces and make sure we are able to cover everything, but here the scope is to try probes override, so we should be good. Thanks @abays, this is a good catch and as a result of that I enabled a missing component in our envTests.

Contributor

Cool man 👍

fmount added 3 commits May 5, 2026 12:08
Add ProbeOverrides interface and CreateProbeSet() function from lib-common
for unified probe management across Cinder services. Enable probe
customization through CRD overrides and remove code duplication.
Updates all services (API, Scheduler, Volume, Backup) to use the new
pattern with proper scheme handling and consistent defaults.
In addition, webhook validation for probes has been introduced.

Signed-off-by: Francesco Pantano <fpantano@redhat.com>
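
The webhook validation mentioned in this commit message could look roughly like the sketch below. The `ProbeConf` type and `validateProbe` function are hypothetical, and the rules actually enforced by the PR are not shown in this conversation; the bounds used here are an assumption based on the minimums Kubernetes documents for probe fields.

```go
package main

import (
	"errors"
	"fmt"
)

// ProbeConf is a hypothetical, simplified probe definition used only for
// this sketch; it is not a type from cinder-operator or lib-common.
type ProbeConf struct {
	TimeoutSeconds      int32
	PeriodSeconds       int32
	InitialDelaySeconds int32
	FailureThreshold    int32
}

// validateProbe collects every violation instead of stopping at the first,
// so a user sees all problems in a single rejection. The bounds are
// assumptions, not the rules implemented by the PR.
func validateProbe(name string, p ProbeConf) error {
	var errs []error
	if p.TimeoutSeconds < 1 {
		errs = append(errs, fmt.Errorf("%s.timeoutSeconds must be >= 1, got %d", name, p.TimeoutSeconds))
	}
	if p.PeriodSeconds < 1 {
		errs = append(errs, fmt.Errorf("%s.periodSeconds must be >= 1, got %d", name, p.PeriodSeconds))
	}
	if p.InitialDelaySeconds < 0 {
		errs = append(errs, fmt.Errorf("%s.initialDelaySeconds must be >= 0, got %d", name, p.InitialDelaySeconds))
	}
	if p.FailureThreshold < 1 {
		errs = append(errs, fmt.Errorf("%s.failureThreshold must be >= 1, got %d", name, p.FailureThreshold))
	}
	// errors.Join returns nil when errs is empty.
	return errors.Join(errs...)
}

func main() {
	bad := ProbeConf{TimeoutSeconds: 0, PeriodSeconds: 10, InitialDelaySeconds: -1, FailureThreshold: 3}
	if err := validateProbe("livenessProbe", bad); err != nil {
		fmt.Println(err)
	}
}
```
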
Replace static probe timeouts with dynamic scaling based on APITimeout parameter.
Creates separate probe configurations for API services (HTTP endpoints) and RPC
workers (internal services) with appropriate scaling factors. API services use
full APITimeout scaling while RPC workers get proportional timeouts, preventing
premature pod kills during high load scenarios.

Signed-off-by: Francesco Pantano <fpantano@redhat.com>
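
As a rough illustration of scaling probe timings from a timeout parameter, the sketch below derives a probe set from a single `apiTimeout` value. The `scaleProbes` function, its types, and the divisors are all assumptions: the divisors were picked only so that a hypothetical 60-second timeout reproduces the "New (computed)" column of the comparison table earlier in this conversation. The PR's real formula, its default `APITimeout`, and the RPC worker scaling factors are not shown here.

```go
package main

import "fmt"

// ProbeConf and ProbeSet are hypothetical types for this sketch only.
type ProbeConf struct {
	TimeoutSeconds      int32
	PeriodSeconds       int32
	InitialDelaySeconds int32
	FailureThreshold    int32 // 0 means "left to the Kubernetes default"
}

type ProbeSet struct {
	Liveness  ProbeConf
	Readiness ProbeConf
	Startup   ProbeConf
}

// atLeastOne clamps v to a minimum of 1 second.
func atLeastOne(v int32) int32 {
	if v < 1 {
		return 1
	}
	return v
}

// scaleProbes derives probe timings from apiTimeout (in seconds).
// The divisors are illustrative assumptions, not the PR's formula.
func scaleProbes(apiTimeout int32) ProbeSet {
	timeout := atLeastOne(apiTimeout / 12)
	steady := ProbeConf{
		TimeoutSeconds:      timeout,
		PeriodSeconds:       atLeastOne(apiTimeout / 3),
		InitialDelaySeconds: apiTimeout / 4,
	}
	startup := ProbeConf{
		TimeoutSeconds:      timeout,
		PeriodSeconds:       atLeastOne(apiTimeout / 6),
		InitialDelaySeconds: apiTimeout / 3,
		FailureThreshold:    12, // kept constant, as in the comparison table
	}
	return ProbeSet{Liveness: steady, Readiness: steady, Startup: startup}
}

func main() {
	ps := scaleProbes(60) // 60 is an assumed input, not a confirmed default
	fmt.Printf("liveness:  %+v\n", ps.Liveness)
	fmt.Printf("readiness: %+v\n", ps.Readiness)
	fmt.Printf("startup:   %+v\n", ps.Startup)
}
```

A variant for RPC workers could multiply the derived values by a smaller factor, which is one way to read the "proportional timeouts" mentioned in the commit message.
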
This patch introduces a document where the design decisions related to
the probes settings are described.

Signed-off-by: Francesco Pantano <fpantano@redhat.com>
This patch enables cinder-backup in envTests and extends the probes
override test to this component. GetDefaultCinderSpec is now extended
to return cinderBackup as part of the top-level CR and we can now
test overrides for probes.

Signed-off-by: Francesco Pantano <fpantano@redhat.com>
@fmount fmount requested a review from abays May 5, 2026 10:12
Contributor

@abays abays left a comment

/lgtm

@openshift-ci openshift-ci Bot added the lgtm label May 5, 2026
@openshift-ci
Contributor

openshift-ci Bot commented May 5, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abays, fmount

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot Bot merged commit e0a83a3 into openstack-k8s-operators:main May 5, 2026
6 checks passed
3 participants