Skip to content

[release-4.22] CNTRLPLANE-3526: Add spec.monitoring API for metrics forwarding#8872

Open
muraee wants to merge 6 commits into
openshift:release-4.22from
muraee:backport-8626-release-4.22
Open

[release-4.22] CNTRLPLANE-3526: Add spec.monitoring API for metrics forwarding#8872
muraee wants to merge 6 commits into
openshift:release-4.22from
muraee:backport-8626-release-4.22

Conversation

@muraee

@muraee muraee commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Summary

Manual backport of #8626 to release-4.22. The automated cherry-pick failed due to conflicts from the reconciliation loop restructuring on main.

Changes backported:

  • Introduces spec.monitoring.metricsForwarding API on HostedCluster and HostedControlPlane, replacing the annotation-based hypershift.openshift.io/enable-metrics-forwarding mechanism
  • Adds per-cluster metricsSet field (Telemetry/SRE/All) that overrides the global METRICS_SET env var on the HyperShift Operator
  • Updates all consumers (CPO predicates, HCCO, HO SRE config sync) to use the new spec field
  • Maintains backward compatibility: the deprecated annotation is honored when the spec field is not set
  • Mirrors metrics forwarding spec to annotation for N-1 CPO compatibility

Excluded from backport:

  • test/e2e/ changes (not needed for release-4.22)

Conflict resolution:

  • hostedcluster_controller.go: Applied monitoring changes to the pre-refactor reconciliation flow (release-4.22 uses the linear flow, not the phase-based report.execute flow from main)
  • resources.go: Used util.DeleteAllIfNeeded (release-4.22 import) instead of k8sutil.DeleteAllIfNeeded (main)
  • reconcile_legacy.go: Skipped (file doesn't exist in release-4.22)
  • hostedcluster_controller_test.go: Applied monitoring test additions manually to match release-4.22 imports and structure

Test plan

  • go build ./hypershift-operator/controllers/hostedcluster/... compiles
  • go build ./control-plane-operator/... compiles
  • TestReconcileMetricsForwarder passes
  • TestPredicate (endpoint_resolver) passes
  • CI passes

/cc @openshift/hypershift-team

muraee and others added 6 commits June 30, 2026 14:48
Add MonitoringSpec and MetricsForwardingSpec types to HostedCluster
and HostedControlPlane specs, replacing the annotation-based
EnableMetricsForwarding mechanism with a proper API field.

The new API adds:
- monitoring.metricsForwarding.mode (Forward/None) to control
  metrics forwarding per cluster
- monitoring.metricsSet (Telemetry/SRE/All) to override the global
  METRICS_SET environment variable per cluster

Signed-off-by: Mulham Raee <mulham.raee@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Generated by: make update

Signed-off-by: Mulham Raee <mulham.raee@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update all consumers to check the new spec field:

- HC controller copies Monitoring from HC to HCP with backward compat
  for the deprecated EnableMetricsForwarding annotation (only when
  mode is unset; explicit Disabled takes precedence)
- CPO resolves per-cluster MetricsSet override before SRE config
  loading and passes it through ControlPlaneContext
- metrics-proxy and endpoint-resolver predicates check Mode enum
- HCCO reconcileMetricsForwarder checks spec instead of annotation
- HO SRE ConfigMap sync supports per-cluster MetricsSet=SRE

(cherry picked from commit ea1eb17)
(backport: excluded test/e2e/ changes, resolved conflicts for release-4.22)

Signed-off-by: Mulham Raee <mulham.raee@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Allow the forwarded metrics set (guest-side, via metrics-proxy) to be
configured independently from the MC-side ServiceMonitor/PodMonitor
relabel configs. When MetricsForwardingSpec.MetricsSet is not set, it
falls back to MonitoringSpec.MetricsSet (then to the global METRICS_SET
env var).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Broaden SRE ConfigMap watch to enqueue HostedClusters with per-cluster
  metricsSet SRE override, not just when operator global is SRE
- Extract effectiveMetricsSet() helper to deduplicate override resolution
- Fix comment to say "None" instead of "Disabled" matching the enum
- Fix test names to use Forward/None instead of Enabled/Disabled
- Add positive test for Mode=Forward verifying resources are preserved
- Add envtest for required mode field on MetricsForwardingSpec
- Add godoc to exported Predicate function
- Add TestMetricsSetConstantsInSync asserting hyperv1/metrics type parity

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Older CPOs (4.22) only read the EnableMetricsForwarding annotation, not
the new spec.monitoring.metricsForwarding field. When the HO sets the
spec on the HCP without also setting the annotation, a 4.22 CPO never
deploys metrics forwarding components, breaking 4.22 e2e tests and
real-world N-1 version skew scenarios.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 30, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 30, 2026

Copy link
Copy Markdown

@muraee: This pull request references CNTRLPLANE-3526 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Summary

Manual backport of #8626 to release-4.22. The automated cherry-pick failed due to conflicts from the reconciliation loop restructuring on main.

Changes backported:

  • Introduces spec.monitoring.metricsForwarding API on HostedCluster and HostedControlPlane, replacing the annotation-based hypershift.openshift.io/enable-metrics-forwarding mechanism
  • Adds per-cluster metricsSet field (Telemetry/SRE/All) that overrides the global METRICS_SET env var on the HyperShift Operator
  • Updates all consumers (CPO predicates, HCCO, HO SRE config sync) to use the new spec field
  • Maintains backward compatibility: the deprecated annotation is honored when the spec field is not set
  • Mirrors metrics forwarding spec to annotation for N-1 CPO compatibility

Excluded from backport:

  • test/e2e/ changes (not needed for release-4.22)

Conflict resolution:

  • hostedcluster_controller.go: Applied monitoring changes to the pre-refactor reconciliation flow (release-4.22 uses the linear flow, not the phase-based report.execute flow from main)
  • resources.go: Used util.DeleteAllIfNeeded (release-4.22 import) instead of k8sutil.DeleteAllIfNeeded (main)
  • reconcile_legacy.go: Skipped (file doesn't exist in release-4.22)
  • hostedcluster_controller_test.go: Applied monitoring test additions manually to match release-4.22 imports and structure

Test plan

  • go build ./hypershift-operator/controllers/hostedcluster/... compiles
  • go build ./control-plane-operator/... compiles
  • TestReconcileMetricsForwarder passes
  • TestPredicate (endpoint_resolver) passes
  • CI passes

/cc @openshift/hypershift-team

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci

openshift-ci Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

@muraee: GitHub didn't allow me to request PR reviews from the following users: openshift/hypershift-team.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

Details

In response to this:

Summary

Manual backport of #8626 to release-4.22. The automated cherry-pick failed due to conflicts from the reconciliation loop restructuring on main.

Changes backported:

  • Introduces spec.monitoring.metricsForwarding API on HostedCluster and HostedControlPlane, replacing the annotation-based hypershift.openshift.io/enable-metrics-forwarding mechanism
  • Adds per-cluster metricsSet field (Telemetry/SRE/All) that overrides the global METRICS_SET env var on the HyperShift Operator
  • Updates all consumers (CPO predicates, HCCO, HO SRE config sync) to use the new spec field
  • Maintains backward compatibility: the deprecated annotation is honored when the spec field is not set
  • Mirrors metrics forwarding spec to annotation for N-1 CPO compatibility

Excluded from backport:

  • test/e2e/ changes (not needed for release-4.22)

Conflict resolution:

  • hostedcluster_controller.go: Applied monitoring changes to the pre-refactor reconciliation flow (release-4.22 uses the linear flow, not the phase-based report.execute flow from main)
  • resources.go: Used util.DeleteAllIfNeeded (release-4.22 import) instead of k8sutil.DeleteAllIfNeeded (main)
  • reconcile_legacy.go: Skipped (file doesn't exist in release-4.22)
  • hostedcluster_controller_test.go: Applied monitoring test additions manually to match release-4.22 imports and structure

Test plan

  • go build ./hypershift-operator/controllers/hostedcluster/... compiles
  • go build ./control-plane-operator/... compiles
  • TestReconcileMetricsForwarder passes
  • TestPredicate (endpoint_resolver) passes
  • CI passes

/cc @openshift/hypershift-team

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@coderabbitai

coderabbitai Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 804f0f5a-4705-4606-887b-c490499db8ff

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@openshift-ci openshift-ci Bot added do-not-merge/needs-area area/api Indicates the PR includes changes for the API labels Jun 30, 2026
@openshift-ci

openshift-ci Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: muraee
Once this PR has been reviewed and has the lgtm label, please assign sjenning for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added area/cli Indicates the PR includes changes for CLI area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/documentation Indicates the PR includes changes for documentation area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release and removed do-not-merge/needs-area labels Jun 30, 2026
@openshift-ci

openshift-ci Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

@muraee: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@codecov

codecov Bot commented Jun 30, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 71.05263% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 35.83%. Comparing base (491d5ec) to head (0a76412).

Files with missing lines Patch % Lines
...trollers/hostedcluster/hostedcluster_controller.go 70.00% 8 Missing and 1 partial ⚠️
...stedcontrolplane/v2/endpoint_resolver/component.go 66.66% 1 Missing ⚠️
...s/hostedcontrolplane/v2/metrics_proxy/component.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@               Coverage Diff                @@
##           release-4.22    #8872      +/-   ##
================================================
+ Coverage         35.78%   35.83%   +0.05%     
================================================
  Files               774      774              
  Lines             94734    94755      +21     
================================================
+ Hits              33904    33960      +56     
+ Misses            58065    58025      -40     
- Partials           2765     2770       +5     
Files with missing lines Coverage Δ
.../hostedcontrolplane/v2/metrics_proxy/deployment.go 86.02% <100.00%> (+13.84%) ⬆️
...rconfigoperator/controllers/resources/resources.go 50.76% <100.00%> (+0.27%) ⬆️
...stedcontrolplane/v2/endpoint_resolver/component.go 12.00% <66.66%> (-3.39%) ⬇️
...s/hostedcontrolplane/v2/metrics_proxy/component.go 0.00% <0.00%> (ø)
...trollers/hostedcluster/hostedcluster_controller.go 43.75% <70.00%> (+0.45%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@muraee

muraee commented Jul 2, 2026

Copy link
Copy Markdown
Contributor Author

/lgtm

@openshift-ci

openshift-ci Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

@muraee: you cannot LGTM your own PR.

Details

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/api Indicates the PR includes changes for the API area/cli Indicates the PR includes changes for CLI area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/documentation Indicates the PR includes changes for documentation area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants