CNTRLPLANE-3023: Add CEL rule to prevent osImageStream removal #8719

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:main from sdminonne:CNTRLPLANE-3023 on Jun 18, 2026
Conversation

@sdminonne

@sdminonne sdminonne commented Jun 11, 2026

Contributor

Summary

  • Add a FeatureGateAwareXValidation CEL rule on NodePoolSpec that prevents removing the osImageStream field once set
  • Add OSImageStreamRHEL9 and OSImageStreamRHEL10 constants for consistent stream name usage
  • Add envtest case covering the removal scenario

Description

Optional immutable fields on feature-gated types are subject to a two-step bypass: a user can (1) remove the field, then (2) re-add it with a different value. The existing field-level CEL rule on osImageStream prevents single-step downgrades (rhel-10 → rhel-9), but a field-level transition rule (oldSelf/self) does not fire when the field is removed entirely — the validator requires both oldSelf and self to be present.

This change adds a parent-level CEL rule on NodePoolSpec using +openshift:validation:FeatureGateAwareXValidation (gated on OSStreams) that rejects any update that removes osImageStream once it has been set:

!has(oldSelf.osImageStream) || has(self.osImageStream)

The FeatureGateAwareXValidation marker is used instead of +kubebuilder:validation:XValidation because the osImageStream field is stripped from the CRD schema in the Default variant (where the feature gate is disabled). A regular XValidation rule referencing osImageStream would fail at CRD installation time in that variant.
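The two-step bypass described above can be modelled in plain Go. This is only a sketch of the rules' truth tables, not how the API server evaluates CEL: the `NodePoolSpec` and `OSImageStreamReference` types here are trimmed-down stand-ins, and `fieldRuleAllows` and `parentRuleAllows` are hypothetical helper names introduced for illustration.

```go
package main

import "fmt"

// OSImageStreamReference is a simplified stand-in for the API type;
// only the stream name matters for this illustration.
type OSImageStreamReference struct {
	Name string
}

// NodePoolSpec is a hypothetical, trimmed-down spec holding the optional field.
type NodePoolSpec struct {
	OSImageStream *OSImageStreamReference
}

// fieldRuleAllows models a field-level transition rule. Such a rule is only
// evaluated when the field exists in both the old and the new object, so a
// removal (or a first-time set) is never checked by it.
func fieldRuleAllows(oldSpec, newSpec NodePoolSpec) bool {
	if oldSpec.OSImageStream == nil || newSpec.OSImageStream == nil {
		return true // rule does not fire
	}
	// Reject the single-step downgrade rhel-10 -> rhel-9.
	return !(oldSpec.OSImageStream.Name == "rhel-10" && newSpec.OSImageStream.Name == "rhel-9")
}

// parentRuleAllows models the parent-level rule:
//   !has(oldSelf.osImageStream) || has(self.osImageStream)
func parentRuleAllows(oldSpec, newSpec NodePoolSpec) bool {
	return oldSpec.OSImageStream == nil || newSpec.OSImageStream != nil
}

func main() {
	rhel10 := NodePoolSpec{OSImageStream: &OSImageStreamReference{Name: "rhel-10"}}
	rhel9 := NodePoolSpec{OSImageStream: &OSImageStreamReference{Name: "rhel-9"}}
	unset := NodePoolSpec{}

	// Step 1 of the bypass: remove the field.
	fmt.Println("removal, field rule allows:", fieldRuleAllows(rhel10, unset))
	fmt.Println("removal, parent rule allows:", parentRuleAllows(rhel10, unset))

	// Step 2 of the bypass: re-add the field with a lower stream.
	fmt.Println("re-add as rhel-9, field rule allows:", fieldRuleAllows(unset, rhel9))

	// The direct downgrade is already covered by the field-level rule.
	fmt.Println("direct downgrade, field rule allows:", fieldRuleAllows(rhel10, rhel9))
}
```

In this model the field-level rule lets both steps of the bypass through, while the parent-level rule rejects step 1, which is the gap the new rule is meant to close.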

Test plan

  • Envtest: "When removing osImageStream from an existing NodePool it should fail"
  • CEL rule present in TechPreviewNoUpgrade and CustomNoUpgrade CRD variants
  • CEL rule absent from Default CRD variant
  • make update && make verify passes clean

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added support for RHEL 9 and RHEL 10 operating system image streams.
  • Enhancements

    • Implemented validation to prevent removing an operating system image stream configuration once it has been set.

@openshift-merge-bot

Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 11, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 11, 2026

@sdminonne: This pull request references CNTRLPLANE-3023 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Summary

  • Add ReleaseImage.AvailableOSImageStreams() method that returns OS image streams available in a release payload based on version heuristics (pre-5.0: [rhel-9], 5.0+: [rhel-9, rhel-10])
  • Extend validOSImageStreamCondition to validate that the specified spec.osImageStream.name exists in the NodePool's release payload, in addition to the existing transition guards (removal/downgrade)
  • Add NodePoolOSImageStreamNotInPayloadReason constant for the new validation failure reason
  • Add unit tests for both AvailableOSImageStreams() and the new payload validation path in TestValidOSImageStreamCondition

Description

Previously, validOSImageStreamCondition only enforced transition guards (preventing removal once set and preventing downgrades from rhel-10 to rhel-9). It did not verify whether the specified stream actually exists in the release payload.

This change restructures the condition into three phases:

  1. Transition validation (unchanged): removal and downgrade guards that don't need release info
  2. Payload validation (new): looks up the release image and checks that the stream is in AvailableOSImageStreams(); rejects with ValidOSImageStream=False / OSImageStreamNotInPayload if not found
  3. Success: updates the status latch and sets condition True

For example, setting rhel-10 on a 4.18 release will now produce:

ValidOSImageStream=False, Reason=OSImageStreamNotInPayload
Message: osImageStream "rhel-10" is not available in release payload; available streams: [rhel-9]
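The version heuristic and payload check described in this quoted description could look roughly like the following. This is a minimal sketch under stated assumptions: `availableOSImageStreams` and `validateStream` are hypothetical free functions (the description refers to a `ReleaseImage.AvailableOSImageStreams()` method whose real signature is not shown here), the version is parsed with the standard library only, and falling back to `[rhel-9]` for an unparsable version is an assumed reading of "conservative default".

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// availableOSImageStreams is a hypothetical stand-in for the heuristic:
// releases before 5.0 offer rhel-9 only, 5.0 and later offer both streams.
// An unparsable version falls back to rhel-9 only (assumed default).
func availableOSImageStreams(version string) []string {
	majorStr, _, _ := strings.Cut(version, ".")
	major, err := strconv.Atoi(majorStr)
	if err != nil || major < 5 {
		return []string{"rhel-9"}
	}
	return []string{"rhel-9", "rhel-10"}
}

// validateStream reports whether the requested stream is offered by the
// release, and returns a message in the style quoted above when it is not.
func validateStream(name, version string) (bool, string) {
	streams := availableOSImageStreams(version)
	for _, s := range streams {
		if s == name {
			return true, ""
		}
	}
	return false, fmt.Sprintf(
		"osImageStream %q is not available in release payload; available streams: %v",
		name, streams)
}

func main() {
	for _, tc := range []struct{ stream, version string }{
		{"rhel-9", "4.18.0"},
		{"rhel-10", "4.18.0"},
		{"rhel-10", "5.0.0"},
	} {
		ok, msg := validateStream(tc.stream, tc.version)
		fmt.Printf("%s on %s: valid=%v %s\n", tc.stream, tc.version, ok, msg)
	}
}
```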

Test plan

  • TestAvailableOSImageStreams — 4 cases covering 4.18 (rhel-9 only), 5.0/5.1 (both streams), invalid version (conservative default)
  • TestValidOSImageStreamCondition — 10 cases covering transition guards (removal, downgrade) and payload validation (rhel-9 on 4.18, rhel-10 on 4.18 rejected, rhel-9/rhel-10 on 5.0, upgrade on 5.0)
  • make pre-commit passes (update, build, verify, test, gitlint)
  • /test e2e-hypershift

🤖 Generated with Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/needs-area labels Jun 11, 2026
@openshift-ci

openshift-ci Bot commented Jun 11, 2026

Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 76086dc2-767b-4cc5-b5a2-0b2ba5a6b689

📥 Commits

Reviewing files that changed from the base of the PR and between 4496f29 and a59f54d.

⛔ Files ignored due to path filters (5)
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/nodepools.hypershift.openshift.io/OSStreams.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • cmd/install/assets/crds/hypershift-operator/tests/nodepools.hypershift.openshift.io/featuregated.nodepools.osimagestream.testsuite.yaml is excluded by !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/nodepools-CustomNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/nodepools-TechPreviewNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/nodepool_types.go is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (1)
  • api/hypershift/v1beta1/nodepool_types.go

📝 Walkthrough

Walkthrough

In api/hypershift/v1beta1/nodepool_types.go, a feature-gated CEL XValidation rule is added to NodePoolSpec via a +openshift:validation:FeatureGateAwareXValidation annotation (gated by OSStreams). The rule enforces that the osImageStream field cannot be removed once it has been set. Additionally, a new const block exports two string constants: OSImageStreamRHEL9 ("rhel-9") and OSImageStreamRHEL10 ("rhel-10"), representing supported OS image stream names.
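As a rough illustration of how the two constants might be used in place of string literals, consider the sketch below. The constant names and values come from the walkthrough above; whether they are typed or untyped in the real file is not stated, so untyped constants are assumed, and `isDowngrade` is a hypothetical helper that is not part of the PR.

```go
package main

import "fmt"

// Stream name constants as named in the walkthrough. Untyped string
// constants are an assumption made for this sketch.
const (
	OSImageStreamRHEL9  = "rhel-9"
	OSImageStreamRHEL10 = "rhel-10"
)

// isDowngrade is a hypothetical helper: it reports whether moving from
// oldStream to newStream is the rhel-10 -> rhel-9 downgrade.
func isDowngrade(oldStream, newStream string) bool {
	return oldStream == OSImageStreamRHEL10 && newStream == OSImageStreamRHEL9
}

func main() {
	fmt.Println("rhel-10 -> rhel-9 is a downgrade:", isDowngrade(OSImageStreamRHEL10, OSImageStreamRHEL9))
	fmt.Println("rhel-9 -> rhel-10 is a downgrade:", isDowngrade(OSImageStreamRHEL9, OSImageStreamRHEL10))
}
```

Referring to the constants rather than repeating the literals means a misspelled stream name becomes a compile error instead of a silent mismatch.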

🚥 Pre-merge checks | ✅ 10 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Test Structure And Quality | ⚠️ Warning | The custom check requires reviewing Ginkgo test code for quality requirements. However, the PR description claims tests were added (TestAvailableOSImageStreams, TestValidOSImageStreamCondition) tha... | The PR description describes unit tests (TestAvailableOSImageStreams with 4 cases, TestValidOSImageStreamCondition with 10 cases) and methods (AvailableOSImageStreams, NodePoolOSImageStreamNotInPayloadReason) that are not actually implem...

✅ Passed checks (10 passed)

Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title accurately summarizes the main change: adding a CEL rule to prevent osImageStream removal, which is the primary feature introduced in this pull request.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names | ✅ Passed | No Ginkgo tests present in PR. Only Go table-driven tests in nodepool_types_test.go exist, with stable deterministic names containing no dynamic values.
Topology-Aware Scheduling Compatibility | ✅ Passed | PR modifies only API type definitions and validation annotations (CEL rules) on NodePoolSpec with no deployment manifests, operator code, or scheduling constraints that would affect topology compat...
Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | No new Ginkgo e2e tests added. PR only adds standard Go unit tests in nodepool_types_test.go (TestNodePoolAutoScalingSerializationCompatibility), which uses testing.T, not Ginkgo patterns.
No-Weak-Crypto | ✅ Passed | The PR adds OSImageStream constants and CEL validation rules to NodePoolSpec. No weak cryptographic algorithms (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or non-cons...
Container-Privileges | ✅ Passed | PR only modifies Go API type definitions (constants and CEL validation rules) in nodepool_types.go; contains no container/K8s manifests or container security configurations to flag.
No-Sensitive-Data-In-Logs | ✅ Passed | No sensitive data logging found. Changes only add OS stream constants ("rhel-9", "rhel-10") and validation rules with generic messages. No passwords, tokens, PII, or internal data exposed in logs.


@openshift-ci openshift-ci Bot added the labels below and removed the do-not-merge/needs-area label on Jun 11, 2026:
  • area/api: Indicates the PR includes changes for the API
  • area/cli: Indicates the PR includes changes for CLI
  • area/control-plane-operator: Indicates the PR includes changes for the control plane operator - in an OCP release
  • area/documentation: Indicates the PR includes changes for documentation
  • area/hypershift-operator: Indicates the PR includes changes for the hypershift operator and API - outside an OCP release
@codecov

codecov Bot commented Jun 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 41.66%. Comparing base (44f5195) to head (a59f54d).
⚠️ Report is 42 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8719   +/-   ##
=======================================
  Coverage   41.66%   41.66%           
=======================================
  Files         758      758           
  Lines       93929    93929           
=======================================
  Hits        39135    39135           
  Misses      52046    52046           
  Partials     2748     2748           
Flag Coverage Δ
cmd-support 34.96% <ø> (ø)
cpo-hostedcontrolplane 44.00% <ø> (ø)
cpo-other 43.45% <ø> (ø)
hypershift-operator 51.65% <ø> (ø)
other 31.56% <ø> (ø)


@coderabbitai coderabbitai Bot left a comment

Contributor

🧹 Nitpick comments (1)
api/hypershift/v1beta1/nodepool_types.go (1)

242-261: ⚡ Quick win

Clarify default OS stream behavior in documentation.

Line 252 states "the pool uses the release version's default stream (rhel-9 for OCP < 5.0, rhel-10 for OCP >= 5.0)". This is misleading: for OCP >= 5.0, AvailableOSImageStreams() returns both ["rhel-9", "rhel-10"], not a single default.

When osImageStream is omitted, the actual default selection is implementation-defined by downstream platform code, not by this API field. Consider revising the documentation to:

// When omitted, the pool uses platform-specific default OS images.
// For OCP < 5.0, only rhel-9 is available.
// For OCP >= 5.0, both rhel-9 and rhel-10 are available; set this field
// explicitly to select a non-default stream.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/hypershift/v1beta1/nodepool_types.go` around lines 242 - 261, Doc comment
for the OSImageStream field is misleading about defaults; update the comment
above the OSImageStream (osImageStream) field to state that when omitted the
pool uses platform-specific defaults, that AvailableOSImageStreams() returns
only rhel-9 for OCP < 5.0 but both rhel-9 and rhel-10 for OCP >= 5.0, and
recommend that callers explicitly set OSImageStream to pick a non-default stream
(use the suggested replacement wording from the review to replace the existing
paragraph). Ensure references to OSImageStream and AvailableOSImageStreams()
remain accurate and keep the CEL validation note intact.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 212bbd47-181c-4e0a-8242-848f008b0507

📥 Commits

Reviewing files that changed from the base of the PR and between 35c0190 and bef198c.

⛔ Files ignored due to path filters (20)
  • api/hypershift/v1beta1/zz_generated.deepcopy.go is excluded by !**/zz_generated*.go, !**/zz_generated*
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests.yaml is excluded by !**/zz_generated*
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/nodepools.hypershift.openshift.io/OSStreams.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • client/applyconfiguration/hypershift/v1beta1/nodepoolspec.go is excluded by !client/**
  • client/applyconfiguration/hypershift/v1beta1/nodepoolstatus.go is excluded by !client/**
  • client/applyconfiguration/hypershift/v1beta1/osimagestreamreference.go is excluded by !client/**
  • client/applyconfiguration/utils.go is excluded by !client/**
  • cmd/install/assets/crds/hypershift-operator/payload-manifests/featuregates/featureGate-Hypershift-Default.yaml is excluded by !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/payload-manifests/featuregates/featureGate-Hypershift-TechPreviewNoUpgrade.yaml is excluded by !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/payload-manifests/featuregates/featureGate-SelfManagedHA-Default.yaml is excluded by !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/payload-manifests/featuregates/featureGate-SelfManagedHA-TechPreviewNoUpgrade.yaml is excluded by !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/tests/nodepools.hypershift.openshift.io/featuregated.nodepools.osimagestream.testsuite.yaml is excluded by !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/nodepools-CustomNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/nodepools-TechPreviewNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • docs/content/reference/aggregated-docs.md is excluded by !docs/content/reference/aggregated-docs.md
  • docs/content/reference/api.md is excluded by !docs/content/reference/api.md
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/nodepool_conditions.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/nodepool_types.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/zz_generated.deepcopy.go is excluded by !vendor/**, !**/vendor/**, !**/zz_generated*.go, !**/zz_generated*
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests.yaml is excluded by !vendor/**, !**/vendor/**, !**/zz_generated*
📒 Files selected for processing (11)
  • api/hypershift/v1beta1/featuregates/featureGate-Hypershift-Default.yaml
  • api/hypershift/v1beta1/featuregates/featureGate-Hypershift-TechPreviewNoUpgrade.yaml
  • api/hypershift/v1beta1/featuregates/featureGate-SelfManagedHA-Default.yaml
  • api/hypershift/v1beta1/featuregates/featureGate-SelfManagedHA-TechPreviewNoUpgrade.yaml
  • api/hypershift/v1beta1/nodepool_conditions.go
  • api/hypershift/v1beta1/nodepool_types.go
  • hypershift-operator/controllers/nodepool/conditions.go
  • hypershift-operator/controllers/nodepool/conditions_test.go
  • hypershift-operator/controllers/nodepool/nodepool_controller.go
  • support/releaseinfo/releaseinfo.go
  • support/releaseinfo/releaseinfo_test.go

@sdminonne sdminonne marked this pull request as ready for review June 12, 2026 09:39
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 12, 2026
@openshift-ci openshift-ci Bot requested review from devguyio and jparrill June 12, 2026 09:40
@sdminonne

Contributor Author

/retest

@jparrill jparrill left a comment

Contributor

Dropped some comments. Thanks!

CIDRConflictReason = "CIDRConflict"
NodePoolKubeVirtLiveMigratableReason = "KubeVirtNodesNotLiveMigratable"
NodePoolUnsupportedSkewReason = "UnsupportedSkew"
NodePoolOSImageStreamRemovalReason = "OSImageStreamRemoved"

Contributor

nit: The alignment on these three is off compared to the rest of the const block. The existing constants column-align the = sign — these push it further right. make fmt won't catch it (gofmt doesn't enforce alignment across different-length names in a const block), so it's a manual fix.

Also, the constant name NodePoolOSImageStreamRemovalReason says "Removal" (noun) but its value is "OSImageStreamRemoved" (past tense). The other two are consistent with themselves (DowngradeReason/"...Downgrade", NotInPayloadReason/"...NotInPayload"). Minor, but worth picking one convention.


// --- Phase 3: Valid. Update status latch and set condition True ---

if specStream != "" {

Contributor

This is the first condition function in this file that mutates a non-condition status field — all the others only call SetStatusCondition. The latch makes sense here because it has to be set atomically with the validation passing, but it's a departure from the pattern. A one-line comment explaining why would help future readers, e.g.:

// Latch must be set here, not in the reconcile body, because transition guards
// depend on it being updated only after validation passes.

Comment thread support/releaseinfo/releaseinfo.go Outdated
return i.ImageStream.Name
}

// AvailableOSImageStreams returns the OS image streams available in this release payload.

Contributor

This is a version heuristic, not actual payload introspection — it doesn't look at StreamMetadata or image tags. That's fine for TechPreview, but worth a TODO so it's not forgotten:

// TODO(CNTRLPLANE-3023): Replace version heuristic with payload metadata
// introspection when release images carry OS stream manifests.

Also, consider defining "rhel-9" and "rhel-10" as constants in nodepool_types.go (like ArchitectureAMD64 = "amd64" and UpgradeTypeReplace). They're used here and in conditions.go — centralizing them reduces the chance of typos and makes the hardcoded downgrade check more maintainable.


// NodePoolValidOSImageStreamConditionType signals if the osImageStream requested in the
// NodePool spec is valid. This covers two classes of validation:
// 1. Transition guards (controller-level complement to CEL): prevents removing the field

Member

If we want to enforce this "prevents removing the field", this needs to be implemented via CEL

Contributor Author

Done

// 2. Payload validation: verifies that the specified stream is available in the NodePool's
// release payload (e.g., rhel-10 is only available in 5.0+ payloads).
// A failure here requires the user to change the osImageStream field to a valid value.
NodePoolValidOSImageStreamConditionType = "ValidOSImageStream"

Member

This is derailing from the enhancement. If something needs adjustment, please create a PR against the enhancement.

Contributor Author

Right. I'm removing it.

Add a FeatureGateAwareXValidation CEL rule on NodePoolSpec that prevents
removing the osImageStream field once set, closing the two-step bypass
for optional immutable fields on feature-gated types. Add an envtest
case covering the removal scenario.

Also add OSImageStreamRHEL9 and OSImageStreamRHEL10 constants for
consistent use of stream name strings across the codebase.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sdminonne sdminonne changed the title CNTRLPLANE-3023: Add osImageStream release payload validation CNTRLPLANE-3023: Add CEL rule to prevent osImageStream removal Jun 16, 2026
@sdminonne

Contributor Author

@enxebre removed the erroneous condition definitions. Left only the CEL rule to prevent stream removal once set.
PR description updated.

@enxebre

enxebre commented Jun 16, 2026

Member

/approve

@openshift-ci

openshift-ci Bot commented Jun 16, 2026

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enxebre, sdminonne

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 16, 2026
@sdminonne

Contributor Author

/retest

@sdminonne

Contributor Author

/verified by unit-tests

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Jun 17, 2026
@openshift-ci-robot

@sdminonne: This PR has been marked as verified by unit-tests.

Details

In response to this:

/verified by unit-tests

@jparrill jparrill left a comment

Contributor

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 17, 2026
@openshift-merge-bot

Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-v2-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws
/test e2e-v2-gke

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aks | Build: 2067150317815861248 | Cost: $3.91 | Failed step: hypershift-azure-run-e2e

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@cwbotbot

cwbotbot commented Jun 17, 2026

Test Results

e2e-aws

e2e-aks

@jparrill

Contributor

/retest-required

@sdminonne

Contributor Author

/test e2e-kubevirt-aws-ovn-reduced

@openshift-merge-bot

Contributor

/retest-required

Remaining retests: 0 against base HEAD 30a12e6 and 2 for PR HEAD a59f54d in total

@openshift-merge-bot

Contributor

/retest-required

Remaining retests: 0 against base HEAD fabde37 and 1 for PR HEAD a59f54d in total

@openshift-merge-bot

Contributor

/retest-required

Remaining retests: 0 against base HEAD c35f662 and 0 for PR HEAD a59f54d in total

@hypershift-jira-solve-ci

hypershift-jira-solve-ci Bot commented Jun 18, 2026

The image-registry operator has no status conditions at all — it never reported any conditions, which means it never even started properly. This is consistent with a bootstrap timing issue where the cluster never fully came up.

Now I have all the evidence I need. Let me produce the final report:

Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

Bootstrap failed to complete: timed out waiting for the condition
Failed to wait for bootstrapping to complete. This error usually happens when there
is a problem with control plane hosts that prevents the control plane operators from
creating the control plane.
Installer exit with code 5

Summary

The management cluster IPI installation failed during the bootstrap phase due to the ingress-operator entering CrashLoopBackOff (10 restarts). The ingress-operator crashed because the Route API (*v1.Route) was never registered — its cache sync for the canary_controller timed out waiting for the Route CRD. Without ingress, the router pod failed its startup probes (1,081 failures: "backend-proxy-http failed" and "has-synced failed"), the monitoring operator could not create Routes, and the authentication/oauth operator could not function. The bootstrap timed out after 45 minutes with 7 cluster operators unavailable. This is a CI infrastructure flake unrelated to the PR changes, which only add CEL validation rules for the HyperShift osImageStream field.

Root Cause

The root cause is a bootstrap timing/ordering failure in the management cluster installation. The failure chain is:

  1. openshift-apiserver operator never became Available — It reported PreconditionNotReady since 00:15:59 UTC and never transitioned to Available. The route.openshift.io API group was never registered as an APIService, meaning the Route CRD was not properly installed during bootstrap.

  2. Ingress-operator crashed repeatedly — The ingress-operator started at 00:17:11, but within 2 minutes it crashed with: "failed to wait for canary_controller caches to sync kind source: *v1.Route: timed out waiting for cache to be synced for Kind *v1.Route". It entered CrashLoopBackOff at 00:21:14 and never recovered (10 restarts, 27 BackOff events through 01:06:33).

  3. Router pod never became ready — The router pod failed its startup probe 1,081 times with backend-proxy-http failed and has-synced failed. It was initially unschedulable (only master nodes existed with taints) until the worker node became Ready at 00:35:02, but even after being scheduled, it couldn't sync because the underlying APIs were unavailable.

  4. Cascading failures — Without ingress:

    • Monitoring operator: "creating Route object failed: the server could not find the requested resource (post routes.route.openshift.io)"
    • Authentication operator: "OAuthServerServiceEndpointAccessibleControllerDegraded: connection refused"
    • Console operator: Available=Unknown
    • CVO reported: "Unable to apply 4.22.0-0.ci-2026-06-17-021229: MultipleErrors"
  5. Not related to PR — The PR adds CEL validation rules for HyperShift's osImageStream field in the NodePool CRD. The failure occurred in the management cluster IPI installation (pre phase), which uses the OCP 4.22 CI payload — the PR's changes to the HyperShift operator are not involved in this installation at all.

Recommendations
  1. Retry the job — This is a CI infrastructure flake in the management cluster bootstrap. The PR changes (CEL validation rules for osImageStream) are not involved in the management cluster installation. A simple re-trigger should succeed.

  2. No code changes needed — The failure is in the OCP 4.22 CI payload's bootstrap process, not in any HyperShift code modified by this PR.

  3. If the failure recurs, investigate whether the 5.0.0-0.ci-2026-06-16-114448 / 4.22.0-0.ci-2026-06-17-021229 CI payloads have a known bootstrap regression with the openshift-apiserver's PreconditionNotReady condition. The CVO also reported "Could not update imagestream 'openshift/driver-toolkit' (689 of 1013)", suggesting possible payload issues.

Evidence
Evidence | Detail
Failed Step | ipi-install-install (pre phase), exit code 5
Bootstrap Timeout | Bootstrap failed to complete: timed out waiting for the condition at 00:57:44 UTC
Ingress-operator | CrashLoopBackOff, 10 restarts, error: "failed to wait for canary_controller caches to sync kind source: *v1.Route: timed out waiting for cache to be synced for Kind *v1.Route"
Router Pod | Startup probe failed 1,081 times: backend-proxy-http failed, has-synced failed
openshift-apiserver | Available=False since 00:15:59 UTC, reason: APIServices_PreconditionNotReady
Route API | route.openshift.io APIService not found in apiservices.json; CRD never registered
Monitoring Operator | Degraded: creating Route object failed: the server could not find the requested resource (post routes.route.openshift.io)
ClusterVersion | State=Partial, Failing=True: Multiple errors are preventing progress: Cluster operators authentication, console, image-registry, ingress, monitoring, openshift-apiserver, openshift-samples are not available
Worker Node | Became Ready at 00:35:02, but ingress-operator already crash-looping since 00:21:14
PR Relevance | PR adds CEL rules for HyperShift NodePool osImageStream field; no management cluster IPI install code modified
Payload | 4.22.0-0.ci-2026-06-17-021229 (management cluster), 5.0.0-0.ci-2026-06-16-114448 (release)

@openshift-merge-bot

Contributor

/hold

Revision a59f54d was retested 3 times: holding

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 18, 2026
@sdminonne

Contributor Author

/hold cancel

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 18, 2026
@sdminonne

Contributor Author

/test e2e-kubevirt-aws-ovn-reduced

@openshift-ci

openshift-ci Bot commented Jun 18, 2026

Contributor

@sdminonne: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit e4a1ba2 into openshift:main Jun 18, 2026
48 checks passed

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • area/api: Indicates the PR includes changes for the API
  • area/cli: Indicates the PR includes changes for CLI
  • area/control-plane-operator: Indicates the PR includes changes for the control plane operator - in an OCP release
  • area/documentation: Indicates the PR includes changes for documentation
  • area/hypershift-operator: Indicates the PR includes changes for the hypershift operator and API - outside an OCP release
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • verified: Signifies that the PR passed pre-merge verification criteria

6 participants