CNTRLPLANE-3553: Wire osImageStream into NodePool controller (hash, token, status, validation)#8730
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@sdminonne: This pull request references CNTRLPLANE-3553 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Skipping CI for Draft Pull Request. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThe PR propagates a resolved RHEL OS image stream ( Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 10 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #8730 +/- ##
==========================================
+ Coverage 41.86% 42.29% +0.43%
==========================================
Files 759 771 +12
Lines 94101 97129 +3028
==========================================
+ Hits 39392 41081 +1689
- Misses 51949 53179 +1230
- Partials 2760 2869 +109
... and 62 files with indirect coverage changes
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
hypershift-operator/controllers/nodepool/config.go (1)
88-94:⚠️ Potential issue | 🟠 Major | ⚡ Quick winThread the resolved OS stream, not the raw spec field.
getRHELStream()is the contract for this feature: whenspec.osImageStreamis unset, the controller derivesrhel-9/rhel-10from the release. These sites currently cache/forwardnodePool.Spec.OSImageStream.Namedirectly, so the value collapses to""for defaulted NodePools. That leaves the hash path out of sync withstatus.osImageStreamtoday, and it will feed the wrong stream into AWS/GCP image selection as soon as the dependent multi-stream metadata work starts usingrhelStreamafter rebase.Compute the effective stream once from
getRHELStream(nodePool, releaseImage)and thread that value throughrolloutConfig,CAPI, and the platform validation helpers.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hypershift-operator/controllers/nodepool/config.go` around lines 88 - 94, Compute the effective RHEL stream once by calling getRHELStream(nodePool, releaseImage) and pass that computed value into rolloutConfig.rhelStream instead of using nodePool.Spec.OSImageStream.Name; also propagate the same computed value into any CAPI-related code paths and platform validation helper calls that previously read spec.osImageStream so all consumers use the resolved stream (e.g., update usages in rolloutConfig initialization, CAPI helpers, and platform validation functions to accept/consume the new resolved rhelStream variable).
🧹 Nitpick comments (2)
hypershift-operator/controllers/nodepool/osstream_test.go (1)
17-80: ⚡ Quick winAdd test coverage for getRHELStream error path.
TestGetRHELStreamonly covers success cases. Add a test case for whenreleaseImage.Version()returns an invalid semver string, to verify thatgetRHELStreamreturns an error as expected (Line 26-29 in osstream.go handles this, but it's untested).📋 Proposed test case for invalid semver
expectedStream: "rhel-10", }, + { + name: "When spec.osImageStream.Name is empty and version is invalid, it should return error", + nodePool: &hyperv1.NodePool{ + Spec: hyperv1.NodePoolSpec{}, + }, + releaseImage: &releaseinfo.ReleaseImage{ + ImageStream: &imageapi.ImageStream{ObjectMeta: metav1.ObjectMeta{Name: "invalid-version"}}, + }, + expectErr: true, + }, }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hypershift-operator/controllers/nodepool/osstream_test.go` around lines 17 - 80, TestGetRHELStream lacks coverage for the error path in getRHELStream when releaseImage.Version() yields an invalid semver; add a new test case in TestGetRHELStream that supplies a releaseinfo.ReleaseImage whose ImageStream.ObjectMeta.Name is an invalid version string (e.g., "not-a-semver"), call getRHELStream(nodePool, releaseImage) and assert that an error is returned (Expect(err).To(HaveOccurred())); reference the existing test harness and types (TestGetRHELStream, getRHELStream, releaseinfo.ReleaseImage, imageapi.ImageStream/ObjectMeta.Name) so the new case is consistent with the other table-driven cases.hypershift-operator/controllers/nodepool/nodepool_controller.go (1)
407-409: ⚡ Quick winLog or surface
getRHELStreamerrors in all three status-update paths.In
nodepool_controller.go(Line 407-409),capi.goreconcileMachineDeploymentStatus (Line 617-619), andcapi.goreconcileMachineSet (Line 1018-1020), errors fromgetRHELStreamare silently ignored withif err == nilguards. The shared root cause is missing error visibility: when OS stream resolution fails (e.g., invalid semver in release version),Status.OSImageStreamremains unset or stale and no log or condition surfaces the issue to the user. As per coding guidelines, "Always check errors in Go — don't ignore them." Add logging or set a condition in each path to surface failures.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hypershift-operator/controllers/nodepool/nodepool_controller.go` around lines 407 - 409, getRHELStream errors are being ignored in three places (nodepool_controller.go when setting nodePool.Status.OSImageStream, capi.go reconcileMachineDeploymentStatus, and capi.go reconcileMachineSet); update each location to handle the error path: when getRHELStream returns an error, log the error with the controller logger (including err.Error()) and set a clear status condition (e.g., "OSImageStreamResolutionFailed") on the owning object with the error message so users see the failure, and when getRHELStream succeeds clear that condition and set Status.OSImageStream as before; reference the getRHELStream call sites and the Status.OSImageStream update points and ensure status updates are persisted.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@hypershift-operator/controllers/nodepool/osstream.go`:
- Around line 21-35: The getRHELStream function dereferences nodePool
(nodePool.Spec.OSImageStream.Name) and releaseImage (releaseImage.Version())
without nil checks; add defensive nil guards at the top of getRHELStream to
return a clear error if nodePool or releaseImage is nil, and also guard
nodePool.Spec or nodePool.Spec.OSImageStream as needed before accessing Name to
avoid panics, then proceed with semver.Parse on releaseImage.Version() only when
releaseImage is non-nil.
---
Outside diff comments:
In `@hypershift-operator/controllers/nodepool/config.go`:
- Around line 88-94: Compute the effective RHEL stream once by calling
getRHELStream(nodePool, releaseImage) and pass that computed value into
rolloutConfig.rhelStream instead of using nodePool.Spec.OSImageStream.Name; also
propagate the same computed value into any CAPI-related code paths and platform
validation helper calls that previously read spec.osImageStream so all consumers
use the resolved stream (e.g., update usages in rolloutConfig initialization,
CAPI helpers, and platform validation functions to accept/consume the new
resolved rhelStream variable).
---
Nitpick comments:
In `@hypershift-operator/controllers/nodepool/nodepool_controller.go`:
- Around line 407-409: getRHELStream errors are being ignored in three places
(nodepool_controller.go when setting nodePool.Status.OSImageStream, capi.go
reconcileMachineDeploymentStatus, and capi.go reconcileMachineSet); update each
location to handle the error path: when getRHELStream returns an error, log the
error with the controller logger (including err.Error()) and set a clear status
condition (e.g., "OSImageStreamResolutionFailed") on the owning object with the
error message so users see the failure, and when getRHELStream succeeds clear
that condition and set Status.OSImageStream as before; reference the
getRHELStream call sites and the Status.OSImageStream update points and ensure
status updates are persisted.
In `@hypershift-operator/controllers/nodepool/osstream_test.go`:
- Around line 17-80: TestGetRHELStream lacks coverage for the error path in
getRHELStream when releaseImage.Version() yields an invalid semver; add a new
test case in TestGetRHELStream that supplies a releaseinfo.ReleaseImage whose
ImageStream.ObjectMeta.Name is an invalid version string (e.g., "not-a-semver"),
call getRHELStream(nodePool, releaseImage) and assert that an error is returned
(Expect(err).To(HaveOccurred())); reference the existing test harness and types
(TestGetRHELStream, getRHELStream, releaseinfo.ReleaseImage,
imageapi.ImageStream/ObjectMeta.Name) so the new case is consistent with the
other table-driven cases.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: e87df744-44ee-4acb-9a91-4746ce49edd8
⛔ Files ignored due to path filters (1)
vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/nodepool_conditions.gois excluded by!vendor/**,!**/vendor/**
📒 Files selected for processing (13)
api/hypershift/v1beta1/nodepool_conditions.gohypershift-operator/controllers/nodepool/aws.gohypershift-operator/controllers/nodepool/aws_test.gohypershift-operator/controllers/nodepool/capi.gohypershift-operator/controllers/nodepool/config.gohypershift-operator/controllers/nodepool/gcp.gohypershift-operator/controllers/nodepool/gcp_test.gohypershift-operator/controllers/nodepool/nodepool_controller.gohypershift-operator/controllers/nodepool/nodepool_controller_test.gohypershift-operator/controllers/nodepool/osstream.gohypershift-operator/controllers/nodepool/osstream_test.gohypershift-operator/controllers/nodepool/token.gohypershift-operator/controllers/nodepool/token_test.go
| func getRHELStream(nodePool *hyperv1.NodePool, releaseImage *releaseinfo.ReleaseImage) (string, error) { | ||
| if nodePool.Spec.OSImageStream.Name != "" { | ||
| return nodePool.Spec.OSImageStream.Name, nil | ||
| } | ||
|
|
||
| version, err := semver.Parse(releaseImage.Version()) | ||
| if err != nil { | ||
| return "", fmt.Errorf("failed to parse release image version %q: %w", releaseImage.Version(), err) | ||
| } | ||
|
|
||
| if version.Major >= 5 { | ||
| return "rhel-10", nil | ||
| } | ||
| return "rhel-9", nil | ||
| } |
There was a problem hiding this comment.
Add nil-safety guards to prevent panics.
getRHELStream dereferences nodePool (Line 22) and releaseImage (Line 26) without nil checks. If either parameter is nil, the function will panic. Current callers appear to pass valid pointers, but defensive validation would prevent runtime panics if the contract changes or a new caller is added.
🛡️ Proposed fix to add nil guards
func getRHELStream(nodePool *hyperv1.NodePool, releaseImage *releaseinfo.ReleaseImage) (string, error) {
+ if nodePool == nil {
+ return "", fmt.Errorf("nodePool cannot be nil")
+ }
+ if releaseImage == nil {
+ return "", fmt.Errorf("releaseImage cannot be nil")
+ }
if nodePool.Spec.OSImageStream.Name != "" {
return nodePool.Spec.OSImageStream.Name, nil
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@hypershift-operator/controllers/nodepool/osstream.go` around lines 21 - 35,
The getRHELStream function dereferences nodePool
(nodePool.Spec.OSImageStream.Name) and releaseImage (releaseImage.Version())
without nil checks; add defensive nil guards at the top of getRHELStream to
return a clear error if nodePool or releaseImage is nil, and also guard
nodePool.Spec or nodePool.Spec.OSImageStream as needed before accessing Name to
avoid panics, then proceed with semver.Parse on releaseImage.Version() only when
releaseImage is non-nil.
03a5cef to
8e027da
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
api/hypershift/v1beta1/nodepool_types.go (1)
51-60: 💤 Low valueConsider nil check for defensive safety.
The function dereferences
nodePoolat Line 52 without a nil check. While current callers in context snippets (conditions.go:390) pass valid pointers, adding a guard would prevent panics if a new caller is added or the contract changes.🛡️ Optional defensive nil guard
func validateOSImageStream(nodePool *hyperv1.NodePool) error { + if nodePool == nil { + return fmt.Errorf("nodePool cannot be nil") + } name := nodePool.Spec.OSImageStream.Name🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/hypershift/v1beta1/nodepool_types.go` around lines 51 - 60, Locate the function that dereferences the nodePool parameter (likely in the nodepool_types.go file around the specified line range) and add a nil check guard before the dereference operation. The check should verify that nodePool is not nil before attempting to access its fields or methods, returning an appropriate error or early exit if the pointer is nil. This defensive check prevents potential panics if the function contract changes or new callers pass nil values in the future.hypershift-operator/controllers/nodepool/capi_test.go (2)
2771-2773: ⚡ Quick winAdd nil-safety to OSImageStream assertion.
The assertion at line 2772 directly accesses
nodePool.Status.OSImageStream.Namewithout first checking ifnodePool.Status.OSImageStreamis nil. If the status field is unexpectedly nil whenexpectedOSImageStreamis non-empty, this will panic rather than provide a clear test failure message.Safer assertion pattern
if tc.expectedOSImageStream != "" { + g.Expect(nodePool.Status.OSImageStream).ToNot(BeNil(), "OSImageStream should be set when stream is expected") g.Expect(nodePool.Status.OSImageStream.Name).To(Equal(tc.expectedOSImageStream)) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hypershift-operator/controllers/nodepool/capi_test.go` around lines 2771 - 2773, In the test assertion block where tc.expectedOSImageStream is checked, add a nil-safety check before accessing the Name field. When tc.expectedOSImageStream is non-empty, first verify that nodePool.Status.OSImageStream is not nil before attempting to access nodePool.Status.OSImageStream.Name in the Expect call. This will prevent a panic and provide a clear test failure message if the OSImageStream field is unexpectedly nil.
2636-2674: ⚡ Quick winAdd test case to verify OSImageStream status population on rollout completion.
The test infrastructure includes an
expectedOSImageStreamfield and conditional assertion (lines 2771-2773), but no test case actually verifies thatnodePool.Status.OSImageStreamis populated when the MachineDeployment completes. The test case at line 2652 setsexpectedOSImageStream: "", which skips the assertion entirely.According to the layer description,
reconcileMachineDeploymentStatusshould callgetRHELStreamand populatenodePool.Status.OSImageStreamwhen the deployment reaches completion. Add a test case that sets a non-emptyexpectedOSImageStreamvalue and verifies the status field is correctly populated.Consider also updating the test name at line 2652 to reflect what is actually being verified, or add a separate test case specifically for OSImageStream status population.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hypershift-operator/controllers/nodepool/capi_test.go` around lines 2636 - 2674, Add a new test case to the testCases slice in the TestReconcileMachineDeploymentStatus function that verifies OSImageStream status population when a MachineDeployment completes. This test case should have a non-empty expectedOSImageStream value (such as a valid RHEL stream identifier) and match the completion conditions of the existing test case (all replica counts at 3, ObservedGeneration matching Generation). This will ensure the conditional assertion at lines 2771-2773 that checks nodePool.Status.OSImageStream is actually exercised with a meaningful verification rather than being skipped due to an empty expected value.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@hypershift-operator/controllers/nodepool/nodepool_controller.go`:
- Around line 406-408: In the nodepool_controller.go file, the call to
getRHELStream is silently ignoring errors by only handling the success case with
if err == nil. Instead of dropping the error, properly handle failure cases by
either logging the error with context and returning it to trigger
reconciliation, or updating the NodePool status to reflect the error condition.
This ensures that stream-resolution failures are visible and the status remains
synchronized with the actual state rather than remaining stale and unset.
In `@hypershift-operator/controllers/nodepool/token.go`:
- Around line 358-359: The assignment of TokenSecretOSStreamKey to
tokenSecret.Data only executes within a conditional block that checks if
tokenSecret.Data is nil, meaning existing token secrets never get the os-stream
field populated on reconciliation. Move the line
tokenSecret.Data[TokenSecretOSStreamKey] = []byte(t.rhelStream) outside of the
nil-check conditional so it runs on every reconciliation, ensuring all token
secrets have the os-stream field persisted consistently. Additionally, add unit
tests to verify that the os-stream field is set during both initial creation and
subsequent reconciliations, and add e2e tests to validate this behavior impacts
consumer behavior correctly.
---
Nitpick comments:
In `@api/hypershift/v1beta1/nodepool_types.go`:
- Around line 51-60: Locate the function that dereferences the nodePool
parameter (likely in the nodepool_types.go file around the specified line range)
and add a nil check guard before the dereference operation. The check should
verify that nodePool is not nil before attempting to access its fields or
methods, returning an appropriate error or early exit if the pointer is nil.
This defensive check prevents potential panics if the function contract changes
or new callers pass nil values in the future.
In `@hypershift-operator/controllers/nodepool/capi_test.go`:
- Around line 2771-2773: In the test assertion block where
tc.expectedOSImageStream is checked, add a nil-safety check before accessing the
Name field. When tc.expectedOSImageStream is non-empty, first verify that
nodePool.Status.OSImageStream is not nil before attempting to access
nodePool.Status.OSImageStream.Name in the Expect call. This will prevent a panic
and provide a clear test failure message if the OSImageStream field is
unexpectedly nil.
- Around line 2636-2674: Add a new test case to the testCases slice in the
TestReconcileMachineDeploymentStatus function that verifies OSImageStream status
population when a MachineDeployment completes. This test case should have a
non-empty expectedOSImageStream value (such as a valid RHEL stream identifier)
and match the completion conditions of the existing test case (all replica
counts at 3, ObservedGeneration matching Generation). This will ensure the
conditional assertion at lines 2771-2773 that checks
nodePool.Status.OSImageStream is actually exercised with a meaningful
verification rather than being skipped due to an empty expected value.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 8ebdba94-e5e5-4bf0-8e2a-0fa112fff63f
⛔ Files ignored due to path filters (5)
api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/nodepools.hypershift.openshift.io/OSStreams.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**cmd/install/assets/crds/hypershift-operator/tests/nodepools.hypershift.openshift.io/featuregated.nodepools.osimagestream.testsuite.yamlis excluded by!cmd/install/assets/**/*.yamlcmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/nodepools-CustomNoUpgrade.crd.yamlis excluded by!**/zz_generated.crd-manifests/**,!cmd/install/assets/**/*.yamlcmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/nodepools-TechPreviewNoUpgrade.crd.yamlis excluded by!**/zz_generated.crd-manifests/**,!cmd/install/assets/**/*.yamlvendor/github.com/openshift/hypershift/api/hypershift/v1beta1/nodepool_types.gois excluded by!vendor/**,!**/vendor/**
📒 Files selected for processing (16)
api/hypershift/v1beta1/nodepool_types.gohypershift-operator/controllers/nodepool/aws.gohypershift-operator/controllers/nodepool/aws_test.gohypershift-operator/controllers/nodepool/capi.gohypershift-operator/controllers/nodepool/capi_test.gohypershift-operator/controllers/nodepool/conditions.gohypershift-operator/controllers/nodepool/config.gohypershift-operator/controllers/nodepool/config_test.gohypershift-operator/controllers/nodepool/gcp.gohypershift-operator/controllers/nodepool/gcp_test.gohypershift-operator/controllers/nodepool/nodepool_controller.gohypershift-operator/controllers/nodepool/nodepool_controller_test.gohypershift-operator/controllers/nodepool/osstream.gohypershift-operator/controllers/nodepool/osstream_test.gohypershift-operator/controllers/nodepool/token.gohypershift-operator/controllers/nodepool/token_test.go
✅ Files skipped from review due to trivial changes (2)
- hypershift-operator/controllers/nodepool/nodepool_controller_test.go
- hypershift-operator/controllers/nodepool/aws_test.go
🚧 Files skipped from review as they are similar to previous changes (6)
- hypershift-operator/controllers/nodepool/config.go
- hypershift-operator/controllers/nodepool/aws.go
- hypershift-operator/controllers/nodepool/gcp_test.go
- hypershift-operator/controllers/nodepool/gcp.go
- hypershift-operator/controllers/nodepool/capi.go
- hypershift-operator/controllers/nodepool/config_test.go
49e64cb to
ebde395
Compare
|
I'll rebase this once #8699 will be merged |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@hypershift-operator/controllers/nodepool/capi_test.go`:
- Around line 2771-2773: The assertion for nodePool.Status.OSImageStream.Name is
conditionally guarded, which skips verification when tc.expectedOSImageStream is
empty. This allows incorrect non-empty status values to pass undetected in test
cases expecting empty values. Remove the conditional guard `if
tc.expectedOSImageStream != ""` and assert nodePool.Status.OSImageStream.Name
unconditionally against tc.expectedOSImageStream to ensure all test cases
properly verify the expected OS image stream value regardless of whether it is
empty or not.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: b312f697-6295-426e-b155-a62af1d22c0d
⛔ Files ignored due to path filters (5)
api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/nodepools.hypershift.openshift.io/OSStreams.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**cmd/install/assets/crds/hypershift-operator/tests/nodepools.hypershift.openshift.io/featuregated.nodepools.osimagestream.testsuite.yamlis excluded by!cmd/install/assets/**/*.yamlcmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/nodepools-CustomNoUpgrade.crd.yamlis excluded by!**/zz_generated.crd-manifests/**,!cmd/install/assets/**/*.yamlcmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/nodepools-TechPreviewNoUpgrade.crd.yamlis excluded by!**/zz_generated.crd-manifests/**,!cmd/install/assets/**/*.yamlvendor/github.com/openshift/hypershift/api/hypershift/v1beta1/nodepool_types.gois excluded by!vendor/**,!**/vendor/**
📒 Files selected for processing (16)
api/hypershift/v1beta1/nodepool_types.gohypershift-operator/controllers/nodepool/aws.gohypershift-operator/controllers/nodepool/aws_test.gohypershift-operator/controllers/nodepool/capi.gohypershift-operator/controllers/nodepool/capi_test.gohypershift-operator/controllers/nodepool/conditions.gohypershift-operator/controllers/nodepool/config.gohypershift-operator/controllers/nodepool/config_test.gohypershift-operator/controllers/nodepool/gcp.gohypershift-operator/controllers/nodepool/gcp_test.gohypershift-operator/controllers/nodepool/nodepool_controller.gohypershift-operator/controllers/nodepool/nodepool_controller_test.gohypershift-operator/controllers/nodepool/osstream.gohypershift-operator/controllers/nodepool/osstream_test.gohypershift-operator/controllers/nodepool/token.gohypershift-operator/controllers/nodepool/token_test.go
🚧 Files skipped from review as they are similar to previous changes (13)
- hypershift-operator/controllers/nodepool/nodepool_controller_test.go
- hypershift-operator/controllers/nodepool/token_test.go
- hypershift-operator/controllers/nodepool/capi.go
- hypershift-operator/controllers/nodepool/aws.go
- hypershift-operator/controllers/nodepool/gcp_test.go
- hypershift-operator/controllers/nodepool/config.go
- hypershift-operator/controllers/nodepool/osstream_test.go
- hypershift-operator/controllers/nodepool/aws_test.go
- hypershift-operator/controllers/nodepool/nodepool_controller.go
- hypershift-operator/controllers/nodepool/gcp.go
- hypershift-operator/controllers/nodepool/config_test.go
- api/hypershift/v1beta1/nodepool_types.go
- hypershift-operator/controllers/nodepool/token.go
jparrill
left a comment
There was a problem hiding this comment.
Dropped some comments. Thanks!
Note: #8699 (CNTRLPLANE-3026) is still pending merge. Once it lands, this PR will need a rebase — defaultNodePoolAMI signature and behavior change significantly there (streamName becomes functional via StreamForName). The //nolint:unparam and TODOs referencing CNTRLPLANE-3553 for that function will be obsolete after rebase.
We'll do multiple review rounds as the dependency chain settles.
| // behavior: no OSImageStream CR is generated and the MCC uses | ||
| // BaseOSContainerImage from ControllerConfig as-is). | ||
| func getRHELStream(nodePool *hyperv1.NodePool, releaseImage *releaseinfo.ReleaseImage) (string, error) { | ||
| if nodePool.Spec.OSImageStream.Name != "" { |
There was a problem hiding this comment.
This reimplements GetRHELStream() from stream.go in the same package, but with divergent semantics: defaultRHELStream says major >= 5 → rhel-10 unconditionally, while GetRHELStream handles major >= 5 && usesRunc → rhel-9. These two functions will return different answers for 5.x + runc.
Same with validateOSImageStream — GetRHELStream already validates unknown stream names and rejects invalid combinations (rhel-10 on 4.x, rhel-10 with runc).
Can these delegate to GetRHELStream instead of reimplementing? Having two resolution paths for the same question in the same package is a maintenance hazard — when a new stream is added, both must be updated.
There was a problem hiding this comment.
Done. Both defaultRHELStream, getRHELStream, and validateOSImageStream now delegate to GetRHELStream from stream.go. The duplicate resolution logic is removed — see 63532a2.
| if rhelStream != "" { | ||
| defaultStream, err := defaultRHELStream(releaseImage) | ||
| if err == nil && rhelStream == defaultStream { | ||
| rhelStream = "" |
There was a problem hiding this comment.
This normalization is the most load-bearing part of the PR — it's what prevents spurious fleet-wide rollouts when a user explicitly sets the default stream. But `TestNewConfigGenerator` has zero test cases that set `nodePool.Spec.OSImageStream`.
Since the plan is to consolidate on `GetRHELStream()` from `stream.go` (see osstream.go comment), the normalization internals will change. But the observable contract should be tested now:
- Setting `osImageStream.Name` to the version-derived default (e.g. `"rhel-9"` on 4.x) must produce the same hash as not setting it at all (no spurious rollout).
- Setting a non-default stream (e.g. `"rhel-10"` on 4.x) must produce a different hash (rollout triggered).
These tests verify the hash stability guarantee regardless of which internal function does the resolution.
There was a problem hiding this comment.
Done. Added TestOSImageStreamHashStability in config_test.go covering both cases:
- Setting
osImageStream.Nameto the version-derived default (e.g."rhel-9"on 4.x) produces the same hash as not setting it (no spurious rollout). - Setting a non-default stream (e.g.
"rhel-10"on 4.x) produces a different hash (rollout triggered).
The TestConfigHash and TestConfigHashWithoutVersion tables also include rhelStream cases — see 0b8280f and 257394d.
|
/lgtm |
|
Scheduling tests matching the |
|
/retest required |
|
/retest-required |
|
/verified by unit tests |
|
@jparrill: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest-required |
…geStream is unset getRHELStream was always resolving to a concrete stream name (e.g. "rhel-9") even when spec.osImageStream.Name was empty. This caused StreamForName to use the OSStreams lookup path instead of the legacy StreamMetadata path, potentially producing a different AMI and changing the AWSMachineTemplate spec hash. During HyperShift Operator upgrades this triggered UpdatingPlatformMachineTemplate=True and an unintended node rollout. Return "" from getRHELStream when osImageStream.Name is unset so that all callers (CAPI machine templates, conditions, token secrets) pass an empty stream to StreamForName, preserving the pre-PR behavior. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Upgrade test regression: spurious
|
|
New changes are detected. LGTM label has been removed. |
|
/test e2e-aws-upgrade-hypershift-operator |
|
/test e2e-aks |
|
/test e2e-aws |
|
/test e2e-kubevirt-aws-ovn-reduced |
|
/test e2e-aws-4-22 |
|
/test e2e-aks-4-22 |
jparrill
left a comment
There was a problem hiding this comment.
Dropped some comments. Thanks!
| // Default behavior for Linux/RHCOS AMIs. | ||
| // Use the resolved stream (via GetRHELStream) rather than the raw spec field, | ||
| // so that the AMI lookup is consistent with the CAPI path. | ||
| rhelStream, err := getRHELStream(nodePool, releaseImage) |
There was a problem hiding this comment.
This new error path (getRHELStream failure -> condition False + return error) is untested. TestSetAWSConditions doesn't have a case with an invalid osImageStream.Name — e.g. rhel-10 on a 4.x release. Worth adding one to make sure the condition message surfaces correctly.
| tokenSecret.Data[TokenSecretHCConfigurationHashKey] = t.globalConfigHash | ||
| // TODO(CNTRLPLANE-3553): consumed by the ignition-server's TokenSecretReconciler once | ||
| // multi-stream ignition support lands. Until then this key is written but not read downstream. | ||
| tokenSecret.Data[TokenSecretOSStreamKey] = []byte(t.resolvedRHELStream) |
There was a problem hiding this comment.
This key is written now but consumed downstream later (per the TODO). If someone refactors the tokenSecret.Data == nil init block, the key could be dropped silently — nothing would catch it until the ignition-server starts reading it in a separate PR, likely weeks later. A test asserting tokenSecret.Data["os-stream"] exists with the expected value would catch that at the source. Same applies to the other keys in this block if they're not already tested.
| // changes during HyperShift Operator upgrades. | ||
| // When the field is explicitly set, it delegates to GetRHELStream for | ||
| // version-aware validation and runc constraint checking. | ||
| func getRHELStream(nodePool *hyperv1.NodePool, releaseImage *releaseinfo.ReleaseImage) (string, error) { |
There was a problem hiding this comment.
Minor test gap: there's no case for a set osImageStream.Name with an unparsable version (e.g. Name: "rhel-9" + version "not-a-version"). The empty-name + unparsable case is covered (returns "" via short-circuit), but the set-name path through semver.Parse at line 24 would fail and return an error — worth one test case.
| }{ | ||
| { | ||
| name: "When MachineDeployment is complete, it should update nodePool version and annotations", | ||
| name: "When MachineDeployment is complete, it should update nodePool version", |
There was a problem hiding this comment.
Is this rename intentional? The test behavior didn't change, just the name. If there's no reason for it, I'd revert to avoid noise in git blame.
There was a problem hiding this comment.
Right. Sorry for this, TY!
…erage - Add TestSetAWSConditions case for invalid osImageStream (rhel-10 on 4.x) - Add TODO for integration test on validMachineConfigCondition path - Assert os-stream key in token secret reconciliation test - Add getRHELStream test for set name with unparsable version - Revert unintentional test rename in capi_test.go Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Now I have all the evidence. Both jobs were aborted by the Prow trigger plugin at the exact same time (07:43:27Z). The kubevirt job had a genuine ingress degradation flake on one cluster (while the other cluster passed), and the AKS job was killed mid-execution with all 36 failures being context cancellation cascades. Neither failure is related to the PR's code changes. Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryNeither failure is caused by the PR's code changes. Both jobs were externally aborted by the Prow trigger plugin at the exact same timestamp (07:43:27Z), indicating a new push or re-trigger on PR #8730. The kubevirt job had one genuine infrastructure flake — the ingress ClusterOperator degraded on the single-node autoscaling HostedCluster (while the parallel Root CauseBoth jobs: Aborted by Prow trigger plugin — e2e-kubevirt-aws-ovn-reduced (2 failures / 47 tests):
e2e-aks-4-22 (36 failures / 278 tests):
The PR's code changes are not implicated. The Recommendations
Evidence
|
|
@sdminonne: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
| nodePool *hyperv1.NodePool | ||
| controlplaneNamespace string | ||
| // resolvedRHELStream is the explicit RHEL stream name resolved via | ||
| // getRHELStream. Unlike rhelStream (which may be empty after normalization |
There was a problem hiding this comment.
I see it's in awsMachineTemplateSpec(
this always holds a concrete stream name (e.g.
// "rhel-9", "rhel-10")
this seems to contradict https://github.com/openshift/hypershift/pull/8730/changes#diff-c54291673a20f27874b97852879404ae26450496aeda317cf79561255f0d8f1dR21 which returns empty
| "github.com/blang/semver" | ||
| ) | ||
|
|
||
| // getRHELStream returns the effective RHEL stream for the NodePool. |
There was a problem hiding this comment.
When spec.osImageStream.Name is unset, it returns "" so that callers
// pass an empty stream to StreamForName, preserving the legacy
// StreamMetadata
That doesn't seem right if the nodepool is on >5
| ObjectMeta: metav1.ObjectMeta{ | ||
| Annotations: tc.nodePoolAnnotations, | ||
| }, | ||
| Spec: tc.nodePool, |
There was a problem hiding this comment.
can we add test cases for resolveAWSAMI, setKarpenterAMILabels, and gcpMachineTemplateSpec with all possible rhelStream values
Summary
Wire
spec.osImageStreamandstatus.osImageStreaminto the NodePoolreconciliation loop: validation, config hash (triggering rollouts on
stream change), token secret propagation, observation-based status
reporting, and boot image function parameter threading.
Aligns with the dual-stream RHEL NodePool enhancement
Phase 2 controller plumbing.
Dependencies
CNTRLPLANE-3023: CEL rule to prevent osImageStream removal, API constants)Changes
Stream resolution (
stream.go,osstream.go)GetRHELStream()returns:spec.osImageStream.namewhen set (with validation)"rhel-10"when unset and release >= 5.0"rhel-9"when unset and release < 5.0 (always returns a concretestream name so that downstream consumers like
StreamForName()workcorrectly when legacy
StreamMetadatais removed)getRHELStream()wrapper inosstream.goextracts spec and versionfrom NodePool/ReleaseImage and delegates to
GetRHELStream()validateOSImageStream()delegates toGetRHELStream()forconsistent validation (no duplicate resolution logic)
Validation (
conditions.go)NewConfigGeneratorinsidevalidMachineConfigConditionfor fail-fast on invalid stream namesbefore expensive MCO config generation
Config hash integration (
config.go)rhelStreamtorolloutConfig,Hash(), andHashWithoutVersion()rhelStreaminNewConfigGenerator: when the explicitspec.osImageStream.namematches the version-derived default (e.g."rhel-9"on a 4.x release), it is kept as""so the hash doesn'tchange and no spurious rollout is triggered. Only a non-default stream
produces a different hash
resolvedRHELStreamonConfigGenerator(notrolloutConfig)for downstream consumers that need a concrete stream name (GCP, AWS
AMI, token secret)
Token secret (
token.go)os-streamkey into the token secret for future ignition-serverconsumption (TODO: not yet read downstream)
Status observation (
version.go)rhcosStreamFromOSImage()parses MachineNodeInfo.OSImagetodetermine RHEL generation (RHCOS 4xx → rhel-9, 5xx → rhel-10)
osImageStreamFromMachines()determines the observed stream usingstrict majority consensus (
count > total/2) across the poolsetOSImageStreamStatus()setsstatus.osImageStreamwhen amajority of machines report a consistent stream, aligning with the
enhancement's observation-based status design
Boot image threading (
aws.go,gcp.go)defaultNodePoolAMI,defaultNodePoolGCPImage,setAWSConditions, and their callers viagetRHELStream()for consistent boot image resolutionWhat's not yet implemented (future work)
usesRuncguard (explicit rhel-10 + runc → error, implicit >=5.0 + runc → fallback to rhel-9)GetPayload())StreamMetadata/OSStreamsnaming consolidation inReleaseImagestructTest plan
TestGetRHELStream— covers explicit stream, 4.x/5.x/6.x defaults, unparsable versionTestValidateOSImageStream— covers empty, valid, and invalid stream namesTestOSImageStreamHashStability— covers default stream producing same hash, non-default producing different hashTestHash/TestHashWithoutVersion— covers non-default stream changing the hashTestRhcosStreamFromOSImage— covers RHCOS 4xx/5xx parsing, unknown versions, unrecognised OSTestOsImageStreamFromMachines— covers majority consensus: single replica, all same, majority during upgrade, even split, no NodeInfo, unrecognised OSTestSetOSImageStreamStatus— covers integration with fake client: no machines, all RHEL 9, majority RHEL 10, preserve previous statusTestReconcileMachineDeploymentStatus— covers config annotation updatemake verifypasses clean🤖 Generated with Claude Code