CNTRLPLANE-3624: e2e v2 test for control-plane-pki-operator TLS configuration#8793
CNTRLPLANE-3624: e2e v2 test for control-plane-pki-operator TLS configuration#8793kaleemsiddiqu wants to merge 1 commit into
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
📝 WalkthroughWalkthroughA new e2e test file is added under 🚥 Pre-merge checks | ✅ 10 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (10 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
test/e2e/v2/tests/control_plane_pki_operator_test.go (2)
246-247: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winReplace hardcoded sleep with Eventually polling.
A hardcoded 60-second sleep is fragile — the pod might need more or less time to restart. Per v2 framework conventions, use
Eventuallyfor state changes that occur over time.Consider polling until the pod has restarted (e.g., by checking that the pod's creation timestamp is newer than when the ConfigMap was updated, or by verifying the pod is Running and ready).
♻️ Suggested approach
// Wait for PKI operator pod to restart with new TLS config Eventually(func(g Gomega) { pkiPodList := &corev1.PodList{} g.Expect(mgmtClient.List(tc.Context, pkiPodList, crclient.InNamespace(tc.ControlPlaneNamespace), crclient.MatchingLabels{"app": "control-plane-pki-operator"}, )).To(Succeed()) g.Expect(pkiPodList.Items).NotTo(BeEmpty()) for _, pod := range pkiPodList.Items { g.Expect(pod.Status.Phase).To(Equal(corev1.PodRunning)) // Optionally check Ready condition for _, cond := range pod.Status.Conditions { if cond.Type == corev1.PodReady { g.Expect(cond.Status).To(Equal(corev1.ConditionTrue)) } } } }, 2*time.Minute, 5*time.Second).Should(Succeed(), "PKI operator pod should be running and ready")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/v2/tests/control_plane_pki_operator_test.go` around lines 246 - 247, Replace the hardcoded time.Sleep call with an Eventually polling loop that waits for the PKI operator pod to actually restart and become ready. Use the mgmtClient to list pods in the ControlPlaneNamespace with the label selector matching the control-plane-pki-operator, then poll until the pod's Phase is Running and its Ready condition is True. Set the timeout to around 2 minutes with a polling interval of 5 seconds to accommodate variable restart times while avoiding fragile hardcoded waits.Source: Coding guidelines
54-54: 🧹 Nitpick | 🔵 Trivial | 💤 Low valueFeature annotation mismatch with outer Describe block.
The
Whenblock uses[Feature:PKIOperatorTLS]but the outerDescribeat line 463 uses[Feature:PKIOperator]. Per AGENTS.md section 19, for single-feature files the Feature annotation should be on theDescribelevel only. Having different Feature tags at different levels may cause confusion in Sippy Component Readiness mapping.Consider removing the Feature annotation from the
Whenblock since the outerDescribealready declares the feature scope.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/v2/tests/control_plane_pki_operator_test.go` at line 54, The `When` block at line 54 declares `[Feature:PKIOperatorTLS]` which conflicts with the outer `Describe` block's `[Feature:PKIOperator]` annotation. Per AGENTS.md section 19, for single-feature files the Feature annotation should only be declared at the `Describe` level. Remove the `[Feature:PKIOperatorTLS]` annotation from the `When` block description string, keeping only the descriptive text "PKI operator TLS configuration is modified" along with the `Ordered` parameter, since the outer `Describe` block already establishes the feature scope.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/v2/tests/control_plane_pki_operator_test.go`:
- Around line 217-220: The NotFound error handling block in the Eventually loop
has unreachable code due to a logical conflict. When apierrors.IsNotFound(err)
is true, the g.Expect(err).NotTo(HaveOccurred()) assertion will fail immediately
since err is non-nil, and the return statement will never execute. Fix this by
removing either the if check and its Expect statement (to let the subsequent
g.Expect handle all errors) or by removing the g.Expect assertion and keeping
only the return (if the intent is to retry the Eventually loop when ConfigMap
doesn't exist).
---
Nitpick comments:
In `@test/e2e/v2/tests/control_plane_pki_operator_test.go`:
- Around line 246-247: Replace the hardcoded time.Sleep call with an Eventually
polling loop that waits for the PKI operator pod to actually restart and become
ready. Use the mgmtClient to list pods in the ControlPlaneNamespace with the
label selector matching the control-plane-pki-operator, then poll until the
pod's Phase is Running and its Ready condition is True. Set the timeout to
around 2 minutes with a polling interval of 5 seconds to accommodate variable
restart times while avoiding fragile hardcoded waits.
- Line 54: The `When` block at line 54 declares `[Feature:PKIOperatorTLS]` which
conflicts with the outer `Describe` block's `[Feature:PKIOperator]` annotation.
Per AGENTS.md section 19, for single-feature files the Feature annotation should
only be declared at the `Describe` level. Remove the `[Feature:PKIOperatorTLS]`
annotation from the `When` block description string, keeping only the
descriptive text "PKI operator TLS configuration is modified" along with the
`Ordered` parameter, since the outer `Describe` block already establishes the
feature scope.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: dd603fd6-5224-45c7-9d8c-c5f22c060279
📒 Files selected for processing (1)
test/e2e/v2/tests/control_plane_pki_operator_test.go
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #8793 +/- ##
=======================================
Coverage 43.28% 43.28%
=======================================
Files 771 771
Lines 95503 95503
=======================================
Hits 41335 41335
Misses 51284 51284
Partials 2884 2884
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
857b5f7 to
e8f538f
Compare
csrwng
left a comment
There was a problem hiding this comment.
Review of PR #8793 — e2e v2 test for control-plane-pki-operator TLS configuration.
Overall the test structure follows the v2 framework conventions well. Found one likely bug in the conflict-handling pattern (appears twice), a cleanup gap in AfterAll, and several robustness/maintainability improvements.
11 comments total: 1 bug, 1 cleanup gap, 9 suggestions.
| hostedCluster := tc.GetHostedCluster() | ||
|
|
||
| // Check if profile is default or intermediate (both support TLS 1.2) | ||
| if hostedCluster.Spec.Configuration != nil && |
There was a problem hiding this comment.
Fragile guard: "not Modern" != "supports TLS 1.2"
This skip guard uses != configv1.TLSProfileModernType as a proxy for "should have minTLSVersion TLS 1.2", but if a future profile type is introduced (e.g., Custom with TLS 1.3-only ciphers), this test would run and incorrectly assert VersionTLS12. Consider guarding on the positive case instead — skip unless the profile is nil (default) or explicitly IntermediateType.
The same pattern applies at lines 125-129.
| if hostedCluster.Spec.Configuration != nil && | ||
| hostedCluster.Spec.Configuration.APIServer != nil && | ||
| hostedCluster.Spec.Configuration.APIServer.TLSSecurityProfile != nil && | ||
| hostedCluster.Spec.Configuration.APIServer.TLSSecurityProfile.Type == configv1.TLSProfileModernType { |
There was a problem hiding this comment.
Nit: extract the TLS profile type check into a helper
This same multi-line nil-check-then-compare pattern appears in at least 4 tests (lines 95-99, 125-129, 239-242, 319-322). Consider extracting it into a small helper like hostedClusterHasTLSProfileType(hc, profileType) bool to reduce repetition and make the skip guards more readable.
|
|
||
| // Get control-plane-pki-operator pod from management cluster | ||
| mgmtClient := tc.MgmtClient | ||
| mgmtRestConfig, err := e2eutil.GetConfig() |
There was a problem hiding this comment.
Nit: extract mgmt REST config and kube client setup
e2eutil.GetConfig() + kubernetes.NewForConfig() is repeated in 3 tests (here, line 247, line 389). Consider setting these up once in BeforeAll alongside tc and storing them as shared variables, or extracting into a helper.
| GinkgoWriter.Printf("Testing TLS connections to PKI operator pod %s on port %s\n", pkiPod.Name, metricsPort) | ||
|
|
||
| // Test TLS 1.2 connection using openssl s_client from within the PKI operator pod | ||
| tls12Result, _ := v2util.RunCommandInPod(tc.Context, mgmtKubeClient, mgmtRestConfig, |
There was a problem hiding this comment.
Bug risk: RunCommandInPod error is silently discarded
The error return is ignored here and in every other RunCommandInPod call in this file (lines 157, 165, 280, 288, 422, 430). While || true prevents a non-zero exit from the shell command itself, an infrastructure-level failure (e.g., pod exec API error, container not found) would also be silently swallowed, and the test would assert against an empty or garbage string. Consider checking the error:
tls12Result, err := v2util.RunCommandInPod(...)
Expect(err).NotTo(HaveOccurred(), "failed to exec into PKI operator pod")| // the control-plane-pki-operator config reflects the correct minTLSVersion and that | ||
| // the API server enforces those TLS versions correctly. | ||
| func VerifyPKIOperatorTLSConfigTest(getTestCtx internal.TestContextGetter) { | ||
| When("PKI operator TLS configuration is modified", Ordered, func() { |
There was a problem hiding this comment.
Add Label("Lifecycle"): this container includes tests that mutate the HostedCluster
Per the v2 framework guidelines (Rule 13), only lifecycle tests may mutate cluster state. The It blocks at lines 173 and 307 update the HostedCluster's TLS security profile. Adding Label("Lifecycle") here ensures CI can filter these appropriately (longer timeouts, avoid parallel execution with other mutation tests):
| When("PKI operator TLS configuration is modified", Ordered, func() { | |
| When("PKI operator TLS configuration is modified", Ordered, Label("Lifecycle"), func() { |
| "TLS 1.2 connection should be rejected with modern profile") | ||
| }) | ||
|
|
||
| It("should downgrade to minTLSVersion VersionTLS12 when Modern TLS profile is removed", func() { |
There was a problem hiding this comment.
This test mutates the HostedCluster (removes TLS profile to downgrade to Intermediate). The containing When block should carry Label("Lifecycle") per framework Rule 13.
|
|
||
| // Update to Modern TLS profile in the HostedCluster CR | ||
| hostedCluster.Spec.Configuration.APIServer.TLSSecurityProfile = &configv1.TLSSecurityProfile{ | ||
| Type: configv1.TLSProfileModernType, |
There was a problem hiding this comment.
Bug: conflict-retry pattern silently succeeds without applying the update
g.Expect(err).To(HaveOccurred(), ...) on a non-nil err always passes, so when a conflict occurs this Eventually iteration returns successfully — Eventually considers the body satisfied and stops retrying. The update was never applied, but the test proceeds as if it was.
To actually retry on conflict, you need to fail the iteration explicitly:
| Type: configv1.TLSProfileModernType, | |
| err = mgmtClient.Update(tc.Context, hostedCluster) | |
| g.Expect(apierrors.IsConflict(err)).To(BeFalse(), "conflict on update, retrying") | |
| g.Expect(err).NotTo(HaveOccurred(), "failed to update HostedCluster TLS profile to Modern") |
The same bug appears at line 335-337 in the downgrade test.
|
|
||
| pkiPod = &pkiPodList.Items[0] | ||
| g.Expect(pkiPod.Status.Phase).To(Equal(corev1.PodRunning), | ||
| "control-plane-pki-operator pod should be running") |
There was a problem hiding this comment.
Vacuous pass: PodReady condition may not exist
Per framework Rule 16, condition-search loops need a found guard. If the pod has no PodReady condition in its status (e.g., during init), no assertion fires and the Eventually iteration succeeds without actually verifying readiness. Same issue at lines 410-414.
| "control-plane-pki-operator pod should be running") | |
| readyFound := false | |
| for _, cond := range pkiPod.Status.Conditions { | |
| if cond.Type == corev1.PodReady { | |
| readyFound = true | |
| g.Expect(cond.Status).To(Equal(corev1.ConditionTrue), | |
| "control-plane-pki-operator pod should be ready") | |
| } | |
| } | |
| g.Expect(readyFound).To(BeTrue(), "PodReady condition not found on pod %s", pkiPod.Name) |
| fmt.Sprintf("timeout 2 openssl s_client -connect localhost:%s -tls1_2 2>&1 || true", metricsPort)) | ||
|
|
||
| // TLS 1.2 should be rejected - expect error or alert in output, or no TLSv1.2 in successful output | ||
| lowerResult := strings.ToLower(tls12Result) |
There was a problem hiding this comment.
Overly broad assertion: ContainSubstring("error") matches successful connections too
The Or() matcher has several issues:
ContainSubstring("error")is too broad — even a successful openssl handshake outputs debug lines containing "error" (e.g.,Verify return code: ...preceded by error-related lines,error writing to BIO). This arm would match on a successful TLS 1.2 connection, making the test always pass.ContainSubstring("alert")is also common in successful TLS output.Not(ContainSubstring("protocol : tlsv1.2"))with the double space is openssl-version-dependent formatting.
Consider asserting the negative instead — that the connection was NOT successfully negotiated at TLS 1.2:
Expect(lowerResult).NotTo(
And(
ContainSubstring("new,"),
ContainSubstring("tlsv1.2"),
),
"TLS 1.2 connection should be rejected with modern profile, got: %s", tls12Result)Or check for the specific openssl error that TLS 1.3-only servers return: ContainSubstring("no protocols available") or ContainSubstring("unsupported protocol").
|
|
||
| AfterAll(func() { | ||
| // Restore original TLS profile if it was captured | ||
| if originalTLSProfile != nil && tc != nil { |
There was a problem hiding this comment.
Cleanup gap: AfterAll doesn't restore when the original TLS profile was nil
If the HostedCluster starts with no TLS profile (the common default — originalTLSProfile is nil), this cleanup block is entirely skipped. If the test fails after setting Modern (line 195) but before the downgrade test runs (line 307), AfterAll won't restore the HC — it stays mutated with Modern. With Ordered, a failure in any earlier It causes later ones to be skipped, so this is a realistic scenario.
Consider always running the cleanup and handling the nil case:
| if originalTLSProfile != nil && tc != nil { | |
| AfterAll(func() { | |
| if tc == nil { | |
| return | |
| } | |
| GinkgoWriter.Printf("Restoring original TLS security profile\n") | |
| err := wait.PollUntilContextTimeout(tc.Context, 5*time.Second, 1*time.Minute, true, func(ctx context.Context) (bool, error) { | |
| hostedCluster := &hyperv1.HostedCluster{} | |
| err := tc.MgmtClient.Get(ctx, crclient.ObjectKey{ | |
| Namespace: tc.ClusterNamespace, | |
| Name: tc.ClusterName, | |
| }, hostedCluster) | |
| if err != nil { | |
| return false, err | |
| } | |
| if hostedCluster.Spec.Configuration == nil { | |
| hostedCluster.Spec.Configuration = &hyperv1.ClusterConfiguration{} | |
| } | |
| if hostedCluster.Spec.Configuration.APIServer == nil { | |
| hostedCluster.Spec.Configuration.APIServer = &configv1.APIServerSpec{} | |
| } | |
| hostedCluster.Spec.Configuration.APIServer.TLSSecurityProfile = originalTLSProfile |
This way, if originalTLSProfile is nil, it sets the profile back to nil (restoring the default).
e8f538f to
92505ae
Compare
ricardomaraschini
left a comment
There was a problem hiding this comment.
Have you considered making this part of the existing test on the origin repository ?
|
|
||
| // VerifyPKIOperatorTLSConfigTest validates that when TLS security profile changes are applied, | ||
| // the control-plane-pki-operator config reflects the correct minTLSVersion and that | ||
| // the API server enforces those TLS versions correctly. |
There was a problem hiding this comment.
What API server are we referring to here ?
There was a problem hiding this comment.
Updated text with better description.
| }, cm) | ||
|
|
||
| if apierrors.IsNotFound(err) { | ||
| Skip(fmt.Sprintf("ConfigMap %s/%s not found - this test requires the PKI operator config to be present", |
There was a problem hiding this comment.
Why isn't this a failure instead ?
| isDefaultOrIntermediate := !hasProfile || hostedClusterHasTLSProfileType(hostedCluster, configv1.TLSProfileIntermediateType) | ||
|
|
||
| if !isDefaultOrIntermediate { | ||
| Skip("HostedCluster does not have default or Intermediate TLS profile - skipping TLS 1.2 test") |
There was a problem hiding this comment.
Why we keep skipping tests ? Shouldn't we make sure the transitions work ?
There was a problem hiding this comment.
This is just an additional check here before the transition (upgrade/downgrade) in configmap itself before upgrade transition
Those tests are very broad and covers things in general involving all related operators on both standard cluster and hypershift, this is specific to hypershift only thats why i thought to keep the test for this here only. |
92505ae to
cac6721
Compare
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: csrwng, kaleemsiddiqu The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
cac6721 to
3ea0648
Compare
|
@kaleemsiddiqu: This pull request references CNTRLPLANE-3624 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/lgtm |
|
Scheduling tests matching the |
AI Test Failure AnalysisJob: Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6 |
|
/retest |
jparrill
left a comment
There was a problem hiding this comment.
Dropped some comments. Thanks!
| // VerifyPKIOperatorTLSConfigTest validates that when TLS security profile changes are applied | ||
| // to the HostedCluster, the control-plane-pki-operator config reflects the correct minTLSVersion | ||
| // and that the control-plane-pki-operator's HTTPS endpoint enforces those TLS versions correctly. | ||
| func VerifyPKIOperatorTLSConfigTest(getTestCtx internal.TestContextGetter) { |
There was a problem hiding this comment.
Needs Serial here too. This test mutates the HC TLS profile and triggers pod restarts — without Serial, parallel tests checking pod health or certs will flake. See backup_restore_test.go for the pattern.
| Expect(configYAML).To(ContainSubstring("TLS"), "config.yaml should contain TLS configuration") | ||
| }) | ||
|
|
||
| It("should have minTLSVersion set to VersionTLS12 with default/intermediate profile", func() { |
There was a problem hiding this comment.
This assertion is meaningless — "TLS" appears everywhere in field names and cipher strings. The subsequent tests already validate the real config. Remove this line.
|
|
||
| // Get control-plane-pki-operator pod from management cluster | ||
| mgmtClient := tc.MgmtClient | ||
| pkiPodList := &corev1.PodList{} |
There was a problem hiding this comment.
Declared 3 times as a local variable. Hoist to a constant next to pkiOperatorConfigMapName.
There was a problem hiding this comment.
Moved as constant and replaced wherever it was used.
| GinkgoWriter.Printf("Testing TLS connections to PKI operator pod %s on port %s\n", pkiPod.Name, metricsPort) | ||
|
|
||
| // Test TLS 1.2 connection using openssl s_client from within the PKI operator pod | ||
| tls12Result, err := v2util.RunCommandInPod(tc.Context, mgmtKubeClient, mgmtRestConfig, |
There was a problem hiding this comment.
g.Expect(apierrors.IsConflict(err)).To(BeFalse()) works by accident but reads wrong. Use:
if apierrors.IsConflict(err) {
return
}
g.Expect(err).NotTo(HaveOccurred())Same at lines 287-288.
There was a problem hiding this comment.
Updated with suggested code
| pkiPodList := &corev1.PodList{} | ||
| g.Expect(mgmtClient.List(tc.Context, pkiPodList, | ||
| crclient.InNamespace(tc.ControlPlaneNamespace), | ||
| crclient.MatchingLabels{"app": "control-plane-pki-operator"}, |
There was a problem hiding this comment.
You wait for a pod to be Ready, but never verify it is a new pod. The old pod with old config can still be Running+Ready briefly. TLS 1.3 passes against both old and new config, so you get a false positive. Capture pod UID before mutation, wait for a different UID after.
There was a problem hiding this comment.
added code to compare pods UID before and after.
| g.Expect(cond.Status).To(Equal(corev1.ConditionTrue), | ||
| "control-plane-pki-operator pod should be ready") | ||
| } | ||
| } |
There was a problem hiding this comment.
timeout 2 is pretty aggressive for CI. The pod exec goes through SPDY streaming via the API server — in a loaded cluster, 2 seconds might not be enough. Bump to timeout 5 across all the openssl commands.
There was a problem hiding this comment.
timeout of 5 secods added.
| Namespace: tc.ClusterNamespace, | ||
| Name: tc.ClusterName, | ||
| }, hostedCluster) | ||
| if err != nil { |
There was a problem hiding this comment.
Two things:
-
wait.PollUntilContextTimeout: No other v2 test uses this — the framework convention isEventually. You already useEventuallywith conflict retry earlier in the same file. Switching also lets you drop the"context"andwaitimports. -
Silent failure: The warning log swallows the restore error. If this fails, the cluster is left with whatever TLS profile was last set, breaking subsequent tests. Use
Expect(err).NotTo(HaveOccurred())to fail loudly.
There was a problem hiding this comment.
- Replaced with Eventually
- Used Expect(err).*
|
Now I have a clear picture. Let me output the final report. Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryThe e2e test suite exceeded the 2-hour process timeout. The Root CauseThe root cause is a cumulative e2e suite timeout (2-hour wall clock), not a specific test assertion failure. What happened step-by-step:
Why Why the PR is not at fault: Recommendations
Evidence
|
3ea0648 to
f92ebfb
Compare
CNTRLPLANE-3624: Add end-to-end v2 test to verify control-plane-pki-operator TLS configuration behavior. Signed-off-by: Kaleemullah Siddiqui <ksiddiqu@redhat.com>
f92ebfb to
e3f3779
Compare
|
@kaleemsiddiqu: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/lgtm |
|
Scheduling tests matching the |
CNTRLPLANE-3624: Add end-to-end v2 test to verify control-plane-pki-operator TLS configuration behavior.
What this PR does / why we need it:
e2e test for #8768
Summary by CodeRabbit
TLSSecurityProfilechanges, including checks for requiredminTLSVersionvalues and connectivity success or failure for TLS 1.2 vs TLS 1.3 across default/intermediate and Modern scenarios.