Skip to content

NO-JIRA: Force webhook failure policy to Ignore until cluster has bootstrapped#476

Closed
JoelSpeed wants to merge 1 commit intoopenshift:mainfrom
JoelSpeed:bootstrap-webhooks-as-fail
Closed

NO-JIRA: Force webhook failure policy to Ignore until cluster has bootstrapped#476
JoelSpeed wants to merge 1 commit intoopenshift:mainfrom
JoelSpeed:bootstrap-webhooks-as-fail

Conversation

@JoelSpeed
Copy link
Contributor

@JoelSpeed JoelSpeed commented Feb 27, 2026

At the moment, there are connectivity issues sometimes on bootstrap when webhooks are set to FailurePolicy Fail.
In the long term, we hope to introduce a connection between the bootstrap API server and the in cluster running webhooks, but for now, we need a workaround to prevent cluster bootstrap issues failing on broken webhooks.

This PR add mutation to VWC and MWC in the static resource controller to set them to Ignore, until such a time that we see either the existing resource has Fail (or matches what's in the manifest itself), or all ClusterOperators are reporting Available=true, Progressing=false, Degraded=false. In a traditional cluster bootstrap, this happens some time after the bootstrap API server has already shut down, and means we can be confident then at setting the policy to Fail.

Summary by CodeRabbit

  • New Features

    • Admission webhooks now intelligently adjust failure policies during cluster bootstrap to prevent initialization disruptions. Once all cluster components become available and healthy, webhooks return to standard failure policies.
  • Tests

    • Added comprehensive tests verifying webhook failure policy behavior during cluster bootstrap and post-bootstrap states.

@openshift-ci-robot
Copy link

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 27, 2026
@openshift-ci-robot
Copy link

@JoelSpeed: This pull request explicitly references no jira issue.

Details

In response to this:

At the moment, there are connectivity issues sometimes on bootstrap when webhooks are set to FailurePolicy Fail.
In the long term, we hope to introduce a connection between the bootstrap API server and the in cluster running webhooks, but for now, we need a workaround to prevent cluster bootstrap issues failing on broken webhooks.

This PR add mutation to VWC and MWC in the static resource controller to set them to Ignore, until such a time that we see either the existing resource has Fail (or matches what's in the manifest itself), or all ClusterOperators are reporting Available=true, Progressing=false, Degraded=false. In a traditional cluster bootstrap, this happens some time after the bootstrap API server has already shut down, and means we can be confident then at setting the policy to Fail.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link

coderabbitai bot commented Feb 27, 2026

📝 Walkthrough

Walkthrough

The controller gains logic to detect cluster bootstrap state and conditionally mutate webhook configurations. When the cluster isn't fully bootstrapped, webhook FailurePolicy is set to Ignore to prevent blocking initialization. Once all ClusterOperators reach Available and NotProgressing states, webhooks are applied unmutated with FailurePolicy set to Fail.

Changes

Cohort / File(s) Summary
Webhook Bootstrap Mutation Logic
pkg/controllers/staticresourceinstaller/controller.go
Introduces cluster bootstrap detection via ClusterOperator status conditions. Adds asset mutation to conditionally override webhook FailurePolicy between Ignore (pre-bootstrap) and Fail (post-bootstrap). Implements clusterBootstrapped and clusterOperatorsBootstrapped workflows with new controller fields and reconciliation logic.
Webhook Policy Tests
pkg/controllers/staticresourceinstaller/controller_test.go
Adds two tests verifying webhook FailurePolicy behavior: non-bootstrapped state expects Ignore, bootstrapped state expects Fail. Uses ClusterOperator status updates to simulate bootstrap state transitions.

Sequence Diagram(s)

sequenceDiagram
    participant Controller
    participant ClusterOperator
    participant AssetMutator
    participant WebhookConfig

    Controller->>ClusterOperator: Query ClusterOperator conditions
    ClusterOperator-->>Controller: Return status (Available, Progressing, Degraded)
    
    alt Cluster Not Bootstrapped
        Controller->>AssetMutator: Load webhook assets
        AssetMutator->>AssetMutator: Set FailurePolicy to Ignore
        AssetMutator->>WebhookConfig: Apply mutated webhook
        WebhookConfig-->>AssetMutator: Configuration applied
    else Cluster Bootstrapped
        Controller->>AssetMutator: Load webhook assets
        AssetMutator->>AssetMutator: Set FailurePolicy to Fail
        AssetMutator->>WebhookConfig: Apply unmutated webhook
        WebhookConfig-->>AssetMutator: Configuration applied
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Webhooks dance upon the bootstrap stage,

First set to Ignore, a gentle cage,

Till ClusterOperators bloom and thrive,

Then FailurePolicy comes alive!

Mutations whisper the cluster's age. 🌱

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title accurately describes the main change: forcing webhook failure policy to Ignore during cluster bootstrap, which is the core functionality added across the controller and test files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 27, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign joelspeed for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@JoelSpeed
Copy link
Contributor Author

/pipeline required

I need to see E2E results for this, even if it must change later

@openshift-ci-robot
Copy link

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aws-capi-techpreview
/test e2e-aws-ovn
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-aws-ovn-techpreview
/test e2e-aws-ovn-techpreview-upgrade
/test e2e-azure-capi-techpreview
/test e2e-azure-ovn-techpreview
/test e2e-azure-ovn-techpreview-upgrade
/test e2e-gcp-capi-techpreview
/test e2e-gcp-ovn-techpreview
/test e2e-metal3-capi-techpreview
/test e2e-openstack-capi-techpreview
/test e2e-openstack-ovn-techpreview
/test e2e-vsphere-capi-techpreview
/test regression-clusterinfra-aws-ipi-techpreview-capi

@openshift-ci-robot
Copy link

@JoelSpeed: This pull request explicitly references no jira issue.

Details

In response to this:

At the moment, there are connectivity issues sometimes on bootstrap when webhooks are set to FailurePolicy Fail.
In the long term, we hope to introduce a connection between the bootstrap API server and the in cluster running webhooks, but for now, we need a workaround to prevent cluster bootstrap issues failing on broken webhooks.

This PR add mutation to VWC and MWC in the static resource controller to set them to Ignore, until such a time that we see either the existing resource has Fail (or matches what's in the manifest itself), or all ClusterOperators are reporting Available=true, Progressing=false, Degraded=false. In a traditional cluster bootstrap, this happens some time after the bootstrap API server has already shut down, and means we can be confident then at setting the policy to Fail.

Summary by CodeRabbit

  • New Features

  • Admission webhooks now intelligently adjust failure policies during cluster bootstrap to prevent initialization disruptions. Once all cluster components become available and healthy, webhooks return to standard failure policies.

  • Tests

  • Added comprehensive tests verifying webhook failure policy behavior during cluster bootstrap and post-bootstrap states.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (1)
pkg/controllers/staticresourceinstaller/controller_test.go (1)

116-118: Scope FailurePolicy assertions to expected webhook configurations.

Line 116 and Line 149 assert over the entire list with HaveEach. That can pass on an empty list and can fail when unrelated ValidatingWebhookConfiguration objects exist. Prefer explicit name-scoped assertions plus Not(BeEmpty()).

Suggested assertion pattern
-Eventually(kWithCtx(ctx).ObjectList(&admissionregistrationv1.ValidatingWebhookConfigurationList{}), 10*time.Second).WithContext(ctx).Should(HaveField("Items", HaveEach(
-	HaveField("Webhooks", ConsistOf(HaveField("FailurePolicy", HaveValue(Equal(admissionregistrationv1.Ignore))))),
-)))
+Eventually(kWithCtx(ctx).ObjectList(&admissionregistrationv1.ValidatingWebhookConfigurationList{}), 10*time.Second).WithContext(ctx).Should(
+	HaveField("Items", SatisfyAll(
+		Not(BeEmpty()),
+		ContainElement(SatisfyAll(
+			HaveField("ObjectMeta.Name", Equal("openshift-compatibility-requirements-apiextensions-openshift-io-v1alpha1-compatibilityrequirement-validation")),
+			HaveField("Webhooks", HaveEach(HaveField("FailurePolicy", HaveValue(Equal(admissionregistrationv1.Ignore))))),
+		)),
+		ContainElement(SatisfyAll(
+			HaveField("ObjectMeta.Name", Equal("openshift-compatibility-requirements-apiextensions-k8s-io-v1-customresourcedefinition-validation")),
+			HaveField("Webhooks", HaveEach(HaveField("FailurePolicy", HaveValue(Equal(admissionregistrationv1.Ignore))))),
+		)),
+	)),
+)

Apply the same structure to the Fail-policy assertion.

Also applies to: 149-151

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controllers/staticresourceinstaller/controller_test.go` around lines 116
- 118, The test currently asserts FailurePolicy across the entire
ValidatingWebhookConfigurationList using HaveEach, which can pass incorrectly on
empty lists or be affected by unrelated objects; update the assertions to scope
to the expected webhook configurations by selecting the specific
ValidatingWebhookConfiguration by name (e.g., filter the
ValidatingWebhookConfigurationList returned by kWithCtx(ctx).ObjectList(...) for
the expected metadata.Name) and assert Not(BeEmpty()) on that filtered result,
then assert on its Webhooks field that each expected webhook has FailurePolicy
== admissionregistrationv1.Ignore; apply the same pattern to the second
occurrence around lines 149-151 so both checks explicitly target the named
webhook configs instead of the whole list.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/controllers/staticresourceinstaller/controller.go`:
- Around line 205-220: The NotFound branch currently forces
obj.Webhooks[i].FailurePolicy = ptr.To(admissionregistrationv1.Ignore) whenever
the validating webhook is missing; change it to only apply this "ignore on
create" behavior when the cluster is still bootstrapping (e.g., check a
bootstrap-complete flag or method on the controller such as c.isBootstrapped()
or c.bootstrapComplete) and otherwise return an error or proceed to create with
the default/Fail policy; update the same gating logic in the other NotFound
handling block around lines 259-274 so both places only set FailurePolicy to
Ignore if the controller reports the cluster is not yet bootstrapped.
- Line 260: The error message in mutateMutatingWebhookConfiguration incorrectly
says "validating webhook configuration"; update the fmt.Errorf call that returns
an error for obj.Name to refer to "mutating webhook configuration" instead, i.e.
change the string literal passed to fmt.Errorf in
mutateMutatingWebhookConfiguration so it reads "failed to get mutating webhook
configuration %s: %w" (preserving obj.Name and err).
- Around line 311-327: The clusterBootstrapped function currently only updates
policiesMatch when names match, so missing required webhook names in
currentWebhooks can leave policiesMatch true; fix by for each webhook in
webhooks, search for a matching name in currentWebhooks and if none is found set
policiesMatch = false (and continue), and if found compare failurePolicy
pointers as before (require both non-nil and equal) — update the logic in
clusterBootstrapped to explicitly check existence of a matching currentWebhook
for every desired webhook (webhooks/currentWebhooks/policiesMatch).
- Around line 54-58: Replace the plain bool field
initialClusterOperatorsBootstrapped on staticResourceInstallerController with
sync/atomic's atomic.Bool and update all accesses: use
initialClusterOperatorsBootstrapped.Load() where the predicate callbacks and
reconciliation read the flag and
initialClusterOperatorsBootstrapped.Store(true/false) where reconciliation
writes it (ensure you add the "sync/atomic" import). Also fix the error text
that currently mentions "validating webhook configuration" when handling a
MutatingWebhookConfiguration—change the message to "mutating webhook
configuration" so it accurately reflects the MutatingWebhookConfiguration being
mutated.

---

Nitpick comments:
In `@pkg/controllers/staticresourceinstaller/controller_test.go`:
- Around line 116-118: The test currently asserts FailurePolicy across the
entire ValidatingWebhookConfigurationList using HaveEach, which can pass
incorrectly on empty lists or be affected by unrelated objects; update the
assertions to scope to the expected webhook configurations by selecting the
specific ValidatingWebhookConfiguration by name (e.g., filter the
ValidatingWebhookConfigurationList returned by kWithCtx(ctx).ObjectList(...) for
the expected metadata.Name) and assert Not(BeEmpty()) on that filtered result,
then assert on its Webhooks field that each expected webhook has FailurePolicy
== admissionregistrationv1.Ignore; apply the same pattern to the second
occurrence around lines 149-151 so both checks explicitly target the named
webhook configs instead of the whole list.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between b2d9212 and e199f53.

📒 Files selected for processing (2)
  • pkg/controllers/staticresourceinstaller/controller.go
  • pkg/controllers/staticresourceinstaller/controller_test.go

Comment on lines 54 to +58
type staticResourceInstallerController struct {
assetNames []string // The names of the assets to install.
kubeClient kubernetes.Interface
assetNames []string // The names of the assets to install.
client client.Client
kubeClient kubernetes.Interface
initialClusterOperatorsBootstrapped bool
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, check the Go version
cat go.mod | head -20

Repository: openshift/cluster-capi-operator

Length of output: 1001


🏁 Script executed:

# Read the relevant sections of the controller file
wc -l pkg/controllers/staticresourceinstaller/controller.go

Repository: openshift/cluster-capi-operator

Length of output: 132


🏁 Script executed:

# Check the struct definition and field declaration
sed -n '50,70p' pkg/controllers/staticresourceinstaller/controller.go

Repository: openshift/cluster-capi-operator

Length of output: 814


🏁 Script executed:

# Check the predicate callback at line 97
sed -n '90,110p' pkg/controllers/staticresourceinstaller/controller.go

Repository: openshift/cluster-capi-operator

Length of output: 1110


🏁 Script executed:

# Check the reconcile method and initialClusterOperatorsBootstrapped access
sed -n '330,360p' pkg/controllers/staticresourceinstaller/controller.go

Repository: openshift/cluster-capi-operator

Length of output: 1145


🏁 Script executed:

# Check the error text at line 260
sed -n '255,265p' pkg/controllers/staticresourceinstaller/controller.go

Repository: openshift/cluster-capi-operator

Length of output: 884


🏁 Script executed:

# Look for the clusterOperatorsBootstrapped method implementation
sed -n '332,355p' pkg/controllers/staticresourceinstaller/controller.go

Repository: openshift/cluster-capi-operator

Length of output: 926


Protect initialClusterOperatorsBootstrapped with synchronization to prevent data race.

Line 97 reads this flag from predicate callbacks that run in the watch event loop, while lines 333 and 352 read and write it during reconciliation in the worker pool. These concurrent execution contexts create an unsynchronized access to a shared boolean field, which is a data race. Use atomic.Bool to synchronize access.

Additionally, the error message at line 260 incorrectly refers to "validating webhook configuration" when the function mutates a MutatingWebhookConfiguration. Update the error text to say "mutating webhook configuration".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controllers/staticresourceinstaller/controller.go` around lines 54 - 58,
Replace the plain bool field initialClusterOperatorsBootstrapped on
staticResourceInstallerController with sync/atomic's atomic.Bool and update all
accesses: use initialClusterOperatorsBootstrapped.Load() where the predicate
callbacks and reconciliation read the flag and
initialClusterOperatorsBootstrapped.Store(true/false) where reconciliation
writes it (ensure you add the "sync/atomic" import). Also fix the error text
that currently mentions "validating webhook configuration" when handling a
MutatingWebhookConfiguration—change the message to "mutating webhook
configuration" so it accurately reflects the MutatingWebhookConfiguration being
mutated.

Comment on lines +205 to +220
if err := c.client.Get(ctx, client.ObjectKey{Name: obj.Name}, currentObj); err != nil && !apierrors.IsNotFound(err) {
return nil, fmt.Errorf("failed to get validating webhook configuration %s: %w", obj.Name, err)
} else if err != nil && apierrors.IsNotFound(err) {
// If the object doesn't currently exist, apply it initially with the failure policy set to ignore
// so that we don't block cluster operators during cluster bootstrap.
for i := range obj.Webhooks {
obj.Webhooks[i].FailurePolicy = ptr.To(admissionregistrationv1.Ignore)
}

data, err := k8syaml.Marshal(obj)
if err != nil {
return nil, fmt.Errorf("failed to marshal object: %w", err)
}

return data, nil
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid forcing Ignore on NotFound after bootstrap is complete.

Line 207 and Line 261 return early with Ignore whenever the webhook object is missing. In a bootstrapped cluster, this still weakens policy even though Fail should be allowed.

Suggested fix (gate NotFound path on bootstrap state)
-	} else if err != nil && apierrors.IsNotFound(err) {
+	} else if err != nil && apierrors.IsNotFound(err) {
+		bootstrapped, berr := c.clusterOperatorsBootstrapped(ctx)
+		if berr != nil {
+			return nil, fmt.Errorf("failed to check if cluster operators are bootstrapped: %w", berr)
+		}
+		if bootstrapped {
+			return raw, nil
+		}
 		// If the object doesn't currently exist, apply it initially with the failure policy set to ignore
 		// so that we don't block cluster operators during cluster bootstrap.
 		for i := range obj.Webhooks {
 			obj.Webhooks[i].FailurePolicy = ptr.To(admissionregistrationv1.Ignore)
 		}

Also applies to: 259-274

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controllers/staticresourceinstaller/controller.go` around lines 205 -
220, The NotFound branch currently forces obj.Webhooks[i].FailurePolicy =
ptr.To(admissionregistrationv1.Ignore) whenever the validating webhook is
missing; change it to only apply this "ignore on create" behavior when the
cluster is still bootstrapping (e.g., check a bootstrap-complete flag or method
on the controller such as c.isBootstrapped() or c.bootstrapComplete) and
otherwise return an error or proceed to create with the default/Fail policy;
update the same gating logic in the other NotFound handling block around lines
259-274 so both places only set FailurePolicy to Ignore if the controller
reports the cluster is not yet bootstrapped.

currentObj := &admissionregistrationv1.MutatingWebhookConfiguration{}

if err := c.client.Get(ctx, client.ObjectKey{Name: obj.Name}, currentObj); err != nil && !apierrors.IsNotFound(err) {
return nil, fmt.Errorf("failed to get validating webhook configuration %s: %w", obj.Name, err)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix incorrect resource type in error message.

Line 260 is in mutateMutatingWebhookConfiguration but the message says “validating webhook configuration”, which is misleading when debugging failures.

Suggested fix
-		return nil, fmt.Errorf("failed to get validating webhook configuration %s: %w", obj.Name, err)
+		return nil, fmt.Errorf("failed to get mutating webhook configuration %s: %w", obj.Name, err)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
return nil, fmt.Errorf("failed to get validating webhook configuration %s: %w", obj.Name, err)
return nil, fmt.Errorf("failed to get mutating webhook configuration %s: %w", obj.Name, err)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controllers/staticresourceinstaller/controller.go` at line 260, The error
message in mutateMutatingWebhookConfiguration incorrectly says "validating
webhook configuration"; update the fmt.Errorf call that returns an error for
obj.Name to refer to "mutating webhook configuration" instead, i.e. change the
string literal passed to fmt.Errorf in mutateMutatingWebhookConfiguration so it
reads "failed to get mutating webhook configuration %s: %w" (preserving obj.Name
and err).

Comment on lines +311 to +327
func (c *staticResourceInstallerController) clusterBootstrapped(ctx context.Context, webhooks, currentWebhooks []webhookPolicy) (bool, error) {
// First check if the existing policies match the desired policies.
// This means we already applied the manifest as it comes from the assets directly
// without any mutation. To do that, the cluster must already have been bootstrapped.
policiesMatch := true

for _, webhook := range webhooks {
for _, currentWebhook := range currentWebhooks {
if webhook.Name == currentWebhook.Name {
policiesMatch = policiesMatch && webhook.FailurePolicy != nil && currentWebhook.FailurePolicy != nil && *webhook.FailurePolicy == *currentWebhook.FailurePolicy
}
}
}

if policiesMatch {
return true, nil
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

clusterBootstrapped can return true for mismatched webhook sets.

At Line 317, the nested comparison only updates policiesMatch when names match. If a required webhook name is missing in currentWebhooks, policiesMatch may stay true and skip bootstrap gating.

Suggested fix (compare by name with existence checks)
 func (c *staticResourceInstallerController) clusterBootstrapped(ctx context.Context, webhooks, currentWebhooks []webhookPolicy) (bool, error) {
-	policiesMatch := true
-
-	for _, webhook := range webhooks {
-		for _, currentWebhook := range currentWebhooks {
-			if webhook.Name == currentWebhook.Name {
-				policiesMatch = policiesMatch && webhook.FailurePolicy != nil && currentWebhook.FailurePolicy != nil && *webhook.FailurePolicy == *currentWebhook.FailurePolicy
-			}
-		}
-	}
+	if len(webhooks) != len(currentWebhooks) {
+		return c.clusterOperatorsBootstrapped(ctx)
+	}
+
+	currentByName := map[string]webhookPolicy{}
+	for _, currentWebhook := range currentWebhooks {
+		currentByName[currentWebhook.Name] = currentWebhook
+	}
+
+	policiesMatch := true
+	for _, webhook := range webhooks {
+		currentWebhook, ok := currentByName[webhook.Name]
+		if !ok {
+			policiesMatch = false
+			break
+		}
+		policiesMatch = policiesMatch &&
+			webhook.FailurePolicy != nil &&
+			currentWebhook.FailurePolicy != nil &&
+			*webhook.FailurePolicy == *currentWebhook.FailurePolicy
+	}
 
 	if policiesMatch {
 		return true, nil
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controllers/staticresourceinstaller/controller.go` around lines 311 -
327, The clusterBootstrapped function currently only updates policiesMatch when
names match, so missing required webhook names in currentWebhooks can leave
policiesMatch true; fix by for each webhook in webhooks, search for a matching
name in currentWebhooks and if none is found set policiesMatch = false (and
continue), and if found compare failurePolicy pointers as before (require both
non-nil and equal) — update the logic in clusterBootstrapped to explicitly check
existence of a matching currentWebhook for every desired webhook
(webhooks/currentWebhooks/policiesMatch).

@theobarberbany
Copy link
Contributor

I think the Coderabbit races are probably correct, however LGTM on approach for a stop gap during dev. Maybe comments somewhere that this is temporary?

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 27, 2026

@JoelSpeed: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-vsphere-capi-techpreview e199f53 link true /test e2e-vsphere-capi-techpreview
ci/prow/lint e199f53 link true /test lint
ci/prow/e2e-openstack-ovn-techpreview e199f53 link true /test e2e-openstack-ovn-techpreview

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@mdbooth
Copy link
Contributor

mdbooth commented Mar 2, 2026

Pretty sure we replaced this with #479. @JoelSpeed reopen if I'm missing something.

/close

@openshift-ci openshift-ci bot closed this Mar 2, 2026
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 2, 2026

@mdbooth: Closed this PR.

Details

In response to this:

Pretty sure we replaced this with #479. @JoelSpeed reopen if I'm missing something.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants