Skip to content

fix: OCPBUGS-78575: create virt-launcher NetworkPolicy on external infra cluster#8056

Merged
openshift-merge-bot[bot] merged 1 commit into
openshift:mainfrom
dpateriya:fix/virt-launcher-netpol-external-infra
Apr 27, 2026
Merged

fix: OCPBUGS-78575: create virt-launcher NetworkPolicy on external infra cluster#8056
openshift-merge-bot[bot] merged 1 commit into
openshift:mainfrom
dpateriya:fix/virt-launcher-netpol-external-infra

Conversation

@dpateriya

@dpateriya dpateriya commented Mar 24, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Create the virt-launcher NetworkPolicy on the external infrastructure cluster when deploying HCP KubeVirt with workers on a separate cluster (Credentials != nil)
  • Add reconcileVirtLauncherNetworkPolicyExternalInfra function that builds an adapted policy for external infra (uses infra cluster CIDRs, omits control-plane pod selectors that only exist on the management cluster)
  • Update documented minimum RBAC role (kv-external-infra-role) to include networkpolicies permission

Problem

When deploying a Hosted Control Plane with KubeVirt platform using external infrastructure (workers on a separate cluster from the control plane), the virt-launcher NetworkPolicy is never created on the infrastructure cluster. The code in reconcileNetworkPolicies explicitly skips it when hcluster.Spec.Platform.Kubevirt.Credentials != nil. This leaves guest VMs with unrestricted network access to all pods and services on the infrastructure cluster, breaking tenant isolation in multi-tenant environments.

Fix

Uses the existing infra cluster client (KubevirtInfraClientMap) -- the same client already used for VM creation, route management, and version validation -- to also create the NetworkPolicy on the infra cluster. The policy:

  • Blocks egress to the infra cluster clusterNetwork and serviceNetwork CIDRs
  • Allows inter-VM communication (same infra-id), DNS, ingress controller, and external traffic
  • Omits kube-apiserver, oauth, and ignition-server-proxy pod selectors (those run on the management cluster and are reached via external IPs, already permitted by the 0.0.0.0/0 rule)

Test plan

  • Deploy HCP KubeVirt with external infra (--infra-kubeconfig-file + --infra-namespace)
  • Verify virt-launcher NetworkPolicy is created in the infra namespace on the workers cluster
  • Verify the policy egress rules block infra cluster CIDRs and allow inter-VM, DNS, ingress
  • Verify centralized infra (same-cluster) deployments continue to work unchanged
  • Verify guest VMs can still reach their control plane (kube-apiserver, oauth, ignition) via external routes
  • Verify guest VMs cannot reach arbitrary pods/services on the infra cluster

Made with Cursor

Summary by CodeRabbit

  • Documentation

    • Expanded RBAC guidance: added networkpolicies and events, adjusted volumesnapshot verbs, updated Role examples, and documented cluster-scoped RBAC needed to read cluster network configuration.
  • New Features

    • Network policy handling for external infrastructure clusters: discovers infra network config and applies refined virt-launcher ingress/egress policies with optional CIDR-based egress restrictions.
  • Bug Fixes / Observability

    • Added a hosted cluster condition, infra-cluster warning events, and status update behavior when infra kubeconfig cannot read cluster network config.

@openshift-ci-robot

Copy link
Copy Markdown

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@coderabbitai

coderabbitai Bot commented Mar 24, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

KubeVirt NetworkPolicy reconciliation now branches by whether external infra credentials exist. If external credentials are present the controller discovers an infra-cluster client, attempts to GET the cluster-scoped configv1.Network "cluster", updates the HostedCluster condition ValidKubeVirtInfraNetworkPolicyRBAC (True/False), emits a warning Event on RBAC/read failure, and reconciles the virt-launcher NetworkPolicy in the infra namespace using infra CIDRs when available. If credentials are absent it reconciles the virt-launcher NetworkPolicy in the management control-plane namespace. Documentation and example RBAC were updated to reflect required networkpolicies and cluster-scoped read access.

Sequence Diagram

sequenceDiagram
    participant MC as Management Cluster Controller
    participant Creds as KubeVirt Credentials
    participant IC as Infrastructure Cluster API
    participant NCfg as configv1.Network ("cluster")
    participant Status as HostedCluster Status
    participant Event as Infra Warning Event
    participant NP as NetworkPolicy Resource

    rect rgba(70, 130, 180, 0.5)
    Note over MC,NP: External Infrastructure Flow
    MC->>Creds: Check if Credentials != nil
    alt External Infrastructure
        Creds-->>MC: Credentials present
        MC->>IC: Discover infra cluster client
        IC->>NCfg: GET configv1.Network "cluster"
        NCfg-->>IC: Return network CIDRs or error
        IC-->>MC: Provide network config or error
        MC->>Status: Update ValidKubeVirtInfraNetworkPolicyRBAC condition
        MC->>Event: Emit infra-cluster RBAC warning (on read failure)
        MC->>NP: Create/Update virt-launcher NetworkPolicy in infra namespace (use CIDRs if available)
    else Centralized Infrastructure
        Creds-->>MC: No credentials
        MC->>NP: Create/Update virt-launcher NetworkPolicy in control-plane namespace
    end
    end
Loading
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added area/documentation Indicates the PR includes changes for documentation area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/platform/kubevirt PR/issue for KubeVirt (KubevirtPlatform) platform and removed do-not-merge/needs-area labels Mar 24, 2026
@openshift-ci openshift-ci Bot requested review from orenc1 and qinqon March 24, 2026 13:22
@dpateriya dpateriya force-pushed the fix/virt-launcher-netpol-external-infra branch from adca61c to b3411d4 Compare March 24, 2026 13:29

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
hypershift-operator/controllers/hostedcluster/network_policies.go (1)

719-842: Please extract the shared virt-launcher policy builder before these two paths drift.

This function is nearly a copy of reconcileVirtLauncherNetworkPolicy; only the blocked CIDR source and a few egress peers differ. Pull the common ingress/egress construction into a helper and pass the external-infra deltas in, otherwise future policy changes are easy to apply to one path and miss in the other.

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hypershift-operator/controllers/hostedcluster/network_policies.go` around
lines 719 - 842, The two functions
reconcileVirtLauncherNetworkPolicyExternalInfra and
reconcileVirtLauncherNetworkPolicy share almost identical ingress/egress
construction; extract a helper (e.g., buildVirtLauncherPolicyBase or
newVirtLauncherPolicyBuilder) that builds the common policy.Spec.PolicyTypes,
PodSelector, common Ingress rules and base Egress rules and accepts parameters
for the variable pieces (blockedIPv4Networks, blockedIPv6Networks, and a slice
of extra egress peers or a callback to append external-infra-specific peers).
Replace the duplicated blocks in both
reconcileVirtLauncherNetworkPolicyExternalInfra and
reconcileVirtLauncherNetworkPolicy to call the new helper, then apply the
external-specific deltas (adding service NodePort IPBlocks and the
infra-specific Pod/Namespace peers) to the returned policy or builder before
returning; keep existing symbol names (policy, hcluster, infraClusterNetwork) to
make locating sites straightforward.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@hypershift-operator/controllers/hostedcluster/network_policies.go`:
- Around line 166-168: The code performs a cluster-scoped lookup via
infraClient.Get on infraClusterNetwork (configv1.Network{Name:"cluster"}) which
requires cluster-level RBAC; change this by either removing the live
cluster-scoped lookup (avoid calling infraClient.Get in the network policy
reconciliation path and use a safer local/default value or accept a passed-in
configuration) or explicitly require and document ClusterRole/ClusterRoleBinding
granting get on networks.config.openshift.io for the controller; update the code
around infraClusterNetwork/infraClient.Get to implement the chosen approach and
add a RBAC manifest entry (ClusterRole + ClusterRoleBinding) if you opt to keep
the live lookup, ensuring it grants verbs=get on
resource=networks.config.openshift.io.

---

Nitpick comments:
In `@hypershift-operator/controllers/hostedcluster/network_policies.go`:
- Around line 719-842: The two functions
reconcileVirtLauncherNetworkPolicyExternalInfra and
reconcileVirtLauncherNetworkPolicy share almost identical ingress/egress
construction; extract a helper (e.g., buildVirtLauncherPolicyBase or
newVirtLauncherPolicyBuilder) that builds the common policy.Spec.PolicyTypes,
PodSelector, common Ingress rules and base Egress rules and accepts parameters
for the variable pieces (blockedIPv4Networks, blockedIPv6Networks, and a slice
of extra egress peers or a callback to append external-infra-specific peers).
Replace the duplicated blocks in both
reconcileVirtLauncherNetworkPolicyExternalInfra and
reconcileVirtLauncherNetworkPolicy to call the new helper, then apply the
external-specific deltas (adding service NodePort IPBlocks and the
infra-specific Pod/Namespace peers) to the returned policy or builder before
returning; keep existing symbol names (policy, hcluster, infraClusterNetwork) to
make locating sites straightforward.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 72ef2682-9485-4808-b0a3-ac41022f2c6d

📥 Commits

Reviewing files that changed from the base of the PR and between c8ae120 and adca61c.

📒 Files selected for processing (2)
  • docs/content/how-to/kubevirt/external-infrastructure.md
  • hypershift-operator/controllers/hostedcluster/network_policies.go

Comment thread hypershift-operator/controllers/hostedcluster/network_policies.go Outdated
@dpateriya dpateriya force-pushed the fix/virt-launcher-netpol-external-infra branch 2 times, most recently from 6b6911d to a599aa8 Compare March 24, 2026 13:52
@dpateriya dpateriya changed the title Bug: Create virt-launcher NetworkPolicy on external infra cluster fix: OCPBUGS-78575: create virt-launcher NetworkPolicy on external infra cluster Mar 24, 2026
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Mar 24, 2026
@openshift-ci-robot

Copy link
Copy Markdown

@dpateriya: This pull request references Jira Issue OCPBUGS-78575, which is invalid:

  • expected the bug to target the "4.22.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Summary

  • Create the virt-launcher NetworkPolicy on the external infrastructure cluster when deploying HCP KubeVirt with workers on a separate cluster (Credentials != nil)
  • Add reconcileVirtLauncherNetworkPolicyExternalInfra function that builds an adapted policy for external infra (uses infra cluster CIDRs, omits control-plane pod selectors that only exist on the management cluster)
  • Update documented minimum RBAC role (kv-external-infra-role) to include networkpolicies permission

Problem

When deploying a Hosted Control Plane with KubeVirt platform using external infrastructure (workers on a separate cluster from the control plane), the virt-launcher NetworkPolicy is never created on the infrastructure cluster. The code in reconcileNetworkPolicies explicitly skips it when hcluster.Spec.Platform.Kubevirt.Credentials != nil. This leaves guest VMs with unrestricted network access to all pods and services on the infrastructure cluster, breaking tenant isolation in multi-tenant environments.

Fix

Uses the existing infra cluster client (KubevirtInfraClientMap) -- the same client already used for VM creation, route management, and version validation -- to also create the NetworkPolicy on the infra cluster. The policy:

  • Blocks egress to the infra cluster clusterNetwork and serviceNetwork CIDRs
  • Allows inter-VM communication (same infra-id), DNS, ingress controller, and external traffic
  • Omits kube-apiserver, oauth, and ignition-server-proxy pod selectors (those run on the management cluster and are reached via external IPs, already permitted by the 0.0.0.0/0 rule)

Test plan

  • Deploy HCP KubeVirt with external infra (--infra-kubeconfig-file + --infra-namespace)
  • Verify virt-launcher NetworkPolicy is created in the infra namespace on the workers cluster
  • Verify the policy egress rules block infra cluster CIDRs and allow inter-VM, DNS, ingress
  • Verify centralized infra (same-cluster) deployments continue to work unchanged
  • Verify guest VMs can still reach their control plane (kube-apiserver, oauth, ignition) via external routes
  • Verify guest VMs cannot reach arbitrary pods/services on the infra cluster

Made with Cursor

Summary by CodeRabbit

  • Documentation

  • Updated RBAC and access control docs: include networkpolicies as required and add guidance to bind both a Role and a ClusterRole; new instructions and examples for granting read access to cluster network config.

  • New Features

  • Network policy handling now supports external infrastructure clusters: discovers infra network config and creates appropriate virt-launcher network policies with refined ingress/egress rules.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
hypershift-operator/controllers/hostedcluster/network_policies.go (2)

181-183: Reuse the injected createOrUpdate in the external-infra branch.

Constructing a fresh upsert helper here bypasses the caller-supplied createOrUpdate, so this path no longer shares the same wrapper/test seam as the rest of reconcileNetworkPolicies.

Suggested change
-			infraCreateOrUpdate := upsert.New(r.EnableCIDebugOutput).CreateOrUpdate
-			if _, err := infraCreateOrUpdate(ctx, infraClient, policy, func() error {
+			if _, err := createOrUpdate(ctx, infraClient, policy, func() error {
 				return reconcileVirtLauncherNetworkPolicyExternalInfra(log, policy, hcluster, infraClusterNetwork)
 			}); err != nil {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hypershift-operator/controllers/hostedcluster/network_policies.go` around
lines 181 - 183, This code creates a new upsert helper (infraCreateOrUpdate :=
upsert.New(r.EnableCIDebugOutput).CreateOrUpdate) instead of using the injected
caller-supplied createOrUpdate, breaking the test/wrapper seam; change the
branch to call the injected createOrUpdate (the same createOrUpdate used
elsewhere in reconcileNetworkPolicies/external-infra path) when applying policy
= networkpolicy.VirtLauncherNetworkPolicy(infraNamespace) so the call becomes
createOrUpdate(ctx, infraClient, policy, func() error { ... }) and remove the
local upsert.New(...) construction.

733-857: Extract the shared virt-launcher policy builder.

This helper copies most of reconcileVirtLauncherNetworkPolicy: selector setup, ingress rules, CIDR blocking, and NodePort exception handling. Only a small subset of egress peers differs, so keeping two near-identical implementations will drift quickly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hypershift-operator/controllers/hostedcluster/network_policies.go` around
lines 733 - 857, The two virt-launcher reconciler functions are largely
duplicated; extract the shared logic into a helper (e.g.,
buildVirtLauncherPolicyBase or new function buildVirtLauncherPolicySpec) that
accepts the logger, policy pointer or returns a prepared
networkingv1.NetworkPolicySpec, the HostedCluster (hcluster) and
infraClusterNetwork and performs: setting PolicyTypes, PodSelector
(hyperv1.InfraIDLabel and "kubevirt.io":"virt-launcher"), ingress ports,
building blockedIPv4/IPv6 via addToBlockedNetworks, and the NodePort exception
handling loop over hcluster.Spec.Services (preserving netip.ParseAddr, utilsnet
IPv4/IPv6 checks and error returns). Have
reconcileVirtLauncherNetworkPolicyExternalInfra and the other
reconcileVirtLauncherNetworkPolicy call this helper to get the base Spec (or
mutate policy) and then append only their differing egress peers (the
DNS/ingress/peer differences), returning errors from the helper as needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@hypershift-operator/controllers/hostedcluster/network_policies.go`:
- Around line 170-179: The infra cluster Network lookup currently swallows
errors (infraClient.Get) leaving infraClusterNetwork nil which causes fallback
to unrestricted egress; instead, when infraClient.Get fails, either return a
reconciliation error from the surrounding reconcile function (so the controller
retries) or mark the HostedCluster as degraded via the existing status/condition
helper (e.g., setHostedClusterDegraded or SetProgressing/SetDegraded) with a
clear message referencing the failed infra network lookup; update the code paths
around infraClusterNetwork, networkObj and the virt-launcher NetworkPolicy
creation to rely on that error flow so we do not silently emit 0.0.0.0/0 and
::/0 rules when the CIDR cannot be retrieved.

---

Nitpick comments:
In `@hypershift-operator/controllers/hostedcluster/network_policies.go`:
- Around line 181-183: This code creates a new upsert helper
(infraCreateOrUpdate := upsert.New(r.EnableCIDebugOutput).CreateOrUpdate)
instead of using the injected caller-supplied createOrUpdate, breaking the
test/wrapper seam; change the branch to call the injected createOrUpdate (the
same createOrUpdate used elsewhere in reconcileNetworkPolicies/external-infra
path) when applying policy =
networkpolicy.VirtLauncherNetworkPolicy(infraNamespace) so the call becomes
createOrUpdate(ctx, infraClient, policy, func() error { ... }) and remove the
local upsert.New(...) construction.
- Around line 733-857: The two virt-launcher reconciler functions are largely
duplicated; extract the shared logic into a helper (e.g.,
buildVirtLauncherPolicyBase or new function buildVirtLauncherPolicySpec) that
accepts the logger, policy pointer or returns a prepared
networkingv1.NetworkPolicySpec, the HostedCluster (hcluster) and
infraClusterNetwork and performs: setting PolicyTypes, PodSelector
(hyperv1.InfraIDLabel and "kubevirt.io":"virt-launcher"), ingress ports,
building blockedIPv4/IPv6 via addToBlockedNetworks, and the NodePort exception
handling loop over hcluster.Spec.Services (preserving netip.ParseAddr, utilsnet
IPv4/IPv6 checks and error returns). Have
reconcileVirtLauncherNetworkPolicyExternalInfra and the other
reconcileVirtLauncherNetworkPolicy call this helper to get the base Spec (or
mutate policy) and then append only their differing egress peers (the
DNS/ingress/peer differences), returning errors from the helper as needed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 37a29eba-4fda-4644-8e36-3c43d3834bdc

📥 Commits

Reviewing files that changed from the base of the PR and between adca61c and 6b6911d.

📒 Files selected for processing (2)
  • docs/content/how-to/kubevirt/external-infrastructure.md
  • hypershift-operator/controllers/hostedcluster/network_policies.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/content/how-to/kubevirt/external-infrastructure.md

Comment thread hypershift-operator/controllers/hostedcluster/network_policies.go
@dpateriya dpateriya force-pushed the fix/virt-launcher-netpol-external-infra branch from a599aa8 to 5c389e3 Compare March 24, 2026 14:18
@openshift-ci openshift-ci Bot added the area/api Indicates the PR includes changes for the API label Mar 24, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
hypershift-operator/controllers/hostedcluster/network_policies.go (1)

172-177: ⚠️ Potential issue | 🟠 Major

Don't fall back on every infra-network lookup error.

This path currently treats transient API/auth/connectivity failures the same as missing RBAC and silently creates the weaker no-CIDR-blocking policy instead of retrying. Reserve the fallback for the expected permission / unsupported-resource cases and return the rest.

Possible fix
-			if err := infraClient.Get(ctx, client.ObjectKeyFromObject(networkObj), networkObj); err != nil {
-				log.Info("unable to read networks.config.openshift.io/cluster from the infrastructure cluster; "+
-					"virt-launcher NetworkPolicy will be created without CIDR-based egress restrictions. "+
-					"Grant get permission on networks.config.openshift.io via a ClusterRole for full network isolation",
-					"error", err)
-			} else {
-				infraClusterNetwork = networkObj
-			}
+			if err := infraClient.Get(ctx, client.ObjectKeyFromObject(networkObj), networkObj); err != nil {
+				if apierrors.IsForbidden(err) || apierrors.IsNotFound(err) || meta.IsNoMatchError(err) {
+					log.Info("unable to read networks.config.openshift.io/cluster from the infrastructure cluster; "+
+						"virt-launcher NetworkPolicy will be created without CIDR-based egress restrictions. "+
+						"Grant get permission on networks.config.openshift.io via a ClusterRole for full network isolation",
+						"error", err)
+				} else {
+					return fmt.Errorf("failed to get infrastructure cluster network config: %w", err)
+				}
+			} else {
+				infraClusterNetwork = networkObj
+			}

This needs k8s.io/apimachinery/pkg/api/errors and k8s.io/apimachinery/pkg/api/meta imports above.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hypershift-operator/controllers/hostedcluster/network_policies.go` around
lines 172 - 177, The current error handling around infraClient.Get(networkObj)
in network_policies.go treats all errors as "missing RBAC" and silently falls
back; change it to only swallow expected permission/unsupported-resource errors
(e.g., apierrors.IsNotFound(err), apierrors.IsForbidden(err), or
meta.IsNoMatchError(err)) and for any other error return/propagate it so the
reconcile will retry; update the branch around the Get call (the networkObj /
infraClient.Get(...) handling) to import and use
k8s.io/apimachinery/pkg/api/errors and k8s.io/apimachinery/pkg/api/meta and only
perform the no-CIDR-blocking fallback on those specific conditions, otherwise
return the original error.
🧹 Nitpick comments (1)
hypershift-operator/controllers/hostedcluster/network_policies.go (1)

181-183: Reuse the injected createOrUpdate here.

reconcileNetworkPolicies already receives a CreateOrUpdateFN that takes the target client. Spinning up a fresh upserter only for this branch bypasses that seam and makes tests/observability wrappers easier to miss on the external-infra path.

Small cleanup
-			infraCreateOrUpdate := upsert.New(r.EnableCIDebugOutput).CreateOrUpdate
-			if _, err := infraCreateOrUpdate(ctx, infraClient, policy, func() error {
+			if _, err := createOrUpdate(ctx, infraClient, policy, func() error {
 				return reconcileVirtLauncherNetworkPolicyExternalInfra(log, policy, hcluster, infraClusterNetwork)
 			}); err != nil {

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hypershift-operator/controllers/hostedcluster/network_policies.go` around
lines 181 - 183, The code creates a new upserter for the infra path
(infraCreateOrUpdate := upsert.New(r.EnableCIDebugOutput).CreateOrUpdate)
instead of using the injected CreateOrUpdateFN passed into
reconcileNetworkPolicies; remove the upsert.New(...) call and invoke the
existing injected createOrUpdate function with the infraClient (i.e., call
createOrUpdate(ctx, infraClient, policy, func() error { ... })) so the
external-infra path uses the same create/update wrapper, tests and observability
hooks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@hypershift-operator/controllers/hostedcluster/network_policies.go`:
- Around line 172-177: The current error handling around
infraClient.Get(networkObj) in network_policies.go treats all errors as "missing
RBAC" and silently falls back; change it to only swallow expected
permission/unsupported-resource errors (e.g., apierrors.IsNotFound(err),
apierrors.IsForbidden(err), or meta.IsNoMatchError(err)) and for any other error
return/propagate it so the reconcile will retry; update the branch around the
Get call (the networkObj / infraClient.Get(...) handling) to import and use
k8s.io/apimachinery/pkg/api/errors and k8s.io/apimachinery/pkg/api/meta and only
perform the no-CIDR-blocking fallback on those specific conditions, otherwise
return the original error.

---

Nitpick comments:
In `@hypershift-operator/controllers/hostedcluster/network_policies.go`:
- Around line 181-183: The code creates a new upserter for the infra path
(infraCreateOrUpdate := upsert.New(r.EnableCIDebugOutput).CreateOrUpdate)
instead of using the injected CreateOrUpdateFN passed into
reconcileNetworkPolicies; remove the upsert.New(...) call and invoke the
existing injected createOrUpdate function with the infraClient (i.e., call
createOrUpdate(ctx, infraClient, policy, func() error { ... })) so the
external-infra path uses the same create/update wrapper, tests and observability
hooks.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: eae1fb84-55eb-47d1-a51c-1034c5cf8414

📥 Commits

Reviewing files that changed from the base of the PR and between 6b6911d and a599aa8.

📒 Files selected for processing (2)
  • docs/content/how-to/kubevirt/external-infrastructure.md
  • hypershift-operator/controllers/hostedcluster/network_policies.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/content/how-to/kubevirt/external-infrastructure.md

@dpateriya dpateriya force-pushed the fix/virt-launcher-netpol-external-infra branch from 5c389e3 to f33bbc9 Compare March 24, 2026 14:54

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/hypershift/v1beta1/hostedcluster_conditions.go`:
- Around line 125-132: The new condition type
ValidKubeVirtInfraNetworkPolicyRBAC is defined but not registered in the
ExpectedHCConditions initialization, so add ValidKubeVirtInfraNetworkPolicyRBAC
to the KubevirtPlatform case in the ExpectedHCConditions slice/map in
support/conditions/conditions.go (the same place where
ValidKubeVirtInfraNetworkMTU and KubeVirtNodesLiveMigratable are listed) so the
reconciliation logic will initialize and maintain this condition; update the
KubevirtPlatform entry to include the ValidKubeVirtInfraNetworkPolicyRBAC
ConditionType.

In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`:
- Around line 2015-2022: The ValidKubeVirtInfraNetworkPolicyRBAC condition set
by reconcileNetworkPolicies must be persisted immediately instead of waiting
until after reconcileKubevirtCSIClusterRBAC; update the HostedCluster status
(call r.Client.Status().Update(ctx, hcluster) and handle apierrors.IsConflict
the same way) right after the condition is set (or at least before any
subsequent paths that may return an error such as
reconcileKubevirtCSIClusterRBAC), so the infra-network RBAC condition is never
lost or left stale even if later KubeVirt steps fail.

In `@hypershift-operator/controllers/hostedcluster/network_policies.go`:
- Around line 721-730: Update the NamespaceSelector for the peer that targets
ingress router pods so it uses the standard namespace label key
"kubernetes.io/metadata.name" instead of "name"; locate the block that has
PodSelector matching
"ingresscontroller.operator.openshift.io/deployment-ingresscontroller":
"default" and replace the NamespaceSelector MatchLabels key so it matches
"kubernetes.io/metadata.name": "openshift-ingress" to ensure the selector
actually matches the openshift-ingress namespace.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: c57503d0-c095-4f5e-8422-9ea2faf6a6e3

📥 Commits

Reviewing files that changed from the base of the PR and between a599aa8 and f33bbc9.

⛔ Files ignored due to path filters (1)
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.go is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (4)
  • api/hypershift/v1beta1/hostedcluster_conditions.go
  • docs/content/how-to/kubevirt/external-infrastructure.md
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
  • hypershift-operator/controllers/hostedcluster/network_policies.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/content/how-to/kubevirt/external-infrastructure.md

Comment thread api/hypershift/v1beta1/hostedcluster_conditions.go
Comment thread hypershift-operator/controllers/hostedcluster/network_policies.go
@dpateriya dpateriya force-pushed the fix/virt-launcher-netpol-external-infra branch from f33bbc9 to 04deba2 Compare March 24, 2026 15:28
@openshift-ci openshift-ci Bot added area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/testing Indicates the PR includes changes for e2e testing labels Mar 24, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
hypershift-operator/controllers/hostedcluster/network_policies.go (1)

721-730: ⚠️ Potential issue | 🟠 Major

Use the namespace label key that this router peer can actually match.

The selector at Line 729 still depends on name=openshift-ingress, while this file otherwise matches standard namespaces via kubernetes.io/metadata.name. If the namespace only has the standard label, this peer never matches and virt-launcher egress to router pods stays blocked.

Suggested fix
 			NamespaceSelector: &metav1.LabelSelector{
 				MatchLabels: map[string]string{
-					"name": "openshift-ingress",
+					"kubernetes.io/metadata.name": "openshift-ingress",
 				},
 			},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hypershift-operator/controllers/hostedcluster/network_policies.go` around
lines 721 - 730, The NamespaceSelector in the NetworkPolicy peer uses the
non-standard label key "name" which prevents matching namespaces that only have
the standard label; update the NamespaceSelector.MatchLabels key from "name" to
"kubernetes.io/metadata.name" in the NetworkPolicy block (the peer with
PodSelector matching
"ingresscontroller.operator.openshift.io/deployment-ingresscontroller":
"default") so the selector correctly matches the openshift-ingress namespace.
hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go (1)

2011-2022: ⚠️ Potential issue | 🟠 Major

Persist ValidKubeVirtInfraNetworkPolicyRBAC before the CSI RBAC reconcile.

The extra status write at Line 2016 still runs only after reconcileKubevirtCSIClusterRBAC. If that call fails, the new infra-network RBAC condition can still be left stale or missing, which removes the main diagnostic signal for the external-infra path.

Suggested fix
 case hyperv1.KubevirtPlatform:
+	if hcluster.Spec.Platform.Kubevirt != nil && hcluster.Spec.Platform.Kubevirt.Credentials != nil {
+		if err := r.Client.Status().Update(ctx, hcluster); err != nil {
+			if apierrors.IsConflict(err) {
+				return ctrl.Result{Requeue: true}, nil
+			}
+			return ctrl.Result{}, fmt.Errorf("failed to update status after network policy RBAC check: %w", err)
+		}
+	}
 	err = r.reconcileKubevirtCSIClusterRBAC(ctx, createOrUpdate, hcluster)
 	if err != nil {
 		return ctrl.Result{}, fmt.Errorf("failed to reconcile kubevirt CSI cluster wide RBAC: %w", err)
 	}
-	if hcluster.Spec.Platform.Kubevirt != nil && hcluster.Spec.Platform.Kubevirt.Credentials != nil {
-		if err := r.Client.Status().Update(ctx, hcluster); err != nil {
-			if apierrors.IsConflict(err) {
-				return ctrl.Result{Requeue: true}, nil
-			}
-			return ctrl.Result{}, fmt.Errorf("failed to update status after network policy RBAC check: %w", err)
-		}
-	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`
around lines 2011 - 2022, The status update for the infra-network RBAC condition
is happening after reconcileKubevirtCSIClusterRBAC and can be skipped if that
call errors; move/persist the ValidKubeVirtInfraNetworkPolicyRBAC status write
so it runs before or independently of reconcileKubevirtCSIClusterRBAC.
Specifically, ensure the code that sets/updates the
ValidKubeVirtInfraNetworkPolicyRBAC condition on hcluster and calls
r.Client.Status().Update(ctx, hcluster) executes (and handles
apierrors.IsConflict) prior to invoking r.reconcileKubevirtCSIClusterRBAC,
leaving reconcileKubevirtCSIClusterRBAC to run afterwards so the infra-network
RBAC diagnostic state cannot be lost if CSI RBAC reconciliation fails.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`:
- Around line 2011-2022: The status update for the infra-network RBAC condition
is happening after reconcileKubevirtCSIClusterRBAC and can be skipped if that
call errors; move/persist the ValidKubeVirtInfraNetworkPolicyRBAC status write
so it runs before or independently of reconcileKubevirtCSIClusterRBAC.
Specifically, ensure the code that sets/updates the
ValidKubeVirtInfraNetworkPolicyRBAC condition on hcluster and calls
r.Client.Status().Update(ctx, hcluster) executes (and handles
apierrors.IsConflict) prior to invoking r.reconcileKubevirtCSIClusterRBAC,
leaving reconcileKubevirtCSIClusterRBAC to run afterwards so the infra-network
RBAC diagnostic state cannot be lost if CSI RBAC reconciliation fails.

In `@hypershift-operator/controllers/hostedcluster/network_policies.go`:
- Around line 721-730: The NamespaceSelector in the NetworkPolicy peer uses the
non-standard label key "name" which prevents matching namespaces that only have
the standard label; update the NamespaceSelector.MatchLabels key from "name" to
"kubernetes.io/metadata.name" in the NetworkPolicy block (the peer with
PodSelector matching
"ingresscontroller.operator.openshift.io/deployment-ingresscontroller":
"default") so the selector correctly matches the openshift-ingress namespace.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: e6fa90e7-dc22-429b-9381-bf3a998f498e

📥 Commits

Reviewing files that changed from the base of the PR and between f33bbc9 and 04deba2.

⛔ Files ignored due to path filters (3)
  • docs/content/reference/aggregated-docs.md is excluded by !docs/content/reference/aggregated-docs.md
  • docs/content/reference/api.md is excluded by !docs/content/reference/api.md
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.go is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (6)
  • api/hypershift/v1beta1/hostedcluster_conditions.go
  • docs/content/how-to/kubevirt/external-infrastructure.md
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
  • hypershift-operator/controllers/hostedcluster/network_policies.go
  • support/conditions/conditions.go
  • test/e2e/util/util.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • api/hypershift/v1beta1/hostedcluster_conditions.go
  • docs/content/how-to/kubevirt/external-infrastructure.md

@dpateriya dpateriya force-pushed the fix/virt-launcher-netpol-external-infra branch from 04deba2 to d7f90ca Compare March 26, 2026 15:06
@dpateriya

Copy link
Copy Markdown
Contributor Author

/retest

@openshift-ci-robot

Copy link
Copy Markdown

@dpateriya: The /verified command must be used with one of the following actions: by, later, remove, or bypass. See https://docs.ci.openshift.org/docs/architecture/jira/#premerge-verification for more information.

Details

In response to this:

/verified

Tested with custom operator image built from this PR branch: quay.io/rhn_support_dpateriy/hypershift:pr-8056-amd64


Scenario 1: RBAC granted — NetworkPolicy created correctly

$ oc get netpol -n kubevirt-external-8056
NAME            POD-SELECTOR                                                                        AGE
virt-launcher   hypershift.openshift.io/infra-id=kubevirt-external-8056,kubevirt.io=virt-launcher   45s
$ oc get netpol virt-launcher -n kubevirt-external-8056 -oyaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
 annotations:
   hypershift.openshift.io/cluster: clusters/kubevirt-external-8056
 name: virt-launcher
 namespace: kubevirt-external-8056
spec:
 egress:
 - to:
   - ipBlock:
       cidr: 0.0.0.0/0
       except:
       - 10.128.0.0/14
       - 172.30.0.0/16
   - ipBlock:
       cidr: ::/0
   - podSelector:
       matchLabels:
         hypershift.openshift.io/infra-id: kubevirt-external-8056
         kubevirt.io: virt-launcher
   - namespaceSelector:
       matchLabels:
         kubernetes.io/metadata.name: openshift-dns
     podSelector:
       matchLabels:
         dns.operator.openshift.io/daemonset-dns: default
   - namespaceSelector:
       matchLabels:
         name: openshift-ingress
     podSelector:
       matchLabels:
         ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default
 ingress:
 - ports:
   - protocol: TCP
   - protocol: UDP
   - protocol: SCTP
 podSelector:
   matchLabels:
     hypershift.openshift.io/infra-id: kubevirt-external-8056
     kubevirt.io: virt-launcher
 policyTypes:
 - Ingress
 - Egress
$ oc get hc -n clusters kubevirt-external-8056 -o yaml | grep RBAC -B7
   type: PlatformCredentialsFound
 - lastTransitionTime: "2026-04-26T11:21:18Z"
   message: Infrastructure cluster network configuration is readable; CIDR-based
     egress restrictions are active.
   observedGeneration: 3
   reason: AsExpected
   status: "True"
   type: ValidKubeVirtInfraNetworkPolicyRBAC

Scenario 2: RBAC missing for networkpolicies — graceful degradation

Tested by creating a HostedCluster with an infra kubeconfig bound to a ServiceAccount that has all required permissions except networking.k8s.io/networkpolicies.

Condition on HostedCluster:

$ oc get hc -n clusters kubevirt-restricted-test -o yaml | grep RBAC -B7
 - lastTransitionTime: "2026-04-26T11:45:03Z"
   message: 'Unable to create/update virt-launcher NetworkPolicy on the infrastructure
     cluster: the external infra kubeconfig lacks networking.k8s.io/networkpolicies
     permissions. Grant create/update/get/list/watch on networkpolicies in the infra
     namespace. Error: networkpolicies.networking.k8s.io "virt-launcher" is forbidden:
     User "system:serviceaccount:kubevirt-external-8056:hypershift-infra-restricted"
     cannot get resource "networkpolicies" in API group "networking.k8s.io" in the
     namespace "default"'
   observedGeneration: 3
   reason: InfraClusterNetworkPolicyCreateFailed
   status: "False"
   type: ValidKubeVirtInfraNetworkPolicyRBAC

VM creation continues unblocked (reconcile loop not broken):

$ oc get pods -n default
NAME                                                              READY   STATUS      RESTARTS   AGE
importer-prime-8fd7e283-9085-419a-8287-9d02940d63a0               0/2     Init:0/1    0          11s
importer-prime-d08cee16-47c5-4215-bab4-50ac44757cce               0/2     Init:0/1    0          11s
virt-launcher-kubevirt-external-8056-restrict-62mv7-5qpcc-9wb9b   0/2     Pending     0          11s
virt-launcher-kubevirt-external-8056-restrict-62mv7-dkzvj-5j22v   0/2     Pending     0          11s

Scenario 3: Pre-existing HostedCluster

A HostedCluster deployed before the operator was patched with the PR image also had the virt-launcher NetworkPolicy created correctly on the next reconcile, with ValidKubeVirtInfraNetworkPolicyRBAC: True.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@dpateriya

Copy link
Copy Markdown
Contributor Author

/verified by me

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Apr 26, 2026
@openshift-ci-robot

Copy link
Copy Markdown

@dpateriya: This PR has been marked as verified by me.

Details

In response to this:

/verified by me

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

…fra cluster

When deploying HCP KubeVirt with external infrastructure (workers on a
separate cluster), the virt-launcher NetworkPolicy was never created on
the infrastructure cluster. The reconcileNetworkPolicies function
explicitly skipped it when Credentials != nil, leaving guest VMs with
unrestricted network access to all pods and services on the infra
cluster.

This patch adds an else branch that uses the existing infra cluster
client (from KubevirtInfraClientMap) to create the virt-launcher
NetworkPolicy in the infra namespace on the infrastructure cluster.

A new reconcileVirtLauncherNetworkPolicyExternalInfra function builds
the policy adapted for external infra:
- Blocks infra cluster's clusterNetwork/serviceNetwork CIDRs
- Allows inter-VM, DNS, and ingress controller traffic
- Omits control-plane pod selectors (kube-apiserver, oauth,
  ignition-server-proxy) since those pods run on the management
  cluster and are reached via external IPs

The Network config lookup is best-effort: if the infra kubeconfig
lacks cluster-scoped get on networks.config.openshift.io, the policy
is still created but without CIDR-based egress blocking.

Also updates the documented minimum RBAC role for external infra to
include networkpolicies (networking.k8s.io) and documents the
optional ClusterRole for full network isolation.

Made-with: Cursor
@dpateriya dpateriya force-pushed the fix/virt-launcher-netpol-external-infra branch from 7b03ca8 to 4e3c2c4 Compare April 26, 2026 12:06
@openshift-ci-robot openshift-ci-robot removed the verified Signifies that the PR passed pre-merge verification criteria label Apr 26, 2026
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 26, 2026
@dpateriya

Copy link
Copy Markdown
Contributor Author

@csrwng could you re-/lgtm when you get a chance? The rebase was needed only to sync GitHub Actions workflow files that were updated on main since the branch diverged -- no code changes were made.

@dpateriya

Copy link
Copy Markdown
Contributor Author

/verified by me

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Apr 26, 2026
@openshift-ci-robot

Copy link
Copy Markdown

@dpateriya: This PR has been marked as verified by me.

Details

In response to this:

/verified by me

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@dpateriya

Copy link
Copy Markdown
Contributor Author

/retest

@csrwng

csrwng commented Apr 27, 2026

Copy link
Copy Markdown
Contributor

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Apr 27, 2026
@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-22
/test e2e-aws-4-22
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws

@hypershift-jira-solve-ci

Copy link
Copy Markdown

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aks | Build: 2048752502962130944 | Cost: $2.1119624999999997 | Failed step: hypershift-azure-run-e2e

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@hypershift-jira-solve-ci

Copy link
Copy Markdown

I now have all the evidence needed for a complete analysis. This is a CI infrastructure resource exhaustion issue — not related to the PR's code changes at all.

Test Failure Analysis Complete

Job Information

  • Prow Job: pull-ci-openshift-hypershift-main-e2e-v2-aws
  • Build ID: 2048760019419140096
  • Target: e2e-v2-aws
  • Type: presubmit (PR fix: OCPBUGS-78575: create virt-launcher NetworkPolicy on external infra cluster #8056)
  • State: failure
  • Start Time: 2026-04-27T13:07:20Z
  • Completion Time: 2026-04-27T15:47:09Z
  • Duration: ~2h40m (of which 2h was waiting for a lease)
  • Cluster: build01
  • Failure Reason: executing_graph:step_failed:utilizing_lease:acquiring_lease

Test Failure Analysis

Error

failed to acquire lease for "hypershift-aws-quota-slice": resources not found
error: Failed to acquire resource, current capacity: 0 free, 30 leased

Summary

The e2e-v2-aws test never executed. The job spent exactly 2 hours waiting to acquire a hypershift-aws-quota-slice cloud resource lease before timing out. All 30 lease slots were occupied by other jobs during the entire wait window (13:47–15:47 UTC). This is a CI infrastructure capacity issue unrelated to the code changes in PR #8056.

Root Cause

The CI lease system manages concurrent access to shared cloud resources (AWS accounts/quotas) via a pool of hypershift-aws-quota-slice leases. At the time this job attempted to acquire a lease, all 30 available slots were already held by other running jobs. The job waited the full 2-hour lease acquisition timeout without any slot becoming available, then failed.

Key timeline:

  1. 13:07:20Z — Job started, began building images (all builds succeeded quickly, using cached artifacts)
  2. 13:47:02Z — All images built, release created. ci-operator requested a hypershift-aws-quota-slice lease
  3. 15:47:02Z — Exactly 2 hours later, lease acquisition timed out: current capacity: 0 free, 30 leased
  4. 15:47:07Z — Job reported failure with reason executing_graph:step_failed:utilizing_lease:acquiring_lease

No test step code was executed. The multi-stage test e2e-v2-aws (which includes pre/test/post phases) never started because the lease — required to provision an AWS environment — could not be obtained. The PR's code changes (creating a virt-launcher NetworkPolicy on external infra clusters) were never tested.

This is a transient CI infrastructure issue caused by high demand on the hypershift-aws-quota-slice resource pool.

Recommendations
  1. Retest the PR — This failure is not related to the code changes. Trigger a retest with /retest or /test e2e-v2-aws when CI resource contention has subsided.
  2. No code changes needed — The PR's code (OCPBUGS-78575: virt-launcher NetworkPolicy for external infra) was never exercised; the failure occurred before any test step ran.
  3. If retests continue to fail with the same lease error — This may indicate a systemic capacity issue with the hypershift-aws-quota-slice pool. Escalate to the CI infrastructure / Test Platform team to investigate whether the pool size (currently 30) needs to be increased or if stuck leases are consuming capacity.
Evidence
Evidence Detail
Failure type CI infrastructure — lease acquisition timeout
Resource pool hypershift-aws-quota-slice
Pool capacity 30 total, 0 free, 30 leased at time of failure
Wait duration Exactly 2 hours (13:47:02Z → 15:47:02Z)
ci-operator reason executing_graph:step_failed:utilizing_lease:acquiring_lease
JUnit failure Run multi-stage test e2e-v2-aws — 7200s duration, failed
Image builds All succeeded (src, hypershift, hypershift-operator, hypershift-tests, hypershift-cli)
Test execution None — no pre/test/post steps ran
Code-related? No — failure occurred before any test code was executed

@csrwng

csrwng commented Apr 27, 2026

Copy link
Copy Markdown
Contributor

/retest-required

@csrwng

csrwng commented Apr 27, 2026

Copy link
Copy Markdown
Contributor

/override ci/prow/verify-workflows

@openshift-ci

openshift-ci Bot commented Apr 27, 2026

Copy link
Copy Markdown
Contributor

@csrwng: Overrode contexts on behalf of csrwng: ci/prow/verify-workflows

Details

In response to this:

/override ci/prow/verify-workflows

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-merge-bot openshift-merge-bot Bot merged commit 222a19f into openshift:main Apr 27, 2026
53 of 55 checks passed
@openshift-ci-robot

Copy link
Copy Markdown

@dpateriya: Jira Issue Verification Checks: Jira Issue OCPBUGS-78575
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-78575 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓

Details

In response to this:

Summary

  • Create the virt-launcher NetworkPolicy on the external infrastructure cluster when deploying HCP KubeVirt with workers on a separate cluster (Credentials != nil)
  • Add reconcileVirtLauncherNetworkPolicyExternalInfra function that builds an adapted policy for external infra (uses infra cluster CIDRs, omits control-plane pod selectors that only exist on the management cluster)
  • Update documented minimum RBAC role (kv-external-infra-role) to include networkpolicies permission

Problem

When deploying a Hosted Control Plane with KubeVirt platform using external infrastructure (workers on a separate cluster from the control plane), the virt-launcher NetworkPolicy is never created on the infrastructure cluster. The code in reconcileNetworkPolicies explicitly skips it when hcluster.Spec.Platform.Kubevirt.Credentials != nil. This leaves guest VMs with unrestricted network access to all pods and services on the infrastructure cluster, breaking tenant isolation in multi-tenant environments.

Fix

Uses the existing infra cluster client (KubevirtInfraClientMap) -- the same client already used for VM creation, route management, and version validation -- to also create the NetworkPolicy on the infra cluster. The policy:

  • Blocks egress to the infra cluster clusterNetwork and serviceNetwork CIDRs
  • Allows inter-VM communication (same infra-id), DNS, ingress controller, and external traffic
  • Omits kube-apiserver, oauth, and ignition-server-proxy pod selectors (those run on the management cluster and are reached via external IPs, already permitted by the 0.0.0.0/0 rule)

Test plan

  • Deploy HCP KubeVirt with external infra (--infra-kubeconfig-file + --infra-namespace)
  • Verify virt-launcher NetworkPolicy is created in the infra namespace on the workers cluster
  • Verify the policy egress rules block infra cluster CIDRs and allow inter-VM, DNS, ingress
  • Verify centralized infra (same-cluster) deployments continue to work unchanged
  • Verify guest VMs can still reach their control plane (kube-apiserver, oauth, ignition) via external routes
  • Verify guest VMs cannot reach arbitrary pods/services on the infra cluster

Made with Cursor

Summary by CodeRabbit

  • Documentation

  • Expanded RBAC guidance: added networkpolicies and events, adjusted volumesnapshot verbs, updated Role examples, and documented cluster-scoped RBAC needed to read cluster network configuration.

  • New Features

  • Network policy handling for external infrastructure clusters: discovers infra network config and applies refined virt-launcher ingress/egress policies with optional CIDR-based egress restrictions.

  • Bug Fixes / Observability

  • Added a hosted cluster condition, infra-cluster warning events, and status update behavior when infra kubeconfig cannot read cluster network config.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci

openshift-ci Bot commented Apr 27, 2026

Copy link
Copy Markdown
Contributor

@dpateriya: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/verify-workflows 4e3c2c4 link unknown /test verify-workflows

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

orenc1 added a commit to orenc1/release that referenced this pull request May 11, 2026
…a SA

The hypershift-operator now creates a virt-launcher NetworkPolicy on
the external infrastructure cluster to enforce network isolation for
KubeVirt guest VMs (openshift/hypershift#8056, OCPBUGS-78575). A new
ValidKubeVirtInfraNetworkPolicyRBAC condition was added that requires
the external infra service account to have networking.k8s.io/networkpolicies
permissions and cluster-scoped read access to networks.config.openshift.io
for CIDR-based egress rules.

The CI step that provisions the restricted infra kubeconfig was not
updated to include these permissions, so the e2e-hypershift-kubevirt
job fails with forbidden errors when the operator attempts to manage
the virt-launcher NetworkPolicy and read cluster network configuration.

Add networking.k8s.io/networkpolicies with full verbs to the
kv-external-infra-role, and create a ClusterRole+ClusterRoleBinding
granting get access to networks.config.openshift.io so the operator
can build CIDR-based egress restrictions for full tenant isolation.

Signed-off-by: Oren Cohen <ocohen@redhat.com>
Assisted-by: Claude <noreply@anthropic.com>
openshift-merge-bot Bot pushed a commit to openshift/release that referenced this pull request May 12, 2026
…a SA (#79081)

The hypershift-operator now creates a virt-launcher NetworkPolicy on
the external infrastructure cluster to enforce network isolation for
KubeVirt guest VMs (openshift/hypershift#8056, OCPBUGS-78575). A new
ValidKubeVirtInfraNetworkPolicyRBAC condition was added that requires
the external infra service account to have networking.k8s.io/networkpolicies
permissions and cluster-scoped read access to networks.config.openshift.io
for CIDR-based egress rules.

The CI step that provisions the restricted infra kubeconfig was not
updated to include these permissions, so the e2e-hypershift-kubevirt
job fails with forbidden errors when the operator attempts to manage
the virt-launcher NetworkPolicy and read cluster network configuration.

Add networking.k8s.io/networkpolicies with full verbs to the
kv-external-infra-role, and create a ClusterRole+ClusterRoleBinding
granting get access to networks.config.openshift.io so the operator
can build CIDR-based egress restrictions for full tenant isolation.


Assisted-by: Claude <noreply@anthropic.com>

Signed-off-by: Oren Cohen <ocohen@redhat.com>
mhanss pushed a commit to mhanss/release that referenced this pull request May 26, 2026
…a SA (openshift#79081)

The hypershift-operator now creates a virt-launcher NetworkPolicy on
the external infrastructure cluster to enforce network isolation for
KubeVirt guest VMs (openshift/hypershift#8056, OCPBUGS-78575). A new
ValidKubeVirtInfraNetworkPolicyRBAC condition was added that requires
the external infra service account to have networking.k8s.io/networkpolicies
permissions and cluster-scoped read access to networks.config.openshift.io
for CIDR-based egress rules.

The CI step that provisions the restricted infra kubeconfig was not
updated to include these permissions, so the e2e-hypershift-kubevirt
job fails with forbidden errors when the operator attempts to manage
the virt-launcher NetworkPolicy and read cluster network configuration.

Add networking.k8s.io/networkpolicies with full verbs to the
kv-external-infra-role, and create a ClusterRole+ClusterRoleBinding
granting get access to networks.config.openshift.io so the operator
can build CIDR-based egress restrictions for full tenant isolation.


Assisted-by: Claude <noreply@anthropic.com>

Signed-off-by: Oren Cohen <ocohen@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/api Indicates the PR includes changes for the API area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/documentation Indicates the PR includes changes for documentation area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/platform/kubevirt PR/issue for KubeVirt (KubevirtPlatform) platform area/testing Indicates the PR includes changes for e2e testing jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants