Skip to content

CORENET-6714: Enable Network Observability on Day 0#2925

Open
OlivierCazade wants to merge 1 commit into
openshift:masterfrom
OlivierCazade:day0
Open

CORENET-6714: Enable Network Observability on Day 0#2925
OlivierCazade wants to merge 1 commit into
openshift:masterfrom
OlivierCazade:day0

Conversation

@OlivierCazade

@OlivierCazade OlivierCazade commented Mar 10, 2026

Copy link
Copy Markdown

Add Network Observability Day-0 Installation Support

The replace instruction in the go.mod file need to be removed before merging this branch, this can be removed once the PR on the API repo is merged

This PR introduces automatic Day-0 installation of the Network Observability operator, controlled by a new installNetworkObservability field in the Network CR.

Summary

The Cluster Network Operator (CNO) now handles initial deployment of the Network Observability operator and FlowCollector, enabling network flow monitoring out-of-the-box for OpenShift clusters while respecting user preferences and cluster topology.

Key Features

  1. Opt-out Model with SNO Detection

    • Network Observability is installed by default on multi-node clusters
    • Single Node OpenShift (SNO) clusters are excluded by default to conserve resources
    • Users can explicitly enable/disable via spec.installNetworkObservability
  2. Deployment Tracking

    • After successful deployment, the NetworkObservabilityDeployed condition is set
    • All subsequent reconciliations immediately return when this condition is present
    • This prevents reinstallation if users delete the operator or FlowCollector
    • Users maintain full control after initial deployment
  3. Status Reporting

    • New ObservabilityConfig status level for degraded state reporting
    • Comprehensive error handling with clear degraded status messages
    • Status cleared when Network Observability is running or disabled

Configuration

The spec.installNetworkObservability field accepts three values:

  • "" (empty/nil): Default behavior - install on multi-node clusters, skip on SNO
  • "Enable": Explicitly enable installation, even on SNO clusters
  • "Disable": Explicitly disable installation

RBAC Changes

Added permissions for the CNO to manage:

  • Subscriptions and ClusterServiceVersions (operators.coreos.com)
  • FlowCollectors (flows.netobserv.io)
  • Namespaces for operator deployment

Related

/cc @stleerh

Summary by CodeRabbit

  • New Features

    • Network Observability can now be installed and managed automatically from the cluster.
    • Added a default configuration for the observability components, including the required namespace and flow collection setup.
  • Bug Fixes

    • Improved handling of single-node clusters and empty installation settings.
    • Added clearer status reporting when deployment succeeds or fails.
  • Documentation

    • Updated the sample configuration with recommended values and usage notes for installation policy.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 10, 2026
@openshift-ci openshift-ci Bot requested a review from stleerh March 10, 2026 14:08
@openshift-ci-robot

openshift-ci-robot commented Mar 10, 2026

Copy link
Copy Markdown
Contributor

@OlivierCazade: This pull request references CORENET-6714 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Add Network Observability Day-0 Installation Support

The replace instruction in the go.mod file need to be removed before merging this branch, this can be removed once the PR on the API repo is merged

This PR introduces automatic Day-0 installation of the Network Observability operator, controlled by a new installNetworkObservability field in the Network CR.

Summary

The Cluster Network Operator (CNO) now handles initial deployment of the Network Observability operator and FlowCollector, enabling network flow monitoring out-of-the-box for OpenShift clusters while respecting user preferences and cluster topology.

Key Features

  1. Opt-out Model with SNO Detection
  • Network Observability is installed by default on multi-node clusters
  • Single Node OpenShift (SNO) clusters are excluded by default to conserve resources
  • Users can explicitly enable/disable via spec.installNetworkObservability
  1. Deployment Tracking
  • After successful deployment, the NetworkObservabilityDeployed condition is set
  • All subsequent reconciliations immediately return when this condition is present
  • This prevents reinstallation if users delete the operator or FlowCollector
  • Users maintain full control after initial deployment
  1. Status Reporting
  • New ObservabilityConfig status level for degraded state reporting
  • Comprehensive error handling with clear degraded status messages
  • Status cleared when Network Observability is running or disabled

Configuration

The spec.installNetworkObservability field accepts three values:

  • "" (empty/nil): Default behavior - install on multi-node clusters, skip on SNO
  • "Enable": Explicitly enable installation, even on SNO clusters
  • "Disable": Explicitly disable installation

RBAC Changes

Added permissions for the CNO to manage:

  • Subscriptions and ClusterServiceVersions (operators.coreos.com)
  • FlowCollectors (flows.netobserv.io)
  • Namespaces for operator deployment

Related

/cc @stleerh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented Mar 10, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Introduces a new observability-controller that reconciles the singleton configv1.Network CR named cluster. When the NetworkObservabilityInstall feature gate is enabled, the controller evaluates the networkObservability.installationPolicy field to decide whether to install the Network Observability Operator via OLM (OLMv1 ClusterExtension or OLMv0 CSV) and create a FlowCollector CR, updating the NetworkObservabilityDeployed status condition on the operator Network CR. Supporting RBAC and OLM manifests are included as bindata.

Changes

Network Observability Controller

Layer / File(s) Summary
Operator and FlowCollector installation manifests
bindata/observability/07-observability-operator.yaml, bindata/observability/08-flowcollector.yaml
Defines the Kubernetes manifests applied by the controller: openshift-netobserv-operator namespace, installer ServiceAccount, cluster-scoped ClusterRole with extensive RBAC, ClusterRoleBinding, namespace-scoped Role/RoleBinding, OLM ClusterExtension on the stable channel, and the FlowCollector CR with eBPF agent, DNSTracking, sampling: 400, and Loki disabled.
Controller scaffolding, constants, and registration
pkg/controller/observability/observability_controller.go (lines 1–79), pkg/controller/add_networkconfig.go, sample-config.yaml
Exports constants for manifest paths and resource names, implements Add() to construct ReconcileObservability and register a watch on configv1.Network, appends observability.Add to AddToManagerFuncs, and documents installationPolicy values in sample-config.yaml.
Reconcile loop, feature gate, and status condition helpers
pkg/controller/observability/observability_controller.go (lines 80–256)
Implements Reconcile() with singleton filtering, isFeatureGateEnabled() guard, installation decision branch, operator install/wait branch, FlowCollector creation branch, and setNetworkObservabilityCondition() / wasNetworkObservabilityDeployed() helpers that upsert or read the NetworkObservabilityDeployed condition on the operator Network CR.
Installation policy decision and SNO detection
pkg/controller/observability/observability_controller.go (lines 257–323)
Implements shouldInstallNetworkObservability() mapping InstallAndEnable, NoAction, and default policy to install/skip, using wasNetworkObservabilityDeployed() for the "install once" default path and detecting SNO via configv1.Infrastructure.status.controlPlaneTopology == SingleReplicaTopologyMode.
Operator detection, OLMv1/v0 checks, manifest apply, and FlowCollector management
pkg/controller/observability/observability_controller.go (lines 324–599)
Implements isNetObservOperatorInstalled() checking FlowCollector CRD then OLMv1 ClusterExtension Installed condition and OLMv0 CSV phase; applyManifest() for server-side apply of multi-document YAML; installNetObservOperator() and waitForNetObservOperator() polling until Installed=True or timeout; isFlowCollectorExists() and createFlowCollector() which ensures the openshift-network-observability namespace and applies the FlowCollector manifest.
Controller test suite
pkg/controller/observability/observability_controller_test.go
Full test suite covering helper constructors, unit tests for shouldInstallNetworkObservability, isFeatureGateEnabled, isNetObservOperatorInstalled, waitForNetObservOperator, isFlowCollectorExists, and applyManifest; integration-style reconcile tests for skip/idempotency, policy transitions, failure/recovery, concurrent stress, and NetworkObservabilityDeployed condition assertions.

Sequence Diagram(s)

sequenceDiagram
  participant Manager
  participant ReconcileObservability
  participant configv1Network as configv1.Network (cluster)
  participant operatorv1Network as operatorv1.Network
  participant OLM as OLM ClusterExtension
  participant FlowCollector

  Manager->>ReconcileObservability: Reconcile(cluster)
  ReconcileObservability->>ReconcileObservability: isFeatureGateEnabled(NetworkObservabilityInstall)
  ReconcileObservability->>configv1Network: Get installationPolicy + spec
  ReconcileObservability->>operatorv1Network: wasNetworkObservabilityDeployed()
  ReconcileObservability->>ReconcileObservability: shouldInstallNetworkObservability() [policy + SNO]

  alt should install
    ReconcileObservability->>OLM: isNetObservOperatorInstalled() [CRD + ClusterExtension/CSV]
    alt not installed
      ReconcileObservability->>OLM: applyManifest(07-observability-operator.yaml)
      ReconcileObservability->>OLM: waitForNetObservOperator() [poll Installed=True]
    end
    ReconcileObservability->>FlowCollector: isFlowCollectorExists()
    alt absent
      ReconcileObservability->>FlowCollector: createFlowCollector() [namespace + applyManifest]
    end
    ReconcileObservability->>operatorv1Network: setNetworkObservabilityCondition(True, DeploymentComplete)
  else failure
    ReconcileObservability->>operatorv1Network: setNetworkObservabilityCondition(False, DeploymentFailed)
  end
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 13 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 42.19% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ⚠️ Warning The suite has 107 Gomega expectations and none include contextual failure messages; structure/timeouts look otherwise acceptable. Add helpful messages to key Expect assertions (especially error and state checks), and consider shared setup helpers if moving to Ginkgo.
✅ Passed checks (13 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly matches the main change: enabling Network Observability installation on Day 0.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed No Ginkgo titles were added; the new tests use static, deterministic Test... names with no dynamic data in titles.
Microshift Test Compatibility ✅ Passed No new Ginkgo e2e tests were added; the new observability test file is plain Go unit tests with gomega, so MicroShift-specific API checks are not applicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed No new Ginkgo/e2e specs were added; the new tests are plain Go unit tests, and SNO cases are explicitly simulated or skipped by topology.
Topology-Aware Scheduling Compatibility ✅ Passed No new scheduling constraints were added; the manifests have no node selectors/affinity/spread/PDBs, and the controller explicitly gates default install on SNO topology.
Ote Binary Stdout Contract ✅ Passed Changed files add no stdout writes in process-level code; add_networkconfig init only appends controllers, and observability code has no Print/LogToStderr calls.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed New tests are plain Go unit tests (no Ginkgo e2e), and I found no IPv4-only assumptions or external connectivity calls.
No-Weak-Crypto ✅ Passed Reviewed the new observability controller/tests and manifests; no MD5/SHA1/DES/RC4/3DES/Blowfish, custom crypto, or secret comparisons found.
Container-Privileges ✅ Passed Added observability manifests and controller code contain no privileged pods/containers, host namespaces, SYS_ADMIN, or privilege-escalation settings.
No-Sensitive-Data-In-Logs ✅ Passed New logs are generic controller/status messages; no passwords, tokens, PII, hostnames, or customer data are logged in touched files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@openshift-ci-robot

openshift-ci-robot commented Mar 10, 2026

Copy link
Copy Markdown
Contributor

@OlivierCazade: This pull request references CORENET-6714 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Add Network Observability Day-0 Installation Support

The replace instruction in the go.mod file need to be removed before merging this branch, this can be removed once the PR on the API repo is merged

This PR introduces automatic Day-0 installation of the Network Observability operator, controlled by a new installNetworkObservability field in the Network CR.

Summary

The Cluster Network Operator (CNO) now handles initial deployment of the Network Observability operator and FlowCollector, enabling network flow monitoring out-of-the-box for OpenShift clusters while respecting user preferences and cluster topology.

Key Features

  1. Opt-out Model with SNO Detection
  • Network Observability is installed by default on multi-node clusters
  • Single Node OpenShift (SNO) clusters are excluded by default to conserve resources
  • Users can explicitly enable/disable via spec.installNetworkObservability
  1. Deployment Tracking
  • After successful deployment, the NetworkObservabilityDeployed condition is set
  • All subsequent reconciliations immediately return when this condition is present
  • This prevents reinstallation if users delete the operator or FlowCollector
  • Users maintain full control after initial deployment
  1. Status Reporting
  • New ObservabilityConfig status level for degraded state reporting
  • Comprehensive error handling with clear degraded status messages
  • Status cleared when Network Observability is running or disabled

Configuration

The spec.installNetworkObservability field accepts three values:

  • "" (empty/nil): Default behavior - install on multi-node clusters, skip on SNO
  • "Enable": Explicitly enable installation, even on SNO clusters
  • "Disable": Explicitly disable installation

RBAC Changes

Added permissions for the CNO to manage:

  • Subscriptions and ClusterServiceVersions (operators.coreos.com)
  • FlowCollectors (flows.netobserv.io)
  • Namespaces for operator deployment

Related

/cc @stleerh

Summary by CodeRabbit

  • New Features
  • Network Observability is now installable and manageable through cluster configuration—users can enable or disable network observability monitoring for their cluster.
  • Automatic deployment and lifecycle management of the observability operator and flow collection resources.
  • Network configuration status now reflects observability deployment state and health.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (4)
manifests/0000_70_cluster-network-operator_02_rbac_observability.yaml (1)

11-14: Consider adding watch and delete verbs for OLM resources.

The controller may need watch to properly track subscription and CSV state changes via informers. Additionally, delete permission on subscriptions might be needed if you ever want to support uninstallation or cleanup scenarios.

🔧 Suggested addition
   # Manage OLM resources for operator installation
   - apiGroups: ["operators.coreos.com"]
     resources: ["subscriptions", "clusterserviceversions", "operatorgroups"]
-    verbs: ["get", "list", "create", "update", "patch"]
+    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@manifests/0000_70_cluster-network-operator_02_rbac_observability.yaml` around
lines 11 - 14, The RBAC rule for apiGroups "operators.coreos.com" covering
resources ["subscriptions", "clusterserviceversions", "operatorgroups"] is
missing the "watch" and "delete" verbs; update the verbs array for that rule
(for the resources "subscriptions", "clusterserviceversions", and
"operatorgroups") to include "watch" (so informers can track state changes) and
"delete" (to allow cleanup/uninstallation) in addition to the existing verbs.
pkg/controller/observability/observability_controller_test.go (1)

1512-1518: Test creates files in working directory - may cause side effects.

This test creates a manifests/ directory and writes FlowCollectorYAML in the working directory. While defer os.Remove(FlowCollectorYAML) removes the file, it doesn't remove the manifests directory, which could persist across test runs.

🔧 Suggested fix for proper cleanup
 	err := os.MkdirAll("manifests", 0755)
 	g.Expect(err).NotTo(HaveOccurred())
+	defer os.RemoveAll("manifests")  // Clean up directory

 	// Create the FlowCollector manifest at the expected path
 	err = os.WriteFile(FlowCollectorYAML, []byte(flowCollectorManifest), 0644)
 	g.Expect(err).NotTo(HaveOccurred())
-	defer os.Remove(FlowCollectorYAML)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/observability/observability_controller_test.go` around lines
1512 - 1518, The test creates a "manifests" directory and a file at
FlowCollectorYAML using os.MkdirAll and os.WriteFile but only defers removal of
the file, leaving the manifests directory behind; update the test to always
clean up the directory by deferring a call to remove the directory (e.g. defer
os.RemoveAll("manifests")) immediately after os.MkdirAll succeeds (and keep or
replace the existing defer os.Remove(FlowCollectorYAML)), so that both the file
created from flowCollectorManifest and the manifests directory are removed after
the test.
pkg/controller/observability/observability_controller.go (2)

356-392: Potential race condition in markNetworkObservabilityDeployed.

The function has a TOCTOU (time-of-check-time-of-use) pattern: it reads the latest Network CR, modifies conditions, then updates. If another controller or user modifies the Network CR between Get and Update, the update will fail with a conflict error (which would be retried by the caller).

This is acceptable since Kubernetes retries on conflicts, but consider using a retry loop here for better resilience.

🔧 Suggested: Add retry for conflict resilience
 func (r *ReconcileObservability) markNetworkObservabilityDeployed(ctx context.Context, network *configv1.Network) error {
+	return retry.RetryOnConflict(retry.DefaultBackoff, func() error {
+		return r.doMarkNetworkObservabilityDeployed(ctx)
+	})
+}
+
+func (r *ReconcileObservability) doMarkNetworkObservabilityDeployed(ctx context.Context) error {
 	// Check if condition already exists and is true
-	for _, condition := range network.Status.Conditions {
+	latest := &configv1.Network{}
+	if err := r.client.Get(ctx, types.NamespacedName{Name: "cluster"}, latest); err != nil {
+		return err
+	}
+
+	for _, condition := range latest.Status.Conditions {
 		if condition.Type == NetworkObservabilityDeployed && condition.Status == metav1.ConditionTrue {
 			return nil // Already marked as deployed
 		}
 	}
-
-	// Get the latest version of the Network CR to avoid conflicts
-	latest := &configv1.Network{}
-	if err := r.client.Get(ctx, types.NamespacedName{Name: "cluster"}, latest); err != nil {
-		return err
-	}
 	// ... rest of the function
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/observability/observability_controller.go` around lines 356 -
392, markNetworkObservabilityDeployed currently does a Get -> modify ->
Status().Update which can fail on a concurrent update; wrap the
Get/modify/Status().Update sequence in a retry loop using
k8s.io/apimachinery/pkg/util/wait.RetryOnConflict (or retry.RetryOnConflict) to
retry on conflicts, re-fetching the latest Network (using r.client.Get) inside
the loop before applying the condition and calling r.client.Status().Update so
transient conflicts are retried safely; keep the same condition logic and return
the final error from the retry.

288-319: Polling uses parent context which may already have timeout pressure.

waitForNetObservOperator uses the passed context for polling, but also defines its own checkTimeout (10 minutes). If the caller's context has a shorter deadline, the poll will exit early with the caller's context error rather than context.DeadlineExceeded from wait.PollUntilContextTimeout.

The current implementation may work correctly in practice since the controller-runtime context typically doesn't have a deadline, but this could cause unexpected behavior in tests or if the calling context changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/observability/observability_controller.go` around lines 288 -
319, The polling uses the caller's ctx which can have a shorter deadline;
instead create a new timeout-bound context for the poll so it always runs up to
checkTimeout: inside waitForNetObservOperator, call
context.WithTimeout(context.Background(), checkTimeout) (defer cancel) and pass
that new ctx to wait.PollUntilContextTimeout (keep checkInterval, checkTimeout,
condition as before); reference function waitForNetObservOperator and variables
checkInterval/checkTimeout to locate where to replace the ctx usage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@go.mod`:
- Line 166: The go.mod contains a temporary replace directive pointing to a
personal fork for openshift/api#2752; do not merge until the upstream PR is
merged. Add a CI/merge-block gate that fails or flags the PR when the replace
directive "replace github.com/openshift/api" is present, add a clear TODO
comment adjacent to the replace line referencing openshift/api#2752 and stating
it must be removed when that PR merges, and add repository-level tracking (e.g.,
in the PR description or a maintainer checklist) to actively monitor
openshift/api#2752 and remove the replace directive immediately once the
upstream PR is approved and merged.

In `@pkg/controller/observability/observability_controller_test.go`:
- Around line 1250-1265: The test spawns goroutines that call r.Reconcile and
uses g.Expect inside those goroutines, which can panic because Gomega's fail
handler (t.FailNow) must run in the main test goroutine; fix by removing
g.Expect from the goroutines and collecting errors via a channel (e.g., make
errCh chan error), have each goroutine send its err to errCh after calling
r.Reconcile(req), then in the main goroutine close/iterate the channel and
assert with g.Expect that all received errors are nil; alternatively create a
per-goroutine Gomega tied to a testing.T (NewWithT) if you need per-goroutine
assertions—reference r.Reconcile, req, and the done/err channel approach to
locate where to change code.

In `@pkg/controller/observability/observability_controller.go`:
- Around line 169-173: The call to markNetworkObservabilityDeployed(err) is only
logged on failure which prevents the controller from retrying and can lead to
repeated FlowCollector creation; modify the reconciler to return the error
instead of just logging it so the reconciliation is requeued—locate the call to
r.markNetworkObservabilityDeployed(ctx, &network) in the reconciler function
(the surrounding code that currently does klog.Warningf on error) and replace
the log-without-return with returning a wrapped/annotated error (or simply
return err) so the controller will retry the reconcile loop; keep any logging
but ensure the function exits with an error when
markNetworkObservabilityDeployed fails.
- Around line 144-152: When waitForNetObservOperator(ctx) times out you
currently return ctrl.Result{RequeueAfter: 0}, nil which prevents any automatic
retry; update the timeout branch in the reconciler to either return a non-zero
RequeueAfter (e.g., RequeueAfter: time.Minute*5) so the controller will re-check
later, or return a wrapped error instead of nil to trigger error-based
requeueing; adjust the block around waitForNetObservOperator, the
r.status.SetDegraded call, and the return statement so the controller will retry
(refer to waitForNetObservOperator,
r.status.SetDegraded(statusmanager.ObservabilityConfig, "OperatorNotReady",
...), and the existing return ctrl.Result{RequeueAfter: 0}, nil).

---

Nitpick comments:
In `@manifests/0000_70_cluster-network-operator_02_rbac_observability.yaml`:
- Around line 11-14: The RBAC rule for apiGroups "operators.coreos.com" covering
resources ["subscriptions", "clusterserviceversions", "operatorgroups"] is
missing the "watch" and "delete" verbs; update the verbs array for that rule
(for the resources "subscriptions", "clusterserviceversions", and
"operatorgroups") to include "watch" (so informers can track state changes) and
"delete" (to allow cleanup/uninstallation) in addition to the existing verbs.

In `@pkg/controller/observability/observability_controller_test.go`:
- Around line 1512-1518: The test creates a "manifests" directory and a file at
FlowCollectorYAML using os.MkdirAll and os.WriteFile but only defers removal of
the file, leaving the manifests directory behind; update the test to always
clean up the directory by deferring a call to remove the directory (e.g. defer
os.RemoveAll("manifests")) immediately after os.MkdirAll succeeds (and keep or
replace the existing defer os.Remove(FlowCollectorYAML)), so that both the file
created from flowCollectorManifest and the manifests directory are removed after
the test.

In `@pkg/controller/observability/observability_controller.go`:
- Around line 356-392: markNetworkObservabilityDeployed currently does a Get ->
modify -> Status().Update which can fail on a concurrent update; wrap the
Get/modify/Status().Update sequence in a retry loop using
k8s.io/apimachinery/pkg/util/wait.RetryOnConflict (or retry.RetryOnConflict) to
retry on conflicts, re-fetching the latest Network (using r.client.Get) inside
the loop before applying the condition and calling r.client.Status().Update so
transient conflicts are retried safely; keep the same condition logic and return
the final error from the retry.
- Around line 288-319: The polling uses the caller's ctx which can have a
shorter deadline; instead create a new timeout-bound context for the poll so it
always runs up to checkTimeout: inside waitForNetObservOperator, call
context.WithTimeout(context.Background(), checkTimeout) (defer cancel) and pass
that new ctx to wait.PollUntilContextTimeout (keep checkInterval, checkTimeout,
condition as before); reference function waitForNetObservOperator and variables
checkInterval/checkTimeout to locate where to replace the ctx usage.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 826708e4-9aa4-4a6f-941f-be6e5a7c5d63

📥 Commits

Reviewing files that changed from the base of the PR and between 88891a9 and b1eeed3.

⛔ Files ignored due to path filters (36)
  • go.sum is excluded by !**/*.sum
  • vendor/github.com/openshift/api/config/v1/types_network.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.deepcopy.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/config/v1/zz_generated.swagger_doc_generated.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/modules.txt is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/.gitignore is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/.golangci.yml is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/.gomodcheck.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/CONTRIBUTING.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/FAQ.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/Makefile is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/OWNERS is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/OWNERS_ALIASES is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/README.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/RELEASE.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/SECURITY_CONTACTS is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/TMP-LOGGING.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/VERSIONING.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/alias.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/code-of-conduct.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/doc.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/builder/controller.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/builder/doc.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/builder/options.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/builder/webhook.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/client/config/config.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/client/config/doc.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/conversion/conversion.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/doc.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/signal.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/signal_posix.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/signal_windows.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/scheme/scheme.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/webhook/conversion/conversion.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/webhook/conversion/decoder.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/webhook/conversion/metrics/metrics.go is excluded by !**/vendor/**, !vendor/**
📒 Files selected for processing (9)
  • go.mod
  • manifests/0000_70_cluster-network-operator_02_rbac_observability.yaml
  • manifests/07-observability-operator.yaml
  • manifests/08-flowcollector.yaml
  • pkg/controller/add_networkconfig.go
  • pkg/controller/observability/observability_controller.go
  • pkg/controller/observability/observability_controller_test.go
  • pkg/controller/statusmanager/status_manager.go
  • sample-config.yaml

Comment thread go.mod Outdated
Comment thread pkg/controller/observability/observability_controller_test.go
Comment thread pkg/controller/observability/observability_controller.go Outdated
Comment thread pkg/controller/observability/observability_controller.go Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
pkg/controller/observability/observability_controller_test.go (1)

1268-1283: ⚠️ Potential issue | 🟡 Minor

Race condition risk: g.Expect called inside goroutines.

Gomega's fail handler calls t.FailNow() which must only be called from the test goroutine. If an assertion fails inside a spawned goroutine, this can cause a test panic.

Suggested fix using error channel
 	// Run 5 concurrent reconciliations
 	done := make(chan bool, 5)
+	errChan := make(chan error, 5)
 	for i := 0; i < 5; i++ {
 		go func() {
 			_, err := r.Reconcile(context.TODO(), req)
-			// All should complete without error (idempotent)
-			g.Expect(err).NotTo(HaveOccurred())
+			errChan <- err
 			done <- true
 		}()
 	}

 	// Wait for all to complete
 	for i := 0; i < 5; i++ {
 		<-done
+		err := <-errChan
+		g.Expect(err).NotTo(HaveOccurred())
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/observability/observability_controller_test.go` around lines
1268 - 1283, The test currently calls g.Expect inside spawned goroutines (see
r.Reconcile, req, and done), which risks calling Gomega's FailNow from non-test
goroutines; instead capture errors in a channel from each goroutine and perform
the assertion in the main test goroutine. Spawn the 5 goroutines to call
r.Reconcile and send any returned error (or nil) into an errs channel, then
after reading all results from errs in the main goroutine use
g.Expect(err).NotTo(HaveOccurred()) for each entry, removing g.Expect from
inside the goroutines.
pkg/controller/observability/observability_controller.go (1)

144-152: ⚠️ Potential issue | 🟠 Major

Timeout stops reconciliation permanently without requeue.

When waitForNetObservOperator times out, returning RequeueAfter: 0 with no error means the controller stops trying. If the operator eventually succeeds, this controller won't notice and won't create the FlowCollector.

Consider returning a non-zero RequeueAfter to periodically check if the operator eventually becomes ready:

Suggested fix
 	if err := r.waitForNetObservOperator(ctx); err != nil {
 		if err == context.DeadlineExceeded {
 			klog.Errorf("Timed out waiting for Network Observability Operator to be ready after %v. Stopping reconciliation.", checkTimeout)
 			r.status.SetDegraded(statusmanager.ObservabilityConfig, "OperatorNotReady", fmt.Sprintf("Timed out waiting for Network Observability Operator to be ready after %v", checkTimeout))
-			return ctrl.Result{RequeueAfter: 0}, nil // Don't requeue
+			return ctrl.Result{RequeueAfter: 5 * time.Minute}, nil // Retry later
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/observability/observability_controller.go` around lines 144 -
152, When waitForNetObservOperator(ctx) returns context.DeadlineExceeded the
current code returns ctrl.Result{RequeueAfter: 0} which stops reconciliation
permanently; change that return to schedule a future requeue (e.g.,
ctrl.Result{RequeueAfter: someNonZeroDuration}) so the controller will
periodically re-check for the operator and eventually create the FlowCollector.
Update the branch handling DeadlineExceeded in the Reconcile logic (the block
around waitForNetObservOperator and r.status.SetDegraded) to return a non-zero
RequeueAfter (choose a sensible poll interval constant or reuse checkTimeout)
instead of 0 and keep returning nil error.
🧹 Nitpick comments (1)
pkg/controller/observability/observability_controller_test.go (1)

1507-1572: Test creates manifest file in working directory.

This test writes to manifests/08-flowcollector.yaml in the working directory. While the cleanup with defer os.Remove() handles the file, consider:

  1. If the test panics before defer is registered (lines 1534-1536), the file persists
  2. Could conflict with parallel tests using the same path

Using a temp directory and temporarily overriding the FlowCollectorYAML constant (if possible) would be more robust, but this is acceptable for an integration-style test.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/observability/observability_controller_test.go` around lines
1507 - 1572, TestReconcile_FirstTimeDeploymentSetsCondition writes to a fixed
manifests path (FlowCollectorYAML) which can leak or conflict; fix by creating a
temporary directory with os.MkdirTemp at test start, write the FlowCollector
manifest into that temp dir (use os.Create or os.WriteFile), set or inject the
FlowCollectorYAML path used by the reconciler to point to the temp file (or
refactor the code to accept a manifestPath parameter) before calling Reconcile,
and register defer os.RemoveAll(tempDir) immediately after creating the temp
directory to guarantee cleanup even on panic; update references to
FlowCollectorYAML in the test to use the temp file path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@pkg/controller/observability/observability_controller_test.go`:
- Around line 1268-1283: The test currently calls g.Expect inside spawned
goroutines (see r.Reconcile, req, and done), which risks calling Gomega's
FailNow from non-test goroutines; instead capture errors in a channel from each
goroutine and perform the assertion in the main test goroutine. Spawn the 5
goroutines to call r.Reconcile and send any returned error (or nil) into an errs
channel, then after reading all results from errs in the main goroutine use
g.Expect(err).NotTo(HaveOccurred()) for each entry, removing g.Expect from
inside the goroutines.

In `@pkg/controller/observability/observability_controller.go`:
- Around line 144-152: When waitForNetObservOperator(ctx) returns
context.DeadlineExceeded the current code returns ctrl.Result{RequeueAfter: 0}
which stops reconciliation permanently; change that return to schedule a future
requeue (e.g., ctrl.Result{RequeueAfter: someNonZeroDuration}) so the controller
will periodically re-check for the operator and eventually create the
FlowCollector. Update the branch handling DeadlineExceeded in the Reconcile
logic (the block around waitForNetObservOperator and r.status.SetDegraded) to
return a non-zero RequeueAfter (choose a sensible poll interval constant or
reuse checkTimeout) instead of 0 and keep returning nil error.

---

Nitpick comments:
In `@pkg/controller/observability/observability_controller_test.go`:
- Around line 1507-1572: TestReconcile_FirstTimeDeploymentSetsCondition writes
to a fixed manifests path (FlowCollectorYAML) which can leak or conflict; fix by
creating a temporary directory with os.MkdirTemp at test start, write the
FlowCollector manifest into that temp dir (use os.Create or os.WriteFile), set
or inject the FlowCollectorYAML path used by the reconciler to point to the temp
file (or refactor the code to accept a manifestPath parameter) before calling
Reconcile, and register defer os.RemoveAll(tempDir) immediately after creating
the temp directory to guarantee cleanup even on panic; update references to
FlowCollectorYAML in the test to use the temp file path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b46e56ed-85bd-48c6-8a11-d0576b268eb9

📥 Commits

Reviewing files that changed from the base of the PR and between b1eeed3 and 9cc8e57.

⛔ Files ignored due to path filters (8)
  • go.sum is excluded by !**/*.sum
  • vendor/github.com/openshift/api/config/v1/types_network.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.deepcopy.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.featuregated-crd-manifests.yaml is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.swagger_doc_generated.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/features.md is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/features/features.go is excluded by !vendor/**, !**/vendor/**
  • vendor/modules.txt is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (4)
  • go.mod
  • pkg/controller/observability/observability_controller.go
  • pkg/controller/observability/observability_controller_test.go
  • sample-config.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
  • go.mod
  • sample-config.yaml

@openshift-ci-robot

openshift-ci-robot commented Mar 17, 2026

Copy link
Copy Markdown
Contributor

@OlivierCazade: This pull request references CORENET-6714 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Add Network Observability Day-0 Installation Support

The replace instruction in the go.mod file need to be removed before merging this branch, this can be removed once the PR on the API repo is merged

This PR introduces automatic Day-0 installation of the Network Observability operator, controlled by a new installNetworkObservability field in the Network CR.

Summary

The Cluster Network Operator (CNO) now handles initial deployment of the Network Observability operator and FlowCollector, enabling network flow monitoring out-of-the-box for OpenShift clusters while respecting user preferences and cluster topology.

Key Features

  1. Opt-out Model with SNO Detection
  • Network Observability is installed by default on multi-node clusters
  • Single Node OpenShift (SNO) clusters are excluded by default to conserve resources
  • Users can explicitly enable/disable via spec.installNetworkObservability
  1. Deployment Tracking
  • After successful deployment, the NetworkObservabilityDeployed condition is set
  • All subsequent reconciliations immediately return when this condition is present
  • This prevents reinstallation if users delete the operator or FlowCollector
  • Users maintain full control after initial deployment
  1. Status Reporting
  • New ObservabilityConfig status level for degraded state reporting
  • Comprehensive error handling with clear degraded status messages
  • Status cleared when Network Observability is running or disabled

Configuration

The spec.installNetworkObservability field accepts three values:

  • "" (empty/nil): Default behavior - install on multi-node clusters, skip on SNO
  • "Enable": Explicitly enable installation, even on SNO clusters
  • "Disable": Explicitly disable installation

RBAC Changes

Added permissions for the CNO to manage:

  • Subscriptions and ClusterServiceVersions (operators.coreos.com)
  • FlowCollectors (flows.netobserv.io)
  • Namespaces for operator deployment

Related

/cc @stleerh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
pkg/controller/observability/observability_controller.go (2)

157-162: ⚠️ Potential issue | 🟠 Major

Timeout path disables automatic recovery

Line 161 returns ctrl.Result{RequeueAfter: 0}, nil on timeout, so reconciliation stops and may never create the FlowCollector even if the operator becomes ready later.

Suggested fix
 	if err := r.waitForNetObservOperator(ctx); err != nil {
 		if err == context.DeadlineExceeded {
 			klog.Errorf("Timed out waiting for Network Observability Operator to be ready after %v. Stopping reconciliation.", checkTimeout)
 			r.status.SetDegraded(statusmanager.ObservabilityConfig, "OperatorNotReady", fmt.Sprintf("Timed out waiting for Network Observability Operator to be ready after %v", checkTimeout))
-			return ctrl.Result{RequeueAfter: 0}, nil // Don't requeue
+			return ctrl.Result{RequeueAfter: 5 * time.Minute}, nil
 		}

183-185: ⚠️ Potential issue | 🟠 Major

Deployment-condition update errors are swallowed

At Line 183-Line 185, markNetworkObservabilityDeployed failure is only logged. That can cause repeated deploy-path execution on later reconciles instead of retrying the status write immediately.

Suggested fix
 	// Mark as deployed in Network CR status
 	if err := r.markNetworkObservabilityDeployed(ctx, &network); err != nil {
-		klog.Warningf("Failed to update Network Observability deployment status: %v", err)
+		r.status.SetDegraded(statusmanager.ObservabilityConfig, "MarkDeployedError", fmt.Sprintf("Failed to update Network Observability deployment status: %v", err))
+		return ctrl.Result{}, err
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/observability/observability_controller.go` around lines 183 -
185, The call to markNetworkObservabilityDeployed is currently logging failures
and swallowing the error, which prevents the reconcile loop from retrying status
writes; change the code in observability_controller.go so that if
r.markNetworkObservabilityDeployed(ctx, &network) returns an error you propagate
that error (e.g., return fmt.Errorf("markNetworkObservabilityDeployed: %w", err)
or wrap with kerrors) instead of only calling klog.Warningf so the reconciler
will requeue and retry the status update; locate the
markNetworkObservabilityDeployed call in the reconcile flow and replace the
swallowed-log branch with an error return (or requeue result) so failures are
retried immediately.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/controller/observability/observability_controller.go`:
- Around line 326-338: The readiness check in waitForNetObservOperator currently
returns after inspecting the first matching csv in csvs.Items which can be
non-deterministic; instead collect all CSVs whose name has the
"netobserv-operator" prefix, determine the intended CSV by selecting the one
with the highest semantic version parsed from the name (e.g., parse the version
suffix from csv.GetName() like "netobserv-operator.vX.Y.Z" and compare using
semver semantics), and then call unstructured.NestedString on that selected
CSV's Object to check status.phase; update the logic around csvs.Items,
csv.GetName(), and unstructured.NestedString to implement this deterministic
selection before returning phase == "Succeeded".

---

Duplicate comments:
In `@pkg/controller/observability/observability_controller.go`:
- Around line 183-185: The call to markNetworkObservabilityDeployed is currently
logging failures and swallowing the error, which prevents the reconcile loop
from retrying status writes; change the code in observability_controller.go so
that if r.markNetworkObservabilityDeployed(ctx, &network) returns an error you
propagate that error (e.g., return fmt.Errorf("markNetworkObservabilityDeployed:
%w", err) or wrap with kerrors) instead of only calling klog.Warningf so the
reconciler will requeue and retry the status update; locate the
markNetworkObservabilityDeployed call in the reconcile flow and replace the
swallowed-log branch with an error return (or requeue result) so failures are
retried immediately.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 63c37a51-4f01-487a-9005-0b35a6728d9f

📥 Commits

Reviewing files that changed from the base of the PR and between 9cc8e57 and 5bb3220.

📒 Files selected for processing (1)
  • pkg/controller/observability/observability_controller.go

Comment thread pkg/controller/observability/observability_controller.go Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
pkg/controller/observability/observability_controller.go (1)

133-146: Namespace creation handles not-found but not already-exists race.

The Get-then-Create pattern at lines 136-146 could encounter an AlreadyExists error if another process creates the namespace between the Get and Create calls. This would fail the reconciliation, though it will succeed on retry.

♻️ Optional: handle AlreadyExists gracefully
 		if errors.IsNotFound(err) {
 			if err := r.client.Create(ctx, ns); err != nil {
+				if errors.IsAlreadyExists(err) {
+					klog.V(4).Infof("Namespace %s was created by another process", OperatorNamespace)
+				} else {
 				r.status.SetDegraded(statusmanager.ObservabilityConfig, "CreateNamespaceError", fmt.Sprintf("Failed to create namespace %s: %v", OperatorNamespace, err))
 				return ctrl.Result{}, err
+				}
 			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/observability/observability_controller.go` around lines 133 -
146, The Get-then-Create namespace flow in Reconcile
(observability_controller.go) can race: after r.client.Get succeeds with
NotFound, r.client.Create may return AlreadyExists if another actor created the
namespace; update the Create error handling in the block that constructs ns
(using OperatorNamespace) to treat errors.IsAlreadyExists(err) as non-fatal (do
not call r.status.SetDegraded or return error) and only set degraded/return for
other errors; keep using r.status.SetDegraded(statusmanager.ObservabilityConfig,
...) for real create failures so reconciliation continues cleanly on races.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pkg/controller/observability/observability_controller.go`:
- Around line 133-146: The Get-then-Create namespace flow in Reconcile
(observability_controller.go) can race: after r.client.Get succeeds with
NotFound, r.client.Create may return AlreadyExists if another actor created the
namespace; update the Create error handling in the block that constructs ns
(using OperatorNamespace) to treat errors.IsAlreadyExists(err) as non-fatal (do
not call r.status.SetDegraded or return error) and only set degraded/return for
other errors; keep using r.status.SetDegraded(statusmanager.ObservabilityConfig,
...) for real create failures so reconciliation continues cleanly on races.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c15118e0-7f89-43e5-a75e-3ba878565507

📥 Commits

Reviewing files that changed from the base of the PR and between 5bb3220 and 2a76763.

📒 Files selected for processing (1)
  • pkg/controller/observability/observability_controller.go

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
pkg/controller/observability/observability_controller_test.go (1)

1268-1283: ⚠️ Potential issue | 🟡 Minor

Race condition in concurrent test remains unaddressed.

Calling g.Expect() inside goroutines can cause test panics because Gomega's fail handler invokes t.FailNow(), which must only be called from the test's main goroutine.

🔧 Suggested fix using error channel
 	// Run 5 concurrent reconciliations
 	done := make(chan bool, 5)
+	errChan := make(chan error, 5)
 	for i := 0; i < 5; i++ {
 		go func() {
 			_, err := r.Reconcile(context.TODO(), req)
-			// All should complete without error (idempotent)
-			g.Expect(err).NotTo(HaveOccurred())
+			errChan <- err
 			done <- true
 		}()
 	}

 	// Wait for all to complete
 	for i := 0; i < 5; i++ {
 		<-done
+		err := <-errChan
+		g.Expect(err).NotTo(HaveOccurred())
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/observability/observability_controller_test.go` around lines
1268 - 1283, The test invokes g.Expect inside goroutines which can call
t.FailNow from non-test goroutines; instead collect errors from r.Reconcile in a
buffered channel and perform Gomega assertions in the main test goroutine.
Replace the done channel with an errChan (buffered to 5), have each goroutine
call r.Reconcile(context.TODO(), req) and send the returned error into errChan,
then loop 5 times in the main goroutine to receive err := <-errChan and call
g.Expect(err).NotTo(HaveOccurred()). This keeps the Reconcile calls concurrent
but ensures assertions (using g.Expect) run only on the main test goroutine.
🧹 Nitpick comments (2)
pkg/controller/observability/observability_controller_test.go (2)

936-938: Misleading test name and comment.

The test is named TestReconcile_FlowCollectorDeleted with comment "tests that reconciliation recreates FlowCollector if it gets deleted," but the actual assertion verifies that reconciliation skips reinstallation when the deployed condition is set. The test logic is correct per the PR's design (avoid reinstallation once deployed), but the name/comment should reflect this behavior.

♻️ Suggested rename for clarity
-// TestReconcile_FlowCollectorDeleted tests that reconciliation recreates
-// FlowCollector if it gets deleted
-func TestReconcile_FlowCollectorDeleted(t *testing.T) {
+// TestReconcile_SkipsReinstallWhenFlowCollectorDeleted tests that reconciliation
+// does NOT recreate FlowCollector after deletion if the deployed condition is set
+func TestReconcile_SkipsReinstallWhenFlowCollectorDeleted(t *testing.T) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/observability/observability_controller_test.go` around lines
936 - 938, Rename the test function TestReconcile_FlowCollectorDeleted and
update its comment to reflect the actual behavior being asserted: that
reconciliation skips reinstallation when the FlowCollector has the "deployed"
condition set. Specifically, change the function name (e.g., to
TestReconcile_SkipsReinstallWhenFlowCollectorDeployed) and update the preceding
comment to state "tests that reconciliation skips reinstallation of
FlowCollector when the deployed condition is present"; also update any
references to the old test name in the file (including test registration or
helper calls) so they remain consistent.

1530-1536: Test creates files in working directory without full cleanup.

The test creates manifests/ directory and writes a file but only removes the file on cleanup, not the directory. While manifests/ likely already exists in the repository, consider using t.TempDir() for full isolation, especially if tests run in parallel.

♻️ Suggested fix using temp directory
-	err := os.MkdirAll("manifests", 0755)
-	g.Expect(err).NotTo(HaveOccurred())
-
-	// Create the FlowCollector manifest at the expected path
-	err = os.WriteFile(FlowCollectorYAML, []byte(flowCollectorManifest), 0644)
-	g.Expect(err).NotTo(HaveOccurred())
-	defer os.Remove(FlowCollectorYAML)
+	// Use temp directory and override the manifest path for this test
+	tmpDir := t.TempDir()
+	manifestPath := filepath.Join(tmpDir, "flowcollector.yaml")
+	err := os.WriteFile(manifestPath, []byte(flowCollectorManifest), 0644)
+	g.Expect(err).NotTo(HaveOccurred())
+	
+	// Note: This requires the controller to accept a configurable manifest path
+	// or use a test-specific approach to override FlowCollectorYAML

Alternatively, if modifying the controller isn't feasible, ensure the manifests directory cleanup:

+	defer os.RemoveAll("manifests")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/observability/observability_controller_test.go` around lines
1530 - 1536, The test creates a real manifests/ directory and only removes the
file, leaving the directory behind; update the test to use an isolated temp
directory so artifacts are fully cleaned up: create a temp dir via t.TempDir()
(or os.MkdirTemp if t is not available), write the FlowCollector manifest into
filepath.Join(tempDir, filepath.Base(FlowCollectorYAML)) using the existing
flowCollectorManifest, and set the test to use that path (or defer
os.RemoveAll(tempDir)) instead of writing to the repository-level "manifests"
directory; alternatively ensure the test defers os.RemoveAll("manifests") after
writing—reference FlowCollectorYAML and flowCollectorManifest to locate the
write site in observability_controller_test.go.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@pkg/controller/observability/observability_controller_test.go`:
- Around line 1268-1283: The test invokes g.Expect inside goroutines which can
call t.FailNow from non-test goroutines; instead collect errors from r.Reconcile
in a buffered channel and perform Gomega assertions in the main test goroutine.
Replace the done channel with an errChan (buffered to 5), have each goroutine
call r.Reconcile(context.TODO(), req) and send the returned error into errChan,
then loop 5 times in the main goroutine to receive err := <-errChan and call
g.Expect(err).NotTo(HaveOccurred()). This keeps the Reconcile calls concurrent
but ensures assertions (using g.Expect) run only on the main test goroutine.

---

Nitpick comments:
In `@pkg/controller/observability/observability_controller_test.go`:
- Around line 936-938: Rename the test function
TestReconcile_FlowCollectorDeleted and update its comment to reflect the actual
behavior being asserted: that reconciliation skips reinstallation when the
FlowCollector has the "deployed" condition set. Specifically, change the
function name (e.g., to TestReconcile_SkipsReinstallWhenFlowCollectorDeployed)
and update the preceding comment to state "tests that reconciliation skips
reinstallation of FlowCollector when the deployed condition is present"; also
update any references to the old test name in the file (including test
registration or helper calls) so they remain consistent.
- Around line 1530-1536: The test creates a real manifests/ directory and only
removes the file, leaving the directory behind; update the test to use an
isolated temp directory so artifacts are fully cleaned up: create a temp dir via
t.TempDir() (or os.MkdirTemp if t is not available), write the FlowCollector
manifest into filepath.Join(tempDir, filepath.Base(FlowCollectorYAML)) using the
existing flowCollectorManifest, and set the test to use that path (or defer
os.RemoveAll(tempDir)) instead of writing to the repository-level "manifests"
directory; alternatively ensure the test defers os.RemoveAll("manifests") after
writing—reference FlowCollectorYAML and flowCollectorManifest to locate the
write site in observability_controller_test.go.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 394e355a-3990-4ada-aaac-e4a5ee489446

📥 Commits

Reviewing files that changed from the base of the PR and between 2a76763 and 9548585.

📒 Files selected for processing (1)
  • pkg/controller/observability/observability_controller_test.go

Comment thread manifests/08-flowcollector.yaml Outdated
Comment thread manifests/08-flowcollector.yaml Outdated
Comment thread pkg/controller/observability/observability_controller.go Outdated
Comment thread pkg/controller/observability/observability_controller.go Outdated
// Check if Network Observability was previously deployed
// If so, we're done - no need to check or manage anything
if r.wasNetworkObservabilityDeployed(&network) {
klog.Info("Network Observability was previously deployed, skipping reconciliation.")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way to clear this flag so it will install Network Observability again? If not, we just need to make it clear in our documentation.

Comment thread pkg/controller/observability/observability_controller.go Outdated
err := r.client.Get(ctx, types.NamespacedName{
Name: "netobserv-operator",
Namespace: OperatorNamespace,
}, subscription)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PoC determined that Network Observability was installed if there is a Deployment with the "app=netobserv-operator" label. With the Service deployment model added, it uses a DaemonSet.

It seems checking for a Deployment or DaemonSet is more reliable than a Subscription, which is OLM-specific, because unofficially, you could install it with Helm. Should we not worry about that? Thoughts?

Group: "operators.coreos.com",
Version: "v1alpha1",
Kind: "ClusterServiceVersion",
})

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment in that CSV is specific to OLM. The alternative is to check for the Deployment or DaemonSet.

@stleerh stleerh requested review from dave-tucker and jotak March 24, 2026 18:42
@arkadeepsen

Copy link
Copy Markdown
Member

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Mar 25, 2026

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
go.mod (1)

165-165: ⚠️ Potential issue | 🟠 Major

Keep the temporary openshift/api fork out of the merge path.

This still makes release builds resolve API types from a personal fork. Please keep a hard merge blocker on this line and remove it before merging.

Verify that the replace is still present and that the upstream API PR is not merged yet:

#!/bin/bash
set -euo pipefail

echo "replace directive in go.mod:"
rg -n '^replace github\.com/openshift/api ' go.mod

echo
echo "openshift/api#2752 state:"
curl -fsSL https://api.github.com/repos/openshift/api/pulls/2752 | jq '{state, merged_at, title, html_url}'

Expected result: if the replace line is present and merged_at is null, this remains a pre-merge blocker.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@go.mod` at line 165, The go.mod contains a temporary replace directive
referencing a personal fork ("replace github.com/openshift/api
v0.0.0-20260302174620-dcac36b908db => github.com/OlivierCazade/api
v0.0.0-20260324161534-5d972e0f328a") which must not make it into mainline;
remove that replace before merging OR add a strict CI merge-blocker that fails
the build if that exact replace line exists (e.g., a simple grep check
equivalent to the reviewer’s snippet), and ensure the replace is only present in
a local/temporary branch and not in the merged branch.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/controller/observability/observability_controller.go`:
- Around line 144-148: When r.waitForNetObservOperator(ctx) returns
context.DeadlineExceeded, in addition to requeuing keep the current behavior but
call r.status.SetDegraded with the ObservabilityConfig key (use the existing
"OperatorNotReady" reason) and a clear message including checkTimeout so the CNO
status reflects the timeout; ensure you still return ctrl.Result{RequeueAfter:
requeueAfter}, nil. Reference the r.waitForNetObservOperator function,
r.status.SetDegraded method, the ObservabilityConfig constant/key, the
"OperatorNotReady" reason, and the checkTimeout/requeueAfter variables when
adding the status update.
- Around line 129-140: The current logic short-circuits once any Subscription
exists (isNetObservOperatorInstalled) which stops reapplying the desired OLM
manifest and can leave a drifted/partial install stuck; change reconcile to
always call installNetObservOperator until the operator is fully ready (e.g.,
check CSV/ClusterServiceVersion or subscription status for installed/ready
state) instead of relying on Subscription existence: remove or stop using
isNetObservOperatorInstalled, have installNetObservOperator be idempotent and
reapply the operator manifest each reconcile when readiness checks
(CSV/Subscription conditions) are not satisfied, and keep setting degraded via
r.status.SetDegraded(...) on failures so errors still surface.

---

Duplicate comments:
In `@go.mod`:
- Line 165: The go.mod contains a temporary replace directive referencing a
personal fork ("replace github.com/openshift/api
v0.0.0-20260302174620-dcac36b908db => github.com/OlivierCazade/api
v0.0.0-20260324161534-5d972e0f328a") which must not make it into mainline;
remove that replace before merging OR add a strict CI merge-blocker that fails
the build if that exact replace line exists (e.g., a simple grep check
equivalent to the reviewer’s snippet), and ensure the replace is only present in
a local/temporary branch and not in the merged branch.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d4dbd2bf-3cd4-41dd-a545-948c1fba572d

📥 Commits

Reviewing files that changed from the base of the PR and between 2a76763 and 844d745.

⛔ Files ignored due to path filters (7)
  • go.sum is excluded by !**/*.sum
  • vendor/github.com/openshift/api/config/v1/types_network.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.deepcopy.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.featuregated-crd-manifests.yaml is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.swagger_doc_generated.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/features.md is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/features/features.go is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (9)
  • go.mod
  • manifests/0000_70_cluster-network-operator_02_rbac_observability.yaml
  • manifests/07-observability-operator.yaml
  • manifests/08-flowcollector.yaml
  • pkg/controller/add_networkconfig.go
  • pkg/controller/observability/observability_controller.go
  • pkg/controller/observability/observability_controller_test.go
  • pkg/controller/statusmanager/status_manager.go
  • sample-config.yaml
✅ Files skipped from review due to trivial changes (4)
  • pkg/controller/add_networkconfig.go
  • manifests/07-observability-operator.yaml
  • manifests/08-flowcollector.yaml
  • manifests/0000_70_cluster-network-operator_02_rbac_observability.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
  • sample-config.yaml
  • pkg/controller/observability/observability_controller_test.go

Comment on lines +129 to +140
installed, err := r.isNetObservOperatorInstalled(ctx)
if err != nil {
r.status.SetDegraded(statusmanager.ObservabilityConfig, "CheckOperatorError", fmt.Sprintf("Failed to check if Network Observability Operator is installed: %v", err))
return ctrl.Result{}, err
}
if !installed {
// Install Network Observability Operator
if err := r.installNetObservOperator(ctx); err != nil {
r.status.SetDegraded(statusmanager.ObservabilityConfig, "InstallOperatorError", fmt.Sprintf("Failed to install Network Observability Operator: %v", err))
return ctrl.Result{}, err
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Reapply the operator manifest until readiness instead of keying on Subscription existence.

As soon as any Subscription exists, this branch stops asserting the desired OLM spec. A drifted or half-created install can then sit in the wait/requeue path forever because installNetObservOperator is never called again.

Suggested fix
-	// First time installation - proceed with operator installation
-	installed, err := r.isNetObservOperatorInstalled(ctx)
-	if err != nil {
-		r.status.SetDegraded(statusmanager.ObservabilityConfig, "CheckOperatorError", fmt.Sprintf("Failed to check if Network Observability Operator is installed: %v", err))
-		return ctrl.Result{}, err
-	}
-	if !installed {
-		// Install Network Observability Operator
-		if err := r.installNetObservOperator(ctx); err != nil {
-			r.status.SetDegraded(statusmanager.ObservabilityConfig, "InstallOperatorError", fmt.Sprintf("Failed to install Network Observability Operator: %v", err))
-			return ctrl.Result{}, err
-		}
-	}
+	// Re-apply the desired OLM resources until the operator becomes ready.
+	if err := r.installNetObservOperator(ctx); err != nil {
+		r.status.SetDegraded(statusmanager.ObservabilityConfig, "InstallOperatorError", fmt.Sprintf("Failed to apply Network Observability Operator manifest: %v", err))
+		return ctrl.Result{}, err
+	}

That also makes isNetObservOperatorInstalled removable afterward.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
installed, err := r.isNetObservOperatorInstalled(ctx)
if err != nil {
r.status.SetDegraded(statusmanager.ObservabilityConfig, "CheckOperatorError", fmt.Sprintf("Failed to check if Network Observability Operator is installed: %v", err))
return ctrl.Result{}, err
}
if !installed {
// Install Network Observability Operator
if err := r.installNetObservOperator(ctx); err != nil {
r.status.SetDegraded(statusmanager.ObservabilityConfig, "InstallOperatorError", fmt.Sprintf("Failed to install Network Observability Operator: %v", err))
return ctrl.Result{}, err
}
}
// Re-apply the desired OLM resources until the operator becomes ready.
if err := r.installNetObservOperator(ctx); err != nil {
r.status.SetDegraded(statusmanager.ObservabilityConfig, "InstallOperatorError", fmt.Sprintf("Failed to apply Network Observability Operator manifest: %v", err))
return ctrl.Result{}, err
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/observability/observability_controller.go` around lines 129 -
140, The current logic short-circuits once any Subscription exists
(isNetObservOperatorInstalled) which stops reapplying the desired OLM manifest
and can leave a drifted/partial install stuck; change reconcile to always call
installNetObservOperator until the operator is fully ready (e.g., check
CSV/ClusterServiceVersion or subscription status for installed/ready state)
instead of relying on Subscription existence: remove or stop using
isNetObservOperatorInstalled, have installNetObservOperator be idempotent and
reapply the operator manifest each reconcile when readiness checks
(CSV/Subscription conditions) are not satisfied, and keep setting degraded via
r.status.SetDegraded(...) on failures so errors still surface.

Comment thread pkg/controller/observability/observability_controller.go Outdated
@arkadeepsen

Copy link
Copy Markdown
Member

@OlivierCazade PLMK when the PR is ready to be reviewed.

@OlivierCazade OlivierCazade force-pushed the day0 branch 2 times, most recently from 67701bb to 66bf264 Compare March 25, 2026 11:26
@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 25, 2026
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 25, 2026
@OlivierCazade OlivierCazade force-pushed the day0 branch 5 times, most recently from 4b236f1 to 477a479 Compare March 26, 2026 12:08
@arkadeepsen

Copy link
Copy Markdown
Member

@arkadeepsen, I believem, all comments have been addressed.

Sure. I'll post my comments in one or two days.

@mffiedler

Copy link
Copy Markdown

/retest-required

@arkadeepsen arkadeepsen left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are the e2e tests for this feature going to be added?

Comment thread sample-config.yaml Outdated
Comment on lines +15 to +18
# Valid values: "", "InstallAndEnable", "DoNotInstall"
# Default (empty or omitted): enabled on multi-node clusters, disabled on SNO
# "InstallAndEnable": explicitly enable (even on SNO)
# "DoNotInstall": explicitly disable

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The text should be updated here: DoNotInstall -> NoAction.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -0,0 +1,42 @@
apiVersion: rbac.authorization.k8s.io/v1

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing RBAC for CNO provide cluster-admin permissions to it. I don't think this file is needed as CNO already has the required permissions.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the file

Comment on lines +76 to +80
// StatusReporter is an interface for reporting status
type StatusReporter interface {
SetDegraded(level statusmanager.StatusLevel, reason, message string)
SetNotDegraded(level statusmanager.StatusLevel)
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not used to update status. Let's remove it.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


type ReconcileObservability struct {
client crclient.Client
status StatusReporter

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this one as well.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +330 to +332
err := r.client.Get(ctx, types.NamespacedName{
Name: "flowcollectors.flows.netobserv.io",
}, crd)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we also check if there's a CR available, along with checking whether the CRD exists, to be more sure that the operator is installed or not?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what the function is doing, it does a GET query with the flowcollector name on the CRD.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant flowcollector object (custom resource), not the CRD. Having the CRD installed will not ensure that the operator is actually installed. Ideally, the check should be NOO is installed and the flowcollector object is created.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is currently multiple way for NOO to be installed (OLM v0 or v1, and helm chart for upstream), the common point between all of them is the CRD. Checking more precisely would introduce a lot of complexity because we would have to check for the different cases and IMO this is a bit unnecessary because unless the user has been hacking a lot, both always come together.

The suggestion actually come from the OLM channel, we asked for a way to check for both OLMv0 and OLMv1 installation and they suggested using the CRD.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if helm chart is officially supported in downstream.
For the others cases, I think we should do the following:

  • Check for OLMv0 resources. If they exist then don't do anything.
  • If OLMv0 resources don't exist, then check for OLMv1 resources. If they exist then don't do anything. If they don't exist then install everything.

Now, the user can also delete the flowcollector object created by CNO. So, in the second scenario, the flowcollector resource existence should also be checked and if it doesn't exist, then it should also be created again.

Existence of CRD only doesn't guarantee that all the required resources for NOO are installed for it to function correctly.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arkadeepsen are you fine with the last changes regarding this thread ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was on PTO till yesterday. I'll take a look at the latest changes and let you know.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked with chai-bot regarding the various scenarios: https://redhat-internal.slack.com/archives/C3VS0LV41/p1782466535969549

Following is my understanding:

  • If operator is installed via OLMv0, then the resources installed during installation of the operator is not managed, i.e. the CRD if deleted will not be re-installed. Thus, for the case of OLMv0, the CRD may not exist, but the operator can still be running.
  • If operator is installed via OLMv1, then the resources are managed and if deleted will be re-installed. Thus, if the ClusterExtension object is installed, CRD will always be installed.
  • The user can delete the ClusterExtension object in the case of OLMv1 using --cascade=orphan which will not delete the CRD. For the case of OLMv0, when CSV and Subscription are deleted, the operator is removed, but the CRD is not removed. Currently if CRD exists and OLM resources do not exist, operator is not installed.

The current flow will try to install the operator if it detects that the CRD is not installed, but the operator may be installed using OLMv0 and CNO will install another version of NOO via OLMv1. A check for this scenario should be added.

I think we also need to document the pitfalls of the different install methods (OLMv0 and OLMv1) as this does deviate from what the API documents:

	// NetworkObservabilityInstallAndEnable means that network observability should be installed and enabled during cluster deployment
	// Since this was explicitly set to install, if the user remove NetworkObservability, it will be installed again unless the value of InstallationPolicy is changed
	NetworkObservabilityInstallAndEnable NetworkObservabilityInstallationPolicy = "InstallAndEnable"

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed a bit the function to handle the case of an already installed NOO with a deleted CRD :

Updated behavior:

  • Always checks both OLMv1 and OLMv0
    • If CRD is missing:
      • OLMv1 present → Returns error (operator deployed via OLMv1 but CRD manually removed). This case if unlikely to happens because it would only happens if the reconciliation happens while OLMv1 is still recreating the CRD or if OLMv1 failed to recreate it.
      • OLMv0 present → Returns error (operator deployed via OLMv0 but CRD manually removed)
      • Neither present → Returns false (operator not installed)
    • If CRD exists:
      • OLMv1 present → Returns true
      • OLMv0 present → Returns true
      • Neither present → Returns error (CRD exists but no OLM installation found)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About documenting this cases, I created a PR on the API side :
openshift/api#2907

Comment on lines +153 to +157
// Mark as deployed to track deployment status
if err := r.markNetworkObservabilityDeployed(ctx); err != nil {
klog.Warningf("Failed to mark Network Observability as deployed: %v. Will retry in %v.", err, requeueAfterStandard)
return ctrl.Result{RequeueAfter: requeueAfterStandard}, nil
}

@arkadeepsen arkadeepsen Jun 9, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should mark the condition in both scenarios, not only in the case of successful install. This will be easier for users to track the staus of the operator deployment.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, when the deployment is failing, the condition is updated with the cause of the failure

@OlivierCazade

Copy link
Copy Markdown
Author

About the e2e tests, I did not find any e2e tests in this repository, how are you usually implementing them ?

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
pkg/controller/observability/observability_controller.go (1)

112-117: 💤 Low value

Consider logging if status condition update fails.

Discarding the error from setNetworkObservabilityCondition at lines 115, 125, 129, and 147 is acceptable since the primary error is already being handled with a requeue. However, adding a warning log when the condition update fails would improve observability without changing the control flow.

 		if err := r.installNetObservOperator(ctx); err != nil {
 			klog.Warningf("Failed to install Network Observability Operator: %v. Will retry in %v.", err, requeueAfterOLM)
 			// Mark deployment as failed
-			_ = r.setNetworkObservabilityCondition(ctx, operatorv1.ConditionFalse, "DeploymentFailed", fmt.Sprintf("Failed to install Network Observability Operator: %v", err))
+			if condErr := r.setNetworkObservabilityCondition(ctx, operatorv1.ConditionFalse, "DeploymentFailed", fmt.Sprintf("Failed to install Network Observability Operator: %v", err)); condErr != nil {
+				klog.V(4).Infof("Failed to update deployment status condition: %v", condErr)
+			}
 			return ctrl.Result{RequeueAfter: requeueAfterOLM}, nil
 		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controller/observability/observability_controller.go` around lines 112 -
117, The call to setNetworkObservabilityCondition in the error paths (used
alongside installNetObservOperator) currently discards its returned error;
update those sites to capture the error from setNetworkObservabilityCondition
and emit a klog.Warningf (or similar) noting that the condition update failed
and include the error, but do not change the existing control flow or requeue
behavior — just log the failure to improve observability.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@pkg/controller/observability/observability_controller.go`:
- Around line 112-117: The call to setNetworkObservabilityCondition in the error
paths (used alongside installNetObservOperator) currently discards its returned
error; update those sites to capture the error from
setNetworkObservabilityCondition and emit a klog.Warningf (or similar) noting
that the condition update failed and include the error, but do not change the
existing control flow or requeue behavior — just log the failure to improve
observability.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 17a029ad-6c04-4afc-8147-c652211b825f

📥 Commits

Reviewing files that changed from the base of the PR and between ba9e013 and c0f4f25.

⛔ Files ignored due to path filters (100)
  • go.sum is excluded by !**/*.sum
  • vendor/github.com/openshift/api/.golangci.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/types.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/types_apiserver.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/types_authentication.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/types_cluster_operator.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/types_dns.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/types_image.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/types_infrastructure.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/types_kmsencryption.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/types_network.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.deepcopy.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/config/v1/zz_generated.featuregated-crd-manifests.yaml is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/config/v1/zz_generated.swagger_doc_generated.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/config/v1alpha1/types_cluster_monitoring.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1alpha1/zz_generated.deepcopy.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/config/v1alpha1/zz_generated.swagger_doc_generated.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/console/v1/types_console_plugin.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/console/v1/zz_generated.featuregated-crd-manifests.yaml is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/envtest-releases.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/etcd/install.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/etcd/v1/Makefile is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/etcd/v1/doc.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/etcd/v1/register.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/etcd/v1/types_pacemakercluster.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/etcd/v1/zz_generated.deepcopy.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/etcd/v1/zz_generated.featuregated-crd-manifests.yaml is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/etcd/v1/zz_generated.swagger_doc_generated.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/etcd/v1alpha1/types_pacemakercluster.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/etcd/v1alpha1/zz_generated.swagger_doc_generated.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/features.md is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/features/features.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/features/legacyfeaturegates.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/install.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/machine/v1beta1/types_machineset.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/machine/v1beta1/zz_generated.swagger_doc_generated.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/machineconfiguration/v1/types.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/machineconfiguration/v1/types_machineconfignode.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/machineconfiguration/v1/zz_generated.featuregated-crd-manifests.yaml is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/machineconfiguration/v1/zz_generated.swagger_doc_generated.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/operator/v1/types_csi_cluster_driver.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/types_ingress.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers-Default.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers-DevPreviewNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers-OKD.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers-TechPreviewNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_csi-driver_01_clustercsidrivers-CustomNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_csi-driver_01_clustercsidrivers-Default.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_csi-driver_01_clustercsidrivers-DevPreviewNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_csi-driver_01_clustercsidrivers-OKD.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_csi-driver_01_clustercsidrivers-TechPreviewNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_ingress_00_ingresscontrollers-CustomNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_ingress_00_ingresscontrollers-Default.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_ingress_00_ingresscontrollers-DevPreviewNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_ingress_00_ingresscontrollers-OKD.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_ingress_00_ingresscontrollers-TechPreviewNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.featuregated-crd-manifests.yaml is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/operator/v1/zz_generated.swagger_doc_generated.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/operator/v1alpha1/types_clusterapi.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1alpha1/zz_generated.deepcopy.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/operator/v1alpha1/zz_generated.swagger_doc_generated.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/quota/v1/generated.proto is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/quota/v1/types.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/quota/v1/zz_generated.featuregated-crd-manifests.yaml is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/route/v1/generated.proto is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/route/v1/types.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/route/v1/zz_generated.featuregated-crd-manifests.yaml is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/security/v1/generated.proto is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/security/v1/types.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/security/v1/zz_generated.featuregated-crd-manifests.yaml is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/security/v1/zz_generated.swagger_doc_generated.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/modules.txt is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/.gitignore is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/.golangci.yml is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/.gomodcheck.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/CONTRIBUTING.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/FAQ.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/Makefile is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/OWNERS is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/OWNERS_ALIASES is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/README.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/RELEASE.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/SECURITY_CONTACTS is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/TMP-LOGGING.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/VERSIONING.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/alias.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/code-of-conduct.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/doc.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/builder/controller.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/builder/doc.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/builder/options.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/builder/webhook.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/client/config/config.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/client/config/doc.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/doc.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/signal.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/signal_posix.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/signal_windows.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/scheme/scheme.go is excluded by !**/vendor/**, !vendor/**
📒 Files selected for processing (7)
  • bindata/observability/07-observability-operator.yaml
  • bindata/observability/08-flowcollector.yaml
  • go.mod
  • pkg/controller/add_networkconfig.go
  • pkg/controller/observability/observability_controller.go
  • pkg/controller/observability/observability_controller_test.go
  • sample-config.yaml
🚧 Files skipped from review as they are similar to previous changes (4)
  • sample-config.yaml
  • bindata/observability/08-flowcollector.yaml
  • pkg/controller/add_networkconfig.go
  • go.mod

@arkadeepsen

Copy link
Copy Markdown
Member

About the e2e tests, I did not find any e2e tests in this repository, how are you usually implementing them ?

The tests are added to https://github.com/openshift/origin repo. Currently, this is a TP feature. You can refer to https://docs.google.com/document/d/1zOL38_KDKwAvsx-LMHfyDdX4-jq3HQU1YWBjfuHaM0Q/edit?usp=drivesdk for details regarding feature gates. I think it'll also have details about test coverage. It also has details about lifting the feature gates and making features available by default.

@OlivierCazade

Copy link
Copy Markdown
Author

@arkadeepsen, outside of the e2e, does this PR is fine for you now ?

@arkadeepsen

Copy link
Copy Markdown
Member

@arkadeepsen, outside of the e2e, does this PR is fine for you now ?

I think this comment is still not resolved: #2925 (comment)

The description of InstallAndEnable in the API states the following:

	// NetworkObservabilityInstallAndEnable means that network observability should be installed and enabled during cluster deployment
	// Since this was explicitly set to install, if the user remove NetworkObservability, it will be installed again unless the value of InstallationPolicy is changed
	NetworkObservabilityInstallAndEnable NetworkObservabilityInstallationPolicy = "InstallAndEnable"

I am not sure all the scenarios are handled for this mode. The above comment is related to how CNO will detect that the user has removed the resources managed by it.

@OlivierCazade

Copy link
Copy Markdown
Author

Sorry, answering in the comment thread.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/controller/observability/observability_controller_test.go`:
- Around line 426-447: The ReconcileObservability struct in these tests is
created with featureGate set to nil, causing Reconcile() to exit early at the
feature-gate check before reaching the behavior the tests are designed to
validate. To fix this, initialize the featureGate field in the
ReconcileObservability struct with a properly configured feature gate that
enables the observability feature (rather than leaving it nil). This pattern
should be applied to the ReconcileObservability initialization across all the
affected test functions: TestReconcile_SkipsWhenDisabled at lines 426-447, the
tests at lines 492-532, 970-1001, 1045-1084, 1124-1235, and 1556-1585 so that
Reconcile() progresses past the feature-gate guard and actually exercises the
behavior being asserted.

In `@pkg/controller/observability/observability_controller.go`:
- Around line 105-109: When isNetObservOperatorInstalled returns an error, the
controller currently only logs a warning and requeues without updating the
Network CR status. To surface this error to users, set the
NetworkObservabilityDeployed condition to False with the error message included
whenever isNetObservOperatorInstalled fails. This will make persistent
installation issues (like ClusterExtension failures or inability to identify the
installation source) visible in the Network CR status, allowing users to
understand why deployment is stalled rather than only seeing a requeue loop.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1e6771cd-b873-4c39-89bf-5237154cf733

📥 Commits

Reviewing files that changed from the base of the PR and between c0f4f25 and bc5e876.

⛔ Files ignored due to path filters (100)
  • go.sum is excluded by !**/*.sum
  • vendor/github.com/openshift/api/.golangci.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/types.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/types_apiserver.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/types_authentication.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/types_cluster_operator.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/types_dns.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/types_image.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/types_infrastructure.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/types_kmsencryption.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/types_network.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.deepcopy.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/config/v1/zz_generated.featuregated-crd-manifests.yaml is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/config/v1/zz_generated.swagger_doc_generated.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/config/v1alpha1/types_cluster_monitoring.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1alpha1/zz_generated.deepcopy.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/config/v1alpha1/zz_generated.swagger_doc_generated.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/console/v1/types_console_plugin.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/console/v1/zz_generated.featuregated-crd-manifests.yaml is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/envtest-releases.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/etcd/install.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/etcd/v1/Makefile is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/etcd/v1/doc.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/etcd/v1/register.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/etcd/v1/types_pacemakercluster.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/etcd/v1/zz_generated.deepcopy.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/etcd/v1/zz_generated.featuregated-crd-manifests.yaml is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/etcd/v1/zz_generated.swagger_doc_generated.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/etcd/v1alpha1/types_pacemakercluster.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/etcd/v1alpha1/zz_generated.swagger_doc_generated.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/features.md is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/features/features.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/features/legacyfeaturegates.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/install.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/machine/v1beta1/types_machineset.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/machine/v1beta1/zz_generated.swagger_doc_generated.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/machineconfiguration/v1/types.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/machineconfiguration/v1/types_machineconfignode.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/machineconfiguration/v1/zz_generated.featuregated-crd-manifests.yaml is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/machineconfiguration/v1/zz_generated.swagger_doc_generated.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/operator/v1/types_csi_cluster_driver.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/types_ingress.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers-Default.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers-DevPreviewNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers-OKD.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers-TechPreviewNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_csi-driver_01_clustercsidrivers-CustomNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_csi-driver_01_clustercsidrivers-Default.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_csi-driver_01_clustercsidrivers-DevPreviewNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_csi-driver_01_clustercsidrivers-OKD.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_csi-driver_01_clustercsidrivers-TechPreviewNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_ingress_00_ingresscontrollers-CustomNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_ingress_00_ingresscontrollers-Default.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_ingress_00_ingresscontrollers-DevPreviewNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_ingress_00_ingresscontrollers-OKD.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_ingress_00_ingresscontrollers-TechPreviewNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.featuregated-crd-manifests.yaml is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/operator/v1/zz_generated.swagger_doc_generated.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/operator/v1alpha1/types_clusterapi.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1alpha1/zz_generated.deepcopy.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/operator/v1alpha1/zz_generated.swagger_doc_generated.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/quota/v1/generated.proto is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/quota/v1/types.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/quota/v1/zz_generated.featuregated-crd-manifests.yaml is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/route/v1/generated.proto is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/route/v1/types.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/route/v1/zz_generated.featuregated-crd-manifests.yaml is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/security/v1/generated.proto is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/security/v1/types.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/security/v1/zz_generated.featuregated-crd-manifests.yaml is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/security/v1/zz_generated.swagger_doc_generated.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/modules.txt is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/.gitignore is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/.golangci.yml is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/.gomodcheck.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/CONTRIBUTING.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/FAQ.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/Makefile is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/OWNERS is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/OWNERS_ALIASES is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/README.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/RELEASE.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/SECURITY_CONTACTS is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/TMP-LOGGING.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/VERSIONING.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/alias.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/code-of-conduct.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/doc.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/builder/controller.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/builder/doc.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/builder/options.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/builder/webhook.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/client/config/config.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/client/config/doc.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/doc.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/signal.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/signal_posix.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/signal_windows.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/scheme/scheme.go is excluded by !**/vendor/**, !vendor/**
📒 Files selected for processing (7)
  • bindata/observability/07-observability-operator.yaml
  • bindata/observability/08-flowcollector.yaml
  • go.mod
  • pkg/controller/add_networkconfig.go
  • pkg/controller/observability/observability_controller.go
  • pkg/controller/observability/observability_controller_test.go
  • sample-config.yaml
✅ Files skipped from review due to trivial changes (1)
  • sample-config.yaml
🚧 Files skipped from review as they are similar to previous changes (3)
  • go.mod
  • pkg/controller/add_networkconfig.go
  • bindata/observability/08-flowcollector.yaml

Comment thread pkg/controller/observability/observability_controller_test.go
Comment thread pkg/controller/observability/observability_controller.go
@stleerh

stleerh commented Jun 23, 2026

Copy link
Copy Markdown

/lgtm

@arkadeepsen Are there any outstanding issues?

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 23, 2026
@openshift-ci

openshift-ci Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: OlivierCazade, stleerh
Once this PR has been reviewed and has the lgtm label, please assign kyrtapz for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 24, 2026
@openshift-ci

openshift-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

New changes are detected. LGTM label has been removed.

@OlivierCazade

Copy link
Copy Markdown
Author

Rebased the branch due to conflicts (no need to update openshift/api anymore it was updated on the main branch)

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
bindata/observability/07-observability-operator.yaml (1)

38-44: 🔒 Security & Privacy | 🔵 Trivial

Drop the webhook RBAC after install create/list/watch on validatingwebhookconfigurations is cluster-wide; if this SA is only needed for OLM v1 pre-authorization, remove or narrow the binding once installation finishes.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bindata/observability/07-observability-operator.yaml` around lines 38 - 44,
The RBAC on validatingwebhookconfigurations is broader than needed after
installation, so narrow or remove the create/list/watch permissions once the OLM
v1 pre-authorization flow is done. Update the relevant binding in
observability/07-observability-operator.yaml so the service account used for
install-time webhook setup no longer retains cluster-wide webhook management
access, while keeping only the specific permissions needed by the install path.

Sources: Path instructions, Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@bindata/observability/07-observability-operator.yaml`:
- Around line 1-4: The openshift-netobserv-operator Namespace is missing the
usual network isolation resource. Add a NetworkPolicy for this namespace,
following the pattern used by the other operator namespace manifests in this
repo, so the operator namespace is explicitly isolated for ingress and egress;
place it alongside the existing Namespace manifest and ensure it targets
openshift-netobserv-operator.

---

Nitpick comments:
In `@bindata/observability/07-observability-operator.yaml`:
- Around line 38-44: The RBAC on validatingwebhookconfigurations is broader than
needed after installation, so narrow or remove the create/list/watch permissions
once the OLM v1 pre-authorization flow is done. Update the relevant binding in
observability/07-observability-operator.yaml so the service account used for
install-time webhook setup no longer retains cluster-wide webhook management
access, while keeping only the specific permissions needed by the install path.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1e3de99a-274e-4c6c-8024-88ca299bd4a0

📥 Commits

Reviewing files that changed from the base of the PR and between bc5e876 and d9d4cad.

⛔ Files ignored due to path filters (28)
  • vendor/modules.txt is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/.gitignore is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/.golangci.yml is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/.gomodcheck.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/CONTRIBUTING.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/FAQ.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/Makefile is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/OWNERS is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/OWNERS_ALIASES is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/README.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/RELEASE.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/SECURITY_CONTACTS is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/TMP-LOGGING.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/VERSIONING.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/alias.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/code-of-conduct.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/doc.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/builder/controller.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/builder/doc.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/builder/options.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/builder/webhook.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/client/config/config.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/client/config/doc.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/doc.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/signal.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/signal_posix.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/signal_windows.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/scheme/scheme.go is excluded by !**/vendor/**, !vendor/**
📒 Files selected for processing (6)
  • bindata/observability/07-observability-operator.yaml
  • bindata/observability/08-flowcollector.yaml
  • pkg/controller/add_networkconfig.go
  • pkg/controller/observability/observability_controller.go
  • pkg/controller/observability/observability_controller_test.go
  • sample-config.yaml
✅ Files skipped from review due to trivial changes (2)
  • bindata/observability/08-flowcollector.yaml
  • pkg/controller/add_networkconfig.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • sample-config.yaml
  • pkg/controller/observability/observability_controller_test.go
  • pkg/controller/observability/observability_controller.go

Comment on lines +1 to +4
apiVersion: v1
kind: Namespace
metadata:
name: openshift-netobserv-operator

@coderabbitai coderabbitai Bot Jun 24, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify whether a NetworkPolicy exists for the new namespace.
# Expected: at least one NetworkPolicy targeting openshift-netobserv-operator, or an intentional documented exception.
rg -n -C3 'kind:\s*NetworkPolicy|namespace:\s*openshift-netobserv-operator' --glob '*.yaml' --glob '*.yml'

Repository: openshift/cluster-network-operator

Length of output: 172


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== files mentioning openshift-netobserv-operator =="
rg -n --hidden --glob '*.yaml' --glob '*.yml' --glob '!**/.git/**' 'openshift-netobserv-operator' .

echo
echo "== NetworkPolicy manifests =="
rg -n --hidden --glob '*.yaml' --glob '*.yml' --glob '!**/.git/**' 'kind:\s*NetworkPolicy' .

echo
echo "== target file excerpt =="
sed -n '1,80p' bindata/observability/07-observability-operator.yaml

Repository: openshift/cluster-network-operator

Length of output: 5247


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== files mentioning openshift-netobserv-operator =="
rg -n --hidden --glob '*.yaml' --glob '*.yml' --glob '!**/.git/**' 'openshift-netobserv-operator' .

echo
echo "== NetworkPolicy manifests =="
rg -n --hidden --glob '*.yaml' --glob '*.yml' --glob '!**/.git/**' 'kind:\s*NetworkPolicy' .

echo
echo "== target file excerpt =="
sed -n '1,80p' bindata/observability/07-observability-operator.yaml

Repository: openshift/cluster-network-operator

Length of output: 5247


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== manifests/0000_70_cluster-network-operator_00_namespace.yaml =="
sed -n '1,120p' manifests/0000_70_cluster-network-operator_00_namespace.yaml

echo
echo "== NetworkPolicy objects in bindata/observability/07-observability-operator.yaml =="
rg -n -C4 'kind:\s*NetworkPolicy|apiVersion:\s*networking.k8s.io/v1|metadata:|namespace:' bindata/observability/07-observability-operator.yaml

echo
echo "== Namespace-scoped policy refs across repo =="
rg -n --hidden --glob '*.yaml' --glob '*.yml' 'namespace:\s*openshift-netobserv-operator|name:\s*openshift-netobserv-operator' .

Repository: openshift/cluster-network-operator

Length of output: 3995


Define a NetworkPolicy for openshift-netobserv-operator. The namespace is created without any ingress/egress isolation, unlike other operator namespaces in this repo.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bindata/observability/07-observability-operator.yaml` around lines 1 - 4, The
openshift-netobserv-operator Namespace is missing the usual network isolation
resource. Add a NetworkPolicy for this namespace, following the pattern used by
the other operator namespace manifests in this repo, so the operator namespace
is explicitly isolated for ingress and egress; place it alongside the existing
Namespace manifest and ensure it targets openshift-netobserv-operator.

Source: Path instructions

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually correct suggestion and we need netpols added for the netobserv namespace.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arkadeepsen Agreed — this one still stands. Adding NetworkPolicy resources for openshift-netobserv-operator alongside bindata/observability/07-observability-operator.yaml would be the right fix to keep the new operator namespace explicitly isolated for ingress and egress.


✏️ Learnings added
Learnt from: arkadeepsen
Repo: openshift/cluster-network-operator PR: 2925
File: bindata/observability/07-observability-operator.yaml:1-4
Timestamp: 2026-06-26T14:57:46.466Z
Learning: In the OpenShift Cluster Network Operator repository, when a new operator namespace is introduced in Kubernetes manifests (such as `bindata/observability/07-observability-operator.yaml` for `openshift-netobserv-operator`), the namespace is expected to include NetworkPolicy resources to explicitly isolate ingress and egress traffic, following the existing operator namespace pattern in the repo.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

@OlivierCazade OlivierCazade Jun 26, 2026

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are working on adding the network policy directly inside the bundle. We do not provide for now a netpol for the operator.

The network policy for the Network Observability stack are deployed by the operator.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

@arkadeepsen arkadeepsen left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the status of the e2e tests? It'd be great to check the various scenarios of InstallAndEnable option. I don't think there are UTs for the different scenarios.

Comment thread sample-config.yaml Outdated
Comment on lines +13 to +19
networkObservability:
# installationPolicy controls Network Observability installation during cluster deployment (day-0).
# Valid values: "", "InstallAndEnable", "NoAction"
# Default (empty or omitted): enabled on multi-node clusters, disabled on SNO
# "InstallAndEnable": explicitly enable (even on SNO)
# "NoAction": explicitly disable
installationPolicy: InstallAndEnable

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section should be added in sample-cluster-network-config.yaml rather than in this file as the field was added to config.openshift.io/v1 rather than operator.openshift.io/v1.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Comment thread pkg/controller/observability/observability_controller.go Outdated
Comment on lines +330 to +332
err := r.client.Get(ctx, types.NamespacedName{
Name: "flowcollectors.flows.netobserv.io",
}, crd)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked with chai-bot regarding the various scenarios: https://redhat-internal.slack.com/archives/C3VS0LV41/p1782466535969549

Following is my understanding:

  • If operator is installed via OLMv0, then the resources installed during installation of the operator is not managed, i.e. the CRD if deleted will not be re-installed. Thus, for the case of OLMv0, the CRD may not exist, but the operator can still be running.
  • If operator is installed via OLMv1, then the resources are managed and if deleted will be re-installed. Thus, if the ClusterExtension object is installed, CRD will always be installed.
  • The user can delete the ClusterExtension object in the case of OLMv1 using --cascade=orphan which will not delete the CRD. For the case of OLMv0, when CSV and Subscription are deleted, the operator is removed, but the CRD is not removed. Currently if CRD exists and OLM resources do not exist, operator is not installed.

The current flow will try to install the operator if it detects that the CRD is not installed, but the operator may be installed using OLMv0 and CNO will install another version of NOO via OLMv1. A check for this scenario should be added.

I think we also need to document the pitfalls of the different install methods (OLMv0 and OLMv1) as this does deviate from what the API documents:

	// NetworkObservabilityInstallAndEnable means that network observability should be installed and enabled during cluster deployment
	// Since this was explicitly set to install, if the user remove NetworkObservability, it will be installed again unless the value of InstallationPolicy is changed
	NetworkObservabilityInstallAndEnable NetworkObservabilityInstallationPolicy = "InstallAndEnable"


installed, err := r.isNetObservOperatorInstalled(context.TODO())

// Should return no error but false (not installed yet)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Comment is incorrect.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks.

Comment on lines +1 to +4
apiVersion: v1
kind: Namespace
metadata:
name: openshift-netobserv-operator

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually correct suggestion and we need netpols added for the netobserv namespace.

// Use RawPatch with ApplyPatchType to avoid deprecated crclient.Apply
patch := crclient.RawPatch(types.ApplyPatchType, data)
if err := r.client.Patch(ctx, obj, patch, &crclient.PatchOptions{
Force: ptr.To(true),

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since Force is set to true, it'll overwrite any changes done by the user. Who owns the objects? If CNO does, then this is fine, however if the user owns the objects then this will not respect user changes.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The object is owned by the user but created by the CNO.

I removed the force field to prevent overriding user object.

@OlivierCazade OlivierCazade force-pushed the day0 branch 2 times, most recently from c793431 to f2e4ea2 Compare June 26, 2026 20:19
This commit introduces automated installation and management of the
Network Observability Operator during cluster deployment (day-0).

Key features:
- Automatic operator installation via OLM when NetworkObservabilityInstall
  feature gate is enabled
- Opt-out model: installs by default except on SNO clusters

Implementation details:
- New observability controller in pkg/controller/observability/
- Manifest-based operator installation
- Default FlowCollector configuration
- RBAC permissions for OLM resource management
- namespace creation for operator and observability components
@openshift-ci

openshift-ci Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

@OlivierCazade: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/4.22-upgrade-from-stable-4.21-e2e-aws-ovn-upgrade 9ac0fcc link false /test 4.22-upgrade-from-stable-4.21-e2e-aws-ovn-upgrade
ci/prow/e2e-aws-ovn-upgrade 5af2fe8 link true /test e2e-aws-ovn-upgrade
ci/prow/e2e-aws-ovn-fdp-qe 5af2fe8 link true /test e2e-aws-ovn-fdp-qe
ci/prow/e2e-ovn-ipsec-step-registry 5af2fe8 link true /test e2e-ovn-ipsec-step-registry
ci/prow/hypershift-e2e-aks 5af2fe8 link true /test hypershift-e2e-aks
ci/prow/e2e-aws-ovn-serial-1of2 5af2fe8 link true /test e2e-aws-ovn-serial-1of2
ci/prow/e2e-metal-ipi-ovn-ipv6 5af2fe8 link true /test e2e-metal-ipi-ovn-ipv6
ci/prow/e2e-aws-ovn-upgrade-ipsec 5af2fe8 link true /test e2e-aws-ovn-upgrade-ipsec
ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw 5af2fe8 link true /test e2e-metal-ipi-ovn-dualstack-bgp-local-gw
ci/prow/e2e-gcp-ovn 5af2fe8 link true /test e2e-gcp-ovn
ci/prow/e2e-metal-ipi-ovn-ipv6-ipsec 5af2fe8 link true /test e2e-metal-ipi-ovn-ipv6-ipsec
ci/prow/e2e-azure-ovn-upgrade 5af2fe8 link true /test e2e-azure-ovn-upgrade
ci/prow/5.0-upgrade-from-stable-4.22-e2e-aws-ovn-upgrade 5af2fe8 link false /test 5.0-upgrade-from-stable-4.22-e2e-aws-ovn-upgrade

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants