CORENET-6714: Enable Network Observability on Day 0#2925
Conversation
|
@OlivierCazade: This pull request references CORENET-6714 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughIntroduces a new ChangesNetwork Observability Controller
Sequence Diagram(s)sequenceDiagram
participant Manager
participant ReconcileObservability
participant configv1Network as configv1.Network (cluster)
participant operatorv1Network as operatorv1.Network
participant OLM as OLM ClusterExtension
participant FlowCollector
Manager->>ReconcileObservability: Reconcile(cluster)
ReconcileObservability->>ReconcileObservability: isFeatureGateEnabled(NetworkObservabilityInstall)
ReconcileObservability->>configv1Network: Get installationPolicy + spec
ReconcileObservability->>operatorv1Network: wasNetworkObservabilityDeployed()
ReconcileObservability->>ReconcileObservability: shouldInstallNetworkObservability() [policy + SNO]
alt should install
ReconcileObservability->>OLM: isNetObservOperatorInstalled() [CRD + ClusterExtension/CSV]
alt not installed
ReconcileObservability->>OLM: applyManifest(07-observability-operator.yaml)
ReconcileObservability->>OLM: waitForNetObservOperator() [poll Installed=True]
end
ReconcileObservability->>FlowCollector: isFlowCollectorExists()
alt absent
ReconcileObservability->>FlowCollector: createFlowCollector() [namespace + applyManifest]
end
ReconcileObservability->>operatorv1Network: setNetworkObservabilityCondition(True, DeploymentComplete)
else failure
ReconcileObservability->>operatorv1Network: setNetworkObservabilityCondition(False, DeploymentFailed)
end
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes 🚥 Pre-merge checks | ✅ 13 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (13 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@OlivierCazade: This pull request references CORENET-6714 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (4)
manifests/0000_70_cluster-network-operator_02_rbac_observability.yaml (1)
11-14: Consider addingwatchanddeleteverbs for OLM resources.The controller may need
watchto properly track subscription and CSV state changes via informers. Additionally,deletepermission on subscriptions might be needed if you ever want to support uninstallation or cleanup scenarios.🔧 Suggested addition
# Manage OLM resources for operator installation - apiGroups: ["operators.coreos.com"] resources: ["subscriptions", "clusterserviceversions", "operatorgroups"] - verbs: ["get", "list", "create", "update", "patch"] + verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@manifests/0000_70_cluster-network-operator_02_rbac_observability.yaml` around lines 11 - 14, The RBAC rule for apiGroups "operators.coreos.com" covering resources ["subscriptions", "clusterserviceversions", "operatorgroups"] is missing the "watch" and "delete" verbs; update the verbs array for that rule (for the resources "subscriptions", "clusterserviceversions", and "operatorgroups") to include "watch" (so informers can track state changes) and "delete" (to allow cleanup/uninstallation) in addition to the existing verbs.pkg/controller/observability/observability_controller_test.go (1)
1512-1518: Test creates files in working directory - may cause side effects.This test creates a
manifests/directory and writesFlowCollectorYAMLin the working directory. Whiledefer os.Remove(FlowCollectorYAML)removes the file, it doesn't remove themanifestsdirectory, which could persist across test runs.🔧 Suggested fix for proper cleanup
err := os.MkdirAll("manifests", 0755) g.Expect(err).NotTo(HaveOccurred()) + defer os.RemoveAll("manifests") // Clean up directory // Create the FlowCollector manifest at the expected path err = os.WriteFile(FlowCollectorYAML, []byte(flowCollectorManifest), 0644) g.Expect(err).NotTo(HaveOccurred()) - defer os.Remove(FlowCollectorYAML)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/observability/observability_controller_test.go` around lines 1512 - 1518, The test creates a "manifests" directory and a file at FlowCollectorYAML using os.MkdirAll and os.WriteFile but only defers removal of the file, leaving the manifests directory behind; update the test to always clean up the directory by deferring a call to remove the directory (e.g. defer os.RemoveAll("manifests")) immediately after os.MkdirAll succeeds (and keep or replace the existing defer os.Remove(FlowCollectorYAML)), so that both the file created from flowCollectorManifest and the manifests directory are removed after the test.pkg/controller/observability/observability_controller.go (2)
356-392: Potential race condition in markNetworkObservabilityDeployed.The function has a TOCTOU (time-of-check-time-of-use) pattern: it reads the latest Network CR, modifies conditions, then updates. If another controller or user modifies the Network CR between Get and Update, the update will fail with a conflict error (which would be retried by the caller).
This is acceptable since Kubernetes retries on conflicts, but consider using a retry loop here for better resilience.
🔧 Suggested: Add retry for conflict resilience
func (r *ReconcileObservability) markNetworkObservabilityDeployed(ctx context.Context, network *configv1.Network) error { + return retry.RetryOnConflict(retry.DefaultBackoff, func() error { + return r.doMarkNetworkObservabilityDeployed(ctx) + }) +} + +func (r *ReconcileObservability) doMarkNetworkObservabilityDeployed(ctx context.Context) error { // Check if condition already exists and is true - for _, condition := range network.Status.Conditions { + latest := &configv1.Network{} + if err := r.client.Get(ctx, types.NamespacedName{Name: "cluster"}, latest); err != nil { + return err + } + + for _, condition := range latest.Status.Conditions { if condition.Type == NetworkObservabilityDeployed && condition.Status == metav1.ConditionTrue { return nil // Already marked as deployed } } - - // Get the latest version of the Network CR to avoid conflicts - latest := &configv1.Network{} - if err := r.client.Get(ctx, types.NamespacedName{Name: "cluster"}, latest); err != nil { - return err - } // ... rest of the function🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/observability/observability_controller.go` around lines 356 - 392, markNetworkObservabilityDeployed currently does a Get -> modify -> Status().Update which can fail on a concurrent update; wrap the Get/modify/Status().Update sequence in a retry loop using k8s.io/apimachinery/pkg/util/wait.RetryOnConflict (or retry.RetryOnConflict) to retry on conflicts, re-fetching the latest Network (using r.client.Get) inside the loop before applying the condition and calling r.client.Status().Update so transient conflicts are retried safely; keep the same condition logic and return the final error from the retry.
288-319: Polling uses parent context which may already have timeout pressure.
waitForNetObservOperatoruses the passed context for polling, but also defines its owncheckTimeout(10 minutes). If the caller's context has a shorter deadline, the poll will exit early with the caller's context error rather thancontext.DeadlineExceededfromwait.PollUntilContextTimeout.The current implementation may work correctly in practice since the controller-runtime context typically doesn't have a deadline, but this could cause unexpected behavior in tests or if the calling context changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/observability/observability_controller.go` around lines 288 - 319, The polling uses the caller's ctx which can have a shorter deadline; instead create a new timeout-bound context for the poll so it always runs up to checkTimeout: inside waitForNetObservOperator, call context.WithTimeout(context.Background(), checkTimeout) (defer cancel) and pass that new ctx to wait.PollUntilContextTimeout (keep checkInterval, checkTimeout, condition as before); reference function waitForNetObservOperator and variables checkInterval/checkTimeout to locate where to replace the ctx usage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@go.mod`:
- Line 166: The go.mod contains a temporary replace directive pointing to a
personal fork for openshift/api#2752; do not merge until the upstream PR is
merged. Add a CI/merge-block gate that fails or flags the PR when the replace
directive "replace github.com/openshift/api" is present, add a clear TODO
comment adjacent to the replace line referencing openshift/api#2752 and stating
it must be removed when that PR merges, and add repository-level tracking (e.g.,
in the PR description or a maintainer checklist) to actively monitor
openshift/api#2752 and remove the replace directive immediately once the
upstream PR is approved and merged.
In `@pkg/controller/observability/observability_controller_test.go`:
- Around line 1250-1265: The test spawns goroutines that call r.Reconcile and
uses g.Expect inside those goroutines, which can panic because Gomega's fail
handler (t.FailNow) must run in the main test goroutine; fix by removing
g.Expect from the goroutines and collecting errors via a channel (e.g., make
errCh chan error), have each goroutine send its err to errCh after calling
r.Reconcile(req), then in the main goroutine close/iterate the channel and
assert with g.Expect that all received errors are nil; alternatively create a
per-goroutine Gomega tied to a testing.T (NewWithT) if you need per-goroutine
assertions—reference r.Reconcile, req, and the done/err channel approach to
locate where to change code.
In `@pkg/controller/observability/observability_controller.go`:
- Around line 169-173: The call to markNetworkObservabilityDeployed(err) is only
logged on failure which prevents the controller from retrying and can lead to
repeated FlowCollector creation; modify the reconciler to return the error
instead of just logging it so the reconciliation is requeued—locate the call to
r.markNetworkObservabilityDeployed(ctx, &network) in the reconciler function
(the surrounding code that currently does klog.Warningf on error) and replace
the log-without-return with returning a wrapped/annotated error (or simply
return err) so the controller will retry the reconcile loop; keep any logging
but ensure the function exits with an error when
markNetworkObservabilityDeployed fails.
- Around line 144-152: When waitForNetObservOperator(ctx) times out you
currently return ctrl.Result{RequeueAfter: 0}, nil which prevents any automatic
retry; update the timeout branch in the reconciler to either return a non-zero
RequeueAfter (e.g., RequeueAfter: time.Minute*5) so the controller will re-check
later, or return a wrapped error instead of nil to trigger error-based
requeueing; adjust the block around waitForNetObservOperator, the
r.status.SetDegraded call, and the return statement so the controller will retry
(refer to waitForNetObservOperator,
r.status.SetDegraded(statusmanager.ObservabilityConfig, "OperatorNotReady",
...), and the existing return ctrl.Result{RequeueAfter: 0}, nil).
---
Nitpick comments:
In `@manifests/0000_70_cluster-network-operator_02_rbac_observability.yaml`:
- Around line 11-14: The RBAC rule for apiGroups "operators.coreos.com" covering
resources ["subscriptions", "clusterserviceversions", "operatorgroups"] is
missing the "watch" and "delete" verbs; update the verbs array for that rule
(for the resources "subscriptions", "clusterserviceversions", and
"operatorgroups") to include "watch" (so informers can track state changes) and
"delete" (to allow cleanup/uninstallation) in addition to the existing verbs.
In `@pkg/controller/observability/observability_controller_test.go`:
- Around line 1512-1518: The test creates a "manifests" directory and a file at
FlowCollectorYAML using os.MkdirAll and os.WriteFile but only defers removal of
the file, leaving the manifests directory behind; update the test to always
clean up the directory by deferring a call to remove the directory (e.g. defer
os.RemoveAll("manifests")) immediately after os.MkdirAll succeeds (and keep or
replace the existing defer os.Remove(FlowCollectorYAML)), so that both the file
created from flowCollectorManifest and the manifests directory are removed after
the test.
In `@pkg/controller/observability/observability_controller.go`:
- Around line 356-392: markNetworkObservabilityDeployed currently does a Get ->
modify -> Status().Update which can fail on a concurrent update; wrap the
Get/modify/Status().Update sequence in a retry loop using
k8s.io/apimachinery/pkg/util/wait.RetryOnConflict (or retry.RetryOnConflict) to
retry on conflicts, re-fetching the latest Network (using r.client.Get) inside
the loop before applying the condition and calling r.client.Status().Update so
transient conflicts are retried safely; keep the same condition logic and return
the final error from the retry.
- Around line 288-319: The polling uses the caller's ctx which can have a
shorter deadline; instead create a new timeout-bound context for the poll so it
always runs up to checkTimeout: inside waitForNetObservOperator, call
context.WithTimeout(context.Background(), checkTimeout) (defer cancel) and pass
that new ctx to wait.PollUntilContextTimeout (keep checkInterval, checkTimeout,
condition as before); reference function waitForNetObservOperator and variables
checkInterval/checkTimeout to locate where to replace the ctx usage.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 826708e4-9aa4-4a6f-941f-be6e5a7c5d63
⛔ Files ignored due to path filters (36)
go.sumis excluded by!**/*.sumvendor/github.com/openshift/api/config/v1/types_network.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/config/v1/zz_generated.deepcopy.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/config/v1/zz_generated.swagger_doc_generated.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/modules.txtis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/.gitignoreis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/.golangci.ymlis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/.gomodcheck.yamlis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/CONTRIBUTING.mdis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/FAQ.mdis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/Makefileis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/OWNERSis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/OWNERS_ALIASESis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/README.mdis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/RELEASE.mdis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/SECURITY_CONTACTSis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/TMP-LOGGING.mdis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/VERSIONING.mdis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/alias.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/code-of-conduct.mdis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/doc.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/builder/controller.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/builder/doc.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/builder/options.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/builder/webhook.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/client/config/config.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/client/config/doc.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/conversion/conversion.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/doc.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/signal.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/signal_posix.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/signal_windows.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/scheme/scheme.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/webhook/conversion/conversion.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/webhook/conversion/decoder.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/webhook/conversion/metrics/metrics.gois excluded by!**/vendor/**,!vendor/**
📒 Files selected for processing (9)
go.modmanifests/0000_70_cluster-network-operator_02_rbac_observability.yamlmanifests/07-observability-operator.yamlmanifests/08-flowcollector.yamlpkg/controller/add_networkconfig.gopkg/controller/observability/observability_controller.gopkg/controller/observability/observability_controller_test.gopkg/controller/statusmanager/status_manager.gosample-config.yaml
There was a problem hiding this comment.
♻️ Duplicate comments (2)
pkg/controller/observability/observability_controller_test.go (1)
1268-1283:⚠️ Potential issue | 🟡 MinorRace condition risk:
g.Expectcalled inside goroutines.Gomega's fail handler calls
t.FailNow()which must only be called from the test goroutine. If an assertion fails inside a spawned goroutine, this can cause a test panic.Suggested fix using error channel
// Run 5 concurrent reconciliations done := make(chan bool, 5) + errChan := make(chan error, 5) for i := 0; i < 5; i++ { go func() { _, err := r.Reconcile(context.TODO(), req) - // All should complete without error (idempotent) - g.Expect(err).NotTo(HaveOccurred()) + errChan <- err done <- true }() } // Wait for all to complete for i := 0; i < 5; i++ { <-done + err := <-errChan + g.Expect(err).NotTo(HaveOccurred()) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/observability/observability_controller_test.go` around lines 1268 - 1283, The test currently calls g.Expect inside spawned goroutines (see r.Reconcile, req, and done), which risks calling Gomega's FailNow from non-test goroutines; instead capture errors in a channel from each goroutine and perform the assertion in the main test goroutine. Spawn the 5 goroutines to call r.Reconcile and send any returned error (or nil) into an errs channel, then after reading all results from errs in the main goroutine use g.Expect(err).NotTo(HaveOccurred()) for each entry, removing g.Expect from inside the goroutines.pkg/controller/observability/observability_controller.go (1)
144-152:⚠️ Potential issue | 🟠 MajorTimeout stops reconciliation permanently without requeue.
When
waitForNetObservOperatortimes out, returningRequeueAfter: 0with no error means the controller stops trying. If the operator eventually succeeds, this controller won't notice and won't create the FlowCollector.Consider returning a non-zero
RequeueAfterto periodically check if the operator eventually becomes ready:Suggested fix
if err := r.waitForNetObservOperator(ctx); err != nil { if err == context.DeadlineExceeded { klog.Errorf("Timed out waiting for Network Observability Operator to be ready after %v. Stopping reconciliation.", checkTimeout) r.status.SetDegraded(statusmanager.ObservabilityConfig, "OperatorNotReady", fmt.Sprintf("Timed out waiting for Network Observability Operator to be ready after %v", checkTimeout)) - return ctrl.Result{RequeueAfter: 0}, nil // Don't requeue + return ctrl.Result{RequeueAfter: 5 * time.Minute}, nil // Retry later }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/observability/observability_controller.go` around lines 144 - 152, When waitForNetObservOperator(ctx) returns context.DeadlineExceeded the current code returns ctrl.Result{RequeueAfter: 0} which stops reconciliation permanently; change that return to schedule a future requeue (e.g., ctrl.Result{RequeueAfter: someNonZeroDuration}) so the controller will periodically re-check for the operator and eventually create the FlowCollector. Update the branch handling DeadlineExceeded in the Reconcile logic (the block around waitForNetObservOperator and r.status.SetDegraded) to return a non-zero RequeueAfter (choose a sensible poll interval constant or reuse checkTimeout) instead of 0 and keep returning nil error.
🧹 Nitpick comments (1)
pkg/controller/observability/observability_controller_test.go (1)
1507-1572: Test creates manifest file in working directory.This test writes to
manifests/08-flowcollector.yamlin the working directory. While the cleanup withdefer os.Remove()handles the file, consider:
- If the test panics before
deferis registered (lines 1534-1536), the file persists- Could conflict with parallel tests using the same path
Using a temp directory and temporarily overriding the
FlowCollectorYAMLconstant (if possible) would be more robust, but this is acceptable for an integration-style test.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/observability/observability_controller_test.go` around lines 1507 - 1572, TestReconcile_FirstTimeDeploymentSetsCondition writes to a fixed manifests path (FlowCollectorYAML) which can leak or conflict; fix by creating a temporary directory with os.MkdirTemp at test start, write the FlowCollector manifest into that temp dir (use os.Create or os.WriteFile), set or inject the FlowCollectorYAML path used by the reconciler to point to the temp file (or refactor the code to accept a manifestPath parameter) before calling Reconcile, and register defer os.RemoveAll(tempDir) immediately after creating the temp directory to guarantee cleanup even on panic; update references to FlowCollectorYAML in the test to use the temp file path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@pkg/controller/observability/observability_controller_test.go`:
- Around line 1268-1283: The test currently calls g.Expect inside spawned
goroutines (see r.Reconcile, req, and done), which risks calling Gomega's
FailNow from non-test goroutines; instead capture errors in a channel from each
goroutine and perform the assertion in the main test goroutine. Spawn the 5
goroutines to call r.Reconcile and send any returned error (or nil) into an errs
channel, then after reading all results from errs in the main goroutine use
g.Expect(err).NotTo(HaveOccurred()) for each entry, removing g.Expect from
inside the goroutines.
In `@pkg/controller/observability/observability_controller.go`:
- Around line 144-152: When waitForNetObservOperator(ctx) returns
context.DeadlineExceeded the current code returns ctrl.Result{RequeueAfter: 0}
which stops reconciliation permanently; change that return to schedule a future
requeue (e.g., ctrl.Result{RequeueAfter: someNonZeroDuration}) so the controller
will periodically re-check for the operator and eventually create the
FlowCollector. Update the branch handling DeadlineExceeded in the Reconcile
logic (the block around waitForNetObservOperator and r.status.SetDegraded) to
return a non-zero RequeueAfter (choose a sensible poll interval constant or
reuse checkTimeout) instead of 0 and keep returning nil error.
---
Nitpick comments:
In `@pkg/controller/observability/observability_controller_test.go`:
- Around line 1507-1572: TestReconcile_FirstTimeDeploymentSetsCondition writes
to a fixed manifests path (FlowCollectorYAML) which can leak or conflict; fix by
creating a temporary directory with os.MkdirTemp at test start, write the
FlowCollector manifest into that temp dir (use os.Create or os.WriteFile), set
or inject the FlowCollectorYAML path used by the reconciler to point to the temp
file (or refactor the code to accept a manifestPath parameter) before calling
Reconcile, and register defer os.RemoveAll(tempDir) immediately after creating
the temp directory to guarantee cleanup even on panic; update references to
FlowCollectorYAML in the test to use the temp file path.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b46e56ed-85bd-48c6-8a11-d0576b268eb9
⛔ Files ignored due to path filters (8)
go.sumis excluded by!**/*.sumvendor/github.com/openshift/api/config/v1/types_network.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/config/v1/zz_generated.deepcopy.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/config/v1/zz_generated.featuregated-crd-manifests.yamlis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/config/v1/zz_generated.swagger_doc_generated.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/features.mdis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/features/features.gois excluded by!vendor/**,!**/vendor/**vendor/modules.txtis excluded by!vendor/**,!**/vendor/**
📒 Files selected for processing (4)
go.modpkg/controller/observability/observability_controller.gopkg/controller/observability/observability_controller_test.gosample-config.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
- go.mod
- sample-config.yaml
|
@OlivierCazade: This pull request references CORENET-6714 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
pkg/controller/observability/observability_controller.go (2)
157-162:⚠️ Potential issue | 🟠 MajorTimeout path disables automatic recovery
Line 161 returns
ctrl.Result{RequeueAfter: 0}, nilon timeout, so reconciliation stops and may never create the FlowCollector even if the operator becomes ready later.Suggested fix
if err := r.waitForNetObservOperator(ctx); err != nil { if err == context.DeadlineExceeded { klog.Errorf("Timed out waiting for Network Observability Operator to be ready after %v. Stopping reconciliation.", checkTimeout) r.status.SetDegraded(statusmanager.ObservabilityConfig, "OperatorNotReady", fmt.Sprintf("Timed out waiting for Network Observability Operator to be ready after %v", checkTimeout)) - return ctrl.Result{RequeueAfter: 0}, nil // Don't requeue + return ctrl.Result{RequeueAfter: 5 * time.Minute}, nil }
183-185:⚠️ Potential issue | 🟠 MajorDeployment-condition update errors are swallowed
At Line 183-Line 185,
markNetworkObservabilityDeployedfailure is only logged. That can cause repeated deploy-path execution on later reconciles instead of retrying the status write immediately.Suggested fix
// Mark as deployed in Network CR status if err := r.markNetworkObservabilityDeployed(ctx, &network); err != nil { - klog.Warningf("Failed to update Network Observability deployment status: %v", err) + r.status.SetDegraded(statusmanager.ObservabilityConfig, "MarkDeployedError", fmt.Sprintf("Failed to update Network Observability deployment status: %v", err)) + return ctrl.Result{}, err }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/observability/observability_controller.go` around lines 183 - 185, The call to markNetworkObservabilityDeployed is currently logging failures and swallowing the error, which prevents the reconcile loop from retrying status writes; change the code in observability_controller.go so that if r.markNetworkObservabilityDeployed(ctx, &network) returns an error you propagate that error (e.g., return fmt.Errorf("markNetworkObservabilityDeployed: %w", err) or wrap with kerrors) instead of only calling klog.Warningf so the reconciler will requeue and retry the status update; locate the markNetworkObservabilityDeployed call in the reconcile flow and replace the swallowed-log branch with an error return (or requeue result) so failures are retried immediately.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/controller/observability/observability_controller.go`:
- Around line 326-338: The readiness check in waitForNetObservOperator currently
returns after inspecting the first matching csv in csvs.Items which can be
non-deterministic; instead collect all CSVs whose name has the
"netobserv-operator" prefix, determine the intended CSV by selecting the one
with the highest semantic version parsed from the name (e.g., parse the version
suffix from csv.GetName() like "netobserv-operator.vX.Y.Z" and compare using
semver semantics), and then call unstructured.NestedString on that selected
CSV's Object to check status.phase; update the logic around csvs.Items,
csv.GetName(), and unstructured.NestedString to implement this deterministic
selection before returning phase == "Succeeded".
---
Duplicate comments:
In `@pkg/controller/observability/observability_controller.go`:
- Around line 183-185: The call to markNetworkObservabilityDeployed is currently
logging failures and swallowing the error, which prevents the reconcile loop
from retrying status writes; change the code in observability_controller.go so
that if r.markNetworkObservabilityDeployed(ctx, &network) returns an error you
propagate that error (e.g., return fmt.Errorf("markNetworkObservabilityDeployed:
%w", err) or wrap with kerrors) instead of only calling klog.Warningf so the
reconciler will requeue and retry the status update; locate the
markNetworkObservabilityDeployed call in the reconcile flow and replace the
swallowed-log branch with an error return (or requeue result) so failures are
retried immediately.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 63c37a51-4f01-487a-9005-0b35a6728d9f
📒 Files selected for processing (1)
pkg/controller/observability/observability_controller.go
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pkg/controller/observability/observability_controller.go (1)
133-146: Namespace creation handles not-found but not already-exists race.The Get-then-Create pattern at lines 136-146 could encounter an
AlreadyExistserror if another process creates the namespace between the Get and Create calls. This would fail the reconciliation, though it will succeed on retry.♻️ Optional: handle AlreadyExists gracefully
if errors.IsNotFound(err) { if err := r.client.Create(ctx, ns); err != nil { + if errors.IsAlreadyExists(err) { + klog.V(4).Infof("Namespace %s was created by another process", OperatorNamespace) + } else { r.status.SetDegraded(statusmanager.ObservabilityConfig, "CreateNamespaceError", fmt.Sprintf("Failed to create namespace %s: %v", OperatorNamespace, err)) return ctrl.Result{}, err + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/observability/observability_controller.go` around lines 133 - 146, The Get-then-Create namespace flow in Reconcile (observability_controller.go) can race: after r.client.Get succeeds with NotFound, r.client.Create may return AlreadyExists if another actor created the namespace; update the Create error handling in the block that constructs ns (using OperatorNamespace) to treat errors.IsAlreadyExists(err) as non-fatal (do not call r.status.SetDegraded or return error) and only set degraded/return for other errors; keep using r.status.SetDegraded(statusmanager.ObservabilityConfig, ...) for real create failures so reconciliation continues cleanly on races.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pkg/controller/observability/observability_controller.go`:
- Around line 133-146: The Get-then-Create namespace flow in Reconcile
(observability_controller.go) can race: after r.client.Get succeeds with
NotFound, r.client.Create may return AlreadyExists if another actor created the
namespace; update the Create error handling in the block that constructs ns
(using OperatorNamespace) to treat errors.IsAlreadyExists(err) as non-fatal (do
not call r.status.SetDegraded or return error) and only set degraded/return for
other errors; keep using r.status.SetDegraded(statusmanager.ObservabilityConfig,
...) for real create failures so reconciliation continues cleanly on races.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c15118e0-7f89-43e5-a75e-3ba878565507
📒 Files selected for processing (1)
pkg/controller/observability/observability_controller.go
There was a problem hiding this comment.
♻️ Duplicate comments (1)
pkg/controller/observability/observability_controller_test.go (1)
1268-1283:⚠️ Potential issue | 🟡 MinorRace condition in concurrent test remains unaddressed.
Calling
g.Expect()inside goroutines can cause test panics because Gomega's fail handler invokest.FailNow(), which must only be called from the test's main goroutine.🔧 Suggested fix using error channel
// Run 5 concurrent reconciliations done := make(chan bool, 5) + errChan := make(chan error, 5) for i := 0; i < 5; i++ { go func() { _, err := r.Reconcile(context.TODO(), req) - // All should complete without error (idempotent) - g.Expect(err).NotTo(HaveOccurred()) + errChan <- err done <- true }() } // Wait for all to complete for i := 0; i < 5; i++ { <-done + err := <-errChan + g.Expect(err).NotTo(HaveOccurred()) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/observability/observability_controller_test.go` around lines 1268 - 1283, The test invokes g.Expect inside goroutines which can call t.FailNow from non-test goroutines; instead collect errors from r.Reconcile in a buffered channel and perform Gomega assertions in the main test goroutine. Replace the done channel with an errChan (buffered to 5), have each goroutine call r.Reconcile(context.TODO(), req) and send the returned error into errChan, then loop 5 times in the main goroutine to receive err := <-errChan and call g.Expect(err).NotTo(HaveOccurred()). This keeps the Reconcile calls concurrent but ensures assertions (using g.Expect) run only on the main test goroutine.
🧹 Nitpick comments (2)
pkg/controller/observability/observability_controller_test.go (2)
936-938: Misleading test name and comment.The test is named
TestReconcile_FlowCollectorDeletedwith comment "tests that reconciliation recreates FlowCollector if it gets deleted," but the actual assertion verifies that reconciliation skips reinstallation when the deployed condition is set. The test logic is correct per the PR's design (avoid reinstallation once deployed), but the name/comment should reflect this behavior.♻️ Suggested rename for clarity
-// TestReconcile_FlowCollectorDeleted tests that reconciliation recreates -// FlowCollector if it gets deleted -func TestReconcile_FlowCollectorDeleted(t *testing.T) { +// TestReconcile_SkipsReinstallWhenFlowCollectorDeleted tests that reconciliation +// does NOT recreate FlowCollector after deletion if the deployed condition is set +func TestReconcile_SkipsReinstallWhenFlowCollectorDeleted(t *testing.T) {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/observability/observability_controller_test.go` around lines 936 - 938, Rename the test function TestReconcile_FlowCollectorDeleted and update its comment to reflect the actual behavior being asserted: that reconciliation skips reinstallation when the FlowCollector has the "deployed" condition set. Specifically, change the function name (e.g., to TestReconcile_SkipsReinstallWhenFlowCollectorDeployed) and update the preceding comment to state "tests that reconciliation skips reinstallation of FlowCollector when the deployed condition is present"; also update any references to the old test name in the file (including test registration or helper calls) so they remain consistent.
1530-1536: Test creates files in working directory without full cleanup.The test creates
manifests/directory and writes a file but only removes the file on cleanup, not the directory. Whilemanifests/likely already exists in the repository, consider usingt.TempDir()for full isolation, especially if tests run in parallel.♻️ Suggested fix using temp directory
- err := os.MkdirAll("manifests", 0755) - g.Expect(err).NotTo(HaveOccurred()) - - // Create the FlowCollector manifest at the expected path - err = os.WriteFile(FlowCollectorYAML, []byte(flowCollectorManifest), 0644) - g.Expect(err).NotTo(HaveOccurred()) - defer os.Remove(FlowCollectorYAML) + // Use temp directory and override the manifest path for this test + tmpDir := t.TempDir() + manifestPath := filepath.Join(tmpDir, "flowcollector.yaml") + err := os.WriteFile(manifestPath, []byte(flowCollectorManifest), 0644) + g.Expect(err).NotTo(HaveOccurred()) + + // Note: This requires the controller to accept a configurable manifest path + // or use a test-specific approach to override FlowCollectorYAMLAlternatively, if modifying the controller isn't feasible, ensure the
manifestsdirectory cleanup:+ defer os.RemoveAll("manifests")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/observability/observability_controller_test.go` around lines 1530 - 1536, The test creates a real manifests/ directory and only removes the file, leaving the directory behind; update the test to use an isolated temp directory so artifacts are fully cleaned up: create a temp dir via t.TempDir() (or os.MkdirTemp if t is not available), write the FlowCollector manifest into filepath.Join(tempDir, filepath.Base(FlowCollectorYAML)) using the existing flowCollectorManifest, and set the test to use that path (or defer os.RemoveAll(tempDir)) instead of writing to the repository-level "manifests" directory; alternatively ensure the test defers os.RemoveAll("manifests") after writing—reference FlowCollectorYAML and flowCollectorManifest to locate the write site in observability_controller_test.go.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@pkg/controller/observability/observability_controller_test.go`:
- Around line 1268-1283: The test invokes g.Expect inside goroutines which can
call t.FailNow from non-test goroutines; instead collect errors from r.Reconcile
in a buffered channel and perform Gomega assertions in the main test goroutine.
Replace the done channel with an errChan (buffered to 5), have each goroutine
call r.Reconcile(context.TODO(), req) and send the returned error into errChan,
then loop 5 times in the main goroutine to receive err := <-errChan and call
g.Expect(err).NotTo(HaveOccurred()). This keeps the Reconcile calls concurrent
but ensures assertions (using g.Expect) run only on the main test goroutine.
---
Nitpick comments:
In `@pkg/controller/observability/observability_controller_test.go`:
- Around line 936-938: Rename the test function
TestReconcile_FlowCollectorDeleted and update its comment to reflect the actual
behavior being asserted: that reconciliation skips reinstallation when the
FlowCollector has the "deployed" condition set. Specifically, change the
function name (e.g., to TestReconcile_SkipsReinstallWhenFlowCollectorDeployed)
and update the preceding comment to state "tests that reconciliation skips
reinstallation of FlowCollector when the deployed condition is present"; also
update any references to the old test name in the file (including test
registration or helper calls) so they remain consistent.
- Around line 1530-1536: The test creates a real manifests/ directory and only
removes the file, leaving the directory behind; update the test to use an
isolated temp directory so artifacts are fully cleaned up: create a temp dir via
t.TempDir() (or os.MkdirTemp if t is not available), write the FlowCollector
manifest into filepath.Join(tempDir, filepath.Base(FlowCollectorYAML)) using the
existing flowCollectorManifest, and set the test to use that path (or defer
os.RemoveAll(tempDir)) instead of writing to the repository-level "manifests"
directory; alternatively ensure the test defers os.RemoveAll("manifests") after
writing—reference FlowCollectorYAML and flowCollectorManifest to locate the
write site in observability_controller_test.go.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 394e355a-3990-4ada-aaac-e4a5ee489446
📒 Files selected for processing (1)
pkg/controller/observability/observability_controller_test.go
| // Check if Network Observability was previously deployed | ||
| // If so, we're done - no need to check or manage anything | ||
| if r.wasNetworkObservabilityDeployed(&network) { | ||
| klog.Info("Network Observability was previously deployed, skipping reconciliation.") |
There was a problem hiding this comment.
Is there any way to clear this flag so it will install Network Observability again? If not, we just need to make it clear in our documentation.
| err := r.client.Get(ctx, types.NamespacedName{ | ||
| Name: "netobserv-operator", | ||
| Namespace: OperatorNamespace, | ||
| }, subscription) |
There was a problem hiding this comment.
The PoC determined that Network Observability was installed if there is a Deployment with the "app=netobserv-operator" label. With the Service deployment model added, it uses a DaemonSet.
It seems checking for a Deployment or DaemonSet is more reliable than a Subscription, which is OLM-specific, because unofficially, you could install it with Helm. Should we not worry about that? Thoughts?
| Group: "operators.coreos.com", | ||
| Version: "v1alpha1", | ||
| Kind: "ClusterServiceVersion", | ||
| }) |
There was a problem hiding this comment.
Same comment in that CSV is specific to OLM. The alternative is to check for the Deployment or DaemonSet.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
go.mod (1)
165-165:⚠️ Potential issue | 🟠 MajorKeep the temporary
openshift/apifork out of the merge path.This still makes release builds resolve API types from a personal fork. Please keep a hard merge blocker on this line and remove it before merging.
Verify that the replace is still present and that the upstream API PR is not merged yet:
#!/bin/bash set -euo pipefail echo "replace directive in go.mod:" rg -n '^replace github\.com/openshift/api ' go.mod echo echo "openshift/api#2752 state:" curl -fsSL https://api.github.com/repos/openshift/api/pulls/2752 | jq '{state, merged_at, title, html_url}'Expected result: if the replace line is present and
merged_atisnull, this remains a pre-merge blocker.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@go.mod` at line 165, The go.mod contains a temporary replace directive referencing a personal fork ("replace github.com/openshift/api v0.0.0-20260302174620-dcac36b908db => github.com/OlivierCazade/api v0.0.0-20260324161534-5d972e0f328a") which must not make it into mainline; remove that replace before merging OR add a strict CI merge-blocker that fails the build if that exact replace line exists (e.g., a simple grep check equivalent to the reviewer’s snippet), and ensure the replace is only present in a local/temporary branch and not in the merged branch.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/controller/observability/observability_controller.go`:
- Around line 144-148: When r.waitForNetObservOperator(ctx) returns
context.DeadlineExceeded, in addition to requeuing keep the current behavior but
call r.status.SetDegraded with the ObservabilityConfig key (use the existing
"OperatorNotReady" reason) and a clear message including checkTimeout so the CNO
status reflects the timeout; ensure you still return ctrl.Result{RequeueAfter:
requeueAfter}, nil. Reference the r.waitForNetObservOperator function,
r.status.SetDegraded method, the ObservabilityConfig constant/key, the
"OperatorNotReady" reason, and the checkTimeout/requeueAfter variables when
adding the status update.
- Around line 129-140: The current logic short-circuits once any Subscription
exists (isNetObservOperatorInstalled) which stops reapplying the desired OLM
manifest and can leave a drifted/partial install stuck; change reconcile to
always call installNetObservOperator until the operator is fully ready (e.g.,
check CSV/ClusterServiceVersion or subscription status for installed/ready
state) instead of relying on Subscription existence: remove or stop using
isNetObservOperatorInstalled, have installNetObservOperator be idempotent and
reapply the operator manifest each reconcile when readiness checks
(CSV/Subscription conditions) are not satisfied, and keep setting degraded via
r.status.SetDegraded(...) on failures so errors still surface.
---
Duplicate comments:
In `@go.mod`:
- Line 165: The go.mod contains a temporary replace directive referencing a
personal fork ("replace github.com/openshift/api
v0.0.0-20260302174620-dcac36b908db => github.com/OlivierCazade/api
v0.0.0-20260324161534-5d972e0f328a") which must not make it into mainline;
remove that replace before merging OR add a strict CI merge-blocker that fails
the build if that exact replace line exists (e.g., a simple grep check
equivalent to the reviewer’s snippet), and ensure the replace is only present in
a local/temporary branch and not in the merged branch.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d4dbd2bf-3cd4-41dd-a545-948c1fba572d
⛔ Files ignored due to path filters (7)
go.sumis excluded by!**/*.sumvendor/github.com/openshift/api/config/v1/types_network.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/config/v1/zz_generated.deepcopy.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/config/v1/zz_generated.featuregated-crd-manifests.yamlis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/config/v1/zz_generated.swagger_doc_generated.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/features.mdis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/features/features.gois excluded by!vendor/**,!**/vendor/**
📒 Files selected for processing (9)
go.modmanifests/0000_70_cluster-network-operator_02_rbac_observability.yamlmanifests/07-observability-operator.yamlmanifests/08-flowcollector.yamlpkg/controller/add_networkconfig.gopkg/controller/observability/observability_controller.gopkg/controller/observability/observability_controller_test.gopkg/controller/statusmanager/status_manager.gosample-config.yaml
✅ Files skipped from review due to trivial changes (4)
- pkg/controller/add_networkconfig.go
- manifests/07-observability-operator.yaml
- manifests/08-flowcollector.yaml
- manifests/0000_70_cluster-network-operator_02_rbac_observability.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
- sample-config.yaml
- pkg/controller/observability/observability_controller_test.go
| installed, err := r.isNetObservOperatorInstalled(ctx) | ||
| if err != nil { | ||
| r.status.SetDegraded(statusmanager.ObservabilityConfig, "CheckOperatorError", fmt.Sprintf("Failed to check if Network Observability Operator is installed: %v", err)) | ||
| return ctrl.Result{}, err | ||
| } | ||
| if !installed { | ||
| // Install Network Observability Operator | ||
| if err := r.installNetObservOperator(ctx); err != nil { | ||
| r.status.SetDegraded(statusmanager.ObservabilityConfig, "InstallOperatorError", fmt.Sprintf("Failed to install Network Observability Operator: %v", err)) | ||
| return ctrl.Result{}, err | ||
| } | ||
| } |
There was a problem hiding this comment.
Reapply the operator manifest until readiness instead of keying on Subscription existence.
As soon as any Subscription exists, this branch stops asserting the desired OLM spec. A drifted or half-created install can then sit in the wait/requeue path forever because installNetObservOperator is never called again.
Suggested fix
- // First time installation - proceed with operator installation
- installed, err := r.isNetObservOperatorInstalled(ctx)
- if err != nil {
- r.status.SetDegraded(statusmanager.ObservabilityConfig, "CheckOperatorError", fmt.Sprintf("Failed to check if Network Observability Operator is installed: %v", err))
- return ctrl.Result{}, err
- }
- if !installed {
- // Install Network Observability Operator
- if err := r.installNetObservOperator(ctx); err != nil {
- r.status.SetDegraded(statusmanager.ObservabilityConfig, "InstallOperatorError", fmt.Sprintf("Failed to install Network Observability Operator: %v", err))
- return ctrl.Result{}, err
- }
- }
+ // Re-apply the desired OLM resources until the operator becomes ready.
+ if err := r.installNetObservOperator(ctx); err != nil {
+ r.status.SetDegraded(statusmanager.ObservabilityConfig, "InstallOperatorError", fmt.Sprintf("Failed to apply Network Observability Operator manifest: %v", err))
+ return ctrl.Result{}, err
+ }That also makes isNetObservOperatorInstalled removable afterward.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| installed, err := r.isNetObservOperatorInstalled(ctx) | |
| if err != nil { | |
| r.status.SetDegraded(statusmanager.ObservabilityConfig, "CheckOperatorError", fmt.Sprintf("Failed to check if Network Observability Operator is installed: %v", err)) | |
| return ctrl.Result{}, err | |
| } | |
| if !installed { | |
| // Install Network Observability Operator | |
| if err := r.installNetObservOperator(ctx); err != nil { | |
| r.status.SetDegraded(statusmanager.ObservabilityConfig, "InstallOperatorError", fmt.Sprintf("Failed to install Network Observability Operator: %v", err)) | |
| return ctrl.Result{}, err | |
| } | |
| } | |
| // Re-apply the desired OLM resources until the operator becomes ready. | |
| if err := r.installNetObservOperator(ctx); err != nil { | |
| r.status.SetDegraded(statusmanager.ObservabilityConfig, "InstallOperatorError", fmt.Sprintf("Failed to apply Network Observability Operator manifest: %v", err)) | |
| return ctrl.Result{}, err | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/controller/observability/observability_controller.go` around lines 129 -
140, The current logic short-circuits once any Subscription exists
(isNetObservOperatorInstalled) which stops reapplying the desired OLM manifest
and can leave a drifted/partial install stuck; change reconcile to always call
installNetObservOperator until the operator is fully ready (e.g., check
CSV/ClusterServiceVersion or subscription status for installed/ready state)
instead of relying on Subscription existence: remove or stop using
isNetObservOperatorInstalled, have installNetObservOperator be idempotent and
reapply the operator manifest each reconcile when readiness checks
(CSV/Subscription conditions) are not satisfied, and keep setting degraded via
r.status.SetDegraded(...) on failures so errors still surface.
|
@OlivierCazade PLMK when the PR is ready to be reviewed. |
67701bb to
66bf264
Compare
4b236f1 to
477a479
Compare
Sure. I'll post my comments in one or two days. |
|
/retest-required |
arkadeepsen
left a comment
There was a problem hiding this comment.
Where are the e2e tests for this feature going to be added?
| # Valid values: "", "InstallAndEnable", "DoNotInstall" | ||
| # Default (empty or omitted): enabled on multi-node clusters, disabled on SNO | ||
| # "InstallAndEnable": explicitly enable (even on SNO) | ||
| # "DoNotInstall": explicitly disable |
There was a problem hiding this comment.
The text should be updated here: DoNotInstall -> NoAction.
| @@ -0,0 +1,42 @@ | |||
| apiVersion: rbac.authorization.k8s.io/v1 | |||
There was a problem hiding this comment.
The existing RBAC for CNO provide cluster-admin permissions to it. I don't think this file is needed as CNO already has the required permissions.
| // StatusReporter is an interface for reporting status | ||
| type StatusReporter interface { | ||
| SetDegraded(level statusmanager.StatusLevel, reason, message string) | ||
| SetNotDegraded(level statusmanager.StatusLevel) | ||
| } |
There was a problem hiding this comment.
This is not used to update status. Let's remove it.
|
|
||
| type ReconcileObservability struct { | ||
| client crclient.Client | ||
| status StatusReporter |
There was a problem hiding this comment.
Let's remove this one as well.
| err := r.client.Get(ctx, types.NamespacedName{ | ||
| Name: "flowcollectors.flows.netobserv.io", | ||
| }, crd) |
There was a problem hiding this comment.
Shouldn't we also check if there's a CR available, along with checking whether the CRD exists, to be more sure that the operator is installed or not?
There was a problem hiding this comment.
This is what the function is doing, it does a GET query with the flowcollector name on the CRD.
There was a problem hiding this comment.
I meant flowcollector object (custom resource), not the CRD. Having the CRD installed will not ensure that the operator is actually installed. Ideally, the check should be NOO is installed and the flowcollector object is created.
There was a problem hiding this comment.
There is currently multiple way for NOO to be installed (OLM v0 or v1, and helm chart for upstream), the common point between all of them is the CRD. Checking more precisely would introduce a lot of complexity because we would have to check for the different cases and IMO this is a bit unnecessary because unless the user has been hacking a lot, both always come together.
The suggestion actually come from the OLM channel, we asked for a way to check for both OLMv0 and OLMv1 installation and they suggested using the CRD.
There was a problem hiding this comment.
I am not sure if helm chart is officially supported in downstream.
For the others cases, I think we should do the following:
- Check for OLMv0 resources. If they exist then don't do anything.
- If OLMv0 resources don't exist, then check for OLMv1 resources. If they exist then don't do anything. If they don't exist then install everything.
Now, the user can also delete the flowcollector object created by CNO. So, in the second scenario, the flowcollector resource existence should also be checked and if it doesn't exist, then it should also be created again.
Existence of CRD only doesn't guarantee that all the required resources for NOO are installed for it to function correctly.
There was a problem hiding this comment.
@arkadeepsen are you fine with the last changes regarding this thread ?
There was a problem hiding this comment.
I was on PTO till yesterday. I'll take a look at the latest changes and let you know.
There was a problem hiding this comment.
I checked with chai-bot regarding the various scenarios: https://redhat-internal.slack.com/archives/C3VS0LV41/p1782466535969549
Following is my understanding:
- If operator is installed via OLMv0, then the resources installed during installation of the operator is not managed, i.e. the CRD if deleted will not be re-installed. Thus, for the case of OLMv0, the CRD may not exist, but the operator can still be running.
- If operator is installed via OLMv1, then the resources are managed and if deleted will be re-installed. Thus, if the ClusterExtension object is installed, CRD will always be installed.
- The user can delete the ClusterExtension object in the case of OLMv1 using
--cascade=orphanwhich will not delete the CRD. For the case of OLMv0, when CSV and Subscription are deleted, the operator is removed, but the CRD is not removed. Currently if CRD exists and OLM resources do not exist, operator is not installed.
The current flow will try to install the operator if it detects that the CRD is not installed, but the operator may be installed using OLMv0 and CNO will install another version of NOO via OLMv1. A check for this scenario should be added.
I think we also need to document the pitfalls of the different install methods (OLMv0 and OLMv1) as this does deviate from what the API documents:
// NetworkObservabilityInstallAndEnable means that network observability should be installed and enabled during cluster deployment
// Since this was explicitly set to install, if the user remove NetworkObservability, it will be installed again unless the value of InstallationPolicy is changed
NetworkObservabilityInstallAndEnable NetworkObservabilityInstallationPolicy = "InstallAndEnable"There was a problem hiding this comment.
I changed a bit the function to handle the case of an already installed NOO with a deleted CRD :
Updated behavior:
- Always checks both OLMv1 and OLMv0
- If CRD is missing:
- OLMv1 present → Returns error (operator deployed via OLMv1 but CRD manually removed). This case if unlikely to happens because it would only happens if the reconciliation happens while OLMv1 is still recreating the CRD or if OLMv1 failed to recreate it.
- OLMv0 present → Returns error (operator deployed via OLMv0 but CRD manually removed)
- Neither present → Returns false (operator not installed)
- If CRD exists:
- OLMv1 present → Returns true
- OLMv0 present → Returns true
- Neither present → Returns error (CRD exists but no OLM installation found)
- If CRD is missing:
There was a problem hiding this comment.
About documenting this cases, I created a PR on the API side :
openshift/api#2907
| // Mark as deployed to track deployment status | ||
| if err := r.markNetworkObservabilityDeployed(ctx); err != nil { | ||
| klog.Warningf("Failed to mark Network Observability as deployed: %v. Will retry in %v.", err, requeueAfterStandard) | ||
| return ctrl.Result{RequeueAfter: requeueAfterStandard}, nil | ||
| } |
There was a problem hiding this comment.
We should mark the condition in both scenarios, not only in the case of successful install. This will be easier for users to track the staus of the operator deployment.
There was a problem hiding this comment.
Done, when the deployment is failing, the condition is updated with the cause of the failure
|
About the e2e tests, I did not find any e2e tests in this repository, how are you usually implementing them ? |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pkg/controller/observability/observability_controller.go (1)
112-117: 💤 Low valueConsider logging if status condition update fails.
Discarding the error from
setNetworkObservabilityConditionat lines 115, 125, 129, and 147 is acceptable since the primary error is already being handled with a requeue. However, adding a warning log when the condition update fails would improve observability without changing the control flow.if err := r.installNetObservOperator(ctx); err != nil { klog.Warningf("Failed to install Network Observability Operator: %v. Will retry in %v.", err, requeueAfterOLM) // Mark deployment as failed - _ = r.setNetworkObservabilityCondition(ctx, operatorv1.ConditionFalse, "DeploymentFailed", fmt.Sprintf("Failed to install Network Observability Operator: %v", err)) + if condErr := r.setNetworkObservabilityCondition(ctx, operatorv1.ConditionFalse, "DeploymentFailed", fmt.Sprintf("Failed to install Network Observability Operator: %v", err)); condErr != nil { + klog.V(4).Infof("Failed to update deployment status condition: %v", condErr) + } return ctrl.Result{RequeueAfter: requeueAfterOLM}, nil }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controller/observability/observability_controller.go` around lines 112 - 117, The call to setNetworkObservabilityCondition in the error paths (used alongside installNetObservOperator) currently discards its returned error; update those sites to capture the error from setNetworkObservabilityCondition and emit a klog.Warningf (or similar) noting that the condition update failed and include the error, but do not change the existing control flow or requeue behavior — just log the failure to improve observability.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@pkg/controller/observability/observability_controller.go`:
- Around line 112-117: The call to setNetworkObservabilityCondition in the error
paths (used alongside installNetObservOperator) currently discards its returned
error; update those sites to capture the error from
setNetworkObservabilityCondition and emit a klog.Warningf (or similar) noting
that the condition update failed and include the error, but do not change the
existing control flow or requeue behavior — just log the failure to improve
observability.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 17a029ad-6c04-4afc-8147-c652211b825f
⛔ Files ignored due to path filters (100)
go.sumis excluded by!**/*.sumvendor/github.com/openshift/api/.golangci.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/config/v1/types.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/config/v1/types_apiserver.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/config/v1/types_authentication.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/config/v1/types_cluster_operator.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/config/v1/types_dns.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/config/v1/types_image.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/config/v1/types_infrastructure.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/config/v1/types_kmsencryption.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/config/v1/types_network.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/config/v1/zz_generated.deepcopy.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/config/v1/zz_generated.featuregated-crd-manifests.yamlis excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/config/v1/zz_generated.swagger_doc_generated.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/config/v1alpha1/types_cluster_monitoring.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/config/v1alpha1/zz_generated.deepcopy.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/config/v1alpha1/zz_generated.swagger_doc_generated.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/console/v1/types_console_plugin.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/console/v1/zz_generated.featuregated-crd-manifests.yamlis excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/envtest-releases.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/etcd/install.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/etcd/v1/Makefileis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/etcd/v1/doc.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/etcd/v1/register.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/etcd/v1/types_pacemakercluster.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/etcd/v1/zz_generated.deepcopy.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/etcd/v1/zz_generated.featuregated-crd-manifests.yamlis excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/etcd/v1/zz_generated.swagger_doc_generated.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/etcd/v1alpha1/types_pacemakercluster.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/etcd/v1alpha1/zz_generated.swagger_doc_generated.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/features.mdis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/features/features.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/features/legacyfeaturegates.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/install.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/machine/v1beta1/types_machineset.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/machine/v1beta1/zz_generated.swagger_doc_generated.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/machineconfiguration/v1/types.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/machineconfiguration/v1/types_machineconfignode.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/machineconfiguration/v1/zz_generated.featuregated-crd-manifests.yamlis excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/machineconfiguration/v1/zz_generated.swagger_doc_generated.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/operator/v1/types_csi_cluster_driver.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/types_ingress.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers-Default.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers-DevPreviewNoUpgrade.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers-OKD.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers-TechPreviewNoUpgrade.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_csi-driver_01_clustercsidrivers-CustomNoUpgrade.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_csi-driver_01_clustercsidrivers-Default.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_csi-driver_01_clustercsidrivers-DevPreviewNoUpgrade.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_csi-driver_01_clustercsidrivers-OKD.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_csi-driver_01_clustercsidrivers-TechPreviewNoUpgrade.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_ingress_00_ingresscontrollers-CustomNoUpgrade.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_ingress_00_ingresscontrollers-Default.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_ingress_00_ingresscontrollers-DevPreviewNoUpgrade.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_ingress_00_ingresscontrollers-OKD.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_ingress_00_ingresscontrollers-TechPreviewNoUpgrade.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.featuregated-crd-manifests.yamlis excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/operator/v1/zz_generated.swagger_doc_generated.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/operator/v1alpha1/types_clusterapi.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1alpha1/zz_generated.deepcopy.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/operator/v1alpha1/zz_generated.swagger_doc_generated.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/quota/v1/generated.protois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/quota/v1/types.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/quota/v1/zz_generated.featuregated-crd-manifests.yamlis excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/route/v1/generated.protois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/route/v1/types.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/route/v1/zz_generated.featuregated-crd-manifests.yamlis excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/security/v1/generated.protois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/security/v1/types.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/security/v1/zz_generated.featuregated-crd-manifests.yamlis excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/security/v1/zz_generated.swagger_doc_generated.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/modules.txtis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/.gitignoreis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/.golangci.ymlis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/.gomodcheck.yamlis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/CONTRIBUTING.mdis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/FAQ.mdis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/Makefileis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/OWNERSis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/OWNERS_ALIASESis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/README.mdis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/RELEASE.mdis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/SECURITY_CONTACTSis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/TMP-LOGGING.mdis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/VERSIONING.mdis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/alias.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/code-of-conduct.mdis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/doc.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/builder/controller.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/builder/doc.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/builder/options.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/builder/webhook.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/client/config/config.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/client/config/doc.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/doc.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/signal.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/signal_posix.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/signal_windows.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/scheme/scheme.gois excluded by!**/vendor/**,!vendor/**
📒 Files selected for processing (7)
bindata/observability/07-observability-operator.yamlbindata/observability/08-flowcollector.yamlgo.modpkg/controller/add_networkconfig.gopkg/controller/observability/observability_controller.gopkg/controller/observability/observability_controller_test.gosample-config.yaml
🚧 Files skipped from review as they are similar to previous changes (4)
- sample-config.yaml
- bindata/observability/08-flowcollector.yaml
- pkg/controller/add_networkconfig.go
- go.mod
The tests are added to https://github.com/openshift/origin repo. Currently, this is a TP feature. You can refer to https://docs.google.com/document/d/1zOL38_KDKwAvsx-LMHfyDdX4-jq3HQU1YWBjfuHaM0Q/edit?usp=drivesdk for details regarding feature gates. I think it'll also have details about test coverage. It also has details about lifting the feature gates and making features available by default. |
|
@arkadeepsen, outside of the e2e, does this PR is fine for you now ? |
I think this comment is still not resolved: #2925 (comment) The description of // NetworkObservabilityInstallAndEnable means that network observability should be installed and enabled during cluster deployment
// Since this was explicitly set to install, if the user remove NetworkObservability, it will be installed again unless the value of InstallationPolicy is changed
NetworkObservabilityInstallAndEnable NetworkObservabilityInstallationPolicy = "InstallAndEnable"I am not sure all the scenarios are handled for this mode. The above comment is related to how CNO will detect that the user has removed the resources managed by it. |
|
Sorry, answering in the comment thread. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/controller/observability/observability_controller_test.go`:
- Around line 426-447: The ReconcileObservability struct in these tests is
created with featureGate set to nil, causing Reconcile() to exit early at the
feature-gate check before reaching the behavior the tests are designed to
validate. To fix this, initialize the featureGate field in the
ReconcileObservability struct with a properly configured feature gate that
enables the observability feature (rather than leaving it nil). This pattern
should be applied to the ReconcileObservability initialization across all the
affected test functions: TestReconcile_SkipsWhenDisabled at lines 426-447, the
tests at lines 492-532, 970-1001, 1045-1084, 1124-1235, and 1556-1585 so that
Reconcile() progresses past the feature-gate guard and actually exercises the
behavior being asserted.
In `@pkg/controller/observability/observability_controller.go`:
- Around line 105-109: When isNetObservOperatorInstalled returns an error, the
controller currently only logs a warning and requeues without updating the
Network CR status. To surface this error to users, set the
NetworkObservabilityDeployed condition to False with the error message included
whenever isNetObservOperatorInstalled fails. This will make persistent
installation issues (like ClusterExtension failures or inability to identify the
installation source) visible in the Network CR status, allowing users to
understand why deployment is stalled rather than only seeing a requeue loop.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 1e6771cd-b873-4c39-89bf-5237154cf733
⛔ Files ignored due to path filters (100)
go.sumis excluded by!**/*.sumvendor/github.com/openshift/api/.golangci.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/config/v1/types.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/config/v1/types_apiserver.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/config/v1/types_authentication.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/config/v1/types_cluster_operator.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/config/v1/types_dns.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/config/v1/types_image.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/config/v1/types_infrastructure.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/config/v1/types_kmsencryption.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/config/v1/types_network.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/config/v1/zz_generated.deepcopy.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/config/v1/zz_generated.featuregated-crd-manifests.yamlis excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/config/v1/zz_generated.swagger_doc_generated.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/config/v1alpha1/types_cluster_monitoring.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/config/v1alpha1/zz_generated.deepcopy.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/config/v1alpha1/zz_generated.swagger_doc_generated.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/console/v1/types_console_plugin.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/console/v1/zz_generated.featuregated-crd-manifests.yamlis excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/envtest-releases.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/etcd/install.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/etcd/v1/Makefileis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/etcd/v1/doc.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/etcd/v1/register.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/etcd/v1/types_pacemakercluster.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/etcd/v1/zz_generated.deepcopy.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/etcd/v1/zz_generated.featuregated-crd-manifests.yamlis excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/etcd/v1/zz_generated.swagger_doc_generated.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/etcd/v1alpha1/types_pacemakercluster.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/etcd/v1alpha1/zz_generated.swagger_doc_generated.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/features.mdis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/features/features.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/features/legacyfeaturegates.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/install.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/machine/v1beta1/types_machineset.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/machine/v1beta1/zz_generated.swagger_doc_generated.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/machineconfiguration/v1/types.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/machineconfiguration/v1/types_machineconfignode.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/machineconfiguration/v1/zz_generated.featuregated-crd-manifests.yamlis excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/machineconfiguration/v1/zz_generated.swagger_doc_generated.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/operator/v1/types_csi_cluster_driver.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/types_ingress.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers-Default.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers-DevPreviewNoUpgrade.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers-OKD.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers-TechPreviewNoUpgrade.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_csi-driver_01_clustercsidrivers-CustomNoUpgrade.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_csi-driver_01_clustercsidrivers-Default.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_csi-driver_01_clustercsidrivers-DevPreviewNoUpgrade.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_csi-driver_01_clustercsidrivers-OKD.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_csi-driver_01_clustercsidrivers-TechPreviewNoUpgrade.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_ingress_00_ingresscontrollers-CustomNoUpgrade.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_ingress_00_ingresscontrollers-Default.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_ingress_00_ingresscontrollers-DevPreviewNoUpgrade.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_ingress_00_ingresscontrollers-OKD.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_ingress_00_ingresscontrollers-TechPreviewNoUpgrade.crd.yamlis excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.featuregated-crd-manifests.yamlis excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/operator/v1/zz_generated.swagger_doc_generated.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/operator/v1alpha1/types_clusterapi.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1alpha1/zz_generated.deepcopy.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/operator/v1alpha1/zz_generated.swagger_doc_generated.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/quota/v1/generated.protois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/quota/v1/types.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/quota/v1/zz_generated.featuregated-crd-manifests.yamlis excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/route/v1/generated.protois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/route/v1/types.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/route/v1/zz_generated.featuregated-crd-manifests.yamlis excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/security/v1/generated.protois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/security/v1/types.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/security/v1/zz_generated.featuregated-crd-manifests.yamlis excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/api/security/v1/zz_generated.swagger_doc_generated.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/modules.txtis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/.gitignoreis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/.golangci.ymlis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/.gomodcheck.yamlis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/CONTRIBUTING.mdis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/FAQ.mdis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/Makefileis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/OWNERSis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/OWNERS_ALIASESis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/README.mdis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/RELEASE.mdis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/SECURITY_CONTACTSis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/TMP-LOGGING.mdis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/VERSIONING.mdis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/alias.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/code-of-conduct.mdis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/doc.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/builder/controller.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/builder/doc.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/builder/options.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/builder/webhook.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/client/config/config.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/client/config/doc.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/doc.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/signal.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/signal_posix.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/signal_windows.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/scheme/scheme.gois excluded by!**/vendor/**,!vendor/**
📒 Files selected for processing (7)
bindata/observability/07-observability-operator.yamlbindata/observability/08-flowcollector.yamlgo.modpkg/controller/add_networkconfig.gopkg/controller/observability/observability_controller.gopkg/controller/observability/observability_controller_test.gosample-config.yaml
✅ Files skipped from review due to trivial changes (1)
- sample-config.yaml
🚧 Files skipped from review as they are similar to previous changes (3)
- go.mod
- pkg/controller/add_networkconfig.go
- bindata/observability/08-flowcollector.yaml
|
/lgtm @arkadeepsen Are there any outstanding issues? |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: OlivierCazade, stleerh The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
New changes are detected. LGTM label has been removed. |
|
Rebased the branch due to conflicts (no need to update openshift/api anymore it was updated on the main branch) |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
bindata/observability/07-observability-operator.yaml (1)
38-44: 🔒 Security & Privacy | 🔵 TrivialDrop the webhook RBAC after install
create/list/watchonvalidatingwebhookconfigurationsis cluster-wide; if this SA is only needed for OLM v1 pre-authorization, remove or narrow the binding once installation finishes.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@bindata/observability/07-observability-operator.yaml` around lines 38 - 44, The RBAC on validatingwebhookconfigurations is broader than needed after installation, so narrow or remove the create/list/watch permissions once the OLM v1 pre-authorization flow is done. Update the relevant binding in observability/07-observability-operator.yaml so the service account used for install-time webhook setup no longer retains cluster-wide webhook management access, while keeping only the specific permissions needed by the install path.Sources: Path instructions, Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@bindata/observability/07-observability-operator.yaml`:
- Around line 1-4: The openshift-netobserv-operator Namespace is missing the
usual network isolation resource. Add a NetworkPolicy for this namespace,
following the pattern used by the other operator namespace manifests in this
repo, so the operator namespace is explicitly isolated for ingress and egress;
place it alongside the existing Namespace manifest and ensure it targets
openshift-netobserv-operator.
---
Nitpick comments:
In `@bindata/observability/07-observability-operator.yaml`:
- Around line 38-44: The RBAC on validatingwebhookconfigurations is broader than
needed after installation, so narrow or remove the create/list/watch permissions
once the OLM v1 pre-authorization flow is done. Update the relevant binding in
observability/07-observability-operator.yaml so the service account used for
install-time webhook setup no longer retains cluster-wide webhook management
access, while keeping only the specific permissions needed by the install path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 1e3de99a-274e-4c6c-8024-88ca299bd4a0
⛔ Files ignored due to path filters (28)
vendor/modules.txtis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/.gitignoreis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/.golangci.ymlis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/.gomodcheck.yamlis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/CONTRIBUTING.mdis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/FAQ.mdis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/Makefileis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/OWNERSis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/OWNERS_ALIASESis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/README.mdis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/RELEASE.mdis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/SECURITY_CONTACTSis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/TMP-LOGGING.mdis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/VERSIONING.mdis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/alias.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/code-of-conduct.mdis excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/doc.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/builder/controller.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/builder/doc.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/builder/options.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/builder/webhook.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/client/config/config.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/client/config/doc.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/doc.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/signal.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/signal_posix.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/signal_windows.gois excluded by!**/vendor/**,!vendor/**vendor/sigs.k8s.io/controller-runtime/pkg/scheme/scheme.gois excluded by!**/vendor/**,!vendor/**
📒 Files selected for processing (6)
bindata/observability/07-observability-operator.yamlbindata/observability/08-flowcollector.yamlpkg/controller/add_networkconfig.gopkg/controller/observability/observability_controller.gopkg/controller/observability/observability_controller_test.gosample-config.yaml
✅ Files skipped from review due to trivial changes (2)
- bindata/observability/08-flowcollector.yaml
- pkg/controller/add_networkconfig.go
🚧 Files skipped from review as they are similar to previous changes (3)
- sample-config.yaml
- pkg/controller/observability/observability_controller_test.go
- pkg/controller/observability/observability_controller.go
| apiVersion: v1 | ||
| kind: Namespace | ||
| metadata: | ||
| name: openshift-netobserv-operator |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify whether a NetworkPolicy exists for the new namespace.
# Expected: at least one NetworkPolicy targeting openshift-netobserv-operator, or an intentional documented exception.
rg -n -C3 'kind:\s*NetworkPolicy|namespace:\s*openshift-netobserv-operator' --glob '*.yaml' --glob '*.yml'Repository: openshift/cluster-network-operator
Length of output: 172
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== files mentioning openshift-netobserv-operator =="
rg -n --hidden --glob '*.yaml' --glob '*.yml' --glob '!**/.git/**' 'openshift-netobserv-operator' .
echo
echo "== NetworkPolicy manifests =="
rg -n --hidden --glob '*.yaml' --glob '*.yml' --glob '!**/.git/**' 'kind:\s*NetworkPolicy' .
echo
echo "== target file excerpt =="
sed -n '1,80p' bindata/observability/07-observability-operator.yamlRepository: openshift/cluster-network-operator
Length of output: 5247
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== files mentioning openshift-netobserv-operator =="
rg -n --hidden --glob '*.yaml' --glob '*.yml' --glob '!**/.git/**' 'openshift-netobserv-operator' .
echo
echo "== NetworkPolicy manifests =="
rg -n --hidden --glob '*.yaml' --glob '*.yml' --glob '!**/.git/**' 'kind:\s*NetworkPolicy' .
echo
echo "== target file excerpt =="
sed -n '1,80p' bindata/observability/07-observability-operator.yamlRepository: openshift/cluster-network-operator
Length of output: 5247
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== manifests/0000_70_cluster-network-operator_00_namespace.yaml =="
sed -n '1,120p' manifests/0000_70_cluster-network-operator_00_namespace.yaml
echo
echo "== NetworkPolicy objects in bindata/observability/07-observability-operator.yaml =="
rg -n -C4 'kind:\s*NetworkPolicy|apiVersion:\s*networking.k8s.io/v1|metadata:|namespace:' bindata/observability/07-observability-operator.yaml
echo
echo "== Namespace-scoped policy refs across repo =="
rg -n --hidden --glob '*.yaml' --glob '*.yml' 'namespace:\s*openshift-netobserv-operator|name:\s*openshift-netobserv-operator' .Repository: openshift/cluster-network-operator
Length of output: 3995
Define a NetworkPolicy for openshift-netobserv-operator. The namespace is created without any ingress/egress isolation, unlike other operator namespaces in this repo.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@bindata/observability/07-observability-operator.yaml` around lines 1 - 4, The
openshift-netobserv-operator Namespace is missing the usual network isolation
resource. Add a NetworkPolicy for this namespace, following the pattern used by
the other operator namespace manifests in this repo, so the operator namespace
is explicitly isolated for ingress and egress; place it alongside the existing
Namespace manifest and ensure it targets openshift-netobserv-operator.
Source: Path instructions
There was a problem hiding this comment.
This is actually correct suggestion and we need netpols added for the netobserv namespace.
There was a problem hiding this comment.
@arkadeepsen Agreed — this one still stands. Adding NetworkPolicy resources for openshift-netobserv-operator alongside bindata/observability/07-observability-operator.yaml would be the right fix to keep the new operator namespace explicitly isolated for ingress and egress.
✏️ Learnings added
Learnt from: arkadeepsen
Repo: openshift/cluster-network-operator PR: 2925
File: bindata/observability/07-observability-operator.yaml:1-4
Timestamp: 2026-06-26T14:57:46.466Z
Learning: In the OpenShift Cluster Network Operator repository, when a new operator namespace is introduced in Kubernetes manifests (such as `bindata/observability/07-observability-operator.yaml` for `openshift-netobserv-operator`), the namespace is expected to include NetworkPolicy resources to explicitly isolate ingress and egress traffic, following the existing operator namespace pattern in the repo.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
There was a problem hiding this comment.
We are working on adding the network policy directly inside the bundle. We do not provide for now a netpol for the operator.
The network policy for the Network Observability stack are deployed by the operator.
There was a problem hiding this comment.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
| networkObservability: | ||
| # installationPolicy controls Network Observability installation during cluster deployment (day-0). | ||
| # Valid values: "", "InstallAndEnable", "NoAction" | ||
| # Default (empty or omitted): enabled on multi-node clusters, disabled on SNO | ||
| # "InstallAndEnable": explicitly enable (even on SNO) | ||
| # "NoAction": explicitly disable | ||
| installationPolicy: InstallAndEnable |
There was a problem hiding this comment.
This section should be added in sample-cluster-network-config.yaml rather than in this file as the field was added to config.openshift.io/v1 rather than operator.openshift.io/v1.
| err := r.client.Get(ctx, types.NamespacedName{ | ||
| Name: "flowcollectors.flows.netobserv.io", | ||
| }, crd) |
There was a problem hiding this comment.
I checked with chai-bot regarding the various scenarios: https://redhat-internal.slack.com/archives/C3VS0LV41/p1782466535969549
Following is my understanding:
- If operator is installed via OLMv0, then the resources installed during installation of the operator is not managed, i.e. the CRD if deleted will not be re-installed. Thus, for the case of OLMv0, the CRD may not exist, but the operator can still be running.
- If operator is installed via OLMv1, then the resources are managed and if deleted will be re-installed. Thus, if the ClusterExtension object is installed, CRD will always be installed.
- The user can delete the ClusterExtension object in the case of OLMv1 using
--cascade=orphanwhich will not delete the CRD. For the case of OLMv0, when CSV and Subscription are deleted, the operator is removed, but the CRD is not removed. Currently if CRD exists and OLM resources do not exist, operator is not installed.
The current flow will try to install the operator if it detects that the CRD is not installed, but the operator may be installed using OLMv0 and CNO will install another version of NOO via OLMv1. A check for this scenario should be added.
I think we also need to document the pitfalls of the different install methods (OLMv0 and OLMv1) as this does deviate from what the API documents:
// NetworkObservabilityInstallAndEnable means that network observability should be installed and enabled during cluster deployment
// Since this was explicitly set to install, if the user remove NetworkObservability, it will be installed again unless the value of InstallationPolicy is changed
NetworkObservabilityInstallAndEnable NetworkObservabilityInstallationPolicy = "InstallAndEnable"|
|
||
| installed, err := r.isNetObservOperatorInstalled(context.TODO()) | ||
|
|
||
| // Should return no error but false (not installed yet) |
| apiVersion: v1 | ||
| kind: Namespace | ||
| metadata: | ||
| name: openshift-netobserv-operator |
There was a problem hiding this comment.
This is actually correct suggestion and we need netpols added for the netobserv namespace.
| // Use RawPatch with ApplyPatchType to avoid deprecated crclient.Apply | ||
| patch := crclient.RawPatch(types.ApplyPatchType, data) | ||
| if err := r.client.Patch(ctx, obj, patch, &crclient.PatchOptions{ | ||
| Force: ptr.To(true), |
There was a problem hiding this comment.
Since Force is set to true, it'll overwrite any changes done by the user. Who owns the objects? If CNO does, then this is fine, however if the user owns the objects then this will not respect user changes.
There was a problem hiding this comment.
The object is owned by the user but created by the CNO.
I removed the force field to prevent overriding user object.
c793431 to
f2e4ea2
Compare
This commit introduces automated installation and management of the Network Observability Operator during cluster deployment (day-0). Key features: - Automatic operator installation via OLM when NetworkObservabilityInstall feature gate is enabled - Opt-out model: installs by default except on SNO clusters Implementation details: - New observability controller in pkg/controller/observability/ - Manifest-based operator installation - Default FlowCollector configuration - RBAC permissions for OLM resource management - namespace creation for operator and observability components
|
@OlivierCazade: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Add Network Observability Day-0 Installation Support
The replace instruction in the go.mod file need to be removed before merging this branch, this can be removed once the PR on the API repo is merged
This PR introduces automatic Day-0 installation of the Network Observability operator, controlled by a new
installNetworkObservabilityfield in the Network CR.Summary
The Cluster Network Operator (CNO) now handles initial deployment of the Network Observability operator and FlowCollector, enabling network flow monitoring out-of-the-box for OpenShift clusters while respecting user preferences and cluster topology.
Key Features
Opt-out Model with SNO Detection
spec.installNetworkObservabilityDeployment Tracking
NetworkObservabilityDeployedcondition is setStatus Reporting
ObservabilityConfigstatus level for degraded state reportingConfiguration
The
spec.installNetworkObservabilityfield accepts three values:""(empty/nil): Default behavior - install on multi-node clusters, skip on SNO"Enable": Explicitly enable installation, even on SNO clusters"Disable": Explicitly disable installationRBAC Changes
Added permissions for the CNO to manage:
Related
/cc @stleerh
Summary by CodeRabbit
New Features
Bug Fixes
Documentation