Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
167 changes: 166 additions & 1 deletion .coderabbit.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -15,10 +15,175 @@ reviews:
kube-proxy, network-diagnostics, egress-router, cloud-network-config-controller,
network-metrics, node-identity, FRR-K8s, iptables-alerter, MTU-prober,
container-networking-plugins, bond CNI, whereabouts CNI, route-override CNI,
multi-network-policy, and the networking console plugin.
multi-network-policy, EVPN, and the networking console plugin.
- Focus on major issues impacting performance, readability, maintainability and security.
- Consider backward compatibility, upgrade safety, and multi-platform support.
- Avoid nitpicks and avoid verbosity.
- In HyperShift, CNO is deployed by openshift/hypershift (not CVO). If changes
touch `manifests/` or the CNO deployment, check whether corresponding changes
are needed in the openshift/hypershift repo.
- Dual-stack (IPv4+IPv6) must always be considered. Flag changes that only
handle single-stack addresses (e.g. missing net.JoinHostPort for IPv6).
- Two CRDs drive CNO configuration:
`network.config.openshift.io/cluster` is the source of truth set by the
installer. `network.operator.openshift.io/cluster` is the operational
config CNO actually renders from. On each reconcile, CNO copies
`clusterNetwork`, `serviceNetwork`, and `networkType` from config → operator,
unconditionally overwriting whatever is in the operator CR for those fields.
Fields that exist only on the operator CR (e.g. `defaultNetwork.ovnKubernetesConfig`,
`ManagementState`, `disableNetworkDiagnostics`) are NOT touched by this merge
and are the user/admin's to control. Setting a config-owned field directly on
the operator CR has no effect — it gets overwritten on next reconcile.
`ManagementState` and `disableNetworkDiagnostics` need special SSA merge
protection (`pkg/apply/merge.go`) because they are non-pointer fields whose
zero values would otherwise clobber user-set values.

# -------------------------------------------------------------------------
# CVO-managed manifests (applied before CNO starts)
# -------------------------------------------------------------------------
- path: 'manifests/**'
instructions: |
Files in `manifests/` are applied by the Cluster Version Operator (CVO) before
CNO starts. They define the operator's own namespace, deployment, RBAC, CRDs,
and monitoring resources.
- Every resource MUST have the correct `include.release.openshift.io/*`
annotations (`self-managed-high-availability`, `single-node-developer`,
and `ibm-cloud-managed` where applicable). Missing annotations cause the
resource to be skipped on certain profiles.
- Do NOT put NetworkPolicy or resources that depend on CNO being running
into `manifests/`. CVO applies these before CNO is ready, which can cause

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't quite correct; we deploy the CNO default-deny NetworkPolicy from manifests/.

A better rule would be that you need to be very careful when deploying things to the openshift-network-operator namespace from bindata/, or to anything except the openshift-network-operator namespace from manifests/

connectivity gaps during upgrades. Resources that must be applied
atomically with CNO-managed ones belong in `bindata/`.
- The CNO deployment manifest (`03_deployment.yaml`) has a companion
`03_deployment-ibm-cloud-managed.yaml` variant. Changes to one may
require changes to the other.

# -------------------------------------------------------------------------
# CNO-managed bindata (rendered and applied by the operator at runtime)
# -------------------------------------------------------------------------
- path: 'bindata/**'
instructions: |
Files in `bindata/` are Go-template YAML manifests rendered and applied by CNO
at runtime. They define the actual networking components (DaemonSets, Deployments,
ConfigMaps, RBAC, etc.).

**Critical annotations CNO acts upon — verify correctness when adding/modifying resources:**
- `release.openshift.io/version: "{{.ReleaseVersion}}"` — MUST be present on
every Deployment, DaemonSet, and StatefulSet. CNO uses this to track upgrade
progress. Missing annotation = operator never reports upgrade complete.
- `network.operator.openshift.io/cluster-name` — specifies which cluster a
resource belongs to in HyperShift (management vs guest). CNO uses this to
select the correct API client for applying. Wrong value = resource applied
to wrong cluster. Must be set correctly on all HyperShift-aware resources.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know whether coderabbit will understand what "all HyperShift-aware resources" means, but I don't.

- `networkoperator.openshift.io/create-only` — resource is created but never
updated. Removing this changes apply semantics.
- `network.operator.openshift.io/copy-from` — resource content is copied from
another object (format: `ClusterName/Namespace/Name`). Verify source exists.
- `networkoperator.openshift.io/non-critical` — marks non-critical resources
that don't block upgrades.
Comment on lines +82 to +83

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- `networkoperator.openshift.io/non-critical` — marks non-critical resources
that don't block upgrades.
- `networkoperator.openshift.io/non-critical` — marks resources that should
not block the operator from becoming `Available` during cluster install,
because they depend on other operators.


**Namespace manifests:**
- Namespaces running DaemonSets MUST have `openshift.io/node-selector: ""`
annotation to override any default node selector. Missing this causes pods
to be unschedulable when cluster-wide defaultNodeSelector is set.

**General:**
- When a field is added to containers in a DaemonSet/Deployment (e.g.
`terminationMessagePolicy`, resource limits), verify ALL containers
including init containers are covered — partial coverage is a recurring bug.

# -------------------------------------------------------------------------
# OVN-Kubernetes: managed (HyperShift) vs self-hosted alignment
# -------------------------------------------------------------------------
- path: 'bindata/network/ovn-kubernetes/**'
instructions: |
OVN-Kubernetes manifests are split into three directories:
- `common/` — shared across both HyperShift (managed) and standalone (self-hosted)
- `managed/` — HyperShift control plane (runs in the management cluster)
- `self-hosted/` — standalone OCP control plane (runs on cluster nodes)

**Managed ↔ Self-hosted alignment (HIGH PRIORITY):**
- `managed/` and `self-hosted/` contain parallel files for the same components
(ovnkube-control-plane, ovnkube-node, config, monitoring).
- When changing a file in one, ALWAYS check if the same change is needed in
the other. This is the #1 source of HyperShift drift bugs. Examples:
masquerade subnet config, interconnect flags, IPv6 URL formatting.
Some settings are HyperShift-only (SOCKS proxy, different cert paths,
`{{.OVNControlPlaneImage}}`), but most feature flags must appear in both.

**OVN-K rollout behavior:**
- CNO rolls out OVN-K changes in a specific order: control-plane first, then
nodes (only after control-plane rollout completes). The pre-puller DaemonSet
runs before node rollout to pull images.
- `release.openshift.io/version` on DaemonSets/Deployments drives rollout
progress tracking. `networkoperator.openshift.io/rollout-hung` is set when
a rollout appears stuck.
- Rollout/progress checks must handle zero-node and zero-replica edge cases
(HyperShift clusters can have zero workers during provisioning).

**What triggers OVN-K pod restarts (and what does not):**
- The `ovnkube-script-lib` ConfigMap (`008-script-lib.yaml`) is hashed into a
pod template annotation (`OVNKubeConfigHash`). Changes to this ConfigMap
trigger ovnkube-node restarts.
- Separate hashes exist for node-only config (`OVNKubeNodeConfigHash` — tracks
`outboundSNAT`) and control-plane-only config (`OVNKubeControlPlaneConfigHash`
— tracks BGP `asNumber`/`topology`), so changes to one don't restart the other.
- Annotations set via `setOVNObjectAnnotation` (`ip-family-mode`,
`cluster-network-cidr`, `hybrid-overlay-status`) are placed on both the
object AND the pod template, so changes trigger a rollout.
- Other ConfigMaps (e.g. `004-config.yaml`) are NOT hashed — changes to them
are only picked up on next pod restart. If a new config option is added to a
non-hashed ConfigMap and must take effect immediately, either add it to the
hash computation or use a different restart mechanism.

**CLI flags:**
- After upstream OVN-K syncs or rebases, audit for duplicated or obsolete

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no "upstream OVN-K rebases" in this repo

CLI flags in the container command/args. Duplicated flags are a recurring issue.

# -------------------------------------------------------------------------
# Other components with managed/self-hosted variants
# -------------------------------------------------------------------------
- path: 'bindata/network/node-identity/**'
instructions: |
Node identity controller — has `managed/` and `self-hosted/` variants
like OVN-K. Ensure both variants stay aligned when making changes.

# -------------------------------------------------------------------------
# Go source: rendering, apply, and operator logic
# -------------------------------------------------------------------------
- path: 'pkg/network/*.go'
instructions: |
Network component rendering functions. Each file (multus.go, kube_proxy.go,
ovn_kubernetes.go, etc.) renders bindata templates for its component.
- Every render function must set `data.Data["ReleaseVersion"]` from
`os.Getenv("RELEASE_VERSION")`. Missing this = the component's workloads
won't have the version annotation and upgrades stall.
- `daemonSetProgressing()` / `deploymentProgressing()` in ovn_kubernetes.go
drive upgrade status. Must handle zero DesiredNumberScheduled, rollout-hung
annotation, generation mismatch. Bugs here block cluster upgrades.

- path: 'pkg/apply/**'
instructions: |
CNO uses Server-Side Apply exclusively (`Force: true`). Changes here
affect ALL components.
- Fields NOT included in the applied object are released (not deleted).
To actually remove a field from the live object, explicit deletion
logic is needed.
- SSA does NOT handle non-pointer boolean fields well — `false` is
indistinguishable from "not set" in JSON serialization. This is why
`merge.go` has special-case merge functions (e.g. for
`disableNetworkDiagnostics`, `managementState`). If adding a new
user-settable field without `omitempty` or that is not a pointer,
a merge function is likely needed.
Comment thread
coderabbitai[bot] marked this conversation as resolved.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we should just not add any more such fields. If coderabbit sees us trying to work around this problem in a new API, it should tell us that the API is wrong and we need to fix it.


- path: 'pkg/hypershift/**'
instructions: |
HyperShift integration layer.
- Uses a locally-defined subset of the HostedControlPlane API to avoid
importing the full openshift/hypershift dependency.
- Environment variables (`HYPERSHIFT`, `HOSTED_CLUSTER_NAME`, etc.) control
HyperShift mode detection.

# https://docs.coderabbit.ai/configuration/path-instructions#path-filters
path_filters:
- "!vendor/**"
Expand Down