-
Notifications
You must be signed in to change notification settings - Fork 474
(WIP) Network policies POC #5755
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| apiVersion: networking.k8s.io/v1 | ||
| kind: NetworkPolicy | ||
| metadata: | ||
| name: kube-rbac-proxy-crio-deny-all | ||
| namespace: openshift-machine-config-operator | ||
| annotations: | ||
| include.release.openshift.io/ibm-cloud-managed: "true" | ||
| include.release.openshift.io/self-managed-high-availability: "true" | ||
| include.release.openshift.io/single-node-developer: "true" | ||
| spec: | ||
| podSelector: | ||
| matchLabels: | ||
| k8s-app: kube-rbac-proxy-crio | ||
| policyTypes: | ||
| - Ingress | ||
| - Egress | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| apiVersion: networking.k8s.io/v1 | ||
| kind: NetworkPolicy | ||
| metadata: | ||
| name: machine-config-controller-deny-all | ||
| namespace: {{.TargetNamespace}} | ||
| spec: | ||
| podSelector: | ||
| matchLabels: | ||
| k8s-app: machine-config-controller | ||
| policyTypes: | ||
| - Ingress | ||
| - Egress |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| apiVersion: networking.k8s.io/v1 | ||
| kind: NetworkPolicy | ||
| metadata: | ||
| name: machine-config-daemon-deny-all | ||
| namespace: {{.TargetNamespace}} | ||
| spec: | ||
| podSelector: | ||
| matchLabels: | ||
| k8s-app: machine-config-daemon | ||
| policyTypes: | ||
| - Ingress | ||
| - Egress | ||
|
Comment on lines
+1
to
+12
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Deny-all policy will break machine-config-daemon functionality. Similar to the machine-config-server policy, this defines
Blocking all egress will prevent MCD from functioning. You'll need to add appropriate 🤖 Prompt for AI Agents |
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| apiVersion: networking.k8s.io/v1 | ||
| kind: NetworkPolicy | ||
| metadata: | ||
| name: machine-config-server-deny-all | ||
| namespace: {{.TargetNamespace}} | ||
| spec: | ||
| podSelector: | ||
| matchLabels: | ||
| k8s-app: machine-config-server | ||
| policyTypes: | ||
| - Ingress | ||
| - Egress | ||
|
Comment on lines
+1
to
+12
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Deny-all policy will block machine-config-server traffic. This NetworkPolicy defines For a functional policy, you'll need to add Example structure with ingress rules spec:
podSelector:
matchLabels:
k8s-app: machine-config-server
policyTypes:
- Ingress
- Egress
+ ingress:
+ - from:
+ - ipBlock:
+ cidr: <node-cidr> # Or appropriate source selector
+ ports:
+ - protocol: TCP
+ port: 22623🤖 Prompt for AI Agents |
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -653,7 +653,13 @@ func (optr *Operator) syncRenderConfig(_ *renderConfig, _ *configv1.ClusterOpera | |
| optr.setOperatorLogLevel(mcop.Spec.OperatorLogLevel) | ||
| } | ||
|
|
||
| optr.renderConfig = getRenderConfig(optr.namespace, string(kubeAPIServerServingCABytes), spec, &imgs.RenderConfigImages, infra, pointerConfigData, apiServer, fmt.Sprintf("%d", optr.logLevel)) | ||
| // Get the Cluster Network CIDR for the MCC's allow NetworkPolicy | ||
| clusterNetworkCIDR, err := optr.getClusterNetworkCIDR() | ||
| if err != nil { | ||
| return err | ||
| } | ||
|
|
||
| optr.renderConfig = getRenderConfig(optr.namespace, string(kubeAPIServerServingCABytes), spec, &imgs.RenderConfigImages, infra, pointerConfigData, apiServer, fmt.Sprintf("%d", optr.logLevel), clusterNetworkCIDR) | ||
|
Comment on lines
+656
to
+662
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Network CIDR fetch failure will block entire operator sync. If the cluster Network resource doesn't have Consider adding graceful handling similar to how 🛡️ Proposed fix for graceful degradation // Get the Cluster Network CIDR for the MCC's allow NetworkPolicy
clusterNetworkCIDR, err := optr.getClusterNetworkCIDR()
if err != nil {
- return err
+ // During early cluster bringup, the network status may not be populated yet.
+ // Log a warning but continue with empty CIDR; network policies will be applied
+ // on subsequent syncs once the network is ready.
+ if optr.inClusterBringup {
+ klog.Warningf("Could not get cluster network CIDR during bringup: %v", err)
+ clusterNetworkCIDR = ""
+ } else {
+ return err
+ }
}Alternatively, if network policies are critical, consider moving 🤖 Prompt for AI Agents |
||
|
|
||
| return nil | ||
| } | ||
|
|
@@ -1994,6 +2000,21 @@ func (optr *Operator) getOsImageURLs(namespace, osImageStreamName string) (strin | |
| return cfg.BaseOSContainerImage, cfg.BaseOSExtensionsContainerImage, nil | ||
| } | ||
|
|
||
| func (optr *Operator) getClusterNetworkCIDR() (string, error) { | ||
| // Fetch the cluster-wide Network configuration | ||
| network, err := optr.networkLister.Get("cluster") | ||
| if err != nil { | ||
| return "", err | ||
| } | ||
|
|
||
| // Always use Status as it represents the current reality of the CNI | ||
| if len(network.Status.ClusterNetwork) > 0 { | ||
| return network.Status.ClusterNetwork[0].CIDR, nil | ||
| } | ||
|
|
||
| return "", fmt.Errorf("no cluster network CIDR found in status") | ||
| } | ||
|
|
||
| func (optr *Operator) getCAsFromConfigMap(namespace, name, key string) ([]byte, error) { | ||
| cm, err := optr.clusterCmLister.ConfigMaps(namespace).Get(name) | ||
| if err != nil { | ||
|
|
@@ -2138,7 +2159,7 @@ func setGVK(obj runtime.Object, scheme *runtime.Scheme) error { | |
| return nil | ||
| } | ||
|
|
||
| func getRenderConfig(tnamespace, kubeAPIServerServingCA string, ccSpec *mcfgv1.ControllerConfigSpec, imgs *ctrlcommon.RenderConfigImages, infra *configv1.Infrastructure, pointerConfigData []byte, apiServer *configv1.APIServer, logLevel string) *renderConfig { | ||
| func getRenderConfig(tnamespace, kubeAPIServerServingCA string, ccSpec *mcfgv1.ControllerConfigSpec, imgs *ctrlcommon.RenderConfigImages, infra *configv1.Infrastructure, pointerConfigData []byte, apiServer *configv1.APIServer, logLevel, clusterNetworkCIDR string) *renderConfig { | ||
| tlsMinVersion, tlsCipherSuites := ctrlcommon.GetSecurityProfileCiphersFromAPIServer(apiServer) | ||
| return &renderConfig{ | ||
| TargetNamespace: tnamespace, | ||
|
|
@@ -2153,6 +2174,7 @@ func getRenderConfig(tnamespace, kubeAPIServerServingCA string, ccSpec *mcfgv1.C | |
| TLSMinVersion: tlsMinVersion, | ||
| TLSCipherSuites: tlsCipherSuites, | ||
| LogLevel: logLevel, | ||
| ClusterNetworkCIDR: clusterNetworkCIDR, | ||
| } | ||
| } | ||
|
|
||
|
|
@@ -2610,3 +2632,52 @@ func (optr *Operator) getOCPVersionFromClusterVersion() string { | |
| } | ||
| return fmt.Sprintf("%d.%d.%d", parsedVersion.Major(), parsedVersion.Minor(), parsedVersion.Patch()) | ||
| } | ||
|
|
||
| func (optr *Operator) syncNetworkPolicies(config *renderConfig, _ *configv1.ClusterOperator) error { | ||
| // 1. Define your manifest list | ||
| manifests := []string{ | ||
| // "common/network-policies/00-default-deny-all.yaml", | ||
| "common/network-policies/machine-config-controller-allow.yaml", | ||
| // "common/network-policies/03-allow-operator.yaml", | ||
| } | ||
|
|
||
| // if config.IsImageBuildEnabled { | ||
| // manifests = append(manifests, "common/network-policies/02-allow-os-builder.yaml") | ||
| // } | ||
|
|
||
| // 2. Iterate and Apply using Server-Side Apply | ||
| for _, path := range manifests { | ||
| // Render the template using the operator's internal asset renderer | ||
| npBytes, err := renderAsset(config, path) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to render %s: %w", path, err) | ||
| } | ||
|
Comment on lines
+2649
to
+2654
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
# Description: Verify the network policy manifest path exists in the embedded filesystem
echo "Searching for network policy manifest files..."
# Check the templates directory
echo "=== Checking templates/ ==="
fd -t f "machine-config-controller-allow" templates/ 2>/dev/null || echo "Not found in templates/"
# Check the manifests directory
echo ""
echo "=== Checking manifests/ ==="
fd -t f "machine-config-controller-allow" manifests/ 2>/dev/null || echo "Not found in manifests/"
# Check for any network-policies directories
echo ""
echo "=== Searching for network-policies directories ==="
fd -t d "network-policies" . 2>/dev/null
# Check how manifests.ReadFile is used and what paths are valid
echo ""
echo "=== Checking manifests package embed directive ==="
rg -n "//go:embed" pkg/manifests/ manifests/ 2>/dev/null | head -20Repository: openshift/machine-config-operator Length of output: 430 🏁 Script executed: # Search for renderAsset function definition
rg -n "func renderAsset" pkg/operator/ -A 10Repository: openshift/machine-config-operator Length of output: 643 🏁 Script executed: # Examine manifests.go to understand the embed directive
cat -n pkg/manifests/manifests.go | head -30Repository: openshift/machine-config-operator Length of output: 135 🏁 Script executed: # Check how other manifest paths are used in the sync.go file
rg -n "renderAsset" pkg/operator/sync.go -B 2 -A 2 | head -50Repository: openshift/machine-config-operator Length of output: 1485 🏁 Script executed: # Find where manifests.go actually exists
find . -name "manifests.go" -type f 2>/dev/nullRepository: openshift/machine-config-operator Length of output: 101 🏁 Script executed: # Find where "common/network-policies/machine-config-controller-allow.yaml" is referenced
rg -n "common/network-policies/machine-config-controller-allow" . -A 2 -B 2Repository: openshift/machine-config-operator Length of output: 421 🏁 Script executed: # Find newAssetRenderer implementation
rg -n "func newAssetRenderer" pkg/operator/ -A 15Repository: openshift/machine-config-operator Length of output: 833 🏁 Script executed: # Read manifests.go to see the embed directive
cat -n ./manifests/manifests.go | head -30Repository: openshift/machine-config-operator Length of output: 802 🏁 Script executed: # List files in the templates directory to see the actual path structure
fd -t f "" templates/common/network-policies/ -x echoRepository: openshift/machine-config-operator Length of output: 147 🏁 Script executed: # Check other path patterns used in sync.go to understand the naming convention
rg -n '"\w+/\w+\.yaml"' pkg/operator/sync.go | head -20Repository: openshift/machine-config-operator Length of output: 138 🏁 Script executed: # Check what's actually in the manifests directory
ls -la manifests/common/network-policies/ 2>/dev/null || echo "Directory does not exist in manifests/"Repository: openshift/machine-config-operator Length of output: 115 🏁 Script executed: # Check the full directory structure of manifests/
fd -t d "" manifests/ | head -20Repository: openshift/machine-config-operator Length of output: 283 🏁 Script executed: # Check if templates are part of the build process
rg -n "templates" go.mod pkg/ Makefile build/ 2>/dev/null | grep -i "copy\|build\|embed" | head -10Repository: openshift/machine-config-operator Length of output: 522 🏁 Script executed: # Check if there are any other embed directives that include templates/common
rg -n "go:embed.*common" . 2>/dev/nullRepository: openshift/machine-config-operator Length of output: 59 🏁 Script executed: # List all files in the manifests directory to see what's available
fd -t f "" manifests/ | sortRepository: openshift/machine-config-operator Length of output: 4338 🏁 Script executed: # Check git history or comments around this code to see if this is new
rg -n "network-policies" pkg/operator/sync.go -B 5 -A 5Repository: openshift/machine-config-operator Length of output: 790 The manifest path will fail at runtime because the file is not embedded in the manifests directory. The path Either move the file to 🤖 Prompt for AI Agents |
||
|
|
||
| // Convert raw bytes to a NetworkPolicy object | ||
| netPol := resourceread.ReadNetworkPolicyV1OrDie(npBytes) | ||
|
|
||
| // Marshal the NetworkPolicy to JSON for Server-Side Apply | ||
| netPolBytes, err := json.Marshal(netPol) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to marshal network policy %s: %w", netPol.Name, err) | ||
| } | ||
|
|
||
| // Apply with ForceOwnership to ensure the Operator is the source of truth | ||
| // Apply via Server-Side Apply with ForceOwnership | ||
| // This ensures the MCO 'owns' these fields and reverts manual drifts | ||
| _, err = optr.kubeClient.NetworkingV1().NetworkPolicies(netPol.Namespace).Patch( | ||
| context.TODO(), | ||
| netPol.Name, | ||
| types.ApplyPatchType, | ||
| netPolBytes, | ||
| metav1.PatchOptions{ | ||
| FieldManager: "machine-config-operator", | ||
| Force: ptr.To(true), | ||
| }, | ||
| ) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to apply network policy %s: %w", netPol.Name, err) | ||
| } | ||
| } | ||
| return nil | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,43 @@ | ||
| apiVersion: networking.k8s.io/v1 | ||
| kind: NetworkPolicy | ||
| metadata: | ||
| name: mco-allow-controller | ||
| namespace: {{.TargetNamespace}} | ||
| spec: | ||
| podSelector: | ||
| matchLabels: # TODO: see if the following should be `k8s-app: {{.ControllerAppLabel}}` instead | ||
| k8s-app: machine-config-controller | ||
| policyTypes: | ||
| - Ingress | ||
| - Egress | ||
| ingress: | ||
| - from: # Allow Prometheus Metrics Scraping | ||
| - namespaceSelector: | ||
| matchLabels: | ||
| network.openshift.io/policy-group: monitoring | ||
| ports: | ||
| - protocol: TCP | ||
| port: 9001 | ||
| - from: # Allow Kubelet Health Probes (from Node Network) | ||
| - ipBlock: | ||
| cidr: {{.ClusterNetworkCIDR}} | ||
| ports: | ||
| - protocol: TCP | ||
| port: 9443 | ||
|
Comment on lines
+21
to
+26
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chain🏁 Script executed: # Search for ClusterNetworkCIDR usage in the repository
fd -t f '\.yaml\|\.yml\|\.go\|\.md' | xargs grep -l "ClusterNetworkCIDR" | head -20Repository: openshift/machine-config-operator Length of output: 59 🏁 Script executed: # Also search for NodeNetworkCIDR or ServiceNetworkCIDR to understand the variable naming
rg "NetworkCIDR" --type yaml --type md -A 2 -B 2 | head -80Repository: openshift/machine-config-operator Length of output: 585 🏁 Script executed: # Look for the actual file and surrounding context
cat templates/common/network-policies/machine-config-controller-allow.yamlRepository: openshift/machine-config-operator Length of output: 1237 🏁 Script executed: # Search for documentation or comments about kubelet probes and network CIDR
rg "kubelet.*probe|health.*probe" --type yaml --type md -A 3 -B 3 | head -100Repository: openshift/machine-config-operator Length of output: 6902 🏁 Script executed: # Search for where ClusterNetworkCIDR is defined as a variable/parameter
rg "ClusterNetworkCIDR" -B 5 -A 5 --type go --type yaml --type mdRepository: openshift/machine-config-operator Length of output: 8443 🏁 Script executed: # Look for other network CIDR definitions to understand the pattern
rg "NetworkCIDR|NetworkNetwork" --type go --type yaml | head -50Repository: openshift/machine-config-operator Length of output: 5480 🏁 Script executed: # Search for how these template variables are populated
rg "\.ClusterNetworkCIDR|ClusterNetworkCIDR:" --type go -B 3 -A 3 | head -80Repository: openshift/machine-config-operator Length of output: 395 🏁 Script executed: # Check for any documentation or README files about network setup
fd -t f '\.md$|README' | xargs grep -i "ClusterNetworkCIDR\|network.*cidr" | head -30Repository: openshift/machine-config-operator Length of output: 59 🏁 Script executed: # Read the getClusterNetworkCIDR implementation
sed -n '/func (optr \*Operator) getClusterNetworkCIDR/,/^}/p' pkg/operator/sync.goRepository: openshift/machine-config-operator Length of output: 517 🏁 Script executed: # Search for any node network or machine network CIDR usage
rg "NodeNetwork|MachineNetwork|node.*cidr" --type go -A 3 -B 3 | head -100Repository: openshift/machine-config-operator Length of output: 9425 🏁 Script executed: # Look for kubelet configuration or probe documentation in the codebase
rg "9443|kubelet.*probe|Kubelet Health" -A 3 -B 3 --type yaml --type go | head -80Repository: openshift/machine-config-operator Length of output: 7475 🏁 Script executed: # Search for what uses or listens on port 9443
rg "9443|machine.*config.*controller" --type go --type yaml -B 3 -A 3 | grep -A 10 -B 10 "9443\|probe\|webhook" | head -150Repository: openshift/machine-config-operator Length of output: 11461 🏁 Script executed: # Look for webhook configurations or health probe definitions
rg "webhook|healthz|readiness.*probe|liveness.*probe" --type yaml -A 5 -B 5 | head -120Repository: openshift/machine-config-operator Length of output: 15321 🏁 Script executed: # Search the codebase for how the machine-config-controller is accessed on port 9443
fd -t f '\.go$' | xargs grep -l "9443" | head -10Repository: openshift/machine-config-operator Length of output: 402 🏁 Script executed: # Find machine-config-controller deployment manifests
fd -t f "\.yaml\|\.yml" | xargs grep -l "machine-config-controller" | head -20Repository: openshift/machine-config-operator Length of output: 59 🏁 Script executed: # Look for any deployment/pod specs for machine-config-controller
fd -t f "deployment\|pod\|statefulset" -path "*/machine-config*"Repository: openshift/machine-config-operator Length of output: 245 🏁 Script executed: # Search for webhook or TLS server configuration in machine-config-controller code
rg "ValidatingWebhook|MutatingWebhook|listener.*9443|Serve.*9443" --type go -A 5 -B 5Repository: openshift/machine-config-operator Length of output: 50390 🏁 Script executed: # Look for any references to what calls/probes this service
rg "machine.*config.*controller.*9443|9443.*machine" --type yaml --type goRepository: openshift/machine-config-operator Length of output: 59 🏁 Script executed: # Search in the actual pkg directory for machine-config-controller main code
find pkg -name "*.go" -type f | xargs grep -l "port.*9443\|9443.*port" 2>/dev/null | head -10Repository: openshift/machine-config-operator Length of output: 59 🏁 Script executed: # Search for where machine-config-controller might have a server/listen on a port
rg "ListenAndServe|server.*Start|Addr.*Port" pkg/controller --type go -B 5 -A 5 | head -80Repository: openshift/machine-config-operator Length of output: 789 🏁 Script executed: # Let's look at what the file names suggest about network policies
ls -la templates/common/network-policies/Repository: openshift/machine-config-operator Length of output: 276 🏁 Script executed: # Search for metrics or server port configuration in machine-config-controller
rg "Addr.*:80|Addr.*:8080|Addr.*:9000|Addr.*:9" pkg/controller --type go -B 3 -A 3 | head -60Repository: openshift/machine-config-operator Length of output: 59 🏁 Script executed: # Look for any operator code that manages network policies
rg "mco-allow\|mco.*allow\|NetworkPolicy" pkg/operator --type go -B 3 -A 3 | head -100Repository: openshift/machine-config-operator Length of output: 59 🏁 Script executed: # Check if port 9443 is specifically for webhooks or health checks
rg "webhook|health|readiness" pkg/controller --type go | head -30Repository: openshift/machine-config-operator Length of output: 59 🏁 Script executed: # Look at what the comment in the NetworkPolicy file says around the problematic rule
head -n 30 templates/common/network-policies/machine-config-controller-allow.yamlRepository: openshift/machine-config-operator Length of output: 891 Use the correct CIDR source for the intended access. The comment says "from Node Network" but the rule uses 🤖 Prompt for AI Agents |
||
| egress: | ||
| - to: # Allow API Server Access | ||
| - namespaceSelector: | ||
| matchLabels: | ||
| kubernetes.io/metadata.name: openshift-kube-apiserver | ||
| ports: | ||
| - protocol: TCP | ||
| port: 6443 | ||
| - to: # Allow DNS Lookups | ||
| - namespaceSelector: | ||
| matchLabels: | ||
| kubernetes.io/metadata.name: openshift-dns | ||
| ports: | ||
| - protocol: UDP | ||
| port: 53 | ||
| - protocol: TCP | ||
| port: 53 | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deny-all policy will block metrics scraping and API server communication.
This policy blocks all traffic for
kube-rbac-proxy-criopods. The kube-rbac-proxy needs:Without allow rules, metrics collection will fail and the proxy cannot authorize requests.
Additionally, this file uses a hardcoded
namespace: openshift-machine-config-operatorwhile the other two NetworkPolicy files use the{{.TargetNamespace}}template. Confirm whether this inconsistency is intentional (e.g., this file is applied directly while others go through templating).🤖 Prompt for AI Agents