feat: support simple network policies (without cilium)#214
Conversation
GrigoryPervakov
left a comment
There was a problem hiding this comment.
AI-generated review (Claude, posted on Grigorii's request). Treat as triage input; the maintainer will decide what holds.
Overall shape looks correct — native-first, per-CR enable, ingress-only, per-component object, owner-ref + delete-on-disable. Four concerns before merge.
Bugs
1. TLS ports break the policy. networkpolicy.go hardcodes 9000 (PortNative) and 8123 (PortHTTP). When spec.settings.tls.enabled is true, buildProtocols in internal/controller/clickhouse/templates.go:579 drops 9000/8123 entirely and exposes 9440 (PortNativeSecure) and 8443 (PortHTTPSecure). A TLS-enabled cluster with networkPolicy.enabled: true gets a policy that opens dead ports and silently blocks the live ones.
2. spec.additionalPorts are silently blocked. This is the documented extension surface for MySQL/Postgres/gRPC and custom listeners — docs/guides/configuration.mdx:428, validated at internal/webhook/v1alpha1/clickhousecluster_webhook.go:132, wired into pods and the headless Service at internal/controller/clickhouse/templates.go:38 and :507. The PR ignores cr.Spec.AdditionalPorts in networkpolicy.go. A user with `additionalPorts: [{port: 9004, name: mysql}]` who enables `networkPolicy` loses MySQL connectivity with no warning.
Both 1 and 2 share a root cause: `networkpolicy.go` re-derives the port set independently instead of asking the same source `templates.go` does. One helper returning "all client-facing TCP ports for this CR" (TLS-aware, `additionalPorts`-aware), called from both the Service template and the policy builder, fixes both. Adding a new port type later is then one line in one place.
Design
3. `Backend string` is dead API surface. The field accepts any string, validates nothing, is read nowhere. `// TODO: add Cilium` is a fine internal comment but not a reason to ship an open-ended string on the CRD. Either drop it until CNP support actually lands, or make it `+kubebuilder:validation:Enum=kubernetes` with one value today. API additions on `networkPolicy` are hard to retract; see Strimzi proposal 028 for the kind of rationale these surfaces eventually need.
4. Prometheus is silently blocked on enable. With `enabled: true` and `monitoringPeers` nil, port 9363 has no ingress rule and scraping breaks. Default `monitoringPeers` to "open to all" (Strimzi's choice for metrics ports), or document loudly in both the field godoc and the user guide that enabling the policy without setting `monitoringPeers` will drop metrics.
Scope
No Keeper policy. CH → Keeper on `keeper.PortNative` (2181) and Raft `keeper.PortInterserver` (9234) stay open. If the intent is "ClickHouse-side only, Keeper in a follow-up", say so in the PR title/body and file an issue. Otherwise this is half a feature.
Minor
- Self-ingress rule conflates `PortInterserver` + `PortNative` into one rule. Interserver replication (9009) and intra-cluster distributed-query (9000) are different concerns; two rules document intent better.
- `clusterNamespace` peer (`kubernetes.io/metadata.name: cr.Namespace`) allows any pod in the operand namespace to hit client ports. If the intent was "operator can reach the pod", narrow to operator-labelled pods (Strimzi pattern) — the operator already talks through the headless Service.
- e2e test uses `Replicas: new(int32(1))` in two places — `new()` takes a type, not an expression. CI should catch it; flagging in case it doesn't.
- Positive e2e check (`nc -z` from a `role=allowed` pod) passes regardless of policy enforcement — kindnet doesn't enforce NetworkPolicies. Negative cases live behind `NP_ENFORCING_CNI` env with no CI job setting it. Test name `manages and enforces the cluster NetworkPolicy` overpromises — under default CI it's structural only.
What
support simple network policies