Skip to content

feat: support simple network policies (without cilium)#214

Draft
scanhex12 wants to merge 1 commit into
mainfrom
network_policies
Draft

feat: support simple network policies (without cilium)#214
scanhex12 wants to merge 1 commit into
mainfrom
network_policies

Conversation

@scanhex12

Copy link
Copy Markdown
Member

What

support simple network policies

@scanhex12 scanhex12 marked this pull request as draft June 3, 2026 07:48

@GrigoryPervakov GrigoryPervakov left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI-generated review (Claude, posted on Grigorii's request). Treat as triage input; the maintainer will decide what holds.

Overall shape looks correct — native-first, per-CR enable, ingress-only, per-component object, owner-ref + delete-on-disable. Four concerns before merge.

Bugs

1. TLS ports break the policy. networkpolicy.go hardcodes 9000 (PortNative) and 8123 (PortHTTP). When spec.settings.tls.enabled is true, buildProtocols in internal/controller/clickhouse/templates.go:579 drops 9000/8123 entirely and exposes 9440 (PortNativeSecure) and 8443 (PortHTTPSecure). A TLS-enabled cluster with networkPolicy.enabled: true gets a policy that opens dead ports and silently blocks the live ones.

2. spec.additionalPorts are silently blocked. This is the documented extension surface for MySQL/Postgres/gRPC and custom listeners — docs/guides/configuration.mdx:428, validated at internal/webhook/v1alpha1/clickhousecluster_webhook.go:132, wired into pods and the headless Service at internal/controller/clickhouse/templates.go:38 and :507. The PR ignores cr.Spec.AdditionalPorts in networkpolicy.go. A user with `additionalPorts: [{port: 9004, name: mysql}]` who enables `networkPolicy` loses MySQL connectivity with no warning.

Both 1 and 2 share a root cause: `networkpolicy.go` re-derives the port set independently instead of asking the same source `templates.go` does. One helper returning "all client-facing TCP ports for this CR" (TLS-aware, `additionalPorts`-aware), called from both the Service template and the policy builder, fixes both. Adding a new port type later is then one line in one place.

Design

3. `Backend string` is dead API surface. The field accepts any string, validates nothing, is read nowhere. `// TODO: add Cilium` is a fine internal comment but not a reason to ship an open-ended string on the CRD. Either drop it until CNP support actually lands, or make it `+kubebuilder:validation:Enum=kubernetes` with one value today. API additions on `networkPolicy` are hard to retract; see Strimzi proposal 028 for the kind of rationale these surfaces eventually need.

4. Prometheus is silently blocked on enable. With `enabled: true` and `monitoringPeers` nil, port 9363 has no ingress rule and scraping breaks. Default `monitoringPeers` to "open to all" (Strimzi's choice for metrics ports), or document loudly in both the field godoc and the user guide that enabling the policy without setting `monitoringPeers` will drop metrics.

Scope

No Keeper policy. CH → Keeper on `keeper.PortNative` (2181) and Raft `keeper.PortInterserver` (9234) stay open. If the intent is "ClickHouse-side only, Keeper in a follow-up", say so in the PR title/body and file an issue. Otherwise this is half a feature.

Minor

  • Self-ingress rule conflates `PortInterserver` + `PortNative` into one rule. Interserver replication (9009) and intra-cluster distributed-query (9000) are different concerns; two rules document intent better.
  • `clusterNamespace` peer (`kubernetes.io/metadata.name: cr.Namespace`) allows any pod in the operand namespace to hit client ports. If the intent was "operator can reach the pod", narrow to operator-labelled pods (Strimzi pattern) — the operator already talks through the headless Service.
  • e2e test uses `Replicas: new(int32(1))` in two places — `new()` takes a type, not an expression. CI should catch it; flagging in case it doesn't.
  • Positive e2e check (`nc -z` from a `role=allowed` pod) passes regardless of policy enforcement — kindnet doesn't enforce NetworkPolicies. Negative cases live behind `NP_ENFORCING_CNI` env with no CI job setting it. Test name `manages and enforces the cluster NetworkPolicy` overpromises — under default CI it's structural only.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants