Skip to content

feat: add --drop-cidr for VM egress isolation#4

Merged
CMGS merged 1 commit into
mainfrom
feat/vm-egress-isolation
Jun 30, 2026
Merged

feat: add --drop-cidr for VM egress isolation#4
CMGS merged 1 commit into
mainfrom
feat/vm-egress-isolation

Conversation

@tonicmuroq

@tonicmuroq tonicmuroq commented Jun 28, 2026

Copy link
Copy Markdown

What

Adds a --drop-cidr flag (repeatable, on both init and adopt) that blocks VM-originated traffic to a destination CIDR. It is persisted to pool.json and reapplied by the daemon as FORWARD DROP rules inserted at the head of the chain.

This gives cocoon VMs egress isolation: a single --drop-cidr 172.20.0.0/16 cuts all VM-to-VM (same- and cross-node, since bridge-nf-call-iptables=1 routes bridged traffic through FORWARD), and --drop-cidr 10.0.0.0/8 cuts access to internal/VPC management ranges — while internet egress and return traffic are unaffected.

How it's wired

--drop-cidr (init/adopt)
  -> flagDropCIDRs (cmd/helpers.go)
  -> pool.State.DropCIDRs  (persisted in pool.json)
  -> node.Config.DropCIDRs (daemon reads state)
  -> setupIPTables: validate + iptInsert  FORWARD -i cni0 -d <cidr> -j DROP
  • New iptInsert helper, symmetric to iptEnsure, idempotent (Exists check) and inserts at position 1 so DROP precedes the accept rules.
  • CIDRs validated with net.ParseCIDR before any rule is applied, so a bad value fails init before pool.json is written.
  • dropCIDRs is omitempty — existing pool.json without the field decodes to nil = no drops (backward compatible).

Why this is add-only (not reconciling) for now

Intentional, for consistency with the existing iptables code. setupIPTables already uses iptEnsure (append + Exists check) and never deletes — e.g. changing --subnet leaves the old MASQUERADE -s <old> rule behind until the next reboot reapplies from current config. The new DROP rules follow the same semantics.

Note the asymmetry that already exists in cocoon-net: the cloud-side alias does reconcile (assignAliasIP replaces the stale cocoon-pods entry on the node), but host iptables does not. Making the drop rules declarative in isolation would be inconsistent with the surrounding MASQUERADE/FORWARD rules; a proper fix would convert the whole setupIPTables to a managed chain (flush + rebuild covering MASQUERADE + FORWARD + drops). That is out of scope here and left as a follow-up.

Scenarios covered

  • Re-run / daemon restart: idempotent, no duplicate rules; iptables survives process restart.
  • Node reboot: daemon reapplies all rules (incl. drops) on start.
  • adopt: --manage-iptables=false (default) skips local iptables but persists DropCIDRs; the daemon enforces them. --manage-iptables=true applies immediately.
  • Not covered: removing a CIDR from config leaves a stale DROP until reboot (see above); and traffic to the node's own address (e.g. a kubelet bound on the cni0 gateway IP) is delivered via INPUT, not FORWARD, so --drop-cidr does not cover it — documented in the README.

Testing

  • make lint (GOOS=linux + GOOS=darwin): 0 issues.
  • go build ./... + go vet ./...: clean on both platforms.

@CMGS CMGS force-pushed the feat/vm-egress-isolation branch from 6ba6510 to 8cf0862 Compare June 30, 2026 08:25
@CMGS

CMGS commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Rebased onto main and reworked this into a single commit (build + lint green on linux and darwin). The core change is the interface, plus closing the same-node fail-open we discussed.

--drop-cidr <own-subnet>--drop-internal-access. cocoon-net already knows its range from --subnet, so making the operator restate it as a CIDR was redundant and easy to get wrong. --drop-internal-access (bool) derives the VM-to-VM FORWARD -i cni0 -d <subnet> -j DROP from what we already have. --drop-cidr stays, but now only for genuinely external ranges (e.g. 10.0.0.0/8 management).

Same-node isolation was silently fail-open. Same-node VMs share cni0 and are L2-switched, which bypasses iptables FORWARD unless bridge-nf-call-iptables=1. The old code never set it. I checked a live node (.52): the toggle was 1, but kubelet is inactive — Docker had set it (its DOCKER-USER/DOCKER-FORWARD chains are in FORWARD). So the value comes from unrelated software, not from us or any guaranteed prereq; on a bare cocoon-net node it'd be 0 and the headline VM-to-VM drop would quietly do nothing. So node.Setup now modprobes br_netfilter + sets the sysctl + reads it back and fails closed when a drop target is configured.

Also folded in: drop CIDRs validated + canonicalized (ipNet.String()) up front before any rule install, IPv6 rejected (rules go through the v4 binary), and iptInsert now wraps its error.

One design call left to you — rule lifecycle. It's still add-only: there's no iptables Delete/flush anywhere, and teardown doesn't touch iptables. So removing/changing a drop across daemon restarts leaves the old DROP in place, and teardown leaves stale rules on a reused host. If you want it reconcilable, tag the rules with -m comment --comment cocoon-net:drop so the daemon can prune ones no longer in pool.json and teardown can flush them. Didn't want to pick that policy for you — happy to add it whichever way you prefer.

VM-to-VM isolation no longer asks operators to restate the cocoon subnet as
a --drop-cidr: --drop-internal-access derives the FORWARD DROP target from the
subnet cocoon already knows. --drop-cidr now covers only external ranges (e.g.
management networks).

Same-node VMs share cni0 and are L2-switched, which bypasses iptables unless
bridge-nf-call-iptables=1. Node setup now loads br_netfilter and enables it
(verifying, failing closed) whenever a drop target is set, so the isolation is
never silently a no-op. Drop CIDRs are validated and canonicalized up front;
IPv6 is rejected.

The DROP rules are tagged cocoon-net-drop so teardown removes exactly them.
@CMGS CMGS force-pushed the feat/vm-egress-isolation branch from 8cf0862 to 4031547 Compare June 30, 2026 08:51
@CMGS

CMGS commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Update — CMGS okayed adding the rule lifecycle, so it's no longer add-only:

  • DROP rules are tagged -m comment --comment cocoon-net-drop.
  • teardown now calls node.ClearDropRules, which lists FORWARD and removes exactly the tagged rules, leaving a reused host clean.

One gotcha worth recording: the tag uses a hyphen, not a colon. iptables -S quotes any comment containing chars outside [-_+./0-9A-Za-z], and the quotes then break the ListDelete round-trip — confirmed on a live node. The marker also makes a daemon-side reconcile (prune rules no longer in pool.json) a small follow-up if you ever want it.

Still one commit, build + lint green on linux and darwin.

@CMGS CMGS merged commit a0bfb21 into main Jun 30, 2026
2 checks passed
@CMGS CMGS deleted the feat/vm-egress-isolation branch June 30, 2026 09:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants