
fix(gmc): support NodeLocal DNSCache for tenant DNS egress (Q136)#232

Open
karlkfi wants to merge 2 commits into main from claude/kind-lewin-3fbb9b



karlkfi commented Jun 15, 2026

Collaborator

What

Q136. Adds an ipBlock peer for the IPv4 link-local block 169.254.0.0/16 to the per-tenant DNS egress rule, so that DNS resolves on clusters running NodeLocal DNSCache (node-local-dns). The new peer sits alongside the existing kube-dns selector peer that Q105 (#228) added.

The change is one peer added to the shared dnsEgressRule() helper in cmd/gmc/internal/controller/builder.go, which all three per-tenant NetworkPolicies (workload, AGC, proxy) consume — so it fixes every policy at once.
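A minimal sketch of what a two-peer dnsEgressRule() could look like. So that it builds with only the Go standard library, it uses local stand-in types that mirror the field names of k8s.io/api/networking/v1 (NetworkPolicyEgressRule, NetworkPolicyPeer, IPBlock); it is not the actual helper from builder.go. The kube-dns labels (kubernetes.io/metadata.name: kube-system, k8s-app: kube-dns) and the UDP+TCP port list are assumptions.

```go
package main

import "fmt"

// Hypothetical stand-ins mirroring k8s.io/api/networking/v1 field names.
// The real code uses the upstream types (e.g. Port is an *intstr.IntOrString).
type LabelSelector struct {
	MatchLabels map[string]string
}

type IPBlock struct {
	CIDR   string
	Except []string
}

type NetworkPolicyPeer struct {
	NamespaceSelector *LabelSelector
	PodSelector       *LabelSelector
	IPBlock           *IPBlock
}

type NetworkPolicyPort struct {
	Protocol string
	Port     int
}

type NetworkPolicyEgressRule struct {
	Ports []NetworkPolicyPort
	To    []NetworkPolicyPeer
}

// linkLocalCIDR is the IPv4 link-local block NodeLocal DNSCache listens in.
const linkLocalCIDR = "169.254.0.0/16"

// dnsEgressRule sketches the two-peer rule: port 53 to kube-dns pods OR to
// any link-local address. Label values are assumptions, not copied from builder.go.
func dnsEgressRule() NetworkPolicyEgressRule {
	return NetworkPolicyEgressRule{
		Ports: []NetworkPolicyPort{
			{Protocol: "UDP", Port: 53},
			{Protocol: "TCP", Port: 53},
		},
		To: []NetworkPolicyPeer{
			{
				// Peer 1 (Q105): kube-dns pods in kube-system.
				NamespaceSelector: &LabelSelector{MatchLabels: map[string]string{
					"kubernetes.io/metadata.name": "kube-system",
				}},
				PodSelector: &LabelSelector{MatchLabels: map[string]string{
					"k8s-app": "kube-dns",
				}},
			},
			{
				// Peer 2 (Q136): bare ipBlock with no Except, for node-local-dns.
				IPBlock: &IPBlock{CIDR: linkLocalCIDR},
			},
		},
	}
}

func main() {
	rule := dnsEgressRule()
	fmt.Println(len(rule.To), rule.To[1].IPBlock.CIDR)
}
```

Because the workload, AGC, and proxy policies all call the same helper, adding the peer in one place changes all three.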

Why

Q105 (#228) confined port-53 egress to the cluster DNS service via a kube-dns namespaceSelector+podSelector peer. On NodeLocal DNSCache clusters, pods send DNS queries to a link-local address (169.254.20.10, the conventional value for the `__PILLAR__LOCAL__DNS__` placeholder in the node-local-dns manifest), served by a per-node hostNetwork DNSCache pod that no pod or namespace selector matches. On an enforcing CNI (Calico/Cilium) that traffic was dropped, breaking resolution for workers and the proxy. Q105 introduced that regression; this PR fixes it.

Both peers are OR'd, so a cluster without NodeLocal DNSCache still resolves directly via kube-dns. The kube-dns rule is kept, not replaced.

Security — preserves Q105's attribution property

169.254.0.0/16 is non-routable and node-scoped, so traffic permitted by this peer cannot reach an arbitrary external resolver. The DNS-exfiltration side-channel Q105 closed stays closed, and per-tenant egress-IP attribution is intact. The reasoning is recorded in a code comment and in 05-security.md.

Tests

Extended the Q105 authoring guard TestBuildNetworkPolicy_DNSEgressRestrictedToKubeDNS to assert each policy's port-53 rule now has both peers: the kube-dns selector AND the 169.254.0.0/16 ipBlock (bare, no Except). kindnet doesn't enforce egress NetworkPolicy, so this spec-level authoring test is the CI guard, the same approach as Q105/Q7b; no kind e2e test is needed.
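A stdlib-only sketch of the kind of assertion the extended guard makes. checkDNSRule, EgressRule, and Peer are hypothetical stand-ins, not the real test's code: the check requires the port-53 rule to carry both the kube-dns selector peer and a bare 169.254.0.0/16 ipBlock, so a Q105-only rule, or one whose ipBlock has an Except, fails.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the networking/v1 fields the guard inspects.
type IPBlock struct {
	CIDR   string
	Except []string
}

type Peer struct {
	PodSelector map[string]string // stand-in for a LabelSelector's MatchLabels
	IPBlock     *IPBlock
}

type EgressRule struct {
	Ports []int32
	To    []Peer
}

// checkDNSRule finds the first port-53 rule and requires BOTH peers on it.
func checkDNSRule(rules []EgressRule) error {
	for _, r := range rules {
		if !hasPort(r.Ports, 53) {
			continue
		}
		var kubeDNS, linkLocal bool
		for _, p := range r.To {
			if p.PodSelector["k8s-app"] == "kube-dns" {
				kubeDNS = true
			}
			if p.IPBlock != nil && p.IPBlock.CIDR == "169.254.0.0/16" && len(p.IPBlock.Except) == 0 {
				linkLocal = true
			}
		}
		if !kubeDNS {
			return errors.New("port-53 rule missing kube-dns selector peer")
		}
		if !linkLocal {
			return errors.New("port-53 rule missing bare 169.254.0.0/16 ipBlock peer")
		}
		return nil
	}
	return errors.New("no port-53 egress rule found")
}

func hasPort(ports []int32, want int32) bool {
	for _, p := range ports {
		if p == want {
			return true
		}
	}
	return false
}

func main() {
	twoPeer := []EgressRule{{Ports: []int32{53}, To: []Peer{
		{PodSelector: map[string]string{"k8s-app": "kube-dns"}},
		{IPBlock: &IPBlock{CIDR: "169.254.0.0/16"}},
	}}}
	q105Only := []EgressRule{{Ports: []int32{53}, To: []Peer{
		{PodSelector: map[string]string{"k8s-app": "kube-dns"}},
	}}}
	fmt.Println(checkDNSRule(twoPeer))  // <nil>
	fmt.Println(checkDNSRule(q105Only)) // port-53 rule missing bare 169.254.0.0/16 ipBlock peer
}
```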

make check green.

Docs

  • docs/design/network-architecture.md — all three policy YAMLs + DNS Resolution prose now show the two-peer rule.
  • docs/design/05-security.md — DNS Exfiltration Side-Channel row notes the link-local peer and why it preserves attribution.
  • docs/operations/security-operations.md — NodeLocal DNSCache is now supported out of the box (replaces the Q105 known-limitation/workaround note).

Commits

  • Code and design/ops docs are in one commit; the docs/STATUS.md change (Q136 row removal) is isolated in a second commit per repo convention.

karlkfi force-pushed the claude/kind-lewin-3fbb9b branch from 4462af3 to 7e343ab June 15, 2026 04:20
karlkfi added 2 commits June 14, 2026 21:33
…Cache (Q136)

Q105 (#228) confined worker/proxy/AGC port-53 egress to the cluster DNS
service via a kube-dns namespaceSelector+podSelector peer. On clusters
running NodeLocal DNSCache (node-local-dns), pods send DNS to a
link-local address (169.254.20.10, the conventional value for the
`__PILLAR__LOCAL__DNS__` placeholder) served by a per-node hostNetwork
DNSCache pod — which no pod/namespace selector can match. On an enforcing
CNI (Calico/Cilium) that DNS traffic was dropped, breaking resolution for
workers AND the proxy on any NodeLocal-DNSCache cluster — a regression
Q105 introduced.

Add a second OR'd peer to the shared dnsEgressRule() helper: an ipBlock
allowing port-53 to the IPv4 link-local block 169.254.0.0/16, alongside
the existing kube-dns selector. Both paths now work — a cluster without
NodeLocal DNSCache still resolves directly via kube-dns. Because the
helper is shared, this fixes all three per-tenant policies (workload,
AGC, proxy) at once.

This preserves Q105's per-tenant attribution property: 169.254.0.0/16 is
non-routable and node-scoped, so it cannot reach an arbitrary external
resolver — the DNS-exfiltration side-channel Q105 closed stays closed.

kindnet does not enforce egress NetworkPolicy, so the CI guard is the
authoring-level test TestBuildNetworkPolicy_DNSEgressRestrictedToKubeDNS,
extended to assert each policy's port-53 rule now selects BOTH the
kube-dns peer AND the 169.254.0.0/16 ipBlock.

Docs: network-architecture.md (three policy YAMLs + DNS Resolution
prose), 05-security.md (DNS Exfiltration Side-Channel row notes the
link-local peer and why it preserves attribution), security-operations.md
(NodeLocal DNSCache now supported out of the box, replacing the Q105
known-limitation note).
karlkfi force-pushed the claude/kind-lewin-3fbb9b branch from 7e343ab to 9949671 June 15, 2026 04:34