fix(gmc): support NodeLocal DNSCache for tenant DNS egress (Q136)#232
Open
karlkfi wants to merge 2 commits into
4462af3 to 7e343ab
…Cache (Q136)

Q105 (#228) confined worker/proxy/AGC port-53 egress to the cluster DNS service via a kube-dns namespaceSelector+podSelector peer. On clusters running NodeLocal DNSCache (node-local-dns), pods send DNS to a link-local address (169.254.20.10 by the kube-standard `__PILLAR__LOCAL__DNS__` convention) served by a per-node hostNetwork DNSCache pod, which no pod/namespace selector can match. On an enforcing CNI (Calico/Cilium) that DNS traffic was dropped, breaking resolution for workers AND the proxy on any NodeLocal-DNSCache cluster. This was a regression Q105 introduced.

Add a second OR'd peer to the shared dnsEgressRule() helper: an ipBlock allowing port-53 to the IPv4 link-local block 169.254.0.0/16, alongside the existing kube-dns selector. Both paths now work: a cluster without NodeLocal DNSCache still resolves directly via kube-dns. Because the helper is shared, this fixes all three per-tenant policies (workload, AGC, proxy) at once.

This preserves Q105's per-tenant attribution property: 169.254.0.0/16 is non-routable and node-scoped, so it cannot reach an arbitrary external resolver. The DNS-exfiltration side-channel Q105 closed stays closed.

kindnet does not enforce egress NetworkPolicy, so the CI guard is the authoring-level test TestBuildNetworkPolicy_DNSEgressRestrictedToKubeDNS, extended to assert each policy's port-53 rule now selects BOTH the kube-dns peer AND the 169.254.0.0/16 ipBlock.

Docs: network-architecture.md (three policy YAMLs + DNS Resolution prose), 05-security.md (DNS Exfiltration Side-Channel row notes the link-local peer and why it preserves attribution), security-operations.md (NodeLocal DNSCache now supported out of the box, replacing the Q105 known-limitation note).
7e343ab to 9949671
What
Q136. Adds an `ipBlock` peer for the IPv4 link-local block `169.254.0.0/16` to the per-tenant DNS egress rule, so DNS resolves on clusters running NodeLocal DNSCache (node-local-dns), alongside the existing kube-dns selector peer Q105 (#228) added.

The change is one peer added to the shared `dnsEgressRule()` helper in `cmd/gmc/internal/controller/builder.go`, which all three per-tenant NetworkPolicies (workload, AGC, proxy) consume, so it fixes every policy at once.

Why
Q105 (#228) confined port-53 egress to the cluster DNS service via a `kube-dns` namespaceSelector+podSelector peer. On NodeLocal DNSCache clusters, pods send DNS to a link-local address (`169.254.20.10` by the kube-standard `__PILLAR__LOCAL__DNS__` convention) served by a per-node `hostNetwork` DNSCache pod, which no pod/namespace selector matches. On an enforcing CNI (Calico/Cilium) that traffic was dropped, breaking resolution for workers and the proxy. That was a regression Q105 introduced; this change restores resolution on those clusters.

Both peers are OR'd, so a cluster without NodeLocal DNSCache still resolves directly via kube-dns. The kube-dns rule is kept, not replaced.
Security — preserves Q105's attribution property
`169.254.0.0/16` is non-routable and node-scoped, so it cannot reach an arbitrary external resolver. The DNS-exfiltration side-channel Q105 closed stays closed; the per-tenant egress-IP attribution is intact. Reasoned in a code comment and noted in `05-security.md`.

Tests
Extended the Q105 authoring guard `TestBuildNetworkPolicy_DNSEgressRestrictedToKubeDNS` to assert each policy's port-53 rule now has both peers: the kube-dns selector AND the `169.254.0.0/16` ipBlock (bare, no Except). kindnet doesn't enforce egress NetworkPolicy, so this spec-level authoring test is the CI guard (same approach as Q105/Q7b); no kind e2e needed. `make check` green.

Docs
- `docs/design/network-architecture.md`: all three policy YAMLs + DNS Resolution prose now show the two-peer rule.
- `docs/design/05-security.md`: DNS Exfiltration Side-Channel row notes the link-local peer and why it preserves attribution.
- `docs/operations/security-operations.md`: NodeLocal DNSCache is now supported out of the box (replaces the Q105 known-limitation/workaround note).

Commits
`docs/STATUS.md` (Q136 row removal) isolated per repo convention.