fix(gmc): confine worker/proxy DNS egress to cluster DNS (Q105)#228
Merged
Conversation
All three per-tenant NetworkPolicies (workload, AGC, proxy) emitted a port-53 egress rule with no `To` peer — i.e. DNS to *any* server. That is an unattributed data-exfiltration side-channel: untrusted worker job code could smuggle data out via DNS queries to an attacker-controlled resolver, bypassing the per-tenant egress-IP attribution that all real egress (via the tenant proxy) is supposed to enforce. Restrict the DNS rule to the cluster DNS service only — kube-dns / CoreDNS in kube-system, matched by a single peer combining namespaceSelector (kubernetes.io/metadata.name=kube-system) AND podSelector (k8s-app=kube-dns). kube-dns recurses upstream, so legitimate resolution (including the proxy's own GitHub-hostname lookups) is unaffected; only the "any resolver" breadth is removed. Factored into a shared dnsEgressRule() helper so the posture cannot drift between the three policies. kindnet does not enforce egress NetworkPolicy, so the CI guard is the authoring-level test TestBuildNetworkPolicy_DNSEgressRestrictedToKubeDNS, which asserts each policy's port-53 rule selects kube-dns and is never open. Docs: network-architecture.md (three policy YAMLs + DNS Resolution prose), 05-security.md (new DNS Exfiltration Side-Channel threat row), and security-operations.md (operator caveat on the kube-dns label assumption).
Adds a "Tenant egress posture & deliberate widening" section to security-operations.md: the secure default is controller-managed and not opt-in; deliberate widening for jobs the proxy can't carry (non-HTTP, internal services, custom resolver) is done via an additive NetworkPolicy in the tenant namespace, targeting the existing actions-gateway/component=workload and actions-gateway/runner-group=<name> labels for tenant-wide or per-runner-type scope. States the secure-by-default trade-off (additive egress bypasses per-tenant proxy egress-IP attribution; keep the allowlist narrow; it requires namespace NetworkPolicy-write, so it is a platform decision), recommends a CoreDNS forward zone over reopening worker DNS, and points at the existing spec.proxy.managedNetworkPolicy=false hand-off for the proxy egress rule.
25bdc30 to
7964ca8
Compare
karlkfi
added a commit
that referenced
this pull request
Jun 15, 2026
…Cache (Q136) Q105 (#228) confined worker/proxy/AGC port-53 egress to the cluster DNS service via a kube-dns namespaceSelector+podSelector peer. On clusters running NodeLocal DNSCache (node-local-dns), pods send DNS to a link-local address (169.254.20.10 by the kube-standard `__PILLAR__LOCAL__DNS__` convention) served by a per-node hostNetwork DNSCache pod — which no pod/namespace selector can match. On an enforcing CNI (Calico/Cilium) that DNS traffic was dropped, breaking resolution for workers AND the proxy on any NodeLocal-DNSCache cluster — a regression Q105 introduced. Add a second OR'd peer to the shared dnsEgressRule() helper: an ipBlock allowing port-53 to the IPv4 link-local block 169.254.0.0/16, alongside the existing kube-dns selector. Both paths now work — a cluster without NodeLocal DNSCache still resolves directly via kube-dns. Because the helper is shared, this fixes all three per-tenant policies (workload, AGC, proxy) at once. This preserves Q105's per-tenant attribution property: 169.254.0.0/16 is non-routable and node-scoped, so it cannot reach an arbitrary external resolver — the DNS-exfiltration side-channel Q105 closed stays closed. kindnet does not enforce egress NetworkPolicy, so the CI guard is the authoring-level test TestBuildNetworkPolicy_DNSEgressRestrictedToKubeDNS, extended to assert each policy's port-53 rule now selects BOTH the kube-dns peer AND the 169.254.0.0/16 ipBlock. Docs: network-architecture.md (three policy YAMLs + DNS Resolution prose), 05-security.md (DNS Exfiltration Side-Channel row notes the link-local peer and why it preserves attribution), security-operations.md (NodeLocal DNSCache now supported out of the box, replacing the Q105 known-limitation note).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
The AGC/proxy NetworkPolicy builder emitted a port-53 (UDP/TCP) DNS egress rule with no
Topeer (to: []≡ any destination) on all three per-tenant NetworkPolicies — workload (AGC + workers), AGC, and proxy. Worker and proxy pods could therefore send DNS to any server.This change restricts the port-53 egress rule to the cluster DNS service only: kube-dns / CoreDNS in
kube-system, selected by a single peer combining anamespaceSelectoron the well-known immutablekubernetes.io/metadata.name: kube-systemlabel and apodSelectoron the conventionalk8s-app: kube-dnslabel. The shareddnsEgressRule()helper is now used by all three policies so the DNS posture cannot drift between them.Why
An open port-53 rule is an unattributed data-exfiltration side-channel. Untrusted GitHub Actions job code running in a worker pod could encode data into DNS queries aimed at an attacker-controlled authoritative server, bypassing the per-tenant egress-IP attribution that is a headline isolation property of this project — all real egress is supposed to traverse the tenant proxy, whose source IPs are attributable. An open DNS path is an unattributed bypass.
kube-dns recurses upstream on the pod's behalf, so legitimate resolution — including the proxy's own GitHub-hostname lookups — is unaffected. Only the "any resolver" breadth is removed, not resolution itself.
This is Q105 (S-sized,
securitydocs, flagged must-resolve-before-1.0). The preferred fix (restrict to kube-dns) was achievable in all supported topologies, so the documentation-only fallback was not needed.Testing
TestBuildNetworkPolicy_DNSEgressRestrictedToKubeDNS: asserts every policy's port-53 rule has a non-emptyTopeer selecting kube-dns (namespacekube-system+ podk8s-app: kube-dns) and is never open. This mirrors the existing egress-negative guard pattern (Q7b) because kindnet does not enforce egress NetworkPolicy, so a live e2e deny test is unreliable.TestBuildNetworkPolicy_DNSEgressAlwaysPresentstill passes (DNS UDP/TCP still present on all three policies).make checkgreen.Docs
docs/design/network-architecture.md— updated the three policy YAML blocks and the DNS Resolution prose.docs/design/05-security.md— new DNS Exfiltration Side-Channel threat row + updated §5.2 intro.docs/operations/security-operations.md— operator caveat: clusters running DNS under a non-standard label/namespace with an enforcing CNI must relabel or supply their own DNS egress rule.docs/operations/security-operations.md— new Tenant egress posture & deliberate widening section: the secure default is controller-managed (not opt-in); how to grant a runner group (or a whole tenant) extra egress via an additive NetworkPolicy targeting the existingactions-gateway/component=workload+actions-gateway/runner-group=<name>labels, with the secure-by-default trade-off (additive egress bypasses proxy attribution; keep it narrow; it is a platform/admin action), a CoreDNS-forward recommendation for custom DNS, and a pointer to the existingspec.proxy.managedNetworkPolicy=falsehand-off.docs/plan/release-1.0.md+docs/plan/security-audit-2026-06.md— marked Q105 resolved.docs/STATUS.md— Q105 row removed (isolated commit).Rebased onto latest main (includes #229 Q121/Q122/Q125, #230 network-doc fix, #227 Q109).