Skip to content

fix(gmc): confine worker/proxy DNS egress to cluster DNS (Q105)#228

Merged
karlkfi merged 6 commits into
mainfrom
claude/determined-maxwell-db62f4
Jun 14, 2026
Merged

fix(gmc): confine worker/proxy DNS egress to cluster DNS (Q105)#228
karlkfi merged 6 commits into
mainfrom
claude/determined-maxwell-db62f4

Conversation

@karlkfi

@karlkfi karlkfi commented Jun 14, 2026

Copy link
Copy Markdown
Collaborator

What

The AGC/proxy NetworkPolicy builder emitted a port-53 (UDP/TCP) DNS egress rule with no To peer (to: [] ≡ any destination) on all three per-tenant NetworkPolicies — workload (AGC + workers), AGC, and proxy. Worker and proxy pods could therefore send DNS to any server.

This change restricts the port-53 egress rule to the cluster DNS service only: kube-dns / CoreDNS in kube-system, selected by a single peer combining a namespaceSelector on the well-known immutable kubernetes.io/metadata.name: kube-system label and a podSelector on the conventional k8s-app: kube-dns label. The shared dnsEgressRule() helper is now used by all three policies so the DNS posture cannot drift between them.

Why

An open port-53 rule is an unattributed data-exfiltration side-channel. Untrusted GitHub Actions job code running in a worker pod could encode data into DNS queries aimed at an attacker-controlled authoritative server, bypassing the per-tenant egress-IP attribution that is a headline isolation property of this project — all real egress is supposed to traverse the tenant proxy, whose source IPs are attributable. An open DNS path is an unattributed bypass.

kube-dns recurses upstream on the pod's behalf, so legitimate resolution — including the proxy's own GitHub-hostname lookups — is unaffected. Only the "any resolver" breadth is removed, not resolution itself.

This is Q105 (S-sized, security docs, flagged must-resolve-before-1.0). The preferred fix (restrict to kube-dns) was achievable in all supported topologies, so the documentation-only fallback was not needed.

Testing

  • Added authoring/spec-level guard TestBuildNetworkPolicy_DNSEgressRestrictedToKubeDNS: asserts every policy's port-53 rule has a non-empty To peer selecting kube-dns (namespace kube-system + pod k8s-app: kube-dns) and is never open. This mirrors the existing egress-negative guard pattern (Q7b) because kindnet does not enforce egress NetworkPolicy, so a live e2e deny test is unreliable.
  • The existing TestBuildNetworkPolicy_DNSEgressAlwaysPresent still passes (DNS UDP/TCP still present on all three policies).
  • make check green.

Docs

  • docs/design/network-architecture.md — updated the three policy YAML blocks and the DNS Resolution prose.
  • docs/design/05-security.md — new DNS Exfiltration Side-Channel threat row + updated §5.2 intro.
  • docs/operations/security-operations.md — operator caveat: clusters running DNS under a non-standard label/namespace with an enforcing CNI must relabel or supply their own DNS egress rule.
  • docs/operations/security-operations.md — new Tenant egress posture & deliberate widening section: the secure default is controller-managed (not opt-in); how to grant a runner group (or a whole tenant) extra egress via an additive NetworkPolicy targeting the existing actions-gateway/component=workload + actions-gateway/runner-group=<name> labels, with the secure-by-default trade-off (additive egress bypasses proxy attribution; keep it narrow; it is a platform/admin action), a CoreDNS-forward recommendation for custom DNS, and a pointer to the existing spec.proxy.managedNetworkPolicy=false hand-off.
  • docs/plan/release-1.0.md + docs/plan/security-audit-2026-06.md — marked Q105 resolved.
  • docs/STATUS.md — Q105 row removed (isolated commit).

Rebased onto latest main (includes #229 Q121/Q122/Q125, #230 network-doc fix, #227 Q109).

karlkfi added 3 commits June 14, 2026 14:09
All three per-tenant NetworkPolicies (workload, AGC, proxy) emitted a
port-53 egress rule with no `To` peer — i.e. DNS to *any* server. That is
an unattributed data-exfiltration side-channel: untrusted worker job code
could smuggle data out via DNS queries to an attacker-controlled resolver,
bypassing the per-tenant egress-IP attribution that all real egress (via
the tenant proxy) is supposed to enforce.

Restrict the DNS rule to the cluster DNS service only — kube-dns / CoreDNS
in kube-system, matched by a single peer combining namespaceSelector
(kubernetes.io/metadata.name=kube-system) AND podSelector (k8s-app=kube-dns).
kube-dns recurses upstream, so legitimate resolution (including the proxy's
own GitHub-hostname lookups) is unaffected; only the "any resolver" breadth
is removed. Factored into a shared dnsEgressRule() helper so the posture
cannot drift between the three policies.

kindnet does not enforce egress NetworkPolicy, so the CI guard is the
authoring-level test TestBuildNetworkPolicy_DNSEgressRestrictedToKubeDNS,
which asserts each policy's port-53 rule selects kube-dns and is never open.

Docs: network-architecture.md (three policy YAMLs + DNS Resolution prose),
05-security.md (new DNS Exfiltration Side-Channel threat row), and
security-operations.md (operator caveat on the kube-dns label assumption).
Adds a "Tenant egress posture & deliberate widening" section to
security-operations.md: the secure default is controller-managed and not
opt-in; deliberate widening for jobs the proxy can't carry (non-HTTP, internal
services, custom resolver) is done via an additive NetworkPolicy in the tenant
namespace, targeting the existing actions-gateway/component=workload and
actions-gateway/runner-group=<name> labels for tenant-wide or per-runner-type
scope. States the secure-by-default trade-off (additive egress bypasses
per-tenant proxy egress-IP attribution; keep the allowlist narrow; it requires
namespace NetworkPolicy-write, so it is a platform decision), recommends a
CoreDNS forward zone over reopening worker DNS, and points at the existing
spec.proxy.managedNetworkPolicy=false hand-off for the proxy egress rule.
@karlkfi karlkfi force-pushed the claude/determined-maxwell-db62f4 branch from 25bdc30 to 7964ca8 Compare June 14, 2026 21:23
@karlkfi karlkfi merged commit e59dc52 into main Jun 14, 2026
24 checks passed
@karlkfi karlkfi deleted the claude/determined-maxwell-db62f4 branch June 14, 2026 22:17
karlkfi added a commit that referenced this pull request Jun 15, 2026
…Cache (Q136)

Q105 (#228) confined worker/proxy/AGC port-53 egress to the cluster DNS
service via a kube-dns namespaceSelector+podSelector peer. On clusters
running NodeLocal DNSCache (node-local-dns), pods send DNS to a
link-local address (169.254.20.10 by the kube-standard
`__PILLAR__LOCAL__DNS__` convention) served by a per-node hostNetwork
DNSCache pod — which no pod/namespace selector can match. On an enforcing
CNI (Calico/Cilium) that DNS traffic was dropped, breaking resolution for
workers AND the proxy on any NodeLocal-DNSCache cluster — a regression
Q105 introduced.

Add a second OR'd peer to the shared dnsEgressRule() helper: an ipBlock
allowing port-53 to the IPv4 link-local block 169.254.0.0/16, alongside
the existing kube-dns selector. Both paths now work — a cluster without
NodeLocal DNSCache still resolves directly via kube-dns. Because the
helper is shared, this fixes all three per-tenant policies (workload,
AGC, proxy) at once.

This preserves Q105's per-tenant attribution property: 169.254.0.0/16 is
non-routable and node-scoped, so it cannot reach an arbitrary external
resolver — the DNS-exfiltration side-channel Q105 closed stays closed.

kindnet does not enforce egress NetworkPolicy, so the CI guard is the
authoring-level test TestBuildNetworkPolicy_DNSEgressRestrictedToKubeDNS,
extended to assert each policy's port-53 rule now selects BOTH the
kube-dns peer AND the 169.254.0.0/16 ipBlock.

Docs: network-architecture.md (three policy YAMLs + DNS Resolution
prose), 05-security.md (DNS Exfiltration Side-Channel row notes the
link-local peer and why it preserves attribution), security-operations.md
(NodeLocal DNSCache now supported out of the box, replacing the Q105
known-limitation note).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant