
fix(cluster): resolve DNS failures on systemd-resolved hosts #516

Open
brianwtaylor wants to merge 1 commit into NVIDIA:main from brianwtaylor:fix/cluster-dns-systemd-resolved


brianwtaylor commented Mar 21, 2026

Supersedes #478

@drew — reworked per your review: the bootstrap crate now sniffs resolvers and passes them as an UPSTREAM_DNS env var. No system files are mounted into the container.

Summary

  • Sniff upstream DNS resolvers from the Rust bootstrap crate by reading /run/systemd/resolve/resolv.conf (systemd-resolved hosts only)
  • Filter loopback addresses (127.x.x.x, ::1) and pass the result to the container as the UPSTREAM_DNS env var
  • Skip DNS sniffing for remote deploys where local resolvers would be wrong
  • Entrypoint reads UPSTREAM_DNS first, falls back to /etc/resolv.conf for manual launches
  • Add DNS verification logging on failure
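A minimal sketch of the hand-off described in the summary, assuming the bootstrap side already holds a list of filtered resolver strings and an `is_remote` flag. The helper name `upstream_dns_env` and the comma separator inside the value are hypothetical; the PR does not state how `UPSTREAM_DNS` is encoded.

```rust
/// Hypothetical helper: returns the `KEY=VALUE` entry to add to the cluster
/// container's environment, or None when UPSTREAM_DNS should stay unset.
fn upstream_dns_env(is_remote: bool, resolvers: &[String]) -> Option<String> {
    // Remote deploys: the local host's resolvers describe the wrong network.
    if is_remote {
        return None;
    }
    // Nothing usable found (e.g. not a systemd-resolved host): leave the
    // variable unset so the entrypoint takes its fallback path.
    if resolvers.is_empty() {
        return None;
    }
    // Assumption: multiple resolvers are joined with commas.
    Some(format!("UPSTREAM_DNS={}", resolvers.join(",")))
}

fn main() {
    let resolvers = vec!["192.168.1.1".to_string(), "1.1.1.1".to_string()];
    let mut env: Vec<String> = Vec::new();
    if let Some(entry) = upstream_dns_env(false, &resolvers) {
        env.push(entry);
    }
    // prints ["UPSTREAM_DNS=192.168.1.1,1.1.1.1"]
    println!("{:?}", env);
}
```

Returning `Option` rather than an empty string keeps the variable absent, not blank, whenever sniffing is skipped.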

Closes #437

Changes

crates/openshell-bootstrap/src/docker.rs — Add resolve_upstream_dns() that reads /run/systemd/resolve/resolv.conf, filters loopback addresses, and returns real upstream resolvers. Pass them as UPSTREAM_DNS env var to the cluster container (skipped for remote deploys). Includes unit tests.
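A rough sketch of the parse-and-filter step described for `resolve_upstream_dns()`. This is not the code in `docker.rs`: `parse_upstream_resolvers` and `resolve_upstream_dns_sketch` are hypothetical names, and the sketch assumes the standard `nameserver <addr>` line format from resolv.conf(5).

```rust
use std::net::IpAddr;

/// Hypothetical helper: extract upstream resolvers from resolv.conf text.
/// Keeps `nameserver` entries that parse as IP addresses and drops loopback
/// ones (IpAddr::is_loopback covers 127.0.0.0/8 and ::1).
fn parse_upstream_resolvers(resolv_conf: &str) -> Vec<IpAddr> {
    resolv_conf
        .lines()
        .filter_map(|line| {
            // Comment lines start with '#' or ';', so their first token is
            // never the bare `nameserver` keyword and they are skipped here.
            let mut fields = line.split_whitespace();
            match (fields.next(), fields.next()) {
                // Addresses that fail to parse (e.g. IPv6 with a %zone
                // suffix) are dropped rather than passed through.
                (Some("nameserver"), Some(addr)) => addr.parse::<IpAddr>().ok(),
                _ => None,
            }
        })
        .filter(|ip| !ip.is_loopback())
        .collect()
}

/// Hypothetical wrapper: read the systemd-resolved upstream file. If it is
/// absent (macOS, WSL2, Alpine, ...), return an empty list so UPSTREAM_DNS
/// is never set.
fn resolve_upstream_dns_sketch() -> Vec<IpAddr> {
    match std::fs::read_to_string("/run/systemd/resolve/resolv.conf") {
        Ok(contents) => parse_upstream_resolvers(&contents),
        Err(_) => Vec::new(),
    }
}

fn main() {
    let sample = "# This is /run/systemd/resolve/resolv.conf\n\
                  nameserver 192.168.1.1\n\
                  nameserver 127.0.0.53\n\
                  nameserver ::1\n\
                  search lan\n";
    println!("{:?}", parse_upstream_resolvers(sample));
    println!("{:?}", resolve_upstream_dns_sketch());
}
```

Returning an empty list instead of an error keeps hosts without systemd-resolved on the unchanged DNAT path.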

deploy/docker/cluster-entrypoint.sh — Add get_upstream_resolvers() that reads the UPSTREAM_DNS env var (priority) or falls back to /etc/resolv.conf. When upstream resolvers are found, write them directly to the k3s resolv.conf instead of relying on the DNAT proxy. Improve DNS verification logging on failure.
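The entrypoint's resolver priority, modeled in Rust for illustration only (the real `get_upstream_resolvers()` is a shell function). Assumptions: `UPSTREAM_DNS` holds comma- or space-separated addresses, and loopback entries are also dropped from the `/etc/resolv.conf` fallback. `pick_upstream_resolvers`, `parse_list`, and `nameservers` are hypothetical names.

```rust
use std::net::IpAddr;

/// Parse a comma- or whitespace-separated address list (assumed format of
/// UPSTREAM_DNS), keeping only non-loopback IPs. Empty tokens fail to parse
/// and are skipped.
fn parse_list(value: &str) -> Vec<IpAddr> {
    value
        .split(|c: char| c == ',' || c.is_whitespace())
        .filter_map(|s| s.parse::<IpAddr>().ok())
        .filter(|ip| !ip.is_loopback())
        .collect()
}

/// Collect non-loopback `nameserver` entries from resolv.conf text.
fn nameservers(resolv_conf: &str) -> Vec<IpAddr> {
    resolv_conf
        .lines()
        .filter_map(|line| {
            let mut fields = line.split_whitespace();
            match (fields.next(), fields.next()) {
                (Some("nameserver"), Some(addr)) => addr.parse::<IpAddr>().ok(),
                _ => None,
            }
        })
        .filter(|ip| !ip.is_loopback())
        .collect()
}

/// Hypothetical model of the entrypoint priority: UPSTREAM_DNS wins when it
/// yields at least one usable address; otherwise fall back to the
/// container's /etc/resolv.conf (manual launches).
fn pick_upstream_resolvers(upstream_dns: Option<&str>, etc_resolv_conf: &str) -> Vec<IpAddr> {
    if let Some(value) = upstream_dns {
        let from_env = parse_list(value);
        if !from_env.is_empty() {
            return from_env;
        }
    }
    nameservers(etc_resolv_conf)
}

fn main() {
    // Launched by the bootstrap crate on a systemd-resolved host.
    let from_env = pick_upstream_resolvers(Some("192.168.1.1"), "nameserver 127.0.0.11\n");
    // Manual launch without UPSTREAM_DNS.
    let manual = pick_upstream_resolvers(None, "nameserver 10.0.0.2\n");
    println!("{:?} {:?}", from_env, manual);
}
```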

deploy/docker/tests/test-dns-resolvers.sh — Shell-level tests for the entrypoint resolver logic.

Root Cause

Docker's embedded DNS at 127.0.0.11 is only reachable from the container's own network namespace. The existing DNAT rules forward to this loopback address, but k3s pods run in child network namespaces where the forwarded packets are dropped as martian packets. On systemd-resolved hosts, /etc/resolv.conf contains 127.0.0.53 (another loopback), so the fallback also fails silently.

DNS Flow — Before vs After

BEFORE (broken on systemd-resolved hosts):

  Pod → CoreDNS → resolv.conf → iptables DNAT → Docker DNS
        (cache)   127.0.0.11     PREROUTING     127.0.0.11
                        │                            │
                        └──── FAILS ─────────────────┘
                        loopback DNAT from pod namespace
                        dropped as martian packet

AFTER (this PR):

  Pod → CoreDNS → resolv.conf → upstream resolver → response
        (cache)   e.g. 192.168.1.1    (direct UDP)
                   ▲
                   │
              set by Rust bootstrap:
              resolve_upstream_dns()
              reads /run/systemd/resolve/resolv.conf
              passes via UPSTREAM_DNS env var
              entrypoint writes to k3s resolv.conf

NON-SYSTEMD HOSTS (macOS, WSL2, Alpine) — unchanged:

  Pod → CoreDNS → resolv.conf → iptables DNAT → Docker DNS → host
        (cache)   container IP   PREROUTING     127.0.0.11

  /run/systemd/resolve/resolv.conf absent → UPSTREAM_DNS not set →
  entrypoint falls back to existing DNAT proxy path. Zero behavior change.
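The before/after split reduces to one decision: write real resolvers directly when any exist, otherwise keep the DNAT proxy. A hypothetical sketch; `DnsMode` and `choose_dns_mode` are illustrative names, and the assumption that the generated k3s resolv.conf contains only `nameserver` lines is not confirmed by the PR.

```rust
use std::net::IpAddr;

#[derive(Debug, PartialEq)]
enum DnsMode {
    /// Pods' resolv.conf points straight at real upstream resolvers;
    /// the payload is the file body to write.
    Direct(String),
    /// No usable upstream: keep the existing iptables DNAT proxy path.
    DnatProxy,
}

/// Hypothetical: pick the DNS path from the already-filtered resolver list.
fn choose_dns_mode(upstream: &[IpAddr]) -> DnsMode {
    if upstream.is_empty() {
        DnsMode::DnatProxy
    } else {
        let body: String = upstream
            .iter()
            .map(|ip| format!("nameserver {}\n", ip))
            .collect();
        DnsMode::Direct(body)
    }
}

fn main() {
    // systemd-resolved host: UPSTREAM_DNS produced a real resolver.
    let linux: Vec<IpAddr> = vec!["192.168.1.1".parse().unwrap()];
    // macOS / WSL2 / Alpine: nothing sniffed, so the proxy path is kept.
    let other: Vec<IpAddr> = Vec::new();
    println!("{:?}", choose_dns_mode(&linux));
    println!("{:?}", choose_dns_mode(&other));
}
```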

Testing

  • Tested on DGX Spark (Ubuntu 24.04, systemd-resolved, Docker with cgroupns=host)
  • Verified DNS resolution works from k3s pods after the fix
  • Verified no behavior change on macOS (Apple Silicon) and Windows/WSL2 hosts

===VALIDATION TOPOLOGY===

              ┌─────────────────────────┐
              │     Node A (Linux)      │
              │    aarch64 · GPU        │
              │                         │
              │ BASELINE + ORCHESTRATOR │
              │  (read-only, runs all   │
              │   tests from here)      │
              └─────┬──────────┬────────┘
                    │          │
      high-speed    │          │ LAN
      interconnect  │          │
                    │          ├──────────────┐
          ┌─────────▼──┐   ┌───▼──────────┐  ┌▼──────────────┐
          │  Node B    │   │  Node C      │  │  Node D       │
          │  Linux     │   │  macOS       │  │  Windows/WSL2 │
          │  aarch64   │   │  Apple Si    │  │  x86_64       │
          │            │   │  no systemd  │  │  no systemd   │
          │ TEST TARGET│   │              │  │               │
          │ (DNS-fixed │   │  CONTROL     │  │  CONTROL      │
          │  gateway   │   │  ✓ verified  │  │  ✓ verified   │
          │  deployed) │   │              │  │               │
          └────────────┘   └──────────────┘  └───────────────┘

===WHAT EACH NODE PROVED DURING VALIDATION===

Node A ─── "Does the fix break anything that already works?"
Captured baseline iptables, TLS certs, and DNS state.
All comparisons showed zero drift.

Node B ─── "Does the new code handle edge-case input safely?"

Node C ─── "Does the fix affect macOS hosts?"
No systemd-resolved → no UPSTREAM_DNS set → no change.
Existing DNAT proxy path untouched.

Node D ─── "Does the fix affect Windows/WSL2 hosts?"
No systemd-resolved → no UPSTREAM_DNS set → no change.
Existing DNAT proxy path untouched.

Automated Tests

cargo test -p openshell-bootstrap

Checklist

Docker's embedded DNS at 127.0.0.11 is only reachable from the
container's own network namespace. k3s pods in child namespaces
cannot reach it, causing silent DNS failures on Ubuntu and other
systemd-resolved hosts where /etc/resolv.conf contains 127.0.0.53.

Sniff upstream DNS resolvers from the host in the Rust bootstrap
crate by reading /run/systemd/resolve/resolv.conf (systemd-resolved
only — intentionally does NOT read /etc/resolv.conf to avoid
bypassing Docker Desktop's DNAT proxy on macOS/Windows). Filter
loopback addresses (127.x.x.x and ::1) and pass the result to
the container as the UPSTREAM_DNS env var. Skip DNS sniffing for
remote deploys where the local host's resolvers would be wrong.

The entrypoint checks UPSTREAM_DNS first, falling back to
/etc/resolv.conf inside the container for manual launches. This
follows the existing pattern used by registry config, SSH gateway,
GPU support, and image tags.

Closes NVIDIA#437

Signed-off-by: Brian Taylor <brian.taylor818@gmail.com>
brianwtaylor requested a review from a team as a code owner March 21, 2026 00:35

Development

Successfully merging this pull request may close these issues.

DNS proxy in cluster-entrypoint.sh fails silently on Linux with systemd-resolved
