
Gateway fails to start on Jetson Orin / L4T: kube-proxy exits with iptables "Could not fetch rule set generation id" when host uses iptables-legacy #467

@tangc3022-hub


Agent Diagnostic

  • Investigated gateway startup failure on Jetson Orin (tegra-ubuntu, L4T).
  • Context discovered: On Orin, veth is missing or unusable, so Docker cannot bring up the gateway with default bridge/NAT. This repo uses host network (network_mode: host) for the gateway to work around that. After using host network, the iptables incompatibility appears.
  • Root cause: Host kernel uses iptables-legacy tables (L4T / Ubuntu 20.04). The gateway cluster image (Ubuntu 24.04–based) ships iptables-nft as default. kube-proxy inside the container runs in the same kernel (host network) and invokes the container’s iptables (nft); it cannot read the host’s legacy nat table → "Could not fetch rule set generation id: Invalid argument", kube-proxy exits, k3s shuts down.
  • Skills / checks performed:
    • Confirmed failure pattern in container logs: kube-proxy exited: iptables is not available on this host, # Warning: iptables-legacy tables present, use iptables-legacy to see them, iptables v1.8.10 (nf_tables): Could not fetch rule set generation id.
    • Verified on host: update-alternatives --display iptables shows link currently points to /usr/sbin/iptables-legacy.
    • Tried switching host to iptables-nft to align with container: Docker then fails when creating the gateway network with the same iptables error (Docker daemon calls iptables for NAT; kernel nat table remains legacy on L4T, so nft still fails). Conclusion: on L4T the host must stay on iptables-legacy; only the container should use legacy.
  • Mitigations implemented in our fork: (1) Add --prefer-bundled-bin to k3s server args in CLI so k3s uses bundled iptables. (2) In cluster image entrypoint, before exec k3s, run update-alternatives --set iptables /usr/sbin/iptables-legacy (and ip6tables) when available. (3) Add iptables-specific failure diagnosis so the error is not misreported as "Network connectivity issue".
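Mitigation (3) above can be sketched as a small log classifier. This is illustrative only (the function name and the fallback message are not actual OpenShell CLI code); the matched log strings are the ones quoted in this report:

```shell
# Hypothetical sketch of mitigation (3): classify the gateway container's log
# text so an iptables failure is not misreported as a network problem.
# classify_gateway_failure is an illustrative name, not real OpenShell code.
classify_gateway_failure() {
  # $1: captured container log text
  case "$1" in
    *"Could not fetch rule set generation id"*|*"iptables-legacy tables present"*)
      echo "kube-proxy failed: iptables incompatible with host (e.g. Jetson L4T)" ;;
    *)
      echo "unclassified failure" ;;
  esac
}
```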

Description

What happened:
On Jetson Orin (tegra-ubuntu, L4T), openshell gateway start --name nemoclaw --port 30051 fails: the gateway container starts and then exits. Container logs show kube-proxy exiting with iptables is not available on this host : error listing chain "POSTROUTING" in table "nat": exit status 4, # Warning: iptables-legacy tables present, use iptables-legacy to see them, and iptables v1.8.10 (nf_tables): Could not fetch rule set generation id: Invalid argument. The CLI then reports "K8s namespace not ready" and "kube-proxy failed: iptables incompatible with host (e.g. Jetson L4T)".

What we expected:
Gateway should start and remain running on Jetson L4T when using host network (as required there due to missing veth / Docker NAT limitations). kube-proxy should either use a compatible iptables backend or the default cluster image should switch to iptables-legacy inside the container on such hosts.

Why it matters:
Jetson Orin / L4T is a supported deployment target; many users need to run the gateway with host network. Without a fix in upstream OpenShell (e.g. entrypoint switching to iptables-legacy when present, or --prefer-bundled-bin for k3s), users must rebuild the cluster image or the CLI from a fork.
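The entrypoint change described above could look roughly like the helper below. This is a minimal sketch under the assumptions quoted in this report (Ubuntu alternative paths under /usr/sbin), not the actual OpenShell entrypoint; the function name is illustrative, and the helper takes a directory parameter so it can be exercised without touching the real system:

```shell
# Hypothetical helper for the entrypoint fix: pick the iptables binary the
# container should register as its alternative, preferring the legacy binary
# when it exists so kube-proxy (sharing the host kernel via network_mode: host)
# can read the host's legacy nat table.
pick_iptables() {
  # $1: directory holding the iptables binaries (normally /usr/sbin)
  if [ -x "$1/iptables-legacy" ]; then
    printf '%s\n' "$1/iptables-legacy"
  else
    printf '%s\n' "$1/iptables"
  fi
}

# In the entrypoint itself (not executed here), roughly:
#   update-alternatives --set iptables "$(pick_iptables /usr/sbin)"
#   exec k3s server --prefer-bundled-bin "$@"   # mitigation (1)
```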

Reproduction Steps

  1. Set up a Jetson Orin (or similar L4T device) with Docker, e.g. nvidia@tegra-ubuntu.
  2. Ensure host uses iptables-legacy (default on L4T):
    sudo update-alternatives --display iptables → link currently points to /usr/sbin/iptables-legacy.
  3. Install OpenShell CLI (e.g. from offline bundle or build).
  4. Run:
    openshell gateway start --name nemoclaw --port 30051
  5. Observe: "✓ Checking Docker", "✓ Downloading gateway", "x Initializing environment", then "Gateway failed: nemoclaw" with message "kube-proxy failed: iptables incompatible with host (e.g. Jetson L4T)" and container logs containing the iptables-legacy / nf_tables error above.
  6. (Optional) If host is switched to iptables-nft, reproduce Docker network failure: "failed to create Docker network" with "Failed to Setup IP tables ... Could not fetch rule set generation id", so host must remain on legacy.
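Step 2's check can be automated. The sketch below parses the "link currently points to ..." line from update-alternatives --display iptables output, as quoted above; the function name is illustrative, and it takes the command output as an argument so it is easy to exercise offline:

```shell
# Hypothetical check for step 2: report which backend the host's iptables
# alternative points at, given the text of `update-alternatives --display iptables`.
host_iptables_backend() {
  # $1: output of `update-alternatives --display iptables`
  case "$(printf '%s\n' "$1" | sed -n 's/.*link currently points to //p')" in
    *legacy*) echo legacy ;;
    *nft*)    echo nft ;;
    *)        echo unknown ;;
  esac
}
```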

Environment

  • OS: Jetson Orin, tegra-ubuntu (L4T; Ubuntu 20.04–based)
  • Docker: Docker Engine on L4T (version as reported by docker version on device)
  • OpenShell: v0.x.x (output of openshell --version from the build or offline bundle used)
  • Network: Gateway configured to use host network (port 30051) due to missing veth / Docker NAT limitations on this platform.
  • iptables (host): update-alternatives → iptables currently points to /usr/sbin/iptables-legacy; kernel nat table is legacy. Container image (cluster) provides iptables-nft by default.
  • Relevant logs: Container stderr shows kube-proxy exit with iptables v1.8.10 (nf_tables): Could not fetch rule set generation id: Invalid argument and "iptables-legacy tables present".
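When attaching logs like the ones below, the relevant lines can be pulled out with a small filter. A sketch (the function name is illustrative; the pattern just matches the strings quoted in this report):

```shell
# Hypothetical filter: keep only the iptables-related lines from a log stream,
# e.g. `docker logs nemoclaw 2>&1 | filter_iptables_lines`.
filter_iptables_lines() {
  grep -E 'iptables|rule set generation id'
}
```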

Logs

nvidia@tegra-ubuntu:~/nemoclaw-offline-20260319$ ~/.local/bin/openshell gateway start --name nemoclaw --port 30051
✓ Checking Docker
✓ Downloading gateway
x Initializing environment
x Gateway failed: nemoclaw

Network connectivity issue

  Could not reach the container registry. This could be a DNS resolution failure, firewall blocking the connection, or general internet connectivity issue.

  To fix:

  1. Check your internet connection

  2. Test DNS resolution

     nslookup ghcr.io

  3. Test registry connectivity

     curl -I https://ghcr.io/v2/

  4. If behind a corporate firewall/proxy, ensure Docker is configured to use it

  5. Restart Docker and try again

Error:   × K8s namespace not ready
  ╰─▶ gateway container is not running while waiting for namespace 'openshell': container exited (status=EXITED, exit_code=0)

Agent-First Checklist

  • I pointed my agent at the repo and had it investigate this issue
  • I loaded relevant skills (e.g., debug-openshell-cluster, debug-inference, openshell-cli)
  • My agent could not resolve this — the diagnostic above explains why
