Describe the Bug
Summary
When a Newt site registers with an endpoint IP that falls inside a subnet the CLI also routes through the WireGuard tunnel, the holepunch monitor's test packets are routed through the tunnel itself (via the relay), receive responses, and are misclassified as successful direct holepunches. The CLI then switches the peer to direct mode, the data plane fails because the registered endpoint isn't reachable over the underlying network, and the connection cycles between relay and broken-direct indefinitely.
The --holepunch=false flag is documented but does not stop this behavior — only the initial rapid startup test honors the flag, while the ongoing holepunch monitor continues to run and trigger the cycle.
Topology
- Two Newt sites configured on the Pangolin server:
  - Site 3, registered with the office WAN IP: works fine, stays on relay, no flap
  - Site 6, registered with a LAN IP (10.1.10.16:63047): flaps continuously
- Site 6's LAN IP got registered automatically because the Newt host shares a LAN with the Pangolin server, so gerbil saw the registration arrive from the LAN-side source IP. No explicit endpoint advertisement was set on Newt; this is default-configuration behavior.
- The CLI is running on a roaming client on a completely separate network (10.0.120.0/24) with no underlying-network path to either the Pangolin server's WAN IP or to 10.1.10.16.
- The CLI installs a route for 10.1.0.0/16 via the pangolin interface, which covers the misregistered Site 6 endpoint.
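The overlap between the registered endpoint and the tunnel route can be checked mechanically. The following is a minimal sketch using Python's standard `ipaddress` module; the function name `covering_routes` is hypothetical and is not part of the Pangolin CLI, and the parsing assumes IPv4 `ip:port` endpoints like the one in this report.

```python
import ipaddress


def covering_routes(endpoint: str, tunnel_routes: list[str]) -> list[str]:
    """Return the tunnel routes (CIDR strings) that contain the endpoint's IP.

    `endpoint` is an "ip:port" string as shown in this report (IPv4 only here).
    A non-empty result means traffic to the endpoint would enter the tunnel.
    """
    host, _, _port = endpoint.rpartition(":")
    addr = ipaddress.ip_address(host)
    return [
        route
        for route in tunnel_routes
        if addr in ipaddress.ip_network(route, strict=False)
    ]


if __name__ == "__main__":
    # Values taken from the topology above.
    print(covering_routes("10.1.10.16:63047", ["10.1.0.0/16"]))
```

For Site 6 the result is non-empty, which is the condition under which a probe to the endpoint cannot tell a direct path from a tunneled one.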
Environment
- Pangolin CLI version: 0.6.2 (latest at time of filing)
- OS: Ubuntu 24.04 LTS, x86_64 (Surface Pro 9, linux-surface kernel)
- Pangolin server version: 1.17.0
- Newt version on peer site: 1.11.0, native binary running as systemd service, no explicit endpoint flag (default args: --id <redacted> --secret <redacted> --endpoint https://pangolin.example.com)
To Reproduce
- Configure a Newt site such that its registered endpoint is a private IP. This happens automatically when the Newt host shares a LAN with the Pangolin server.
- Configure a resource that pushes a route covering that private IP through the tunnel (e.g. 10.1.0.0/16).
- From a network with no underlying path to either the Pangolin server's WAN or the registered LAN IP, run pangolin up (or pangolin up --holepunch=false).
- Run pangolin logs client --follow and observe the cycle.
Expected Behavior
Expected behavior
Holepunch test fails because there is no underlying-network path to the endpoint. Peer stays on relay. Connection is stable.
Additionally, --holepunch=false should disable holepunch attempts entirely, including the ongoing monitor.
Actual behavior
The startup rapid-test correctly fails (it tests via the underlying socket before the tunnel is up). But once the tunnel is up and the relay is working, the holepunch connection monitor sends test packets to the registered endpoint via the standard kernel routing table. Those packets match the 10.1.0.0/16 tunnel route, traverse the tunnel via the relay, reach the legitimate Newt peer (which actually is at 10.1.10.16 on its own LAN), and get a response. The monitor reports "CONNECTED" with a relay-roundtrip RTT and switches the peer to direct mode.
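To illustrate why the probe ends up in the tunnel, here is a rough sketch of longest-prefix route selection and of a guard that refuses to count a tunneled probe as a direct holepunch. This is not the CLI's actual code: the function names, the `wlan0` interface, and the default route are assumptions made for the example. Only the 10.0.120.0/24 client network, the 10.1.0.0/16 tunnel route, and the pangolin interface come from this report.

```python
import ipaddress
from typing import Optional


def egress_interface(dest_ip: str, routes: list[tuple[str, str]]) -> Optional[str]:
    """Pick the interface by longest-prefix match over (cidr, interface) pairs."""
    addr = ipaddress.ip_address(dest_ip)
    best = None  # (network, interface) of the most specific match so far
    for cidr, iface in routes:
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best[1] if best else None


def probe_counts_as_direct(
    dest_ip: str, routes: list[tuple[str, str]], tunnel_iface: str = "pangolin"
) -> bool:
    """A reply only proves a direct path if the probe left via a non-tunnel interface."""
    iface = egress_interface(dest_ip, routes)
    return iface is not None and iface != tunnel_iface


if __name__ == "__main__":
    # Hypothetical routing table for the roaming client.
    table = [
        ("0.0.0.0/0", "wlan0"),        # assumed default route
        ("10.0.120.0/24", "wlan0"),    # client's local network
        ("10.1.0.0/16", "pangolin"),   # route installed by the CLI
    ]
    print(egress_interface("10.1.10.16", table))
    print(probe_counts_as_direct("10.1.10.16", table))
```

With this table the probe to 10.1.10.16 selects the pangolin interface, so a guard like this would keep the peer on relay regardless of whether a reply arrives.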
The data plane then fails because the WireGuard endpoint can't be reached over the underlying network. After ~5 seconds of timeout, the CLI falls back to relay. The monitor immediately succeeds again (same recursive path), switches back to direct, fails again. The cycle repeats indefinitely every 15-30 seconds.
This happens whether --holepunch is true or false.
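The relay/direct cycle described above can be modeled as a small state machine. The sketch below is a toy simulation, not a reconstruction of the CLI's internals. The 5-second direct-mode timeout comes from this report; the 15-second probe interval is an assumed value chosen to fall within the observed 15-30 second cycle, and `guard_tunneled_probes` stands in for a hypothetical fix that ignores probes routed through the tunnel.

```python
def simulate(
    duration_s: int,
    probe_interval_s: int = 15,   # assumed monitor interval
    direct_timeout_s: int = 5,    # ~5 s timeout reported above
    guard_tunneled_probes: bool = False,
) -> list[tuple[int, str]]:
    """Return (time_s, mode) transitions for a peer reachable only via the tunnel."""
    transitions = [(0, "relay")]
    t = 0
    while True:
        t += probe_interval_s  # monitor probes while the peer is on relay
        if t > duration_s:
            break
        if guard_tunneled_probes:
            continue  # tunneled reply is ignored, peer stays on relay
        transitions.append((t, "direct"))  # tunneled reply misread as direct
        t += direct_timeout_s  # data plane fails until the timeout fires
        if t > duration_s:
            break
        transitions.append((t, "relay"))
    return transitions


if __name__ == "__main__":
    for when, mode in simulate(60):
        print(f"t={when:>3}s -> {mode}")
```

Without the guard the model flaps for as long as it runs; with `guard_tunneled_probes=True` it records only the initial relay state, which matches the expected behavior described earlier.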