-
Notifications
You must be signed in to change notification settings - Fork 49
fix:add container veth discovery to replace hardcoded eth0 #392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix:add container veth discovery to replace hardcoded eth0 #392
Conversation
✅ Deploy Preview for urunc canceled.
|
abb1215 to
5d6bddc
Compare
4e84f6d to
418b9c9
Compare
1faaa38 to
35f05d3
Compare
- Add `discoverContainerIface` in `pkg/network/network.go` to discover the container-side veth endpoint by scanning links in the current netns. - Replace places that relied on the hardcoded `DefaultInterface` with the discovered link. This implements interface discovery instead of assuming `eth0`, improving compatibility across different runtimes/CNIs. Fixes urunc-dev#14 Signed-off-by: sidneychang <2190206983@qq.com>
cmainas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @sidneychang ,
thank you for this contribution. May I ask the reasoning of the algorithm to discover the container interface?
I am fine with the changes and they seem to work, but I am not sure if these changes cover all cases of interfaces inside a network namespace.
There is also another corner case that we need to test manually and verify the correctness of this PR. In kubernetes setups when a pod is getting restarted, the network namespace (created by the pause container) remains active and hence the tap0_urunc device still exists. Therefore, when urunc (re)creates the container it identifies the tap0_urunc device and it does not recreates it. So the suggested changes will find out the tap0_urunc interface and as far as I understand it will not be selected, since it will not have a default route. However, it would be nice to verify this assumption with a small test.
Thanks for the review. In Kubernetes or container network model, the CNI plugin creates the veth interface inside the Pod network namespace (typically eth0), assigns it an IP address, and configures the default gateway and default route. As documented by common CNI implementations (e.g., AWS VPC CNI and Multus), the default route is normally attached to eth0, which makes it a reliable semantic indicator of the primary egress interface. So I tried to find a netlink with default route to replace hardencoded 'eth0'. I also validated the Kubernetes restart scenario. I created a urnc pod in my kubernetes cluster. After force-killing the QEMU process of a urunc Pod, Kubernetes restarts the Pod while preserving the Pod-level network namespace. In this case, the tap device created by the previous instance still exists. During restart, the network setup fails due to the existing logic: This treats any pre-existing tap as an attempt to spawn multiple unikernels in the same network namespace and aborts network setup, resulting in: This behavior is not introduced by the current changes—the same error can be reproduced with the original implementation as well. The issue is therefore related to tap reuse in preserved network namespaces during Pod restarts, rather than to interface discovery or default-route-based selection of eth0. I will continue to follow up on this issue, further investigate whether it has any potential side effects or broader impact, and try to address it accordingly.:) References: AWS VPC CNI: https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/cni-proposal.md Multus CNI: https://k8snetworkplumbingwg.github.io/multus-cni/docs/how-to-use.html |
discoverContainerIfaceinpkg/network/network.goto discover the container-side veth endpoint by scanning links in the current netns.DefaultInterfacewith the discovered link.This implements interface discovery instead of assuming
eth0, improving compatibility across different runtimes/CNIs.Fixes #14