TL;DR checkProcMount() won't let me mount /proc/sys/net as read-write.
I'm trying to run a libvirt KVM VM inside of a docker container without using --privileged. I've worked around a lot of other errors by:
- Adding /dev/kvm and /dev/net/tun devices
- Granting CAP_NET_ADMIN (safe: net-namespaced)
- Mounting /sys/fs/cgroup/* read-write (safe?)
- Mounting /sys/devices/virtual/net read-write (safe: net-namespaced)
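For reference, the workarounds above might translate into a docker run invocation roughly like the sketch below. This is an assumption about how the options combine, not the exact command from my setup: it assumes the read-write mounts are plain -v bind mounts, and it binds /sys/fs/cgroup as a whole in place of the individual /sys/fs/cgroup/* entries. The script only assembles and prints the command so it can be reviewed before running.

```shell
#!/bin/sh
# Sketch only: assemble the docker run arguments described in the list above.
# --device exposes /dev/kvm and /dev/net/tun to the container.
# --cap-add grants CAP_NET_ADMIN without resorting to --privileged.
# The two -v bind mounts are assumed stand-ins for the read-write mounts;
# /sys/fs/cgroup as a whole replaces the per-controller /sys/fs/cgroup/* mounts.
set -- run --rm -it \
  --device /dev/kvm \
  --device /dev/net/tun \
  --cap-add NET_ADMIN \
  -v /sys/fs/cgroup:/sys/fs/cgroup:rw \
  -v /sys/devices/virtual/net:/sys/devices/virtual/net:rw \
  debian:10

# Join the arguments into one printable command line.
cmd="docker $*"

# Print for review; to actually start the container, run: docker "$@"
printf '%s\n' "$cmd"
```

Even with all of these options in place, /proc/sys/net stays read-only inside the container.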
But there's one error I can't work around:
libvirt.libvirtError: cannot write to /proc/sys/net/ipv6/conf/virbr2/disable_ipv6 to enable/disable IPv6 on bridge virbr2: Read-only file system
What I would like to do is mount /proc/sys/net read-write inside the container. My understanding is that this is safe because everything in that subdirectory is net-namespaced, so a container can't affect the host's network namespace. (I would have to audit some kernel code to be sure, but it's certainly better than --privileged.)
The problem is that checkProcMount() won't let me:
$ docker version
Client: Docker Engine - Community
Version: 20.10.3
API version: 1.41
Go version: go1.13.15
Git commit: 48d30b5
Built: Fri Jan 29 14:33:25 2021
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.3
API version: 1.41 (minimum version 1.12)
Go version: go1.13.15
Git commit: 46229ca
Built: Fri Jan 29 14:31:38 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.3
GitCommit: 269548fa27e0089a8b8278fc4fc781d7f65a939b
runc:
Version: 1.0.0-rc92
GitCommit: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
docker-init:
Version: 0.19.0
GitCommit: de40ad0
$ docker run --rm -it -v "/proc/sys/net:/proc/sys/net:rw" debian:10
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: rootfs_linux.go:59: mounting "/proc/sys/net" to rootfs at "/proc/sys/net" caused: "/var/lib/docker/overlay2/9fd477a20091dd5d9babf5ce2bddb8d517349c89ba9a4e6c7f74f275f0c370c9/merged/proc/sys/net" cannot be mounted because it is inside /proc: unknown.
My only alternative to --privileged is granting CAP_SYS_ADMIN (for mount(2)) and remounting /proc/sys inside the container. This is a horrible alternative because:
- CAP_SYS_ADMIN is terribly overloaded
- /proc/sys has lots of kernel global options which aren't namespaced