feat(vm): dm-snapshot CoW with jailer mode support #208
lucas77778 wants to merge 7 commits into master from
Introduce `CowManager` in `snapshot_cow.rs`, which uses Linux dm-snapshot to share a single read-only template ext4 image across sandboxes. Each sandbox gets a sparse COW file; only written blocks consume disk space.
Key design decisions:
- Uses `std::process::Command` + `spawn_blocking` instead of `tokio::process::Command` to avoid SIGCHLD conflicts with the PID-1 reaper in arcbox-agent.
- Two-step busybox `losetup -f` + `losetup DEV FILE`, serialized via `AsyncMutex` to prevent TOCTOU races on concurrent sandbox creation.
- Synchronous `cleanup_stale_sync()` runs at init to remove orphaned dm devices and COW files left by previous crashes.
- Template loop devices are refcounted and detached when the last sandbox using that template is removed.
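The sketch below shows the template loop-device refcounting described above. It is not the `CowManager` code. `TemplateLoops`, `acquire` and `release` are hypothetical names, and the `attach` closure stands in for the serialized `losetup` step. The sketch assumes the caller already holds whatever lock serializes loop-device setup.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical bookkeeping: template image -> (loop device, sandboxes using it).
#[derive(Default)]
struct TemplateLoops {
    entries: HashMap<PathBuf, (String, usize)>,
}

impl TemplateLoops {
    /// Returns the loop device backing `template`. `attach` (a stand-in for the
    /// losetup step) runs only when no sandbox is using that template yet.
    fn acquire<E>(
        &mut self,
        template: &Path,
        attach: impl FnOnce(&Path) -> Result<String, E>,
    ) -> Result<String, E> {
        if let Some((dev, count)) = self.entries.get_mut(template) {
            *count += 1;
            return Ok(dev.clone());
        }
        let dev = attach(template)?;
        self.entries.insert(template.to_path_buf(), (dev.clone(), 1));
        Ok(dev)
    }

    /// Drops one reference. Returns the loop device to detach once the last
    /// sandbox using `template` is gone, otherwise `None`.
    fn release(&mut self, template: &Path) -> Option<String> {
        let (_, count) = self.entries.get_mut(template)?;
        *count -= 1;
        if *count == 0 {
            self.entries.remove(template).map(|(dev, _)| dev)
        } else {
            None
        }
    }
}
```

With two sandboxes on the same template, the second `acquire` reuses the first loop device. Only the second `release` hands back a device to detach.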
In direct mode (jailer: None), `do_boot()` now calls
`cow_manager.setup()` to create a dm-snapshot device and passes
`/dev/mapper/arcbox-snap-{id}` to Firecracker as the rootfs block
device. On failure, falls back to using the rootfs image directly.
`remove_sandbox_impl()` tears down the dm-snapshot after the
Firecracker process exits. TTL expiry and restore paths also pass
the cow_manager through for proper cleanup.
Jailer mode and checkpoint/restore are unchanged (Phase 2/3).
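The direct-mode boot path can be sketched as three steps:

1. Build a dm-snapshot table.
2. Derive the `/dev/mapper/arcbox-snap-{id}` path.
3. Fall back to the rootfs image when setup fails.

The table layout `<start> <length> snapshot <origin> <COW device> <P|N> <chunksize>` follows the kernel's dm-snapshot target, with lengths in 512-byte sectors. Several details are assumptions, since the commits do not state them:

- The origin is the template's loop device.
- The sparse COW file is attached to its own loop device.
- The persistence flag and chunk size are choices made for this sketch.

`snapshot_table`, `snap_name`, `dmsetup_create_args` and `rootfs_for_boot` are hypothetical helpers.

```rust
use std::path::{Path, PathBuf};

/// Device-mapper tables measure lengths in 512-byte sectors.
const SECTOR_SIZE: u64 = 512;

/// Builds `<start> <length> snapshot <origin> <cow_dev> <P|N> <chunk_sectors>`.
/// Persistence and chunk size are assumptions, not values taken from the PR.
fn snapshot_table(
    template_bytes: u64,
    origin_loop: &str,
    cow_loop: &str,
    persistent: bool,
    chunk_sectors: u32,
) -> String {
    let sectors = template_bytes / SECTOR_SIZE;
    let mode = if persistent { "P" } else { "N" };
    format!("0 {sectors} snapshot {origin_loop} {cow_loop} {mode} {chunk_sectors}")
}

/// dm device name for one sandbox.
fn snap_name(sandbox_id: &str) -> String {
    format!("arcbox-snap-{sandbox_id}")
}

/// Arguments for `dmsetup create <name> --table <table>`, meant for
/// `std::process::Command::new("dmsetup")` run inside `spawn_blocking`.
fn dmsetup_create_args(sandbox_id: &str, table: &str) -> Vec<String> {
    vec![
        "create".to_string(),
        snap_name(sandbox_id),
        "--table".to_string(),
        table.to_string(),
    ]
}

/// Path handed to Firecracker as the rootfs block device on success.
fn mapper_path(sandbox_id: &str) -> PathBuf {
    Path::new("/dev/mapper").join(snap_name(sandbox_id))
}

/// Uses the dm-snapshot device if setup succeeded, else the rootfs image itself.
fn rootfs_for_boot<E>(setup: Result<PathBuf, E>, rootfs_image: &Path) -> PathBuf {
    match setup {
        Ok(dm_device) => dm_device,
        // The fallback is logged at debug level (see the log-level commit).
        Err(_) => rootfs_image.to_path_buf(),
    }
}
```

For a 1 GiB template, the table length comes out to 2^30 / 512 = 2097152 sectors.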
The default EnvFilter only allowed `arcbox_agent=info`, which silently dropped all logs from the `arcbox_vm` crate (including dm-snapshot setup/teardown). Add `arcbox_vm=info` so CowManager diagnostics are visible in agent.log.
- teardown(): only delete the COW file after both dm device removal and loop detach succeed; this avoids unlinking a still-referenced backing file, which would delay space reclamation.
- Downgrade the dm-snapshot fallback log from warn to debug in do_boot(); CowManager::new() already warns once at init when dmsetup is missing, so per-sandbox warnings are redundant noise.
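The teardown ordering above can be sketched with each step as a closure, so the short-circuit behaviour is easy to see. `teardown` and its parameter names are hypothetical, not the PR's signature. The closures stand in for the `dmsetup` and `losetup` calls and the file unlink.

```rust
/// Unlinks the sparse COW file only after the dm-snapshot device is removed
/// AND its loop device is detached; any earlier error returns immediately.
fn teardown<E>(
    remove_dm_device: impl FnOnce() -> Result<(), E>,
    detach_cow_loop: impl FnOnce() -> Result<(), E>,
    unlink_cow_file: impl FnOnce() -> Result<(), E>,
) -> Result<(), E> {
    // The dm-snapshot device holds the COW loop device open.
    remove_dm_device()?;
    // The loop device holds the COW file open. Unlinking before this point would
    // remove the name but keep the blocks allocated until the last reference closes.
    detach_cow_loop()?;
    // Reached only if both steps succeeded. On failure the file stays on disk,
    // presumably for cleanup_stale_sync() to collect on the next init.
    unlink_cow_file()
}
```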
Instead of copying the full rootfs into the jailer chroot, create a
block device node (mknod) pointing to the dm-snapshot device. The
Firecracker jailer preserves pre-existing files in the chroot after
pivot_root, so the device node is accessible to FC.
Flow: cow_manager.setup() → stat dm device for major:minor →
mknod {chroot}/rootfs.ext4 b major minor → chown to jailer uid:gid.
Split stage_files_for_jailer() into stage_kernel_for_jailer() and
stage_rootfs_copy_for_jailer() so the kernel copy can be reused
independently. Add stage_rootfs_device_for_jailer() for the mknod
path. Falls back to full rootfs copy if dm-snapshot or mknod fails.
New public helpers in snapshot_cow.rs: device_major_minor() and
mknod_blkdev().
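The "stat dm device for major:minor, then mknod" step can be sketched with only the standard library. `split_dev`, `make_dev`, `block_dev_numbers` and `mknod_args` are hypothetical stand-ins, not the PR's `device_major_minor()` / `mknod_blkdev()`. The bit layout is glibc's `major()`/`minor()`/`makedev()` encoding of a Linux `dev_t`.

```rust
use std::io;
use std::os::unix::fs::{FileTypeExt, MetadataExt};
use std::path::Path;

/// Splits a Linux dev_t into (major, minor) using the glibc encoding
/// (major: 12 low + 20 high bits, minor: 8 low + 24 high bits).
fn split_dev(dev: u64) -> (u32, u32) {
    let major = ((dev >> 8) & 0xfff) | ((dev >> 32) & 0xffff_f000);
    let minor = (dev & 0xff) | ((dev >> 12) & 0xffff_ff00);
    (major as u32, minor as u32)
}

/// Inverse of `split_dev`, mirroring glibc's makedev().
fn make_dev(major: u32, minor: u32) -> u64 {
    let (ma, mi) = (u64::from(major), u64::from(minor));
    ((ma & 0xfff) << 8) | ((ma & 0xffff_f000) << 32) | (mi & 0xff) | ((mi & 0xffff_ff00) << 12)
}

/// Stats `path`, following symlinks in case /dev/mapper entries point at
/// ../dm-N, and returns its major:minor if it is a block device.
fn block_dev_numbers(path: &Path) -> io::Result<(u32, u32)> {
    let meta = std::fs::metadata(path)?;
    if !meta.file_type().is_block_device() {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "not a block device"));
    }
    Ok(split_dev(meta.rdev()))
}

/// Arguments for `mknod <node> b <major> <minor>`. The new node would then be
/// chowned to the jailer uid:gid, as in the flow above.
fn mknod_args(node: &Path, major: u32, minor: u32) -> Vec<String> {
    vec![node.display().to_string(), "b".into(), major.to_string(), minor.to_string()]
}
```

Shelling out to `mknod` keeps this consistent with the `std::process::Command` approach used elsewhere in `CowManager`. A direct `mknod(2)` call via a libc binding would avoid the extra process.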
Firecracker's jailer panics when
/sys/devices/system/cpu/cpu0/cache/index{N}/size is missing;
Virtualization.framework omits size, coherency_line_size, and
number_of_sets from the guest sysfs.
Fix by synthesising the missing cache attributes via bind mounts
during PID 1 init, and mount /var without nodev so the jailer can
open mknod'd block device nodes in its chroot.
- init.rs: add ensure_cpu_cache_topology() to fill in missing sysfs
cache files with reasonable ARM64 defaults via bind mount
- init.rs: mount /var with mount_tmpfs_dev() (no nodev) so jailer
chroot block device nodes are accessible
- config.rs: enable jailer config with chroot at /var/lib/arcbox/jailer
- sandbox.rs: create jailer chroot base dir in SandboxManager::new()
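The following sketch covers only the planning half of `ensure_cpu_cache_topology()`: working out which attributes are missing from one `cache/indexN` directory and what to put in them. It keeps the values consistent via size = coherency_line_size × ways_of_associativity × number_of_sets. The bind-mount step needs `mount(2)`, and the PR's mount layout is not shown, so it is omitted. The default size, the 64-byte line size used in the example, and falling back to 1 way are assumptions, not the PR's ARM64 defaults. `plan_missing_cache_attrs` is a hypothetical name.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Reads an integer sysfs attribute such as `ways_of_associativity`.
fn read_u64(path: &Path) -> Option<u64> {
    fs::read_to_string(path).ok()?.trim().parse().ok()
}

/// Reads a sysfs `size` attribute written like "32K" and returns KiB.
fn read_size_kib(path: &Path) -> Option<u64> {
    fs::read_to_string(path).ok()?.trim().strip_suffix('K')?.parse().ok()
}

/// Returns (path, contents) for each of size / coherency_line_size /
/// number_of_sets that is absent from `index_dir`. Existing values win over
/// the assumed defaults when computing the missing ones.
fn plan_missing_cache_attrs(
    index_dir: &Path,
    default_size_kib: u64,
    default_line_size: u64,
) -> Vec<(PathBuf, String)> {
    let size_kib = read_size_kib(&index_dir.join("size")).unwrap_or(default_size_kib);
    let line_size = read_u64(&index_dir.join("coherency_line_size"))
        .filter(|&l| l > 0)
        .unwrap_or(default_line_size);
    // Assumption: treat a missing associativity as direct-mapped (1 way).
    let ways = read_u64(&index_dir.join("ways_of_associativity"))
        .filter(|&w| w > 0)
        .unwrap_or(1);
    let sets = size_kib * 1024 / (line_size * ways);
    let wanted = [
        ("size", format!("{size_kib}K")),
        ("coherency_line_size", line_size.to_string()),
        ("number_of_sets", sets.to_string()),
    ];
    wanted
        .into_iter()
        .map(|(name, value)| (index_dir.join(name), value))
        .filter(|(path, _)| !path.exists())
        .collect()
}
```

For a 32K, 4-way cache with 64-byte lines, number_of_sets works out to 32768 / (64 × 4) = 128.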
Summary
- … (`size`, `coherency_line_size`, `number_of_sets` via bind mounts)
- `/var` without `nodev` so jailer chroot block device nodes are accessible

Performance
Single sandbox (1GB rootfs, 1536MB memory): 0.6s to ready
3 concurrent sandboxes (same config): 1.36s all ready
Guest-side boot time (jailer spawn → FC config → microVM ready): ~13ms
Test plan
- … `docker run --privileged --pid=host` …
- … `/sys/devices/system/cpu/cpu0/cache/index0/size` …
- … `/var` tmpfs `nodev` was the root cause of block device `Permission denied`
- … `sandbox run -- uname -a`)