feat(vm): dm-snapshot CoW with jailer mode support #208

Open
lucas77778 wants to merge 7 commits into master from feat/dm-snapshot-jailer

@lucas77778
Member

Summary

  • Add dm-snapshot based block-level CoW for sandbox rootfs, eliminating full rootfs copies on sandbox creation
  • Wire dm-snapshot into the sandbox lifecycle with automatic teardown on stop/remove
  • Support jailer mode by creating block device nodes (mknod) in the jailer chroot pointing to dm-snapshot devices
  • Fix Firecracker jailer panic caused by missing CPU cache topology in Virtualization.framework guest sysfs (synthesise size, coherency_line_size, number_of_sets via bind mounts)
  • Mount /var without nodev so jailer chroot block device nodes are accessible
  • Enable jailer as the default sandbox execution mode

Performance

  • Single sandbox (1GB rootfs, 1536MB memory): 0.6s to ready
  • 3 concurrent sandboxes (same config): 1.36s all ready
  • Guest-side boot time (jailer spawn → FC config → microVM ready): ~13ms

Test plan

  • Verified missing sysfs files in guest VM via docker run --privileged --pid=host
  • Confirmed jailer panic on missing /sys/devices/system/cpu/cpu0/cache/index0/size
  • Confirmed cache fixup bind mounts applied correctly after agent restart
  • Confirmed /var tmpfs nodev was root cause of block device Permission denied
  • Created and ran sandbox in jailer mode successfully (sandbox run -- uname -a)
  • Benchmarked 1 and 3 concurrent sandbox creation with 1GB rootfs

Introduce `CowManager` in `snapshot_cow.rs` that uses Linux dm-snapshot
to share a single read-only template ext4 image across sandboxes. Each
sandbox gets a sparse COW file; only written blocks consume disk space.

Key design decisions:
- Uses `std::process::Command` + `spawn_blocking` instead of
  `tokio::process::Command` to avoid SIGCHLD conflicts with the PID-1
  reaper in arcbox-agent.
- Two-step busybox `losetup -f` + `losetup DEV FILE` serialized via
  `AsyncMutex` to prevent TOCTOU races on concurrent sandbox creation.
- Synchronous `cleanup_stale_sync()` runs at init to remove orphaned
  dm devices and COW files from previous crashes.
- Template loop devices are refcounted; detached when last sandbox using
  that template is removed.
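
The refcounting rule in the last bullet can be sketched as follows. This is a minimal illustration, not the code in `snapshot_cow.rs`: the `TemplateLoops` type, its method names, and the `String` error type are hypothetical, and the actual `losetup` attach step is passed in as a closure so the bookkeeping can be read on its own.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical refcount table for template loop devices (not the real CowManager).
struct TemplateLoops {
    // template image path -> (loop device path, number of sandboxes using it)
    entries: HashMap<PathBuf, (String, usize)>,
}

impl TemplateLoops {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Returns the loop device for `template`; `attach` runs only for the first user.
    fn acquire<F>(&mut self, template: &Path, attach: F) -> Result<String, String>
    where
        F: FnOnce(&Path) -> Result<String, String>,
    {
        if let Some((dev, count)) = self.entries.get_mut(template) {
            *count += 1;
            return Ok(dev.clone());
        }
        let dev = attach(template)?;
        self.entries.insert(template.to_path_buf(), (dev.clone(), 1));
        Ok(dev)
    }

    /// Drops one reference. Returns the loop device to detach once the last
    /// sandbox using this template is gone, and `None` otherwise.
    fn release(&mut self, template: &Path) -> Option<String> {
        let (_, count) = self.entries.get_mut(template)?;
        *count -= 1;
        if *count == 0 {
            return self.entries.remove(template).map(|(dev, _)| dev);
        }
        None
    }
}
```

In this sketch the caller detaches the loop device only when `release` returns `Some`, which mirrors the "detached when last sandbox using that template is removed" behaviour described above.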

In direct mode (jailer: None), `do_boot()` now calls
`cow_manager.setup()` to create a dm-snapshot device and passes
`/dev/mapper/arcbox-snap-{id}` to Firecracker as the rootfs block
device. On failure, it falls back to using the rootfs image directly.
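
The fallback described in this paragraph amounts to the selection below. It is a hedged sketch: `snapshot_device_path` and `select_rootfs` are hypothetical names, and only the `/dev/mapper/arcbox-snap-{id}` path format comes from the description.

```rust
use std::path::{Path, PathBuf};

/// Builds the dm-snapshot device path using the format given in the description.
fn snapshot_device_path(sandbox_id: &str) -> PathBuf {
    PathBuf::from(format!("/dev/mapper/arcbox-snap-{}", sandbox_id))
}

/// Picks the rootfs to hand to Firecracker: the snapshot device when setup
/// succeeded, otherwise the plain rootfs image (hypothetical helper).
fn select_rootfs<E>(setup_result: Result<PathBuf, E>, rootfs_image: &Path) -> PathBuf {
    match setup_result {
        Ok(device) => device,
        Err(_) => rootfs_image.to_path_buf(),
    }
}
```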

`remove_sandbox_impl()` tears down the dm-snapshot after the
Firecracker process exits. TTL expiry and restore paths also pass
the cow_manager through for proper cleanup.

Jailer mode and checkpoint/restore are unchanged (Phase 2/3).

The default EnvFilter only allowed `arcbox_agent=info`, which silently
dropped all logs from the `arcbox_vm` crate (including dm-snapshot
setup/teardown). Add `arcbox_vm=info` so CowManager diagnostics are
visible in agent.log.

- teardown(): only delete COW file after both dm device removal and
  loop detach succeed; avoids unlinking a still-referenced backing file
  that would delay space reclamation.
- Downgrade dm-snapshot fallback log from warn to debug in do_boot();
  CowManager::new() already warns once at init when dmsetup is missing,
  so per-sandbox warnings are redundant noise.
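
The teardown ordering in the first bullet can be illustrated with a small sketch. The function name `teardown_in_order` and the closure-based shape are assumptions made for illustration; the sketch also assumes that the steps short-circuit on the first error, which the commit message does not state.

```rust
/// Runs the teardown steps in order. The COW file is unlinked only if both
/// the dm device removal and the loop detach succeeded (hypothetical helper).
fn teardown_in_order<R, D, U>(
    remove_dm: R,
    detach_loop: D,
    unlink_cow: U,
) -> Result<(), String>
where
    R: FnOnce() -> Result<(), String>,
    D: FnOnce() -> Result<(), String>,
    U: FnOnce() -> Result<(), String>,
{
    // Assumption: stop at the first failure so the COW file stays on disk
    // while anything might still reference it.
    remove_dm()?;
    detach_loop()?;
    unlink_cow()
}
```

Leaving the COW file in place on a failed teardown also gives a later cleanup pass, such as the stale cleanup at init, something to find and remove.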

Instead of copying the full rootfs into the jailer chroot, create a
block device node (mknod) pointing to the dm-snapshot device. The
Firecracker jailer preserves pre-existing files in the chroot after
pivot_root, so the device node is accessible to FC.

Flow: cow_manager.setup() → stat dm device for major:minor →
mknod {chroot}/rootfs.ext4 b major minor → chown to jailer uid:gid.

Split stage_files_for_jailer() into stage_kernel_for_jailer() and
stage_rootfs_copy_for_jailer() so the kernel copy can be reused
independently. Add stage_rootfs_device_for_jailer() for the mknod
path. Falls back to full rootfs copy if dm-snapshot or mknod fails.

New public helpers in snapshot_cow.rs: device_major_minor() and
mknod_blkdev().
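
The "stat dm device for major:minor" step can be sketched with the standard library alone. This is not the implementation of `device_major_minor()`: the function names below are hypothetical, and the bit layout assumes the glibc `dev_t` encoding used on Linux. The `mknod` call itself needs a libc binding and is not shown.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::{FileTypeExt, MetadataExt};
use std::path::Path;

/// Extracts the major number from a raw device ID (glibc encoding, assumed).
fn dev_major(dev: u64) -> u32 {
    (((dev >> 32) as u32) & !0xfff) | (((dev >> 8) as u32) & 0xfff)
}

/// Extracts the minor number from a raw device ID (glibc encoding, assumed).
fn dev_minor(dev: u64) -> u32 {
    (((dev >> 12) as u32) & !0xff) | ((dev as u32) & 0xff)
}

/// Inverse of the two functions above; this is the value a mknod call would take.
fn make_dev(major: u32, minor: u32) -> u64 {
    let (ma, mi) = (u64::from(major), u64::from(minor));
    (mi & 0xff) | ((ma & 0xfff) << 8) | ((mi & !0xff) << 12) | ((ma & !0xfff) << 32)
}

/// Stats `path` (following symlinks) and returns (major, minor) if it is a
/// block device. Hypothetical stand-in for the helper named above.
fn block_device_numbers(path: &Path) -> io::Result<(u32, u32)> {
    let meta = fs::metadata(path)?;
    if !meta.file_type().is_block_device() {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "not a block device"));
    }
    let rdev = meta.rdev();
    Ok((dev_major(rdev), dev_minor(rdev)))
}
```

Because `fs::metadata` follows symlinks, the sketch also works when the `/dev/mapper/` entry is a symlink to the underlying dm node.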

Firecracker's jailer panics when /sys/devices/system/cpu/cpu0/cache/
index{N}/size is missing — Virtualization.framework omits size,
coherency_line_size, and number_of_sets from the guest sysfs.

Fix by synthesising the missing cache attributes via bind mounts
during PID 1 init, and mount /var without nodev so the jailer can
open mknod'd block device nodes in its chroot.

- init.rs: add ensure_cpu_cache_topology() to fill in missing sysfs
  cache files with reasonable ARM64 defaults via bind mount
- init.rs: mount /var with mount_tmpfs_dev() (no nodev) so jailer
  chroot block device nodes are accessible
- config.rs: enable jailer config with chroot at /var/lib/arcbox/jailer
- sandbox.rs: create jailer chroot base dir in SandboxManager::new()
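
The detection half of the cache fixup can be sketched as below. This is an assumption-laden illustration, not the body of `ensure_cpu_cache_topology()`: the constant and function names are hypothetical, only the three attribute names come from the description, and the bind mounts and the default values they provide are not shown.

```rust
use std::path::{Path, PathBuf};

/// Cache attributes that the description says are missing in the guest sysfs.
const REQUIRED_CACHE_ATTRS: [&str; 3] = ["size", "coherency_line_size", "number_of_sets"];

/// Returns the attribute files that do not exist under one cache index
/// directory, e.g. /sys/devices/system/cpu/cpu0/cache/index0 (hypothetical helper).
fn missing_cache_attrs(index_dir: &Path) -> Vec<PathBuf> {
    REQUIRED_CACHE_ATTRS
        .iter()
        .map(|name| index_dir.join(name))
        .filter(|path| !path.exists())
        .collect()
}
```

Each returned path would then be a candidate target for a bind mount of a file holding a synthesised value.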

Copilot AI review requested due to automatic review settings April 7, 2026 15:31

@lucas77778 lucas77778 requested review from AprilNEA and PeronGH April 8, 2026 06:21