94 changes: 83 additions & 11 deletions .claude/CLAUDE.md
@@ -221,17 +221,89 @@ Recursive nesting (Host → L1 → L2 → ...) is enabled via the `arm64.nv2` ke
- **Host kernel**: 6.18+ with `kvm-arm.mode=nested` AND DSB patches
- **Nested kernel**: Custom kernel with CONFIG_KVM=y (use `--kernel-profile nested`)
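
After rebooting into the host kernel, the two host requirements above (6.18+ and `kvm-arm.mode=nested`) can be checked from a shell. This is a minimal sketch, assuming `uname -r` reports a release like `6.18.3` (optionally with a `-suffix`) and that `/proc/cmdline` holds the boot parameters; `version_at_least`, `cmdline_has` and `host_ready` are hypothetical helper names. It cannot tell whether the DSB patches were applied.

```bash
# version_at_least CURRENT MINIMUM: succeed if CURRENT >= MINIMUM (via sort -V).
version_at_least() {
    local current=${1%%-*} minimum=$2   # drop a local suffix such as "-fcvm"
    [ "$(printf '%s\n%s\n' "$minimum" "$current" | sort -V | head -n 1)" = "$minimum" ]
}

# cmdline_has CMDLINE PARAM: succeed if PARAM appears as a whole word.
cmdline_has() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# host_ready RELEASE CMDLINE: print "ok" or the first unmet requirement.
host_ready() {
    local release=$1 cmdline=$2
    if ! version_at_least "$release" "6.18"; then
        echo "kernel $release is older than 6.18"
        return 1
    fi
    if ! cmdline_has "$cmdline" "kvm-arm.mode=nested"; then
        echo "kvm-arm.mode=nested missing from kernel command line"
        return 1
    fi
    echo "ok"
}
```

Usage on the host: `host_ready "$(uname -r)" "$(cat /proc/cmdline)"`.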

### Kernel Patch Layout

```
kernel/
├── 0001-fuse-add-remap_file_range-support.patch   # Universal (symlinked down)
├── host/
│   └── arm64/
│       ├── 0001-fuse-*.patch -> ../../          # symlink
│       └── nv2-mmio-barrier.patch             # DSB before ioeventfd in io_mem_abort()
├── nested/
│   └── arm64/
│       ├── 0001-fuse-*.patch -> ../../          # symlink
│       ├── nv2-vsock-cache-sync.patch         # DSB at kvm_nested_sync_hwstate()
│       ├── nv2-vsock-dcache-flush.patch       # Cache flush in vsock TX
│       ├── nv2-vsock-rx-barrier.patch         # DSB before virtqueue read
│       ├── nv2-virtio-kick-barrier.patch      # Flush vring before notify
│       ├── mmfr4-override.vm.patch            # ID register override
│       ├── psci-debug-handle-exit.patch       # PSCI debug logging
│       └── psci-debug-psci.patch              # PSCI debug logging
├── nested.conf
└── nested-x86.conf
```

**Principle**: Put patches at highest level where they apply, symlink down.
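
The symlink-down principle can be sketched as a small helper. `link_universal_patch` is hypothetical (not part of fcvm); it assumes the universal patch sits directly in `kernel/`, as in the tree above, and creates relative links (one `../` per component of the target directory, so `host/arm64` gets `../../`) that stay valid wherever the repo is checked out.

```bash
# link_universal_patch KERNEL_DIR PATCH SUBDIR...
# Symlink KERNEL_DIR/PATCH into each KERNEL_DIR/SUBDIR with a relative target.
link_universal_patch() {
    local kernel_dir=$1 patch=$2 sub up part
    local -a parts
    shift 2
    if [ ! -f "$kernel_dir/$patch" ]; then
        echo "missing $kernel_dir/$patch" >&2
        return 1
    fi
    for sub in "$@"; do
        mkdir -p "$kernel_dir/$sub" || return 1
        # One "../" per path component of SUBDIR (host/arm64 -> ../../)
        up=""
        IFS=/ read -ra parts <<< "$sub"
        for part in "${parts[@]}"; do
            up+="../"
        done
        # -f replaces an existing link, -n avoids descending into a linked dir
        ln -sfn "$up$patch" "$kernel_dir/$sub/$patch" || return 1
    done
}
```

Example: `link_universal_patch kernel 0001-fuse-add-remap_file_range-support.patch host/arm64 nested/arm64`.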

### Kernel Patch Management (stgit)

Patches are managed with **stgit** (Stacked Git) in `~/linux`, which regenerates patch line numbers automatically when the stack is refreshed or rebased.

**Branches:**
- `fcvm-host`: v6.18 + FUSE patch + host DSB barrier
- `fcvm-nested`: v6.18 + all nested patches

**Editing a patch:**
```bash
cd ~/linux
git checkout fcvm-nested
# Make changes to source files
stg refresh # Updates current patch
```

**Adding a new patch:**
```bash
stg new my-fix -m "Fix something"
# Make changes
stg refresh
```

**Exporting to fcvm:**
```bash
# Nested patches (from the fcvm-nested branch):
stg export -d /home/ubuntu/fcvm/kernel/nested/arm64/
# Host patches:
git checkout fcvm-host
stg export -d /home/ubuntu/fcvm/kernel/host/arm64/
```

**Rebasing when kernel version changes:**
```bash
git fetch origin tag v6.19
stg rebase v6.19 # Auto-adjusts line numbers
stg export -d /home/ubuntu/fcvm/kernel/nested/arm64/
```
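
Because there are two stgit branches, a kernel version bump has to be repeated on each. A hypothetical helper that loops over both branches and exports each one to its directory might look like this. It assumes the `/home/ubuntu/fcvm` checkout used above (overridable through a `FCVM_DIR` variable introduced here), must be run from `~/linux`, and stops at the first failing command, for example a conflict during `stg rebase`.

```bash
# FCVM_DIR is an assumption: defaults to the checkout path used above.
FCVM_DIR=${FCVM_DIR:-/home/ubuntu/fcvm}

# Map each stgit branch to the directory its patches are exported to.
export_dir_for_branch() {
    case $1 in
        fcvm-host)   echo "$FCVM_DIR/kernel/host/arm64/" ;;
        fcvm-nested) echo "$FCVM_DIR/kernel/nested/arm64/" ;;
        *) echo "unknown branch: $1" >&2; return 1 ;;
    esac
}

# rebase_all TAG: rebase both branches onto TAG and re-export their patches.
rebase_all() {
    local tag=$1 branch dir
    git fetch origin tag "$tag" || return 1
    for branch in fcvm-host fcvm-nested; do
        dir=$(export_dir_for_branch "$branch") || return 1
        git checkout "$branch" || return 1
        stg rebase "$tag" || return 1     # stops here on conflicts
        stg export -d "$dir" || return 1
    done
}
```

Usage: `cd ~/linux && rebase_all v6.19`.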

**Sparse checkout:** The `~/linux` repo uses sparse checkout. Add directories as needed:
```bash
git sparse-checkout add drivers/virtio net/vmw_vsock
```

### Host Kernel with DSB Patches

**Install host kernel**: `make install-host-kernel` (builds kernel, installs to /boot, updates GRUB).
Patches from `kernel/host/arm64/` are applied automatically during the build.

**Host patches** (L0 bare metal):
- `nv2-mmio-barrier.patch`: DSB SY before ioeventfd signaling in `io_mem_abort()`

**Nested patches** (L1 guest VM):
- `nv2-vsock-cache-sync.patch`: DSB SY in `kvm_nested_sync_hwstate()` after a nested exit
- `nv2-vsock-dcache-flush.patch`: Cache flush in the vsock TX path for NV2
- `nv2-vsock-rx-barrier.patch`: DSB SY before reading the virtqueue in the RX path
- `nv2-virtio-kick-barrier.patch`: Flush vring cache + DSB + ISB before `virtqueue_notify()`
- `mmfr4-override.vm.patch`: ID register override for recursive nesting
- `psci-debug-*.patch`: Debug logging for PSCI shutdown (temporary)

**VM Graceful Shutdown (PSCI)**:
- fc-agent uses `poweroff -f` to trigger PSCI SYSTEM_OFF (function ID 0x84000008)
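
The function ID `0x84000008` follows the Arm SMC Calling Convention layout: bit 31 marks a fast call, bit 30 selects SMC32 (0) or SMC64 (1), bits 29:24 name the owning service (4 is Standard Secure Service, which includes PSCI), and bits 15:0 are the function number. A small decoder makes the fields visible; `decode_smccc_id` is a hypothetical helper name.

```bash
# decode_smccc_id ID: print the SMCCC fields of a function ID.
decode_smccc_id() {
    local id=$(( $1 ))   # accepts hex like 0x84000008 (bash uses 64-bit ints)
    printf 'fast=%d smc64=%d owner=%d func=0x%x\n' \
        $(( (id >> 31) & 1 )) \
        $(( (id >> 30) & 1 )) \
        $(( (id >> 24) & 0x3f )) \
        $(( id & 0xffff ))
}
```

`decode_smccc_id 0x84000008` prints `fast=1 smc64=0 owner=4 func=0x8`: a fast SMC32 call to the standard secure service, function 8 (SYSTEM_OFF).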
@@ -301,7 +373,7 @@ make test-root FILTER=kvm

1. Added `arm64.nv2` alias for `id_aa64mmfr4.nv_frac=2` (NV2_ONLY)
2. Changed `FTR_LOWER_SAFE` to `FTR_HIGHER_SAFE` for MMFR4 to allow upward overrides
3. Kernel patch: `kernel/nested/arm64/mmfr4-override.vm.patch`

**Why it's safe**: The host KVM *does* provide NV2 emulation - we're just fixing the guest's
view of this capability. We're not faking a feature, we're correcting a visibility issue.
@@ -337,7 +409,7 @@ From [`arch/arm64/kvm/arch_timer.c`](https://github.com/torvalds/linux/blob/mast
issues due to double Stage 2 translation (L2 GPA → L1 S2 → L1 HPA → L0 S2 → physical). Large writes
that fragment into multiple vsock packets may see stale/zero data instead of actual content.

**Fix**: The DSB SY kernel patch in `kernel/nested/arm64/nv2-vsock-cache-sync.patch` fixes this issue.
The patch adds a full system data synchronization barrier in `kvm_nested_sync_hwstate()` to ensure
L2's writes are visible to L1's reads before returning from the nested guest exit handler.
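
To check for the stale/zero-data symptom, one option is to write a file larger than a single vsock packet from L2 and compare it byte-for-byte with the copy seen on the other side. A minimal sketch, assuming both copies end up as local files; `verify_copy` is a hypothetical helper name.

```bash
# verify_copy EXPECTED ACTUAL: print "match", or the first differing byte
# (1-based decimal offset, as reported by cmp -l) and return 1.
verify_copy() {
    local expected=$1 actual=$2 first
    if cmp -s "$expected" "$actual"; then
        echo "match"
        return 0
    fi
    # cmp -l lists "offset old-byte new-byte"; keep only the first offset.
    first=$(cmp -l "$expected" "$actual" 2>/dev/null | awk 'NR == 1 { print $1; exit }') || true
    echo "mismatch at byte ${first:-?}"
    return 1
}
```

A `?` offset means one file is a prefix of the other (a length mismatch rather than corrupted bytes).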

@@ -1322,9 +1394,9 @@ Key config fields in `[kernel_profiles.nested.arm64]`:
```toml
kernel_version = "6.18.3" # Version to download/build
kernel_repo = "ejc3/fcvm" # GitHub repo for releases
build_inputs = ["kernel/nested.conf", "kernel/nested/arm64/*.patch"] # Files for SHA
kernel_config = "kernel/nested.conf" # Kernel .config
patches_dir = "kernel/nested/arm64" # Directory with patches
```
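
The `build_inputs` comment says these files feed a SHA. The hashing scheme itself is not shown here, so the following is only a sketch under an assumed scheme: hash the sorted list of matched files, mixing each path and its contents into one SHA-256, so that editing, adding or renaming a patch changes the result. `inputs_sha` is a hypothetical helper; fcvm's actual implementation may differ.

```bash
# inputs_sha REPO_ROOT PATTERN...: SHA-256 over all files matched by the globs.
inputs_sha() {
    local root=$1 pattern f
    shift
    (
        cd "$root" || exit 1
        for pattern in "$@"; do
            for f in $pattern; do            # unquoted on purpose: expand the glob
                [ -f "$f" ] && printf '%s\n' "$f"
            done
        done | LC_ALL=C sort -u | while IFS= read -r f; do
            printf '%s\0' "$f"               # path, NUL-terminated
            cat "$f"                         # then the file contents
        done | sha256sum | cut -d' ' -f1
    )
}
```

Usage: `inputs_sha . kernel/nested.conf 'kernel/nested/arm64/*.patch'` (quote the glob so the helper expands it).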

**Creating/Editing Kernel Patches:**
8 changes: 7 additions & 1 deletion .config/nextest.toml
@@ -73,7 +73,7 @@ slow-timeout = { period = "600s", terminate-after = 1 }

# VM tests get 10 minute timeout (non-snapshot tests)
[[profile.default.overrides]]
filter = "package(fcvm) & test(/test_/) & !test(/stress_100/) & !test(/pjdfstest_vm/) & !test(/snapshot/) & !test(/clone/) & !test(/nested/)"
test-group = "vm-tests"
slow-timeout = { period = "600s", terminate-after = 1 }

@@ -83,6 +83,12 @@ filter = "package(fcvm) & test(/pjdfstest_vm/)"
test-group = "vm-tests"
slow-timeout = { period = "900s", terminate-after = 1 }

# Nested tests need 30 minutes (VM inside VM is very slow)
[[profile.default.overrides]]
filter = "package(fcvm) & test(/nested/)"
test-group = "vm-tests"
slow-timeout = { period = "1800s", terminate-after = 1 }

# fuse-pipe tests can run with full parallelism
[[profile.default.overrides]]
filter = "package(fuse-pipe)"
10 changes: 10 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default.

26 changes: 22 additions & 4 deletions Containerfile.nested
@@ -5,6 +5,9 @@
# Build:
# cp target/release/fcvm target/release/fc-agent artifacts/
# cp /mnt/fcvm-btrfs/firecracker/firecracker-nested-*.bin artifacts/firecracker-nested
# # Pre-pull nginx image for faster nested tests:
# podman pull public.ecr.aws/nginx/nginx:alpine
# podman save -o artifacts/nginx-alpine.tar public.ecr.aws/nginx/nginx:alpine
# sudo podman build -t localhost/nested-test -f Containerfile.nested .
#
# The nested test is driven by test_nested_chain tests which:
@@ -48,7 +51,22 @@ COPY rootfs-config.toml /etc/fcvm/rootfs-config.toml
COPY nested.sh /usr/local/bin/nested
RUN chmod +x /usr/local/bin/nested

# Pre-pulled container images for faster nested tests (avoids FUSE pull overhead)
# These get loaded into podman storage at container startup
COPY artifacts/nginx-alpine.tar /var/lib/fcvm-images/nginx-alpine.tar

# Startup script that creates runtime dirs, loads pre-pulled images, and starts services
# /run/netns is needed for ip netns (bridged networking)
# /run/containers/storage is needed for podman
RUN printf '%s\n' \
    '#!/bin/bash' \
    'mkdir -p /run/netns /run/containers/storage' \
    '# Load pre-pulled images if not already present' \
    'if ! podman image exists public.ecr.aws/nginx/nginx:alpine 2>/dev/null; then' \
    ' echo "Loading pre-pulled nginx image..."' \
    ' podman load -i /var/lib/fcvm-images/nginx-alpine.tar 2>/dev/null || true' \
    'fi' \
    'nginx' \
    'exec sleep infinity' \
    > /usr/local/bin/entrypoint.sh && chmod +x /usr/local/bin/entrypoint.sh

# Default command - load images, start nginx (for health checks), and sleep
CMD ["/usr/local/bin/entrypoint.sh"]
19 changes: 19 additions & 0 deletions Makefile
@@ -463,10 +463,29 @@ setup-fcvm: setup-default
	@echo "==> Running fcvm setup --kernel-profile btrfs..."
	./target/release/fcvm setup --kernel-profile btrfs --build-kernels

# Setup nested profile (kernel + firecracker for running VMs inside VMs)
setup-nested: build setup-btrfs
	sudo ./target/release/fcvm setup --kernel-profile nested --build-kernels

# Build and install host kernel with the host patches from kernel/host/arm64/
# Requires reboot to activate the new kernel
install-host-kernel: build setup-btrfs
	sudo ./target/release/fcvm setup --kernel-profile nested --build-kernels --install-host-kernel
	@$(MAKE) verify-grub

# Verify grub.cfg matches /etc/default/grub (catches manual edits)
verify-grub:
	@EXPECTED=$$(grep '^GRUB_DEFAULT=' /etc/default/grub 2>/dev/null | cut -d'"' -f2); \
	ACTUAL=$$(sudo grep 'set default=' /boot/grub/grub.cfg 2>/dev/null | grep -v next_entry | head -1 | cut -d'"' -f2); \
	if [ "$$EXPECTED" != "$$ACTUAL" ]; then \
		echo "ERROR: grub.cfg out of sync with /etc/default/grub"; \
		echo " Expected: $$EXPECTED"; \
		echo " Actual: $$ACTUAL"; \
		echo " Fix with: sudo update-grub"; \
		exit 1; \
	else \
		echo "✓ GRUB configured for: $$EXPECTED"; \
	fi

# Run setup inside container (for CI - container has Firecracker)
container-setup-fcvm: container-build setup-btrfs
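
The `verify-grub` comparison can also be run outside `make`, against copies of the two files. Below is a sketch of the same check with the file paths as parameters; `grub_default_matches` is a hypothetical name. Unlike the Makefile recipe, it strips quotes with `tr` instead of using `cut -d'"' -f2`, so an unquoted value such as `GRUB_DEFAULT=0` is compared as `0` rather than as the whole line.

```bash
# grub_default_matches DEFAULTS_FILE GRUB_CFG: print "ok: <entry>" or a mismatch.
grub_default_matches() {
    local expected actual
    # GRUB_DEFAULT value with any surrounding quotes removed
    expected=$(sed -n 's/^GRUB_DEFAULT=//p' "$1" | head -n 1 | tr -d "\"'")
    # First "set default=" line that is not the next_entry branch
    actual=$(grep 'set default=' "$2" | grep -v next_entry | head -n 1 | cut -d'"' -f2)
    if [ "$expected" != "$actual" ]; then
        echo "mismatch: expected '$expected', actual '$actual'"
        return 1
    fi
    echo "ok: $expected"
}
```

Usage: `grub_default_matches /etc/default/grub /boot/grub/grub.cfg` (reading `grub.cfg` may require `sudo`).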
3 changes: 3 additions & 0 deletions fuse-pipe/Cargo.toml
@@ -45,6 +45,9 @@ fuser = { git = "https://github.com/ejc3/fuser.git", branch = "remap-file-range-
# Concurrent data structures
dashmap = "5.5"

# Checksum for corruption detection
crc32fast = "1.3"

[dev-dependencies]
tokio = { version = "1", features = ["rt-multi-thread", "macros", "test-util", "process", "time"] }
tempfile = "3"