diff --git a/docs/architecture/cri-passthrough.md b/docs/architecture/cri-passthrough.md index 36b63c8..1b60ad7 100644 --- a/docs/architecture/cri-passthrough.md +++ b/docs/architecture/cri-passthrough.md @@ -78,7 +78,7 @@ Windows named-pipe URIs use forward slashes after the scheme -- the path is `//. - **Linux**: CRI is fully supported by containerd v2. All crictl commands behave as they would against a standalone containerd install. - **Windows**: containerd v2 ships a native Windows CRI implementation (Hyper-V isolated containers). A handful of CRI features that assume Linux semantics (cgroups, mount propagation flags) are no-ops or return errors -- this mirrors upstream containerd behavior. -- **WSL-to-Windows**: when the Windows host routes Linux jobs to the WSL worker, `ephemerd crictl` on the host only sees the Windows containerd CRI. To inspect WSL-side Linux containers, use `wsl -- ephemerd crictl ...` inside the distro. +- **In-VM containerd on Windows**: when the Windows host routes Linux jobs to the Hyper-V Linux VM, `ephemerd crictl` on the host only sees the Windows containerd CRI. To inspect Linux containers inside the VM, exec into the VM (via `ephemerd debugexec` or the VM's console) and run `ephemerd crictl ...` against the VM's local containerd socket. ## Typical Debugging Workflow diff --git a/docs/architecture/embedded-containerd.md b/docs/architecture/embedded-containerd.md index 1df7225..4c13360 100644 --- a/docs/architecture/embedded-containerd.md +++ b/docs/architecture/embedded-containerd.md @@ -27,7 +27,7 @@ On startup, `server.New()`: 5. Calls `ctdserver.New(ctx, cfg)` to create the in-process server. 6. Creates a gRPC listener on the platform-appropriate socket and serves in a background goroutine. 7. Also creates a tTRPC listener for task/event APIs. -8. Optionally creates a TCP listener for remote access (used by the Windows host to connect to WSL containerd). +8. Optionally creates a TCP listener for remote access (used by the Windows or macOS host to connect to the in-VM containerd). 9. Connects an in-process containerd client and waits for it to become ready (up to 15 seconds). The server, gRPC listeners, and client all run in the same process. On shutdown, `Server.Stop()` closes the client, stops the server, cancels the context, and waits for the background goroutines to finish. @@ -48,7 +48,7 @@ The `SocketPath()` function in `pkg/containerd/server.go` returns the correct pa When `TCPPort` is set in the config (e.g., `--containerd-tcp-port 10000`), the server also listens on TCP. This is used for: -- **Windows host to WSL**: the Windows scheduler connects to WSL's containerd via TCP since named pipes do not cross the WSL boundary. +- **Windows host to Hyper-V Linux VM**: the Windows scheduler connects to the in-VM containerd via TCP since named pipes do not cross the VM boundary. - **macOS host to Linux VM**: the macOS host connects to containerd inside the Virtualization.framework Linux VM via TCP over NAT. The TCP bind address defaults to `127.0.0.1` but can be configured to `0.0.0.0` for VM environments where the host is on a different network interface. diff --git a/docs/architecture/forgejo-gitea.md b/docs/architecture/forgejo-gitea.md index a90eb04..b474b66 100644 --- a/docs/architecture/forgejo-gitea.md +++ b/docs/architecture/forgejo-gitea.md @@ -36,7 +36,7 @@ ephemerd exploits this by mounting its [fake Docker socket]({{< relref "fake-doc flowchart TB F["Forge Instance
(Forgejo or Gitea)"] - subgraph H ["ephemerd host (Linux, Windows via WSL2, or macOS via Vz)"] + subgraph H ["ephemerd host (Linux, Windows via Hyper-V Linux VM, or macOS via Vz)"] E[ephemerd] CTD["containerd"] DSock["Fake Docker Socket
pkg/dind
/var/run/docker.sock"] @@ -88,7 +88,7 @@ flowchart TB ### Lifecycle 1. ephemerd creates the runner container from the upstream runner image, with the fake Docker socket bind-mounted at `/var/run/docker.sock`. -2. containerd starts the runner -- on Linux directly, inside WSL2 on Windows, inside the Vz Linux VM on macOS. +2. containerd starts the runner -- on Linux directly, inside the Hyper-V Linux VM on Windows, inside the Vz Linux VM on macOS. 3. Runner registers with the forge as an ephemeral runner and long-polls `FetchTask`. 4. Forge returns a task -- workflow YAML bytes, context, secrets, vars. 5. act parses the workflow and determines the job image from `runs-on:` label mapping. @@ -185,7 +185,7 @@ Forgejo/Gitea Actions is a Linux-jobs-only ecosystem today. On all three host OS | Host OS | How Linux containers run | |---------|-------------------------| | Linux | Direct containerd | -| Windows | containerd inside WSL2 | +| Windows | containerd inside Hyper-V Linux VM | | macOS | containerd inside Vz Linux VM | ## Configuration diff --git a/docs/architecture/macos-vms.md b/docs/architecture/macos-vms.md index f2eb446..71299bb 100644 --- a/docs/architecture/macos-vms.md +++ b/docs/architecture/macos-vms.md @@ -46,7 +46,7 @@ The rootfs tarball and Linux binary are embedded in the macOS binary via `go:emb - **virtio-fs**: the host's data directory is shared into the VM at `/mnt/ephemerd`. The ephemerd Linux binary lives here -- no need to copy it into the disk image. It loads into memory on exec and runs at native speed. - **TCP over NAT**: containerd inside the VM listens on a TCP port. The host connects a gRPC containerd client to `127.0.0.1:`. -Unlike Windows WSL dispatch, macOS does not need a separate dispatch layer. The containerd gRPC client is platform-agnostic -- the macOS host binary can create Linux containers directly via the TCP connection. Only the container runtime code (OCI spec, snapshotter, networking) runs inside the VM. +Unlike the Windows Hyper-V dispatch, macOS does not need a separate dispatch layer. The containerd gRPC client is platform-agnostic -- the macOS host binary can create Linux containers directly via the TCP connection. Only the container runtime code (OCI spec, snapshotter, networking) runs inside the VM. ### Two Boot Modes diff --git a/docs/architecture/multi-forge-providers.md b/docs/architecture/multi-forge-providers.md index ea604ad..106c5eb 100644 --- a/docs/architecture/multi-forge-providers.md +++ b/docs/architecture/multi-forge-providers.md @@ -157,7 +157,7 @@ Only one provider should be configured at a time. Precedence when multiple secti The entire container infrastructure is provider-agnostic: - Container runtime (`pkg/runtime`) -- WSL dispatch (Linux jobs on Windows) +- Hyper-V Linux VM dispatch (Linux jobs on Windows) - Networking (CNI on Linux, HCN on Windows) - Embedded containerd - gRPC control plane (status, jobs, drain) diff --git a/docs/architecture/overview.md b/docs/architecture/overview.md index 726a909..fdbe871 100644 --- a/docs/architecture/overview.md +++ b/docs/architecture/overview.md @@ -82,7 +82,7 @@ Standard OCI containers via embedded containerd, running directly on the host ke containerd runs natively on Windows and supports Hyper-V isolation. Each container gets its own kernel in a lightweight VM -- real isolation, malicious code cannot escape to the host. Same OCI images, same containerd APIs, just compiled for Windows. Startup ~5-10s. Networking via HCN (Host Compute Network) with NAT and per-endpoint ACL policies. -Linux jobs on a Windows host are dispatched to a WSL2 worker via gRPC. See [Windows WSL dispatch]({{< relref "windows-wsl-dispatch" >}}). +Linux jobs on a Windows host are dispatched via gRPC to a Hyper-V Linux VM that ephemerd boots and manages directly. See [Windows Hyper-V dispatch]({{< relref "windows-wsl-dispatch" >}}). ### macOS: Virtualization.framework @@ -98,7 +98,7 @@ Because Windows can run Hyper-V Linux VMs and macOS can run Virtualization.frame |------|-----------|----------------| | Linux x86_64 | containerd (direct) | -- | | Linux arm64 | containerd (direct) | -- | -| Windows x86_64 | containerd in WSL2 Linux VM | Hyper-V Windows containers | +| Windows x86_64 | containerd in Hyper-V Linux VM | Hyper-V Windows containers | | macOS arm64 | containerd in Virtualization.framework Linux VM | Ephemeral macOS VMs (clone-on-write) | A Windows box and a Mac Mini together cover every combination: linux/amd64, linux/arm64, windows/amd64. @@ -111,7 +111,7 @@ Each OS/arch combination produces one self-contained binary with containerd comp |--------|--------|----------------------| | linux/amd64 | `ephemerd` | containerd direct | | linux/arm64 | `ephemerd` | containerd direct | -| windows/amd64 | `ephemerd.exe` | containerd + Hyper-V (Windows jobs) / WSL2 (Linux jobs) | +| windows/amd64 | `ephemerd.exe` | containerd + Hyper-V (Windows jobs) / Hyper-V Linux VM (Linux jobs) | | darwin/arm64 | `ephemerd` | Virtualization.framework Linux VM + containerd inside | No runtime dependencies beyond the OS kernel, Hyper-V (Windows), or Virtualization.framework (macOS). diff --git a/docs/architecture/pre-baked-rootfs.md b/docs/architecture/pre-baked-rootfs.md index 026f2da..500b3c2 100644 --- a/docs/architecture/pre-baked-rootfs.md +++ b/docs/architecture/pre-baked-rootfs.md @@ -3,20 +3,20 @@ title: Pre-Baked Rootfs weight: 7 --- -The WSL and macOS Linux VM rootfs is an Alpine minirootfs with gcompat and iptables baked in at compile time. This eliminates network-dependent package installation during boot. +The Linux VM rootfs (Hyper-V on Windows, Vz on macOS) and the temporary WSL distro that `ephemerd run` uses are all Alpine minirootfs with gcompat and iptables baked in at compile time. This eliminates network-dependent package installation during boot. ## Context -Every WSL distro boot (and Vz Linux VM boot) needs two packages that are not in the stock Alpine minirootfs: +Every Linux VM boot (Hyper-V on Windows, Vz on macOS) and every `ephemerd run` WSL distro import needs two packages that are not in the stock Alpine minirootfs: - **gcompat** -- glibc compatibility shim required by `containerd-shim-runc-v2`, which is built against glibc. - **iptables** -- required by CNI plugins for container network NAT rules. -Previously these were installed at runtime via `apk add --no-cache gcompat iptables` after each distro import. This had several problems: +Previously these were installed at runtime via `apk add --no-cache gcompat iptables` after each boot/import. This had several problems: - 10-30s of boot time spent downloading and installing packages over the network. -- DNS flakes -- WSL networking is not always ready immediately after distro import, requiring a retry loop with backoffs. -- The only network-dependent step in the entire distro boot sequence. +- DNS flakes -- guest networking is not always ready immediately after the VM/distro starts, requiring a retry loop with backoffs. +- The only network-dependent step in the entire boot sequence. - Multiplied cost -- `ephemerd run` creates a fresh distro per invocation, paying this penalty every time. ## How It Works diff --git a/docs/architecture/windows-wsl-dispatch.md b/docs/architecture/windows-wsl-dispatch.md index 9bfb741..19a1d2a 100644 --- a/docs/architecture/windows-wsl-dispatch.md +++ b/docs/architecture/windows-wsl-dispatch.md @@ -1,25 +1,32 @@ --- -title: Windows WSL Dispatch +title: Windows Hyper-V Dispatch weight: 3 +aliases: + - /architecture/windows-wsl-dispatch/ --- -On Windows, ephemerd runs a single scheduler that handles both Windows and Linux jobs. Windows jobs run natively in Hyper-V containers. Linux jobs are dispatched to a WSL2 worker via gRPC. +On Windows, ephemerd runs a single scheduler that handles both Windows and Linux jobs. Windows jobs run natively as Hyper-V isolated containers. Linux jobs are dispatched via gRPC to a Hyper-V Linux VM that ephemerd boots and manages directly. + +## Why a Hyper-V VM (not WSL2) + +An earlier revision dispatched Linux jobs to a WSL2 distro. That works when ephemerd runs as a user process, but Windows Services execute as `LocalSystem`, and WSL2 has no `LocalSystem` support — calling `wsl --import` or `wsl --exec` from `LocalSystem` fails with `0x80370102` / `WSL_E_USER_NOT_REGISTERED`. The Hyper-V Compute Service (HCS) has no such restriction, so ephemerd creates the Linux VM by calling `vmcompute.dll` directly. The same code path works for an interactive user *and* for the installed Windows service. ## Architecture -One poller on Windows dispatches Linux jobs to WSL via gRPC. WSL runs containerd-only plus a dispatch worker -- no scheduler, no GitHub credentials. +One poller on Windows dispatches Linux jobs to the Hyper-V VM via gRPC. The VM runs `ephemerd serve --containerd-only` plus a dispatch worker — no scheduler, no GitHub credentials. ``` Windows Host (ephemerd.exe serve): - +-- Containerd (Windows, named pipe) + +-- Containerd (Windows, named pipe + 127.0.0.1 TCP) +-- Scheduler (single poller for ALL jobs) | +-- Windows job -> local Runtime.Create() on Windows containerd - | +-- Linux job -> gRPC DispatchClient -> WSL dispatch server - +-- WSL VM boot (containerd-only + dispatch worker) + | +-- Linux job -> gRPC DispatchClient -> Hyper-V VM dispatch server + +-- Hyper-V Linux VM boot (HCS / vmcompute.dll) -WSL (ephemerd serve --containerd-only): +Hyper-V Linux VM (ephemerd serve --containerd-only): +-- Containerd (Linux, TCP :10000) - +-- Runner extracted, CNI extracted, networking initialized + +-- Persistent VHDX rootfs (data dir / containerd state) + +-- Embedded Linux ephemerd binary, runner, CNI, gcompat, iptables +-- Dispatch gRPC server (TCP :10001) +-- CreateJob(id, image, jitConfig) -> local Runtime.Create() +-- WaitJob(id) -> local Runtime.Wait() @@ -36,7 +43,7 @@ A Windows-compiled `Runtime.Create()` cannot create Linux containers. The runtim - Container I/O (`cio.NullIO` on Windows, log file on Linux) - Runner mount paths (`C:\actions-runner` vs `/actions-runner`) -The Linux-specific code must run inside WSL. The gRPC dispatch layer bridges the gap: the Windows scheduler sends job requests to the WSL worker, which creates Linux containers using its own Linux-compiled runtime. +The Linux-specific code must run inside the Linux VM. The gRPC dispatch layer bridges the gap: the Windows scheduler sends job requests to the in-VM worker, which creates Linux containers using its own Linux-compiled runtime. ## Protobuf Dispatch Service @@ -65,7 +72,7 @@ message DestroyJobResponse {} ## Key Components -### Dispatch Server (WSL side) +### Dispatch Server (Linux VM side) Implemented in `pkg/scheduler/dispatch.go`. The `dispatchServer` struct wraps a `*runtime.Runtime` and a map of active `RunnerEnv` objects: @@ -81,10 +88,10 @@ Also in `pkg/scheduler/dispatch.go`. The `DispatchClient` struct holds a gRPC co ### Containerd-Only Mode -When WSL boots ephemerd with `--containerd-only`: +When the in-VM ephemerd boots with `--containerd-only`: -1. Starts embedded containerd with a TCP listener. -2. Extracts the runner binary and CNI plugins. +1. Starts embedded containerd with a TCP listener on `0.0.0.0:10000`. +2. Extracts the runner binary and CNI plugins from its embedded payload. 3. Initializes networking (CNI bridge, stale bridge cleanup). 4. Creates a local `runtime.Runtime`. 5. Starts the dispatch gRPC server on `containerdPort + 1` (default port 10001). @@ -104,38 +111,47 @@ Windows-labeled jobs go through the normal local `Runtime.Create()` path. ## End-to-End Flow 1. Windows host starts: native containerd + single scheduler. -2. WSL VM boots in background: containerd-only + dispatch worker. +2. Hyper-V Linux VM boots in background: containerd-only + dispatch worker. 3. GitHub job queued with `runs-on: [self-hosted, linux, x64]`. 4. Windows scheduler sees it, detects `"linux"` label and `LinuxDispatcher != nil`. 5. Registers JIT runner with `["self-hosted", "linux", "x64"]` labels. -6. Calls `dispatcher.Create(name, image, jitConfig)` -- gRPC to WSL. -7. WSL dispatch server creates a Linux container using its local Runtime. +6. Calls `dispatcher.Create(name, image, jitConfig)` -- gRPC to the VM IP. +7. Dispatch server in VM creates a Linux container using its local Runtime. 8. Windows scheduler calls `dispatcher.Wait(name)` -- blocks until job completes. -9. Windows scheduler calls `dispatcher.Destroy(name)` -- cleans up container + networking in WSL. +9. Windows scheduler calls `dispatcher.Destroy(name)` -- cleans up container + networking in the VM. 10. Windows jobs follow the normal local Runtime flow. -## WSL VM Lifecycle +## Hyper-V VM Lifecycle -The WSL VM is managed by `pkg/vm/linuxvm_windows.go`: +The Linux VM is managed by `pkg/vm/linuxvm_windows.go` via the HCS (Host Compute Service) API: -- On startup, imports a WSL distro from the embedded pre-built rootfs. -- Runs the Linux ephemerd binary from `/mnt/c/` (Windows disk mount, avoids slow 9P copy into the distro). -- Launches with `--containerd-only` -- no GitHub credentials are needed in WSL. -- After containerd is ready, connects a dispatch gRPC client to port `containerdPort + 1`. -- On shutdown, the distro is unregistered via `wsl --unregister`. +- On startup, the embedded Linux kernel (`vmlinuz`) and initrd (containing a pre-baked Alpine rootfs + the cross-compiled Linux `ephemerd` binary) are written into `/vm/linux/`. +- A persistent VHDX root disk is created on first boot at `/containerd/linux-root/root.vhdx` (default 100 GB). Image content and containerd metadata live here, so a host restart doesn't re-pull every image. +- ephemerd builds an HCS compute system document for a KernelDirect (LCOW) boot and calls `vmcompute.dll` directly. We don't use hcsshim's `uvm.CreateLCOW` because it assumes a Microsoft GCS is running inside the VM (vsock-based), and we run a normal Linux userspace instead. +- An HCN endpoint on the Default Switch is attached to the VM. ephemerd watches WMI events to discover the assigned IP, then connects: + - `:10000` -- containerd gRPC (only used by buildkit and per-job runtime calls; jobs themselves see a unix socket inside the VM). + - `:10001` -- dispatch gRPC (CreateJob / WaitJob / DestroyJob). +- The Linux ephemerd binary launches with `--containerd-only`. No PEM file, no config.toml, no GitHub credentials inside the VM. +- On shutdown, ephemerd asks HCS to terminate the compute system and releases the HCN endpoint. The VHDX persists for the next boot. -The WSL VM boots asynchronously in a background goroutine. Windows jobs can run immediately while the WSL worker starts up. Linux jobs queue until the dispatch client is connected. +The VM boots asynchronously in a background goroutine. Windows jobs can run immediately while the Linux VM starts up. Linux jobs queue until the dispatch client is connected to the VM. ## Pre-Baked Rootfs -The WSL rootfs is an Alpine minirootfs with gcompat and iptables baked in at compile time. This eliminates network-dependent `apk add` calls during boot. See [Pre-baked rootfs]({{< relref "pre-baked-rootfs" >}}). +The rootfs inside the initrd is an Alpine minirootfs with gcompat and iptables baked in at compile time. This eliminates network-dependent `apk add` calls during boot. See [Pre-baked rootfs]({{< relref "pre-baked-rootfs" >}}). ## What This Architecture Removes Compared to the earlier dual-scheduler approach: -- No PEM file copy into WSL. -- No config.toml rewriting for WSL. -- No duplicate GitHub polling from WSL. -- No GitHub App token refresh in WSL. -- WSL has no GitHub credentials at all. +- No PEM file copy into the worker. +- No config.toml rewriting for the worker. +- No duplicate GitHub polling from the worker. +- No GitHub App token refresh in the worker. +- The Linux worker has no GitHub credentials at all. + +Compared to the earlier WSL2-based worker: + +- Works under `LocalSystem`, so the installed Windows service can manage Linux jobs. +- No dependency on the `wsl.exe` toolchain or any WSL distro registration. +- Boot is deterministic — same kernel, same initrd, same VHDX root every time. diff --git a/docs/cli/serve.md b/docs/cli/serve.md index d99c22d..2940469 100644 --- a/docs/cli/serve.md +++ b/docs/cli/serve.md @@ -14,7 +14,7 @@ ephemerd serve [flags] | Flag | Default | Description | |------|---------|-------------| | `--config`, `-c` | `/config.toml` | Path to config file | -| `--containerd-tcp-port` | (none) | Also expose containerd on a TCP port (used by WSL host integration) | +| `--containerd-tcp-port` | (none) | Also expose containerd on a TCP port (used by the in-VM worker so the Windows host can reach containerd over TCP) | | `--containerd-tcp-addr` | `127.0.0.1` | Bind address for the containerd TCP listener (use `0.0.0.0` when host lives outside the network namespace) | | `--containerd-only` | `false` | Only run containerd and the dispatch worker (no scheduler, no GitHub polling, no runner extraction) | | `--dind` | `false` | Mount a fake Docker socket into each container (overrides config file setting) | @@ -32,23 +32,23 @@ When `serve` starts, it performs these steps in order: 7. **Initialize networking** -- sets up the container network (CNI bridge on Linux, HCN NAT on Windows). 8. **Install firewall rules** -- blocks container access to RFC1918 and link-local ranges. 9. **Create GitHub client** -- authenticates using `GITHUB_TOKEN` or GitHub App credentials from the config. -10. **Wait for Linux dispatcher** -- if a WSL2 VM is booting in the background (Windows only), waits for the gRPC dispatch client to become ready. +10. **Wait for Linux dispatcher** -- if a Hyper-V Linux VM is booting in the background (Windows only), waits for the gRPC dispatch client to become ready. 11. **Configure webhook tunnel** -- sets up localtunnel or ngrok for webhook delivery, or falls back to polling mode. 12. **Start scheduler** -- begins discovering and processing GitHub Actions jobs. 13. **Start metrics server** -- if metrics are enabled in the config, starts the Prometheus metrics endpoint. ## Containerd-only mode -When `--containerd-only` is set, the daemon runs a stripped-down mode intended for the WSL2 Linux worker: +When `--containerd-only` is set, the daemon runs a stripped-down mode intended for the in-VM Linux worker (Hyper-V Linux VM on Windows, Vz Linux VM on macOS): - Starts containerd with the TCP listener. - Extracts the runner and CNI plugins. -- Cleans stale CNI bridges from previous WSL boots. +- Cleans stale CNI bridges from previous boots. - Sets up networking and firewall rules. - Starts the gRPC dispatch server on `containerd-tcp-port + 1`. - Does **not** start the scheduler, poll GitHub, or require GitHub credentials. -The Windows host dispatches Linux jobs to this worker via gRPC. +The host dispatches Linux jobs to this worker via gRPC. ## Signal handling @@ -75,7 +75,7 @@ sudo ephemerd serve # Start with a custom config file sudo ephemerd serve --config /etc/ephemerd/config.toml -# Start in WSL worker mode (used by Windows host integration) +# Start in in-VM worker mode (invoked automatically inside the Hyper-V or Vz Linux VM) ephemerd serve --containerd-tcp-port 10000 --containerd-tcp-addr 0.0.0.0 --containerd-only # Start with Docker-in-Docker support diff --git a/docs/contributing/dev-setup.md b/docs/contributing/dev-setup.md index 4b1d147..2d64129 100644 --- a/docs/contributing/dev-setup.md +++ b/docs/contributing/dev-setup.md @@ -32,7 +32,7 @@ mage ci | `mage lint` | Download golangci-lint and run it | | `mage test` | Download embedded deps and run all tests | | `mage build:build` | Compile ephemerd for the current OS | -| `mage build:windows` | Two-stage Windows build (cross-compiles and embeds Linux binary for WSL) | +| `mage build:windows` | Two-stage Windows build (cross-compiles and embeds the Linux binary that runs inside the Hyper-V Linux VM) | | `mage e2e` | Unprivileged e2e tests (requires `GITHUB_TOKEN`) | | `mage e2eall` | All e2e tests including privileged (requires root + containerd) | | `mage e2eforgejo` | Forgejo provider e2e (requires Docker with compose) | diff --git a/docs/contributing/project-layout.md b/docs/contributing/project-layout.md index 3b0b83b..4a6d7c7 100644 --- a/docs/contributing/project-layout.md +++ b/docs/contributing/project-layout.md @@ -22,7 +22,7 @@ pkg/ Library packages metrics/ Prometheus metrics endpoint artifacts/ OCI artifact extraction for macOS VM jobs workflow/ Local workflow parser and runner (ephemerd run) - vm/ Linux VM (WSL/Vz) and macOS VM (Vz) + vm/ Linux VM (Hyper-V on Windows, Vz on macOS) and macOS VM (Vz) runner/ Embedded GitHub Actions runner binary api/v1/ gRPC protobuf definitions mage/ Mage build and download targets @@ -38,7 +38,7 @@ Platform code uses Go build tags and file name suffixes: | Suffix | Platform | Examples | |---|---|---| | `*_linux.go` | Linux only | CNI networking, iptables, seccomp | -| `*_windows.go` | Windows only | HCN networking, Hyper-V, WSL | +| `*_windows.go` | Windows only | HCN networking, Hyper-V Windows containers, Hyper-V Linux VM, WSL run-mode distro | | `*_darwin.go` | macOS only | Virtualization.framework | | `*_stub.go` / `*_other.go` | Fallback stubs | No-op implementations for unsupported platforms | diff --git a/docs/getting-started/configuration.md b/docs/getting-started/configuration.md index cf60087..630c0da 100644 --- a/docs/getting-started/configuration.md +++ b/docs/getting-started/configuration.md @@ -223,7 +223,7 @@ Default images when `default_image` is not set: - **Linux:** `ghcr.io/actions/actions-runner:latest` - **Windows:** `mcr.microsoft.com/windows/servercore:ltsc20XX` (auto-detected from host build) -**VM resource planning (Windows and macOS):** On Windows and macOS, `max_concurrent` applies to the entire ephemerd instance — Linux container jobs and native OS jobs share the same concurrency pool. All Linux jobs run inside a single VM (WSL2 on Windows, Virtualization.framework on macOS), so if `max_concurrent = 4`, that VM could be running 4 jobs simultaneously. Size the VM's CPU and memory (`[vm.linux]`) accordingly, or jobs will compete for resources and slow each other down. +**VM resource planning (Windows and macOS):** On Windows and macOS, `max_concurrent` applies to the entire ephemerd instance — Linux container jobs and native OS jobs share the same concurrency pool. All Linux jobs run inside a single VM (Hyper-V Linux VM on Windows, Virtualization.framework on macOS), so if `max_concurrent = 4`, that VM could be running 4 jobs simultaneously. Size the VM's CPU and memory (`[vm.linux]`) accordingly, or jobs will compete for resources and slow each other down. ### `[vm.linux]` @@ -236,7 +236,7 @@ Linux VM for running Linux jobs on Windows or macOS hosts. | `memory_mb` | integer | `2048` | Memory in MB | | `disk_size_gb` | integer | `50` | Sparse disk size in GB | -On Windows, this creates a WSL2 distro with an embedded rootfs. On macOS, it uses Virtualization.framework. +On Windows, this creates a Hyper-V Linux VM via the HCS (Host Compute Service) API, booted from an embedded kernel + initrd onto a persistent VHDX. On macOS, it uses Virtualization.framework. ### `[vm.macos]` diff --git a/docs/guides/how-it-works.md b/docs/guides/how-it-works.md index 2833ae1..bbe4bcc 100644 --- a/docs/guides/how-it-works.md +++ b/docs/guides/how-it-works.md @@ -32,30 +32,32 @@ graph TD On Windows, ephemerd manages two types of jobs from a single process: - **Windows jobs** run as Hyper-V isolated containers. Each container gets its own Windows kernel, providing strong isolation. The embedded containerd talks to the HCS (Host Compute Service) via the runhcs shim. -- **Linux jobs** are dispatched to a WSL2 distro via gRPC. The WSL distro runs a second copy of ephemerd (cross-compiled for Linux and embedded in the Windows binary) with its own containerd instance. +- **Linux jobs** are dispatched via gRPC to a Hyper-V Linux VM that ephemerd boots and manages directly. The VM runs a second copy of ephemerd (cross-compiled for Linux and embedded in the Windows binary) with its own containerd instance. -A single scheduler on the Windows host polls GitHub for all jobs and routes them by OS label. The WSL worker has no GitHub credentials and no scheduler -- it only runs containers on demand. +ephemerd creates the Linux VM by calling the Hyper-V Compute Service (`vmcompute.dll`) directly, with a KernelDirect boot from an embedded Linux kernel and initrd. The root filesystem lives on a persistent VHDX under the data directory, and the VM is attached to the Hyper-V Default Switch with an HCN endpoint that ephemerd allocates. This replaced an earlier WSL2-based worker so that ephemerd can run from any Windows security context -- including `LocalSystem`, which is what the Windows service uses, and which WSL2 does not support. + +A single scheduler on the Windows host polls GitHub for all jobs and routes them by OS label. The Linux VM has no GitHub credentials and no scheduler -- it only runs containers on demand. ```mermaid graph TD A[ephemerd.exe serve] --> B[Windows containerd] A --> C[Scheduler - single poller] - A --> D[Boot WSL2 distro] + A --> D[Boot Hyper-V Linux VM] - C -->|Windows job| E[Hyper-V container] + C -->|Windows job| E[Hyper-V Windows container] B --> E E --> F[HCN NAT network] C -->|Linux job| G[gRPC DispatchClient] - G -->|CreateJob| H[WSL dispatch server] - H --> I[Linux containerd] + G -->|TCP to VM IP:10001| H[Dispatch server in VM] + H --> I[Linux containerd in VM] I --> J[OCI container] - J --> K[CNI bridge network] + J --> K[CNI bridge network in VM] - D -->|imports rootfs| H + D -->|kernel + initrd + VHDX root| H ``` -The WSL distro is created from an embedded Alpine rootfs with gcompat and iptables pre-installed. The Linux ephemerd binary runs from `/mnt/c/` (the Windows disk mount) to avoid the slow 9P filesystem copy into WSL. On shutdown, the distro is unregistered and destroyed. +The VM boots from an embedded Alpine-based initrd with gcompat and iptables baked in, mounts the persistent VHDX as its rootfs, and starts the embedded Linux `ephemerd` binary in `--containerd-only` mode. The Windows host reaches the VM over TCP -- containerd gRPC on `:10000` and the dispatch service on `:10001`. On shutdown, the VM is stopped via HCS and the HCN endpoint is released; the VHDX persists across restarts so image content survives reboots. ### macOS @@ -93,14 +95,14 @@ Every supported host can run both its native jobs and Linux jobs: | Host OS | Linux jobs | Native jobs | |---------|-----------|-------------| | **Linux** | OCI containers (direct) | OCI containers (direct) | -| **Windows** | OCI containers (via WSL2) | Hyper-V containers | +| **Windows** | OCI containers (via Hyper-V Linux VM) | Hyper-V Windows containers | | **macOS** | OCI containers (via Vz Linux VM) | APFS clone-on-write macOS VMs | This means a single ephemerd host can serve workflows that need both Linux and native platform steps. ### Resource planning for VMs -On Windows and macOS, the `max_concurrent` setting applies globally — Linux container jobs and native OS jobs share the same concurrency pool. All Linux container jobs run inside a single VM (WSL2 on Windows, Virtualization.framework on macOS), so if `max_concurrent = 4`, that one VM could be running up to 4 concurrent jobs. +On Windows and macOS, the `max_concurrent` setting applies globally — Linux container jobs and native OS jobs share the same concurrency pool. All Linux container jobs run inside a single VM (Hyper-V Linux VM on Windows, Virtualization.framework on macOS), so if `max_concurrent = 4`, that one VM could be running up to 4 concurrent jobs. Size the Linux VM resources accordingly: @@ -117,7 +119,7 @@ If the VM is undersized, jobs will compete for CPU and memory and slow each othe ## One Image, Every Host -The same Dockerfile produces an image that runs on all three platforms. ephemerd always uses containerd to run Linux OCI containers -- whether that containerd is running natively (Linux), inside WSL2 (Windows), or inside a Virtualization.framework VM (macOS). The container runtime is identical in all cases. +The same Dockerfile produces an image that runs on all three platforms. ephemerd always uses containerd to run Linux OCI containers -- whether that containerd is running natively (Linux), inside a Hyper-V Linux VM (Windows), or inside a Virtualization.framework VM (macOS). The container runtime is identical in all cases. ```yaml jobs: diff --git a/docs/guides/providers.md b/docs/guides/providers.md index 3d52b74..4ab0f02 100644 --- a/docs/guides/providers.md +++ b/docs/guides/providers.md @@ -148,7 +148,7 @@ agent_secret = "your-shared-secret" Regardless of which provider is active, the following subsystems are shared: - Container runtime (containerd, OCI images, overlayfs/windows snapshotter) -- WSL2 dispatch for Linux jobs on Windows hosts +- Hyper-V Linux VM dispatch for Linux jobs on Windows hosts - macOS VM support via Virtualization.framework - CNI bridge networking (Linux) and HCN NAT networking (Windows) - Concurrency limiting, job dedup, and graceful drain diff --git a/docs/guides/runner-images.md b/docs/guides/runner-images.md index d6c5cb0..7d088e1 100644 --- a/docs/guides/runner-images.md +++ b/docs/guides/runner-images.md @@ -213,7 +213,7 @@ default_image = "ghcr.io/your-org/ci-image:latest" ## One Image, Every Host -The same Linux container image runs identically on Linux, Windows (via WSL2), and macOS (via Virtualization.framework). In all three cases, containerd is the runtime that pulls and executes the image. There is no need to maintain separate images per host platform. +The same Linux container image runs identically on Linux, Windows (via the Hyper-V Linux VM), and macOS (via Virtualization.framework). In all three cases, containerd is the runtime that pulls and executes the image. There is no need to maintain separate images per host platform. ## Reference: ephemerd CI Images