fix(image): streamline the Incus base image (#57, #45) #125

Merged: stuffbucket merged 6 commits into main from fix/incus-base-image on Jun 22, 2026

@stuffbucket

Owner

Summary

Addresses the two open issues against bladerunner's existing Incus base image (Debian Trixie genericcloud + cloud-init), as the first-step cleanup before adding new guest base-image types (#119/#120).

#57 — first-boot console invisible / reboot never fires

The bootstrap emitted br_stage breadcrumbs to /dev/console, but under VZ the captured device is the virtio console (/dev/hvc0) while Debian's default cmdline routes /dev/console → a non-existent ttyS0 — so first-boot progress vanished. To compensate, bootcmd forced a first-boot reboot to activate a console=hvc0 grub drop-in. That reboot is exactly what #57 reports never fires (cloud-init's pid-1 reaper aborts it), and it doubled cold-boot time.

The issue's own "Option A" (power_state reboot) is actually broken: power_state fires after runcmd, and cloud-init runcmd is per-instance — it won't re-run post-reboot — so the bootstrap would run once, invisibly, then reboot into a boot with no runcmd.

Fix: write breadcrumbs straight to /dev/hvc0 (present in late userspace regardless of the console= cmdline) with a /dev/console fallback, and delete the forced reboot. First-boot progress is visible immediately, no double-boot. The grub drop-in stays so the kernel's own console routes to hvc0 on subsequent natural boots.

  • internal/provision/cloudinit.go: hvc0 breadcrumbs; bootcmd keeps update-grub, drops the .boot1-rebooted reboot.
  • Tests: TestBuildCloudInit_NoFirstBootReboot (regression guard that the reboot stays gone), breadcrumb assertion now checks >/dev/hvc0, comments updated.
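
The hvc0-first, /dev/console-fallback write described above can be sketched as a small shell helper. This is only an illustration under assumptions: the variable names BR_PRIMARY and BR_FALLBACK and the "br_stage: ..." line format are hypothetical, and the real helper emitted by internal/provision/cloudinit.go may look different.

```shell
# Hypothetical breadcrumb helper; names and message format are assumptions.
BR_PRIMARY="${BR_PRIMARY:-/dev/hvc0}"       # VZ-captured virtio console
BR_FALLBACK="${BR_FALLBACK:-/dev/console}"  # whatever console= routes to

br_stage() {
    msg="br_stage: $*"
    # Try the virtio console first. If the device is missing or not
    # writable, the redirect fails and we fall through to the fallback.
    { printf '%s\n' "$msg" > "$BR_PRIMARY"; } 2>/dev/null && return 0
    # Never let a breadcrumb failure abort the bootstrap itself.
    { printf '%s\n' "$msg" > "$BR_FALLBACK"; } 2>/dev/null || true
}
```

A bootstrap step would then call something like `br_stage apt-install` before each phase. Because neither write can fail the caller, breadcrumbs stay purely diagnostic.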

#45 — pre-baked guest image build never succeeded

The build-guest-image pipeline scaffolding landed long ago (#46/#50/#51) but has never produced a publishable image: every run dies with 'passt exited with status 1'. libguestfs 1.52 on the GitHub-hosted runners can't bring up its appliance network, so virt-customize --install (which needs apt) aborts. As a result, the hosted-image opt-in (UseHostedGuestImage) points at a guest-image-latest release that doesn't exist.

Fix: add a --method auto|guestfish|nbd selector to scripts/build-guest-image.sh and force nbd in CI. The qemu-nbd + chroot path runs apt over the host network namespace and never boots a libguestfs appliance, sidestepping passt. virt-sparsify (no network) still handles the compress step. Root partition is now probed by filesystem instead of hardcoding p1.

  • scripts/build-guest-image.sh, .github/workflows/build-guest-image.yml.
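
A minimal sketch of how a --method auto|guestfish|nbd selector might be parsed and validated. The function name parse_method, the default of auto, and the exit code 2 are assumptions for illustration, not the script's actual code.

```shell
# Hypothetical parser for the --method flag; prints the chosen method.
parse_method() {
    method="auto"   # assumed default when the flag is absent
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --method)
                # Guard against a trailing --method with no value.
                [ "$#" -ge 2 ] || { echo "--method needs a value" >&2; return 2; }
                method="$2"
                shift 2
                ;;
            --method=*)
                method="${1#--method=}"
                shift
                ;;
            *)
                shift   # ignore arguments this sketch does not handle
                ;;
        esac
    done
    case "$method" in
        auto|guestfish|nbd) printf '%s\n' "$method" ;;
        *) echo "unknown --method: $method" >&2; return 2 ;;
    esac
}
```

In CI the workflow would then pin the path explicitly, for example `scripts/build-guest-image.sh --method nbd`, so a runner never falls back to the libguestfs appliance.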

Testing

  • go test ./internal/provision/... passes; golangci-lint reports 0 issues; gofmt clean.
  • bash -n + shellcheck clean on the build script.
  • CI build validation: dispatching build-guest-image.yml on this branch to confirm the nbd path builds + publishes (passt fix can only be verified on the runner). Will report the result on this PR.

Notes / follow-ups

Write br_stage breadcrumbs straight to /dev/hvc0 (the VZ-captured virtio
console, present in late userspace regardless of the kernel console=
cmdline) with a /dev/console fallback, and drop the forced first-boot
reboot from bootcmd.

The reboot existed only to activate a console=hvc0 grub drop-in before
runcmd, but it never fired reliably (cloud-init's pid-1 reaper) and
doubled cold-boot time. Writing to hvc0 directly makes first-boot
progress visible with no reboot. The grub drop-in stays so the KERNEL's
own console routes to hvc0 on subsequent natural boots.

Fixes #57.

Every build-guest-image run failed on 'passt exited with status 1':
libguestfs 1.52 on the GitHub-hosted runners cannot bring up its
appliance network, so virt-customize --install (which needs apt) aborts.

Add a --method auto|guestfish|nbd selector and force 'nbd' in CI. The
qemu-nbd + chroot path runs apt over the host network namespace and
never boots a libguestfs appliance, sidestepping passt entirely.
virt-sparsify (no network) still handles the compress step. Also probe
the ext4 root partition by filesystem instead of hardcoding p1.

Progresses #45 (pipeline had never produced a publishable image).
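
Probing the root partition by filesystem rather than by index could look roughly like the sketch below. The function names are hypothetical; `blkid -o value -s TYPE` is the standard util-linux way to print a device's filesystem type, and the candidate list (for example `/dev/nbd0p*`) is an assumption about how the script enumerates partitions.

```shell
# fs_type prints the filesystem type of a block device (empty if unknown).
fs_type() {
    blkid -o value -s TYPE "$1" 2>/dev/null
}

# find_root_part prints the first candidate whose filesystem is ext4.
# Hypothetical helper: the real script may select the root differently.
find_root_part() {
    for part in "$@"; do
        if [ "$(fs_type "$part")" = "ext4" ]; then
            printf '%s\n' "$part"
            return 0
        fi
    done
    return 1   # no ext4 partition among the candidates
}
```

With the image attached, a call such as `find_root_part /dev/nbd0p*` skips non-ext4 partitions (EFI, BIOS boot) regardless of their numbering.
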
The nbd path mounts fine and probes the ext4 root, but apt failed with
'Temporary failure resolving deb.debian.org': the Debian cloud image's
/etc/resolv.conf is a systemd-resolved symlink that dangles inside the
chroot. Copy the host resolver in (the chroot shares the host net
namespace) before apt, and restore the original symlink afterwards so
the baked image is unchanged.

Progresses #45.
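
The resolver swap can be sketched as a pair of helpers: one moves the image's resolv.conf aside and copies the host's in, the other puts the original back untouched. The function names and the `.br-orig` suffix are hypothetical, and the sketch assumes the first argument is the mounted image root.

```shell
# Hypothetical helpers; $1 is the mounted image root.
swap_in_host_resolver() {
    root="$1"
    host_resolv="${2:-/etc/resolv.conf}"
    # Move the image's resolv.conf aside instead of deleting it. mv renames
    # a symlink itself, so a dangling systemd-resolved link survives intact.
    if [ -e "$root/etc/resolv.conf" ] || [ -L "$root/etc/resolv.conf" ]; then
        mv "$root/etc/resolv.conf" "$root/etc/resolv.conf.br-orig"
    fi
    # -L copies file contents even when the host's resolv.conf is a symlink.
    cp -L "$host_resolv" "$root/etc/resolv.conf"
}

restore_resolver() {
    root="$1"
    rm -f "$root/etc/resolv.conf"
    if [ -e "$root/etc/resolv.conf.br-orig" ] || [ -L "$root/etc/resolv.conf.br-orig" ]; then
        mv "$root/etc/resolv.conf.br-orig" "$root/etc/resolv.conf"
    fi
}
```

Calling restore_resolver from an exit trap as well as on the success path would keep the baked image unchanged even when apt fails midway.
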
Third build failure: 'Unable to locate package incus-ui-canonical'. That
package is not in Debian trixie main (it bundles minified JS -> contrib/
non-free), and apt-installing the Zabbly build would swap Debian's incus
to satisfy its Depends.

Mirror the proven cloud-init path: install 'incus incus-client' (+ socat
jq openssh-server chrony) from main as the required set, then bake the
web UI best-effort by downloading the Zabbly .deb and extracting it to
/opt/incus/ui (never installing it), with an INCUS_UI drop-in. The whole
UI step is non-fatal, so a missing Zabbly suite can't fail the build.

Progresses #45.
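
A rough sketch of the "extract, never install" UI step with a non-fatal wrapper. It assumes the .deb has already been downloaded and that the package ships its files under opt/incus/ui. Both that internal layout and the function names are assumptions, and the INCUS_UI drop-in is left out.

```shell
# Assumptions: $1 is a downloaded Zabbly UI .deb, $2 is the mounted image
# root, and the package payload contains opt/incus/ui.
bake_ui() {
    deb="$1"
    root="$2"
    stage="$(mktemp -d)" || return 1
    # dpkg-deb -x only unpacks the payload: no maintainer scripts run and
    # no Depends are resolved, so Debian's incus is left alone.
    if dpkg-deb -x "$deb" "$stage" && [ -d "$stage/opt/incus/ui" ]; then
        mkdir -p "$root/opt/incus" &&
            cp -a "$stage/opt/incus/ui" "$root/opt/incus/"
        rc=$?
    else
        rc=1
    fi
    rm -rf "$stage"
    return "$rc"
}

# Best-effort wrapper: warn and continue instead of failing the build.
bake_ui_best_effort() {
    bake_ui "$@" || echo "warning: web UI not baked; continuing" >&2
    return 0
}
```

Keeping the required package set and the UI step in separate functions means only the former can abort the build.
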
Fourth failure was the final step: virt-sparsify crashed with
'guestfs_launch failed' — the libguestfs appliance can't launch at all on
the GitHub runners (the same root cause as the passt error). Everything
before it now works: incus + client from main, the Zabbly UI extracted to
/opt/incus/ui, initramfs regenerated.

Skip virt-sparsify on the nbd path and compress with qemu-img convert -c
instead. Zero free space first (virt-sparsify's block-discard can't run)
so compression stays effective, apt-get clean in the chroot, and detach
qemu-nbd before converting so the compress reads a flushed image.

Progresses #45.
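
The seal-and-compress sequence might be sketched as follows. The `run` wrapper and its DRY_RUN switch are hypothetical additions that let the ordering be inspected without root. The exact order of apt-get clean versus zero-filling is an assumption, and the `.zerofill` file name is made up.

```shell
# run executes a command, or just prints it when DRY_RUN=1 (hypothetical).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Args: mounted root, nbd device, working qcow2, compressed output qcow2.
seal_and_compress() {
    root="$1"; nbd="$2"; work="$3"; dst="$4"
    run chroot "$root" apt-get clean
    # Fill free space with zeros so those clusters compress to nothing.
    # dd is expected to stop with "No space left on device".
    run dd if=/dev/zero of="$root/.zerofill" bs=1M || true
    run sync
    run rm -f "$root/.zerofill"
    run umount "$root"
    # Detach before converting so qemu-img reads a fully flushed image.
    run qemu-nbd --disconnect "$nbd"
    run qemu-img convert -c -O qcow2 "$work" "$dst"
}
```

Running it with `DRY_RUN=1` prints the seven commands in order, which is a cheap way to review the sequence in a PR before trying it on a runner.
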
arm64 built and published the guest image cleanly; amd64 failed on a
transient 'Connection reset by peer' fetching a single .deb from the
mirror. Same code, different luck. Set Acquire::Retries=5 in the chroot
so apt retries transient CDN resets, and remove the config before sealing
the image. Closes the last gap in the build (#45).
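
One way to scope the retry setting to the build is a temporary apt configuration snippet that is written before apt runs and deleted before the image is sealed. The file name below is hypothetical; `Acquire::Retries` is the standard apt option.

```shell
# Hypothetical path for the build-only apt snippet, relative to the root.
APT_RETRY_CONF="etc/apt/apt.conf.d/99-br-build-retries"

# Write the retry setting into the image root given as $1.
add_apt_retries() {
    root="$1"
    mkdir -p "$root/etc/apt/apt.conf.d"
    printf 'Acquire::Retries "5";\n' > "$root/$APT_RETRY_CONF"
}

# Remove it again so the shipped image carries no build-time apt config.
remove_apt_retries() {
    rm -f "$1/$APT_RETRY_CONF"
}
```

Passing `-o Acquire::Retries=5` on each apt-get invocation would be an alternative that leaves no file behind; the snippet form covers every apt call in the chroot at once.
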
@stuffbucket

Owner Author

#45 build pipeline validated ✅

Dispatched build-guest-image.yml on this branch across six iterations, each clearing a distinct layer that had kept the pipeline from ever producing an image:

| # | Failure | Fix |
|---|---------|-----|
| 1 | passt exited with status 1 (libguestfs appliance net) | --method nbd (chroot over host net) |
| 2 | hardcoded p1 root | probe ext4 partition |
| 3 | chroot DNS (deb.debian.org unresolvable) | copy host resolver into chroot, restore after |
| 4 | Unable to locate package incus-ui-canonical | install incus incus-client from main; bake UI via Zabbly .deb extract (matches cloud-init) |
| 5 | virt-sparsify: guestfs_launch failed | zero free space + qemu-img convert -c; detach nbd first |
| 6 | amd64 transient Connection reset by peer | Acquire::Retries=5 |

Final run 27981889504: Build (amd64) ✅ · Build (arm64) ✅ · Publish Release ✅

Published the first-ever guest images:

  • guest-image-v2026.06.22 + guest-image-latest pointer
  • bladerunner-guest-arm64.qcow2 (664 MB), bladerunner-guest-amd64.qcow2 (710 MB) + sha256 sidecars
  • The opt-in hosted path (HostedGuestImageTag = guest-image-latest) now resolves HTTP 200 for both arches.

The image bakes incus + incus-client (Debian main), the Incus web UI at /opt/incus/ui, chrony/watchdog/vsock-ntp units, vsock initramfs modules, and br-agent. UseHostedGuestImage stays opt-in pending an end-to-end boot test on a Mac before any default flip.

@stuffbucket stuffbucket merged commit 29d0a8c into main Jun 22, 2026
8 checks passed
@stuffbucket stuffbucket deleted the fix/incus-base-image branch June 22, 2026 20:37