You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
A reusable Proxmox LXC template (CT 200, "hal0-test-template") and a smoke driver script that clones it, runs `install.sh`, exercises a few hal0 subcommands, runs the uninstall path, and destroys the clone. Each smoke run starts from byte-identical state.
Why this exists
LXC 105 (`hal0`) is the live dev environment — full hardware passthrough, editable install, real models. It is the wrong target for "what does `curl … | bash` feel like for a new user?" testing because:
Polluted with dev state (registry, slot toml, lemonade config, MCP installs, models).
Can't easily reset between runs.
Mixing dev + install/uninstall iteration loses both signals.
We need an ephemeral target that exercises the tarball install path with the same hardware-access recipe as 105 so the iGPU / NPU show up to the installer. LXCs share the host kernel and gate `/dev/*` access via cgroup rules — both 105 and the test CT can hold cgroup rights to the iGPU / NPU simultaneously, with a brief contention window only when both try to load a model. For install + uninstall flow testing (no model warming required) there is no contention.
Golden template (CT 200)
Built once. Carries:
Vanilla distro (Debian 12 cloud or CachyOS — pick whichever matches LXC 105's base; verify by inspecting 105.conf)
The strix-halo passthrough recipe from `220.conf` (privileged, apparmor unconfined, `dev0`–`dev3` cgroup lines from `feedback / strix-halo-lxc-passthrough` memory)
`/mnt/ai-models` NFS bind-mount (read-only, from pve) so install doesn't have to re-download base models
7. assert clean post-uninstall state (no /opt/hal0, no /usr/lib/hal0/current, no /etc/hal0, no /var/lib/hal0, no systemd unit)
8. pct destroy --force (unless --keep)
9. emit a single JSON line to stdout: {clone_id, install_ok, smoke_ok, uninstall_ok, residue, elapsed_s}
```
Integrate the JSON-line output with the existing δ-tier harness at `tests/harness/` (memory: `hal0_test_harness`) so install / uninstall coverage gets aggregated alongside slot lifecycle coverage. New row type, same emitter.
Acceptance criteria
CT 200 created with the 105-equivalent passthrough recipe; `pct config 200` matches the `220.conf` pattern (privileged, apparmor unconfined, dev0–dev3 cgroup lines)
CT 200 marked as template (`pct template 200`)
Provisioning mechanism (cloud-init or hookscript) chosen and committed to `packaging/proxmox/hal0-test-template/`
First-boot leaves `/tmp/cloud-init-done`; sshd accepts the `halo` user with the provisioned key
`scripts/fresh-test-ct.sh` implements the clone → install → smoke → uninstall → destroy cycle and emits the JSON line
Uninstall residue assertion: zero hal0 artifacts left on disk, zero systemd units, zero open ports — fail loudly on residue
Harness integration: `make harness-install` (or similar) runs the script end-to-end, emits to `tests/harness/findings.jsonl`
Docs at `docs/internal/install-test-harness.md` covering how to bump the golden template (apt updates, base image refresh) and the contention caveat with LXC 105
One successful run from scratch on pve recorded in the PR description (clone_id, total elapsed time, JSON line)
Sub-task: verify pve SSH key is current (the stale `known_hosts` entry for `10.0.1.110` should be refreshed against the post-kernel-migration host — this issue is the natural place to fix it)
Sub-task: verify live NAS state matches the assumptions in `pve_ai_models_storage_separation` memory before mounting `/mnt/ai-models` (it should be writable on pve, exported read-only via NFS, with the symlinked HF cache at `/mnt/ai-models/huggingface`)
Blocked by
None — fully AFK from start. Unblocks #406 (install-mode reconciliation) for end-to-end verification of the tarball mode.
What to build
A reusable Proxmox LXC template (CT 200, "hal0-test-template") and a smoke driver script that clones it, runs `install.sh`, exercises a few hal0 subcommands, runs the uninstall path, and destroys the clone. Each smoke run starts from byte-identical state.
Why this exists
LXC 105 (`hal0`) is the live dev environment — full hardware passthrough, editable install, real models. It is the wrong target for "what does `curl … | bash` feel like for a new user?" testing because:
We need an ephemeral target that exercises the tarball install path with the same hardware-access recipe as 105 so the iGPU / NPU show up to the installer. LXCs share the host kernel and gate `/dev/*` access via cgroup rules — both 105 and the test CT can hold cgroup rights to the iGPU / NPU simultaneously, with a brief contention window only when both try to load a model. For install + uninstall flow testing (no model warming required) there is no contention.
Golden template (CT 200)
Built once. Carries:
Once the CT is provisioned and verified, mark as template: `pct template 200`.
First-boot provisioning
Choose one of:
Provisioning content (translate to whichever mechanism):
```yaml
users:
sudo: ALL=(ALL) NOPASSWD:ALL
ssh_authorized_keys: [<paste from ~/.ssh/id_ed25519.pub>]
mounts:
packages: [curl, ca-certificates, jq, git, rsync, htop]
runcmd:
```
Clone-smoke harness
`scripts/fresh-test-ct.sh` — pure bash. Usage:
```
fresh-test-ct.sh [--vmid 999] [--keep] [--version ]
1. pct clone 200
2. pct start
3. wait for /tmp/cloud-init-done (timeout 90s)
4. ssh halo@ 'curl -fsSL https://hal0.dev/install.sh | bash'
5. ssh halo@ 'hal0 status && hal0 --version && curl localhost:8080/v1/health'
6. ssh halo@ 'hal0 uninstall --yes'
7. assert clean post-uninstall state (no /opt/hal0, no /usr/lib/hal0/current, no /etc/hal0, no /var/lib/hal0, no systemd unit)
8. pct destroy --force (unless --keep)
9. emit a single JSON line to stdout: {clone_id, install_ok, smoke_ok, uninstall_ok, residue, elapsed_s}
```
Integrate the JSON-line output with the existing δ-tier harness at `tests/harness/` (memory: `hal0_test_harness`) so install / uninstall coverage gets aggregated alongside slot lifecycle coverage. New row type, same emitter.
Acceptance criteria
Blocked by
None — fully AFK from start. Unblocks #406 (install-mode reconciliation) for end-to-end verification of the tarball mode.
Notes
🤖 Generated with Claude Code