fix(agent): drop baked dbus machine-id so regen yields a fresh id #7

Merged: CMGS merged 1 commit into main from fix/dbus-machine-id on Jul 3, 2026
Conversation

CMGS (Contributor) commented Jul 3, 2026

The bug

Found during cocoon clone-reseed e2e: cocoon vm reseed --machine-id (and clone auto-reseed) reported success and /etc/machine-id's mtime changed, but the content was the same id. regenMachineID truncates /etc/machine-id, then systemd-machine-id-setup runs — and its documented source order falls back to /var/lib/dbus/machine-id. Images that bake that as a regular file (not a symlink to /etc/machine-id) make setup re-adopt the stale id. Result: clones keep the source's machine-id, defeating the whole point of regen.

The CRNG entropy/reseed ioctls — the security-critical mechanism — were never affected; this is only the machine-id half.

Fix

Before running systemd-machine-id-setup, remove /var/lib/dbus/machine-id iff it's a regular file (Lstat + Mode().IsRegular() — a symlink into /etc/machine-id is correct and left untouched). setup then has no stale source to fall back to and generates a fresh id; dbus recreates its copy from the new /etc/machine-id on next start.

Verification

lint 0 issues × {linux,darwin,windows}, tests green. End-to-end machine-id-actually-rotates verification lands when this ships as v0.1.7 and gets baked into an os-image (chicken-and-egg with the guest agent version); the root cause is precisely the systemd source-order fallback, which this removes.

CMGS (Contributor, Author) commented Jul 3, 2026

Hardened per review (thanks — the original swallowed non-ErrNotExist Lstat errors via err == nil, inconsistent with this file's own os.Remove(systemdRandomSeed) idiom two lines up, and wrong for a uniqueness-critical op):

  • Extracted dropStaleDBusMachineID(path): ignores fs.ErrNotExist, surfaces any other lstat error (regen fails loudly rather than proceeding to re-adopt the old id), removes only a regular file, leaves a symlink/other untouched.
  • Unit test TestDropStaleDBusMachineID (linux-tagged): regular-file-removed / symlink-preserved / missing-no-op.
  • Test executed on real linux (cross-compiled test binary run on the bare-metal testbed) — all 3 subtests + full agent package PASS. (The file is //go:build linux; a darwin go test silently skips it — verified it actually runs.)
  • make lint 0×3 GOOS (the linux file is covered by the GOOS=linux lint pass).

CMGS force-pushed the fix/dbus-machine-id branch from fbde864 to 862edf4 on July 3, 2026 16:26
…sh id

regenMachineID truncated /etc/machine-id, but systemd-machine-id-setup's
source order falls back to a baked-in regular-file /var/lib/dbus/machine-id
and re-adopts the old id. dropStaleDBusMachineID removes that copy first;
a symlink into /etc/machine-id or a missing file is left alone, and any
other lstat error is surfaced so regen fails loudly instead of silently
reusing the old id. CRNG reseed was never affected.

CMGS (Contributor, Author) commented Jul 3, 2026

Ran the full /code + /simplify + tighten pass (was skipped before — thanks for catching):

  • senior /code walk (self, top-to-bottom vs SKILL.md): const grouping, func placement (regen → helper → fallback in call order), error-wrap style, naming, modern-stdlib — all conform.
  • /simplify (three lenses, done inline given the ~20-line surface rather than spawning agents): reuse: no repo helper reimplemented; simplification: aligned the lstat error handling to the nested if err != nil { if ErrNotExist… } form used by the sibling os.Stat(machineIDPath) 5 lines up (was a flat errors.Is-first branch); efficiency: one lstat + conditional remove, nothing wasted.
  • tighten: godoc 4→3 lines (dropped the duplicated "old id"); test os.IsNotExist → errors.Is(err, fs.ErrNotExist) (soft-deprecated form modernized).
  • Re-verified after the edits: lint 0×3 GOOS, and the linux test binary re-run on the bare-metal testbed — 3 subtests + full agent package PASS.

CMGS force-pushed the fix/dbus-machine-id branch from 862edf4 to 58b86ce on July 3, 2026 17:05
CMGS merged commit afb3351 into main on Jul 3, 2026
2 checks passed
CMGS deleted the fix/dbus-machine-id branch on July 3, 2026 17:08