
Production droplet has drifted from ansible-managed state — reconcile or stop pretending #38

@espadonne

Background

shithub-prod was hand-built. Several PRs since launch have updated ansible roles or scripts that should live on the droplet but were never actually deployed there because nobody runs the ansible play. We keep getting bitten by the same pattern: PR lands → repo is correct → droplet is stale → next operator (or scheduled task) hits a silent failure.

Concrete drift caught so far

Why this keeps happening

  • No `inventory/production` file exists anywhere: not on the Mac, not on the droplet, and not in the repo (where it is correctly gitignored).
  • Ansible isn't installed locally.
  • The droplet was built before the roles were written, so even one-time `ansible-playbook` runs would risk clobbering hand-tuned state (sshd config, postgres tuning, the operator git user).
  • Every change ships via `ssh + scp + systemctl` instead.

Proposal

There are three honest paths: two long-term directions (A and B) and a short-term reconciliation pass (C) that applies either way. The current state is the worst of both worlds: the roles exist as documentation but aren't authoritative.

Option A — make ansible authoritative

Build the inventory file, install ansible locally (or in CI), run a dry run with `--check --diff` against the droplet to surface the drift, reconcile by hand where needed, then commit to running the play after every merge. This is the right long-term answer.
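A minimal sketch of the dry-run step, assuming a playbook named `site.yml` and an inventory hostname `shithub-prod` (both hypothetical; only `inventory/production` comes from this issue). It builds the `ansible-playbook --check --diff` invocation and parses the PLAY RECAP into a per-host `changed` count; any non-zero count is drift to reconcile by hand before the play is trusted.

```python
import re
import subprocess
from typing import Dict, List

INVENTORY = "inventory/production"  # path named in this issue
PLAYBOOK = "site.yml"               # hypothetical: the issue does not name the playbook
DROPLET = "shithub-prod"            # hypothetical inventory hostname for the droplet

def check_diff_command(limit: str = DROPLET) -> List[str]:
    """Build a dry-run invocation: report what would change, change nothing."""
    return ["ansible-playbook", "-i", INVENTORY, PLAYBOOK,
            "--check", "--diff", "--limit", limit]

# Matches PLAY RECAP lines such as:
#   shithub-prod   : ok=12   changed=3    unreachable=0    failed=0
RECAP_LINE = re.compile(r"^(\S+)\s*:\s*ok=(\d+)\s+changed=(\d+)")

def changed_per_host(recap_output: str) -> Dict[str, int]:
    """Map each host in the PLAY RECAP to its 'changed' count."""
    counts: Dict[str, int] = {}
    for line in recap_output.splitlines():
        match = RECAP_LINE.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(3))
    return counts

def run_dry_run() -> Dict[str, int]:
    """Run the check-mode play and return per-host drift counts."""
    proc = subprocess.run(check_diff_command(), capture_output=True, text=True)
    return changed_per_host(proc.stdout)
```

Check mode is only an approximation: tasks built on `command` or `shell` generally don't report changes in check mode, so a clean recap is not proof that the droplet matches the roles.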

Cost: significant up-front work to make the play idempotent against the hand-built state without breaking anything. Risk of clobbering during reconciliation.

Option B — drop ansible, document as bash + scp

Convert each role into a sequence of `ssh "..."` commands in a deploy script (`deploy/cutover/deploy-droplet.sh`). Operator runs it after merging anything that touches droplet state. Roles get archived for "this is what we'd do for a fresh droplet" reference.
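A rough sketch of what `deploy/cutover/deploy-droplet.sh` would do for each file, written in Python for illustration. The `ManagedFile` entries, the `shithub-prod` ssh alias, and the scp-to-`/tmp`-then-`install` pattern are all assumptions, not what the roles currently do. It defaults to printing the commands so the operator can review them before anything runs.

```python
import shlex
import subprocess
from dataclasses import dataclass
from typing import List, Optional

HOST = "shithub-prod"  # hypothetical ssh alias for the droplet

@dataclass
class ManagedFile:
    """One file a role used to install; example values are hypothetical."""
    src: str                       # path in the repo
    dest: str                      # absolute path on the droplet
    mode: str = "0644"
    owner: str = "root"
    restart: Optional[str] = None  # systemd unit to restart afterwards

def commands_for(f: ManagedFile, host: str = HOST) -> List[List[str]]:
    """Translate one managed file into scp/ssh/systemctl argv lists."""
    tmp = "/tmp/" + f.dest.strip("/").replace("/", "_")
    install = "sudo install -m {} -o {} {} {} && rm -f {}".format(
        shlex.quote(f.mode), shlex.quote(f.owner),
        shlex.quote(tmp), shlex.quote(f.dest), shlex.quote(tmp))
    steps = [["scp", f.src, f"{host}:{tmp}"], ["ssh", host, install]]
    if f.restart:
        # Unlike an ansible handler, this restarts even when nothing changed.
        steps.append(["ssh", host, "sudo systemctl restart " + shlex.quote(f.restart)])
    return steps

def deploy(files: List[ManagedFile], dry_run: bool = True) -> None:
    """Print each command; also execute it when dry_run is False."""
    for f in files:
        for argv in commands_for(f):
            print(shlex.join(argv))
            if not dry_run:
                subprocess.run(argv, check=True)
```

The unconditional restart is the idempotence loss Option B accepts: an ansible handler only fires when the task that notifies it reports a change.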

Benefit: less ceremony, no ansible dependency. Cost: loses some idempotence guarantees.

Option C (this issue) — short-term reconciliation pass

Independent of A/B, audit every file the roles claim to manage against what's on the droplet right now. Write a one-shot script that pushes the deltas. Mark it as a stopgap, not a permanent answer.
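A sketch of the read-only half of that audit, assuming a hand-maintained manifest that maps repo paths to droplet paths, plus an ssh alias `shithub-prod` with non-interactive `sudo` (all hypothetical). It only hashes files on both sides and never writes to the droplet. Templated files would need to be rendered locally first, since the hash of a raw `.j2` source will never match the live file.

```python
import hashlib
import shlex
import subprocess
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple

HOST = "shithub-prod"  # hypothetical ssh alias for the droplet

def local_sha256(path: Path) -> str:
    """Hash the repo copy of a managed file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def remote_sha256(dest: str, host: str = HOST) -> Optional[str]:
    """Hash the live file over ssh (read-only); None if missing or unreadable."""
    proc = subprocess.run(
        ["ssh", host, "sudo -n sha256sum -- " + shlex.quote(dest)],
        capture_output=True, text=True)
    if proc.returncode != 0 or not proc.stdout.strip():
        return None
    return proc.stdout.split()[0]

def audit(manifest: Dict[str, str],
          fetch: Callable[[str], Optional[str]] = remote_sha256
          ) -> List[Tuple[str, str]]:
    """Compare repo paths (keys) with droplet paths (values); list the drift."""
    drift: List[Tuple[str, str]] = []
    for src, dest in sorted(manifest.items()):
        live = fetch(dest)
        if live is None:
            drift.append((dest, "missing on droplet"))
        elif live != local_sha256(Path(src)):
            drift.append((dest, "content differs"))
    return drift
```

`fetch` is injectable so the comparison can be exercised without a droplet; the default, `remote_sha256`, is what a real run would use.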

Then pick A or B as a follow-up.

Acceptance for this issue

  • Audit script that lists every file the ansible roles install/template, diffs each against the live droplet, reports drift. Read-only.
  • Apply fixes for current drift (manual scp/ssh is fine, document each).
  • Decide A or B as the long-term direction; file follow-up issue.
  • Add a CI check (or make target) that fails if a PR touches `deploy/ansible/` without a matching note in the PR body about how it was deployed.
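One way the CI check could work, as a sketch: it assumes a PR-body convention of a line beginning `Deployed:` (hypothetical) and a CI job that can pipe changed paths on stdin and save the PR body to a file. It fails only when something under `deploy/ansible/` changed and the note is missing.

```python
import re
import sys
from typing import Iterable, List, TextIO

WATCHED_PREFIX = "deploy/ansible/"
# Hypothetical convention: the PR body carries a line like "Deployed: <how>".
DEPLOY_NOTE = re.compile(r"^\s*Deployed:\s*\S", re.MULTILINE | re.IGNORECASE)

def check(changed_paths: Iterable[str], pr_body: str) -> int:
    """Return an exit code: 1 if ansible files changed without a deploy note."""
    touches_ansible = any(p.startswith(WATCHED_PREFIX) for p in changed_paths)
    if touches_ansible and not DEPLOY_NOTE.search(pr_body or ""):
        print("PR touches deploy/ansible/ but its body has no 'Deployed:' note",
              file=sys.stderr)
        return 1
    return 0

def main(argv: List[str], stdin: TextIO) -> int:
    """Assumed CI wiring: changed paths on stdin, PR body in the file argv[1]."""
    body = ""
    if len(argv) > 1:
        with open(argv[1], encoding="utf-8") as fh:
            body = fh.read()
    paths = [line.strip() for line in stdin if line.strip()]
    return check(paths, body)
```

The script's entry point would call `sys.exit(main(sys.argv, sys.stdin))`, fed by something like `git diff --name-only origin/main...` (default branch name assumed).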

Tackling now

Per the linked PR/conversation, this issue is being addressed in the immediately-following session pass rather than deferred.
