Production droplet has drifted from ansible-managed state — reconcile or stop pretending #38

## Background

shithub-prod was hand-built. Several PRs since launch have updated ansible roles or scripts that *should* live on the droplet but were never actually deployed there because nobody runs the ansible play. We keep getting bitten by the same pattern: PR lands → repo is correct → droplet is stale → next operator (or scheduled task) hits a silent failure.

### Concrete drift caught so far

- **PR #21** updated `backup-daily.sh` and `sync-cross-region.sh` to use `/etc/rclone-shithub.conf`. The repo got the fix; the droplet kept the old path `/root/.config/rclone/rclone.conf`. The daily Spaces upload failed silently for ~2 weeks. Local dumps and the WAL archive masked it.
- **PR #28** added the `monitoring-client` ansible role (Alloy + node_exporter). The role works but was never run against the droplet; everything had to be installed by hand over SSH.
- **PR #32** committed `enable_compression = false` in the alloy template. The live droplet has the fix only because we manually edited `config.alloy`; the next `apt upgrade alloy` or systemd reload won't reapply ansible state.
- **PR #34** moves `/metrics` out of the Compress middleware in shithubd. This deploys via the binary deploy pipeline (which works), but related ansible role changes don't.

### Why this keeps happening

- No `inventory/production` file exists anywhere: not on the Mac, not on the droplet, and not in the repo (where it is correctly gitignored).
- Ansible isn't installed locally.
- The droplet was built before the roles were written, so even a one-time `ansible-playbook` run would risk clobbering hand-tuned state (sshd config, postgres tuning, the operator git user).
- Every change ships via `ssh + scp + systemctl` instead.

## Proposal

Pick one of the three honest paths. The current state is the worst of both worlds — roles exist as documentation but aren't authoritative.

### Option A — make ansible authoritative
Build the inventory file, install ansible locally (or in CI), do a dry run with `--check --diff` against the droplet to surface the diff, reconcile by hand where needed, then commit to running the play after every merge. This is the right long-term answer.

Cost: significant up-front work to make the play idempotent against the hand-built state without breaking anything. Risk of clobbering during reconciliation.
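
A minimal sketch of how the Option A dry run might be wrapped, assuming a playbook called `site.yml` (hypothetical; this issue does not name one) and the `inventory/production` file that does not exist yet. `--check` and `--diff` are real `ansible-playbook` flags; the helper names are illustrative only. In check mode, a nonzero `changed` counter in the PLAY RECAP means the play would modify the host, which is the drift we want surfaced. Tasks that don't support check mode are skipped, so a clean recap is a hint, not proof.

```python
import re

# Matches counters such as "changed=3" in a PLAY RECAP line.
RECAP_COUNTER = re.compile(r"(\w+)=(\d+)")


def dry_run_argv(inventory: str, playbook: str) -> list[str]:
    """Build the read-only ansible-playbook command line (nothing is executed here)."""
    return ["ansible-playbook", "-i", inventory, playbook, "--check", "--diff"]


def parse_recap(output: str) -> dict[str, dict[str, int]]:
    """Extract per-host counters from the PLAY RECAP section of ansible output."""
    hosts: dict[str, dict[str, int]] = {}
    in_recap = False
    for line in output.splitlines():
        if line.startswith("PLAY RECAP"):
            in_recap = True
            continue
        if in_recap and ":" in line:
            host, _, rest = line.partition(":")
            counters = {key: int(value) for key, value in RECAP_COUNTER.findall(rest)}
            if counters:
                hosts[host.strip()] = counters
    return hosts


def drifted_hosts(recap: dict[str, dict[str, int]]) -> list[str]:
    """Hosts where the dry run reported at least one task that would change something."""
    return [host for host, counters in recap.items() if counters.get("changed", 0) > 0]
```

The output of `subprocess.run(dry_run_argv(...), capture_output=True, text=True)` could be fed to `parse_recap`; the `--diff` hunks above the recap are what a human would then reconcile by hand.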

### Option B — drop ansible, document as bash + scp
Convert each role into a sequence of `ssh "..."` commands in a deploy script (`deploy/cutover/deploy-droplet.sh`). The operator runs it after merging anything that touches droplet state. Roles get archived as a "this is what we'd do for a fresh droplet" reference.

Benefit: less ceremony, no ansible dependency. Cost: loses some idempotence guarantees.

### Option C (this issue) — short-term reconciliation pass
Independent of A/B, audit every file the roles claim to manage against what's on the droplet right now. Write a one-shot script that pushes the deltas. Mark it as a stopgap, not a permanent answer.

Then pick A or B as a follow-up.
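
The audit described in Option C could be sketched roughly as below. This is a hedged outline, not the actual script. It assumes the expected content of each managed file (templates already rendered) has been collected into a mapping keyed by its path on the droplet, and that `sha256sum` is available on the droplet. Any concrete paths passed in are up to the caller. The only remote action is reading file hashes over `ssh`.

```python
import hashlib
import shlex
import subprocess


def parse_sha256sum(output: str) -> dict[str, str]:
    """Parse `sha256sum` text-mode lines ("<hex>  <path>") into {path: hex}."""
    hashes: dict[str, str] = {}
    for line in output.splitlines():
        digest, sep, path = line.partition("  ")
        if sep and len(digest) == 64:
            hashes[path] = digest
    return hashes


def fetch_remote_hashes(host: str, paths: list[str]) -> dict[str, str]:
    """Read-only: hash the given paths on the droplet. Missing files are absent from the result."""
    remote_cmd = shlex.join(["sha256sum", "--", *paths])
    # check=False because sha256sum exits nonzero when any path is missing.
    result = subprocess.run(
        ["ssh", host, remote_cmd], capture_output=True, text=True, check=False
    )
    return parse_sha256sum(result.stdout)


def find_drift(expected: dict[str, bytes], remote: dict[str, str]) -> dict[str, str]:
    """Return {remote_path: "missing" | "differs"} for every file that does not match."""
    drift: dict[str, str] = {}
    for path, content in expected.items():
        want = hashlib.sha256(content).hexdigest()
        got = remote.get(path)
        if got is None:
            drift[path] = "missing"
        elif got != want:
            drift[path] = "differs"
    return drift
```

Comparing hashes keeps the pass read-only and cheap. For files reported as `differs`, a follow-up `ssh host cat path` piped into a local diff would show what actually changed before anything is pushed.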

## Acceptance for this issue

- [ ] Audit script that lists every file the ansible roles install/template, diffs each against the live droplet, reports drift. Read-only.
- [ ] Apply fixes for current drift (manual scp/ssh is fine, document each).
- [ ] Decide A or B as the long-term direction; file follow-up issue.
- [ ] Add a CI check (or make target) that fails if a PR touches `deploy/ansible/` without a matching note in the PR body about how it was deployed.
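
One possible shape for the CI check in the last item, as a sketch only. It assumes the list of changed files and the PR body are handed in by the CI job, and that a deployment note is a line starting with `Deployed:`. That marker is a hypothetical convention; nothing in this issue fixes the format.

```python
import re

# Hypothetical convention: a PR-body line such as "Deployed: scp + systemctl restart alloy".
DEPLOY_NOTE = re.compile(r"^\s*deployed\s*:\s*\S", re.IGNORECASE | re.MULTILINE)


def touches_ansible(changed_files: list[str], prefix: str = "deploy/ansible/") -> bool:
    """True if any changed path lives under the ansible directory."""
    return any(path.startswith(prefix) for path in changed_files)


def pr_passes(changed_files: list[str], pr_body: str) -> bool:
    """A PR passes unless it touches deploy/ansible/ without a deployment note."""
    if not touches_ansible(changed_files):
        return True
    return bool(DEPLOY_NOTE.search(pr_body or ""))
```

A CI step would exit nonzero when `pr_passes(...)` returns `False`. This only checks that a note exists, not that the deployment actually happened, so it complements the audit rather than replacing it.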

## Tackling now
Per the linked PR/conversation, this issue is being addressed in the session immediately following this one rather than deferred.
