37 commits
2c89ed4
feat(cli): add pg_dump snapshot, restore, and retention helpers
level09 Apr 17, 2026
8539c7f
fix(cli): restart services if pg_restore fails during bayanat restore
level09 Apr 17, 2026
aa77971
refactor(cli): use while-read pattern in list_snapshots (matches prun…
level09 Apr 17, 2026
a1b8f4b
refactor(cli): make SNAPSHOT_RETENTION_COUNT readonly
level09 Apr 17, 2026
f3ead09
feat(cli): add update state file and PID lock primitives
level09 Apr 17, 2026
bfd5b31
feat(cli): add recover_state phase-dispatch for update crash recovery
level09 Apr 17, 2026
ebfbdfc
refactor(cli): mark state/lock file constants readonly
level09 Apr 17, 2026
c21913d
feat(cli): add update health probe helpers (socket + db + redis)
level09 Apr 17, 2026
f73b395
feat(cli): update pipeline PREPARE phase
level09 Apr 17, 2026
820c658
feat(cli): update pipeline MIGRATE phase
level09 Apr 17, 2026
d7dd110
feat(cli): update pipeline SWITCH, VERIFY, and ROLLBACK_CODE
level09 Apr 17, 2026
473b047
feat(cli): wire update/snapshots/restore subcommands and extend status
level09 Apr 17, 2026
f2e3c43
feat(installer): create /opt/bayanat/state directory for update CLI
level09 Apr 17, 2026
ea544b9
feat(installer): install bayanat-start-update wrapper and new sudoers…
level09 Apr 17, 2026
06a8b61
feat(api): add /health readiness endpoint for bayanat updater
level09 Apr 17, 2026
bacd7ca
feat(config): add AUTO_APPLY_PATCH_UPDATES default (off)
level09 Apr 17, 2026
211d362
feat(tasks): periodic update check with opt-in patch auto-apply
level09 Apr 17, 2026
639c6fa
feat(api): admin endpoints for update availability, start, and status
level09 Apr 17, 2026
a928922
feat(ui): UpdateBanner and UpdateProgressDialog components
level09 Apr 17, 2026
c2d9b21
feat(ui): snapshots list page (read-only)
level09 Apr 17, 2026
d4e07cd
docs: operator runbook for bayanat update
level09 Apr 17, 2026
a6e7235
fix(cli): recover mid-phase crashes and escalate snapshot failures
level09 Apr 17, 2026
0dc7af0
fix(cli): clean up leaked .partial snapshot files during retention prune
level09 Apr 17, 2026
5781cf5
fix(cli): preserve v-prefix on git tags; strip only for comparison
level09 Apr 18, 2026
6d7814f
fix(cli): generate POSTGRES_* in .env + add fallback defaults in pg h…
level09 Apr 18, 2026
4ffa99f
fix(tasks): read AUTO_APPLY_PATCH_UPDATES via settings.Config + wire …
level09 Apr 18, 2026
03bd58d
docs(runbook): use sudo for update/snapshots/restore (require_root)
level09 Apr 18, 2026
b4d5e27
fix(installer): use pg socket by default, self-install CLI to /usr/lo…
level09 Apr 18, 2026
3df979e
fix(cli): use socket peer auth for pg_dump/psql/pg_restore when local…
level09 Apr 18, 2026
487a1b2
fix(cli): chown cloned release to bayanat so _install_deps (uv sync) …
level09 Apr 18, 2026
db0e227
fix(cli): redirect snapshot log to stderr so STATE_SNAPSHOT captures …
level09 Apr 18, 2026
f658b23
fix(security): copy CLI from release (not $0) and require fresh auth …
level09 Apr 18, 2026
a69b612
fix(security): safe .env parser (no eval) + correct snapshots-UI rest…
level09 Apr 18, 2026
d04dbbd
test: add end-to-end auto-update test script for Hetzner VMs
level09 Apr 18, 2026
036c500
fix(tests): add AUTO_APPLY_PATCH_UPDATES to TestConfig
level09 Apr 19, 2026
06856c7
Potential fix for pull request finding 'CodeQL / Information exposure…
level09 Apr 19, 2026
8615ff1
fix(tests): call .run() on Celery tasks to bypass ContextTask app init
level09 Apr 19, 2026
573 changes: 567 additions & 6 deletions bayanat

Large diffs are not rendered by default.

128 changes: 128 additions & 0 deletions docs/deployment/auto-update-runbook.md
@@ -0,0 +1,128 @@
# Bayanat Auto-Update Runbook

Short operator reference for the `bayanat update` flow. Design notes live
in the development spec (not shipped with the repo).

## Triggering an update

- **One-click from UI:** an admin-role user clicks "Update now" from the
"Update available: X.Y.Z" banner in the nav bar.
- **From the shell (as root):** `sudo bayanat update [<tag>]`
(defaults to the latest GitHub release). The CLI requires root to stop
/ start services, write `/opt/bayanat`, and take snapshots.
- **Check only (no changes, no root):** `sudo -u bayanat bayanat update --check`.

The update runs as `bayanat-update.service`, a transient systemd unit
that outlives Flask restarts, SSH disconnects, and browser closes. Tail
live logs with:

```
sudo journalctl -u bayanat-update -f
```

## Opt-in auto-apply for patch releases

In the admin UI under System Administration, toggle "Auto-apply patch
releases" on. With the toggle on, the periodic update check (every
6 hours) silently installs any bump within the same minor line
(e.g. `4.1.0` to `4.1.1`) via the same pipeline. Minor and major bumps
(e.g. `4.1.x` to `4.2.0`) always notify and wait for a manual click.
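
The patch-only rule can be sketched in shell. This is a minimal illustration, not the project's implementation (the real check runs in the periodic task). `classify_bump` and `should_auto_apply` are hypothetical names, and the sketch assumes plain numeric `X.Y.Z` versions with an optional `v` prefix.

```bash
# Hypothetical helper: classify the bump from version $1 to version $2.
# Prints "major", "minor", "patch", or "none" (same version or a downgrade
# within the same minor line).
classify_bump() {
  local cur="${1#v}" new="${2#v}"   # strip a leading "v" for comparison only
  local c_major c_minor c_patch n_major n_minor n_patch
  IFS=. read -r c_major c_minor c_patch <<<"$cur"
  IFS=. read -r n_major n_minor n_patch <<<"$new"
  # 10# forces base 10 so a component like "08" is not parsed as octal.
  if (( 10#$n_major != 10#$c_major )); then
    echo major
  elif (( 10#$n_minor != 10#$c_minor )); then
    echo minor
  elif (( 10#$n_patch > 10#$c_patch )); then
    echo patch
  else
    echo none
  fi
}

# Hypothetical gate: succeed only when the toggle ($1 == 1) is on and the
# change from $2 to $3 is a patch bump.
should_auto_apply() {
  local enabled="$1" cur="$2" new="$3"
  [[ "$enabled" == "1" && "$(classify_bump "$cur" "$new")" == "patch" ]]
}
```

For example, `should_auto_apply 1 4.1.0 4.1.1` succeeds, while `should_auto_apply 1 4.1.3 4.2.0` fails and would leave the update waiting for a manual click.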

## Expected timing

| Phase | Duration | Production impact |
|---|---|---|
| PREPARE (fetch + deps) | 1-5 min | None, old version serves traffic |
| Stop services | ~3 s | 502 from Caddy begins |
| Snapshot (`pg_dump -Fc`) | 10-60 s | 502 |
| Migrate (`flask db upgrade`) | 1-30 s | 502 |
| Swap + start services | ~5 s | 502 |
| Verify (health probe) | 1-10 s | New version serving |
| **Total visible downtime** | **~30-90 s** | |

Caddy returns `502 Bad Gateway` during the maintenance window. Browsers
retry automatically; partners see a brief "service unavailable" view.

## If something goes wrong

### Migration failed (Alembic transaction rolled back)

Nothing to do. Services restart on the previous release automatically.
The UI shows the `error` field. Report the broken release; the previous
version keeps running.

### Health probe failed after swap (auto-rollback succeeded)

Nothing to do. The updater reverted the symlink and restarted on the
previous release. The pre-update snapshot is retained at
`/opt/bayanat/shared/backups/`.
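
To check health by hand after a rollback, a probe with a deadline might look like the sketch below. The socket path and the `"status":"ok"` match mirror what the end-to-end test script uses; `wait_until` and `probe_health` are hypothetical names, and the retry budget is an assumption loosely based on the timing table, not the updater's actual setting.

```bash
# Hypothetical helper: rerun a command until it succeeds or the timeout
# (in seconds) expires. Usage: wait_until <timeout> <interval> <cmd> [args...]
wait_until() {
  local timeout="$1" interval="$2"; shift 2
  local deadline=$(( SECONDS + timeout ))
  until "$@"; do
    (( SECONDS < deadline )) || return 1   # give up once the deadline passes
    sleep "$interval"
  done
}

# Hypothetical probe: succeeds when /health answers with "status":"ok"
# over the application's unix socket.
probe_health() {
  curl -fsS --max-time 2 \
    --unix-socket /opt/bayanat/current/bayanat.sock \
    http://localhost/health | grep -q '"status":"ok"'
}
```

Running `wait_until 10 1 probe_health && echo healthy` polls once per second for up to ten seconds.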

### NEEDS_INTERVENTION

This state only happens when two independent failures compound: the new
release was broken AND rolling back did not reach a healthy state. The
maintenance flag stays up so users see a 502 instead of raw errors.
Recover:

```
sudo -u bayanat bayanat status # read-only; confirm state
sudo bayanat snapshots # list snapshots (needs root)
sudo bayanat restore pre-<ts>.dump # restores DB (needs root)
sudo systemctl start bayanat bayanat-celery
```

Then file a bug with journal logs from `journalctl -u bayanat-update`.

### Stuck state (process died, state file orphaned)

```
sudo bayanat update --recover
```
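
A lock is stale when the process that wrote it no longer exists. The sketch below shows one way to confirm that before running `--recover`. It assumes the lock file holds a bare PID (the file format is not documented here), and `lock_is_stale` is a hypothetical name, not part of the CLI.

```bash
# Hypothetical check: succeed when the lock file exists but its owner is gone.
lock_is_stale() {
  local lock_file="$1" pid
  [[ -f "$lock_file" ]] || return 1        # no lock file: nothing to recover
  pid=$(<"$lock_file")
  [[ "$pid" =~ ^[0-9]+$ ]] || return 0     # unparsable contents: treat as stale
  # kill -0 sends no signal; it only tests whether the PID can be signalled.
  # Run as root, otherwise a live process owned by another user looks dead.
  ! kill -0 "$pid" 2>/dev/null
}
```

For example: `sudo bash -c 'source ./lock-check.sh; lock_is_stale /opt/bayanat/state/update.lock && echo stale'`, where `lock-check.sh` is wherever you saved the function.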

## Snapshots

- Location: `/opt/bayanat/shared/backups/pre-*.dump`
- Format: `pg_dump -Fc` (PostgreSQL custom format)
- Retention: the 5 newest snapshots or all snapshots from the last 30 days, whichever set is larger
- Override retention: `export BAYANAT_SNAPSHOT_RETENTION_DAYS=60`
- List: `sudo bayanat snapshots` or visit `/admin/snapshots/` in the UI
(read-only)
- Restore: `sudo bayanat restore <name>` (prompts for confirmation;
stops services; pipes through `pg_restore --clean --if-exists`;
restarts services). Requires root. Not available from the web UI by
design.
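
The retention rule reads as "keep a snapshot if it is among the newest N or younger than D days". A minimal sketch of that selection follows. It is an illustration under that reading, not the CLI's prune code; `select_prunable` is a hypothetical name, and the input format (`<mtime_epoch> <name>` per line) is chosen for the example.

```bash
# Hypothetical sketch: read "<mtime_epoch> <name>" lines on stdin and print
# the names that fall outside BOTH retention windows, i.e. the prunable ones.
# Usage: select_prunable <keep_count> <keep_days> <now_epoch>
select_prunable() {
  local keep_count="$1" keep_days="$2" now="$3"
  local cutoff=$(( now - keep_days * 86400 ))
  local i=0 mtime name
  # Sort newest first so the first $keep_count entries are always kept.
  sort -rn | while read -r mtime name; do
    i=$(( i + 1 ))
    if (( i > keep_count && mtime < cutoff )); then
      printf '%s\n' "$name"
    fi
  done
}
```

With GNU coreutils, a dry run against the real directory could be `stat -c '%Y %n' /opt/bayanat/shared/backups/pre-*.dump | select_prunable 5 30 "$(date +%s)"`, which only prints candidates and deletes nothing.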

## Files

| Path | Purpose |
|---|---|
| `/usr/local/bin/bayanat` | The CLI script |
| `/usr/local/sbin/bayanat-start-update` | Root wrapper the UI invokes via sudo |
| `/etc/sudoers.d/bayanat` | Granted commands for the `bayanat` user |
| `/opt/bayanat/state/update.json` | Current update state (sanitized JSON) |
| `/opt/bayanat/state/update.lock` | PID lock file |
| `/opt/bayanat/shared/backups/` | Pre-update snapshots |
| `/health` (Flask endpoint) | 200 = DB + Redis reachable |
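
When `bayanat status` is not enough, the state file can be read directly. The sketch below prints the `phase` field, the same key the end-to-end test script inspects. `state_phase` is a hypothetical helper name, and the sketch assumes `python3` is available on the host.

```bash
# Hypothetical helper: print the "phase" field of the update state file,
# or an empty line if the file is missing or not valid JSON.
state_phase() {
  local state_file="${1:-/opt/bayanat/state/update.json}"
  python3 - "$state_file" <<'PY'
import json
import sys

try:
    with open(sys.argv[1]) as fh:
        print(json.load(fh).get("phase", ""))
except (OSError, ValueError, AttributeError):
    # Missing file, malformed JSON, or a non-object document.
    print("")
PY
}
```

For example, `[ "$(state_phase)" = "NEEDS_INTERVENTION" ] && echo "manual recovery required"`.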

## Admin UI surface

- Nav-bar banner chip: shows when `latest != current`
- Progress dialog: polls `/admin/api/updates/status` every 2 s during an
active update
- Settings toggle: System Administration -> "Auto-apply patch releases"
- Snapshots page: `/admin/snapshots/` (read-only list; restore stays on
the CLI)

## Manual CLI reference

Commands marked `(root)` require `sudo bayanat ...`; the others can run
as the app user via `sudo -u bayanat bayanat ...`.

```
bayanat update [<tag>]     (root)  default: latest GitHub release
bayanat update --check             show current vs latest; no changes
bayanat update --recover   (root)  recover a stuck state file
bayanat snapshots          (root)  list pre-update snapshots
bayanat restore <name>     (root)  interactive restore from a snapshot
bayanat status                     version + services + update state
```
229 changes: 229 additions & 0 deletions e2e-auto-update.sh
@@ -0,0 +1,229 @@
#!/usr/bin/env bash
#
# End-to-end test for the `bayanat update` pipeline on a disposable
# Hetzner VM. Provisions → installs → runs S1-S4 → teardown.
#
# Requires:
# - hcloud CLI authenticated to the right project (see `hcloud context list`)
# - ssh-agent loaded with the private key matching the registered hcloud key
# - a public test fork with tags v4.0.0 (baseline), v4.0.1 (additive),
# v4.0.2 (bad migration), v4.0.3 (/health 503 at runtime), v4.0.4 (recovery)
#
# Usage:
# ./e2e-auto-update.sh # full run: provision → S1-S4 → destroy
# KEEP_VM=1 ./e2e-auto-update.sh # leave VM running at end
# VM_IP=1.2.3.4 ./e2e-auto-update.sh # reuse an existing VM (skip provision)
# SCENARIOS="S1 S2" ./e2e-auto-update.sh # run a subset
# TEST_FORK=you/yourfork ./e2e-auto-update.sh
#
set -euo pipefail

# --- Config ---
TEST_FORK="${TEST_FORK:-level09/bayanat-update-test}"
SSH_KEY="${SSH_KEY:-level09@Black09}"
SERVER_TYPE="${SERVER_TYPE:-cpx22}"
LOCATION="${LOCATION:-nbg1}"
SCENARIOS="${SCENARIOS:-S1 S2 S3 S4}"
KEEP_VM="${KEEP_VM:-0}"
VM_IP="${VM_IP:-}"
SERVER_NAME=""

SSHOPTS="-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=5"

log() { printf '\n\033[1;34m[%s] %s\033[0m\n' "$(date +%H:%M:%S)" "$*"; }
pass() { printf '\033[1;32m ✓ %s\033[0m\n' "$*"; }
fail() { printf '\033[1;31m ✗ %s\033[0m\n' "$*" >&2; exit 1; }

on_vm() { ssh $SSHOPTS "root@$VM_IP" "$@"; }

# --- Prereqs ---
command -v hcloud >/dev/null || { echo "hcloud CLI not found"; exit 2; }
command -v gh >/dev/null || { echo "gh CLI not found (needed to rewrite tags)"; exit 2; }
git ls-remote --tags "https://github.com/$TEST_FORK.git" >/dev/null \
  || { echo "test fork $TEST_FORK not reachable"; exit 2; }

# --- Tag ladder prep: hide upper tags so installer picks v4.0.0 ---
stash_upper_tags() {
  log "Hiding upper tags on $TEST_FORK so installer picks v4.0.0"
  for t in v4.0.1 v4.0.2 v4.0.3 v4.0.4; do
    gh api -X DELETE "repos/$TEST_FORK/git/refs/tags/$t" 2>&1 \
      | grep -v '^$' | head -1 || true
  done
}

restore_upper_tags() {
  log "Restoring upper tags v4.0.1..v4.0.4"
  # Must push via a configured remote (SSH auth), not an HTTPS URL.
  for t in v4.0.1 v4.0.2 v4.0.3 v4.0.4; do
    if ! git show-ref --tags --verify --quiet "refs/tags/$t"; then
      echo " LOCAL TAG MISSING: $t (run the rebase block in the README)"; continue
    fi
    git push test-fork "+refs/tags/$t:refs/tags/$t" 2>&1 | tail -1
  done
  # Sanity: confirm remote has them
  local remote_tags
  # grep exits 1 on zero matches; `|| true` keeps `set -e -o pipefail` from
  # aborting here, so the explicit failure message below is still reached.
  remote_tags=$(git ls-remote --tags test-fork | awk '{print $2}' | grep -cE 'v4\.0\.[1-4]$' || true)
  [[ "$remote_tags" == "4" ]] || fail "expected 4 upper tags on remote, found $remote_tags"
  pass "4 upper tags visible on remote"
}

# --- Provision ---
provision() {
  SERVER_NAME="bayanat-update-test-$(date +%Y%m%d-%H%M%S)"
  log "Provisioning Hetzner VM $SERVER_NAME ($SERVER_TYPE in $LOCATION)"
  VM_IP=$(hcloud server create \
    --name "$SERVER_NAME" \
    --type "$SERVER_TYPE" \
    --image ubuntu-24.04 \
    --ssh-key "$SSH_KEY" \
    --location "$LOCATION" \
    -o json | python3 -c 'import json,sys; print(json.load(sys.stdin)["server"]["public_net"]["ipv4"]["ip"])')
  log "IP: $VM_IP"
  log "Waiting for SSH..."
  until on_vm 'true' 2>/dev/null; do sleep 3; done
  pass "SSH ready"
}

teardown() {
  if [[ "$KEEP_VM" == "1" ]]; then
    log "KEEP_VM=1: leaving $SERVER_NAME (IP $VM_IP) alive"
    return
  fi
  if [[ -n "$SERVER_NAME" ]]; then
    log "Destroying $SERVER_NAME"
    hcloud server delete "$SERVER_NAME" >/dev/null
    pass "destroyed"
  fi
}

# --- Install ---
install_baseline() {
  log "Installing v4.0.0 via curl | sudo bash -s install (validates \$0-free install)"
  on_vm 'echo "BAYANAT_REPO='"$TEST_FORK"'" >> /etc/environment'
  on_vm 'curl -fsSL https://raw.githubusercontent.com/'"$TEST_FORK"'/v4.0.0/bayanat | BAYANAT_REPO='"$TEST_FORK"' sudo -E bash -s install localhost' \
    >/tmp/e2e-install.log 2>&1 \
    || { tail -30 /tmp/e2e-install.log; fail "install failed"; }
  pass "install succeeded"

  # Work around the SETUP_COMPLETE gating until that lands in installer
  on_vm 'echo "BAYANAT_CONFIG_FILE=/opt/bayanat/shared/config.json" >> /opt/bayanat/shared/.env
    echo "{\"SETUP_COMPLETE\": true}" > /opt/bayanat/shared/config.json
    chown bayanat:bayanat /opt/bayanat/shared/config.json
    systemctl restart bayanat bayanat-celery'
  sleep 3

  local health
  health=$(on_vm 'curl -s --unix-socket /opt/bayanat/current/bayanat.sock http://localhost/health')
  [[ "$health" == *'"status":"ok"'* ]] || fail "/health not ok: $health"
  pass "/health OK: $health"

  local cur
  cur=$(on_vm 'bayanat status | grep "Current version" | awk "{print \$3}"')
  [[ "$cur" == "v4.0.0" ]] || fail "expected v4.0.0, got $cur"
  pass "installed version: $cur"
}

# --- Scenario helpers ---
assert_version() {
  local expected="$1"
  local actual
  actual=$(on_vm 'bayanat status | grep "Current version" | awk "{print \$3}"')
  [[ "$actual" == "$expected" ]] || fail "expected version $expected, got $actual"
  pass "version = $expected"
}

assert_state() {
  local expected="$1"
  local actual
  actual=$(on_vm 'bayanat status | grep "Update state" | awk "{print \$3}"')
  [[ "$actual" == "$expected" ]] || fail "expected state $expected, got $actual"
  pass "update state = $expected"
}

assert_state_file_phase() {
  local expected="$1"
  local phase
  phase=$(on_vm 'python3 -c "import json; print(json.load(open(\"/opt/bayanat/state/update.json\")).get(\"phase\",\"\"))"' 2>/dev/null || echo "")
  [[ "$phase" == "$expected" ]] || fail "expected state file phase $expected, got '$phase'"
  pass "state file phase = $expected"
}

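assert_services_active() {
  # `systemctl is-active` exits 0 if ANY listed unit is active, so check
  # each unit on its own to really assert that all of them are up.
  local svc
  for svc in bayanat bayanat-celery caddy; do
    on_vm "systemctl is-active --quiet $svc" || fail "service not active: $svc"
  done
  pass "services all active"
}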

clear_state_file() {
  on_vm 'rm -f /opt/bayanat/state/update.json /opt/bayanat/state/update.lock'
}

run_update() {
  local tag="$1"
  log " -> bayanat update $tag"
  on_vm 'sudo BAYANAT_REPO='"$TEST_FORK"' /usr/local/bin/bayanat update '"$tag" \
    >/tmp/e2e-update.log 2>&1 || true # we inspect state, exit code is scenario-dependent
  tail -5 /tmp/e2e-update.log | sed 's/^/ /'
}

# --- Scenarios ---
S1() {
  log "S1: happy path v4.0.0 -> v4.0.1"
  run_update v4.0.1
  assert_version v4.0.1
  assert_state IDLE
  assert_services_active
  on_vm 'sudo -u bayanat psql -d bayanat -c "\d bulletin" | grep -q auto_update_test' \
    || fail "auto_update_test column missing"
  pass "auto_update_test column present"
}

S2() {
  log "S2: bad migration v4.0.1 -> v4.0.2 -> NEEDS_INTERVENTION"
  run_update v4.0.2
  assert_version v4.0.1
  assert_state_file_phase NEEDS_INTERVENTION
  assert_services_active
  clear_state_file
}

S3() {
  log "S3: bad /health v4.0.1 -> v4.0.3 -> ROLLED_BACK"
  run_update v4.0.3
  assert_version v4.0.1
  assert_state IDLE
  assert_services_active
  local health
  health=$(on_vm 'curl -s --unix-socket /opt/bayanat/current/bayanat.sock http://localhost/health')
  [[ "$health" == *'"status":"ok"'* ]] || fail "/health not ok after rollback"
  pass "/health back to OK after rollback"
}

S4() {
  log "S4: recovery v4.0.1 -> v4.0.4"
  run_update v4.0.4
  assert_version v4.0.4
  assert_state IDLE
  assert_services_active
  on_vm 'sudo -u bayanat psql -d bayanat -c "\d bulletin" | grep -q auto_update_recovery_test' \
    || fail "auto_update_recovery_test column missing"
  pass "auto_update_recovery_test column present"
}

# --- Main ---
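# Restore tags in a subshell: `fail` calls `exit`, which would otherwise end
# the trap handler before teardown runs and leave the VM behind.
trap '(restore_upper_tags) || true; teardown' EXIT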

if [[ -z "$VM_IP" ]]; then
  stash_upper_tags
  provision
  install_baseline
  restore_upper_tags
else
  log "Reusing existing VM at $VM_IP"
fi

for s in $SCENARIOS; do
  "$s"
done

log "ALL PASSED"
28 changes: 28 additions & 0 deletions enferno/admin/templates/admin/snapshots.html
@@ -0,0 +1,28 @@
{% extends 'layout.html' %} {% block content %}

<v-main>
  <v-container fluid>
    <snapshots-list></snapshots-list>
  </v-container>
</v-main>

{% endblock %} {% block js %}
<script src="/static/js/components/SnapshotsList.js?v=1"></script>
<script nonce="{{ csp_nonce() }}">
  const {createApp} = Vue;
  const {createVuetify} = Vuetify;
  const vuetify = createVuetify(vuetifyConfig);

  const app = createApp({
    delimiters: delimiters,
    mixins: [globalMixin],
    data: () => ({}),
  });
  app.component('SnapshotsList', SnapshotsList);
  app.use(router).use(vuetify);
  router.isReady().then(() => {
    app.mount('#app');
    window.app = app;
  });
</script>
{% endblock %}
2 changes: 2 additions & 0 deletions enferno/admin/templates/nav-bar.html
@@ -11,6 +11,8 @@
<v-spacer></v-spacer>

<template v-slot:append>
<update-banner @update-started="$refs.progressDialog && $refs.progressDialog.open()"></update-banner>
<update-progress-dialog ref="progressDialog"></update-progress-dialog>
{% if config.OCR_PROVIDER == 'google_vision' %}
<v-tooltip location="bottom">
<template v-slot:activator="{ props }">