Implement socket-activated zero-downtime deploy switchover by retlehs · Pull Request #67 · roots/wp-packages

retlehs · 2026-03-25T21:55:20Z

Summary

Implement systemd socket activation for zero-downtime deploy switchover
Add Go-side LISTEN_FDS detection with fallback to normal listen for local dev
Bump Caddy retry window (lb_try_duration 5s→10s, lb_try_interval 250ms→100ms) as safety net

Why

We were seeing brief 502/503 responses during deploy because restarting the service drops the listening socket. Socket activation (wppackages.socket) keeps the socket open across service restarts — incoming connections queue at the kernel instead of failing.

Builds on #95 which separated Litestream into its own service, unblocking socket activation (the old litestream -exec wrapper wouldn't pass through the socket fd).

Changes

internal/http/server.go — systemdListener() consumes the fd passed by systemd via LISTEN_FDS/LISTEN_PID; falls back to ListenAndServe when not socket-activated (local dev)
templates/wppackages.socket.j2 — new systemd socket unit listening on {{ go_listen_addr }}
templates/wppackages.service.j2 — adds Requires=wppackages.socket
tasks/main.yml — deploys and enables the socket unit before the service
Caddyfile.j2 — retry tuning as additional safety net
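
For readers unfamiliar with the LISTEN_FDS protocol, the detection-plus-fallback described for internal/http/server.go might look roughly like the sketch below. It is not the code from this PR: the function names `socketActivated` and `listener` are hypothetical, and it assumes a single socket passed at fd 3 (the first fd systemd hands over) and a TCP fallback address.

```go
package main

import (
	"net"
	"os"
	"strconv"
)

// listenFDsStart is the first file descriptor systemd passes to an
// activated service (SD_LISTEN_FDS_START in sd_listen_fds(3)).
const listenFDsStart = 3

// socketActivated reports whether the environment says systemd passed at
// least one socket to this exact process. LISTEN_PID must match our pid,
// otherwise the variables were inherited from a parent and must be ignored.
// getenv and pid are parameters (an assumption of this sketch) so the check
// can be exercised without touching the real environment.
func socketActivated(getenv func(string) string, pid int) bool {
	p, err := strconv.Atoi(getenv("LISTEN_PID"))
	if err != nil || p != pid {
		return false
	}
	n, err := strconv.Atoi(getenv("LISTEN_FDS"))
	return err == nil && n >= 1
}

// listener returns the systemd-provided listener when socket-activated,
// and otherwise binds addr itself (the local dev path). The boolean
// reports which path was taken so the caller can log it.
func listener(addr string) (net.Listener, bool, error) {
	if socketActivated(os.Getenv, os.Getpid()) {
		f := os.NewFile(uintptr(listenFDsStart), "systemd-socket")
		// net.FileListener works on a duplicate of the fd, so the
		// original can be closed once the listener exists.
		defer f.Close()
		ln, err := net.FileListener(f)
		return ln, true, err
	}
	ln, err := net.Listen("tcp", addr)
	return ln, false, err
}
```

The returned listener can be handed to `http.Serve` (or `(*http.Server).Serve`) in place of `ListenAndServe`. Because systemd owns the socket, connections arriving while the service restarts wait in the kernel's accept queue until the new process calls accept.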

Test plan

Run provision and verify wppackages.socket is active (systemctl status wppackages.socket)
Confirm wppackages.service starts via socket activation (journalctl -u wppackages shows "using systemd socket activation")
Run deploy and monitor for 502/503 elimination during switchover
Verify local dev still works without socket activation (normal make dev)

🤖 Generated with Claude Code

swalkinshaw · 2026-04-03T03:48:02Z

🤔 not sure this is entirely correct or solves the problem. I think it's partly because we're wrapping the Go command with litestream -exec when really we should separate Litestream out into its own service. In fact, Litestream might not even pass the socket fd through to our binary...

Assuming we separate them, I'm not even sure the readiness checks in Ansible or the Caddy retries are needed 🤔

swalkinshaw · 2026-04-03T03:48:54Z

Though the readiness checks aren't bad anyway just to be safe

Systemd socket activation keeps the listening socket open across service restarts so connections queue at the kernel instead of getting 503s from Caddy. The Go server detects LISTEN_FDS and uses the inherited fd, falling back to normal listen for local dev. Caddy retry window bumped as a safety net.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

retlehs self-assigned this Mar 25, 2026

retlehs changed the title ~~Increase Caddy retry window to reduce deploy 502s~~ Implement socket-activated zero-downtime deploy switchover Apr 2, 2026

retlehs mentioned this pull request Apr 4, 2026

Separate Litestream into its own systemd service #95

Merged

4 tasks

retlehs force-pushed the fix/zero-downtime-deploy branch from e8fc77d to a62be24 Compare April 4, 2026 16:07

swalkinshaw approved these changes Apr 4, 2026

retlehs merged commit e276195 into main Apr 4, 2026
5 checks passed

retlehs deleted the fix/zero-downtime-deploy branch April 4, 2026 19:07
