feat: add Swarm mode with atomic Caddyfile rollout via configs #778

oneingan wants to merge 7 commits into lucaslorentz:master
Conversation
```go
if err != nil {
	// If another instance created it concurrently, inspect again.
	if errdefs.IsConflict(err) {
		return ensureSwarmConfig(ctx, dockerClient, name, data, fullHash)
```
This recursive call has no depth limit. If `ConfigCreate` keeps returning a conflict (e.g. a genuine name collision with different data), this will overflow the stack.

Consider a bounded loop like the one in `ensureServiceCaddyfileConfig` just below (line 190).
```go
if err != nil {
	log.Error("Failed to convert caddyfile into json config", zap.Error(err))
if dockerLoader.options.SwarmMode {
	if err := dockerLoader.updateSwarmService(); err != nil {
```
`updateSwarmService()` runs on every `update()` cycle, not just when `caddyfileChanged` is true. This means every polling interval / Docker event triggers `ServiceInspectWithRaw` + `ConfigList` even when nothing changed.

The existing flow only does expensive work (Adapt + server push) inside the `if caddyfileChanged` block. Could this be guarded the same way?
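One way to make the guard concrete: hash the rendered Caddyfile and skip the swarm round-trip when it matches the last push. `shouldPushSwarmUpdate` and `lastPushedHash` are hypothetical names for this sketch, not identifiers from the PR:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// shouldPushSwarmUpdate is a hypothetical guard: the swarm rollout only runs
// when the rendered Caddyfile differs from what was pushed last, mirroring
// how the existing flow gates Adapt + server push on caddyfileChanged.
// It returns the new hash so the caller can store it after a successful push.
func shouldPushSwarmUpdate(lastPushedHash string, caddyfile []byte) (string, bool) {
	sum := sha256.Sum256(caddyfile)
	newHash := hex.EncodeToString(sum[:])
	return newHash, newHash != lastPushedHash
}

func main() {
	rendered := []byte("example.com {\n}\n")

	hash, push := shouldPushSwarmUpdate("", rendered)
	fmt.Println(push) // first render: push

	_, push = shouldPushSwarmUpdate(hash, rendered)
	fmt.Println(push) // unchanged content: skip ServiceInspectWithRaw + ConfigList
}
```

With this shape, polling intervals and Docker events that leave the Caddyfile unchanged cost only a hash, not two daemon calls.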
```go
	defaultSwarmConfigPrefix  = "caddyfile"
	defaultSwarmConfigHashLen = 32
	maxSwarmConfigSizeBytes   = 1000 * 1024
)
```
These defaults are also defined in the flag declarations in cmd.go (lines 75-86). In the existing code, defaults live only in the flag definitions. Having two sources means they can silently drift apart.
Example:

```go
fs.Int("swarm-config-hash-len", 32,
	"Length of sha256 hex used in generated Swarm config name (swarm mode only)")
```

Having a shared const would be nice.
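A sketch of the shared-constant idea, combined with the `<prefix>-<sha256>` naming scheme from the PR description. `swarmConfigName` is a hypothetical helper invented for this example; only the constants and the naming scheme come from the diff:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Single source of truth: the flag declarations in cmd.go would reference
// these constants instead of repeating the literals.
const (
	defaultSwarmConfigPrefix  = "caddyfile"
	defaultSwarmConfigHashLen = 32
)

// swarmConfigName builds the <prefix>-<sha256> name described in the PR,
// truncating the hex digest to hashLen characters. Hypothetical helper.
func swarmConfigName(prefix string, data []byte, hashLen int) string {
	sum := sha256.Sum256(data)
	return fmt.Sprintf("%s-%s", prefix, hex.EncodeToString(sum[:])[:hashLen])
}

func main() {
	name := swarmConfigName(defaultSwarmConfigPrefix, []byte("hello"), defaultSwarmConfigHashLen)
	fmt.Println(name)
}
```

Because the name is derived from the content hash, identical Caddyfiles map to the same config object, which is what makes the immutable-config rollout idempotent.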
```sh
docker config ls --format "{{.Name}}" | grep "^${CONFIG_PREFIX}-" | xargs -r docker config rm >/dev/null || true
}

cleanup_configs
```
`cleanup_configs` runs here at the start but not on exit. `docker stack rm` (which the main test runner calls) won't remove standalone Swarm config objects created by this mode — unlike the other tests, this one creates Docker objects outside the stack.

Consider adding a trap:

```sh
trap cleanup_configs EXIT
```
Reviewed with Claude Opus
A concern I have is the lack of config garbage collection: each Caddyfile change creates a new Swarm config object. Would it be possible to tag the created configs with a label identifying them as managed by this controller, and include a cleanup routine that scans for configs with that label and removes any that are no longer mounted on the target service?
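The suggested GC pass splits naturally into a pure selection step and two daemon calls. A minimal sketch of the selection step, under the assumption that the caller has already gathered the managed config names (e.g. via `ConfigList` with a label filter) and the names currently mounted on the service (via `ServiceInspectWithRaw`); removal would then go through `ConfigRemove`. All identifiers here are hypothetical:

```go
package main

import "fmt"

// staleConfigs returns the controller-managed configs that are no longer
// referenced by the target service and are therefore safe to delete.
// managed: names of configs carrying the controller's label.
// mounted: names currently attached to the service spec.
func staleConfigs(managed []string, mounted map[string]bool) []string {
	var stale []string
	for _, name := range managed {
		if !mounted[name] {
			stale = append(stale, name)
		}
	}
	return stale
}

func main() {
	managed := []string{"caddyfile-aaa", "caddyfile-bbb", "caddyfile-ccc"}
	mounted := map[string]bool{"caddyfile-ccc": true} // current generation
	fmt.Println(staleConfigs(managed, mounted))       // older generations to remove
}
```

Keeping the selection pure makes the routine easy to test without a Swarm cluster, and the label filter ensures the GC never touches configs created by users or other tools.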
Implements the Swarm mode discussed in #773 (and the earlier atomic-write motivation from #766).
- `--mode=swarm`: controller-only; renders a full Caddyfile from labels and rolls it out via immutable Swarm configs.
- Updates the target service (`--swarm-service`) by swapping the mounted config at `/etc/caddy/Caddyfile` (or `--swarm-caddyfile-target`).
- Rollout goes through `docker service update`, so disruption is governed by the service's update/rollback settings.
- Adds `examples/swarm.yaml` and a swarm-mode test script.

Notes:

- Config names are `<prefix>-<sha256>`; old configs are not garbage-collected yet (documented in README). A GC pass can be added as follow-up.

AI assistance disclosure: this PR was prepared with help from OpenCode (AI coding assistant) using OpenAI model gpt-5.2-xhigh.