Skip to content

Node#add_server: idempotent for returning members#30

Merged
antondalgren merged 1 commit into
mainfrom
add-server-idempotent
Jun 15, 2026
Merged

Node#add_server: idempotent for returning members#30
antondalgren merged 1 commit into
mainfrom
add-server-idempotent

Conversation

@antondalgren

Copy link
Copy Markdown
Contributor

Summary

`add_server` previously returned `false` for any already-known node id. That made it impossible to re-admit a returning member, and turned a benign rejoin into a hard error at the consumer's HTTP layer.

New behavior:

  • New id → adds as Learner, returns true (unchanged).
  • Known id, different non-empty address → updates the stored address and replicates the new configuration, returns true.
  • Known id, same address (or empty address passed) → no-op, returns true (no log entry).
  • node_id == @id → returns false (self-guard, mirrors `remove_server`).
  • Still returns false if not leader or a configuration change is already in flight.

Address updates preserve the peer's existing role — a returning voter stays a voter rather than being silently demoted to learner.

Why

Fixes the LavinMQ rejoin crash-loop: a wiped node whose id is still in the leader's configuration received `add_server` → `false` → HTTP 400 → exhausted its retry budget and never rejoined. The common rejoin case (same id, same address) is now a true no-op; an address change replicates correctly.

Notes

  • The idempotency key is the address. Same-address rejoin is a silent no-op (intended — that's the crash-loop fix); a different non-empty address replicates an address-only configuration update.
  • Bundles three cosmetic block-pass refactors ({ |p| p.foo }(&.foo)) in `cancel_pending_reads`, `persist_state`, `serialize_peers` — behavior-identical.

Test plan

  • `crystal spec spec/raft/node_spec.cr -Dpreview_mt -Dexecution_context` — 47/47
  • `crystal spec spec/raft/ -Dpreview_mt -Dexecution_context -Draft_debug` — 134/134
  • `crystal tool format --check` clean; ameba `--fail-level error` exit 0

🤖 Generated with Claude Code

- Known id + new non-empty address: replaces the peer in the list with
  the same id/role but new address, appends a configuration entry so
  the change replicates, returns true.
- Known id + same or empty address: no-op, returns true (no log entry).
- Known id == own id: returns false (self-guard, mirrors remove_server).
- New id: unchanged (adds as Learner, returns true).

Fixes the LavinMQ rejoin crash-loop where a wiped node whose id is still
in the leader's configuration received add_server→false→HTTP 400 and
exhausted its retry budget.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@antondalgren antondalgren merged commit 1c3297f into main Jun 15, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant