Graceful step-down API + membership-change integration tests#8

Open
antondalgren wants to merge 2 commits into main from graceful-step-down

@antondalgren

Contributor

Two additions motivated by the upcoming LavinMQ leader-election integration.

Node#step_down

```crystal
def step_down : NodeID?
```

Wraps transfer_leadership(to:) with target selection: picks the voter peer with the highest match_index and initiates the transfer. Returns the chosen NodeID on success, nil on any of: not leader, no other voters, transfer refused. Same best-effort semantics as transfer_leadership — caller polls Node#leader_id to observe completion.
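The target-selection rule described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `Peer`, its fields, the `NodeID` alias, and `pick_transfer_target` are hypothetical stand-ins for whatever the library tracks internally.

```crystal
# Hypothetical stand-ins; the real types in this PR may differ.
alias NodeID = Int32

record Peer, id : NodeID, voter : Bool, match_index : Int64

# Returns the voter with the highest match_index,
# or nil when there is no eligible peer.
def pick_transfer_target(peers : Array(Peer)) : NodeID?
  peers.select(&.voter).max_by?(&.match_index).try(&.id)
end

peers = [
  Peer.new(2, true, 40_i64),
  Peer.new(3, true, 57_i64),
  Peer.new(4, false, 90_i64), # learner: skipped despite the higher index
]

target = pick_transfer_target(peers)
empty = pick_transfer_target([] of Peer)
```

Here `target` is node 3, the most caught-up voter, and `empty` is `nil`, mirroring the "no other voters" case.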

Primary use case: graceful rolling restarts.
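A caller-side sketch of that flow, assuming only the two methods named in this PR (`step_down` and `leader_id`). `FakeNode`, `graceful_handover`, the poll count, and the interval are hypothetical and exist only to make the example runnable on its own.

```crystal
# Hypothetical stand-in exposing only the two methods named in this PR.
class FakeNode
  getter id : Int32
  @polls = 0

  def initialize(@id : Int32, @target : Int32?)
  end

  # Returns the chosen target, or nil when the transfer cannot start.
  def step_down : Int32?
    @target
  end

  # Pretends the transfer completes on the third poll.
  def leader_id : Int32?
    @polls += 1
    @polls >= 3 ? @target : @id
  end
end

# Initiates the transfer, then polls leader_id until the target has taken
# over or the poll budget runs out. Returns true if the handover was observed.
def graceful_handover(node, max_polls = 50, interval = 10.milliseconds) : Bool
  target = node.step_down
  return false if target.nil?
  max_polls.times do
    return true if node.leader_id == target
    sleep interval
  end
  false
end

handed_over = graceful_handover(FakeNode.new(1, 2), interval: 0.milliseconds)
refused = graceful_handover(FakeNode.new(1, nil))
```

Because `step_down` is best-effort, the sketch bounds the polling loop and treats a timeout as "shut down anyway and let a normal election run".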

Membership-change integration tests

Four new tests in spec/raft/integration_spec.cr. Previously add_server / remove_server / promote_learner were covered only at the unit level; nothing exercised the full bootstrap → add → auto-promote → replicate flow.

  • Scales out 1 → 3 nodes via add_server with learner auto-promotion.
  • Scales in 3 → 1 nodes via remove_server.
  • Refuses to remove the last voter.
  • Rejects concurrent membership changes while one is in flight.

Plus two new file-level helpers: make_lone_node(id) (empty-peer single node) and drive_until(nodes, rounds, &predicate) (tick + deliver convergence loop).
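For readers unfamiliar with this style of harness, a convergence loop like `drive_until` might look roughly like the sketch below. It is not the helper from the spec file: `StubNode` and its `tick` / `deliver` methods are hypothetical, and the real helper drives actual nodes through the manual-delivery harness.

```crystal
# Hypothetical node: counts ticks and drains an inbox, nothing more.
class StubNode
  getter ticks = 0
  getter delivered = 0

  @inbox = 0

  def tick
    @ticks += 1
    @inbox += 1 # pretend every tick produces one pending message
  end

  def deliver
    @delivered += @inbox
    @inbox = 0
  end
end

# Runs tick + deliver cycles until the predicate holds or `rounds` elapse.
# Returns the final value of the predicate.
def drive_until(nodes : Array(StubNode), rounds : Int32, &predicate : -> Bool) : Bool
  rounds.times do
    return true if yield
    nodes.each(&.tick)
    nodes.each(&.deliver)
  end
  yield
end

nodes = [StubNode.new, StubNode.new]
converged = drive_until(nodes, 10) { nodes.all? { |n| n.delivered >= 3 } }
timed_out = drive_until([StubNode.new], 2) { false }
```

Returning the predicate's final value lets a spec assert on convergence directly instead of re-checking state after the loop.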

Test plan

  • crystal spec spec/raft/ -Dpreview_mt -Dexecution_context — 112 examples, 0 failures (108 prior + 4 new).

🤖 Generated with Claude Code

antondalgren and others added 2 commits May 25, 2026 18:54
Wraps transfer_leadership with target selection: picks the voter peer
with the highest match_index (most caught-up) and initiates the
transfer. Returns the chosen target NodeID on success or nil if not
leader, no eligible peers exist, or transfer_leadership rejects the
target.

Useful for rolling restarts — the application calls step_down before
shutting down the leader, giving the new leader a head start instead
of waiting out an election timeout. The LavinMQ integration is the
most likely caller.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four new tests in integration_spec.cr exercising single-server membership
changes end-to-end via the manual-delivery harness:

- scales out 1 → 3 nodes via add_server with learner auto-promotion:
  bootstrap node 1, add_server(2), drive until learner auto-promotes to
  voter via maybe_promote_learner, repeat for node 3. Asserts final
  3-voter state and that a propose replicates to all three.
- scales in 3 → 1 nodes via remove_server: 3-node cluster, remove_server(3),
  remove_server(2), asserts single-voter cluster still functional.
- remove_server refuses to remove the last voter: bootstraps a single-voter
  cluster, asserts remove_server(self) returns false.
- rejects concurrent membership changes while one is in flight: documents
  the single-voter quick-commit case (subsequent add_server succeeds
  immediately because the previous one already committed via self-ack).

Two new file-level helpers: make_lone_node (empty-peer single node for
scale-out scenarios) and drive_until (run tick + deliver cycles until a
predicate becomes true or rounds elapse).

112 examples passing (108 prior + 4 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@antondalgren antondalgren requested a review from carlhoerberg May 26, 2026 03:49