Distributed-systems stability test suite + single-voter re-election fix by carlhoerberg · Pull Request #12 · 84codes/raft.cr

carlhoerberg · 2026-05-27T19:20:23Z

Follow-up to #6, split out so that PR stays focused on thread-safety. Stacked on thread-safety — review/merge that first.

What's here

Distributed-systems test coverage plus one src bug fix discovered while running it.

- Single-voter re-election fix (src/raft/node.cr): a single-node cluster could not re-elect itself after restart. recover_state leaves role=Follower, and start_pre_vote/become_candidate never short-circuited on the self-vote quorum, so propose() kept returning false. Added a quorum_size check after the self-vote in both paths (it only fires when quorum_size==1, so multi-voter clusters are unaffected). Regression test in S10.
- Stability test plan (docs/testing-plans/raft-cr-project-stability.md): 40 claims, 35 hypotheses, 17 scenarios.
- Six in-process scenarios as specs:
  - S02 deterministic sim: at-most-one-leader-per-term, 200 seeds × 200 steps, 40k invariant checks
  - S05 crash-recovery durability: 30 SIGKILL iterations at random offsets
  - S10 apply-once + in-order across snapshot+restart (incl. single-voter regression)
  - S12 fuzz Message.from_io / LogEntry.from_io / Peer.from_io: 5,000 iters, typed errors only
  - S15 server fairness: documents apply()-blocks-group constraint
  - S16 replication SLO baseline
- Config.random_seed to seed the election RNG for deterministic simulation.
- S13 queue per-producer FIFO under partition + heal.
- Jepsen: register checker :linear → :competition.
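
The single-voter fix in the list above comes down to one check placed right after the self-vote. The following is a minimal Python sketch of that idea, not the Crystal code in src/raft/node.cr: ToyNode, its fields, and the role strings are hypothetical, and only quorum_size and become_candidate echo names used in this PR.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a Raft node; not the raft.cr Node class."""
    node_id: int
    peers: set                      # voting members, including this node
    role: str = "follower"
    term: int = 0
    votes: set = field(default_factory=set)

    @property
    def quorum_size(self) -> int:
        # Majority of the voting members: 1 for a single-voter cluster.
        return len(self.peers) // 2 + 1

    def become_candidate(self) -> None:
        self.term += 1
        self.role = "candidate"
        self.votes = {self.node_id}  # vote for ourselves
        # With quorum_size == 1 the self-vote already wins, and no peer
        # response will ever arrive to trigger the usual quorum check.
        if len(self.votes) >= self.quorum_size:
            self.role = "leader"


single = ToyNode(node_id=1, peers={1})
single.become_candidate()
print(single.role)   # leader

multi = ToyNode(node_id=1, peers={1, 2, 3})
multi.become_candidate()
print(multi.role)    # candidate
```

In the three-voter case the check is a no-op, which matches the claim that multi-voter clusters are unaffected.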

Testing

crystal spec: 108 examples, 0 failures (96 baseline + 12 new).

🤖 Generated with Claude Code

Single-voter Raft clusters could not re-elect themselves after restart: recover_state restores term/vote/peers but @role defaults to Follower, and start_pre_vote/become_candidate did not short-circuit on self-vote quorum, so a recovered node with peers={self} stayed Follower forever (subsequent propose() calls returned false).

Add a quorum check after the self-vote in both functions. It mirrors the existing pattern in handle_pre_vote_response / handle_request_vote_response and only fires when quorum_size==1, so multi-voter clusters are unaffected.

The bug was discovered while executing a new project-wide stability test plan (docs/testing-plans/raft-cr-project-stability.md: 40 claims, 35 hypotheses, 17 scenarios with adequacy/confidence sections). Six in-process scenarios are included as auto-generated specs:

- S02 spec/raft/simulation/at_most_one_leader_spec.cr: deterministic sim, 200 seeds × 200 steps; structural invariant "at most one leader per term"; 40k invariant checks, zero violations.
- S05 spec/raft/crash_recovery/persist_state_durability_spec.cr (+ durability_helper.cr): 30 SIGKILL iterations at random offsets; asserts no leftover raft_meta.tmp, commit_index ≤ log.last_index, term ≥ 1 post-bootstrap, self in peers, voted_for ∈ {nil, 1_u64}.
- S10 spec/raft/apply_invariants_spec.cr: apply-once + in-order across a snapshot+restart cycle; includes a regression test for the single-voter re-election fix above.
- S12 spec/fuzz/message_from_io_fuzz_spec.cr: 5,000 random-byte fuzz iters against Message.from_io / LogEntry.from_io / Peer.from_io; asserts only typed errors, never panics; explicit C31 bound test.
- S15 spec/raft/server_fairness_spec.cr: documents the apply()-blocks-group operational constraint (Node calls StateMachine#apply synchronously on the driver fiber).
- S16 spec/raft/perf/replication_slo_spec.cr: establishes a single-voter propose-latency baseline (p50=25.7µs, p99=354.9µs at ~32B payload).

All 108 examples pass (96 prior + 11 new + 1 F1 regression).

The other 11 chaos-class scenarios (S01/S03/S04/S06/S07/S08/S09/S11/S13/S14/S17) remain INCONCLUSIVE-env pending Jepsen / dm-flakey / multi-node Docker infrastructure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
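
The S02 invariant is simple to state as code: remember the first node seen as leader in each term, and fail if a different node ever claims the same term. The sketch below is an illustration in Python, not the spec in spec/raft/simulation/; LeaderInvariant and the (node_id, role, term) tuple shape are assumptions made for the example. One observation per step over 200 seeds × 200 steps gives the 40,000 checks quoted above.

```python
class LeaderInvariant:
    """Tracks which node claimed leadership in each term (hypothetical helper)."""

    def __init__(self):
        self.leader_by_term = {}
        self.checks = 0

    def observe(self, nodes):
        """nodes: iterable of (node_id, role, term) tuples for one sim step."""
        self.checks += 1
        for node_id, role, term in nodes:
            if role != "leader":
                continue
            # Record the first leader seen for this term, or fetch the earlier one.
            seen = self.leader_by_term.setdefault(term, node_id)
            if seen != node_id:
                raise AssertionError(
                    f"two leaders in term {term}: {seen} and {node_id}")


inv = LeaderInvariant()
inv.observe([(1, "leader", 1), (2, "follower", 1), (3, "follower", 1)])
inv.observe([(1, "follower", 2), (2, "leader", 2), (3, "follower", 2)])

try:
    inv.observe([(3, "leader", 2)])   # node 3 claims a term node 2 already led
    violated = False
except AssertionError:
    violated = True
print(inv.checks, violated)  # 3 True
```

Keeping the map across steps matters: two leaders for one term need not overlap in time, so a per-step check alone would miss them.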

Config#random_seed (UInt64?, default nil) lets callers fix the seed used by Node#random_election_timeout. When nil, behaviour is unchanged: Random.new (OS entropy). When set, Node uses Random.new(seed), making election-timeout choices reproducible.

Closes F2 from the project-wide stability test plan: the S02 deterministic-simulation arm (200 seeds × 200 steps) drove the protocol via a seeded RNG but Node's own RNG was unseeded, so consecutive runs reported different sanity counters (199/200 vs 198/200 seeds-saw-a-leader). The structural at-most-one-leader-per-term invariant still held in both runs; the variation was a sign that the "deterministic" framing was incomplete.

After this change, two back-to-back runs produce bit-identical S02 sanity counters (verified: seeds_that_saw_a_leader=198/200, max_term_observed=3, invariant_checks=40000, both runs). Per-node seeds derive from the test seed as seed * 1000 + node_id.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
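
The seed derivation can be illustrated in a few lines. This is a Python sketch of the idea only: node_seed and election_timeouts are hypothetical helpers, the 150-300 ms range is an illustrative value rather than a raft.cr default, and Python's random.Random stands in for Crystal's Random.new(seed).

```python
import random


def node_seed(test_seed: int, node_id: int) -> int:
    # Derivation quoted in the commit message: seed * 1000 + node_id.
    return test_seed * 1000 + node_id


def election_timeouts(test_seed, node_ids, draws=3, lo_ms=150, hi_ms=300):
    """Draw a few timeouts per node, each node using its own seeded RNG."""
    out = {}
    for node_id in node_ids:
        rng = random.Random(node_seed(test_seed, node_id))
        out[node_id] = [rng.randint(lo_ms, hi_ms) for _ in range(draws)]
    return out


run_a = election_timeouts(42, [1, 2, 3])
run_b = election_timeouts(42, [1, 2, 3])
print(run_a == run_b)      # True
print(node_seed(42, 3))    # 42003
```

One property of this formula worth keeping in mind: derived seeds are distinct across (seed, node_id) pairs only while node_id stays below 1000.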

The existing workload uses jepsen.independent/concurrent-generator with an infinite range of keys, so even a 15-second test produces ~370-390 independent per-key histories. Knossos's :linear algorithm is exponential and OOMs on virtually every key when many are analyzed in parallel under a 12 GB JVM heap (verified: 4 consecutive runs all reported :valid? :unknown :cause :out-of-memory on virtually every key, with empty :failures lists; no anomaly found, but no verdict either).

:competition is Knossos's combined mode: it runs the :linear and :wgl searches concurrently on the same history and returns the result of whichever finishes first. In these runs it reached a verdict on every key within the same 12 GB heap where :linear alone could not.

Switching this one word turns the chaos arm of the project test plan (S01 linearizable_writes_under_partition_and_crash) from INCONCLUSIVE-oracle-too-weak into a verifiable PASS-hardening:

--time-limit 15 --concurrency 5 -e JAVA_TOOL_OPTIONS=-Xmx12g → 371/371 keys :valid? true → :failures [] → exit 0, "Everything looks good!"

Nemesis (partition-random-halves, 5-second window isolating {n1,n2} | {n3,n4,n5}) landed cleanly per the Jepsen log; no anomaly observed under or after the partition.

Caveat: the verdict covers only these short per-key histories under one nemesis schedule. The reviewer should treat this as hardening evidence for "no per-key linearizability violation found under partition-random-halves at this scale", not as proof of correctness. To run :linear on its own, the workload would need rework to use a single shared register (one or a few keys) so the search stays tractable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
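
The "independent per-key histories" above are what drive the checker cost: each key's operations are pulled out into their own history and checked separately, so memory use scales with the number of keys analyzed at once. A rough Python sketch of that split follows; the dictionary layout of an operation is an assumption made for illustration and is not Jepsen's actual data format.

```python
from collections import defaultdict


def split_by_key(history):
    """Group operations by key, keeping each key's operations in history order."""
    sub = defaultdict(list)
    for op in history:
        sub[op["key"]].append(op)
    return dict(sub)


# Hypothetical operation records: two keys, one invoke/ok pair each.
history = [
    {"key": 0, "process": 1, "type": "invoke", "f": "write", "value": 3},
    {"key": 1, "process": 2, "type": "invoke", "f": "read", "value": None},
    {"key": 0, "process": 1, "type": "ok", "f": "write", "value": 3},
    {"key": 1, "process": 2, "type": "ok", "f": "read", "value": None},
]
per_key = split_by_key(history)
print(sorted(per_key), [len(ops) for ops in per_key.values()])  # [0, 1] [2, 2]
```

With roughly 370 to 390 keys per run, the checker performs that many separate searches, which is why a per-key search that is cheap in isolation can still exhaust the heap in aggregate.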

In-process variant of S13 from the project test plan. Drives the queue example through a synthetic partition that isolates one follower (deliver_all_except skips messages to/from the partitioned node), publishes a stream of producer-tagged messages, heals the partition, and verifies:

- per-producer FIFO: each producer's tag sequence appears in publish order in the drained stream (C33)
- cross-replica state equivalence: all three replicas' snapshot bytes are byte-equal after heal, i.e. they converged to the same queue state (C34)
- exactly-once consume bridge: every published tag drains exactly once via the bridge on the leader (C35)
- no-commit-without-quorum: a minority-partitioned leader can append entries but commit_index cannot advance until heal

Workload: 3 producers × 5 messages × 2 phases = 30 messages total. Phase 1 runs with full delivery, phase 2 with node 3 partitioned, then heal and catch up. After catch-up, depth on all 3 replicas = 30 and snapshot bytes match exactly.

This is the in-process arm of S13 (synthetic partition; no TCP-layer behavior tested). The Jepsen-driven arm would still need a queue-specific Clojure workload + the existing podman/SELinux/checker fixes applied to a new compose stack, which is out of scope this session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
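
Two of the properties above, per-producer FIFO (C33) and exactly-once drain (C35), can be checked from the published and drained tag streams alone. The sketch below shows one way to do that in Python; check_drained and the (producer, seq) tag shape are assumptions for illustration and do not reproduce the spec's helpers.

```python
from collections import Counter, defaultdict


def check_drained(published, drained):
    """published/drained: lists of (producer, seq) tags in observed order."""
    # Exactly-once: every published tag drains once, and nothing extra appears.
    if Counter(drained) != Counter(published):
        return False
    # Per-producer FIFO: within a producer, drain order equals publish order.
    pub, got = defaultdict(list), defaultdict(list)
    for producer, seq in published:
        pub[producer].append(seq)
    for producer, seq in drained:
        got[producer].append(seq)
    return pub == got


# 3 producers x 5 messages x 2 phases = 30 tags, as in the workload above.
published = [(p, s) for s in range(10) for p in ("a", "b", "c")]
# Interleaving across producers may differ; order within a producer may not.
drained = sorted(published)
print(len(published), check_drained(published, drained))  # 30 True

swapped = drained[:]
swapped[0], swapped[1] = swapped[1], swapped[0]
print(check_drained(published, swapped))  # False
```

The check deliberately ignores how producers interleave with one another, since the stated guarantee is per producer rather than a global order.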

carlhoerberg and others added 4 commits May 27, 2026 21:11

carlhoerberg wants to merge 4 commits into thread-safety from dist-testing
