fix #3231 prevent snapshot corruption #3232
Conversation
@corhere any chance someone could review this PR before the next release of moby? Any help needed?
Pull request overview
Fixes raft snapshot splitting in SwarmKit transport to avoid snapshot corruption/panics when raft messages become too large for the configured gRPC max message size.
Changes:
- Add a lower bound to the computed snapshot chunk payload size used for splitting.
- Avoid mutating the original `raftpb.Message.Snapshot.Data` across chunk iterations by copying the `Snapshot` struct per chunk.
- Add regression tests for snapshot splitting and payload size behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| manager/state/raft/transport/peer.go | Adjusts snapshot chunk sizing and fixes snapshot chunking to avoid pointer/slice mutation across iterations. |
| manager/state/raft/transport/transport_test.go | Adds tests covering snapshot reassembly correctness and payload size minimum behavior. |
```diff
 func raftMessagePayloadSize(m *raftpb.Message) int {
-	return GRPCMaxMsgSize - raftMessageStructSize(m)
+	s := GRPCMaxMsgSize - raftMessageStructSize(m)
+	if s < 64<<10 {
+		s = 64 << 10
+	}
+	return s
```
raftMessagePayloadSize currently floors the payload to 64 KiB for any computed value < 64 KiB. If the struct overhead is close to (but still below) GRPCMaxMsgSize, this increases the chunk size beyond what fits in a gRPC message and can turn a previously-sendable snapshot into one that always hits ResourceExhausted. Consider only applying the 64 KiB fallback when the computed size is <= 0 (or otherwise keep the computed positive size), so chunking still respects GRPCMaxMsgSize whenever it’s possible to do so.
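For concreteness, a minimal sketch of that suggestion (not the change under review; it reuses the helpers shown in the diff above):

```go
// Sketch only: fall back solely when the computed size is non-positive,
// so chunking still honors GRPCMaxMsgSize whenever that is possible.
func raftMessagePayloadSize(m *raftpb.Message) int {
	s := GRPCMaxMsgSize - raftMessageStructSize(m)
	if s <= 0 {
		// Overhead alone exceeds the limit; pick a best-effort floor.
		s = 64 << 10
	}
	return s
}
```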
```go
t.Run("LargeMetadataDoesNotPanic", func(t *testing.T) {
	// Simulate a snapshot where struct overhead exceeds GRPCMaxMsgSize.
	// This is the scenario from the bug: many cluster objects cause
	// large raft message metadata, making payloadSize negative without
	// the fix.
	data := make([]byte, 5<<20) // 5 MiB snapshot data
	for i := range data {
		data[i] = byte(i % 256)
	}
	m := raftpb.Message{
		Type: raftpb.MsgSnap,
		From: 1,
		To:   2,
		Snapshot: &raftpb.Snapshot{
			Data: data,
			Metadata: raftpb.SnapshotMetadata{
				Index: uint64(len(data)),
				// Large ConfState to push struct size above GRPCMaxMsgSize
				ConfState: raftpb.ConfState{
					Voters: make([]uint64, 1<<20),
				},
			},
		},
	}
	// Must not panic
	msgs := splitSnapshotData(ctx, &m)
	assert.NotEmpty(t, msgs)
```
The test intends to simulate “struct overhead exceeds GRPCMaxMsgSize”, but ConfState.Voters is filled with zero values; protobuf varint encoding makes these extremely small (likely ~2 bytes/value including the tag), so this may not actually push the message overhead past 4 MiB. To make the test robust, assert the precondition (e.g., raftMessageStructSize(&m) > GRPCMaxMsgSize) or populate Voters with sufficiently large non-zero IDs so the encoded size truly exceeds the limit.
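A hedged sketch of that suggestion, reusing the test's `m` and the helpers from the diff above (assumes `math` and testify's `require` are imported):

```go
// Sketch: use worst-case voter IDs so each uint64 encodes as a full
// 10-byte varint, making the intended precondition actually hold
// (1<<20 voters * 10 bytes ≈ 10 MiB of encoded ConfState).
voters := make([]uint64, 1<<20)
for i := range voters {
	voters[i] = math.MaxUint64 // 10 bytes per value when varint-encoded
}
m.Snapshot.Metadata.ConfState.Voters = voters
// Assert the precondition the test relies on.
require.Greater(t, raftMessageStructSize(&m), GRPCMaxMsgSize)
```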
corhere
left a comment
> Clamped raftMessagePayloadSize to a minimum of 64 KiB. When a large
> cluster state causes the raft message struct overhead to exceed
> GRPCMaxMsgSize, the computed payload size went negative, leading to
> corruption during snapshot splitting. The payload size is now floored
> at 64 KiB to guarantee forward progress.
This doesn't make sense to me. No amount of splitting the snapshot data will yield a message small enough to send if the message overhead alone exceeds GRPCMaxMsgSize. Are you sure this is a real issue?
```go
// Capture the full snapshot data before the loop, because
// we need to copy the Snapshot struct for each chunk to avoid
// mutating m.Snapshot.Data through the shared pointer.
fullData := m.Snapshot.Data
```
I don't understand why another copy of the slice is needed. Cloning raftMsg.Snapshot is sufficient to ensure that assigning to raftMsg.Snapshot.Data does not mutate m.Snapshot.Data, therefore fullData == m.Snapshot.Data for all loop iterations.
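For illustration, a self-contained sketch of the slice-header semantics behind this point (the stand-in type and values are hypothetical):

```go
type snap struct{ Data []byte } // stand-in for raftpb.Snapshot

orig := &snap{Data: make([]byte, 10)}
clone := *orig              // struct copy: Data's header (ptr/len/cap) is copied
clone.Data = clone.Data[:4] // re-slicing rewrites only the copy's header
// len(orig.Data) is still 10; the original slice header is untouched,
// so fullData == orig.Data would hold on every iteration.
```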
```go
bigEntry := raftpb.Entry{Data: make([]byte, GRPCMaxMsgSize)}
m := &raftpb.Message{
	Type:    raftpb.MsgSnap,
	Entries: []raftpb.Entry{bigEntry},
```
How realistic is this scenario? Is it possible in practice for a MsgSnap message to be constructed with a non-empty Entries field?
Thanks @corhere for the review.
The shared *Snapshot pointer is the real cause of moby#3231: the
shallow copy of the raft message meant re-slicing raftMsg.Snapshot.Data
also re-sliced m.Snapshot.Data, shrinking its capacity until a later
iteration panicked with "slice bounds out of range". Cloning the
Snapshot struct alone is sufficient.

Drop the 64 KiB payload-size clamp (which could not produce a sendable
message when struct overhead already exceeds GRPCMaxMsgSize) and the
redundant fullData capture. Replace the tests with a single regression
test that feeds a realistic MsgSnap of 3*GRPCMaxMsgSize and asserts the
input message's Snapshot.Data is untouched after splitting.

Signed-off-by: Sylvere Richard <sylvere.richard@gmail.com>
corhere
left a comment
The code looks great; no notes! I just need you to squash the changes down to a single commit and update the PR description before I can approve and merge. Thank you so much for identifying and fixing this bug!
5afb846 to 6d7f722
Thank you @corhere.
@nowheresly one last thing before merging: the commit message. Could you please do the following?
Just updated my commit. Please let me know if it's fine now.
@nowheresly lines in the commit message should be wrapped to 72 chars, like how the commit messages were formatted before squashing.
Clone raft Snapshot so that re-slicing Snapshot.Data does not mutate
the initial Snapshot.Data through the shared pointer.

Signed-off-by: Sylvere Richard <sylvere.richard@gmail.com>
Is it better now?
- What I did
Fixed snapshot corruption that occurs when the swarm state is large enough that the raft message struct overhead exceeds GRPCMaxMsgSize.
- How I did it
Copied the Snapshot struct before slicing chunk data. The splitSnapshotData loop was assigning sub-slices of Snapshot.Data directly through a shared pointer, which mutated the original message's snapshot data across iterations. Each chunk now gets its own copy of the Snapshot struct so the original data remains intact for correct sub-slicing.
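For illustration, a minimal sketch of that loop shape (hand-simplified; the real splitSnapshotData also computes payloadSize and wraps each chunk for sending):

```go
// Sketch of the fixed iteration: clone the Snapshot struct per chunk so
// assigning the chunk's sub-slice never rewrites m.Snapshot.Data.
for start := 0; start < len(m.Snapshot.Data); start += payloadSize {
	end := start + payloadSize
	if end > len(m.Snapshot.Data) {
		end = len(m.Snapshot.Data)
	}
	raftMsg := *m                // shallow copy of the message
	snapshot := *m.Snapshot      // copy the Snapshot struct itself
	snapshot.Data = m.Snapshot.Data[start:end]
	raftMsg.Snapshot = &snapshot // the chunk points at its own copy
	// ... wrap raftMsg into a snapshot chunk and send it ...
}
```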
- How to test it
`go test ./manager/state/raft/transport/ -run TestSplitSnapshotData` verifies both normal snapshots and the large-metadata scenario that previously panicked/corrupted data.
- Description for the changelog