
fix #3231 prevent snapshot corruption #3232

Merged
corhere merged 1 commit into moby:master from nowheresly:fix3231 on Apr 21, 2026

Conversation

@nowheresly
Contributor

nowheresly commented Apr 9, 2026

- What I did

Fixed snapshot corruption that occurs when the swarm state is large enough that the raft message struct overhead exceeds GRPCMaxMsgSize.

- How I did it

Copied the Snapshot struct before slicing chunk data. The splitSnapshotData loop was assigning sub-slices of Snapshot.Data directly through a shared pointer, which mutated the original message's snapshot data across iterations. Each chunk now gets its own copy of the Snapshot struct so the original data remains intact for correct sub-slicing.
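A minimal sketch of the per-chunk copy pattern described above (illustrative types and names, not the actual SwarmKit code):

```go
package main

import "fmt"

type snapshot struct{ data []byte }

// chunkSnapshots returns one snapshot per chunk. Cloning the struct
// before re-slicing means the caller's snapshot (and its data slice
// header) are never mutated, so every chunk is sliced against the
// full original data.
func chunkSnapshots(s *snapshot, size int) []*snapshot {
	var out []*snapshot
	for i := 0; i < len(s.data); i += size {
		end := i + size
		if end > len(s.data) {
			end = len(s.data)
		}
		cp := *s // copy the struct, not just the pointer
		cp.data = s.data[i:end]
		out = append(out, &cp)
	}
	return out
}

func main() {
	s := &snapshot{data: []byte{1, 2, 3, 4, 5}}
	chunks := chunkSnapshots(s, 2)
	fmt.Println(len(chunks), len(s.data)) // 3 5: original left intact
}
```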

- How to test it

  • go test ./manager/state/raft/transport/ -run TestSplitSnapshotData
    — verifies both normal snapshots and the large-metadata scenario that previously panicked/corrupted data.

- Description for the changelog

@nowheresly
Contributor Author

@corhere any chance someone could review this PR before the next release of moby? Any help needed?


Copilot AI left a comment


Pull request overview

Fixes raft snapshot splitting in SwarmKit transport to avoid snapshot corruption/panics when raft messages become too large for the configured gRPC max message size.

Changes:

  • Add a lower-bound behavior to the computed snapshot chunk payload size used for splitting.
  • Avoid mutating the original raftpb.Message.Snapshot.Data across chunk iterations by copying the Snapshot struct per chunk.
  • Add regression tests for snapshot splitting and payload size behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

manager/state/raft/transport/peer.go: Adjusts snapshot chunk sizing and fixes snapshot chunking to avoid pointer/slice mutation across iterations.
manager/state/raft/transport/transport_test.go: Adds tests covering snapshot reassembly correctness and payload size minimum behavior.


Comment thread manager/state/raft/transport/peer.go Outdated
Comment on lines +151 to +156
func raftMessagePayloadSize(m *raftpb.Message) int {
-	return GRPCMaxMsgSize - raftMessageStructSize(m)
+	s := GRPCMaxMsgSize - raftMessageStructSize(m)
+	if s < 64<<10 {
+		s = 64 << 10
+	}
+	return s

Copilot AI Apr 20, 2026


raftMessagePayloadSize currently floors the payload to 64 KiB for any computed value < 64 KiB. If the struct overhead is close to (but still below) GRPCMaxMsgSize, this increases the chunk size beyond what fits in a gRPC message and can turn a previously-sendable snapshot into one that always hits ResourceExhausted. Consider only applying the 64 KiB fallback when the computed size is <= 0 (or otherwise keep the computed positive size), so chunking still respects GRPCMaxMsgSize whenever it’s possible to do so.

Comment on lines +123 to +149
t.Run("LargeMetadataDoesNotPanic", func(t *testing.T) {
	// Simulate a snapshot where struct overhead exceeds GRPCMaxMsgSize.
	// This is the scenario from the bug: many cluster objects cause
	// large raft message metadata, making payloadSize negative without
	// the fix.
	data := make([]byte, 5<<20) // 5 MiB snapshot data
	for i := range data {
		data[i] = byte(i % 256)
	}
	m := raftpb.Message{
		Type: raftpb.MsgSnap,
		From: 1,
		To:   2,
		Snapshot: &raftpb.Snapshot{
			Data: data,
			Metadata: raftpb.SnapshotMetadata{
				Index: uint64(len(data)),
				// Large ConfState to push struct size above GRPCMaxMsgSize
				ConfState: raftpb.ConfState{
					Voters: make([]uint64, 1<<20),
				},
			},
		},
	}
	// Must not panic
	msgs := splitSnapshotData(ctx, &m)
	assert.NotEmpty(t, msgs)

Copilot AI Apr 20, 2026


The test intends to simulate “struct overhead exceeds GRPCMaxMsgSize”, but ConfState.Voters is filled with zero values; protobuf varint encoding makes these extremely small (likely ~2 bytes/value including the tag), so this may not actually push the message overhead past 4 MiB. To make the test robust, assert the precondition (e.g., raftMessageStructSize(&m) > GRPCMaxMsgSize) or populate Voters with sufficiently large non-zero IDs so the encoded size truly exceeds the limit.
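The sizing point can be checked with a quick self-contained sketch of protobuf's varint length (a hand-rolled helper, not a protobuf library call; it assumes packed encoding, where each repeated element costs only its own varint bytes):

```go
package main

import "fmt"

// varintLen returns how many bytes protobuf's base-128 varint
// encoding uses for a given value: 7 payload bits per byte.
func varintLen(v uint64) int {
	n := 1
	for v >= 0x80 {
		v >>= 7
		n++
	}
	return n
}

func main() {
	fmt.Println(varintLen(0))       // 1: a zero voter ID is a single byte
	fmt.Println(varintLen(1 << 62)) // 9: a large voter ID is nine bytes
	// 1M zero voters encode to roughly 1 MiB, well under a 4 MiB
	// GRPCMaxMsgSize, so the test's precondition may not hold.
	fmt.Println((1 << 20) * varintLen(0))
}
```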

Comment thread manager/state/raft/transport/transport_test.go
Collaborator

@corhere left a comment


Clamped raftMessagePayloadSize to a minimum of 64 KiB. When a large cluster
state causes the raft message struct overhead to exceed GRPCMaxMsgSize, the
computed payload size went negative, leading to corruption during snapshot
splitting. The payload size is now floored at 64 KiB to guarantee forward
progress.

This doesn't make sense to me. No amount of splitting the snapshot data will yield a message small enough to send if the message overhead alone exceeds GRPCMaxMsgSize. Are you sure this is a real issue?

Comment thread manager/state/raft/transport/peer.go Outdated
Comment on lines +174 to +177
	// Capture the full snapshot data before the loop, because
	// we need to copy the Snapshot struct for each chunk to avoid
	// mutating m.Snapshot.Data through the shared pointer.
	fullData := m.Snapshot.Data
Collaborator


I don't understand why another copy of the slice is needed. Cloning raftMsg.Snapshot is sufficient to ensure that assigning to raftMsg.Snapshot.Data does not mutate m.Snapshot.Data, therefore fullData == m.Snapshot.Data for all loop iterations.
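Go's slice semantics back this up: a struct copy duplicates the slice header (pointer, length, capacity) by value, so re-slicing the copy's field leaves the original header untouched even though both still point at the same backing array. A minimal sketch with a hypothetical type:

```go
package main

import "fmt"

type snap struct{ data []byte }

func main() {
	orig := &snap{data: []byte{1, 2, 3, 4}}
	clone := *orig              // copies the slice header by value
	clone.data = clone.data[:2] // re-slices only the clone's header

	// Both headers share the backing array, but the original's
	// length and capacity are unaffected by the re-slice.
	fmt.Println(len(orig.data), len(clone.data)) // 4 2
}
```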

	bigEntry := raftpb.Entry{Data: make([]byte, GRPCMaxMsgSize)}
	m := &raftpb.Message{
		Type:    raftpb.MsgSnap,
		Entries: []raftpb.Entry{bigEntry},
Collaborator


How realistic is this scenario? Is it possible in practice for a MsgSnap message to be constructed with a non-empty Entries field?

nowheresly added a commit to nowheresly/swarmkit that referenced this pull request Apr 20, 2026
The shared *Snapshot pointer is the real cause of moby#3231: the shallow
copy of the raft message meant re-slicing raftMsg.Snapshot.Data also
re-sliced m.Snapshot.Data, shrinking its capacity until a later
iteration panicked with "slice bounds out of range".

Cloning the Snapshot struct alone is sufficient. Drop the 64 KiB
payload-size clamp (which could not produce a sendable message when
struct overhead already exceeds GRPCMaxMsgSize) and the redundant
fullData capture. Replace the tests with a single regression test
that feeds a realistic MsgSnap of 3*GRPCMaxMsgSize and asserts the
input message's Snapshot.Data is untouched after splitting.

Signed-off-by: Sylvere Richard <sylvere.richard@gmail.com>

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@nowheresly
Contributor Author

Thanks @corhere for the review.
You're right on all three counts. The actual bug is the shared *Snapshot pointer causing m.Snapshot.Data to be re-sliced each iteration, which shrinks its capacity and panics on a later chunk. Cloning the Snapshot struct is sufficient to fix it; the fullData capture and the 64 KiB minimum were leftover from an earlier (wrong) hypothesis and I'll drop them. I'll also rewrite the test to reflect a realistic MsgSnap and assert the split chunks reassemble correctly without mutating the input message.

Collaborator

@corhere left a comment


The code looks great; no notes! I just need you to squash the changes down to a single commit and update the PR description before I can approve and merge. Thank you so much for identifying and fixing this bug!

@nowheresly force-pushed the fix3231 branch 2 times, most recently from 5afb846 to 6d7f722 on April 21, 2026 at 07:10
@nowheresly
Contributor Author

Thank you @corhere .
Commits squashed.

@corhere
Collaborator

corhere commented Apr 21, 2026

@nowheresly one last thing before merging: the commit message. Could you please do the following?

  • remove the issue number from the message
    • Mentioning issue numbers in commit messages results in the issue timeline getting spammed with a link to every copy of the commit in every public fork of Swarmkit on GitHub.
  • fix up the line wrapping and indentation
  • insert a blank line before the Signed-off-by trailer

@nowheresly
Contributor Author

Just updated my commit. Please let me know if it's fine now.

@corhere
Collaborator

corhere commented Apr 21, 2026

@nowheresly lines in the commit message should be wrapped to 72 chars, like how the commit messages were formatted before squashing.

Clone raft Snapshot so that re-slicing Snapshot.Data does not mutate
the initial Snapshot.Data through the shared pointer.

Signed-off-by: Sylvere Richard <sylvere.richard@gmail.com>
@nowheresly
Contributor Author

Is it better now?

Collaborator

@corhere left a comment


Perfect; thank you!

@corhere corhere merged commit 12ce349 into moby:master Apr 21, 2026
9 checks passed
@nowheresly nowheresly deleted the fix3231 branch April 21, 2026 18:56


Development

Successfully merging this pull request may close these issues.

dockerd panic: slice bounds out of range in raft transport.splitSnapshotData (snapshot send)

3 participants