Fix parallelSync getting stuck when all workers fail #399
Pull request overview
Fixes a deadlock scenario in the syncer’s parallelSync orchestration where syncing could stall indefinitely if every worker peer fails during block fetching, and adds a regression test plus a changeset entry.
Changes:
- Track active `parallelSync` workers and abort with an error if no workers remain while requests are still incomplete.
- Add `TestParallelSyncStall` to reproduce the "all workers fail" scenario using a peer that serves headers but always fails block RPCs.
- Add a knope changeset documenting the patch.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| syncer/parallel_sync.go | Adds active-worker tracking and an early exit to prevent indefinite blocking when no peers can make progress. |
| syncer/syncer_test.go | Adds a regression test that simulates a peer stalling during block fetch and verifies the sync loop continues. |
| .changeset/fix_parallelsync_stalling_when_all_workers_fail.md | Documents the bugfix as a patch changeset. |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 020acaec35
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.
If for some reason all workers in `parallelSync` fail, it will never exit.

Result of failing reproduction test here: https://github.com/SiaFoundation/coreutils/actions/runs/22136567973/job/63989707212?pr=399