fix(cluster): eliminate data race on StickyInvoker with atomic.Value#3439
fix(cluster): eliminate data race on StickyInvoker with atomic.Value#3439Aias00 wants to merge 7 commits into
Conversation
StickyInvoker in BaseClusterInvoker was a plain field read/written by multiple goroutines (e.g. failback retry goroutine + Invoke goroutine) without any synchronization, causing a data race on the RPC hot path. Replace the unexported StickyInvoker field with an atomic.Value, accessed via getStickyInvoker()/setStickyInvoker() helpers. This eliminates the race while keeping the lock-free read path fast. Key changes: - StickyInvoker base.Invoker -> stickyInvoker atomic.Value (unexported) - Add getStickyInvoker() / setStickyInvoker() helpers - Update IsAvailable() and DoSelect() to use atomic accessors - DoSelect reads sticky invoker into a local variable to avoid redundant atomic loads and ensure consistent logic within one call - Add concurrent tests (TestStickyConcurrentDoSelect, TestStickyConcurrentIsAvailableAndDoSelect) that pass with -race Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR aims to make BaseClusterInvoker’s sticky invoker selection safe under concurrent access and adds race-focused tests around sticky selection.
Changes:
- Replaced the
StickyInvokerfield with an atomic-backed storage and added getter/setter helpers. - Updated
IsAvailableandDoSelectto use the atomic sticky invoker accessors. - Added concurrent tests intended to surface data races in sticky invoker selection.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| cluster/cluster/base/cluster_invoker.go | Switches sticky invoker storage to an atomic container and updates call sites to use it. |
| cluster/cluster/base/cluster_invoker_test.go | Adds concurrency tests intended to detect race conditions around sticky selection. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| selectedInvoker = invoker.doSelectInvoker(lb, invocation, invokers, invoked) | ||
| if sticky { | ||
| invoker.StickyInvoker = selectedInvoker | ||
| invoker.setStickyInvoker(selectedInvoker) | ||
| } |
| // TestStickyConcurrentIsAvailableAndDoSelect verifies that concurrent | ||
| // IsAvailable and DoSelect calls do not cause a data race on StickyInvoker. |
…and type mismatch panics
Address Copilot review feedback:
1. atomic.Value.Store(nil) panics - wrap base.Invoker in stickyInvokerWrapper
struct so nil invoker is stored as {invoker: nil} (non-nil wrapper)
2. atomic.Value requires consistent concrete type - the wrapper struct
ensures the same type is always stored regardless of Invoker impl
3. Fix test TestStickyConcurrentIsAvailableAndDoSelect to actually call
IsAvailable() with a proper mock Directory (instead of getStickyInvoker)
4. Fix imports formatting per dubbo-go convention (separate import groups)
Co-Authored-By: Claude <noreply@anthropic.com>
Addressing Copilot Review FeedbackThanks for the thorough review! All three issues have been fixed in the latest push: 1.
|
Co-Authored-By: Claude <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #3439 +/- ##
===========================================
+ Coverage 46.76% 53.75% +6.99%
===========================================
Files 295 491 +196
Lines 17172 37824 +20652
===========================================
+ Hits 8031 20334 +12303
- Misses 8287 15857 +7570
- Partials 854 1633 +779 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
这版把一个很小的 sticky 状态抽成了 别为了适配 |
好 我处理下 |
…ClusterInvoker from constructor - Replace atomic.Value + stickyInvokerWrapper + getter/setter with atomic.Pointer[base.Invoker] — idiomatic Go, nil-safe, type-safe, no wrapper needed - NewBaseClusterInvoker now returns *BaseClusterInvoker (pointer), preventing accidental value copies of the struct containing atomic fields - All 9 cluster implementations updated to use pointer embedding (*base.BaseClusterInvoker) to match the new constructor return type - StickyInvoker field remains exported (atomic.Pointer is safe for concurrent access by design) Co-Authored-By: Claude <noreply@anthropic.com>
|



Problem
StickyInvokerinBaseClusterInvokerwas a plainbase.Invokerfield read and written by multiple goroutines without any synchronization, causing a data race on the RPC hot path.Race details
DoSelect()setsStickyInvoker = nil(line 104) andStickyInvoker = selectedInvoker(line 115)DoSelect()readsStickyInvokermultiple times (lines 103, 108-110), andIsAvailable()reads it (lines 65-66)failbackClusterInvokercallsDoSelect()from both the mainInvoke()goroutine (line 143) and the retry goroutine spawned viago invoker.tryTimerTaskProc()(line 118, callingDoSelectat line 85). Even without failback, concurrentInvoke()calls on the same invoker would also race.Fix
Replace the unexported
StickyInvokerfield withsync/atomic.Value, accessed viagetStickyInvoker()/setStickyInvoker()helpers:StickyInvoker base.Invoker→stickyInvoker atomic.ValuegetStickyInvoker()/setStickyInvoker()IsAvailable()usesgetStickyInvoker()DoSelect()uses local variable + atomic accessorsTestStickyConcurrentDoSelectDoSelectwith sticky=trueTestStickyConcurrentIsAvailableAndDoSelectDoSelectBoth new tests pass with
go test -race.Semantic note
The sticky invoker mechanism is inherently "best-effort" — concurrent goroutines may select different invokers in the same call, but this is acceptable since sticky is merely a preference, not a correctness guarantee. The
atomic.Valueapproach eliminates the undefined behavior (Go data race) while preserving the sticky semantics.Test results
All other cluster package tests also pass with
-race(the pre-existingTestFailbackOutOfLimitrace on thetickerfield in failback is unrelated).