gc_state_manager: Make methods of GCStateManager able to timeout or cancel#10606
gc_state_manager: Make methods of GCStateManager able to timeout or cancel#10606MyonKeminta wants to merge 2 commits intotikv:masterfrom
Conversation
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
|
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
/retest |
|
@MyonKeminta: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #10606 +/- ##
==========================================
+ Coverage 78.96% 78.98% +0.01%
==========================================
Files 532 533 +1
Lines 71883 72123 +240
==========================================
+ Hits 56766 56965 +199
- Misses 11093 11114 +21
- Partials 4024 4044 +20
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
| // layer. It can be more efficient and avoid failures due to transaction conflict in most cases. | ||
| // The etcd transactions is still necessary considering the possibility of rare cases like PD leader changes. | ||
| mu syncutil.RWMutex | ||
| mu syncutil.CancellableRWMutex |
There was a problem hiding this comment.
Why do we need a cancellable mutex?
There was a problem hiding this comment.
As it may involve etcd IO in its critical section, and it may need a long time to successfully acquire the lock. Without the ability to cancel or timeout, the goroutines waiting for it would be unable to exit, and the memory used by them can't be released either.
| }) | ||
| } | ||
|
|
||
| func (m *GCStateManager) getGCStateImpl(ctx context.Context, keyspaceID uint32) (GCState, error) { |
There was a problem hiding this comment.
When the context is cancelled, it cannot cancel the transaction.
There was a problem hiding this comment.
This is a problem but it's unable to pass ctx to the StorageEndpoint API now. Seems it requires a large refactor.
|
|
||
| // RLock acquires a read lock or returns an error if ctx is canceled before the | ||
| // lock is acquired. | ||
| func (m *CancellableRWMutex) RLock(ctx context.Context) error { |
There was a problem hiding this comment.
The deadlock testing does not work anymore.
|
|
||
| func (m *CancellableRWMutex) init() { | ||
| m.initOnce.Do(func() { | ||
| m.inner = semaphore.NewWeighted(maxReaderCount) |
There was a problem hiding this comment.
Will the read be blocked more easily?
There was a problem hiding this comment.
The overhead of the lock itself is larger than the builtin RWMutex, but in the current use case I think it won't
What problem does this PR solve?
Issue Number: Close #10607
What is changed and how does it work?
In this PR:
RWMutexby using semaphore internallyOrderedSingleFlightGroupGCStateManager, Replace the originalRWLockwith theCancellableRWMutex, and set timeout for its APIsWe've encountered a case that some GC APIs are accumulating waiting for the mutex. By making these changes, the case should be able to be alleviated.
Check List
Tests
Code changes
Side effects
Related changes
pingcap/docs/pingcap/docs-cn:pingcap/tiup:Release note