fix(remoting): stop ReconnectTimerTask when client is closed#16176
Open
daguimu wants to merge 3 commits intoapache:3.3from
Open
fix(remoting): stop ReconnectTimerTask when client is closed#16176daguimu wants to merge 3 commits intoapache:3.3from
daguimu wants to merge 3 commits intoapache:3.3from
Conversation
- Guard against empty channel collection in AbstractTimerTask.run() to prevent spurious auto-cancel on server-side timer tasks - Remove redundant dead-code check in ReconnectTimerTask.doTask() - Add javadoc explaining HeaderExchangeClient.isClosed() semantics - Add test for empty channel collection edge case
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## 3.3 #16176 +/- ##
============================================
- Coverage 60.80% 60.77% -0.04%
+ Complexity 11767 11752 -15
============================================
Files 1953 1953
Lines 89118 89123 +5
Branches 13444 13446 +2
============================================
- Hits 54190 54165 -25
- Misses 29367 29383 +16
- Partials 5561 5575 +14
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
After all providers go offline, the consumer's
ReconnectTimerTaskkeeps attempting to reconnect to the old provider IP indefinitely. In K8s environments where providers restart with new IPs, this generates continuous connection-refused errors and triggers false alerts.Root Cause
Three issues allow the reconnect timer to run indefinitely after a client should be considered closed:
AbstractTimerTask.run()always reschedules the timer viareput(), even when all channels are already closed. The task becomes an orphaned timer doing nothing but consuming resources.HeaderExchangeClient.isClosed()only checksHeaderExchangeChannel.closedbut not the underlyingClient.isClosed(). When the underlyingAbstractClient/NettyClientis closed independently (e.g., by protocol destroy), theHeaderExchangeClientstill reports as open.ReconnectTimerTask.doTask()has no defensive check for client closure before attempting reconnection. Even if the client is closed between therun()check and thedoTask()execution, reconnection proceeds.Fix
AbstractTimerTask.run(): When all channels report as closed, cancel the timer task instead of rescheduling. This prevents orphaned timers from running indefinitely.HeaderExchangeClient.isClosed(): Also checkclient.isClosed()so that if the underlying client is closed by any code path, the wrapper correctly reflects the closed state.ReconnectTimerTask.doTask(): Add a defensive check at the start — if the channel is aClientand reports as closed, cancel the timer and return immediately.Tests Added
AbstractTimerTaskTest.testAutoCancelWhenAllChannelsClosed()— verifies task stops executing and cancel flag is set after channel closesAbstractTimerTaskTest.testTaskContinuesWhenChannelIsOpen()— regression test ensuring normal scheduling worksHeaderExchangeClientTest.testIsClosedWhenUnderlyingClientClosed()— verifies isClosed() returns true when underlying client is closedHeaderExchangeClientTest.testIsNotClosedWhenBothOpen()— regression test for normal open stateReconnectTimerTaskTest.testStopReconnectWhenClientClosed()— verifies reconnect count stops increasing and timer is cancelled after client closesReconnectTimerTaskTest.testReconnectContinuesWhenNotClosed()— regression test ensuring reconnection works for transient disconnectionsImpact
Only affects
HeaderExchangeClient-based connections (Dubbo protocol). The newerAbstractNettyConnectionClient(Triple protocol) already handles this correctly viaNettyConnectionHandler. No behavioral change for normal reconnection scenarios — the timer still runs for transient disconnections where the client is not closed.Fixes #15880