fix(remoting): stop ReconnectTimerTask when client is closed #16176

Open
daguimu wants to merge 3 commits into apache:3.3 from daguimu:fix/reconnect-timer-not-stop-after-client-closed-15880

daguimu commented Mar 27, 2026

Problem

After all providers go offline, the consumer's ReconnectTimerTask keeps attempting to reconnect to the old provider IP indefinitely. In K8s environments where providers restart with new IPs, this generates continuous connection-refused errors and triggers false alerts.

Root Cause

Three issues allow the reconnect timer to run indefinitely after a client should be considered closed:

  1. AbstractTimerTask.run() always reschedules the timer via reput(), even when all channels are already closed. The task becomes an orphaned timer doing nothing but consuming resources.

  2. HeaderExchangeClient.isClosed() checks only HeaderExchangeChannel.closed and never consults the underlying Client.isClosed(). When the underlying AbstractClient/NettyClient is closed independently (e.g., by protocol destroy), the HeaderExchangeClient still reports itself as open.

  3. ReconnectTimerTask.doTask() has no defensive check for client closure before attempting reconnection. If the client is closed between the run() check and the doTask() execution, reconnection still proceeds.
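
To make the first root cause concrete, here is a minimal, deterministic sketch of a task that puts itself back on the timer unconditionally. It is not Dubbo code: `AlwaysRescheduleSketch`, `FakeChannel`, and the queue-based fake timer are hypothetical stand-ins, and the real AbstractTimerTask runs on a hashed wheel timer rather than a queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of the pre-fix behavior; these are not Dubbo's real classes.
final class AlwaysRescheduleSketch {

    /** Minimal stand-in for a channel; only the closed flag matters here. */
    static final class FakeChannel {
        boolean closed;
    }

    /**
     * Fires the task `ticks` times, closing the channel just before tick `closeAtTick`.
     * Returns how many firings happened while the channel was already closed.
     */
    static int firingsAfterClose(int ticks, int closeAtTick) {
        FakeChannel channel = new FakeChannel();
        Deque<Runnable> queue = new ArrayDeque<>();   // fake timer: one pending task at a time
        int[] idleFirings = {0};

        Runnable task = new Runnable() {
            @Override
            public void run() {
                if (channel.closed) {
                    idleFirings[0]++;   // nothing useful left to do for this channel
                }
                queue.add(this);        // rescheduled unconditionally: this is the problem
            }
        };

        queue.add(task);
        for (int tick = 0; tick < ticks && !queue.isEmpty(); tick++) {
            if (tick == closeAtTick) {
                channel.closed = true;
            }
            queue.poll().run();
        }
        return idleFirings[0];
    }

    public static void main(String[] args) {
        // Channel closes before tick 3, yet ticks 3..9 still fire.
        System.out.println(firingsAfterClose(10, 3)); // prints 7
    }
}
```

In this model the queue never drains, which mirrors the "orphaned timer" described in item 1: the task keeps firing for as long as the timer itself lives.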

Fix

  • AbstractTimerTask.run(): When all channels report as closed, cancel the timer task instead of rescheduling. This prevents orphaned timers from running indefinitely.

  • HeaderExchangeClient.isClosed(): Also check client.isClosed() so that if the underlying client is closed by any code path, the wrapper correctly reflects the closed state.

  • ReconnectTimerTask.doTask(): Add a defensive check at the start — if the channel is a Client and reports as closed, cancel the timer and return immediately.
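
The first two bullets can be sketched together as follows. This is a simplified illustration under stated assumptions, not the actual patch: `Closable`, `Transport`, `Wrapper`, and `Task` are hypothetical stand-ins for Client, the underlying AbstractClient, HeaderExchangeClient, and AbstractTimerTask respectively, and `runOnce()` returns a boolean instead of calling a real timer.

```java
import java.util.Collection;
import java.util.Collections;

// Hypothetical stand-ins for illustration; these are not Dubbo's real classes.
final class CancelWhenClosedSketch {

    interface Closable {
        boolean isClosed();
    }

    /** Plays the role of the underlying transport client. */
    static final class Transport implements Closable {
        volatile boolean closed;

        @Override
        public boolean isClosed() {
            return closed;
        }
    }

    /** Plays the role of the exchange-level wrapper around the transport. */
    static final class Wrapper implements Closable {
        private final Closable underlying;
        volatile boolean closed;

        Wrapper(Closable underlying) {
            this.underlying = underlying;
        }

        @Override
        public boolean isClosed() {
            // Closed if either the wrapper itself or the wrapped client is closed.
            return closed || underlying.isClosed();
        }
    }

    /** Plays the role of the periodic timer task. */
    static final class Task {
        private final Collection<? extends Closable> channels;
        private boolean cancelled;

        Task(Collection<? extends Closable> channels) {
            this.channels = channels;
        }

        boolean isCancelled() {
            return cancelled;
        }

        /** Returns true if the task should be rescheduled, false if it cancelled itself. */
        boolean runOnce() {
            if (cancelled) {
                return false;
            }
            boolean anyOpen = false;
            for (Closable channel : channels) {
                if (!channel.isClosed()) {
                    anyOpen = true;
                    break;
                }
            }
            if (!anyOpen) {
                // Note: an empty collection also lands here; the PR's follow-up
                // commit adds a guard for that case.
                cancelled = true;
                return false;
            }
            // Reconnect or heartbeat work would happen here.
            return true;
        }
    }

    public static void main(String[] args) {
        Transport transport = new Transport();
        Task task = new Task(Collections.singletonList(new Wrapper(transport)));
        System.out.println(task.runOnce()); // prints true
        transport.closed = true;            // closed "behind the wrapper's back"
        System.out.println(task.runOnce()); // prints false
    }
}
```

The point of the sketch is the interaction between the two changes: the task can only cancel itself if the wrapper reports the closure, so the `isClosed()` change is what makes the auto-cancel reachable when only the underlying client has been closed.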

Tests Added

| Change Point | Test |
|---|---|
| AbstractTimerTask auto-cancel on all channels closed | AbstractTimerTaskTest.testAutoCancelWhenAllChannelsClosed(): verifies task stops executing and cancel flag is set after channel closes |
| AbstractTimerTask continues when channel is open | AbstractTimerTaskTest.testTaskContinuesWhenChannelIsOpen(): regression test ensuring normal scheduling works |
| HeaderExchangeClient.isClosed() checks underlying client | HeaderExchangeClientTest.testIsClosedWhenUnderlyingClientClosed(): verifies isClosed() returns true when underlying client is closed |
| HeaderExchangeClient.isClosed() normal case | HeaderExchangeClientTest.testIsNotClosedWhenBothOpen(): regression test for normal open state |
| ReconnectTimerTask stops on client close | ReconnectTimerTaskTest.testStopReconnectWhenClientClosed(): verifies reconnect count stops increasing and timer is cancelled after client closes |
| ReconnectTimerTask continues when not closed | ReconnectTimerTaskTest.testReconnectContinuesWhenNotClosed(): regression test ensuring reconnection works for transient disconnections |

Impact

Only affects HeaderExchangeClient-based connections (Dubbo protocol). The newer AbstractNettyConnectionClient (Triple protocol) already handles this correctly via NettyConnectionHandler. No behavioral change for normal reconnection scenarios — the timer still runs for transient disconnections where the client is not closed.

Fixes #15880

daguimu added 3 commits March 27, 2026 11:27
- Guard against empty channel collection in AbstractTimerTask.run()
  to prevent spurious auto-cancel on server-side timer tasks
- Remove redundant dead-code check in ReconnectTimerTask.doTask()
- Add javadoc explaining HeaderExchangeClient.isClosed() semantics
- Add test for empty channel collection edge case
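
The empty-collection guard in the first item addresses a general pitfall: an "all channels are closed" check is vacuously true when there are no channels at all. The sketch below illustrates this with `Stream.allMatch`; whether the actual patch uses streams or a loop is not stated here, and `EmptyChannelGuardSketch`, `Closable`, and both method names are hypothetical.

```java
import java.util.Collection;
import java.util.Collections;

// Hypothetical illustration of the empty-collection pitfall; not Dubbo's real code.
final class EmptyChannelGuardSketch {

    interface Closable {
        boolean isClosed();
    }

    /** Naive check: allMatch on an empty stream returns true, so "no channels" looks like "all closed". */
    static boolean allClosedNaive(Collection<? extends Closable> channels) {
        return channels.stream().allMatch(Closable::isClosed);
    }

    /** Guarded check: only auto-cancel when there is at least one channel and every one is closed. */
    static boolean shouldAutoCancel(Collection<? extends Closable> channels) {
        return !channels.isEmpty() && channels.stream().allMatch(Closable::isClosed);
    }

    public static void main(String[] args) {
        Collection<Closable> none = Collections.emptyList();
        System.out.println(allClosedNaive(none));   // prints true
        System.out.println(shouldAutoCancel(none)); // prints false
    }
}
```

A plausible reading of "server-side timer tasks" is that a server with no connected clients exposes an empty channel collection, so without the guard its task would cancel itself before any client ever connects.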
codecov-commenter commented Mar 27, 2026

Codecov Report

❌ Patch coverage is 83.33333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 60.77%. Comparing base (b54059d) to head (94d98b6).
⚠️ Report is 7 commits behind head on 3.3.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../exchange/support/header/HeaderExchangeClient.java | 0.00% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##                3.3   #16176      +/-   ##
============================================
- Coverage     60.80%   60.77%   -0.04%     
+ Complexity    11767    11752      -15     
============================================
  Files          1953     1953              
  Lines         89118    89123       +5     
  Branches      13444    13446       +2     
============================================
- Hits          54190    54165      -25     
- Misses        29367    29383      +16     
- Partials       5561     5575      +14     

| Flag | Coverage Δ |
|---|---|
| integration-tests-java21 | 32.11% <16.66%> (-0.07%) ⬇️ |
| integration-tests-java8 | 32.22% <16.66%> (-0.03%) ⬇️ |
| samples-tests-java21 | 32.13% <0.00%> (-0.04%) ⬇️ |
| samples-tests-java8 | 29.72% <0.00%> (-0.01%) ⬇️ |
| unit-tests-java11 | 59.05% <83.33%> (+0.01%) ⬆️ |
| unit-tests-java17 | 58.51% <83.33%> (+0.01%) ⬆️ |
| unit-tests-java21 | 58.49% <83.33%> (-0.04%) ⬇️ |
| unit-tests-java25 | 58.46% <83.33%> (-0.05%) ⬇️ |
| unit-tests-java8 | 59.01% <83.33%> (-0.06%) ⬇️ |

Flags with carried forward coverage won't be shown.

Development

Successfully merging this pull request may close these issues.

[Bug] The consumer keep executing the reconnection, after all providers are offline

2 participants