[fix][client] Run the failover health probe off the Netty event-loop thread #26064
Open
merlimat wants to merge 2 commits into
…thread

SameAuthParamsLookupAutoClusterFailover periodically probes broker health with a blocking getLookup(url).getBroker(...).get(3, SECONDS). The periodic task was scheduled on a Netty EventLoopGroup (EventLoopUtil.newEventLoopGroup(1, ...)), so the blocking probe ran on an event-loop thread.

Use a plain single-thread ScheduledExecutorService (Executors.newSingleThreadScheduledExecutor) for the periodic health check, matching the sibling AutoClusterFailover, so the blocking probe no longer occupies a Netty event-loop thread. scheduleAtFixedRate and shutdownNow are unchanged; the executor is dedicated solely to this check.
… fix its broker test

Follow-up to the executor change in this PR, fixing the CI failure in org.apache.pulsar.broker.SameAuthParamsLookupAutoClusterFailoverTest.

1. The broker integration test reads the private 'executor' field via reflection and typed it as io.netty.channel.EventLoopGroup. The field is now a ScheduledExecutorService, so the reflective cast threw ClassCastException. Update the three type references in the test to ScheduledExecutorService (the test only uses execute/submit).

2. The test schedules the health check every 100ms while one service (a dead dummy proxy) blocks its probe for ~3s. On a plain single-threaded ScheduledExecutorService, scheduleAtFixedRate runs such slow checks back-to-back (catch-up) and monopolizes the thread, starving the task the test submits to the executor. A Netty EventLoopGroup interleaves immediate tasks, which is why the test passed before. Use scheduleWithFixedDelay so that a gap is left after each check. This is also better for a blocking health probe, which fixed-rate scheduling would otherwise issue continuously while a service is down.
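The starvation described in point 2 can be reproduced without Pulsar or Netty. The following is a minimal sketch, not the PR's test: the class name, the method names, and the timings (a 50 ms sleep standing in for the ~3 s blocking probe, a 10 ms period standing in for 100 ms) are all hypothetical. It counts how many probes complete between submitting an extra task to the single-thread executor and that task actually running.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class FixedRateVsFixedDelayDemo {

    // Hypothetical stand-in for the blocking health probe: it only sleeps.
    private static void blockingProbe(AtomicInteger completedProbes) {
        try {
            Thread.sleep(50); // much longer than the 10 ms scheduling period
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // shutdownNow() interrupts the probe
            return;
        }
        completedProbes.incrementAndGet();
    }

    // Returns how many probes completed between submitting an extra task
    // and the moment that task got to run on the single executor thread.
    static int probesCompletedBeforeSubmittedTaskRan(boolean fixedRate) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger completedProbes = new AtomicInteger();
        try {
            Runnable check = () -> blockingProbe(completedProbes);
            if (fixedRate) {
                // Overdue runs are re-queued with a trigger time in the past (catch-up).
                executor.scheduleAtFixedRate(check, 0, 10, TimeUnit.MILLISECONDS);
            } else {
                // The next run is scheduled 10 ms after the previous one finishes.
                executor.scheduleWithFixedDelay(check, 0, 10, TimeUnit.MILLISECONDS);
            }
            Thread.sleep(100); // let the periodic check fall behind its schedule
            int before = completedProbes.get();
            Callable<Integer> snapshot = completedProbes::get;
            Future<Integer> ranAt = executor.submit(snapshot);
            return ranAt.get(10, TimeUnit.SECONDS) - before;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("fixed rate:  " + probesCompletedBeforeSubmittedTaskRan(true));
        System.out.println("fixed delay: " + probesCompletedBeforeSubmittedTaskRan(false));
    }
}
```

With fixed-rate scheduling the submitted task has to wait until the periodic task has caught up with its schedule, so several probes complete first. With fixed-delay scheduling at most the probe already in progress completes before the submitted task runs.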
void-ptr974
approved these changes
Jun 19, 2026
Motivation
SameAuthParamsLookupAutoClusterFailover periodically probes broker health with a blocking getLookup(url).getBroker(...).get(3, SECONDS). The periodic task was scheduled on a Netty EventLoopGroup (EventLoopUtil.newEventLoopGroup(1, ...)), so the blocking probe ran on a Netty event-loop thread.

Modifications
Use a plain single-thread ScheduledExecutorService (Executors.newSingleThreadScheduledExecutor) for the periodic health check, matching the sibling AutoClusterFailover, so the blocking probe no longer occupies a Netty event-loop thread. scheduleAtFixedRate and shutdownNow are unchanged, and the executor is dedicated solely to this check.

Verifying this change
Covered by the existing SameAuthParamsLookupAutoClusterFailoverTest (4 tests), which passes.
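For orientation, the shape of the change can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the Pulsar source: the class HealthProbeSketch, its fields, the thread name, and the Supplier standing in for getLookup(url).getBroker(...) are all hypothetical. It uses scheduleWithFixedDelay, as the follow-up commit does, and shutdownNow on close.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class HealthProbeSketch implements AutoCloseable {
    // A dedicated plain executor instead of a Netty EventLoopGroup.
    private final ScheduledExecutorService executor;
    // Hypothetical stand-in for getLookup(url).getBroker(...).
    private final Supplier<CompletableFuture<String>> lookup;
    private final long timeoutMillis;
    private final CountDownLatch firstProbeDone = new CountDownLatch(1);
    private volatile boolean healthy = true;
    private volatile String lastProbeThread;

    HealthProbeSketch(Supplier<CompletableFuture<String>> lookup, long timeoutMillis) {
        this.lookup = lookup;
        this.timeoutMillis = timeoutMillis;
        this.executor = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "failover-health-probe"); // hypothetical thread name
            t.setDaemon(true);
            return t;
        });
    }

    void start(long delayMillis) {
        executor.scheduleWithFixedDelay(this::probeOnce, delayMillis, delayMillis,
                TimeUnit.MILLISECONDS);
    }

    // Blocking probe: acceptable here because this thread serves no I/O.
    private void probeOnce() {
        lastProbeThread = Thread.currentThread().getName();
        try {
            lookup.get().get(timeoutMillis, TimeUnit.MILLISECONDS);
            healthy = true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (Exception e) {
            healthy = false; // timeout or failed lookup
        } finally {
            firstProbeDone.countDown();
        }
    }

    boolean awaitFirstProbe(long millis) {
        try {
            return firstProbeDone.await(millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    boolean isHealthy() { return healthy; }

    String lastProbeThread() { return lastProbeThread; }

    @Override
    public void close() { executor.shutdownNow(); }
}
```

Because the executor runs only this check, a probe that blocks for its full timeout delays nothing except the next probe.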