[fix][meta] Run ledger-underreplication notification callbacks off the metadata-store listener thread#26065
Open
merlimat wants to merge 1 commit into
Conversation
…e metadata-store listener thread PulsarLedgerUnderreplicationManager.handleNotification is registered as the metadata-store change listener and runs on the metadata-store notification thread, inside synchronized(this). When a relevant z-node changed, it invoked the registered BookKeeper GenericCallbacks (replicationEnabledCallbacks / lostBookieRecoveryDelayCallbacks) inline. Those callbacks can perform synchronous metadata-store reads (the class documents this near notifyUnderReplicationLedgerChanged), so they blocked the metadata-store notification thread -- and held the manager's monitor -- while doing metadata IO. Run the registered callbacks on a dedicated single-threaded executor instead of the notification thread. The callback list is still snapshotted and cleared synchronously under the monitor, and notifyAll() (the primary wait/notify used by the blocking poll methods) stays synchronous; only the callback invocation is offloaded, so it runs off the notification thread and without holding the monitor. The single thread preserves notification ordering, and close() shuts the executor down. Callbacks re-read current state when they re-register, so this is eventually consistent.
void-ptr974
approved these changes
Jun 19, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
PulsarLedgerUnderreplicationManager.handleNotificationis registered as the metadata-store change listener and runs on the metadata-store notification thread, insidesynchronized(this). When a relevant z-node changed, it invoked the registered BookKeeperGenericCallbacks (replicationEnabledCallbacks/lostBookieRecoveryDelayCallbacks) inline. Those callbacks can perform synchronous metadata-store reads (the class itself documents this nearnotifyUnderReplicationLedgerChanged), so they blocked the metadata-store notification thread — and held the manager's monitor — while doing metadata I/O.Modifications
Run the registered callbacks on a dedicated single-threaded executor instead of the notification thread. The callback list is still snapshotted and cleared synchronously under the monitor, and
notifyAll()(the primary wait/notify used by the blocking poll methods) stays synchronous; only the callback invocation is offloaded, so it runs off the notification thread and without holding the monitor. The single thread preserves notification ordering, andclose()shuts the executor down. Callbacks re-read current state when they re-register, so this is eventually consistent.Verifying this change
Covered by existing
LedgerUnderreplicationManagerTest(87 executions across metadata-store implementations), which awaits callbacks viaAwaitility/latches — passes.Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes