Before Creating the Bug Report
Runtime platform environment
All platforms
RocketMQ version
develop branch (latest)
JDK Version
JDK 8+
Describe the Bug
In MQClientInstance.java, the brokerVersionTable (a ConcurrentHashMap) is accessed using a non-atomic check-then-act pattern in multiple locations:
Location 1 - sendHeartbeatToBroker() (line ~657):
if (!this.brokerVersionTable.containsKey(brokerName)) {
this.brokerVersionTable.put(brokerName, new ConcurrentHashMap<>(4));
}
this.brokerVersionTable.get(brokerName).put(addr, version);
Location 2 - sendHeartbeatToAllBrokerV2() (line ~737):
Same pattern as Location 1.
Location 3 - findBrokerVersion() (line ~1304):
if (this.brokerVersionTable.containsKey(brokerName)) {
if (this.brokerVersionTable.get(brokerName).containsKey(brokerAddr)) {
return this.brokerVersionTable.get(brokerName).get(brokerAddr);
}
}
Between containsKey() and get(), another thread could remove the entry, causing a NullPointerException when the result of get() is dereferenced.
Notably, a similar operation in the same class at line ~814 already uses the correct pattern:
ConcurrentHashMap<String, Integer> inner = MQClientInstance.this.brokerVersionTable
.computeIfAbsent(brokerName, k -> new ConcurrentHashMap<>(4));
inner.put(brokerAddr, version);
Steps to Reproduce
The race condition occurs under concurrent heartbeat sending and broker disconnection. When a broker is removed from the version table concurrently with a heartbeat send or version lookup, a NullPointerException can occur.
What Did You Expect to See?
Thread-safe access to brokerVersionTable using atomic operations like computeIfAbsent and local variable caching.
What Did You See Instead?
Non-atomic check-then-act pattern that can result in NullPointerException under concurrent access.
Additional Context
Related to previously reported #8743 (closed by stale bot without fix).
Before Creating the Bug Report
I found a bug, not just asking a question, which should be created in GitHub Discussions.
I have searched the GitHub Issues and GitHub Discussions of this repository and believe that this is not a duplicate.
I have confirmed that this bug belongs to the current repository, not other repositories of RocketMQ.
Runtime platform environment
All platforms
RocketMQ version
develop branch (latest)
JDK Version
JDK 8+
Describe the Bug
In
MQClientInstance.java, thebrokerVersionTable(aConcurrentHashMap) is accessed using a non-atomic check-then-act pattern in multiple locations:Location 1 -
sendHeartbeatToBroker()(line ~657):Location 2 -
sendHeartbeatToAllBrokerV2()(line ~737):Same pattern as Location 1.
Location 3 -
findBrokerVersion()(line ~1304):Between
containsKey()andget(), another thread could remove the entry, causing a NullPointerException when the result ofget()is dereferenced.Notably, a similar operation in the same class at line ~814 already uses the correct pattern:
Steps to Reproduce
The race condition occurs under concurrent heartbeat sending and broker disconnection. When a broker is removed from the version table concurrently with a heartbeat send or version lookup, a NullPointerException can occur.
What Did You Expect to See?
Thread-safe access to
brokerVersionTableusing atomic operations likecomputeIfAbsentand local variable caching.What Did You See Instead?
Non-atomic check-then-act pattern that can result in NullPointerException under concurrent access.
Additional Context
Related to previously reported #8743 (closed by stale bot without fix).