Pre-check
Search before asking
Apache Dubbo Component
Java SDK (apache/dubbo)
Dubbo Version
Dubbo: 3.3.2
JDK: openjdk version "1.8.0_452"
OS: SUSE Linux Enterprise Server 15 SP7 (x86_64) - Kernel
Registry: Nacos
Protocol: tri
Migration mode: FORCE_APPLICATION
Steps to reproduce this issue
Environment
- Dubbo version: 3.3.2
- Registry: Nacos
- Protocol: tri
- Migration mode: FORCE_APPLICATION
- One consumer JVM contains multiple references of the same interface but with different groups
- enable-empty-protection=true
Consumer configuration:
dubbo.application.name=mng-consumer
dubbo.application.service-discovery.migration=FORCE_APPLICATION
dubbo.application.shutwait=30000
dubbo.application.enable-empty-protection=true
dubbo.reference.check=false
dubbo.reference.filter=-authenticationPrepare,-contextHolderParametersSelectedTransfer
dubbo.consumer.parameters.params-filter=-authenticationResolver,-authenticationExceptionTranslator
dubbo.consumer.parameters.router=-tag
dubbo.consumer.protocol=tri
dubbo.consumer.timeout=180000
dubbo.protocols.tri.name=tri
dubbo.protocols.tri.triple.max-response-body-size=52428800
dubbo.protocols.tri.triple.max-body-size=52428800
Scenario
In one consumer JVM, there are multiple references of the same interface but with different groups, for example:
cbsp-limt1/com.szfesc.cbsp.limt.api.LimitReProcServiceApi
cbsp-limt2/com.szfesc.cbsp.limt.api.LimitReProcServiceApi
Reproduction pattern
This issue does not happen every time. It seems to happen when startup enters the path where interface-app mapping is initially empty.
Observed startup log:
No interface-apps mapping found in local cache, stop subscribing, will automatically wait for mapping listener callback:
... group=cbsp-limt1&interface=com.szfesc.cbsp.limt.api.LimitReProcServiceApi ...
After that, mapping callback and Nacos app subscription logs can still be observed, for example:
[DUBBO] Received mapping notification from meta server, {serviceKey: com.szfesc.cbsp.limt.api.LimitReProcServiceApi, apps: [cbsp-limt1]}
[SUBSCRIBE-SERVICE] service:cbsp-limt1, group:RPC_GROUP, clusters:
new ips(1) service: RPC_GROUP@@cbsp-limt1 -> [...]
However, one concrete consumer reference can still show zero provider addresses in the QoS output:
As Consumer side:
+---------------------------------------------------------+----------+
| Consumer Service Name | NUM |
+---------------------------------------------------------+----------+
|cbsp-limt1/com.szfesc.cbsp.limt.api.LimitReProcServiceApi|nacos-A(0)|
+---------------------------------------------------------+----------+
|cbsp-limt2/com.szfesc.cbsp.limt.api.LimitReProcServiceApi|nacos-A(2)|
+---------------------------------------------------------+----------+
Restarting the consumer process makes it recover.
Important observation
For the affected reference, I do NOT see the final address notify log:
Notify service cbsp-limt1/com.szfesc.cbsp.limt.api.LimitReProcServiceApi:tri with urls ...
This suggests that the affected reference may never be attached to the final application-level notify chain, even though the mapping callback and the Nacos app-level subscription both happen.
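To check this observation across a larger log file, a small scanner can correlate the two log patterns quoted above. This is only a diagnostic sketch, not part of Dubbo: the class and method names are hypothetical, and it assumes that the group= and interface= parameters appear on the same line as the mapping-miss message or on a following line, and that the notify log keeps the "Notify service <key> with urls" shape shown in this report.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical log scanner; the patterns are taken from the log lines in this report. */
final class MissingNotifyScan {
    private static final String MAPPING_MISS = "No interface-apps mapping found in local cache";
    private static final Pattern GROUP = Pattern.compile("\\bgroup=([^&\\s]+)");
    private static final Pattern INTERFACE = Pattern.compile("\\binterface=([^&\\s]+)");
    private static final Pattern NOTIFY = Pattern.compile("Notify service (\\S+) with urls");

    /** Returns group/interface keys that hit the mapping-miss path but never logged a notify. */
    static List<String> referencesWithoutNotify(List<String> logLines) {
        Set<String> missed = new LinkedHashSet<>();
        List<String> notified = new ArrayList<>();
        boolean pending = false; // true while waiting for the URL that follows a mapping-miss message
        for (String line : logLines) {
            if (line.contains(MAPPING_MISS)) {
                pending = true;
            }
            if (pending) {
                Matcher g = GROUP.matcher(line);
                Matcher i = INTERFACE.matcher(line);
                if (g.find() && i.find()) {
                    missed.add(g.group(1) + "/" + i.group(1));
                    pending = false;
                }
            }
            Matcher n = NOTIFY.matcher(line);
            if (n.find()) {
                notified.add(n.group(1)); // e.g. group/interface:tri
            }
        }
        List<String> result = new ArrayList<>();
        for (String ref : missed) {
            boolean seen = false;
            for (String key : notified) {
                if (key.equals(ref) || key.startsWith(ref + ":")) {
                    seen = true;
                    break;
                }
            }
            if (!seen) {
                result.add(ref);
            }
        }
        return result;
    }
}
```

References without a group parameter are ignored by this sketch, which is sufficient for the grouped references described here.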
What you expected to happen
I expect that after the mapping callback arrives, all concrete consumer references of the same interface in the JVM eventually complete the application-level subscribe flow and receive address notifications correctly.
In the example above, both references should eventually become non-zero in qos output, instead of one staying at nacos-A(0) permanently until restart.
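The expectation above can be checked mechanically against the QoS table. The following is a minimal sketch, assuming the table rows keep the "|service name|registry(count)|" layout shown in this report; the class and method names are hypothetical and are not part of Dubbo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that reads the consumer table printed by QoS. */
final class QosConsumerCheck {
    // Matches a data row such as: |cbsp-limt1/com.example.Api|nacos-A(0)|
    // Header rows and +---+ border rows do not match and are skipped.
    private static final Pattern ROW =
            Pattern.compile("^\\|\\s*([^|\\s]+)\\s*\\|\\s*([^|(\\s]+)\\((\\d+)\\)\\s*\\|$");

    /** Returns the consumer service names whose address count is zero. */
    static List<String> zeroAddressReferences(String qosOutput) {
        List<String> result = new ArrayList<>();
        for (String line : qosOutput.split("\\R")) {
            Matcher m = ROW.matcher(line.trim());
            if (m.matches() && Integer.parseInt(m.group(3)) == 0) {
                result.add(m.group(1));
            }
        }
        return result;
    }
}
```

For the output shown in this report, the helper would return only the cbsp-limt1 reference. An empty result some time after startup is the behavior I expect.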
Anything else
Why I think this is a Dubbo bug instead of a user configuration problem
- Providers are visible and healthy in Nacos.
- Other references in the same JVM are normal.
- The issue only affects one concrete reference while another reference of the same interface in the same process works.
- Restarting the consumer fixes it.
- This behavior looks like a startup race / recovery issue in application-level service discovery.
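To make the suspected race concrete, here is a toy model of a late-attached listener. It is purely illustrative and is NOT Dubbo's code: every class and method name below is hypothetical, and I am not claiming that Dubbo attaches listeners this way. It only shows how a reference that attaches after the instance notification can stay at zero addresses unless the last known state is replayed to it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model of a late listener missing a notification; not Dubbo's implementation. */
final class LateListenerModel {

    /** A consumer reference that remembers the last address list it was notified with. */
    static final class Reference {
        final String key;
        List<String> addresses = Collections.emptyList();

        Reference(String key) {
            this.key = key;
        }
    }

    /** Hypothetical per-application notifier. */
    static final class AppNotifier {
        private final List<Reference> listeners = new ArrayList<>();
        private List<String> lastKnown = Collections.emptyList();

        /** Called when the registry pushes instances; only reaches listeners attached so far. */
        void onInstancesChanged(List<String> addresses) {
            lastKnown = addresses;
            for (Reference r : listeners) {
                r.addresses = addresses;
            }
        }

        /** Attach without replay: a late listener waits for the next push, which may never come. */
        void addListener(Reference r) {
            listeners.add(r);
        }

        /** Attach and immediately replay the last known addresses. */
        void addListenerAndReplay(Reference r) {
            listeners.add(r);
            r.addresses = lastKnown;
        }
    }
}
```

In this model, a reference added with addListener after onInstancesChanged has fired keeps an empty address list, which would look like nacos-A(0) until the process restarts. A reference added with addListenerAndReplay does not.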
Relevant logs
Startup entered mapping-miss path:
No interface-apps mapping found in local cache, stop subscribing, will automatically wait for mapping listener callback:
... group=cbsp-limt1&interface=com.szfesc.cbsp.limt.api.LimitReProcServiceApi ...
Mapping callback was received:
[DUBBO] Received mapping notification from meta server, {serviceKey: com.szfesc.cbsp.limt.api.LimitReProcServiceApi, apps: [cbsp-limt1]}
Nacos app-level subscription happened:
[SUBSCRIBE-SERVICE] service:cbsp-limt1, group:RPC_GROUP, clusters:
new ips(1) service: RPC_GROUP@@cbsp-limt1 -> [{"instanceId":"10.111.0.195#20000#null#cbsp-limt1","ip":"10.111.0.195","port":20000,"weight":1.0,"healthy":true,"enabled":true,"ephemeral":true,"clusterName":"DEFAULT","serviceName":"RPC_GROUP@@cbsp-limt1","metadata":{"dubbo.metadata-service.url-params":"{\"prefer.serialization\":\"hessian2,fastjson2\",\"version\":\"2.0.0\",\"dubbo\":\"2.0.2\",\"release\":\"3.3.2\",\"side\":\"provider\",\"port\":\"20000\",\"protocol\":\"tri\"}","dubbo.endpoints":"[{\"port\":20000,\"protocol\":\"tri\"}]","dubbo.metadata.revision":"c88088292faa51b8a2f2b0d9dfa6250b","dubbo.metadata.storage-type":"local","meta-v":"2.0.0","timestamp":"1778636485744"},"instanceHeartBeatInterval":5000,"instanceHeartBeatTimeOut":15000,"ipDeleteTimeout":30000}]
QOS output when issue happens:
As Consumer side:
+---------------------------------------------------------+----------+
| Consumer Service Name | NUM |
+---------------------------------------------------------+----------+
| cbsp-bp01/com.szfesc.cbsp.bp.api.BpServiceJsonApi |nacos-A(4)|
+---------------------------------------------------------+----------+
| cbsp-bp02/com.szfesc.cbsp.bp.api.BpServiceJsonApi |nacos-A(4)|
+---------------------------------------------------------+----------+
| cbsp-bpclt01/com.szfesc.cbsp.bpclt.api.BpJsonCtrlApi |nacos-A(2)|
+---------------------------------------------------------+----------+
| cbsp-bpclt02/com.szfesc.cbsp.bpclt.api.BpJsonCtrlApi |nacos-A(2)|
+---------------------------------------------------------+----------+
|cbsp-limt1/com.szfesc.cbsp.limt.api.LimitReProcServiceApi|nacos-A(0)|
+---------------------------------------------------------+----------+
|cbsp-limt2/com.szfesc.cbsp.limt.api.LimitReProcServiceApi|nacos-A(2)|
+---------------------------------------------------------+----------+
Current suspicion
This may be related to the recovery path of application-level service discovery when mapping is initially empty.
Suspicious code areas:
ServiceDiscoveryRegistry.doSubscribe()
ServiceDiscoveryRegistry.DefaultMappingListener.onEvent()
ServiceDiscoveryRegistry.subscribeURLs()
ServiceInstancesChangedListener.addListenerAndNotify()
ServiceInstancesChangedListener.notifyAddressChanged()
ServiceNameMapping.buildMappingKey()
NacosMetadataReport.getServiceAppMapping()
Workaround
Changing:
dubbo.application.service-discovery.migration=FORCE_APPLICATION
to:
dubbo.application.service-discovery.migration=FORCE_INTERFACE
can avoid this issue, but this is only a workaround.
Frequency
This issue is not 100% reproducible on every startup. It appears under certain startup timing conditions, especially when mapping is initially empty and later recovered by callback.
If needed, I can provide more logs and help test a patch.
Do you have a (mini) reproduction demo?
Are you willing to submit a pull request to fix on your own?
Code of Conduct