[Bug] Dubbo 3.3.2: with FORCE_APPLICATION, one of several same-interface, different-group consumer references may stay at nacos-A(0) after startup when the mapping is initially empty #16269

@HanlyL

Pre-check

  • I am sure that all the content I provide is in English.

Search before asking

  • I have searched the issues and found no similar issues.

Apache Dubbo Component

Java SDK (apache/dubbo)

Dubbo Version

Dubbo: 3.3.2
JDK: openjdk version "1.8.0_452"
OS: SUSE Linux Enterprise Server 15 SP7 (x86_64) - Kernel
Registry: Nacos
Protocol: tri
Migration mode: FORCE_APPLICATION

Steps to reproduce this issue

Environment

  • Dubbo version: 3.3.2
  • Registry: Nacos
  • Protocol: tri
  • Migration mode: FORCE_APPLICATION
  • One consumer JVM contains multiple references of the same interface but with different groups
  • enable-empty-protection=true

Consumer configuration:

dubbo.application.name=mng-consumer
dubbo.application.service-discovery.migration=FORCE_APPLICATION
dubbo.application.shutwait=30000
dubbo.application.enable-empty-protection=true

dubbo.reference.check=false
dubbo.reference.filter=-authenticationPrepare,-contextHolderParametersSelectedTransfer
dubbo.consumer.parameters.params-filter=-authenticationResolver,-authenticationExceptionTranslator
dubbo.consumer.parameters.router=-tag
dubbo.consumer.protocol=tri
dubbo.consumer.timeout=180000

dubbo.protocols.tri.name=tri
dubbo.protocols.tri.triple.max-response-body-size=52428800
dubbo.protocols.tri.triple.max-body-size=52428800

Scenario

In one consumer JVM, there are multiple references of the same interface but with different groups, for example:

  • cbsp-limt1/com.szfesc.cbsp.limt.api.LimitReProcServiceApi
  • cbsp-limt2/com.szfesc.cbsp.limt.api.LimitReProcServiceApi

Reproduction pattern

This issue is intermittent. It appears to occur when startup takes the path where the interface-to-application mapping is initially empty.

Observed startup log:

No interface-apps mapping found in local cache, stop subscribing, will automatically wait for mapping listener callback:
... group=cbsp-limt1&interface=com.szfesc.cbsp.limt.api.LimitReProcServiceApi ...

After that, mapping callback and Nacos app subscription logs can still be observed, for example:

[DUBBO] Received mapping notification from meta server, {serviceKey: com.szfesc.cbsp.limt.api.LimitReProcServiceApi, apps: [cbsp-limt1]}
[SUBSCRIBE-SERVICE] service:cbsp-limt1, group:RPC_GROUP, clusters:
new ips(1) service: RPC_GROUP@@cbsp-limt1 -> [...]

However, one concrete consumer reference may still show zero addresses in the QoS output:

As Consumer side:
+---------------------------------------------------------+----------+
|                  Consumer Service Name                  |    NUM   |
+---------------------------------------------------------+----------+
|cbsp-limt1/com.szfesc.cbsp.limt.api.LimitReProcServiceApi|nacos-A(0)|
+---------------------------------------------------------+----------+
|cbsp-limt2/com.szfesc.cbsp.limt.api.LimitReProcServiceApi|nacos-A(2)|
+---------------------------------------------------------+----------+

Restarting the consumer process makes it recover.

Important observation

For the affected reference, I do NOT see the final address notify log:

Notify service cbsp-limt1/com.szfesc.cbsp.limt.api.LimitReProcServiceApi:tri with urls ...

This suggests that the affected concrete reference is never attached to the final application-level notify chain, even though the mapping callback and the Nacos app-level subscription both happen.
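The check described above (a mapping miss that is never followed by a notify log for the same reference) can be automated when triaging larger log files. The following is a minimal sketch, assuming log lines shaped like the excerpts in this report and references that declare a group. `NotifyGapFinder` and `findMissingNotify` are hypothetical names for illustration, not Dubbo APIs.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical log-triage helper; not part of Dubbo. */
final class NotifyGapFinder {
    // Marker and patterns are taken from the log excerpts in this report.
    private static final String MISS_MARKER = "No interface-apps mapping found";
    private static final Pattern GROUP = Pattern.compile("(?:^|[?&\\s])group=([^&\\s]+)");
    private static final Pattern IFACE = Pattern.compile("(?:^|[?&\\s])interface=([^&\\s]+)");
    // Captures "group/interface", i.e. everything before the first ':'.
    private static final Pattern NOTIFY = Pattern.compile("Notify service ([^:\\s]+)");

    /** Returns the group/interface keys that hit the mapping-miss path and never got a notify log. */
    static Set<String> findMissingNotify(List<String> lines) {
        Set<String> waiting = new LinkedHashSet<>();
        boolean expectUrl = false;
        for (String line : lines) {
            if (line.contains(MISS_MARKER)) {
                expectUrl = true;
            }
            if (expectUrl) {
                // The URL may be on the marker line itself or on a following line.
                Matcher g = GROUP.matcher(line);
                Matcher i = IFACE.matcher(line);
                if (g.find() && i.find()) {
                    waiting.add(g.group(1) + "/" + i.group(1));
                    expectUrl = false;
                }
            }
            Matcher n = NOTIFY.matcher(line);
            if (n.find()) {
                waiting.remove(n.group(1));
            }
        }
        return waiting;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "No interface-apps mapping found in local cache, stop subscribing, "
                + "will automatically wait for mapping listener callback:",
            "... group=cbsp-limt1&interface=com.szfesc.cbsp.limt.api.LimitReProcServiceApi ...");
        System.out.println(findMissingNotify(sample));
    }
}
```

Any key left in the returned set corresponds to a reference that waited for the mapping callback and was never notified of addresses, which is the symptom reported here.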

What you expected to happen

I expect that after mapping callback arrives, all concrete consumer references of the same interface in the JVM should eventually complete the application-level subscribe flow and receive address notifications correctly.

In the example above, both references should eventually show a non-zero count in the QoS output, instead of one staying at nacos-A(0) until restart.

Anything else

Why I think this is a Dubbo bug rather than a user configuration problem

  • Providers are visible and healthy in Nacos.
  • Other references in the same JVM are normal.
  • The issue only affects one concrete reference while another reference of the same interface in the same process works.
  • Restarting the consumer fixes it.
  • This behavior looks like a startup race / recovery issue in application-level service discovery.

Relevant logs

Startup entered mapping-miss path:

No interface-apps mapping found in local cache, stop subscribing, will automatically wait for mapping listener callback:
... group=cbsp-limt1&interface=com.szfesc.cbsp.limt.api.LimitReProcServiceApi ...

Mapping callback was received:

[DUBBO] Received mapping notification from meta server, {serviceKey: com.szfesc.cbsp.limt.api.LimitReProcServiceApi, apps: [cbsp-limt1]}

Nacos app-level subscription happened:

[SUBSCRIBE-SERVICE] service:cbsp-limt1, group:RPC_GROUP, clusters:
new ips(1) service: RPC_GROUP@@cbsp-limt1 -> [{"instanceId":"10.111.0.195#20000#null#cbsp-limt1","ip":"10.111.0.195","port":20000,"weight":1.0,"healthy":true,"enabled":true,"ephemeral":true,"clusterName":"DEFAULT","serviceName":"RPC_GROUP@@cbsp-limt1","metadata":{"dubbo.metadata-service.url-params":"{\"prefer.serialization\":\"hessian2,fastjson2\",\"version\":\"2.0.0\",\"dubbo\":\"2.0.2\",\"release\":\"3.3.2\",\"side\":\"provider\",\"port\":\"20000\",\"protocol\":\"tri\"}","dubbo.endpoints":"[{\"port\":20000,\"protocol\":\"tri\"}]","dubbo.metadata.revision":"c88088292faa51b8a2f2b0d9dfa6250b","dubbo.metadata.storage-type":"local","meta-v":"2.0.0","timestamp":"1778636485744"},"instanceHeartBeatInterval":5000,"instanceHeartBeatTimeOut":15000,"ipDeleteTimeout":30000}]

QoS output when the issue happens:

As Consumer side:
+---------------------------------------------------------+----------+
|                  Consumer Service Name                  |    NUM   |
+---------------------------------------------------------+----------+
|    cbsp-bp01/com.szfesc.cbsp.bp.api.BpServiceJsonApi    |nacos-A(4)|
+---------------------------------------------------------+----------+
|    cbsp-bp02/com.szfesc.cbsp.bp.api.BpServiceJsonApi    |nacos-A(4)|
+---------------------------------------------------------+----------+
|   cbsp-bpclt01/com.szfesc.cbsp.bpclt.api.BpJsonCtrlApi  |nacos-A(2)|
+---------------------------------------------------------+----------+
|   cbsp-bpclt02/com.szfesc.cbsp.bpclt.api.BpJsonCtrlApi  |nacos-A(2)|
+---------------------------------------------------------+----------+
|cbsp-limt1/com.szfesc.cbsp.limt.api.LimitReProcServiceApi|nacos-A(0)|
+---------------------------------------------------------+----------+
|cbsp-limt2/com.szfesc.cbsp.limt.api.LimitReProcServiceApi|nacos-A(2)|
+---------------------------------------------------------+----------+
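Because the problem is intermittent, it can help to detect it automatically after each startup. The following is a minimal sketch that scans a captured copy of the table above and lists the consumer references whose count is zero. It assumes the two-column layout and the `registry-A(n)` cell form shown in this report; other migration modes may print a different cell format, which this sketch ignores. `QosConsumerTable` is a hypothetical helper name, not a Dubbo class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for checking captured QoS table text; not part of Dubbo. */
final class QosConsumerTable {
    // Matches cells such as "nacos-A(0)": any label followed by a count in parentheses.
    private static final Pattern NUM_CELL = Pattern.compile("(.+)\\((\\d+)\\)");

    /** Returns the service names of all table rows whose address count is 0. */
    static List<String> referencesWithoutAddresses(String qosOutput) {
        List<String> empty = new ArrayList<>();
        for (String raw : qosOutput.split("\\R")) {
            String line = raw.trim();
            // Data rows start and end with '|'; border rows start with '+'.
            if (!line.startsWith("|") || !line.endsWith("|")) {
                continue;
            }
            String[] cells = line.substring(1, line.length() - 1).split("\\|");
            if (cells.length != 2) {
                continue;
            }
            // The header row ("NUM") does not match the pattern and is skipped.
            Matcher m = NUM_CELL.matcher(cells[1].trim());
            if (m.matches() && Integer.parseInt(m.group(2)) == 0) {
                empty.add(cells[0].trim());
            }
        }
        return empty;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "|cbsp-limt1/com.szfesc.cbsp.limt.api.LimitReProcServiceApi|nacos-A(0)|",
            "|cbsp-limt2/com.szfesc.cbsp.limt.api.LimitReProcServiceApi|nacos-A(2)|");
        System.out.println(referencesWithoutAddresses(sample));
    }
}
```

A non-empty result after startup has settled would indicate the state described in this report.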

Current suspicion

This may be related to the recovery path of application-level service discovery when mapping is initially empty.

Suspicious code areas:

  • ServiceDiscoveryRegistry.doSubscribe()
  • ServiceDiscoveryRegistry.DefaultMappingListener.onEvent()
  • ServiceDiscoveryRegistry.subscribeURLs()
  • ServiceInstancesChangedListener.addListenerAndNotify()
  • ServiceInstancesChangedListener.notifyAddressChanged()
  • ServiceNameMapping.buildMappingKey()
  • NacosMetadataReport.getServiceAppMapping()
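One reason the mapping-key code is on this list: the logs in this report show two different key shapes. The mapping notification is keyed by interface only (`serviceKey: com.szfesc.cbsp.limt.api.LimitReProcServiceApi`), while the QoS table identifies each reference by `group/interface`. A single mapping callback therefore has to reach every reference that shares the interface. The sketch below only illustrates that relationship using the key shapes visible in the logs. It does not reproduce Dubbo's internals, all class and method names are hypothetical, and it does not claim to locate the bug.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: key shapes as they appear in the logs, not Dubbo's implementation. */
final class MappingKeySketch {

    /** Hypothetical stand-in for a consumer reference, holding only the fields visible in the logs. */
    static final class Ref {
        final String group;
        final String iface;

        Ref(String group, String iface) {
            this.group = group;
            this.iface = iface;
        }

        /** Key shape printed in the QoS table: group/interface. */
        String serviceKey() {
            return group + "/" + iface;
        }

        /** Key shape printed in the mapping notification: interface only. */
        String mappingKey() {
            return iface;
        }
    }

    /** Groups references by the interface-only key, showing which ones share one mapping callback. */
    static Map<String, List<String>> groupByMappingKey(List<Ref> refs) {
        Map<String, List<String>> byKey = new LinkedHashMap<>();
        for (Ref r : refs) {
            byKey.computeIfAbsent(r.mappingKey(), k -> new ArrayList<>()).add(r.serviceKey());
        }
        return byKey;
    }

    public static void main(String[] args) {
        String iface = "com.szfesc.cbsp.limt.api.LimitReProcServiceApi";
        List<Ref> refs = Arrays.asList(new Ref("cbsp-limt1", iface), new Ref("cbsp-limt2", iface));
        System.out.println(groupByMappingKey(refs));
    }
}
```

With the two references from this report, both `group/interface` keys fall under one interface-only key, so whatever handles the callback for that key must re-subscribe both references, not just one.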

Workaround

Changing:

dubbo.application.service-discovery.migration=FORCE_APPLICATION

to:

dubbo.application.service-discovery.migration=FORCE_INTERFACE

can avoid this issue, but this is only a workaround.

Frequency

This issue does not reproduce on every startup. It appears under certain startup timing conditions, especially when the mapping is initially empty and is later recovered by the callback.

If needed, I can provide more logs and help test a patch.

Do you have a (mini) reproduction demo?

  • Yes, I have a minimal reproduction demo to help resolve this issue more effectively!

Are you willing to submit a pull request to fix on your own?

  • Yes I am willing to submit a pull request on my own!
