Skip to content

fix(cluster): fix goroutine leak in forking cluster invoker#3436

Open
Aias00 wants to merge 3 commits into
apache:developfrom
Aias00:worktree-fix-forking-goroutine-leak
Open

fix(cluster): fix goroutine leak in forking cluster invoker#3436
Aias00 wants to merge 3 commits into
apache:developfrom
Aias00:worktree-fix-forking-goroutine-leak

Conversation

@Aias00

@Aias00 Aias00 commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Problem

The forking cluster invoker (cluster/cluster/forking/cluster_invoker.go) spawns multiple goroutines to invoke providers concurrently, but only consumes the first result via Poll(1, timeout). The remaining goroutines are never cancelled, causing a goroutine leak:

  1. No context cancellation: the original ctx is passed to all forked goroutines, so even after the first result is returned, the other Invoke calls continue running indefinitely.
  2. Queue never disposed: resultQ is never closed/disposed, so goroutines that complete after the first result still successfully Put into a queue that nobody reads from, and the queue itself is never cleaned up.
  3. Excessive error logging: when the queue is eventually GC'd or when remaining goroutines try to Put after disposal, the error-level log creates noise in normal timeout scenarios.

Fix

  • Use context.WithCancel to create a cancellable forkCtx; defer cancel() so all forked goroutines are signaled to stop when Invoke returns.
  • Pass forkCtx instead of ctx to forked goroutines so their Invoke calls respect the cancellation.
  • Call resultQ.Dispose() immediately after Poll to release queue resources and make remaining goroutines' Put calls return ErrDisposed promptly.
  • Downgrade Put error log from Errorf to Debugf since ErrDisposed is an expected condition after the queue is disposed.

Testing

All existing tests pass with no regression:

ok  dubbo.apache.org/dubbo-go/v3/cluster/cluster/forking
ok  dubbo.apache.org/dubbo-go/v3/cluster/...  (all sub-packages)

Co-Authored-By: Claude noreply@anthropic.com

The forking cluster invoker spawns multiple goroutines to invoke
providers concurrently, but only consumes the first result via
Poll(1, timeout). The remaining goroutines are never cancelled:

1. No context cancellation: the original ctx is passed to all forked
   goroutines, so even after the first result is returned, the other
   Invoke calls continue running indefinitely.

2. Queue never disposed: resultQ is never closed/disposed, so goroutines
   that complete after the first result still successfully Put into a
   queue that nobody reads from, and the queue itself is never cleaned
   up.

3. Excessive error logging: when the queue is eventually GC'd or when
   remaining goroutines try to Put after disposal, the error-level log
   creates noise in normal timeout scenarios.

Fix:
- Use context.WithCancel to create a cancellable forkCtx; defer cancel()
  so all forked goroutines are signaled to stop when Invoke returns.
- Pass forkCtx instead of ctx to forked goroutines so their Invoke calls
  respect the cancellation.
- Call resultQ.Dispose() immediately after Poll to release queue resources
  and make remaining goroutines' Put calls return ErrDisposed promptly.
- Downgrade Put error log from Errorf to Debugf since ErrDisposed is an
  expected condition after the queue is disposed.

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 17, 2026 06:35

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds cancellation and queue disposal to the forking cluster invoker to reduce noisy logs and help forked goroutines exit sooner after an invocation completes.

Changes:

  • Introduces a derived cancellable context (forkCtx) for forked invocations.
  • Disposes the result queue after polling to force remaining Put calls to fail fast.
  • Downgrades Put failure logging to debug with a rationale comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +75 to +76
forkCtx, cancel := context.WithCancel(ctx)
defer cancel() // cancel forkCtx when Invoke returns, signaling all forked goroutines to stop
Comment on lines +82 to +84
// ErrDisposed is expected when another goroutine's result was already
// consumed and the queue disposed; log at debug level to avoid noise.
logger.Debugf("[Cluster][Forking] resultQ put failed, the queue is probably disposed: %v", err)
@@ -72,16 +72,25 @@ func (invoker *forkingClusterInvoker) Invoke(ctx context.Context, invocation pro
}

resultQ := queue.New(1)
@codecov-commenter

codecov-commenter commented Jun 17, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 53.58%. Comparing base (60d1c2a) to head (2fa4349).
⚠️ Report is 861 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3436      +/-   ##
===========================================
+ Coverage    46.76%   53.58%   +6.81%     
===========================================
  Files          295      493     +198     
  Lines        17172    38393   +21221     
===========================================
+ Hits          8031    20571   +12540     
- Misses        8287    16171    +7884     
- Partials       854     1651     +797     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@sonarqubecloud

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants