Skip to content

fix(#3453): resolve concurrency, leak, and panic issues from code audit#3461

Open
HarshMehta112 wants to merge 2 commits into
apache:developfrom
HarshMehta112:main
Open

fix(#3453): resolve concurrency, leak, and panic issues from code audit#3461
HarshMehta112 wants to merge 2 commits into
apache:developfrom
HarshMehta112:main

Conversation

@HarshMehta112

Copy link
Copy Markdown
Contributor

Fixes #3453

Addresses six source-grounded findings around concurrency correctness, memory/goroutine
leaks, and panic-based failure handling. Each fix is narrowly scoped with focused tests.

1. RoundRobin selection now atomic under concurrency

cluster/loadbalance/roundrobin/loadbalance.go
The weighted-round-robin select+update sequence was only per-field atomic, so concurrent
callers could observe each other's intermediate currentWeight and skew distribution.
Added a per-key sync.Mutex on cachedInvokers so the full select+update transaction is
serialized per service+method (contention bounded to the same key).

2. Getty server no longer panics on config errors

remoting/getty/getty_server.go
Replaced panic(err) in config marshal/unmarshal, CheckValidity(), and newSession()
with structured error logging / error returns. Added safeYAMLMarshal to recover the
panic yaml.v3 raises on unmarshalable types (channels, funcs) and surface it as an error.
Malformed protocol config no longer escalates to process-level failure.

3. Graceful shutdown result publication hardened

graceful_shutdown/shutdown.go
Added shutdownResultMu guarding all reads/writes of the package-level shutdownResult,
removing reliance on subtle ordering assumptions between the background shutdown goroutine
and concurrent Shutdown() readers.

4. TPS limiter cache is now bounded

filter/tps/limiter/method_service.go
The limiter map grew for process lifetime as distinct service+method combos accumulated.
Wrapped each strategy in a tpsLimitEntry tracking last-access time, switched to sync.Map,
and added a singleton background runCleanup goroutine that evicts entries idle past a TTL.

5. Script-router JS runtime no longer leaks cross-request state

cluster/router/script/instance/js_instance.go
Pooled goja.Runtime instances were never returned to the pool, and request/script globals
survived reuse. Added defer reset()+Put() in Run(). reset() snapshots the built-in
global keys at construction and deletes every global outside that set — clearing both request
bindings and user-script-defined globals while preserving built-ins.

6. Forking cluster cancels losing parallel branches

cluster/cluster/forking/cluster_invoker.go
Losing parallel invocations ran to completion after the winner returned, wasting downstream
work. Derived a cancelable forkCtx; defer cancel() stops sibling requests as soon as the
first accepted result wins.

Tests

  • TestRoundRobinByWeightConcurrent — 50 goroutines × 30 selections, race-clean
  • TestInitServerDoesNotPanicOnUnmarshalableParams — channel in Params no longer panics
  • TestShutdownResultConsistentUnderConcurrentReaders, TestShutdownPanicSetsResultError
  • TestTpsLimitEntryUpdatesLastAccess, TestTpsLimiterStaleEntryEviction
  • TestJsInstanceResetClearsBindings (incl. user-defined global), TestJsInstanceRunReturnsRuntimeToPool
  • TestForkingInvokeCancelsLosingBranches — loser branch ctx canceled after winner succeeds

All 74 tests across the 6 packages pass; go vet and gofmt clean.

Scope

Covers audit items 1–6 only. Item 7 (broader panic/fatal policy audit) is left for follow-up
PRs per the issue's suggested slicing. Items A/B/C are already fixed upstream.

Checklist

  • I confirm the target branch is develop
  • Code has passed local testing
  • I have added tests that prove my fix is effective or that my feature works

…erformance issues

Signed-off-by: Harsh Mehta <harshmehta010102@gmail.com>
…uster tests

Signed-off-by: Harsh Mehta <harshmehta010102@gmail.com>
@sonarqubecloud

Copy link
Copy Markdown

@codecov-commenter

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 75.75758% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 54.09%. Comparing base (8b5bbcf) to head (5cfc4a8).
⚠️ Report is 3 commits behind head on develop.

Files with missing lines Patch % Lines
filter/tps/limiter/method_service.go 70.83% 6 Missing and 1 partial ⚠️
remoting/getty/getty_server.go 61.53% 5 Missing ⚠️
cluster/router/script/instance/js_instance.go 75.00% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3461      +/-   ##
===========================================
+ Coverage    53.57%   54.09%   +0.52%     
===========================================
  Files          460      461       +1     
  Lines        35182    35530     +348     
===========================================
+ Hits         18847    19220     +373     
+ Misses       14857    14791      -66     
- Partials      1478     1519      +41     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Alanxtl

Alanxtl commented Jun 29, 2026

Copy link
Copy Markdown
Member
  1. comment in the issue before summitting pr
  2. split this pr into 6 small prs, this one is too large to review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

3.3.2 version 3.3.2 ☢️ Bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Code audit: memory leaks, goroutine leaks, deadlocks and performance issues

3 participants