
fix: reuse HTTP connections in fallback service to prevent port exhaustion #6115

Draft — penalosa wants to merge 2 commits into main from penalosa/fallback-service-thread-pool

Conversation

penalosa (Contributor) commented Feb 19, 2026

Summary

The FallbackServiceClient (used for module resolution in local dev via vitest-pool-workers) maintains a persistent background thread with a long-lived KJ HTTP client, but connections were never actually reused. This caused ~4000 TIME_WAIT sockets per test run, leading to port exhaustion (EADDRNOTAVAIL) after 2-4 consecutive runs.

Root Cause

Two issues prevented connection reuse:

  1. 301 redirect response bodies were never drained. KJ's HTTP client calls abortRead() on unread response bodies, which sets broken = true on the connection. This makes canReuse() return false, so the connection pool discards the connection and opens a new TCP socket for every subsequent request. The fallback service returns 301 redirects frequently for module aliasing.

  2. Stale pooled connections caused DISCONNECTED errors. Between requests, the background thread blocks on kj::MutexGuarded::when() (OS-level mutex wait), not the KJ event loop. KJ's watchForClose() can only detect server-side connection closure via the event loop, so if the server closes an idle connection during the mutex wait, the next request uses a dead connection.

Changes

  • Drain response bodies on 301 redirects (readAllBytes().wait()) in both V1 and V2 paths so KJ can return the connection to the pool
  • Retry once on kj::Exception::Type::DISCONNECTED to handle stale pooled connections transparently
  • Wire up FallbackServiceClient in both V1 (server.c++) and V2 (workerd-api.c++) code paths — previously both paths were using per-request threads via the free tryResolve() function

Validation

  • Before fix (npm workerd): Port exhaustion after 2 runs (16,273 TIME_WAIT sockets)
  • After fix: 20 consecutive runs all passing (51/51 tests each), TIME_WAIT stable at ~1,000-1,300
  • Test suite: 51 vitest files with singleWorker: false, stressing concurrent module resolution

Fixes cloudflare/workers-sdk#7954


codecov-commenter commented Feb 19, 2026

Codecov Report

❌ Patch coverage is 1.43541% with 206 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.60%. Comparing base (4f0e03e) to head (574c0da).
⚠️ Report is 24 commits behind head on main.

Files with missing lines                    Patch %   Lines
src/workerd/server/fallback-service.c++       0.00%   193 Missing ⚠️
src/workerd/server/workerd-api.c++            0.00%     7 Missing ⚠️
src/workerd/server/server.c++                 0.00%     6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6115      +/-   ##
==========================================
- Coverage   70.61%   70.60%   -0.01%     
==========================================
  Files         409      409              
  Lines      109295   109442     +147     
  Branches    18072    18037      -35     
==========================================
+ Hits        77178    77276      +98     
- Misses      21269    21359      +90     
+ Partials    10848    10807      -41     

☔ View full report in Codecov by Sentry.

Commit: fix: reuse HTTP connections in fallback service to prevent port exhaustion

The FallbackServiceClient uses a persistent background thread with a
long-lived KJ HTTP client to serve all module resolution requests, but
connections were not actually being reused because:

1. 301 redirect response bodies were never drained. KJ's HTTP client
   calls abortRead() on unread bodies, setting broken=true on the
   connection, which makes canReuse() return false. Every subsequent
   request opened a new TCP socket.

2. Between requests, the background thread blocks on MutexGuarded::when()
   (OS-level mutex), not the KJ event loop. KJ's watchForClose() can only
   detect server-side connection closure via the event loop, so stale
   pooled connections caused DISCONNECTED exceptions.

Fixes:
- Drain response bodies on 301 redirects (readAllBytes().wait())
- Add retry-once on kj::Exception::Type::DISCONNECTED for stale connections
- Wire up FallbackServiceClient in both V1 (server.c++) and V2 (workerd-api.c++)

Without this fix, each test run creates ~4000 TIME_WAIT sockets, causing
port exhaustion after 2-4 consecutive runs of vitest-pool-workers test
suites. With the fix, TIME_WAIT stays at ~1000-1300 across 20+ runs.
penalosa force-pushed the penalosa/fallback-service-thread-pool branch from 51abe5b to ae5977b on February 19, 2026 at 16:47.

codspeed-hq bot commented Feb 19, 2026

Merging this PR will degrade performance by 15.89%

❌ 1 regressed benchmark
✅ 69 untouched benchmarks
⏩ 129 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Benchmark                 BASE      HEAD      Efficiency
jsonResponse[Response]    39.8 µs   47.4 µs   -15.89%

Comparing penalosa/fallback-service-thread-pool (574c0da) with main (02c8669)


Footnotes

  1. 129 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, they can be archived on CodSpeed to remove them from the performance reports.

penalosa force-pushed the penalosa/fallback-service-thread-pool branch from ae5977b to e148d48 on February 20, 2026 at 11:49.
}

// Block until the background thread has processed our request.
return state.when([](const SharedState& s) { return s.responseReady || s.shutdown; },
A collaborator commented:

I'll need to experiment with it a bit to verify everything, but this is good. One thing we might consider, however, is having a small pool of these clients so that multiple concurrent loads aren't blocking each other and bottlenecking things quite as much.


// Retry once on disconnect (stale pooled connection).
for (int attempt = 0; attempt < 2; attempt++) {
try {
jasnell (Collaborator) commented Feb 20, 2026:

Suggested change:
-    try {
+    KJ_TRY {

Harris recently introduced a new utility here, KJ_TRY { } KJ_CATCH(exception) { }, that reduces the boilerplate for this pattern. Using this, you won't need to worry about the auto exception = kj::getCaughtExceptionAsKj(); pattern.

Recommend making the change here and elsewhere in this PR where you use this pattern.



github-actions bot commented Feb 20, 2026

The generated output of @cloudflare/workers-types matches the snapshot in types/generated-snapshot 🎉

penalosa force-pushed the penalosa/fallback-service-thread-pool branch from a537526 to 7fca872 on February 20, 2026 at 21:32.
Commit (title truncated): …eptionAsKj

Replace the boilerplate pattern of catch(...) followed by
kj::getCaughtExceptionAsKj() with the newer KJ_TRY/KJ_CATCH macros
that handle the conversion automatically.
penalosa force-pushed the penalosa/fallback-service-thread-pool branch from 7fca872 to 574c0da on February 20, 2026 at 21:35.
Successfully merging this pull request may close: 🐛 BUG: Error running tests when using singleWorker: false
3 participants