Skip to content

fix(http-client): reuse http.Transport to prevent ephemeral port exhaustion#167

Open
sk4mi wants to merge 2 commits into
crossplane-contrib:mainfrom
sk4mi:fix/reuse-http-transport
Open

fix(http-client): reuse http.Transport to prevent ephemeral port exhaustion#167
sk4mi wants to merge 2 commits into
crossplane-contrib:mainfrom
sk4mi:fix/reuse-http-transport

Conversation

@sk4mi
Copy link
Copy Markdown

@sk4mi sk4mi commented Mar 24, 2026

Use a process-level cache of http.Client instances keyed by TLS config and timeout. This ensures connection pooling across reconcile cycles, preventing TCP connection leaks that exhaust the ephemeral port range.

  • Cache http.Client/Transport in a package-level sync.Map keyed by SHA-256 of TLS config + timeout
  • Fix response.Body.Close() ordering with defer to prevent leaks if io.ReadAll() fails
  • All TLS configurations (plain, InsecureSkipVerify, custom CA/certs) are cached uniformly

Description of your changes

This PR fixes ephemeral TCP port exhaustion caused by creating a new http.Client and http.Transport on every SendRequest() call. Since NewClient() is called per-reconcile via Connect(), any per-instance state is destroyed each cycle, so transports were never reused — leaving idle connections in TCP TIME_WAIT until the entire ephemeral port range was exhausted.

What changed:

  • Introduced a process-level sync.Map cache (httpClientCache) that stores http.Client instances keyed by a SHA-256 hash of their TLS configuration + timeout. This ensures transport reuse across reconcile cycles for all connection types (plain HTTP, InsecureSkipVerify, custom CA bundles, client certificates).
  • Fixed response.Body.Close() to use defer immediately after the response is received, preventing body/connection leaks if io.ReadAll() fails.

Reviewers should note:

  • The cache is intentionally unbounded — in practice, the number of distinct TLS config + timeout combinations is very small (typically 1-3). Adding eviction would add complexity with no real benefit.
  • LoadOrStore is used instead of Load/Store to handle the race where two goroutines create clients for the same key simultaneously — only one wins, the other is discarded.
  • The Client interface and all external behavior are unchanged. This is a purely internal optimization.

Production context: This was observed on a production cluster where a single provider-http pod accumulated 28,280 open sockets, exhausting the Linux ephemeral port range (32768-60999) and causing dial tcp: connect: cannot assign requested address errors at ~16/min.

I have:

  • Read and followed Crossplane's contribution process.
  • Run make reviewable test to ensure this PR is ready for review.

How has this code been tested

The fix was validated at multiple levels:

Unit tests (new):

  • TestHTTPClientCacheReuse: Two separate client instances (simulating consecutive reconcile cycles) with identical TLS config receive the same cached http.Client from the process-level sync.Map, confirming cross-reconcile transport reuse.
  • TestTransportReuse: Sends 50 sequential HTTP requests through a single client instance and verifies all succeed — proving connection reuse and no port leak under sustained load.

Existing tests (unchanged, all pass):

  • TestSendRequest covers GET, POST, custom headers, auth token, InsecureSkipVerify, and invalid TLS — all go through the new cache path.
  • TestSendRequestIntegration performs end-to-end requests against httptest.NewTLSServer with real TLS handshakes.
  • TestHTTPClientConfiguration confirms timeout propagation and proxy settings are preserved on cached clients.
  • TestBuildTLSConfig (10 sub-tests) validates all TLS config permutations (nil, empty, CA bundle, client certs, InsecureSkipVerify).

CI gate:

  • make reviewable test passes (golangci-lint, go vet, all project tests).

@sk4mi sk4mi force-pushed the fix/reuse-http-transport branch from a599209 to 563f376 Compare March 24, 2026 11:37
sk4mi added 2 commits March 25, 2026 10:15
…ustion

Use a process-level cache of http.Client instances keyed by TLS config
and timeout. This ensures connection pooling across reconcile cycles,
preventing TCP connection leaks that exhaust the ephemeral port range.

- Cache http.Client/Transport in a package-level sync.Map keyed by
  SHA-256 of TLS config + timeout
- Fix response.Body.Close() ordering with defer to prevent leaks
  if io.ReadAll() fails
- All TLS configurations (plain, InsecureSkipVerify, custom CA/certs)
  are cached uniformly

Signed-off-by: Leo Garel <leo.garel@airnity.com>
… and client caching behavior

Signed-off-by: Leo Garel <leo.garel@airnity.com>
@sk4mi sk4mi closed this Mar 25, 2026
@sk4mi sk4mi force-pushed the fix/reuse-http-transport branch from d5d7377 to 7d354a6 Compare March 25, 2026 09:15
@sk4mi sk4mi reopened this Mar 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant