
fix: disable connection pooling to prevent stale connections after SnapStart #671

Merged
bnusunny merged 1 commit into main from fix-snapstart-connection-pool on Mar 20, 2026
Conversation


@bnusunny bnusunny commented Mar 20, 2026

Problem

hyper-util's connection pool uses std::time::Instant (CLOCK_MONOTONIC) to track idle connection age. After a Lambda SnapStart restore, CLOCK_MONOTONIC can be inconsistent: Instant::now() may return a value earlier than the stored idle_at timestamp, so saturating_duration_since returns zero and the pool treats stale connections as fresh.
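The clamping behavior can be reproduced with std alone. This is a simulation, not the pool's actual code: it constructs an "earlier" Instant by subtraction to stand in for a post-restore Instant::now() that lands before the stored idle_at.

```rust
use std::time::{Duration, Instant};

// Simulate the pool's idle-age check. `idle_at` is when the connection went
// idle; after a SnapStart restore, CLOCK_MONOTONIC inconsistency can make
// Instant::now() land *before* that timestamp.
fn simulated_idle_age() -> Duration {
    let idle_at = Instant::now();
    // Stand-in for a post-restore "now" that is earlier than `idle_at`.
    let now = idle_at - Duration::from_secs(5);
    // saturating_duration_since clamps negative spans to zero, so the pool
    // concludes the connection has been idle for 0s and reuses it.
    now.saturating_duration_since(idle_at)
}

fn main() {
    assert_eq!(simulated_idle_age(), Duration::ZERO);
    println!("idle age seen by pool: {:?}", simulated_idle_age());
}
```

With the age clamped to zero, an idle-timeout check can never expire the connection, even though the TCP peer on the other side is long gone.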

This causes intermittent IncompleteMessage errors on the first request after restore, as the adapter tries to send a request on a dead connection.

This is a known upstream issue (hyperium/hyper#3810, rust-lang/rust#79462) with no fix planned in hyper-util or the Rust standard library.

Fix

When AWS_LAMBDA_INITIALIZATION_TYPE=snap-start is detected, disable connection pooling via pool_max_idle_per_host(0). Since the adapter communicates with the web app over localhost, the overhead of creating a new TCP connection per request is negligible.

For non-SnapStart Lambda functions, the existing 4-second idle timeout pool is preserved — no behavior change.
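The decision logic can be sketched with std only. This is a hypothetical illustration of the rule described above, not the adapter's actual code; the real adapter feeds the resulting value into hyper-util's client builder via pool_max_idle_per_host, and usize::MAX is hyper-util's documented default for that setting.

```rust
use std::env;

/// Illustrative helper: pick the pool size for a given value of
/// AWS_LAMBDA_INITIALIZATION_TYPE (None = variable unset).
fn pool_max_idle_per_host(init_type: Option<&str>) -> usize {
    match init_type {
        // SnapStart: idle connections don't survive a restore, so keep none.
        // Every request opens a fresh localhost connection (microseconds).
        Some("snap-start") => 0,
        // Everything else keeps the default pool, paired with the existing
        // 4-second idle timeout mentioned above.
        _ => usize::MAX,
    }
}

fn main() {
    // In the adapter this value would come from the environment:
    let init_type = env::var("AWS_LAMBDA_INITIALIZATION_TYPE").ok();
    let max_idle = pool_max_idle_per_host(init_type.as_deref());
    println!("pool_max_idle_per_host = {}", max_idle);

    assert_eq!(pool_max_idle_per_host(Some("snap-start")), 0);
    assert_eq!(pool_max_idle_per_host(Some("on-demand")), usize::MAX);
    assert_eq!(pool_max_idle_per_host(None), usize::MAX);
}
```

Keeping the branch on the documented runtime-provided variable means no new user-facing configuration is introduced.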

Alternative considered

PR #569 proposed tracking idle time with SystemTime and recreating the entire Client when a timeout is detected. That approach:

  • Duplicates hyper's pool logic at a higher level
  • Adds mutable state (last_invoke) to the Service::call path
  • Doesn't coordinate correctly across clones (relevant for Lambda Managed Instances)
  • Exposes an internal detail as a user-facing env var (AWS_LWA_CLIENT_IDLE_TIMEOUT_MS)

Conditionally disabling pooling for SnapStart is simpler and targets the actual problem.

Testing

All 62 tests pass. No new tests needed — this is a config change conditional on an environment variable set by the Lambda runtime.

Closes #604

…apStart

hyper-util's connection pool uses std::time::Instant (CLOCK_MONOTONIC)
to track idle connection age. After Lambda SnapStart restore (or
freeze/thaw cycles), CLOCK_MONOTONIC can be inconsistent, causing the
pool to reuse dead connections and resulting in IncompleteMessage errors.

This is a known upstream issue (hyperium/hyper#3810, rust-lang/rust#79462)
with no fix planned in hyper-util or Rust stdlib.

Disable connection pooling entirely via pool_max_idle_per_host(0). Since
the adapter communicates with localhost, the overhead of creating a new
TCP connection per request is negligible (microseconds).

Closes #604
@github-actions

github-actions bot commented Mar 20, 2026

📊 Benchmark Comparison

Benchmark Details
  • Baseline: main branch
  • Comparison: This PR
  • Threshold: 10% regression triggers warning

@bnusunny bnusunny merged commit 7c4afe6 into main Mar 20, 2026
7 checks passed
@bnusunny bnusunny deleted the fix-snapstart-connection-pool branch March 20, 2026 06:57

Successfully merging this pull request may close these issues.

  • Thread panic errors when using sam local