fix: disable connection pooling to prevent stale connections after SnapStart #671
Merged
…apStart

hyper-util's connection pool uses std::time::Instant (CLOCK_MONOTONIC) to track idle connection age. After Lambda SnapStart restore (or freeze/thaw cycles), CLOCK_MONOTONIC can be inconsistent, causing the pool to reuse dead connections and resulting in IncompleteMessage errors.

This is a known upstream issue (hyperium/hyper#3810, rust-lang/rust#79462) with no fix planned in hyper-util or Rust stdlib.

Disable connection pooling entirely via pool_max_idle_per_host(0). Since the adapter communicates with localhost, the overhead of creating a new TCP connection per request is negligible (microseconds).

Closes #604
Problem
hyper-util's connection pool uses std::time::Instant (CLOCK_MONOTONIC) to track idle connection age. After Lambda SnapStart restore, CLOCK_MONOTONIC can be inconsistent: Instant::now() may appear to be earlier than the stored idle_at timestamp, so saturating_duration_since returns zero and the pool treats stale connections as fresh.

This causes intermittent IncompleteMessage errors on the first request after restore, because the adapter tries to send a request on a dead connection.

This is a known upstream issue with no fix planned:

- hyperium/hyper#3810
- rust-lang/rust#79462
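
The failure mode described above can be reproduced with the standard library alone. The following is a minimal sketch, not hyper-util's actual code: `is_expired` is a hypothetical stand-in for the pool's idle check, and the clock inconsistency is simulated by placing `idle_at` later than `now`.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a pool's idle check (not hyper-util's real
/// implementation): a connection counts as expired once its measured age
/// exceeds the idle timeout.
fn is_expired(idle_at: Instant, now: Instant, idle_timeout: Duration) -> bool {
    now.saturating_duration_since(idle_at) > idle_timeout
}

fn main() {
    let idle_timeout = Duration::from_secs(4);
    let now = Instant::now();

    // Simulate the inconsistent clock: the stored timestamp is later than
    // the current reading, as if the monotonic clock had moved backwards.
    let idle_at = now + Duration::from_secs(60);

    // The measured age saturates to zero instead of going negative,
    // so the connection looks freshly idled and is never evicted.
    let age = now.saturating_duration_since(idle_at);
    assert_eq!(age, Duration::ZERO);
    assert!(!is_expired(idle_at, now, idle_timeout));

    // With a consistent clock, the same connection expires as intended.
    let later = idle_at + Duration::from_secs(10);
    assert!(is_expired(idle_at, later, idle_timeout));

    println!("age with inconsistent clock: {:?}", age);
}
```

Because the measured age is clamped to zero, a timeout-based check of this shape cannot detect the stale connection, no matter how short the timeout is.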
Fix
When AWS_LAMBDA_INITIALIZATION_TYPE=snap-start is detected, disable connection pooling via pool_max_idle_per_host(0). Since the adapter communicates with the web app over localhost, the overhead of creating a new TCP connection per request is negligible.

For non-SnapStart Lambda functions, the existing 4-second idle timeout pool is preserved, so there is no behavior change.
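
The conditional can be sketched as a small, dependency-free function. This is an illustration under assumptions rather than the adapter's actual code: `PoolSettings` and `pool_settings` are hypothetical names, and the resulting values are assumed to be handed to the HTTP client builder (`pool_max_idle_per_host` as named above, plus an idle-timeout setting).

```rust
use std::time::Duration;

/// Hypothetical settings holder for illustration; not the adapter's real type.
#[derive(Debug, PartialEq)]
struct PoolSettings {
    /// `Some(0)` disables pooling; `None` leaves the client's default cap.
    max_idle_per_host: Option<usize>,
    /// Idle timeout for pooled connections (4 seconds, per the description).
    idle_timeout: Duration,
}

/// Choose pool settings from the value of AWS_LAMBDA_INITIALIZATION_TYPE.
fn pool_settings(init_type: Option<&str>) -> PoolSettings {
    let max_idle_per_host = match init_type {
        // SnapStart: keep no idle connections, so none can go stale
        // across a snapshot restore.
        Some("snap-start") => Some(0),
        // Any other value, or an unset variable: keep the existing pool.
        _ => None,
    };
    PoolSettings {
        max_idle_per_host,
        idle_timeout: Duration::from_secs(4),
    }
}

fn main() {
    // The Lambda runtime sets this variable; it is absent when run locally.
    let init_type = std::env::var("AWS_LAMBDA_INITIALIZATION_TYPE").ok();
    let settings = pool_settings(init_type.as_deref());

    // In the adapter, these values would be applied to the client builder,
    // e.g. calling pool_max_idle_per_host(0) only in the SnapStart case.
    println!("{settings:?}");
}
```

Keeping the decision in a pure function that takes the variable's value as an argument makes it testable without mutating the process environment.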
Alternative considered
PR #569 proposed tracking idle time with SystemTime and recreating the entire Client when a timeout is detected. That approach:

- […] last_invoke) to the Service::call path
- […] AWS_LWA_CLIENT_IDLE_TIMEOUT_MS)

Conditionally disabling pooling for SnapStart is simpler and targets the actual problem.
Testing
All 62 tests pass. No new tests needed — this is a config change conditional on an environment variable set by the Lambda runtime.
Closes #604