perf: optimize DynamoDB client timeouts and retry config#408
perf: optimize DynamoDB client timeouts and retry config#408
Conversation
There was a problem hiding this comment.
Pull request overview
Optimizes DynamoDB client behavior by tuning HTTP timeouts per request phase and customizing the AWS SDK retry configuration, aiming to reduce hangs and improve performance/reliability for in-region calls.
Changes:
- Introduces a dedicated
newDynamoHTTPClient()with phase-specific transport timeouts and updated connection pooling settings. - Adds optional SDK retry logging via
LOG_SDK_RETRIES. - Configures a custom DynamoDB SDK retryer (attempts/backoff/rate limiting).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Transport: &http.Transport{ | ||
| MaxIdleConns: 200, | ||
| MaxIdleConnsPerHost: 50, | ||
| DialContext: dialer.DialContext, | ||
| TLSHandshakeTimeout: 1100 * time.Millisecond, | ||
| ResponseHeaderTimeout: 5 * time.Second, | ||
| MaxIdleConns: 300, |
There was a problem hiding this comment.
ResponseHeaderTimeout is hard-capped at 5s. Some datastore operations (e.g., ClearServerData does a Query plus multiple TransactWriteItems batches) can legitimately take longer than 5s to produce response headers under load/throttling; this would surface as client-side timeouts and trigger SDK retries, potentially amplifying load and causing avoidable transient failures. Consider increasing this value and/or making it configurable (or relying on per-operation context deadlines) so long-running maintenance paths aren’t prematurely timed out.
| dialer := &net.Dialer{ | ||
| Timeout: 1100 * time.Millisecond, | ||
| KeepAlive: 30 * time.Second, | ||
| } |
There was a problem hiding this comment.
The 1.1s timeout value is duplicated (Dialer.Timeout and TLSHandshakeTimeout). Extracting a named constant (e.g., connectPhaseTimeout) would avoid accidental drift if one value is adjusted later and makes the tuning intent easier to maintain.
No description provided.