Skip to content

perf: optimize DynamoDB client timeouts and retry config#408

Open
mschfh wants to merge 1 commit intomasterfrom
mschfh-sdk-opts
Open

perf: optimize DynamoDB client timeouts and retry config#408
mschfh wants to merge 1 commit intomasterfrom
mschfh-sdk-opts

Conversation

@mschfh
Copy link
Copy Markdown
Contributor

@mschfh mschfh commented Apr 9, 2026

No description provided.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Optimizes DynamoDB client behavior by tuning HTTP timeouts per request phase and customizing the AWS SDK retry configuration, aiming to reduce hangs and improve performance/reliability for in-region calls.

Changes:

  • Introduces a dedicated newDynamoHTTPClient() with phase-specific transport timeouts and updated connection pooling settings.
  • Adds optional SDK retry logging via LOG_SDK_RETRIES.
  • Configures a custom DynamoDB SDK retryer (attempts/backoff/rate limiting).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@mschfh mschfh marked this pull request as ready for review April 10, 2026 13:48
@mschfh mschfh requested a review from Copilot April 10, 2026 13:48
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 93 to +97
Transport: &http.Transport{
MaxIdleConns: 200,
MaxIdleConnsPerHost: 50,
DialContext: dialer.DialContext,
TLSHandshakeTimeout: 1100 * time.Millisecond,
ResponseHeaderTimeout: 5 * time.Second,
MaxIdleConns: 300,
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ResponseHeaderTimeout is hard-capped at 5s. Some datastore operations (e.g., ClearServerData does a Query plus multiple TransactWriteItems batches) can legitimately take longer than 5s to produce response headers under load/throttling; this would surface as client-side timeouts and trigger SDK retries, potentially amplifying load and causing avoidable transient failures. Consider increasing this value and/or making it configurable (or relying on per-operation context deadlines) so long-running maintenance paths aren’t prematurely timed out.

Copilot uses AI. Check for mistakes.
Comment on lines +87 to +90
dialer := &net.Dialer{
Timeout: 1100 * time.Millisecond,
KeepAlive: 30 * time.Second,
}
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 1.1s timeout value is duplicated (Dialer.Timeout and TLSHandshakeTimeout). Extracting a named constant (e.g., connectPhaseTimeout) would avoid accidental drift if one value is adjusted later and makes the tuning intent easier to maintain.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants