Implements Retries 2.1 Updates #6925

Merged
aws-sdk-java-automation merged 23 commits into master from feature/master/2026-new-retries
May 1, 2026

@dagnir (Contributor) commented Apr 30, 2026

Motivation and Context

This commit implements the retries 2.1 revision of SDK retries.

Modifications

This feature implements the v2.1 retry specification for the AWS SDK for Java v2. It introduces an opt-in feature gate (AWS_NEW_RETRIES_2026) that, when enabled, changes retry behavior across the SDK:

  • Default retry mode changes from LEGACY to STANDARD
  • Base backoff delay reduced from 100ms to 50ms
  • Differentiated token bucket costs: transient errors cost 14 tokens, throttling errors cost 5 tokens (previously both cost 5)
  • max_attempts profile property is now honored (previously only the env var/system property was checked)
  • x-amz-retry-after header replaces Retry-After for suggested delay, with clamping logic (min = strategy-computed, max = strategy-computed + 5s)
  • Backoff on acquire failure for long-polling operations (instead of immediately failing)
  • LimitExceededException is treated as a retryable throttling error
  • Long-polling operations (SQS ReceiveMessage, SFN GetActivityTask, SWF PollForActivityTask/PollForDecisionTask) get special backoff behavior on token exhaustion
  • Service-specific changes: DynamoDB max attempts reduced from 9 to 4; STS adds IdpCommunicationErrorException as retryable
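The differentiated token costs in the list above can be illustrated with a minimal sketch. The class name, method names, and the caller-supplied bucket capacity are hypothetical and are not the SDK's API; only the two cost values (14 and 5) come from the description.

```java
// Hypothetical sketch; names and capacity handling are assumptions, not SDK API.
final class RetryTokenBucketSketch {
    static final int TRANSIENT_COST = 14;  // 2.1 cost for transient errors, per the description
    static final int THROTTLING_COST = 5;  // 2.1 cost for throttling errors, per the description

    private int available;

    RetryTokenBucketSketch(int capacity) {
        this.available = capacity;
    }

    /** Tries to withdraw the cost of one retry; returns false if the quota is insufficient. */
    boolean tryAcquire(boolean throttlingError) {
        int cost = throttlingError ? THROTTLING_COST : TRANSIENT_COST;
        if (cost > available) {
            return false;
        }
        available -= cost;
        return true;
    }

    int available() {
        return available;
    }
}
```

With these costs, a bucket that is nearly drained can still admit a throttling retry after it has started rejecting transient-error retries.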

This commit also bumps the version to 2.44.0.

The individual changes in the branch were reviewed separately:

Testing

Screenshots (if appropriate)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Checklist

  • I have read the CONTRIBUTING document
  • Local run of mvn install succeeds
  • My code follows the code style of this project
  • My change requires a change to the Javadoc documentation
  • I have updated the Javadoc documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed
  • I have added a changelog entry. Adding a new entry must be accomplished by running the scripts/new-change script and following the instructions. Commit the new file created by the script in .changes/next-release with your changes.
  • My change is to implement 1.11 parity feature and I have updated LaunchChangelog

License

  • I confirm that this pull request can be released under the Apache 2 license

dagnir and others added 18 commits April 10, 2026 16:12
This property will be used to gate the retries update (v 2.1) behind a
feature flag. This option defaults to false.
Introduce AWS_NEW_RETRIES_2026 feature gate
* Use AWS_NEW_RETRIES_2026 during mode resolution

* Review comments
* 2.1 behavior in standard, adaptive strats

Support the 2.1 behavior changes in adaptive and standard strategies.
This includes the change in constant values based on 2.0 and 2.1, and
the application of a different cost for throttling retries in 2.1.

The 2.1 behavior is implemented as an overload of the builder() method
that accepts a boolean to select between 2.0 and 2.1. The no-arg version
defaults to false, i.e. 2.0.
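A rough sketch of the overload pattern described above, assuming a stand-in class rather than the SDK's real strategy types. `StandardStrategySketch` and its accessors are hypothetical; the constants (100ms vs 50ms base delay, 5 vs 14 for transient errors) are taken from the PR description.

```java
import java.time.Duration;

// Hypothetical stand-in for a retry strategy; not the SDK's actual class or API.
final class StandardStrategySketch {
    private final Duration baseDelay;
    private final int transientCost;
    private final int throttlingCost;

    private StandardStrategySketch(Builder b) {
        this.baseDelay = b.baseDelay;
        this.transientCost = b.transientCost;
        this.throttlingCost = b.throttlingCost;
    }

    /** No-arg overload keeps the 2.0 defaults. */
    static Builder builder() {
        return builder(false);
    }

    /** Selects the 2.1 defaults when newRetries2026 is true. */
    static Builder builder(boolean newRetries2026) {
        Builder b = new Builder();
        b.baseDelay = Duration.ofMillis(newRetries2026 ? 50 : 100);
        b.transientCost = newRetries2026 ? 14 : 5;
        b.throttlingCost = 5;
        return b;
    }

    Duration baseDelay() { return baseDelay; }
    int transientCost() { return transientCost; }
    int throttlingCost() { return throttlingCost; }

    static final class Builder {
        private Duration baseDelay;
        private int transientCost;
        private int throttlingCost;

        /** Callers can still override the preselected default. */
        Builder baseDelay(Duration d) { this.baseDelay = d; return this; }

        StandardStrategySketch build() { return new StandardStrategySketch(this); }
    }
}
```

Keeping the no-arg `builder()` on the 2.0 constants means existing callers see no change unless the flag is passed explicitly.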

* Review comments
* Factor new retries option in client builder

This commit updates the retry mode resolution behavior for SDK clients:

 - When resolving the RetryStrategy to use, also determine whether retry
   2.1 behavior should be enabled for that retry strategy.
 - If the `newRetries2026Default` property is set in
   customization.config for a service, ensure that this is treated as
   the default option for `AWS_NEW_RETRIES_2026` when building the
   client if not set anywhere else.
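The resolution order described above might look roughly like the following. This is a sketch under assumptions: `NewRetriesGateSketch` and its methods are hypothetical, and the explicit setting is passed in as a value rather than read from any particular environment variable or property source.

```java
import java.util.Optional;

// Hypothetical sketch of the flag and mode resolution; not the SDK's resolver.
final class NewRetriesGateSketch {
    enum RetryMode { LEGACY, STANDARD, ADAPTIVE }

    /**
     * explicitSetting: the user-provided AWS_NEW_RETRIES_2026 value, empty if unset.
     * serviceDefault: the service's newRetries2026Default customization, empty if absent.
     */
    static boolean resolveNewRetries(Optional<Boolean> explicitSetting,
                                     Optional<Boolean> serviceDefault) {
        // An explicit setting wins; otherwise use the service default; otherwise false.
        return explicitSetting.orElseGet(() -> serviceDefault.orElse(false));
    }

    /** With the gate enabled, the default mode is STANDARD instead of LEGACY. */
    static RetryMode resolveMode(Optional<RetryMode> configuredMode, boolean newRetries) {
        return configuredMode.orElse(newRetries ? RetryMode.STANDARD : RetryMode.LEGACY);
    }
}
```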

* Fix test

* Fix test
* Support backoff for long-polling

For operations that are long-polling, support backoffs in the Standard
retry strategy even when there is insufficient retry quota.

* Review comments

* Fix checkstyle
This commit
 - Adds an isLongPolling() property to ClientExecutionParams
 - Adds SdkInternalExecutionAttribute.IS_LONG_POLLING that reflects the
   value from ClientExecutionParams during execution
 - Sets the longPolling() property on the RefreshRetryTokenRequest when
   refreshing, based on the value of the IS_LONG_POLLING exec attribute
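To illustrate how a long-polling flag on the refresh request could change the outcome when retry quota runs out, here is a minimal sketch. `LongPollingRefreshSketch`, `Outcome`, and the `refresh` signature are hypothetical and do not mirror the SDK's RefreshRetryTokenRequest or its response types.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical sketch of the strategy-side decision; not the SDK's implementation.
final class LongPollingRefreshSketch {
    /** Whether a retry token was granted, and any backoff the caller should apply. */
    static final class Outcome {
        final boolean acquired;
        final Optional<Duration> delay;

        Outcome(boolean acquired, Optional<Duration> delay) {
            this.acquired = acquired;
            this.delay = delay;
        }
    }

    static Outcome refresh(int availableTokens, int cost, boolean longPolling,
                           Duration computedBackoff) {
        if (availableTokens >= cost) {
            // Enough quota: grant the retry with the usual backoff.
            return new Outcome(true, Optional.of(computedBackoff));
        }
        // Quota exhausted: a long-polling call still receives a backoff,
        // while any other call fails without one.
        return new Outcome(false,
                longPolling ? Optional.of(computedBackoff) : Optional.empty());
    }
}
```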
* Set longPoll ops for SQS,SWF,SFN

Add a customization to set the long polling trait for the following
service operations:

 - SQS#ReceiveMessage
 - SFN#GetActivityTask
 - SWF#PollForActivityTask
 - SWF#PollForDecisionTask

* Checkstyle fix

* Review comments

* Fix checkstyle
This commit adds the test cases defined in the retries spec for
Standard. There are two sets, for v2.0 and v2.1.

All of the v2.0 test cases are present, but the v2.1 set does not yet
include the x-amz-retry-after header tests.
Retry strategies can optionally return a backoff even when token refresh
fails. Add support for this in the sync and async retry stages.

In RetryableStage and AsyncRetryableStage, check
TokenAcquisitionFailedException#delay() and apply that backoff if given.
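A minimal sketch of the stage-side check, assuming a stand-in exception type with an optional delay. `AcquireFailureSketch`, `TokenAcquisitionFailed`, and `Sleeper` are hypothetical names; the real TokenAcquisitionFailedException#delay() signature is not shown in this PR, and what the stage does after waiting is not covered here.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical sketch; the nested exception is a stand-in, not the SDK's class.
final class AcquireFailureSketch {
    /** Stand-in for the exception raised when no retry token can be acquired. */
    static final class TokenAcquisitionFailed extends RuntimeException {
        private final Duration delay; // null when the strategy suggests no backoff

        TokenAcquisitionFailed(Duration delay) {
            super("retry quota exhausted");
            this.delay = delay;
        }

        Optional<Duration> delay() {
            return Optional.ofNullable(delay);
        }
    }

    /** Abstraction over waiting, so sync and async stages can plug in their own mechanism. */
    interface Sleeper {
        void sleep(Duration d);
    }

    /** Applies the backoff carried by the failure, if any, and returns how long it waited. */
    static Duration backoffIfGiven(TokenAcquisitionFailed failure, Sleeper sleeper) {
        Duration delay = failure.delay().orElse(Duration.ZERO);
        if (!delay.isZero()) {
            sleeper.sleep(delay);
        }
        return delay;
    }
}
```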
* Use x-amz-retry-after for suggested delay

In retries 2.1, 'x-amz-retry-after' must be parsed from the last
response and if present, should be used as the suggested delay (i.e.
suggested backoff) when attempting another retry.

This change updates RetryableStage and AsyncRetryableStage to
parse the header and pass it to the RetryStrategy. As part of this
change, we also need to plumb whether retries 2.1 is enabled from the
SDK client.

* Review comments
This commit fixes two issues with the original retries implementation:

 - Honor the 'max_attempts' property in the profile file if present
 - Treat 'LimitExceededException' as a throttling exception
In 2.1 (DynamoDB):
 - base delay: 25ms (existing)
 - max attempts: 4 (down from 9)
For standard 2.1, the service-suggested backoff is clamped so that it is
at least the strategy-computed value and at most 5s more than the
strategy-computed value.
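The clamping rule for the service-suggested backoff can be written out as a small sketch. The class and method names are hypothetical; only the bounds (lower = strategy-computed delay, upper = strategy-computed delay + 5s) come from the text.

```java
import java.time.Duration;

// Hypothetical helper illustrating the clamp; not the SDK's implementation.
final class SuggestedDelayClampSketch {
    static final Duration MAX_EXTRA = Duration.ofSeconds(5);

    /** Clamps the service-suggested delay into [computed, computed + 5s]. */
    static Duration clamp(Duration computed, Duration suggested) {
        Duration upper = computed.plus(MAX_EXTRA);
        if (suggested.compareTo(computed) < 0) {
            return computed;
        }
        if (suggested.compareTo(upper) > 0) {
            return upper;
        }
        return suggested;
    }
}
```

For a strategy-computed delay of 200ms, a suggested 100ms becomes 200ms, a suggested 3s is kept as is, and a suggested 10s is capped at 5.2s.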
@dagnir added the api-surface-area-approved-by-team label (Indicate API surface area introduced by this PR has been approved by team) Apr 30, 2026
@dagnir marked this pull request as ready for review April 30, 2026 19:43
@dagnir requested a review from a team as a code owner April 30, 2026 19:43
dagnir and others added 4 commits April 30, 2026 12:44
This will allow us to better track opt-in behavior in the UA.
This commit adds a fix to STS to retry on IDPCommunicationError. This is
implemented as a customization for the retry strategy lookup for the
client. This is gated behind the AWS_NEW_RETRIES_2026 flag.
Comment thread .changes/next-release/feature-AWSSDKforJavav2-8b47f8f.json Outdated
@dagnir enabled auto-merge May 1, 2026 01:23
@aws-sdk-java-automation merged commit b69b75f into master May 1, 2026
8 of 11 checks passed

github-actions Bot commented May 1, 2026

This pull request has been closed and the conversation has been locked. Comments on closed PRs are hard for our team to see. If you need more assistance, please open a new issue that references this one.

@github-actions (bot) locked as resolved and limited conversation to collaborators May 1, 2026
@dagnir deleted the feature/master/2026-new-retries branch May 1, 2026 17:47