diff --git a/source/client-backpressure/client-backpressure.md b/source/client-backpressure/client-backpressure.md index 388815724a..c78877adbf 100644 --- a/source/client-backpressure/client-backpressure.md +++ b/source/client-backpressure/client-backpressure.md @@ -110,16 +110,10 @@ overload error, including those not eligible for retry under the updateMany, create collection, getMore, and generic runCommand. The new command execution method obeys the following rules: -1. `attempt` is the execution attempt number (starting with 0). Note that `attempt` includes retries for errors that - are not overload errors (this might include attempts under other retry policies, see +1. `attempt` is the execution attempt number (starting with 0). Note that `attempt` includes retries for errors that are + not overload errors (this might include attempts under other retry policies, see [Interactions with Other Retry Policies](./client-backpressure.md#interaction-with-other-retry-policies)). -2. If the command succeeds on the first attempt, drivers MUST deposit `RETRY_TOKEN_RETURN_RATE` tokens. - - The value is 0.1 and non-configurable. -3. If the command succeeds on a retry attempt, drivers MUST deposit `RETRY_TOKEN_RETURN_RATE`+1 tokens. -4. If a retry attempt fails with an error that is not an overload error, drivers MUST deposit 1 token. - - An error that does not contain the `SystemOverloadedError` error label indicates that the server is healthy enough - to handle requests. For the purposes of retry budget tracking, this counts as a success. -5. A retry attempt will only be permitted if: +2. A retry attempt will only be permitted if: 1. The error is a retryable overload error. 2. We have not reached `MAX_RETRIES`. - The value of `MAX_RETRIES` is 5 and non-configurable. @@ -128,35 +122,50 @@ rules: 3. 
(CSOT-only): There is still time for a retry attempt according to the [Client Side Operations Timeout](../client-side-operations-timeout/client-side-operations-timeout.md) specification. - 4. A token can be consumed from the token bucket. - 5. The command is a write and [retryWrites](../retryable-writes/retryable-writes.md#retrywrites) is enabled or the + 4. The command is a write and [retryWrites](../retryable-writes/retryable-writes.md#retrywrites) is enabled or the command is a read and [retryReads](../retryable-reads/retryable-reads.md#retryreads) is enabled. - To retry `runCommand`, both [retryWrites](../retryable-writes/retryable-writes.md#retrywrites) and - [retryReads](../retryable-reads/retryable-reads.md#retryreads) must be enabled. See + [retryReads](../retryable-reads/retryable-reads.md#retryreads) MUST be enabled. See [Why must both `retryWrites` and `retryReads` be enabled to retry runCommand?](client-backpressure.md#why-must-both-retrywrites-and-retryreads-be-enabled-to-retry-runcommand) -6. A retry attempt consumes 1 token from the token bucket. -7. If the request is eligible for retry (as outlined in step 5), the client MUST apply exponential backoff according to - the following formula: `backoff = jitter * min(MAX_BACKOFF, BASE_BACKOFF * 2^(attempt - 1))` +3. If the request is eligible for retry (as outlined in step 2 above and step 4 in the + [adaptive retry requirements](client-backpressure.md#adaptive-retry-requirements) below), the client MUST apply + exponential backoff according to the following formula: + `backoff = jitter * min(MAX_BACKOFF, BASE_BACKOFF * 2^(attempt - 1))` - `jitter` is a random jitter value between 0 and 1. - `BASE_BACKOFF` is constant 100ms. - `MAX_BACKOFF` is 10000ms. - This results in delays of 100ms, 200ms, 400ms, 800ms, and 1600ms before accounting for jitter. -8. 
If the request is eligible for retry (as outlined in step 5), the client MUST add the previously used server's - address to the list of deprioritized server addresses for +4. If the request is eligible for retry (as outlined in step 2 above and step 4 in the + [adaptive retry requirements](client-backpressure.md#adaptive-retry-requirements) below), the client MUST add the + previously used server's address to the list of deprioritized server addresses for [server selection](../server-selection/server-selection.md). -9. If the request is eligible for retry (as outlined in step 5) and is a retryable write: +5. If the request is eligible for retry (as outlined in step 2 above and step 4 in the + [adaptive retry requirements](client-backpressure.md#adaptive-retry-requirements) below) and is a retryable write: 1. If the command is a part of a transaction, the instructions for command modification on retry for commands in transactions MUST be followed, as outlined in the [transactions](../transactions/transactions.md#interaction-with-retryable-writes) specification. 2. If the command is a not a part of a transaction, the instructions for command modification on retry for retryable writes MUST be followed, as outlined in the [retryable writes](../retryable-writes/retryable-writes.md) specification. -10. If the request is not eligible for any retries, then the client MUST propagate errors following the behaviors +6. If the request is not eligible for any retries, then the client MUST propagate errors following the behaviors described in the [retryable reads](../retryable-reads/retryable-reads.md), - [retryable writes](../retryable-writes/retryable-writes.md) and the - [transactions](../transactions/transactions.md) specifications. + [retryable writes](../retryable-writes/retryable-writes.md) and the [transactions](../transactions/transactions.md) + specifications. - For the purposes of error propagation, `runCommand` is considered a write. 
+##### Adaptive retry requirements + +If adaptive retries are enabled, the following rules MUST also be obeyed: + +1. If the command succeeds on the first attempt, drivers MUST deposit `RETRY_TOKEN_RETURN_RATE` tokens. + - The value is 0.1 and non-configurable. +2. If the command succeeds on a retry attempt, drivers MUST deposit `RETRY_TOKEN_RETURN_RATE`+1 tokens. +3. If a retry attempt fails with an error that is not an overload error, drivers MUST deposit 1 token. + - An error that does not contain the `SystemOverloadedError` error label indicates that the server is healthy enough + to handle requests. For the purposes of retry budget tracking, this counts as a success. +4. A retry attempt will only be permitted if a token can be consumed from the token bucket. +5. A retry attempt consumes 1 token from the token bucket. + #### Interaction with Other Retry Policies The retry policy in this specification is separate from the other retry policies defined in the @@ -164,7 +173,7 @@ The retry policy in this specification is separate from the other retry policies specifications. Drivers MUST ensure: - Only overload errors consume tokens from the token bucket before retrying. -- When a failed attempt is retried, backoff must be applied if and only if the error is an overload error. +- When a failed attempt is retried, backoff MUST be applied if and only if the error is an overload error. - If an overload error is encountered: - Regardless of whether CSOT is enabled or not, the maximum number of retries for any retry policy becomes `MAX_RETRIES`. @@ -196,11 +205,12 @@ def execute_command_retryable(command, ...): server = select_server(deprioritized_servers) connection = server.getConnection() res = execute_command(connection, command) - # Deposit tokens into the bucket on success. - tokens = RETRY_TOKEN_RETURN_RATE - if attempt > 0: - tokens += 1 - token_bucket.deposit(tokens) + if adaptive_retry: + # Deposit tokens into the bucket on success. 
+ tokens = RETRY_TOKEN_RETURN_RATE + if attempt > 0: + tokens += 1 + token_bucket.deposit(tokens) return res except PyMongoError as exc: is_retryable = (is_retryable_write(command, exc) @@ -209,7 +219,7 @@ def execute_command_retryable(command, ...): is_overload = exc.contains_error_label("SystemOverloadedError") # if a retry fails with an error which is not an overload error, deposit 1 token - if attempt > 0 and not is_overload: + if adaptive_retry and attempt > 0 and not is_overload: token_bucket.deposit(1) # Raise if the error is non-retryable. @@ -234,7 +244,7 @@ def execute_command_retryable(command, ...): if time.monotonic() + backoff > _csot.get_deadline(): raise - if not token_bucket.consume(1): + if adaptive_retry and not token_bucket.consume(1): raise time.sleep(backoff) @@ -242,16 +252,19 @@ def execute_command_retryable(command, ...): ### Token Bucket -The overload retry policy introduces a per-client [token bucket](https://en.wikipedia.org/wiki/Token_bucket) to limit -overload error retry attempts. Although the server rejects excess commands as quickly as possible, doing so costs CPU -and creates extra contention on the connection pool which can eventually negatively affect goodput. To reduce this risk, -the token bucket will limit retry attempts during a prolonged overload. +The overload retry policy introduces an opt-in per-client [token bucket](https://en.wikipedia.org/wiki/Token_bucket) to +limit overload error retry attempts. Although the server rejects excess commands as quickly as possible, doing so costs +CPU and creates extra contention on the connection pool which can eventually negatively affect goodput. To reduce this +risk, the token bucket will limit retry attempts during a prolonged overload. + +The token bucket MUST be disabled by default and can be enabled through the +[adaptiveRetries=True](../uri-options/uri-options.md) connection and client options. 
The token bucket starts at its maximum capacity of 1000 for consistency with the server. -Each MongoClient instance MUST have its own token bucket. The token bucket MUST be created when the MongoClient is -initialized and exist for the lifetime of the MongoClient. Drivers MUST ensure the token bucket implementation is -thread-safe as it may be accessed concurrently by multiple operations. +Each MongoClient instance MUST have its own token bucket. When adaptive retries are enabled, the token bucket MUST be +created when the MongoClient is initialized and exist for the lifetime of the MongoClient. Drivers MUST ensure the token +bucket implementation is thread-safe as it may be accessed concurrently by multiple operations. #### Pseudocode @@ -449,4 +462,6 @@ retrying a write command when only `retryReads` is enabled. ## Changelog +- 2026-02-20: Disable token buckets by default. + - 2026-01-09: Initial version. diff --git a/source/client-backpressure/tests/README.md b/source/client-backpressure/tests/README.md index 22efe29b20..5daf4348cb 100644 --- a/source/client-backpressure/tests/README.md +++ b/source/client-backpressure/tests/README.md @@ -62,9 +62,67 @@ Drivers should test that retries do not occur immediately when a SystemOverloade Drivers should test that retry token buckets are created at their maximum capacity and that that capacity is enforced. -1. Let `client` be a `MongoClient`. +1. Let `client` be a `MongoClient` with `adaptiveRetries=True`. 2. Assert that the client's retry token bucket is at full capacity and that the capacity is `DEFAULT_RETRY_TOKEN_CAPACITY`. 3. Using `client`, execute a successful `ping` command. 4. Assert that the successful command did not increase the number of tokens in the bucket above `DEFAULT_RETRY_TOKEN_CAPACITY`. + +#### Test 3: Overload Errors are Retried a Maximum of MAX_RETRIES times + +Drivers should test that without adaptive retries enabled, overload errors are retried a maximum of five times. + +1. 
Let `client` be a `MongoClient` with command event monitoring enabled. + +2. Let `coll` be a collection. + +3. Configure the following failpoint: + + ```javascript + { + configureFailPoint: 'failCommand', + mode: 'alwaysOn', + data: { + failCommands: ['find'], + errorCode: 462, // IngressRequestRateLimitExceeded + errorLabels: ['SystemOverloadedError', 'RetryableError'] + } + } + ``` + +4. Perform a find operation with `coll` that fails. + +5. Assert that the raised error contains both the `RetryableError` and `SystemOverloadedError` error labels. + +6. Assert that the total number of started commands is MAX_RETRIES + 1 (6). + +#### Test 4: Adaptive Retries are Limited by Token Bucket Tokens + +Drivers should test that when enabled, adaptive retries are limited by the number of tokens in the bucket. + +1. Let `client` be a `MongoClient` with `adaptiveRetries=True` and command event monitoring enabled. + +2. Set `client`'s retry token bucket to have 2 tokens. + +3. Let `coll` be a collection. + +4. Configure the following failpoint: + + ```javascript + { + configureFailPoint: 'failCommand', + mode: {times: 3}, + data: { + failCommands: ['find'], + errorCode: 462, // IngressRequestRateLimitExceeded + errorLabels: ['SystemOverloadedError', 'RetryableError'] + } + } + ``` + +5. Perform a find operation with `coll` that fails. + +6. Assert that the raised error contains both the `RetryableError` and `SystemOverloadedError` error labels. + +7. Assert that the total number of started commands is 3: one for the initial attempt and two for the retries. 
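Reviewer note: the adaptive retry rules and prose tests 3 and 4 in the hunks above can be sketched as follows. This is a minimal illustration under stated assumptions, not the spec's pseudocode or any driver's implementation. `OverloadError`, `TokenBucket`, `backoff_ms`, and `run_with_retries` are hypothetical names, a `bucket` of `None` stands in for `adaptiveRetries=false`, and the "deposit 1 token on a non-overload failure" rule is omitted for brevity.

```python
import random
import threading

# Constants taken from the spec text in this diff.
MAX_RETRIES = 5
BASE_BACKOFF_MS = 100
MAX_BACKOFF_MS = 10_000
RETRY_TOKEN_RETURN_RATE = 0.1
DEFAULT_RETRY_TOKEN_CAPACITY = 1000


class OverloadError(Exception):
    """Hypothetical stand-in for an error carrying the
    SystemOverloadedError and RetryableError labels."""


class TokenBucket:
    """Hypothetical thread-safe bucket that starts at full capacity."""

    def __init__(self, capacity=DEFAULT_RETRY_TOKEN_CAPACITY):
        self._lock = threading.Lock()
        self.capacity = capacity
        self.tokens = float(capacity)

    def deposit(self, n):
        # Deposits never push the bucket above its capacity.
        with self._lock:
            self.tokens = min(self.capacity, self.tokens + n)

    def consume(self, n):
        # Returns False, leaving the bucket unchanged, if too few tokens remain.
        with self._lock:
            if self.tokens < n:
                return False
            self.tokens -= n
            return True


def backoff_ms(attempt, jitter=None):
    """backoff = jitter * min(MAX_BACKOFF, BASE_BACKOFF * 2^(attempt - 1))."""
    if jitter is None:
        jitter = random.random()
    return jitter * min(MAX_BACKOFF_MS, BASE_BACKOFF_MS * 2 ** (attempt - 1))


def run_with_retries(operation, bucket=None, sleep=lambda ms: None):
    """Retry `operation` on OverloadError; `bucket=None` means adaptive
    retries are disabled. `sleep` is injectable so tests do not wait."""
    attempt = 0
    while True:
        try:
            result = operation()
        except OverloadError:
            attempt += 1
            if attempt > MAX_RETRIES:
                raise
            # With adaptive retries, each retry must consume one token.
            if bucket is not None and not bucket.consume(1):
                raise
            sleep(backoff_ms(attempt))
            continue
        if bucket is not None:
            # Success deposits the return rate, plus 1 if this was a retry.
            bucket.deposit(RETRY_TOKEN_RETURN_RATE + (1 if attempt > 0 else 0))
        return result
```

With a bucket holding 2 tokens and an operation that always raises `OverloadError`, the sketch starts 3 attempts, which is the count test 4 expects. With no bucket it starts `MAX_RETRIES + 1` attempts, which is the count test 3 expects.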
diff --git a/source/uri-options/tests/client-backpressure-options.json b/source/uri-options/tests/client-backpressure-options.json new file mode 100644 index 0000000000..3fcf2c86b0 --- /dev/null +++ b/source/uri-options/tests/client-backpressure-options.json @@ -0,0 +1,35 @@ +{ + "tests": [ + { + "description": "adaptiveRetries=true is parsed correctly", + "uri": "mongodb://example.com/?adaptiveRetries=true", + "valid": true, + "warning": false, + "hosts": null, + "auth": null, + "options": { + "adaptiveRetries": true + } + }, + { + "description": "adaptiveRetries=false is parsed correctly", + "uri": "mongodb://example.com/?adaptiveRetries=false", + "valid": true, + "warning": false, + "hosts": null, + "auth": null, + "options": { + "adaptiveRetries": false + } + }, + { + "description": "adaptiveRetries with invalid value causes a warning", + "uri": "mongodb://example.com/?adaptiveRetries=invalid", + "valid": true, + "warning": true, + "hosts": null, + "auth": null, + "options": null + } + ] +} diff --git a/source/uri-options/tests/client-backpressure-options.yml b/source/uri-options/tests/client-backpressure-options.yml new file mode 100644 index 0000000000..534261205f --- /dev/null +++ b/source/uri-options/tests/client-backpressure-options.yml @@ -0,0 +1,27 @@ +tests: + - + description: "adaptiveRetries=true is parsed correctly" + uri: "mongodb://example.com/?adaptiveRetries=true" + valid: true + warning: false + hosts: ~ + auth: ~ + options: + adaptiveRetries: true + - + description: "adaptiveRetries=false is parsed correctly" + uri: "mongodb://example.com/?adaptiveRetries=false" + valid: true + warning: false + hosts: ~ + auth: ~ + options: + adaptiveRetries: false + - + description: "adaptiveRetries with invalid value causes a warning" + uri: "mongodb://example.com/?adaptiveRetries=invalid" + valid: true + warning: true + hosts: ~ + auth: ~ + options: ~ diff --git a/source/uri-options/uri-options.md b/source/uri-options/uri-options.md index 
b5b9903b79..cd9e07184b 100644 --- a/source/uri-options/uri-options.md +++ b/source/uri-options/uri-options.md @@ -72,6 +72,7 @@ to URI options apply here. | Name | Accepted Values | Default Value | Optional to implement? | Description | | ------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| adaptiveRetries | "true" or "false" | "false" | no | Whether to enable adaptive retries as described by the [client backpressure spec](../client-backpressure/client-backpressure.md#token-bucket) | | appname | any string that meets the criteria listed in the [handshake spec](../mongodb-handshake/handshake.md#client-application-name) | no appname specified | no | Passed into the server in the client metadata as part of the connection handshake | | authMechanism | any string; valid values are defined in the [auth spec](../auth/auth.md#supported-authentication-methods) | None; default values for authentication exist for constructing authentication credentials per the [auth spec](../auth/auth.md#supported-authentication-methods), but there is no default for the URI option itself. 
| no | The authentication mechanism method to use for connection to the server | | authMechanismProperties | comma separated key:value pairs, e.g. "opt1:val1,opt2:val2" | no properties specified | no | Additional options provided for authentication (e.g. to enable hostname canonicalization for GSSAPI) |