DRIVERS-3427 - Finalize client backpressure implementation for phase 1 rollout #1919
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -122,7 +122,8 @@ rules: | |
| 2. A retry attempt will only be permitted if: | ||
| 1. The error is a retryable overload error. | ||
| 2. We have not reached `MAX_RETRIES`. | ||
| - The value of `MAX_RETRIES` is 5 and non-configurable. | ||
| - The default value of `MAX_RETRIES` is 2. Drivers MUST expose `maxAdaptiveRetries` as a configurable option for | ||
| this maximum. | ||
| - This intentionally changes the behavior of CSOT which otherwise would retry an unlimited number of times within | ||
| the timeout to avoid retry storms. | ||
| 3. (CSOT-only): There is still time for a retry attempt according to the | ||
|
|
@@ -133,20 +134,17 @@ rules: | |
| - To retry `runCommand`, both [retryWrites](../retryable-writes/retryable-writes.md#retrywrites) and | ||
| [retryReads](../retryable-reads/retryable-reads.md#retryreads) MUST be enabled. See | ||
| [Why must both `retryWrites` and `retryReads` be enabled to retry runCommand?](client-backpressure.md#why-must-both-retrywrites-and-retryreads-be-enabled-to-retry-runcommand) | ||
| 3. If the request is eligible for retry (as outlined in step 2 above and step 4 in the | ||
| [adaptive retry requirements](client-backpressure.md#adaptive-retry-requirements) below), the client MUST apply | ||
| exponential backoff according to the following formula: | ||
| `backoff = jitter * min(MAX_BACKOFF, BASE_BACKOFF * 2^(attempt - 1))` | ||
| 3. If the request is eligible for retry (as outlined in step 2 above), the client MUST apply exponential backoff | ||
| according to the following formula: `backoff = jitter * min(MAX_BACKOFF, BASE_BACKOFF * 2^(attempt - 1))` | ||
| - `jitter` is a random jitter value between 0 and 1. | ||
| - `BASE_BACKOFF` is constant 100ms. | ||
| - `MAX_BACKOFF` is 10000ms. | ||
| - This results in delays of 100ms, 200ms, 400ms, 800ms, and 1600ms before accounting for jitter. | ||
| 4. If the request is eligible for retry (as outlined in step 2 above and step 4 in the | ||
| [adaptive retry requirements](client-backpressure.md#adaptive-retry-requirements) below), the client MUST add the | ||
| previously used server's address to the list of deprioritized server addresses for | ||
| [server selection](../server-selection/server-selection.md). | ||
| 5. If the request is eligible for retry (as outlined in step 2 above and step 4 in the | ||
| [adaptive retry requirements](client-backpressure.md#adaptive-retry-requirements) below) and is a retryable write: | ||
| - This results in delays of 100ms and 200ms before accounting for jitter. | ||
| 4. If the request is eligible for retry (as outlined in step 2 above) and `enableOverloadRetargeting` is enabled, the | ||
| client MUST add the previously used server's address to the list of deprioritized server addresses for | ||
| [server selection](../server-selection/server-selection.md). Drivers MUST expose `enableOverloadRetargeting` as a | ||
| configurable boolean option that defaults to `false`. | ||
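The backoff formula above can be sketched as a small helper. This is a sketch, not driver code: the constant names follow the spec's pseudocode, and `backoff_delay` is a hypothetical helper name.

```python
import random

BASE_BACKOFF = 0.1  # 100ms, per the spec constants
MAX_BACKOFF = 10.0  # 10000ms

def backoff_delay(attempt: int) -> float:
    """Backoff before retry `attempt` (1-based), per
    jitter * min(MAX_BACKOFF, BASE_BACKOFF * 2^(attempt - 1))."""
    jitter = random.random()  # uniform in [0.0, 1.0)
    return jitter * min(MAX_BACKOFF, BASE_BACKOFF * 2 ** (attempt - 1))

# With the default of 2 retries, the pre-jitter delays are 100ms and 200ms.
```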
|
Comment on lines +143 to +146
**Member:** Given the no knobs mantra, we need a very good reason to add `enableOverloadRetargeting`. The specification gives no justification behind the knob beyond: a) … b) …

The justification does not seem correct to me, and I don't think we should add the option.

a) The alleged interpretation of … Any application that uses …

b) The b) part is even weaker: …
**Contributor (Author):** The current contract we've made with users utilizing `primaryPreferred` is that reads will only go to a secondary if the primary is unavailable. Overload retargeting significantly increases availability during overload, but it does increase the risk of getting stale data when used with `primaryPreferred`.

The larger discussion here is what exactly … means.
**Contributor:** Can you add to the Q&A?
**Member:** Where did you find this "only if" guarantee? The documentation does not give such a guarantee, and has a different one, which I quoted in my previous comment. Furthermore, given that "unavailable" is not defined (nor should it be), we are free to decide and change what "unavailable" is however we want - it won't break the guarantees of …

Yes, that is true. Moreover, network-related issues, or the server becoming unresponsive due to being overloaded (this has been happening, and may continue to happen, but hopefully rarely), have the same effect. All of this is part of the …

From a comment in a different thread:

> …

I think that is incorrect, and the documentation seems to be explicit about that (see below). If an application needs a guaranteed non-stale and durable read, then it may use …

If you think that their current documented meaning is wrong, then figuring this out is definitely within the scope, as the justification for proposing the …

Let's put the above aside, for a moment, and consider what the option does. This option is equivalent to introducing a new read preference field, controlling how strongly the primary/secondary is to be preferred. I don't think that the usefulness of such a change has been demonstrated, and think that user requests are a necessity for considering it to be introduced. However, at least from the API standpoint, such an approach seems sound.

TL;DR: …

**Member:** Sorry, I forgot to say that similarly to the other proposed option, I discussed this one with some people from the Java driver team. They also think that …
||
| 5. If the request is eligible for retry (as outlined in step 2 above) and is a retryable write: | ||
| 1. If the command is a part of a transaction, the instructions for command modification on retry for commands in | ||
| transactions MUST be followed, as outlined in the | ||
| [transactions](../transactions/transactions.md#interaction-with-retryable-writes) specification. | ||
|
|
@@ -159,26 +157,12 @@ rules: | |
| specifications. | ||
| - For the purposes of error propagation, `runCommand` is considered a write. | ||
|
|
||
| ##### Adaptive retry requirements | ||
|
|
||
| If adaptive retries are enabled, the following rules MUST also be obeyed: | ||
|
|
||
| 1. If the command succeeds on the first attempt, drivers MUST deposit `RETRY_TOKEN_RETURN_RATE` tokens. | ||
| - The value is 0.1 and non-configurable. | ||
| 2. If the command succeeds on a retry attempt, drivers MUST deposit `RETRY_TOKEN_RETURN_RATE`+1 tokens. | ||
| 3. If a retry attempt fails with an error that is not an overload error, drivers MUST deposit 1 token. | ||
| - An error that does not contain the `SystemOverloadedError` error label indicates that the server is healthy enough | ||
| to handle requests. For the purposes of retry budget tracking, this counts as a success. | ||
| 4. A retry attempt will only be permitted if a token can be consumed from the token bucket. | ||
| 5. A retry attempt consumes 1 token from the token bucket. | ||
|
|
||
| #### Interaction with Other Retry Policies | ||
|
|
||
| The retry policy in this specification is separate from the other retry policies defined in the | ||
| [retryable reads](../retryable-reads/retryable-reads.md) and [retryable writes](../retryable-writes/retryable-writes.md) | ||
| specifications. Drivers MUST ensure: | ||
|
|
||
| - Only overload errors consume tokens from the token bucket before retrying. | ||
| - When a failed attempt is retried, backoff MUST be applied if and only if the error is an overload error. | ||
| - If an overload error is encountered: | ||
| - Regardless of whether CSOT is enabled or not, the maximum number of retries for any retry policy becomes | ||
|
|
@@ -198,8 +182,7 @@ included, such as error handling with `NoWritesPerformed` labels. | |
| BASE_BACKOFF = 0.1 # 100ms | ||
| MAX_BACKOFF = 10 # 10000ms | ||
|
|
||
| RETRY_TOKEN_RETURN_RATE = 0.1 | ||
| MAX_RETRIES = 5 | ||
| MAX_RETRIES = 2 | ||
|
|
||
| def execute_command_retryable(command, ...): | ||
| deprioritized_servers = [] | ||
|
|
@@ -211,23 +194,13 @@ def execute_command_retryable(command, ...): | |
| server = select_server(deprioritized_servers) | ||
| connection = server.getConnection() | ||
| res = execute_command(connection, command) | ||
| if adaptive_retry: | ||
| # Deposit tokens into the bucket on success. | ||
| tokens = RETRY_TOKEN_RETURN_RATE | ||
| if attempt > 0: | ||
| tokens += 1 | ||
| token_bucket.deposit(tokens) | ||
| return res | ||
| except PyMongoError as exc: | ||
| is_retryable = (is_retryable_write(command, exc) | ||
| or is_retryable_read(command, exc) | ||
| or (exc.contains_error_label("RetryableError") and exc.contains_error_label("SystemOverloadedError"))) | ||
| is_overload = exc.contains_error_label("SystemOverloadedError") | ||
|
|
||
| # if a retry fails with an error which is not an overload error, deposit 1 token | ||
| if adaptive_retry and attempt > 0 and not is_overload: | ||
| token_bucket.deposit(1) | ||
|
|
||
| # Raise if the error is non-retryable. | ||
| if not is_retryable: | ||
| raise | ||
|
|
@@ -238,8 +211,10 @@ def execute_command_retryable(command, ...): | |
|
|
||
| if attempt > allowed_retries: | ||
| raise | ||
|
|
||
| deprioritized_servers.append(server.address) | ||
|
|
||
| # enableOverloadRetargeting is true | ||
| if overload_retargeting: | ||
| deprioritized_servers.append(server.address) | ||
|
|
||
| if is_overload: | ||
| jitter = random.random() # Random float between [0.0, 1.0). | ||
|
|
@@ -250,59 +225,9 @@ def execute_command_retryable(command, ...): | |
| if time.monotonic() + backoff > _csot.get_deadline(): | ||
| raise | ||
|
|
||
| if adaptive_retry and not token_bucket.consume(1): | ||
| raise | ||
|
|
||
| time.sleep(backoff) | ||
| ``` | ||
|
|
||
| ### Token Bucket | ||
|
|
||
| The overload retry policy introduces an opt-in per-client [token bucket](https://en.wikipedia.org/wiki/Token_bucket) to | ||
| limit overload error retry attempts. Although the server rejects excess commands as quickly as possible, doing so costs | ||
| CPU and creates extra contention on the connection pool which can eventually negatively affect goodput. To reduce this | ||
| risk, the token bucket will limit retry attempts during a prolonged overload. | ||
|
|
||
| The token bucket MUST be disabled by default and can be enabled through the | ||
| [adaptiveRetries=True](../uri-options/uri-options.md) connection and client options. | ||
|
|
||
| The token bucket starts at its maximum capacity of 1000 for consistency with the server. | ||
|
|
||
| Each MongoClient instance MUST have its own token bucket. When adaptive retries are enabled, the token bucket MUST be | ||
| created when the MongoClient is initialized and exist for the lifetime of the MongoClient. Drivers MUST ensure the token | ||
| bucket implementation is thread-safe as it may be accessed concurrently by multiple operations. | ||
|
|
||
| #### Pseudocode | ||
|
|
||
| The token bucket is implemented via a thread safe counter. For languages without atomics, this can be implemented via a | ||
| lock, for example: | ||
|
|
||
| ```python | ||
| DEFAULT_RETRY_TOKEN_CAPACITY = 1000 | ||
| class TokenBucket: | ||
| """A token bucket implementation for rate limiting.""" | ||
| def __init__( | ||
| self, | ||
| capacity: float = DEFAULT_RETRY_TOKEN_CAPACITY, | ||
| ): | ||
| self.lock = Lock() | ||
| self.capacity = capacity | ||
| self.tokens = capacity | ||
|
|
||
| def consume(self, n: float) -> bool: | ||
| """Consume n tokens from the bucket if available.""" | ||
| with self.lock: | ||
| if self.tokens >= n: | ||
| self.tokens -= n | ||
| return True | ||
| return False | ||
|
|
||
| def deposit(self, n: float) -> None: | ||
| """Deposit n tokens back into the bucket.""" | ||
| with self.lock: | ||
| self.tokens = min(self.capacity, self.tokens + n) | ||
| ``` | ||
|
|
||
| #### Handshake changes | ||
|
|
||
| Drivers conforming to this spec MUST add `"backpressure": True` to the | ||
|
|
@@ -466,8 +391,43 @@ Additionally, both `retryReads` and `retryWrites` are enabled by default, so for | |
| retried. This approach also prevents accidentally retrying a read command when only `retryWrites` is enabled, or | ||
| retrying a write command when only `retryReads` is enabled. | ||
|
|
||
| ### Why make `maxAdaptiveRetries` configurable? | ||
|
|
||
| Modelling and the underpinning theory for backpressure show that the n-retries approach (retry up to N times on | ||
| overload errors without a token bucket) can introduce retry storms as overload increases. However, the specifics of the | ||
| workload and the cluster serving that workload significantly impact the threshold at which retry volume becomes an | ||
| additional burden rather than a throughput improvement. Some applications and clusters may be very tolerant of many | ||
| additional retries, while others may want to break out of the loop much earlier. | ||
|
|
||
| The selection of 2 as a default attempts to broadly pick a sensible default for most users that will on average be a | ||
| benefit rather than a negative during overload. However, savvy users, the users expected to be most affected by overload | ||
| and have the most insight into the specifics of their workload and cluster, will likely find that tweaking this value on | ||
| a per-workload basis produces better results. Additionally, there are situations where disabling overload retries | ||
| entirely is optimal. Without a knob, those situations will cause users to either have a strictly worse experience with a | ||
| new driver, or force them to downgrade to an older driver to avoid the issue. These are two strong motivations to add a | ||
| knob for `maxAdaptiveRetries`. | ||
|
|
||
| ### Why make `enableOverloadRetargeting` configurable? | ||
|
|
||
| The current contract we've made with users utilizing `primaryPreferred` is that reads will only go to a secondary if the | ||
| primary is unavailable. The documentation does not explicitly define unavailable, but in practice that means the primary | ||
| is unselectable. Overload retargeting makes the primary unselectable for a retry operation if it returned an overload | ||
| error on a previous attempt. This materially changes how often secondary reads occur. Since secondary reads can result | ||
| in stale data, enabling overload retargeting increases the chance that users of `primaryPreferred` will get stale data | ||
| when they did not previously. This is a semantic change, and so retargeting is disabled by default, with a knob to | ||
| enable it. | ||
|
|
||
| Overload retargeting significantly increases availability during overload, but it does increase the risk of getting | ||
| stale data when used with `primaryPreferred`. Users of `primaryPreferred` may widely end up preferring that behavior. If | ||
| that is the case, overload retargeting may be enabled by default in the future. | ||
|
|
||
| `secondaryPreferred` does not have this same staleness issue, but it still materially changes what the preference means | ||
| from "almost always secondary" to "sometimes primary". | ||
|
|
||
| ## Changelog | ||
|
|
||
| - 2026-03-30: Introduce phase 1 support without token buckets. | ||
|
|
||
| - 2026-02-20: Disable token buckets by default. | ||
|
|
||
| - 2026-01-09: Initial version. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -14,7 +14,8 @@ be manually implemented by each driver. | |
|
|
||
| #### Test 1: Operation Retry Uses Exponential Backoff | ||
|
|
||
| Drivers should test that retries do not occur immediately when a SystemOverloadedError is encountered. | ||
| Drivers should test that retries do not occur immediately when a SystemOverloadedError is encountered. This test MUST be | ||
| executed against a MongoDB 4.4+ server that has enabled the `configureFailPoint` command with the `errorLabels` option. | ||
|
|
||
| 1. Let `client` be a `MongoClient` | ||
| 2. Let `collection` be a collection | ||
|
|
@@ -49,29 +50,19 @@ Drivers should test that retries do not occur immediately when a SystemOverloade | |
|
|
||
| 5. Execute step 3 again. | ||
|
|
||
| 6. Compare the two time between the two runs. | ||
| 6. Compare the time between the two runs. | ||
|
|
||
| ```python | ||
| assertTrue(with_backoff_time - no_backoff_time >= 2.1) | ||
| assertTrue(absolute_value(with_backoff_time - (no_backoff_time + 0.3 seconds)) < 0.3 seconds) | ||
| ``` | ||
|
|
||
| The sum of 5 backoffs is 3.1 seconds. There is a 1-second window to account for potential variance between the two | ||
| runs. | ||
|
|
||
| #### Test 2: Token Bucket Capacity is Enforced | ||
|
|
||
| Drivers should test that retry token buckets are created at their maximum capacity and that that capacity is enforced. | ||
|
|
||
| 1. Let `client` be a `MongoClient` with `adaptiveRetries=True`. | ||
| 2. Assert that the client's retry token bucket is at full capacity and that the capacity is | ||
| `DEFAULT_RETRY_TOKEN_CAPACITY`. | ||
| 3. Using `client`, execute a successful `ping` command. | ||
| 4. Assert that the successful command did not increase the number of tokens in the bucket above | ||
| `DEFAULT_RETRY_TOKEN_CAPACITY`. | ||
| The sum of 2 backoffs is 0.3 seconds. There is a 0.3-second window to account for potential variance between the | ||
| two runs. | ||
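The window assertion in step 6 can be sketched as follows. `backoff_applied` is a hypothetical helper; `no_backoff_time` and `with_backoff_time` are assumed to be the durations measured in steps 3 and 5.

```python
EXPECTED_TOTAL_BACKOFF = 0.1 + 0.2  # sum of the two pre-jitter backoffs, in seconds
WINDOW = 0.3                        # allowance for jitter and run-to-run variance

def backoff_applied(no_backoff_time: float, with_backoff_time: float) -> bool:
    """True when the backoff run is ~0.3s slower than the baseline run."""
    return abs(with_backoff_time - (no_backoff_time + EXPECTED_TOTAL_BACKOFF)) < WINDOW
```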
|
Comment on lines +59 to +60
**Contributor:** Following the logic used in the previous version of this test, wouldn't this make the comparison value in the assertion 0? (i.e. backoff time minus buffer window)
**Contributor (Author):** Good catch, correct.
**Contributor:** I'm concerned that this test will be flaky because of the small time values. WDYT about restoring the original logic and just setting …
**Contributor (Author):** We could add another test to explicitly verify proper backoff calculation that configures …
**Contributor:** Discussed over Slack; we decided to update the prose test to assert on a window of values rather than a lower bound.
||
|
|
||
| #### Test 3: Overload Errors are Retried a Maximum of MAX_RETRIES times | ||
|
|
||
| Drivers should test that without adaptive retries enabled, overload errors are retried a maximum of five times. | ||
| Drivers should test that overload errors are retried a maximum of MAX_RETRIES times. This test MUST be executed against | ||
| a MongoDB 4.4+ server that has enabled the `configureFailPoint` command with the `errorLabels` option. | ||
|
|
||
| 1. Let `client` be a `MongoClient` with command event monitoring enabled. | ||
|
|
||
|
|
@@ -95,24 +86,24 @@ Drivers should test that without adaptive retries enabled, overload errors are r | |
|
|
||
| 5. Assert that the raised error contains both the `RetryableError` and `SystemOverloadedError` error labels. | ||
|
|
||
| 6. Assert that the total number of started commands is MAX_RETRIES + 1 (6). | ||
| 6. Assert that the total number of started commands is MAX_RETRIES + 1 (3). | ||
|
|
||
| #### Test 4: Adaptive Retries are Limited by Token Bucket Tokens | ||
| #### Test 4: Overload Errors are Retried a Maximum of maxAdaptiveRetries times when configured | ||
isabelatkinson marked this conversation as resolved.
|
||
|
|
||
| Drivers should test that when enabled, adaptive retries are limited by the number of tokens in the bucket. | ||
| Drivers should test that overload errors are retried a maximum of `maxAdaptiveRetries` times, when configured. This test | ||
| MUST be executed against a MongoDB 4.4+ server that has enabled the `configureFailPoint` command with the `errorLabels` | ||
| option. | ||
|
|
||
| 1. Let `client` be a `MongoClient` with `adaptiveRetries=True` and command event monitoring enabled. | ||
| 1. Let `client` be a `MongoClient` with `maxAdaptiveRetries=1` and command event monitoring enabled. | ||
|
|
||
| 2. Set `client`'s retry token bucket to have 2 tokens. | ||
|
|
||
| 3. Let `coll` be a collection. | ||
| 2. Let `coll` be a collection. | ||
|
|
||
| 4. Configure the following failpoint: | ||
| 3. Configure the following failpoint: | ||
|
|
||
| ```javascript | ||
| { | ||
| configureFailPoint: 'failCommand', | ||
| mode: {times: 3}, | ||
| mode: 'alwaysOn', | ||
| data: { | ||
| failCommands: ['find'], | ||
| errorCode: 462, // IngressRequestRateLimitExceeded | ||
|
|
@@ -121,8 +112,8 @@ Drivers should test that when enabled, adaptive retries are limited by the numbe | |
| } | ||
| ``` | ||
|
|
||
| 5. Perform a find operation with `coll` that fails. | ||
| 4. Perform a find operation with `coll` that fails. | ||
|
|
||
| 6. Assert that the raised error contains both the `RetryableError` and `SystemOverloadedError` error labels. | ||
| 5. Assert that the raised error contains both the `RetryableError` and `SystemOverloadedError` error labels. | ||
|
|
||
| 7. Assert that the total number of started commands is 3: one for the initial attempt and two for the retries. | ||
| 6. Assert that the total number of started commands is `maxAdaptiveRetries` + 1 (2). | ||
**Member:** Given the no knobs mantra, we need a very good reason to add `maxAdaptiveRetries`. The specification gives no justification behind the `maxAdaptiveRetries` knob and its default value. I think the justification should be added. The only justifications I found are in "Decision: Driver Backpressure Retry Behavior Defaults" [1]. If the document has other justifications and I missed them - I apologize, I haven't read it fully. If there are other justifications, and you know of them, could you please state them explicitly?
**Contributor (Author):** We know from our modelling and from the underpinning theory that the n-retries approach (retry up to N times on overload errors, no token bucket) can introduce retry storms as overload increases. However, the specifics of the workload and the cluster serving that workload significantly impact the threshold at which retry volume becomes an additional burden rather than a throughput improvement. Some applications and clusters may be very tolerant of many additional retries, while others may want to break out of the loop much earlier.

The selection of 2 as a default attempts to broadly pick a sensible default for most users that will on average be a benefit rather than a negative during overload. However, savvy users, the users we expect to be most affected by overload and have the most insight into the specifics of their workload and cluster, will likely find that tweaking this value on a per-workload basis produces better results. Additionally, there are situations where disabling overload retries entirely is optimal. Without a knob, those situations will cause users to either have a strictly worse experience with a new driver, or force them to downgrade to an older driver to avoid the issue. These are two strong motivations to add a knob for `maxAdaptiveRetries`.

**Contributor:** Let's not lose these motivations in the PR. Can you add to the Q&A?
**Member:** You explicitly say that the proposed option is for "savvy users", who you separate from "most users". We discussed this proposed API change within the Java driver team, and this seems as clear of a case for applying the aforementioned no knobs mantra as there can be - it explicitly instructs not to add a knob if a typical user would have no idea how to choose a correct value. If the mantra is not applicable in this case, then in what case could it possibly be applicable?

Granted, some options may be needed even if their usage requires a user to be advanced, like `readConcernLevel`, because the functionality they expose is inherently advanced. But the proposed option is not one of those, as it is about performance tuning.

Could you please elaborate on what those situations are? Please keep in mind that an application can't change its decision about retries caused by retryable overload errors at runtime. An application will have to either choose to always have them disabled, or always have them enabled. The benefit has to be quite significant to justify us introducing a performance knob. But even then, I would argue that the decision about such a performance knob should be based on real data and/or explicit user needs.

TL;DR: The `maxAdaptiveRetries` option clearly violates the no knobs mantra.
**Contributor (Author):** Because savvy (i.e. power) users are likely doing more complex and heavy workloads with more load, and are therefore at a higher risk of causing overload on their servers.

Both of the proposed knobs are easy to choose correct values for once a user observes the behavior with the defaults.

Neither `maxAdaptiveRetries` nor `enableOverloadRetargeting` is purely for performance tuning. `maxAdaptiveRetries` is really configuring the length of the retry window during periods of overload. This window size changes what a given workload produces during overload. Our simulations show that higher maximum retry counts reduce error rates at the cost of higher latency and additional server load. The specifics of a user's application dictate where the threshold for cutting off retries is. Any default value we choose at this point will produce poor results for some users: without additional data gathered after the release of this feature, we are guessing.

`enableOverloadRetargeting` is similarly both a behavioral and performance configuration. For example, take a three-node replica set with a primary and two secondaries where all operations are performed with `primaryPreferred`. When overload occurs on the primary without overload retargeting enabled, all retries will go to the primary. There is no chance of stale data, but error rates will be higher, and the retries will cause additional load on the primary.

When overload occurs on the primary with overload retargeting enabled, retries will go to a secondary. This increases the risk of stale data, but improves the capacity of the system significantly while also allowing the primary to recover from overload more quickly. However, this is meaningfully different from the current behavior of `primaryPreferred`, so enabling it by default risks breaking user applications.

One situation where users may want to disable overload retries entirely is a non-critical workload that runs against a consistently overloaded cluster also used by critical workloads. Overload retries on the non-critical workload would worsen the performance of the critical workloads and lengthen the periods of overload, so disabling them is likely optimal.

There is a separate project following this release to gather retry attempt telemetry. This will enable us to determine the best follow-ups for improving backpressure based on real data rather than guessing. These knobs also make it possible to see if other defaults would work better in real-world situations.
**Contributor (Author):** To clarify, the decision to include these knobs has already been made at the initiative level (me, Matt, Patrick, and the distinguished engineers). Your concerns are completely valid and appreciated: similar questions were raised during the decision-making process, with answers summarized in the added FAQ sections. If those sections (or others associated with the new knobs) are missing important context or aren't as clear as they need to be, I'm happy to work with you to ensure they're fully captured in both the spec and the user documentation.
**Member:**

> Q: …
>
> A: …

Are you saying that savvy users are more likely to build a system that ends up being overloaded in production? This must mean they are more likely to incorrectly estimate the load / incorrectly choose hardware / incorrectly design for the required load than the typical users. I fail to see how this explanation may work out.

This only confirms that one has to do elaborate testing under overload to choose the value.

The same could be said about the read/write retries we had before the current project, yet there is no option for specifying the number of retry attempts.

If a deployment is consistently overloaded, then nothing may save the situation. The best we may hope for is that the `mongod`/`mongos` servers won't become unresponsive like they are doing now (it's impossible to prevent this completely, we may only hope to raise the bar of the request rate at which they become unresponsive). I don't believe that a reasonable user running a production such that it is consistently overloaded is a thing, because it is almost equivalent to the system not working [1]. Where did this scenario come from?

P.S. Please, let's not mix the conversation about unrelated things in the same thread. I replied about `enableOverloadRetargeting` in the thread corresponding to that option.

[1] The response times grow exponentially as throughput (how much work per unit of time the system is doing currently) approaches the raw throughput (the maximum amount of work per unit of time the system is capable of doing), and, therefore, one should always strive to run a production system such that the throughput stays noticeably smaller than the raw throughput, which avoids overloading the system.
**Member:** The decision about the specification requirements that this PR changes has also been made. Not only made, but the changes were approved as the specification. Yet we are here changing them a few months after merging them.

It is always possible for my manager, @katcharov, or someone with a rank above him, to override and instruct me to back off. Until then, I am doing my job (reviewing the PR I was assigned to review) by expressing concerns and objections.