Update deferred retry backoff to quartic formula and default max retries to 20 #271

Open
anuj-pal27 wants to merge 6 commits into rage-rb:main from anuj-pal27:deferred-retry-strategy-20

@anuj-pal27
Contributor

Description

This PR updates the default retry behavior for Rage::Deferred so failed tasks are retried for longer instead of exhausting too quickly.

What changed

  • Increased default max retries to 20

  • Updated default retry delay formula to:

    (attempt**4) + 10 + (rand(15) * attempt)

Why this change

Previously, retries could finish in just a few minutes, which is often too short for real incidents like temporary outages or bad deploys.

With this new formula, retries are spaced out more over time, so tasks have a better chance to succeed once the issue is fixed.
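
To make the effect of the formula concrete, here is a minimal sketch that sums the shortest and longest possible delays across all retries. The helper names (`default_backoff`, `min_delay`, `max_delay`) are hypothetical and not taken from the Rage source, and the sketch assumes attempts are numbered 1 through 20, which this PR does not state explicitly.

```ruby
# Hypothetical stand-in for the default backoff; not the actual Rage source.
# The jitter argument defaults to rand(15), matching the formula in this PR.
def default_backoff(attempt, jitter: rand(15))
  (attempt**4) + 10 + (jitter * attempt)
end

# rand(15) yields an integer in 0..14, so these are the extremes per attempt.
def min_delay(attempt)
  default_backoff(attempt, jitter: 0)
end

def max_delay(attempt)
  default_backoff(attempt, jitter: 14)
end

# Assumption: attempts are numbered 1 through 20.
attempts = (1..20)

total_min = attempts.sum { |a| min_delay(a) }
total_max = attempts.sum { |a| max_delay(a) }

puts format("shortest total wait: %.2f days", total_min / 86_400.0)
puts format("longest total wait:  %.2f days", total_max / 86_400.0)
```

Under that numbering assumption, the total retry window comes to a little over eight days, and almost all of it is contributed by the last few attempts because of the `attempt**4` term.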

Test updates

I also updated tests in spec/rage/deferred/task_spec.rb to match the new behavior:

  • New delay expectations for attempts 0..4
  • Updated max retry boundary checks for MAX_ATTEMPTS = 20

Closes #251.

@anuj-pal27
Contributor Author

Hi @rsamoilov, could you please review this PR when you have time? Thanks!

Member

@rsamoilov rsamoilov left a comment

Hi @anuj-pal27

This looks great! Max attempts set to 20 also makes sense.

As a general improvement, do you mind adding the will_retry_in? method to the deferred metadata? I think you'll need to cache the result of __next_retry_in, but otherwise this change would significantly improve observability.

@anuj-pal27
Contributor Author

anuj-pal27 commented Apr 14, 2026

Thanks for the guidance, @rsamoilov! I’ll add will_retry_in to deferred metadata and cache the __next_retry_in result so it isn’t recomputed. I’ll push an update shortly.

@anuj-pal27
Contributor Author

Hi @rsamoilov, quick update on the will_retry_in implementation approach:

I’m planning to add caching only for will_retry_in.
To avoid nil ambiguity (not computed yet vs computed and no retry), I’ll initialize the cache slot with a sentinel value.

Also, instead of writing directly to context[7] from Metadata, I plan to add Context.set_will_retry_in(...) so Metadata doesn’t depend on array index details and context structure stays encapsulated.
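
The sentinel idea can be sketched as follows. This is only an illustration of the technique: `RetryCache` and `NOT_COMPUTED` are hypothetical names, not part of Rage. The point is that a plain `||=` would recompute whenever the stored result is nil, whereas a sentinel distinguishes "not computed yet" from "computed, and there is no retry".

```ruby
# Hypothetical cache slot; the class and constant names are illustrative only.
class RetryCache
  # A unique object that no computation can return by accident.
  NOT_COMPUTED = Object.new.freeze

  def initialize
    @will_retry_in = NOT_COMPUTED
  end

  # Runs the block at most once, even when the block returns nil.
  def fetch
    @will_retry_in = yield if @will_retry_in.equal?(NOT_COMPUTED)
    @will_retry_in
  end
end

calls = 0
cache = RetryCache.new
2.times do
  cache.fetch do
    calls += 1
    nil # "no retry" is a valid, cacheable result
  end
end
puts calls
```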

Does this approach look good to you?

@rsamoilov
Member

Hi @anuj-pal27,

I don't think you need to update Context.

However, will_retry_in will depend on the result of __next_retry_in. The problem here is that __next_retry_in depends on __default_backoff that has a random component in it, and calling __next_retry_in multiple times will return different values.

Imagine a user monitors task failures and publishes the time when the task will be retried to their monitoring system. They would do it by calling Deferred::Metadata#will_retry_in, but then the queue will also call __next_retry_in and reschedule the task based on the value it returned. If the return value of __next_retry_in isn't memoized, the user will end up with an incorrect value in their monitoring system.
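
The mismatch described above can be reproduced with a small sketch. `JitteredTask` and its methods are hypothetical stand-ins, not the Rage implementation; they only show that an unmemoized jittered delay varies between calls, while a value stored per attempt stays stable for every reader.

```ruby
# Hypothetical stand-in for a task with jittered backoff; not Rage code.
class JitteredTask
  def initialize(rng: Random.new)
    @rng = rng
    @cache = {}
  end

  # Draws fresh jitter on every call, so results can differ between calls.
  def next_retry_in(attempt)
    (attempt**4) + 10 + (@rng.rand(15) * attempt)
  end

  # Computes the delay once per attempt and returns the stored value afterwards.
  def memoized_next_retry_in(attempt)
    @cache.fetch(attempt) { @cache[attempt] = next_retry_in(attempt) }
  end
end

task = JitteredTask.new(rng: Random.new(42))
unmemoized = Array.new(50) { task.next_retry_in(3) }.uniq.size
memoized = Array.new(50) { task.memoized_next_retry_in(3) }.uniq.size

puts "distinct values without memoization: #{unmemoized}"
puts "distinct values with memoization: #{memoized}"
```

With memoization, the value a user publishes to monitoring and the value the queue uses for rescheduling come from the same stored result.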

@anuj-pal27
Contributor Author

Hey @rsamoilov, I’ve pushed the updated changes. Quick summary:

  • Added Fiber-local memoization in __next_retry_in so the retry interval is computed once per attempt and reused.
  • This ensures meta.will_retry_in and queue rescheduling use the same value (important because default backoff includes rand).
  • Added Deferred::Metadata#will_retry_in so tasks can read the retry delay during execution.
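
For readers unfamiliar with Fiber-local memoization, here is a minimal sketch of the general technique. It is not the code from this PR: the `RetryMemo` module, its key name, and its methods are hypothetical. It relies on the fact that `Thread.current[]` stores fiber-local values in Ruby, so each fiber sees its own slot.

```ruby
# Hypothetical module; the key and method names are illustrative only.
module RetryMemo
  KEY = :__retry_memo_sketch

  # Thread.current[] is fiber-local, so the memo is scoped to the running fiber.
  def self.next_retry_in(attempt)
    memo = (Thread.current[KEY] ||= {})
    memo.fetch(attempt) do
      memo[attempt] = (attempt**4) + 10 + (rand(15) * attempt)
    end
  end

  # Clears the memo for the current fiber, e.g. once the task has been rescheduled.
  def self.reset!
    Thread.current[KEY] = nil
  end
end

first = RetryMemo.next_retry_in(3)
second = RetryMemo.next_retry_in(3)
puts first == second

# A different fiber starts with an empty slot and does not see the memo.
other = Fiber.new { Thread.current[RetryMemo::KEY] }.resume
puts other.inspect
```

Keying the memo by attempt keeps the value stable within one attempt while still letting the next attempt draw fresh jitter.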


Development

Successfully merging this pull request may close these issues.

[Deferred] Improve default retry strategy
