feat(deferred): Improve default retry strategy #257

Open
xingzihai wants to merge 1 commit into rage-rb:main from xingzihai:improve-default-retry-strategy
@xingzihai

Summary

This PR improves the default retry strategy for Rage::Deferred tasks as described in #251.

Changes

  1. Increased MAX_ATTEMPTS from 5 to 15

    • With 5 retries, tasks exhausted all attempts in ~2.7 minutes
    • With 15 retries, tasks have ~2 days before exhausting all attempts
    • This gives developers more time to discover and fix underlying issues
  2. Changed backoff formula from exponential to polynomial

    • Old: rand(5 * 2**attempt) + 1 (exponential growth)
    • New: (attempt**4) + 10 + (rand(15) * attempt) (polynomial growth)
    • This spaces retries further apart with each attempt
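The two formulas can be compared side by side with a minimal sketch. The method names below are hypothetical and chosen for illustration only; they are not taken from the Rage source. The range helpers rely on `rand(n)` returning an integer in `0...n`.

```ruby
# Hypothetical standalone versions of the two formulas quoted above.

# Old strategy: random delay whose upper bound doubles with every attempt.
def old_backoff(attempt)
  rand(5 * 2**attempt) + 1
end

# New strategy: fourth-power growth plus a jitter term that scales with the attempt.
def new_backoff(attempt)
  (attempt**4) + 10 + (rand(15) * attempt)
end

# Smallest and largest delay the old formula can produce for an attempt.
def old_backoff_range(attempt)
  1..(5 * 2**attempt)
end

# Smallest and largest delay the new formula can produce for an attempt.
def new_backoff_range(attempt)
  base = attempt**4 + 10
  base..(base + 14 * attempt)
end

(1..5).each do |attempt|
  puts "attempt #{attempt}: old #{old_backoff_range(attempt)}s, new #{new_backoff_range(attempt)}s"
end
```

The old formula has no fixed floor beyond 1 second, so an unlucky draw can retry almost immediately. The new formula guarantees at least `attempt**4 + 10` seconds before each retry.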

Retry timing comparison

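| Attempt | Old (avg) | New (avg) |
|---------|-----------|-----------|
| 1       | ~3s       | ~18s      |
| 2       | ~6s       | ~41s      |
| 3       | ~11s      | ~2m       |
| 4       | ~21s      | ~5m       |
| 5       | ~41s      | ~11m      |
| 10      | -         | ~2.8h     |
| 15      | -         | ~14.1h    |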

Total retry time

  • Old: ~2.7 minutes (5 retries)
  • New: ~2 days (15 retries)
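The "New" figures can be reproduced with a short calculation. This is a sketch under two assumptions: attempts are numbered from 1, and `rand(15)` is uniform over 0..14, so its mean is 7. The helper names are hypothetical.

```ruby
# Expected delay for the new formula, using the mean of rand(15), which is 7.
def expected_new_backoff(attempt)
  attempt**4 + 10 + 7 * attempt
end

# Sum of expected delays over attempts 1..max_attempts (1-based indexing assumed).
def expected_total(max_attempts)
  (1..max_attempts).sum { |attempt| expected_new_backoff(attempt) }
end

[1, 2, 3, 4, 5, 10, 15].each do |attempt|
  puts "attempt #{attempt}: ~#{expected_new_backoff(attempt)}s"
end

total = expected_total(15)
puts "total: #{total}s (~#{(total / 86_400.0).round(2)} days)"
```

Under these assumptions attempt 15 averages 50740 seconds (about 14.1 hours) and the 15 attempts sum to 179302 seconds, or roughly 2.08 days, which agrees with the "~2 days" figure.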

Testing

Updated spec tests to verify the new backoff intervals:

  • __next_retry_in(0, nil) returns between 11 and 25 seconds
  • __next_retry_in(1, nil) returns between 26 and 54 seconds
  • __next_retry_in(15, nil) returns between 50635 and 50845 seconds (~14h)
  • __next_retry_in(16, nil) returns nil (max exceeded)
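The expected ranges can be derived from the formula bounds. The sketch below uses a hypothetical helper, `backoff_bounds`, and prints the bounds under both indexings that appear in this PR: the formula applied to `attempt + 1` and the formula applied to `attempt` directly.

```ruby
# Bounds of n**4 + 10 + rand(15) * n, where rand(15) ranges over 0..14.
def backoff_bounds(n)
  base = n**4 + 10
  [base, base + 14 * n]
end

# Variant A: formula applied to attempt + 1.
# Variant B: formula applied to attempt directly.
[0, 1, 15].each do |attempt|
  a = backoff_bounds(attempt + 1)
  b = backoff_bounds(attempt)
  puts "attempt #{attempt}: (attempt + 1) form #{a.inspect}, attempt form #{b.inspect}"
end
```

By this arithmetic, the 11 to 25 and 26 to 54 second ranges correspond to the `attempt + 1` form, while 50635 to 50845 corresponds to the unshifted form, so it may be worth confirming against the diff which indexing the spec exercises at the upper end.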

Closes #251

- Change MAX_ATTEMPTS from 5 to 15 for ~2 days total retry time
- Change backoff formula from exponential to polynomial:
  (attempt+1)**4 + 10 + rand(15) * (attempt+1)
- Remove BACKOFF_INTERVAL constant (no longer needed)
- Update tests to match new backoff behavior

This gives developers more time to discover and fix underlying
issues before retry attempts are exhausted.

Closes rage-rb#251
Contributor

@anuj-pal27 anuj-pal27 left a comment

I think there are two changes needed to align with issue #251 and maintainer feedback:

  • Set MAX_ATTEMPTS to 20 (not 15).
    In the issue discussion, the direction moved toward 20 to better cover real-world outages (especially weekend/longer incidents). 15 improves things, but 20 gives a much safer retry window by default, which is the main goal of #251.
  • Don’t increment attempt in __default_backoff: Using attempt + 1 shifts the retry timing and skips the intended first tier (~18s avg), making retry 1 behave like retry 2.

We should keep the formula exactly as specified:

def __default_backoff(attempt)
  (attempt**4) + 10 + (rand(15) * attempt)
end

@rsamoilov — is this understanding correct?
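To put the 15 versus 20 question in numbers, the expected retry window can be estimated for both limits. This is a rough sketch that assumes the unshifted formula, attempts numbered from 1, and a mean jitter of 7 for `rand(15)`. The helper names are hypothetical.

```ruby
# Expected delay per attempt, using the mean of rand(15), which is 7.
def expected_backoff(attempt)
  attempt**4 + 10 + 7 * attempt
end

# Expected total time, in days, before all attempts are exhausted.
def expected_window_days(max_attempts)
  total_seconds = (1..max_attempts).sum { |attempt| expected_backoff(attempt) }
  (total_seconds / 86_400.0).round(1)
end

[15, 20].each do |max_attempts|
  puts "MAX_ATTEMPTS = #{max_attempts}: ~#{expected_window_days(max_attempts)} days"
end
```

Under these assumptions the window grows from about 2.1 days at 15 attempts to about 8.4 days at 20, which is the difference between covering a weekend outage and not.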

Development

Successfully merging this pull request may close these issues.

[Deferred] Improve default retry strategy

2 participants