This pull request introduces a comprehensive set of resilience features to the Lighthouse SDK, significantly
improving its reliability and robustness in production environments. The key additions are automatic retries
with exponential backoff, configurable rate limiting, and request timeouts.
Summary of Changes
Automatic Retries: The SDK now automatically retries API requests that fail due to network errors, transient server-side issues (5xx status codes), or 429 rate-limit responses. This is handled by a new withRetry utility that uses an exponential backoff strategy with jitter to avoid overwhelming the API.
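As a sketch of what such a helper can look like (the option names, defaults, and signature here are illustrative assumptions, not the SDK's actual API):

```typescript
type RetryOptions = {
  maxRetries?: number
  baseDelayMs?: number
  shouldRetry?: (error: unknown) => boolean
}

// Hypothetical withRetry sketch: retries fn() with exponential backoff
// plus random jitter until it succeeds or retries are exhausted.
async function withRetry<T>(
  fn: () => Promise<T>,
  { maxRetries = 3, baseDelayMs = 500, shouldRetry = () => true }: RetryOptions = {}
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn()
    } catch (error) {
      if (attempt >= maxRetries || !shouldRetry(error)) throw error
      // Exponential backoff: base * 2^attempt, plus random jitter so
      // concurrent clients do not retry in lockstep.
      const delay = baseDelayMs * 2 ** attempt + Math.random() * baseDelayMs
      await new Promise((resolve) => setTimeout(resolve, delay))
    }
  }
}
```

The jitter term is what prevents a "thundering herd" of synchronized retries when many clients fail at the same moment.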
Rate Limiting: A token bucket rate limiter has been implemented to control the request frequency, helping
to avoid hitting API rate limits.
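A token bucket refills at a fixed rate up to a burst capacity, and each request consumes one token. A minimal illustrative sketch (class and member names are assumptions, not the PR's code):

```typescript
// Minimal token-bucket sketch: `capacity` is the burst size,
// `refillRate` is tokens added per second.
class TokenBucket {
  private tokens: number
  private lastRefill = Date.now()

  constructor(private capacity: number, private refillRate: number) {
    this.tokens = capacity
  }

  private refill(): void {
    const now = Date.now()
    const elapsedSeconds = (now - this.lastRefill) / 1000
    // Add tokens proportional to elapsed time, capped at capacity.
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillRate)
    this.lastRefill = now
  }

  // Returns true if a request may proceed, consuming one token.
  tryAcquire(): boolean {
    this.refill()
    if (this.tokens >= 1) {
      this.tokens -= 1
      return true
    }
    return false
  }
}
```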
Request Timeouts: All API requests now include a timeout mechanism to prevent them from hanging
indefinitely.
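One common way to implement such a timeout is to race the request promise against a timer; the sketch below makes no assumptions about the SDK's internals:

```typescript
// Sketch: race any promise against a timer so a request cannot hang
// indefinitely. On timeout the returned promise rejects.
function withTimeout<T>(promise: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Request timed out after ${timeoutMs} ms`)),
      timeoutMs
    )
    promise.then(
      (value) => { clearTimeout(timer); resolve(value) },
      (error) => { clearTimeout(timer); reject(error) }
    )
  })
}
```

With real fetch calls, an AbortController passed via the `signal` option is usually preferable to a bare race, because it also cancels the underlying network request rather than just abandoning the promise.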
New Utilities:
resilientFetch: A drop-in replacement for fetch that incorporates all the new resilience features.
resilientUpload: A specialized version for file uploads that supports progress tracking while also
being resilient.
This is a draft PR; I will finalize testing after review.
Add comprehensive resilience features with retry, rate limiting, and timeouts
Implement exponential backoff retry mechanism with configurable conditions
Create token bucket rate limiter for API request throttling
Replace standard fetch with resilient alternatives across SDK
Diagram Walkthrough
flowchart LR
A["Standard fetch"] --> B["resilientFetch"]
B --> C["Rate Limiter"]
B --> D["Retry Logic"]
B --> E["Timeout Control"]
F["File Upload"] --> G["resilientUpload"]
G --> H["Progress Tracking"]
G --> C
G --> D
File Walkthrough
Relevant files
Enhancement
9 files
getAllKeys.ts
Replace fetch with resilientFetch for IPNS operations
The refillTokens method modifies shared state but isn't synchronized. Multiple concurrent requests could lead to token calculation errors if they call waitForToken simultaneously.
async waitForToken(): Promise<void> {
  this.refillTokens()
  if (this.tokens >= 1) {
    this.tokens -= 1
    return
  }
  // Calculate wait time for next token
  const waitTime = (1 / this.refillRate) * 1000
  await new Promise(resolve => setTimeout(resolve, waitTime))
  // Recursively wait if still no tokens available
  return this.waitForToken()
}
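One way to address the synchronization concern (a sketch, not the SDK's code) is to serialize callers through a promise chain, so only one caller at a time refills and consumes tokens:

```typescript
// Sketch: serialize token acquisition through a promise chain so that
// concurrent callers cannot interleave refill/consume operations.
// `acquire` stands in for the actual single-caller token logic.
class SerializedLimiter {
  private queue: Promise<void> = Promise.resolve()

  constructor(private acquire: () => Promise<void>) {}

  waitForToken(): Promise<void> {
    // Each caller chains after the previous one, so mutations of the
    // shared token state never overlap.
    const next = this.queue.then(() => this.acquire())
    // Keep the chain alive even if acquire() rejects.
    this.queue = next.catch(() => {})
    return next
  }
}
```

In a single-threaded JavaScript runtime the risk is not data races in the classic sense, but interleaving across `await` points; chaining removes that interleaving.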
The dynamic import of the retry module could cause unexpected behavior if the module fails to load at runtime. Consider using static imports for critical functionality.
The retry logic assumes all errors have status or code properties. This might cause unexpected behavior when handling errors that don't match the expected structure.
const defaultRetryCondition = (error: RetryableError): boolean => {
  // Network errors
  if (error.code && ['ECONNRESET', 'ETIMEDOUT', 'ENOTFOUND', 'ECONNREFUSED'].includes(error.code)) {
    return true
  }
  // HTTP status codes that should be retried
  if (error.status) {
    return (
      error.status === 429 || // Too Many Requests
      error.status === 502 || // Bad Gateway
      error.status === 503 || // Service Unavailable
      error.status === 504    // Gateway Timeout
    )
  }
  return false
}
The error object created on failed responses doesn't match the FetchError type used elsewhere in the codebase. Use FetchError instead of a generic Error to maintain consistency with the retry system.
xhr.onload = () => {
const headers = new Headers()
xhr
.getAllResponseHeaders()
.trim()
.split(/[\r\n]+/)
.forEach((line) => {
const parts = line.split(': ')
const header = parts.shift()
const value = parts.join(': ')
if (header) headers.set(header, value)
})
const response = new Response(xhr.response, {
status: xhr.status,
statusText: xhr.statusText,
headers: headers,
})
if (!response.ok) {
-          const error = new Error(`Request failed with status code ${xhr.status}`)
-          ;(error as any).status = xhr.status
+          const error = new FetchError(
+            `Request failed with status code ${xhr.status}`,
+            xhr.status
+          )
reject(error)
} else {
resolve(response)
}
}
Suggestion importance[1-10]: 8
Why: The suggestion correctly identifies an inconsistency in error handling and proposes using the custom FetchError type, which is crucial for the new retry mechanism to work correctly.
Medium
Fix error handling
The error handling assumes error always has a status property, but if a network or other non-HTTP error occurs, error.status will be undefined. Add a check to ensure the error object has the expected structure.
} catch (error: any) {
-      if (error.status === 400) {
+      if (error && error.status === 400) {
throw new Error("Proof Doesn't exist yet")
}
-      throw new Error(error.message)
+      throw new Error(error?.message || 'Unknown error occurred')
}
Suggestion importance[1-10]: 7
Why: The suggestion correctly points out that the error object may not have a status or message property, and the proposed change makes the error handling more robust against unexpected error types.
Medium
Prevent stack overflow risk
The recursive implementation of waitForToken() could cause a stack overflow for high traffic scenarios. Replace the recursion with a loop to prevent potential stack overflow errors when many tokens are needed in succession.
  async waitForToken(): Promise<void> {
-   this.refillTokens()
-
-   if (this.tokens >= 1) {
-     this.tokens -= 1
-     return
+   while (true) {
+     this.refillTokens()
+
+     if (this.tokens >= 1) {
+       this.tokens -= 1
+       return
+     }
+
+     // Calculate wait time for next token
+     const waitTime = (1 / this.refillRate) * 1000
+     await new Promise(resolve => setTimeout(resolve, waitTime))
    }
-
-   // Calculate wait time for next token
-   const waitTime = (1 / this.refillRate) * 1000
-   await new Promise(resolve => setTimeout(resolve, waitTime))
-
-   // Recursively wait if still no tokens available
-   return this.waitForToken()
  }
Suggestion importance[1-10]: 6
Why: The suggestion correctly identifies that async recursion is not optimized and could lead to stack overflow, proposing a more robust iterative solution with a while loop.
Low
PR closes #134
PR Type
Enhancement
Replace fetch with resilientFetch for IPNS operations
Add resilient fetch with custom retry conditions
Replace fetchWithTimeout with resilientUpload for file uploads
Export resilience utilities and configuration presets
Implement token bucket rate limiter with burst capacity
Create resilient fetch wrapper with retry and timeout
Implement resilient upload with progress tracking support
Add exponential backoff retry mechanism with jitter
Export new resilience utilities from utils module
1 file
Create configuration presets for different API operation types
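For illustration only, operation-type presets might look like the following; the preset names and numbers here are assumptions, not the values in the PR:

```typescript
// Hypothetical resilience presets keyed by operation type. The idea is
// that quick metadata calls fail fast while large uploads get patient
// timeouts and more retries.
const RESILIENCE_PRESETS = {
  // Quick metadata calls: fail fast, few retries
  light: { maxRetries: 2, baseDelayMs: 250, timeoutMs: 5_000 },
  // Standard API operations
  standard: { maxRetries: 3, baseDelayMs: 500, timeoutMs: 15_000 },
  // Large file uploads: long timeout, more retries
  upload: { maxRetries: 5, baseDelayMs: 1_000, timeoutMs: 120_000 },
} as const
```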