[TESTING] Chore: Add RPC Fallback and Retry Mechanism #991

Open. Sharqiewicz wants to merge 7 commits into staging from feat/add-rpc-fallback

Sharqiewicz (Member) commented Jan 2, 2026

Summary

This PR introduces a robust RPC fallback and retry mechanism for all EVM interactions, improving reliability when RPC endpoints are temporarily unavailable or rate-limited.

Why We Implemented This

Problem: RPC endpoints can fail intermittently due to:

  • Rate limiting
  • Network congestion
  • Temporary outages
  • Geographic routing issues

Previously, a single RPC failure would cause the entire operation to fail, requiring manual retry by the user or degraded UX.

What's Changed

1. New smartFallbackTransport (packages/shared)

A custom viem transport that provides:

| Feature | Description |
| --- | --- |
| Fast Failover | First cycle through RPCs has no delay; each RPC is tried quickly |
| Exponential Backoff | On subsequent cycles, delay increases (1.5x multiplier) |
| Centered Jitter | ±20% randomization prevents thundering herd |
| Non-retryable Detection | Errors like "revert" or "invalid params" fail immediately |
| Max Delay Cap | Prevents excessive wait times (default 10s cap) |
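
To make the backoff, jitter, and cap rows above concrete, here is a minimal sketch of how such a delay could be computed. It is not the code from smartFallbackTransport.ts: the names `BackoffConfig`, `defaultBackoff`, `backoffDelayMs`, and `jitterRatio` are hypothetical, and it assumes the delay grows once per cycle and that jitter is applied after the cap.

```typescript
// Hypothetical names; the default values mirror the numbers quoted in this PR description.
interface BackoffConfig {
  initialDelayMs: number;
  backoffMultiplier: number;
  maxDelayMs: number;
  jitterRatio: number; // 0.2 means ±20%
}

const defaultBackoff: BackoffConfig = {
  initialDelayMs: 500,
  backoffMultiplier: 1.5,
  maxDelayMs: 10_000,
  jitterRatio: 0.2,
};

/**
 * Delay to wait before the next attempt, given the zero-based cycle of the
 * attempt that just failed. Cycle 0 is the first pass over all RPCs and gets
 * no delay (fast failover). `random` is injectable so jitter can be tested.
 */
function backoffDelayMs(
  cycle: number,
  config: BackoffConfig = defaultBackoff,
  random: () => number = Math.random,
): number {
  if (cycle <= 0) return 0;
  // Exponential growth: initial delay on cycle 1, then multiplied each cycle.
  const base = config.initialDelayMs * Math.pow(config.backoffMultiplier, cycle - 1);
  const capped = Math.min(base, config.maxDelayMs);
  // Centered jitter: scale by a factor in [1 - ratio, 1 + ratio).
  const factor = 1 + (random() * 2 - 1) * config.jitterRatio;
  // Assumption: the cap is enforced again after jitter so it is never exceeded.
  return Math.min(Math.round(capped * factor), config.maxDelayMs);
}
```

With the jitter pinned to its midpoint (`random` returning 0.5), cycles 1, 2, and 3 give 500, 750, and 1125 ms under these assumptions.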

2. Updated clientManager.ts

  • All networks now use smart fallback transport (including single-URL configs)
  • Empty string "" preserved as fallback to viem's default RPC
  • Consistent retry behavior across backend services

3. Updated wagmiConfig.ts (Frontend)

  • User wallet interactions now have RPC fallback
  • Configured with public RPC fallbacks for each chain

4. Updated API Handlers

Migrated from old *WithRetry methods to new transport-level retry:

  • moonbeam.controller.ts
  • monerium-onramp-self-transfer-handler.ts
  • moonbeam-to-pendulum-handler.ts
  • balance.ts

5. Substrate API Timeout (register.actor.ts)

Added getApiWithTimeout for Substrate connections with 3 retry attempts and 15s timeout.
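
As an illustration of the timeout-plus-retry idea, here is a minimal sketch. It is not the actual getApiWithTimeout from register.actor.ts: `withTimeout` and `connectWithRetry` are hypothetical names, `connect` stands in for whatever creates the Substrate API, and the sketch assumes the 15s timeout applies per attempt and that 3 means three attempts in total.

```typescript
/** Reject if `promise` does not settle within `timeoutMs`. */
function withTimeout<T>(promise: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
    promise.then(
      (value) => {
        clearTimeout(timer); // do not leave a pending timer behind
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

/**
 * Call `connect` up to `attempts` times, giving each call `timeoutMs` to finish.
 * Assumption: no delay between attempts; the last error is rethrown.
 */
async function connectWithRetry<T>(
  connect: () => Promise<T>,
  attempts = 3,
  timeoutMs = 15_000,
): Promise<T> {
  let lastError: unknown = new Error("attempts must be at least 1");
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await withTimeout(connect(), timeoutMs);
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}
```

Note that a timed-out `connect()` call is not cancelled in this sketch; it is only abandoned, so a real implementation may also want to disconnect the stale connection.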

Behavior Comparison

| Scenario | Before | After |
| --- | --- | --- |
| Single RPC fails | ❌ Operation fails | ✅ Retries 3 times |
| First RPC slow, second fast | ❌ Waits for timeout | ✅ Switches immediately |
| All RPCs fail once | ❌ Fails | ✅ Retries with backoff |
| Contract revert | ⚠️ Retries (wasteful) | ✅ Fails immediately |
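
The "Contract revert" row relies on telling retryable errors apart from non-retryable ones. A minimal sketch of such a check could look like the following. The pattern list and the name `isNonRetryable` are assumptions; only "revert" and "invalid params" are named in this description, and the real transport may inspect error codes rather than messages.

```typescript
// Hypothetical pattern list; only these two substrings are named in the PR description.
const NON_RETRYABLE_PATTERNS: readonly string[] = ["revert", "invalid params"];

/**
 * Returns true when retrying on another RPC cannot help, because the error
 * comes from the request itself rather than from the endpoint.
 */
function isNonRetryable(error: unknown): boolean {
  const message = (error instanceof Error ? error.message : String(error)).toLowerCase();
  return NON_RETRYABLE_PATTERNS.some((pattern) => message.includes(pattern));
}
```

Matching on message substrings is simple but brittle, since providers word their errors differently.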

Example Flow (2 RPCs, 4 attempts)

Attempt 1 → RPC1 fails  → switch immediately (no delay)
Attempt 2 → RPC2 fails  → switch immediately (first cycle done)
Attempt 3 → RPC1 fails  → wait ~500ms (second cycle, backoff)
Attempt 4 → RPC2 fails  → throw error
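
The flow above can be reproduced with a small planning function. This is a sketch under assumptions, not the transport's code: `planAttempts` and `PlannedAttempt` are hypothetical names, RPCs are assumed to be tried round-robin, and jitter is left out so the numbers are exact.

```typescript
interface PlannedAttempt {
  attempt: number;                    // 1-based attempt counter
  rpcIndex: number;                   // zero-based index into the URL list
  delayAfterFailureMs: number | null; // null: last attempt, the error is thrown
}

function planAttempts(
  urlCount: number,
  maxAttempts: number,
  initialDelayMs = 500,
  backoffMultiplier = 1.5,
  maxDelayMs = 10_000,
): PlannedAttempt[] {
  const plan: PlannedAttempt[] = [];
  for (let i = 0; i < maxAttempts; i++) {
    const cycle = Math.floor(i / urlCount); // 0 during the first pass over all RPCs
    const isLast = i === maxAttempts - 1;
    let delay: number | null = null;
    if (!isLast) {
      // No delay in the first cycle; exponential backoff afterwards, capped.
      delay = cycle === 0
        ? 0
        : Math.min(initialDelayMs * backoffMultiplier ** (cycle - 1), maxDelayMs);
    }
    plan.push({ attempt: i + 1, rpcIndex: i % urlCount, delayAfterFailureMs: delay });
  }
  return plan;
}
```

Under these assumptions, `planAttempts(2, 4)` yields RPC indices 0, 1, 0, 1 with delays 0, 0, 500, and `null`, matching the four lines of the example flow.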

Configuration

// Default config
{
  maxAttempts: Math.max(urls.length * 2, 3),  // At least 3 attempts
  initialDelayMs: 500,
  backoffMultiplier: 1.5,
  maxDelayMs: 10_000,
  delayOnFirstCycle: false  // Fast failover
}
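
For illustration, this is how such defaults could be resolved from the URL list, with caller overrides taking precedence. `SmartFallbackConfig` and `resolveConfig` are hypothetical names; only the field names and default values come from the config shown above.

```typescript
interface SmartFallbackConfig {
  maxAttempts: number;
  initialDelayMs: number;
  backoffMultiplier: number;
  maxDelayMs: number;
  delayOnFirstCycle: boolean;
}

/** Build the effective config: defaults derived from `urls`, then overrides. */
function resolveConfig(
  urls: readonly string[],
  overrides: Partial<SmartFallbackConfig> = {},
): SmartFallbackConfig {
  return {
    // Two full cycles over the URLs, but never fewer than 3 attempts.
    maxAttempts: Math.max(urls.length * 2, 3),
    initialDelayMs: 500,
    backoffMultiplier: 1.5,
    maxDelayMs: 10_000,
    delayOnFirstCycle: false, // fast failover
    ...overrides,
  };
}
```

With one URL this gives 3 attempts, with two URLs 4, and with three URLs 6.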

Testing

  • ✅ Unit tests for transport creation and config
  • ✅ Integration tests with real RPC endpoints
  • ✅ Manual test script: bun packages/shared/src/services/evm/testRpcFallback.manual.ts

Files Changed

| File | Change |
| --- | --- |
| packages/shared/src/services/evm/smartFallbackTransport.ts | New transport implementation |
| packages/shared/src/services/evm/clientManager.ts | Use smart fallback for all networks |
| apps/frontend/src/wagmiConfig.ts | Configure fallback transports for wallet |
| apps/frontend/src/machines/actors/register.actor.ts | Add Substrate API timeout/retry |
| apps/api/.../moonbeam.controller.ts | Use new sendTransaction |
| apps/api/.../monerium-onramp-self-transfer-handler.ts | Use new methods |
| apps/api/.../moonbeam-to-pendulum-handler.ts | Use new methods |
| packages/shared/src/services/evm/balance.ts | Use new readContract |

netlify (bot) commented Jan 2, 2026

Deploy Preview for vortexfi ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 5de191d |
| 🔍 Latest deploy log | https://app.netlify.com/projects/vortexfi/deploys/697353dbc9723d000809df9b |
| 😎 Deploy Preview | https://deploy-preview-991--vortexfi.netlify.app |

netlify (bot) commented Jan 2, 2026

Deploy Preview for vortex-sandbox ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 5de191d |
| 🔍 Latest deploy log | https://app.netlify.com/projects/vortex-sandbox/deploys/697353db588cb1000881a015 |
| 😎 Deploy Preview | https://deploy-preview-991--vortex-sandbox.netlify.app |

Sharqiewicz requested review from ebma and gianfra-t on January 14, 2026, 14:20
Sharqiewicz changed the title from "implement smartFallbackTransport" to "ready: chore/implement smartFallbackTransport" on Jan 14, 2026
ebma (Member) left a comment

@Sharqiewicz did you test the things listed in the 'Test plan'? We need to make sure that this is not a downgrade, especially considering that previously we were shuffling the connections with a bit of randomness and now we are not anymore.

}

/**
* @deprecated Use readContract instead. Retry logic is now handled at transport level.

If this is now deprecated, let's remove the whole function instead and update its existing callers to use readContract.

}

/**
* @deprecated Use sendTransaction instead. Retry logic is now handled at transport level.

Same here

}

/**
* @deprecated Use sendRawTransaction instead. Retry logic is now handled at transport level.

Same here

Sharqiewicz changed the title from "ready: chore/implement smartFallbackTransport" to "[READY] Chore: Add RPC Fallback and Retry Mechanism" on Jan 23, 2026
Sharqiewicz changed the title from "[READY] Chore: Add RPC Fallback and Retry Mechanism" to "[TESTING] Chore: Add RPC Fallback and Retry Mechanism" on Jan 23, 2026
gianfra-t (Contributor) left a comment

Thanks for the refactor of the fallback logic; our EvmClientManager was a bit harder to read before.

There is mainly one thing that worries me about this change. It may or may not be an issue, but we should think it through a bit more. Until now we did not use the retry mechanism on every single RPC interaction. We left some with the client's default behavior to avoid any kind of double spending or double logic execution, for example here, where we fund the ephemeral. So we manually picked which operations would be retriable by choosing whether or not to use the withRetry methods.

After this change, every interaction will use the retry mechanism by default and cycle through all the defined RPC URLs. This is fine for reads and for pre-signed transactions, but for the rest we need to be sure that there is no chance of a "false-negative" response from the RPCs. That such false negatives can happen is basically the assumption we were working with; I believe we have seen this in the past when an RPC timed out while responding.

We could also just pass a flag that enables or disables the retry mechanism, so we can control where we want to use it.
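
A minimal sketch of what such a flag could look like at the call site. All names here (`callRpc`, `CallOptions`, `retry`) are hypothetical and not part of this PR, and backoff delays are left out to keep the focus on the flag.

```typescript
type RpcCall<T> = (url: string) => Promise<T>;

interface CallOptions {
  /** Hypothetical flag: when false, make exactly one attempt and surface any error. */
  retry: boolean;
  maxAttempts: number;
}

async function callRpc<T>(
  urls: readonly string[],
  call: RpcCall<T>,
  options: CallOptions,
): Promise<T> {
  if (urls.length === 0) throw new Error("No RPC URLs configured");
  // With retry disabled, a non-idempotent request is never sent twice.
  const attempts = options.retry ? Math.max(1, options.maxAttempts) : 1;
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    const url = urls[i % urls.length] as string; // round-robin over the URLs
    try {
      return await call(url);
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}
```

Under this sketch, reads and pre-signed transactions would pass `retry: true`, while transactions built on the fly (such as funding the ephemeral) would pass `retry: false`.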

ebma (Member) commented Feb 4, 2026

I agree @gianfra-t, it's risky to introduce the retry for non-presigned transactions, which we create on the fly.

We could also think about adjusting the code so that creation and submission of 'on-the-fly' transactions are split into two phases. We'd create a fixed transaction with a fixed nonce in the first phase, and then pass it on for submission. The retry would then always submit the same transaction across the different RPCs, so we'd be safe that it is only applied once. The big downside I see is that we want to be able to process multiple ramps in parallel, and the transactions created might collide if we are not careful with the local nonce handling. We'd have to keep track of the nonce and increase it before the transactions are applied on the network, etc. But sometimes I'm also manually executing transactions on the funding account, swapping USDC -> BRLA for example, and this would mess up the local nonce handling, which would then somehow need to recover.

All in all, I think it's a slippery slope and difficult to do right. So your suggestion to just allow enabling/disabling the retry with a boolean sounds more reasonable to me.
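
To illustrate the local nonce handling discussed above, here is a minimal sketch. `LocalNonceTracker` is a hypothetical helper that does not exist in this PR, and it assumes the caller fetches the account's pending transaction count from an RPC before each reservation.

```typescript
// Hypothetical helper: hands out nonces locally so that transactions can be
// built with a fixed nonce before they are submitted.
class LocalNonceTracker {
  private nextNonce = 0;

  /**
   * Reserve a nonce for a new transaction.
   * `onChainNonce` is the account's pending transaction count as reported by
   * an RPC. Taking the max lets the tracker catch up when transactions were
   * sent outside this process, for example manually from the funding account.
   */
  reserve(onChainNonce: number): number {
    this.nextNonce = Math.max(this.nextNonce, onChainNonce);
    const reserved = this.nextNonce;
    this.nextNonce += 1;
    return reserved;
  }
}
```

This covers parallel ramps and manual transactions that move the on-chain nonce ahead, but not a reserved transaction that is never broadcast, which would leave a nonce gap. That kind of recovery is part of why the approach is hard to get right.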
