[TESTING] Chore: Add RPC Fallback and Retry Mechanism #991
Sharqiewicz wants to merge 7 commits into staging
✅ Deploy Preview for vortexfi ready!
✅ Deploy Preview for vortex-sandbox ready!
ebma left a comment
@Sharqiewicz did you test the things listed in the 'Test plan'? We need to make sure that this is not a downgrade, especially considering that previously we were shuffling the connections with a bit of randomness and now we are not anymore.
  }

  /**
   * @deprecated Use readContract instead. Retry logic is now handled at transport level.
If this is now deprecated, let's rather remove the whole function and adjust the existing calls to it to use the other function.
  }

  /**
   * @deprecated Use sendTransaction instead. Retry logic is now handled at transport level.
  }

  /**
   * @deprecated Use sendRawTransaction instead. Retry logic is now handled at transport level.
gianfra-t left a comment
Thanks for the refactor of the fallback logic, our EvmClientManager was a bit hard to read.
There is mainly one thing that worries me about this change. It may or may not be an issue, but we should think it through a bit more. Until now we did not use the retry mechanism on every single RPC interaction: we left some with the client's default behavior to avoid any kind of double spending or double execution of logic, for example here, where we fund the ephemeral. So we manually picked which operations would be retriable by choosing whether or not to use the withRetry methods.
After this change, every interaction will use the retry mechanism by default and cycle through all the defined RPC URLs. This is fine for reads and pre-signed transactions, but for the rest we need to be sure that there is no chance of a "false-negative" response from the RPCs. That possibility is basically the assumption we were working with, and I believe we have seen it in the past when an RPC timed out while responding.
We could also just pass a flag that enables or disables the retry mechanism, so we can control where we want to use it.
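To make the flag idea concrete, here is a minimal, dependency-free sketch. The helper name `callWithOptionalRetry`, the `RetryOptions` shape, and the default of 3 attempts are all hypothetical and not taken from the PR. It only illustrates how a per-call boolean could decide whether an operation is attempted once or several times.

```typescript
// Hypothetical helper: names and defaults are illustrative, not from the PR.
interface RetryOptions {
  /** When false, the operation runs exactly once and any error propagates. */
  retry: boolean;
  /** Upper bound on total attempts when retry is enabled (assumed default: 3). */
  maxAttempts?: number;
}

async function callWithOptionalRetry<T>(
  operation: (attempt: number) => Promise<T>,
  { retry, maxAttempts = 3 }: RetryOptions
): Promise<T> {
  // With retry disabled we never make more than one attempt.
  const attempts = retry ? Math.max(1, maxAttempts) : 1;
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      // Remember the failure and move on to the next attempt, if any.
      lastError = error;
    }
  }
  throw lastError;
}
```

Under this sketch, reads and pre-signed submissions would pass `{ retry: true }`, while an on-the-fly transaction such as funding the ephemeral would pass `{ retry: false }`, so a timed-out but actually successful call is never repeated.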
I agree @gianfra-t, it's risky to introduce the retry for non-presigned transactions, which we create on the fly. We could also think about adjusting the code so that transaction creation and submission are split into two phases for 'on-the-fly transactions'. We'd create a fixed transaction with a fixed nonce in the first phase, and then pass it on for submission afterwards. The retry would then always try to submit the same transaction across the different RPCs, and we could be sure that it is only applied once. The big downside I see here is that we want to be able to process multiple ramps in parallel, and the transactions created might collide if we are not careful with the local nonce handling. We'd have to keep track of the nonce and increase it before the transactions are applied on the network, etc. But sometimes I'm also manually executing transactions on the funding account, swapping USDC -> BRLA for example, and this would mess up the local nonce handling, which would somehow need to recover. All in all, I think it's a slippery slope and difficult to do right. So your suggestion to just allow enabling/disabling the retry with a boolean sounds more reasonable to me.
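For reference, the "create once, submit many times" alternative discussed above could be sketched as follows. The `Broadcaster` type and the `broadcastSignedOnce` function are hypothetical stand-ins: each broadcaster represents "send this already-signed raw transaction through one RPC". Signing and nonce selection are assumed to have happened beforehand and are deliberately left out.

```typescript
// Hypothetical type: one function per RPC endpoint that submits a raw,
// already-signed transaction and resolves to its transaction hash.
type Broadcaster = (rawTx: string) => Promise<string>;

async function broadcastSignedOnce(
  rawTx: string,
  rpcs: Broadcaster[]
): Promise<string> {
  let lastError: unknown = new Error("no RPC endpoints configured");
  for (const send of rpcs) {
    try {
      // Every endpoint receives the identical signed bytes (same nonce),
      // so the network can include the transaction at most once.
      return await send(rawTx);
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}
```

This sketch does not solve the local nonce bookkeeping problem raised in the comment. Also, an endpoint that has already seen the transaction may reject a resend with an error, which a real implementation would probably need to treat as non-fatal.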
Summary
This PR introduces a robust RPC fallback and retry mechanism for all EVM interactions, improving reliability when RPC endpoints are temporarily unavailable or rate-limited.
Why We Implemented This
Problem: RPC endpoints can fail intermittently due to:
Previously, a single RPC failure would cause the entire operation to fail, requiring manual retry by the user or degraded UX.
What's Changed
1. New smartFallbackTransport (packages/shared)
A custom viem transport that provides:
2. Updated clientManager.ts
- "" preserved as fallback to viem's default RPC
3. Updated wagmiConfig.ts (Frontend)
4. Updated API Handlers
Migrated from old *WithRetry methods to new transport-level retry:
- moonbeam.controller.ts
- monerium-onramp-self-transfer-handler.ts
- moonbeam-to-pendulum-handler.ts
- balance.ts
5. Substrate API Timeout (register.actor.ts)
Added getApiWithTimeout for Substrate connections with 3 retry attempts and 15s timeout.
Behavior Comparison
Example Flow (2 RPCs, 4 attempts)
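As a rough illustration of how 4 attempts might be spread over 2 RPCs, the sketch below assumes plain round-robin order. That ordering is an assumption: the actual order used by smartFallbackTransport may differ. `attemptSchedule` and the URL labels are hypothetical.

```typescript
// Hypothetical helper: returns which RPC URL each attempt would use,
// assuming the transport simply cycles through the list in order.
function attemptSchedule(rpcUrls: string[], totalAttempts: number): string[] {
  if (rpcUrls.length === 0) return [];
  return Array.from(
    { length: totalAttempts },
    (_, attempt) => rpcUrls[attempt % rpcUrls.length]
  );
}

// With two endpoints and four attempts, each endpoint is tried twice:
// attemptSchedule(["rpc-a", "rpc-b"], 4) -> ["rpc-a", "rpc-b", "rpc-a", "rpc-b"]
```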
Configuration
Testing
bun packages/shared/src/services/evm/testRpcFallback.manual.ts
Files Changed
- packages/shared/src/services/evm/smartFallbackTransport.ts
- packages/shared/src/services/evm/clientManager.ts
- apps/frontend/src/wagmiConfig.ts
- apps/frontend/src/machines/actors/register.actor.ts
- apps/api/.../moonbeam.controller.ts: sendTransaction
- apps/api/.../monerium-onramp-self-transfer-handler.ts
- apps/api/.../moonbeam-to-pendulum-handler.ts
- packages/shared/src/services/evm/balance.ts: readContract