fix: replace EVM reconnect loop with provider lifecycle#1376
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdds a ProviderFactory abstraction and threads it through builders and the EVM reader; EvmReadInterface now accepts an optional provider factory and implements provider recreation with exponential backoff and transport-death recovery; builder APIs expose with_provider_factory to inject factories. Changes
Sequence Diagram(s)sequenceDiagram
participant Builder as EvmSystemChainBuilder
participant Factory as ProviderFactory
participant ReadIF as EvmReadInterface
participant Provider as EthProvider
participant RPC as RPC/Transport
Builder->>Factory: obtain factory (ProviderConfig.into_read_provider_factory)
Factory->>Provider: async create EthProvider
ReadIF->>Provider: perform backfill / subscribe to events
Provider->>RPC: open transport / stream
alt transport dies or healthcheck fails
ReadIF->>Factory: call factory() to recreate provider
Factory->>Provider: create new EthProvider
ReadIF->>Provider: resume backfill/subscribe after backoff
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/evm/src/evm_read_interface.rs`:
- Around line 232-239: The health check currently calls
provider.provider().get_block_number() to verify liveness but does not validate
chain identity; update the health check in the provider recreation block (where
provider, chain_id and RetryError::Retry are used) to also fetch the remote
chain ID (e.g., via provider.provider().get_chainid() or equivalent), compare it
to the expected chain_id, and if they differ log a warning including both IDs
and return a RetryError (similar to the existing health-check error) so the
recreated provider is rejected when the endpoint points to a different chain.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
crates/ciphernode-builder/src/ciphernode_builder.rscrates/ciphernode-builder/src/evm_system.rscrates/evm/src/evm_parser.rscrates/evm/src/evm_read_interface.rscrates/evm/src/helpers.rs
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
crates/evm/src/evm_read_interface.rs (1)
421-425: Add backoff on repeated live-stream end to avoid reconnect churn.When
stream.next()returnsNone, the code immediately re-enters backfill/resubscribe with no delay. If the server keeps closing the stream quickly, this can hot-loop and spam reconnect logs.♻️ Suggested tweak
None => { // Stream ended (server-side close, idle timeout, etc.) - // Loop back to backfill + resubscribe with no penalty. + // Loop back to backfill + resubscribe with backoff. warn!(chain_id, "Live event stream ended, will reconnect"); + if sleep_or_shutdown(backoff.next_delay(), &mut shutdown).await { + return; + } break; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/evm/src/evm_read_interface.rs` around lines 421 - 425, When handling the case where stream.next() returns None (the branch that currently logs warn!(chain_id, "Live event stream ended, will reconnect") and break), add a small exponential backoff with jitter before breaking/re-entering the backfill/resubscribe loop to avoid tight reconnect churn; implement a backoff counter (reset on successful connection) and use tokio::time::sleep with a computed delay (e.g., min(max_delay, base * 2^attempt) ± jitter) before break so repeated immediate closures are throttled while still allowing reconnection attempts.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/evm/src/evm_read_interface.rs`:
- Around line 236-248: The recreation loop currently treats an unrecoverable
chain ID mismatch as RetryError::Failure but still continues the outer loop;
change the logic in the provider recreation path (the block that checks
new_chain_id = provider.chain_id() against chain_id and constructs the
RetryError::Failure) so that on an unrecoverable failure you call bus.err(...)
with a descriptive message (including expected vs actual chain IDs) and
immediately return None from the surrounding reader function instead of
continuing; preserve the existing retry behavior only for transient errors (keep
treating transient RetryError cases as retries), and apply the same change to
the other recreation block referenced around the 264-271 area to ensure both
chain-mismatch failures are terminal for this reader instance.
---
Nitpick comments:
In `@crates/evm/src/evm_read_interface.rs`:
- Around line 421-425: When handling the case where stream.next() returns None
(the branch that currently logs warn!(chain_id, "Live event stream ended, will
reconnect") and break), add a small exponential backoff with jitter before
breaking/re-entering the backfill/resubscribe loop to avoid tight reconnect
churn; implement a backoff counter (reset on successful connection) and use
tokio::time::sleep with a computed delay (e.g., min(max_delay, base * 2^attempt)
± jitter) before break so repeated immediate closures are throttled while still
allowing reconnection attempts.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
crates/ciphernode-builder/src/ciphernode_builder.rscrates/evm/src/evm_read_interface.rs
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/evm/src/evm_read_interface.rs`:
- Around line 280-287: recreate_provider returns None both on fatal errors and
when a graceful shutdown is in progress, so avoid emitting Evm errors during
shutdown: change the post-await check to call bus.err(EType::Evm, ...) only when
result.is_none() AND the shutdown signal has NOT been triggered (inspect the
shutdown value passed into recreate_provider — e.g., use shutdown.is_shutdown()
/ shutdown.is_cancelled() / !shutdown.is_closed() or the appropriate method for
that shutdown type); keep the existing return of result.
- Around line 420-424: The match arm handling None (the "Stream ended" case)
currently breaks straight back into the subscribe flow causing a tight reconnect
loop; change this to treat stream termination as a failure path by incrementing
a reconnect counter and applying exponential backoff before retrying, and when
the counter exceeds a threshold recreate the provider/client instance; implement
with tokio::time::sleep for delays, reset the counter on a successful
subscription, and replace the direct break in the None branch (the
warn!(chain_id, "Live event stream ended, will reconnect") location) with
logging that includes the attempt count and either waits (backoff) or triggers
provider recreation when the threshold is hit.
closes #1375
The EVM read interface was stuck in an infinite reconnect loop because it kept retrying RPC calls on a dead WebSocket transport before escalating to provider recreation.
Summary by CodeRabbit
New Features
Bug Fixes