fix: replace EVM reconnect loop with provider lifecycle by hmzakhalid · Pull Request #1376 · theinterfold/interfold

hmzakhalid · 2026-03-02T11:22:14Z

The EVM read interface was stuck in an infinite reconnect loop because it kept retrying RPC calls on a dead WebSocket transport before escalating to provider recreation.

Summary by CodeRabbit

New Features
- Pluggable provider factories for on-demand read-provider recreation.
- Builder API to attach per-chain provider factories.
- Public options to start read streams with or without a provider factory.
- New extractor callback type for log parsing.
Bug Fixes
- Automatic provider reconnection with exponential backoff and health checks.
- Unified live/backfill streaming with graceful recovery on transport failures.

vercel · 2026-03-02T11:22:18Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
crisp	Ready	Preview, Comment	Mar 2, 2026 0:04am
enclave-docs	Ready	Preview, Comment	Mar 2, 2026 0:04am

coderabbitai · 2026-03-02T11:22:32Z

Warning

Rate limit exceeded

@hmzakhalid has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 18 minutes and 54 seconds before requesting another review.

📥 Commits

Reviewing files that changed from the base of the PR and between 6d5e508 and d36aa11.

📒 Files selected for processing (1)

crates/evm/src/evm_read_interface.rs

📝 Walkthrough

Adds a ProviderFactory abstraction and threads it through the builders and the EVM reader. EvmReadInterface now accepts an optional provider factory and uses it to recreate the provider, with exponential backoff, when the transport dies. The builder APIs expose with_provider_factory so callers can inject a factory per chain.
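
A minimal sketch of the factory idea described in the walkthrough, not the PR's actual code. It assumes a synchronous, std-only factory (the walkthrough's sequence diagram describes provider creation as async), and `MockProvider`, `Reader`, and `recreate` are hypothetical names used only for illustration. Only `ProviderFactory`, `provider_factory`, and `with_provider_factory` are names taken from the PR summary.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a connected read provider.
#[derive(Debug, Clone, PartialEq)]
pub struct MockProvider {
    pub chain_id: u64,
    pub generation: u32,
}

/// A shareable closure that builds a fresh provider on demand.
/// Assumption: synchronous here to stay std-only; the real alias may differ.
pub type ProviderFactory<P> = Arc<dyn Fn() -> Result<P, String> + Send + Sync>;

/// Hypothetical reader that owns a provider and, optionally, a factory.
pub struct Reader<P> {
    provider: P,
    provider_factory: Option<ProviderFactory<P>>,
}

impl<P> Reader<P> {
    pub fn new(provider: P) -> Self {
        Self { provider, provider_factory: None }
    }

    /// Builder-style injection of a factory.
    pub fn with_provider_factory(mut self, factory: ProviderFactory<P>) -> Self {
        self.provider_factory = Some(factory);
        self
    }

    /// Replaces the provider if a factory is present.
    /// Returns Ok(true) when replaced, Ok(false) when no factory was supplied.
    pub fn recreate(&mut self) -> Result<bool, String> {
        match &self.provider_factory {
            Some(factory) => {
                self.provider = factory()?;
                Ok(true)
            }
            None => Ok(false),
        }
    }

    pub fn provider(&self) -> &P {
        &self.provider
    }
}
```

Keeping the factory optional means a reader built without one behaves as before and simply cannot recreate its provider.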

Changes

Cohort / File(s)	Summary
Provider helpers & parser `crates/evm/src/helpers.rs`, `crates/evm/src/evm_parser.rs`	Adds `ProviderFactory<P>` type alias and `ProviderConfig::into_read_provider_factory()`; introduces `ExtractorFn` alias and adjusts imports.
EVM read interface (lifecycle & reconnection) `crates/evm/src/evm_read_interface.rs`	Adds `provider_factory: Option<ProviderFactory<P>>`, `setup_with_factory` entrypoint, unified live/backfill loop, provider recreation/health-check helpers, and exponential backoff logic.
EVM system wiring / builder `crates/ciphernode-builder/src/evm_system.rs`, `crates/ciphernode-builder/src/ciphernode_builder.rs`	Adds `with_provider_factory(...)` to `EvmSystemChainBuilder`; imports `ProviderConfig`, converts it to a `ProviderFactory` via `into_read_provider_factory()` and passes it into the builder during setup.
Manifest `Cargo.toml`	Small dependency/manifest edits (lines changed +7/-1).

Sequence Diagram(s)

sequenceDiagram
  participant Builder as EvmSystemChainBuilder
  participant Factory as ProviderFactory
  participant ReadIF as EvmReadInterface
  participant Provider as EthProvider
  participant RPC as RPC/Transport

  Builder->>Factory: obtain factory (ProviderConfig.into_read_provider_factory)
  Factory->>Provider: async create EthProvider
  ReadIF->>Provider: perform backfill / subscribe to events
  Provider->>RPC: open transport / stream
  alt transport dies or healthcheck fails
    ReadIF->>Factory: call factory() to recreate provider
    Factory->>Provider: create new EthProvider
    ReadIF->>Provider: resume backfill/subscribe after backoff
  end
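
The `alt` branch of the diagram can be sketched as a loop that calls the factory, health-checks the result, and waits with a doubling delay between failures. This is an illustration under assumptions: `recreate_provider` here is a simplified synchronous function, the `sleep` callback and `max_attempts` bound are hypothetical (added so the sketch terminates and can be tested without real waiting), and the 500 ms base and 30 s cap are made-up values.

```rust
use std::time::Duration;

/// Tries to build a healthy provider, backing off between failed attempts.
/// `factory`, `is_healthy`, and `sleep` are injected so the loop is testable.
pub fn recreate_provider<P>(
    mut factory: impl FnMut() -> Result<P, String>,
    mut is_healthy: impl FnMut(&P) -> bool,
    mut sleep: impl FnMut(Duration),
    max_attempts: u32,
) -> Option<P> {
    // Assumed values: 500 ms initial delay, capped at 30 s.
    let mut delay = Duration::from_millis(500);
    let max_delay = Duration::from_secs(30);

    for _ in 0..max_attempts {
        match factory() {
            // Accept the new provider only if it passes the health check.
            Ok(provider) if is_healthy(&provider) => return Some(provider),
            // Creation failed or the provider is unhealthy: fall through and wait.
            _ => {}
        }
        sleep(delay);
        delay = delay.saturating_mul(2).min(max_delay);
    }
    None
}
```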

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

feat: sync mode preparation [skip-line-limit] #1153 — introduces and wires ProviderFactory into EvmReadInterface and builder setup paths.
feat: add chunked historical log fetching with RPC block range #1331 — overlaps provider recreation/backoff changes in the EVM reader.
Add Authentication Options for RPC Providers #191 — adds ProviderConfig/provider-creation APIs and into_read_provider_factory() used here.

Suggested labels

bug, ciphernode

Suggested reviewers

ctrlc03
ryardley

Poem

🐰 I nibble code and stitch a thread,
A factory wakes when transports are dead,
Backoff counted, providers born anew,
The reader hops on, steady and true —
Hooray for fixes, stitched in blue!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 5.26% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: replacing EVM reconnect loop logic with a provider lifecycle approach to fix the infinite reconnect issue.
Linked Issues check	✅ Passed	The PR directly addresses issue `#1375` by implementing provider lifecycle management with backoff and recreation logic to escape infinite reconnect loops on dead WebSocket transports.
Out of Scope Changes check	✅ Passed	All changes are scoped to provider lifecycle management and provider factory wiring across the EVM system, directly supporting the fix for the infinite reconnect loop issue.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix/evm-pvdr-lifecycle

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/evm/src/evm_read_interface.rs`:
- Around line 232-239: The health check currently calls
provider.provider().get_block_number() to verify liveness but does not validate
chain identity; update the health check in the provider recreation block (where
provider, chain_id and RetryError::Retry are used) to also fetch the remote
chain ID (e.g., via provider.provider().get_chainid() or equivalent), compare it
to the expected chain_id, and if they differ log a warning including both IDs
and return a RetryError (similar to the existing health-check error) so the
recreated provider is rejected when the endpoint points to a different chain.
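
One way to picture the suggested check, as a sketch rather than the code that was merged: the `ChainClient` trait, `HealthError` enum, and `FixedClient` test double below are hypothetical stand-ins, not the project's provider types or its `RetryError`.

```rust
/// Hypothetical minimal view of a provider for health checking.
pub trait ChainClient {
    fn block_number(&self) -> Result<u64, String>;
    fn chain_id(&self) -> Result<u64, String>;
}

#[derive(Debug, PartialEq)]
pub enum HealthError {
    /// The endpoint did not answer; worth retrying.
    Unreachable(String),
    /// The endpoint answered but serves a different chain.
    ChainMismatch { expected: u64, actual: u64 },
}

/// Verifies liveness first, then chain identity.
pub fn health_check<C: ChainClient>(client: &C, expected: u64) -> Result<(), HealthError> {
    client.block_number().map_err(HealthError::Unreachable)?;
    let actual = client.chain_id().map_err(HealthError::Unreachable)?;
    if actual != expected {
        return Err(HealthError::ChainMismatch { expected, actual });
    }
    Ok(())
}

/// Test double: `None` simulates a dead transport.
pub struct FixedClient {
    pub chain_id: Option<u64>,
}

impl ChainClient for FixedClient {
    fn block_number(&self) -> Result<u64, String> {
        self.chain_id.map(|_| 100).ok_or_else(|| "transport closed".to_string())
    }

    fn chain_id(&self) -> Result<u64, String> {
        self.chain_id.ok_or_else(|| "transport closed".to_string())
    }
}
```

Separating the two error variants lets the caller log both chain IDs on a mismatch while still treating an unreachable endpoint as a transient condition.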

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 080bb5a and 000b2e2.

📒 Files selected for processing (5)

crates/ciphernode-builder/src/ciphernode_builder.rs
crates/ciphernode-builder/src/evm_system.rs
crates/evm/src/evm_parser.rs
crates/evm/src/evm_read_interface.rs
crates/evm/src/helpers.rs

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

crates/evm/src/evm_read_interface.rs (1)

421-425: Add backoff on repeated live-stream end to avoid reconnect churn.

When stream.next() returns None, the code immediately re-enters backfill/resubscribe with no delay. If the server keeps closing the stream quickly, this can hot-loop and spam reconnect logs.

♻️ Suggested tweak

 None => {
     // Stream ended (server-side close, idle timeout, etc.)
-    // Loop back to backfill + resubscribe with no penalty.
+    // Loop back to backfill + resubscribe with backoff.
     warn!(chain_id, "Live event stream ended, will reconnect");
+    if sleep_or_shutdown(backoff.next_delay(), &mut shutdown).await {
+        return;
+    }
     break;
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/evm/src/evm_read_interface.rs` around lines 421 - 425, When handling
the case where stream.next() returns None (the branch that currently logs
warn!(chain_id, "Live event stream ended, will reconnect") and break), add a
small exponential backoff with jitter before breaking/re-entering the
backfill/resubscribe loop to avoid tight reconnect churn; implement a backoff
counter (reset on successful connection) and use tokio::time::sleep with a
computed delay (e.g., min(max_delay, base * 2^attempt) ± jitter) before break so
repeated immediate closures are throttled while still allowing reconnection
attempts.
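
The delay formula quoted above, min(max_delay, base * 2^attempt), can be sketched as a small helper with a reset hook. This is an assumption about shape only: the suggested diff calls `backoff.next_delay()`, but the fields and constructor below are hypothetical, and jitter is deliberately left out because the standard library has no random source.

```rust
use std::time::Duration;

/// Capped exponential backoff: base, 2*base, 4*base, ... up to `max`.
#[derive(Debug)]
pub struct Backoff {
    base: Duration,
    max: Duration,
    attempt: u32,
}

impl Backoff {
    pub fn new(base: Duration, max: Duration) -> Self {
        Self { base, max, attempt: 0 }
    }

    /// Returns the delay for the current attempt and advances the counter.
    /// Saturating arithmetic keeps large attempt counts from overflowing.
    pub fn next_delay(&mut self) -> Duration {
        let factor = 2u32.saturating_pow(self.attempt);
        self.attempt = self.attempt.saturating_add(1);
        self.base.saturating_mul(factor).min(self.max)
    }

    /// Call after a successful connection so the next failure starts at `base`.
    pub fn reset(&mut self) {
        self.attempt = 0;
    }
}
```

In an async reader the returned delay would be passed to a sleep that also listens for shutdown, as the suggested `sleep_or_shutdown` call does.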

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/evm/src/evm_read_interface.rs`:
- Around line 236-248: The recreation loop currently treats an unrecoverable
chain ID mismatch as RetryError::Failure but still continues the outer loop;
change the logic in the provider recreation path (the block that checks
new_chain_id = provider.chain_id() against chain_id and constructs the
RetryError::Failure) so that on an unrecoverable failure you call bus.err(...)
with a descriptive message (including expected vs actual chain IDs) and
immediately return None from the surrounding reader function instead of
continuing; preserve the existing retry behavior only for transient errors (keep
treating transient RetryError cases as retries), and apply the same change to
the other recreation block referenced around the 264-271 area to ensure both
chain-mismatch failures are terminal for this reader instance.
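
The distinction the comment draws, transient errors retry while a chain mismatch ends the reader, might look like the sketch below. The variant names `Retry` and `Failure` come from the comment; this enum's payloads, the `recreate_until_terminal` function, the `report_fatal` callback (standing in for `bus.err(...)`), and the `max_retries` bound are all hypothetical.

```rust
#[derive(Debug, PartialEq)]
pub enum RetryError {
    /// Transient problem: try again.
    Retry(String),
    /// Unrecoverable problem: stop this reader.
    Failure(String),
}

/// Retries transient errors but treats `Failure` as terminal:
/// it reports once and returns `None` instead of looping again.
pub fn recreate_until_terminal<P>(
    mut attempt: impl FnMut() -> Result<P, RetryError>,
    mut report_fatal: impl FnMut(&str),
    max_retries: u32,
) -> Option<P> {
    // Assumption: a retry bound is added here so the sketch always terminates.
    for _ in 0..=max_retries {
        match attempt() {
            Ok(provider) => return Some(provider),
            Err(RetryError::Retry(_)) => continue,
            Err(RetryError::Failure(msg)) => {
                report_fatal(&msg);
                return None;
            }
        }
    }
    None
}
```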

📥 Commits

Reviewing files that changed from the base of the PR and between 000b2e2 and 06b2786.

📒 Files selected for processing (2)

crates/ciphernode-builder/src/ciphernode_builder.rs
crates/evm/src/evm_read_interface.rs

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/evm/src/evm_read_interface.rs`:
- Around line 280-287: recreate_provider returns None both on fatal errors and
when a graceful shutdown is in progress, so avoid emitting Evm errors during
shutdown: change the post-await check to call bus.err(EType::Evm, ...) only when
result.is_none() AND the shutdown signal has NOT been triggered (inspect the
shutdown value passed into recreate_provider — e.g., use shutdown.is_shutdown()
/ shutdown.is_cancelled() / !shutdown.is_closed() or the appropriate method for
that shutdown type); keep the existing return of result.
- Around line 420-424: The match arm handling None (the "Stream ended" case)
currently breaks straight back into the subscribe flow causing a tight reconnect
loop; change this to treat stream termination as a failure path by incrementing
a reconnect counter and applying exponential backoff before retrying, and when
the counter exceeds a threshold recreate the provider/client instance; implement
with tokio::time::sleep for delays, reset the counter on a successful
subscription, and replace the direct break in the None branch (the
warn!(chain_id, "Live event stream ended, will reconnect") location) with
logging that includes the attempt count and either waits (backoff) or triggers
provider recreation when the threshold is hit.
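
Both points can be illustrated with a small std-only sketch. Everything here is hypothetical: `StreamEndTracker`, `NextStep`, and `should_report_error` are illustrative names, the shutdown signal is modelled as an `AtomicBool` rather than whatever type the reader actually uses, and the threshold value is left to the caller.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

#[derive(Debug, PartialEq)]
pub enum NextStep {
    /// Back off and resubscribe on the same provider.
    Resubscribe { attempt: u32 },
    /// Too many consecutive stream ends: rebuild the provider.
    RecreateProvider,
}

/// Counts consecutive stream terminations and escalates at a threshold.
pub struct StreamEndTracker {
    consecutive: u32,
    threshold: u32,
}

impl StreamEndTracker {
    pub fn new(threshold: u32) -> Self {
        Self { consecutive: 0, threshold }
    }

    /// Call when the live stream yields `None`.
    pub fn on_stream_end(&mut self) -> NextStep {
        self.consecutive += 1;
        if self.consecutive >= self.threshold {
            self.consecutive = 0;
            NextStep::RecreateProvider
        } else {
            NextStep::Resubscribe { attempt: self.consecutive }
        }
    }

    /// Call after a successful subscription to clear the streak.
    pub fn on_subscribed(&mut self) {
        self.consecutive = 0;
    }
}

/// Only report an error when recreation failed and no shutdown is in progress.
pub fn should_report_error<P>(result: &Option<P>, shutdown: &AtomicBool) -> bool {
    result.is_none() && !shutdown.load(Ordering::SeqCst)
}
```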

📥 Commits

Reviewing files that changed from the base of the PR and between 06b2786 and 6d5e508.

📒 Files selected for processing (1)

crates/evm/src/evm_read_interface.rs

fix: replace two-layer EVM reconnect loop with single provider lifecycle

000b2e2

hmzakhalid requested a review from ryardley March 2, 2026 11:22

coderabbitai Bot reviewed Mar 2, 2026

Comment thread crates/evm/src/evm_read_interface.rs

hmzakhalid changed the title ~~fix: replace EVM reconnect loop with provider lifecycle~~ fix: replace EVM reconnect loop with provider lifecycle Mar 2, 2026

fix: review comments

06b2786

hmzakhalid requested a review from ctrlc03 March 2, 2026 11:32

vercel Bot deployed to Preview – enclave-docs March 2, 2026 11:33 View deployment

vercel Bot deployed to Preview – crisp March 2, 2026 11:33 View deployment

coderabbitai Bot reviewed Mar 2, 2026

Comment thread crates/evm/src/evm_read_interface.rs Outdated

hmzakhalid changed the title ~~fix: replace EVM reconnect loop with provider lifecycle~~ fix: replace EVM reconnect loop with provider lifecycle Mar 2, 2026

fix: review comments

6d5e508

vercel Bot deployed to Preview – enclave-docs March 2, 2026 11:53 View deployment

vercel Bot deployed to Preview – crisp March 2, 2026 11:54 View deployment

coderabbitai Bot reviewed Mar 2, 2026

Comment thread crates/evm/src/evm_read_interface.rs

Comment thread crates/evm/src/evm_read_interface.rs

fix: review comments

d36aa11

vercel Bot deployed to Preview – enclave-docs March 2, 2026 12:04 View deployment

vercel Bot deployed to Preview – crisp March 2, 2026 12:04 View deployment

ctrlc03 reviewed Mar 2, 2026

Comment thread crates/evm/src/evm_read_interface.rs

ctrlc03 approved these changes Mar 2, 2026

hmzakhalid merged commit cebc2dd into main Mar 2, 2026
26 checks passed

github-actions Bot deleted the fix/evm-pvdr-lifecycle branch March 10, 2026 03:09
