
fix: replace EVM reconnect loop with provider lifecycle #1376

Merged
hmzakhalid merged 4 commits into main from fix/evm-pvdr-lifecycle on Mar 2, 2026


Conversation

hmzakhalid (Collaborator) commented Mar 2, 2026

Closes #1375

The EVM read interface was stuck in an infinite reconnect loop because it kept retrying RPC calls on a dead WebSocket transport instead of escalating to provider recreation.

Summary by CodeRabbit

  • New Features

    • Pluggable provider factories for on-demand read-provider recreation.
    • Builder API to attach per-chain provider factories.
    • Public options to start read streams with or without a provider factory.
    • New extractor callback type for log parsing.
  • Bug Fixes

    • Automatic provider reconnection with exponential backoff and health checks.
    • Unified live/backfill streaming with graceful recovery on transport failures.
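The per-chain builder API can be pictured with a std-only sketch. Only the method name `with_provider_factory` comes from this PR; `ChainBuilder`, the `Factory` alias, the `u64` chain-ID keys, and the use of a plain URL string as the "provider" are hypothetical simplifications, and the real signature on `EvmSystemChainBuilder` may differ.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical synchronous factory: each call yields a fresh "provider"
/// (simplified here to an RPC URL string).
pub type Factory = Arc<dyn Fn() -> String + Send + Sync>;

/// Hypothetical builder that records at most one factory per chain ID.
#[derive(Default)]
pub struct ChainBuilder {
    factories: HashMap<u64, Factory>,
}

impl ChainBuilder {
    /// Attach (or replace) the factory used to recreate the read provider for `chain_id`.
    pub fn with_provider_factory(mut self, chain_id: u64, factory: Factory) -> Self {
        self.factories.insert(chain_id, factory);
        self
    }

    /// Build a new provider for `chain_id`; `None` means no factory was attached,
    /// so the reader would have to run without on-demand recreation.
    pub fn recreate(&self, chain_id: u64) -> Option<String> {
        self.factories.get(&chain_id).map(|factory| factory())
    }
}
```

In this sketch a chain without a registered factory returns `None`, which stands in for the "without a provider factory" start option; how the real reader behaves in that case is not spelled out in the PR.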

vercel (Bot) commented Mar 2, 2026

The latest updates on your projects:

Project | Deployment | Actions | Updated (UTC)
crisp | Ready | Preview, Comment | Mar 2, 2026 0:04am
enclave-docs | Ready | Preview, Comment | Mar 2, 2026 0:04am

hmzakhalid requested a review from ryardley March 2, 2026 11:22

coderabbitai (Bot, Contributor) commented Mar 2, 2026

Warning

Rate limit exceeded

@hmzakhalid has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 18 minutes and 54 seconds before requesting another review.


📥 Commits

Reviewing files that changed from the base of the PR and between 6d5e508 and d36aa11.

📒 Files selected for processing (1)
  • crates/evm/src/evm_read_interface.rs
📝 Walkthrough


Adds a ProviderFactory abstraction and threads it through the builders and the EVM reader. EvmReadInterface now accepts an optional provider factory and implements provider recreation with exponential backoff and transport-death recovery. The builder APIs expose with_provider_factory to inject factories.

Changes

Cohort | File(s) | Summary
Provider helpers & parser | crates/evm/src/helpers.rs, crates/evm/src/evm_parser.rs | Adds ProviderFactory<P> type alias and ProviderConfig::into_read_provider_factory(); introduces ExtractorFn alias and adjusts imports.
EVM read interface (lifecycle & reconnection) | crates/evm/src/evm_read_interface.rs | Adds provider_factory: Option<ProviderFactory<P>>, setup_with_factory entrypoint, unified live/backfill loop, provider recreation/health-check helpers, and exponential backoff logic.
EVM system wiring / builder | crates/ciphernode-builder/src/evm_system.rs, crates/ciphernode-builder/src/ciphernode_builder.rs | Adds with_provider_factory(...) to EvmSystemChainBuilder; imports ProviderConfig, converts it to a ProviderFactory via into_read_provider_factory(), and passes it into the builder during setup.
Manifest | Cargo.toml | Small dependency/manifest edits (+7/-1 lines).
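The table describes `ProviderFactory<P>` only as a type alias. A common Rust shape for a cloneable async factory is an `Arc`-wrapped closure returning a boxed future, and the sketch below assumes that shape. `MockProvider`, `mock_factory`, the `String` error type, and the busy-polling `block_on` (included only so the example runs without an async runtime; real code would `.await` on Tokio) are all hypothetical.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Boxed, sendable future: a common way to return async work from a closure.
pub type BoxFuture<T> = Pin<Box<dyn Future<Output = T> + Send>>;

/// Assumed shape of the alias: a shareable closure that builds a fresh provider.
pub type ProviderFactory<P> = Arc<dyn Fn() -> BoxFuture<Result<P, String>> + Send + Sync>;

/// Hypothetical stand-in for `EthProvider`; `generation` counts recreations.
#[derive(Debug, PartialEq)]
pub struct MockProvider {
    pub url: String,
    pub generation: u32,
}

/// Hypothetical analogue of what `into_read_provider_factory()` might return.
pub fn mock_factory(url: &str) -> ProviderFactory<MockProvider> {
    let url = url.to_string();
    let created = Arc::new(AtomicU32::new(0));
    Arc::new(move || -> BoxFuture<Result<MockProvider, String>> {
        let url = url.clone();
        let generation = created.fetch_add(1, Ordering::SeqCst) + 1;
        // A real factory would open a brand-new WebSocket transport here.
        Box::pin(async move { Ok::<_, String>(MockProvider { url, generation }) })
    })
}

/// Waker whose callbacks do nothing (demo only).
fn noop_raw_waker() -> RawWaker {
    fn no_op(_: *const ()) {}
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
    RawWaker::new(std::ptr::null(), &VTABLE)
}

/// Tiny executor for the demo: polls until ready. Busy-polling is acceptable
/// here only because the mock futures complete on their first poll.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    // SAFETY: the vtable functions never dereference the (null) data pointer.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Because the factory is an `Arc`, the builder and the reader can each hold a clone and call it whenever the current transport is declared dead; each call produces an independent provider.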

Sequence Diagram(s)

sequenceDiagram
  participant Builder as EvmSystemChainBuilder
  participant Factory as ProviderFactory
  participant ReadIF as EvmReadInterface
  participant Provider as EthProvider
  participant RPC as RPC/Transport

  Builder->>Factory: obtain factory (ProviderConfig.into_read_provider_factory)
  Factory->>Provider: async create EthProvider
  ReadIF->>Provider: perform backfill / subscribe to events
  Provider->>RPC: open transport / stream
  alt transport dies or healthcheck fails
    ReadIF->>Factory: call factory() to recreate provider
    Factory->>Provider: create new EthProvider
    ReadIF->>Provider: resume backfill/subscribe after backoff
  end
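The branches in the diagram can be modeled with a small synchronous simulation. This is a hypothetical illustration, not the reader's code: `Session`, `SessionEnd`, `run`, and the resume cursor are invented here. It assumes that a server-side stream close loops back to backfill and resubscribe on the same provider (as the PR's own code comment on the stream-end branch describes), while a dead transport calls the factory for a new provider. The cursor models resuming backfill without re-emitting blocks already handled.

```rust
/// How one live session ended (hypothetical model of the diagram's branches).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionEnd {
    /// Server closed the subscription; the provider is still usable.
    StreamClosed,
    /// The transport died; the provider must be rebuilt via the factory.
    TransportDead,
}

/// Block numbers seen during one backfill + live session, plus how it ended.
pub struct Session {
    pub blocks: Vec<u64>,
    pub end: SessionEnd,
}

#[derive(Debug, PartialEq)]
pub struct Report {
    pub processed: Vec<u64>,
    pub providers_created: u32,
}

/// Run sessions in order. Blocks at or below the resume cursor are skipped,
/// because backfill after a reconnect may overlap what was already handled.
pub fn run(sessions: Vec<Session>) -> Report {
    let mut processed = Vec::new();
    let mut providers_created = 1; // the initial provider
    let mut cursor: Option<u64> = None; // highest block handled so far
    let total = sessions.len();
    for (i, session) in sessions.into_iter().enumerate() {
        for block in session.blocks {
            if cursor.map_or(true, |c| block > c) {
                processed.push(block);
                cursor = Some(block);
            }
        }
        // Rebuild only on transport death, and only if another session follows.
        if session.end == SessionEnd::TransportDead && i + 1 < total {
            providers_created += 1; // stands in for calling factory()
        }
    }
    Report { processed, providers_created }
}
```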

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

bug, ciphernode

Suggested reviewers

  • ctrlc03
  • ryardley

Poem

🐰 I nibble code and stitch a thread,
A factory wakes when transports are dead,
Backoff counted, providers born anew,
The reader hops on, steady and true —
Hooray for fixes, stitched in blue!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 5.26%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title accurately describes the main change: replacing EVM reconnect loop logic with a provider lifecycle approach to fix the infinite reconnect issue.
Linked Issues check | ✅ Passed | The PR directly addresses issue #1375 by implementing provider lifecycle management with backoff and recreation logic to escape infinite reconnect loops on dead WebSocket transports.
Out of Scope Changes check | ✅ Passed | All changes are scoped to provider lifecycle management and provider factory wiring across the EVM system, directly supporting the fix for the infinite reconnect loop issue.


coderabbitai (Bot, Contributor) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/evm/src/evm_read_interface.rs`:
- Around line 232-239: The health check currently calls
provider.provider().get_block_number() to verify liveness but does not validate
chain identity; update the health check in the provider recreation block (where
provider, chain_id and RetryError::Retry are used) to also fetch the remote
chain ID (e.g., via provider.provider().get_chainid() or equivalent), compare it
to the expected chain_id, and if they differ log a warning including both IDs
and return a RetryError (similar to the existing health-check error) so the
recreated provider is rejected when the endpoint points to a different chain.
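A minimal sketch of the suggested check, assuming the two `RetryError` variants named in these reviews carry a `String` message. `ChainProbe` and `FixedProbe` are hypothetical stand-ins for the provider's block-number and chain-ID RPC calls. Treating a chain-ID mismatch as `Failure` rather than `Retry` is a design choice in the sketch: retrying the same endpoint would keep reporting the same wrong chain.

```rust
/// Mirrors the two variants named in the review; the payloads are assumed.
#[derive(Debug, PartialEq)]
pub enum RetryError {
    /// Transient: try again after backoff.
    Retry(String),
    /// Unrecoverable: stop retrying.
    Failure(String),
}

/// Hypothetical view of the two RPC calls the health check needs.
pub trait ChainProbe {
    fn block_number(&self) -> Result<u64, String>;
    fn chain_id(&self) -> Result<u64, String>;
}

/// Accept a recreated provider only if it is live AND on the expected chain.
/// Returns the current head block on success.
pub fn health_check<P: ChainProbe>(provider: &P, expected_chain_id: u64) -> Result<u64, RetryError> {
    let head = provider
        .block_number()
        .map_err(|e| RetryError::Retry(format!("health check failed: {e}")))?;
    let remote = provider
        .chain_id()
        .map_err(|e| RetryError::Retry(format!("chain id query failed: {e}")))?;
    if remote != expected_chain_id {
        return Err(RetryError::Failure(format!(
            "chain id mismatch: expected {expected_chain_id}, got {remote}"
        )));
    }
    Ok(head)
}

/// Hypothetical test double with fixed answers.
pub struct FixedProbe {
    pub head: Result<u64, String>,
    pub chain: u64,
}

impl ChainProbe for FixedProbe {
    fn block_number(&self) -> Result<u64, String> {
        self.head.clone()
    }
    fn chain_id(&self) -> Result<u64, String> {
        Ok(self.chain)
    }
}
```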

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 080bb5a and 000b2e2.

📒 Files selected for processing (5)
  • crates/ciphernode-builder/src/ciphernode_builder.rs
  • crates/ciphernode-builder/src/evm_system.rs
  • crates/evm/src/evm_parser.rs
  • crates/evm/src/evm_read_interface.rs
  • crates/evm/src/helpers.rs

Comment thread crates/evm/src/evm_read_interface.rs
hmzakhalid changed the title to "fix: replace EVM reconnect loop with provider lifecycle" on Mar 2, 2026

coderabbitai (Bot, Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/evm/src/evm_read_interface.rs (1)

Lines 421-425: Add backoff on repeated live-stream ends to avoid reconnect churn.

When stream.next() returns None, the code immediately re-enters backfill/resubscribe with no delay. If the server keeps closing the stream quickly, this can hot-loop and spam reconnect logs.

♻️ Suggested tweak
                                 None => {
                                     // Stream ended (server-side close, idle timeout, etc.)
-                                    // Loop back to backfill + resubscribe with no penalty.
+                                    // Loop back to backfill + resubscribe with backoff.
                                     warn!(chain_id, "Live event stream ended, will reconnect");
+                                    if sleep_or_shutdown(backoff.next_delay(), &mut shutdown).await {
+                                        return;
+                                    }
                                     break;
                                 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/evm/src/evm_read_interface.rs` around lines 421-425: when handling
the case where stream.next() returns None (the branch that currently logs
warn!(chain_id, "Live event stream ended, will reconnect") and break), add a
small exponential backoff with jitter before breaking/re-entering the
backfill/resubscribe loop to avoid tight reconnect churn; implement a backoff
counter (reset on successful connection) and use tokio::time::sleep with a
computed delay (e.g., min(max_delay, base * 2^attempt) ± jitter) before break so
repeated immediate closures are throttled while still allowing reconnection
attempts.
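The delay formula in the prompt, `min(max_delay, base * 2^attempt) ± jitter`, can be sketched as a small `Backoff` type. The suggested diff does call `backoff.next_delay()`, but this type, its fields, and `next_delay_with` (which takes the random jitter sample from the caller so the sketch needs no random-number crate) are assumptions, not the PR's implementation.

```rust
use std::time::Duration;

/// Hypothetical exponential backoff: delay = min(max, base * 2^attempt), then jittered.
pub struct Backoff {
    base: Duration,
    max: Duration,
    /// Fraction of the delay to jitter by, e.g. 0.2 for +/-20%.
    jitter: f64,
    attempt: u32,
}

impl Backoff {
    pub fn new(base: Duration, max: Duration, jitter: f64) -> Self {
        Self { base, max, jitter, attempt: 0 }
    }

    /// `sample` is a caller-supplied random value, clamped to [-1.0, 1.0].
    pub fn next_delay_with(&mut self, sample: f64) -> Duration {
        // 2^attempt, saturating instead of overflowing during long outages.
        let factor = 1u32.checked_shl(self.attempt).unwrap_or(u32::MAX);
        let capped = self.base.saturating_mul(factor).min(self.max);
        self.attempt = self.attempt.saturating_add(1);
        let scale = 1.0 + self.jitter * sample.clamp(-1.0, 1.0);
        capped.mul_f64(scale.max(0.0))
    }

    /// Call after a successful subscription so the next failure starts small again.
    pub fn reset(&mut self) {
        self.attempt = 0;
    }
}
```

Following the prompt, jitter is applied after the cap, so a jittered delay can exceed `max` by up to the jitter fraction; clamping after jitter instead would make `max` a hard ceiling.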
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/evm/src/evm_read_interface.rs`:
- Around line 236-248: The recreation loop currently treats an unrecoverable
chain ID mismatch as RetryError::Failure but still continues the outer loop;
change the logic in the provider recreation path (the block that checks
new_chain_id = provider.chain_id() against chain_id and constructs the
RetryError::Failure) so that on an unrecoverable failure you call bus.err(...)
with a descriptive message (including expected vs actual chain IDs) and
immediately return None from the surrounding reader function instead of
continuing; preserve the existing retry behavior only for transient errors (keep
treating transient RetryError cases as retries), and apply the same change to
the other recreation block referenced around the 264-271 area to ensure both
chain-mismatch failures are terminal for this reader instance.
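The requested control flow (retry transient errors, stop on an unrecoverable one) can be sketched as follows. `recreate_with_retries`, the `max_attempts` bound, and the `errors` vector standing in for `bus.err(...)` are hypothetical; the PR does not say whether transient retries are bounded, so the cap is only there to keep the sketch finite.

```rust
/// Mirrors the two variants named in the review; the payloads are assumed.
#[derive(Debug, PartialEq)]
pub enum RetryError {
    Retry(String),
    Failure(String),
}

/// Call `attempt` until it succeeds, a terminal `Failure` occurs, or
/// `max_attempts` is reached. `errors` is a stand-in for `bus.err(...)`.
pub fn recreate_with_retries<P>(
    max_attempts: u32,
    errors: &mut Vec<String>,
    mut attempt: impl FnMut() -> Result<P, RetryError>,
) -> Option<P> {
    for _ in 0..max_attempts {
        match attempt() {
            Ok(provider) => return Some(provider),
            // Transient: a real loop would sleep with backoff before retrying.
            Err(RetryError::Retry(_)) => continue,
            // Terminal (e.g. chain ID mismatch): report once and stop this reader.
            Err(RetryError::Failure(reason)) => {
                errors.push(format!("provider recreation failed: {reason}"));
                return None;
            }
        }
    }
    errors.push(format!("gave up after {max_attempts} attempts"));
    None
}
```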


ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 000b2e2 and 06b2786.

📒 Files selected for processing (2)
  • crates/ciphernode-builder/src/ciphernode_builder.rs
  • crates/evm/src/evm_read_interface.rs

Comment thread crates/evm/src/evm_read_interface.rs Outdated
hmzakhalid changed the title to "fix: replace EVM reconnect loop with provider lifecycle" on Mar 2, 2026

coderabbitai (Bot, Contributor) left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/evm/src/evm_read_interface.rs`:
- Around line 280-287: recreate_provider returns None both on fatal errors and
when a graceful shutdown is in progress, so avoid emitting Evm errors during
shutdown: change the post-await check to call bus.err(EType::Evm, ...) only when
result.is_none() AND the shutdown signal has NOT been triggered (inspect the
shutdown value passed into recreate_provider — e.g., use shutdown.is_shutdown()
/ shutdown.is_cancelled() / !shutdown.is_closed() or the appropriate method for
that shutdown type); keep the existing return of result.
- Around line 420-424: The match arm handling None (the "Stream ended" case)
currently breaks straight back into the subscribe flow causing a tight reconnect
loop; change this to treat stream termination as a failure path by incrementing
a reconnect counter and applying exponential backoff before retrying, and when
the counter exceeds a threshold recreate the provider/client instance; implement
with tokio::time::sleep for delays, reset the counter on a successful
subscription, and replace the direct break in the None branch (the
warn!(chain_id, "Live event stream ended, will reconnect") location) with
logging that includes the attempt count and either waits (backoff) or triggers
provider recreation when the threshold is hit.
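Both suggestions can be sketched with std only. `StreamEndPolicy`, `Action`, `should_report_error`, the threshold value, and the use of an `AtomicBool` as the shutdown signal are hypothetical; the real shutdown handle passed to `recreate_provider` may be a different type, as the comment itself notes.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// What the live loop should do after `stream.next()` returns `None`.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// Sleep for backoff step `attempt` (not shown), then resubscribe.
    Resubscribe { attempt: u32 },
    /// Too many consecutive ends: rebuild the provider via the factory.
    RecreateProvider,
}

/// Hypothetical counter of consecutive stream terminations.
pub struct StreamEndPolicy {
    consecutive_ends: u32,
    threshold: u32,
}

impl StreamEndPolicy {
    pub fn new(threshold: u32) -> Self {
        Self { consecutive_ends: 0, threshold }
    }

    /// Called from the `None` arm when the live stream ends.
    pub fn on_stream_end(&mut self) -> Action {
        self.consecutive_ends += 1;
        if self.consecutive_ends >= self.threshold {
            // Assumed: give the fresh provider a clean slate.
            self.consecutive_ends = 0;
            Action::RecreateProvider
        } else {
            Action::Resubscribe { attempt: self.consecutive_ends }
        }
    }

    /// Called once a subscription is established successfully.
    pub fn on_subscribed(&mut self) {
        self.consecutive_ends = 0;
    }
}

/// Emit an EVM error only for a failed recreation that was NOT caused by shutdown.
pub fn should_report_error(recreated: bool, shutdown: &AtomicBool) -> bool {
    !recreated && !shutdown.load(Ordering::SeqCst)
}
```

Resetting the counter after `RecreateProvider` is an assumption: it gives the new provider the full threshold before another recreation is triggered.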

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 06b2786 and 6d5e508.

📒 Files selected for processing (1)
  • crates/evm/src/evm_read_interface.rs

Comment thread crates/evm/src/evm_read_interface.rs
Comment thread crates/evm/src/evm_read_interface.rs
Comment thread crates/evm/src/evm_read_interface.rs
hmzakhalid merged commit cebc2dd into main on Mar 2, 2026
26 checks passed
github-actions (Bot) deleted the fix/evm-pvdr-lifecycle branch on March 10, 2026 03:09

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

EVM reader gets stuck in infinite reconnect loop on dead WebSocket transport

2 participants