feat: add chunked historical log fetching with RPC block range by hmzakhalid · Pull Request #1331 · theinterfold/interfold

hmzakhalid · 2026-02-16T05:39:08Z

Closes #1330
After wiping node data (rm -rf .config/enclave/.enclave/data), nodes fail to synchronize historical blockchain events and are never selected for E3 committees. The node appears to start successfully but silently drops all historical state, resulting in zero sortition participation.

The EvmReadInterface made a single unbounded eth_getLogs call from deploy_block to latest. Most RPC providers limit eth_getLogs to a few thousand blocks, so the call failed for long block ranges and triggered an early return that killed both the historical sync and the live event subscription. Without historical events, ticketPrice stayed at 0, so every node had 0 available tickets and sortition selected nobody.

To fix this:

- Split the historical fetch into chunks. If a chunk fails, we retry it 3 times with backoff. If it still fails after that, we log the failure and do not move on: a node that can't get historical data cannot participate going forward, since it won't have the state of the other ciphernodes.
- Added WebSocket auto-reconnect with exponential backoff instead of just dying.
- Replaced the hard 30 second timeout with an idle check loop. For large historical syncs this means we won't falsely time out while actively syncing a lot of data.
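
The chunk-and-retry behaviour described above can be sketched roughly as follows. This is a simplified, synchronous, std-only illustration, not the code in this PR: `chunk_ranges`, `fetch_chunked`, and the `fetch` callback are hypothetical names, and the real implementation is async and talks to an RPC provider.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Split the inclusive block range [start, end] into inclusive chunks of at
/// most `chunk_size` blocks. (Hypothetical helper for illustration.)
fn chunk_ranges(start: u64, end: u64, chunk_size: u64) -> Vec<(u64, u64)> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut ranges = Vec::new();
    let mut from = start;
    while from <= end {
        let to = from.saturating_add(chunk_size - 1).min(end);
        ranges.push((from, to));
        if to == u64::MAX {
            break; // avoid overflow on `to + 1`
        }
        from = to + 1;
    }
    ranges
}

/// Fetch every chunk in order. A failing chunk is retried up to `max_retries`
/// times with exponential backoff; if it still fails, the whole fetch stops
/// and returns the error instead of skipping ahead.
fn fetch_chunked<T, E>(
    start: u64,
    end: u64,
    chunk_size: u64,
    max_retries: u32,
    base_delay: Duration,
    mut fetch: impl FnMut(u64, u64) -> Result<Vec<T>, E>,
) -> Result<Vec<T>, E> {
    let mut out = Vec::new();
    for (from, to) in chunk_ranges(start, end, chunk_size) {
        let mut attempt = 0;
        loop {
            match fetch(from, to) {
                Ok(mut logs) => {
                    out.append(&mut logs);
                    break;
                }
                // Retries exhausted: give up on the whole range.
                Err(e) if attempt >= max_retries => return Err(e),
                Err(_) => {
                    sleep(base_delay * 2u32.pow(attempt));
                    attempt += 1;
                }
            }
        }
    }
    Ok(out)
}
```

With `max_retries = 3`, a chunk is attempted at most four times (one initial call plus three retries) before the error is surfaced.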

Summary by CodeRabbit

Bug Fixes
- More robust EVM sync initialization: buffered handling of late historical-sync events, safer state transitions, and warning logs.
- Improved historical collection: heartbeat-based progress reporting, resilient handling of channel closures, and removal of the outer timeout wrapper.
New Features
- Chunked historical log fetching with per-chunk retries, gap backfilling, and resumable live subscription with exponential reconnect backoff.
- Start-point-aware historical fetches and a timestamp cache to reduce provider calls.
- New pluggable log-fetching abstraction and added runtime logs for selection and sync progress.
Tests
- Comprehensive tests covering chunked fetch, retries, backfill, and log processing.

vercel · 2026-02-16T05:39:12Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
crisp	Ready	Preview, Comment	Feb 17, 2026 4:17am
enclave-docs	Ready	Preview, Comment	Feb 17, 2026 4:17am

coderabbitai · 2026-02-16T05:39:35Z

No actionable comments were generated in the recent review. 🎉

Walkthrough

Restructures EVM ingestion and sync lifecycle: richer SyncStatus with Init buffering and BufferUntilLive/Live states; introduces chunked, retrying log fetcher and live subscribe with backfill and reconnect/backoff; removes outer timeout from historical collection; adds selection logging in sortition; adds tokio dev-dep and new internal log_fetcher module.

Changes

Cohort / File(s)	Summary
EVM Sync State & Gateway `crates/evm/src/evm_chain_gateway.rs`	SyncStatus changed to `Init { buffer, pending_sync_complete }`, added `BufferUntilLive` and `Live`. `forward_to_sync_actor` returns `(buffer, pending_sync_complete)`. HistoricalSyncComplete can be buffered during Init (warn logged) and processed after draining buffer.
EVM Read Interface & Filters `crates/evm/src/evm_read_interface.rs`	Added `start_block` to `Filters`; replaced inline historical+live logic with chunked historical fetch (`fetch_logs_chunked`) and `subscribe_live_events` (reconnect/backoff, backfill_to_head, shutdown handling); emits `HistoricalSyncComplete`.
Log fetcher subsystem `crates/evm/src/log_fetcher.rs`, `crates/evm/src/lib.rs`	New `log_fetcher` module: `LogProvider` trait, `fetch_logs_chunked`, `backfill_to_head`, `process_log`, `TimestampTracker`, per-chunk retries and exponential backoff, and unit tests. `lib.rs` registers module.
Sync collection loop `crates/sync/src/sync.rs`	Removed outer `max_dur` timeout from `collect_historical_evm_events`; now uses a 30s progress interval and per-iteration recv timeout with heartbeat/progress logs. Signature changed (dropped Duration param).
Sortition logging `crates/sortition/src/backends.rs`, `crates/sortition/src/sortition.rs`	Clone `e3_id` to avoid moves and add informational logs for selected nodes and selection decisions; behavior unchanged.
Dev/test deps `crates/evm/Cargo.toml`	Added `tokio` as a dev-dependency (`features = ["test-util"]`) for tests.

Sequence Diagram

sequenceDiagram
    participant Reader as EVM Read Interface / LogFetcher
    participant Bus as Event Bus
    participant Gateway as EVM Chain Gateway
    participant SyncActor as Sync Actor

    Reader->>Reader: fetch_logs_chunked(start..head) (chunked, retries)
    loop per chunk
        Reader->>Bus: emit EnclaveEvent<Unsequenced>
    end
    Reader->>Gateway: HistoricalSyncComplete(last_id, latest_block)

    alt Gateway in Init
        Gateway->>Gateway: store pending_sync_complete (warn)
        Gateway->>Gateway: buffer incoming events
    else Gateway forwarded to SyncActor
        Gateway->>SyncActor: forward_to_sync_actor(buffer, pending_sync_complete)
        SyncActor-->>Gateway: ack/processing
        Gateway->>Gateway: transition -> BufferUntilLive
        SyncActor->>Bus: process/drain buffered events
    end

    Gateway->>Gateway: transition BufferUntilLive -> Live
    Reader->>Reader: subscribe_live_events (backoff, backfill_to_head)
    loop live logs
        Reader->>Bus: emit live EnclaveEvent<Unsequenced>
    end

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

feat: sync mode preparation [skip-line-limit] #1153: Overlaps changes to EvmChainGateway and SyncStatus buffering/forwarding logic.
Historical evm events #173: Related refactor of EVM historical/live fetch and subscription logic (log fetching/backfill).
feat: ticket score sortition #698: Related changes to sortition selection and logging in the same files.

Suggested reviewers

ctrlc03
ryardley
0xjei

Poem

🐇
I nibble chunks of logs at dawn and dusk,
Retry my hops when networks fuss and cuss.
I buffer, warn, then pass the carrot on —
Live streams bloom and old gaps are gone.
Hooray for synced hops! 🥕

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 44.44% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title accurately describes the main feature: adding chunked historical log fetching with RPC block range support, which directly aligns with the core problem and solution described in the PR objectives.
Linked Issues check	✅ Passed	The changes comprehensively address issue `#1330` by implementing chunked log fetching, retry logic, WebSocket auto-reconnect with backoff, and improved timeout handling to ensure nodes can synchronize historical events and participate in committee selection.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to fixing the historical event synchronization issue: chunked fetching in log_fetcher.rs, enhanced sync status handling in evm_chain_gateway.rs, improved timeout logic in sync.rs, and supporting refactors in evm_read_interface.rs and sortition modules.
Merge Conflict Detection	✅ Passed	✅ No merge conflicts detected when merging into `main`

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

crates/sync/src/sync.rs (1)

115-158: ⚠️ Potential issue | 🟠 Major

The progress loop will spin indefinitely if a chain's historical fetch fails and the sender is never dropped.

If fetch_historical_logs_chunked fails (line 27 of evm_read_interface.rs), stream_from_evm returns early without sending HistoricalSyncComplete. This means the sender held by EvmChainGateway is never dropped: it is only released when buffer_until_live() is called in response to HistoricalSyncComplete (line 195 of evm_chain_gateway.rs). As a result, collect_historical_evm_events will time out every 30 seconds indefinitely, logging "Still waiting for historical events from chains" repeatedly, with no upper bound on total wait time and no way to detect that a chain will never report. The sync process hangs silently rather than failing.

Add a maximum total timeout or a mechanism for failed chains to explicitly signal the collector (e.g., send a sentinel/error variant), to guarantee that sync() eventually unblocks and returns an error.
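
One way to read this suggestion is sketched below. It is a synchronous, std-only illustration built on `std::sync::mpsc` rather than the async channels the crate uses; `ChainReport`, `collect_historical`, and the error strings are hypothetical. It combines both proposed mechanisms: an explicit failure variant and an overall deadline.

```rust
use std::collections::HashSet;
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Hypothetical message type: a chain reports either its historical events
/// or an explicit failure, so the collector never waits on a dead chain.
enum ChainReport {
    Complete { chain_id: u64, events: Vec<String> },
    Failed { chain_id: u64, reason: String },
}

fn collect_historical(
    rx: &Receiver<ChainReport>,
    expected: &HashSet<u64>,
    idle_interval: Duration,
    max_total: Duration,
) -> Result<Vec<String>, String> {
    let deadline = Instant::now() + max_total;
    let mut received = HashSet::new();
    let mut results = Vec::new();
    while received.len() < expected.len() {
        // Upper bound on the total wait, independent of the idle interval.
        if Instant::now() >= deadline {
            return Err("timed out waiting for historical events".to_string());
        }
        match rx.recv_timeout(idle_interval) {
            Ok(ChainReport::Complete { chain_id, mut events }) => {
                // Ignore unexpected or duplicate chain ids.
                if expected.contains(&chain_id) && received.insert(chain_id) {
                    results.append(&mut events);
                }
            }
            Ok(ChainReport::Failed { chain_id, reason }) => {
                return Err(format!("chain {chain_id} failed historical sync: {reason}"));
            }
            // Idle tick: a real collector would log a heartbeat here.
            Err(RecvTimeoutError::Timeout) => continue,
            Err(RecvTimeoutError::Disconnected) => {
                return Err("all senders dropped before sync completed".to_string());
            }
        }
    }
    Ok(results)
}
```

Either mechanism alone would unblock `sync()`. The failure variant reports the cause immediately, while the deadline covers a reader that dies without sending anything.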

🤖 Fix all issues with AI agents

In `@crates/evm/src/evm_read_interface.rs`:
- Around line 258-299: The historical fetch (fetch_historical_logs_chunked)
returns last_id but the live subscription (subscribe_live_events) starts from
Latest, creating a gap for blocks mined between fetch completion and
subscription; modify the flow so subscribe_live_events is given the last fetched
block (last_id) and uses from_block = last_id + 1 (or re-fetch a small
overlapping range and deduplicate) — pass last_id (or computed latest_block)
into subscribe_live_events and update its logic to accept a starting
BlockNumber/NumberOrTag and start the live subscription from that block instead
of BlockNumberOrTag::Latest.

🧹 Nitpick comments (6)

crates/sortition/src/backends.rs (1)

206-219: Consider extracting the duplicated logging block.

Lines 167-178 and 208-219 are identical. A small helper (e.g., log_sortition_result) taking &e3_id, chain_id, size, and &winners would eliminate the duplication. Not urgent given only two call sites.

crates/sync/src/sync.rs (1)

125-153: Duplicate or unexpected messages from the same chain_id are silently discarded.

Line 128 checks !received.contains(&msg.chain_id), which correctly deduplicates. However, if an unexpected chain_id arrives (not in expected), it is also silently dropped. This is fine for correctness but worth noting for debuggability — a debug! or warn! log for unexpected chain IDs would help diagnose misconfiguration.
🔧 Optional: log unexpected chain IDs
             if expected.contains(&msg.chain_id) && !received.contains(&msg.chain_id) {
                 info!(
                     chain_id = msg.chain_id,
                     events = msg.events.len(),
                     chains_received = received.len() + 1,
                     chains_expected = expected.len(),
                     "Received historical events from chain"
                 );
                 received.insert(msg.chain_id);
                 results.append(&mut msg.events);
+            } else if !expected.contains(&msg.chain_id) {
+                warn!(chain_id = msg.chain_id, "Received events from unexpected chain, ignoring");
             }

crates/evm/src/evm_read_interface.rs (2)

301-368: Reconnection backoff sleep is not interruptible by shutdown.

On lines 314-319, the reconnect sleep can last up to 60 seconds. During this time, the shutdown signal (checked on line 322 via try_recv) is not polled. This means a shutdown request could be delayed by up to a minute.
🔧 Use `select!` to make sleep interruptible
         if reconnect_attempt > 0 {
             let delay_secs = (2u64.pow(reconnect_attempt.min(6))).min(MAX_RECONNECT_DELAY_SECS);
             warn!(
                 chain_id,
                 reconnect_attempt, delay_secs, "Reconnecting to live event stream"
             );
-            tokio::time::sleep(std::time::Duration::from_secs(delay_secs)).await;
-        }
-
-        if shutdown.try_recv().is_ok() {
-            info!("Shutdown signal received, stopping EVM stream");
-            return;
+            select! {
+                _ = tokio::time::sleep(std::time::Duration::from_secs(delay_secs)) => {}
+                _ = &mut *shutdown => {
+                    info!("Shutdown signal received during reconnect backoff, stopping EVM stream");
+                    return;
+                }
+            }
+        } else if shutdown.try_recv().is_ok() {
+            info!("Shutdown signal received, stopping EVM stream");
+            return;
         }
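
For reference, the delay expression in the snippet above produces the schedule sketched here. The value `60` for `MAX_RECONNECT_DELAY_SECS` is an assumption inferred from the "up to 60 seconds" remark in this comment, and `reconnect_delay_secs` is a hypothetical wrapper around the same expression.

```rust
/// Assumed cap, inferred from the "up to 60 seconds" remark in the review.
const MAX_RECONNECT_DELAY_SECS: u64 = 60;

/// Same expression as in the reconnect loop: 2^attempt seconds, with the
/// exponent clamped to 6 and the result clamped to the cap.
fn reconnect_delay_secs(reconnect_attempt: u32) -> u64 {
    2u64.pow(reconnect_attempt.min(6)).min(MAX_RECONNECT_DELAY_SECS)
}
```

Under that assumption, attempts 1 through 5 wait 2, 4, 8, 16, and 32 seconds, and every later attempt waits 60 seconds, because 2^6 = 64 is clamped to the cap.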

27-29: Constants are reasonable defaults; consider making them configurable.

GET_LOGS_CHUNK_SIZE = 10_000 is a safe default for most RPC providers (many limit to 10k blocks). However, some providers have lower limits (e.g., 2,000 blocks) while others support much higher ranges. Making the chunk size configurable (e.g., via environment variable or config) would improve flexibility without code changes.

This is non-blocking — the current defaults are sensible for the immediate fix.
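
A minimal sketch of the environment-variable option might look like this. The variable name `EVM_GET_LOGS_CHUNK_SIZE` and both function names are hypothetical; the PR itself keeps the constant hard-coded.

```rust
/// Default taken from the constant discussed above.
const DEFAULT_GET_LOGS_CHUNK_SIZE: u64 = 10_000;

/// Parse an optional override, falling back to the default when the value
/// is missing, not a number, or zero.
fn parse_chunk_size(raw: Option<&str>) -> u64 {
    raw.and_then(|s| s.trim().parse::<u64>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(DEFAULT_GET_LOGS_CHUNK_SIZE)
}

/// Read the override from a (hypothetical) environment variable.
fn chunk_size_from_env() -> u64 {
    parse_chunk_size(std::env::var("EVM_GET_LOGS_CHUNK_SIZE").ok().as_deref())
}
```

Keeping the parsing separate from the environment lookup makes the fallback rules testable without mutating process state.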

crates/evm/src/evm_chain_gateway.rs (2)

184-197: Second HistoricalSyncComplete during Init silently overwrites the first.

If forward_historical_sync_complete is called twice while still in Init, line 195 overwrites pending_sync_complete with the new event, losing the first. Since each EvmChainGateway should be per-chain and receive only one HistoricalSyncComplete, this shouldn't happen in practice — but a debug assertion or warning would make the invariant explicit and aid future debugging.
🔧 Optional: warn on unexpected duplicate
         if let SyncStatus::Init {
             pending_sync_complete,
             ..
         } = &mut self.status
         {
+            if pending_sync_complete.is_some() {
+                warn!(
+                    chain_id = event.chain_id,
+                    "Received duplicate HistoricalSyncComplete during Init, overwriting"
+                );
+            }
             warn!(
                 chain_id = event.chain_id,
                 "HistoricalSyncComplete arrived during Init, buffering"
             );
             *pending_sync_complete = Some(event);
             return Ok(());
         }

273-380: Test does not cover the new Init-buffering scenario.

The existing test verifies the happy path (SyncStart → events → SyncComplete → SyncEnded). Consider adding a test where HistoricalSyncComplete arrives before HistoricalEvmSyncStart to exercise the new pending_sync_complete buffering path — this is the race condition that motivated the state machine change.

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@crates/evm/src/evm_read_interface.rs`:
- Line 342: The subscription with provider.subscribe_logs(filter) won't backfill
when using Geth because fromBlock is ignored; after the match that creates the
subscription, call provider.get_logs or provider.request("eth_getLogs", ...) to
fetch logs from (result.latest_block + 1) through the current chain head (use
provider.get_block_number()/block_number) and feed those logs into the same
processing/dedup pipeline (respecting the existing ids: HashSet<EventId>), then
continue consuming the live subscription; ensure you compute the range using
result.latest_block and the current block number and merge results so overlaps
are deduplicated by ids.
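
The backfill-and-deduplicate step in the prompt above could be sketched as follows. This is a std-only illustration: `LogKey` is a hypothetical stand-in for the crate's `EventId`, and the `fetch` callback stands in for an `eth_getLogs` request over the given inclusive range.

```rust
use std::collections::HashSet;

/// Hypothetical identity of a log entry, standing in for `EventId`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct LogKey {
    block: u64,
    log_index: u64,
}

/// Fetch the gap between the last historically synced block and the current
/// head, returning only logs that have not been seen before.
fn backfill_gap(
    last_synced: u64,
    head: u64,
    seen: &mut HashSet<LogKey>,
    fetch: impl FnOnce(u64, u64) -> Vec<LogKey>,
) -> Vec<LogKey> {
    if head <= last_synced {
        return Vec::new(); // nothing was mined in between
    }
    fetch(last_synced + 1, head)
        .into_iter()
        // `insert` returns false for keys already delivered, so any overlap
        // with the live subscription is dropped here.
        .filter(|key| seen.insert(*key))
        .collect()
}
```

Because the same `seen` set is shared with the live subscription path, a log delivered by both the backfill and the subscription is forwarded only once.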

🧹 Nitpick comments (1)

crates/evm/src/evm_read_interface.rs (1)

376-379: bus.err() called on every reconnect failure may flood downstream.

Each failed subscription attempt emits an error to the bus. If the provider is unavailable for an extended period, this generates a steady stream of error events at exponential-backoff intervals. Consider emitting the error only on the first failure and on periodic intervals, or after a maximum retry threshold is reached.

ryardley

Added a couple of ideas. I think as this gets more complex we should have a test around this so we are all clear on the behavior. I think it would be better if we can avoid passing pending_sync_complete as this is leaking state information outside of the reader.

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@crates/evm/src/log_fetcher.rs`:
- Around line 144-186: The current backfill_to_head updates *last_block only on
full success, causing already-sent events to be re-fetched if fetch_logs_chunked
fails mid-way; change the flow so progress is committed incrementally: modify
fetch_logs_chunked (or add a new helper) to return the highest block it
successfully processed (or accept &mut last_block and update it after each
successfully-sent chunk), then in backfill_to_head use that returned
highest_processed_block to set *last_block even on partial success so
already-delivered chunks are not re-sent; reference functions/vars:
backfill_to_head, fetch_logs_chunked, *last_block, and next (EvmEventProcessor)
when locating where to update progress.
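
The incremental-commit idea in the prompt above can be illustrated with a small synchronous sketch. `backfill_to_head_sketch` and the `fetch_and_send` callback are hypothetical simplifications; the real `backfill_to_head` is async and works with an `EvmEventProcessor`.

```rust
/// Backfill from `*last_block + 1` up to `head` in chunks, committing
/// progress after every chunk that was fetched and delivered successfully.
fn backfill_to_head_sketch<E>(
    last_block: &mut u64,
    head: u64,
    chunk_size: u64,
    mut fetch_and_send: impl FnMut(u64, u64) -> Result<(), E>,
) -> Result<(), E> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    while *last_block < head {
        let from = *last_block + 1;
        let to = from.saturating_add(chunk_size - 1).min(head);
        // On error we return early; `last_block` still reflects every chunk
        // delivered so far, so a retry resumes after them.
        fetch_and_send(from, to)?;
        *last_block = to;
    }
    Ok(())
}
```

If the third of three chunks fails, `last_block` already points at the end of the second chunk, so the next attempt requests only the remaining range.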

🧹 Nitpick comments (2)

crates/evm/src/log_fetcher.rs (2)

35-41: Silent failure when block timestamp fetch fails.

fetch_block_timestamp swallows errors via .ok() with no logging. Combined with TimestampTracker::get using unwrap_or(0) (line 210), a failed timestamp lookup silently produces 0 — which could confuse downstream consumers expecting real timestamps.
♻️ Add a warning when timestamp fetch fails
     async fn fetch_block_timestamp(&self, block_number: u64) -> Option<u64> {
-        self.get_block_by_number(block_number.into())
+        match self.get_block_by_number(block_number.into())
             .await
-            .ok()
-            .flatten()
-            .map(|b| b.header.timestamp)
+        {
+            Ok(Some(b)) => Some(b.header.timestamp),
+            Ok(None) => {
+                warn!(block_number, "Block not found when fetching timestamp");
+                None
+            }
+            Err(e) => {
+                warn!(block_number, error = %e, "Failed to fetch block timestamp");
+                None
+            }
+        }
     }

264-283: Consider asserting on the filter ranges passed to the mock.

The MockLogProvider::fetch_logs ignores the filter parameter, so tests don't verify that the correct block ranges are passed for each chunk. A recording mock that captures (from_block, to_block) per call would catch chunk-boundary regressions.
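
A recording mock along these lines might look like the sketch below. It is a synchronous, std-only stand-in: `RecordingProvider` and `fetch_in_chunks` are hypothetical, whereas the real `LogProvider` trait is async and takes a filter rather than two block numbers.

```rust
use std::cell::RefCell;

/// Hypothetical mock that records every requested range instead of
/// ignoring its arguments.
struct RecordingProvider {
    calls: RefCell<Vec<(u64, u64)>>,
}

impl RecordingProvider {
    fn new() -> Self {
        Self { calls: RefCell::new(Vec::new()) }
    }

    /// Record the inclusive range and return no logs.
    fn fetch_logs(&self, from_block: u64, to_block: u64) -> Result<Vec<u64>, String> {
        self.calls.borrow_mut().push((from_block, to_block));
        Ok(Vec::new())
    }
}

/// Simplified chunk driver used only to exercise the mock.
fn fetch_in_chunks(
    provider: &RecordingProvider,
    start: u64,
    end: u64,
    chunk_size: u64,
) -> Result<(), String> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut from = start;
    while from <= end {
        let to = from.saturating_add(chunk_size - 1).min(end);
        provider.fetch_logs(from, to)?;
        if to == u64::MAX {
            break; // avoid overflow on `to + 1`
        }
        from = to + 1;
    }
    Ok(())
}
```

A test can then compare `provider.calls` against the expected list of ranges, which would catch an off-by-one at a chunk boundary (for example an overlapping or skipped block).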

hmzakhalid added 3 commits February 16, 2026 10:08

feat: add chunked historical log fetching with RPC block range support

d8b9529

chore: add sortition logging

ce8b8f3

fix: crash on failure

894e9ea

hmzakhalid requested review from ctrlc03 and ryardley February 16, 2026 05:39

hmzakhalid self-assigned this Feb 16, 2026

hmzakhalid added bug Something isn't working ciphernode Related to the ciphernode package labels Feb 16, 2026

hmzakhalid linked an issue Feb 16, 2026 that may be closed by this pull request

Investigate Sortition Selection issue #1330

Closed

Merge branch 'main' into fix/chunked-historical-event-fetch

ba3d042

vercel Bot deployed to Preview – enclave-docs February 16, 2026 05:40 View deployment

vercel Bot deployed to Preview – crisp February 16, 2026 05:40 View deployment

coderabbitai Bot reviewed Feb 16, 2026

Comment thread crates/evm/src/evm_read_interface.rs Outdated

Comment thread crates/evm/src/evm_read_interface.rs

fix: review comments

51c9293

vercel Bot deployed to Preview – enclave-docs February 16, 2026 06:02 View deployment

vercel Bot deployed to Preview – crisp February 16, 2026 06:03 View deployment

coderabbitai Bot reviewed Feb 16, 2026

Comment thread crates/evm/src/evm_read_interface.rs

ctrlc03 linked an issue Feb 16, 2026 that may be closed by this pull request

Issue with sortition, 1 out of 3 nodes get selected #1309

Closed

ryardley reviewed Feb 16, 2026

Comment thread crates/evm/src/evm_chain_gateway.rs

ryardley reviewed Feb 16, 2026

Comment thread crates/evm/src/evm_read_interface.rs Outdated

ryardley reviewed Feb 16, 2026

ctrlc03 mentioned this pull request Feb 16, 2026

feat: integrate share proofs in ciphernode [skip-line-limit] #1334

Merged

hmzakhalid and others added 2 commits February 16, 2026 22:23

feat: log fetcher unit tests

ac22e4a

Merge branch 'main' into fix/chunked-historical-event-fetch

60136e3

vercel Bot deployed to Preview – enclave-docs February 16, 2026 17:31 View deployment

vercel Bot deployed to Preview – crisp February 16, 2026 17:32 View deployment

coderabbitai Bot reviewed Feb 16, 2026

Comment thread crates/evm/src/log_fetcher.rs

fix: partial fetch log suceeding

59a0534

vercel Bot deployed to Preview – enclave-docs February 16, 2026 18:17 View deployment

vercel Bot deployed to Preview – crisp February 16, 2026 18:17 View deployment

hmzakhalid requested a review from ryardley February 17, 2026 04:12

Merge branch 'main' into fix/chunked-historical-event-fetch

c0d22fd

vercel Bot deployed to Preview – enclave-docs February 17, 2026 04:17 View deployment

vercel Bot deployed to Preview – crisp February 17, 2026 04:17 View deployment

ryardley approved these changes Feb 17, 2026

hmzakhalid merged commit 689e56c into main Feb 17, 2026
26 checks passed

ctrlc03 deleted the fix/chunked-historical-event-fetch branch February 17, 2026 10:31

coderabbitai Bot mentioned this pull request Mar 2, 2026

fix: replace EVM reconnect loop with provider lifecycle #1376

Merged
