fix: replace panic crash in snapshot buffer with do_send by hmzakhalid · Pull Request #1430 · theinterfold/interfold

hmzakhalid · 2026-03-16T17:32:20Z

We faced the receiver issue again on the the aggregator side and it crashes during startup sync with:

MAJOR ISSUE: Failure!.
The error supplied was: send failed because receiver is full
As a precaution we are crashing the system.

When the node starts and replays historical EVM events the snapshot buffer pipeline overwhelms downstream actor mailboxes, since they all use try_send() wrapped in trap, when any mailbox fills up, the try_send returns SendError::Full, trap catches it and panics, crashing the entire process.

The trap + PanicDispatcher does not work well here since this is not a real error condition, it's transient backpressure during burst load. The data is valid, the actors are healthy, they just can't drain fast enough during sync replay and mailbox limit.

Summary by CodeRabbit

New Features
- Added asynchronous event dispatch with error propagation.
Improvements
- Streamlined event batching and routing for improved reliability.
- Simplified event sequencer message handling with configurable mailbox capacity.
- Enhanced historical event publishing with async dispatch semantics.

vercel · 2026-03-16T17:32:25Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
crisp	Ready	Preview, Comment	Mar 17, 2026 1:43am
enclave-docs	Ready	Preview, Comment	Mar 17, 2026 1:43am

coderabbitai · 2026-03-16T17:32:34Z

📝 Walkthrough

Walkthrough

This PR introduces async event dispatch capabilities and refactors event handling to remove trap-based error patterns. Key changes include adding naked_dispatch_async to BusHandle for asynchronous event forwarding, configuring mailbox capacity in the Sequencer, and replacing error-handling wrappers with direct do_send calls throughout snapshot buffer components. The sync module is updated to use awaited async dispatch for historical event publishing.

Changes

Cohort / File(s)	Summary
Event Bus & Dispatch `crates/events/src/bus_handle.rs`	Added public async method `naked_dispatch_async` that forwards events to sequencer via `send` and awaits the result with error propagation, complementing the fire-and-forget `naked_dispatch`.
Sequencer Actor Setup `crates/events/src/sequencer.rs`	Added started hook that initializes mailbox capacity to `MAILBOX_LIMIT` and imported the constant from `e3_utils`.
Snapshot Buffer Refactoring `crates/events/src/snapshot_buffer/batch.rs`, `crates/events/src/snapshot_buffer/batch_router.rs`, `crates/events/src/snapshot_buffer/snapshot_buffer.rs`, `crates/events/src/snapshot_buffer/timelock_queue.rs`	Replaced trap-based error handling and `try_send` patterns with unconditional `do_send` calls. Removed `PanicDispatcher` and `EType` imports. Simplified control flow in Insert/EnclaveEvent/FlushSeq handlers; BatchRouter now handles missing prev_agg with error logging rather than panic traps; added trace logging for Insert forwarding.
Sync Module `crates/sync/src/sync.rs`	Changed historical event publishing from synchronous `naked_dispatch` (fire-and-forget) to asynchronous `naked_dispatch_async` with awaited results and error propagation.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

feat!: implement simple fake sequencer [skip-line-limit] #1073: Modifies BusHandle dispatching variants and Sequencer actor in the same events subsystem.
fix: dht pruning & mailbox removal #1351: Replaces fallible actix message-sending patterns with do_send and touches Sequencer mailbox capacity setup.
fix: prevent SyncEnded dedup that blocks live EVM events after restart #1343: Modifies the replay/publish flow in crates/sync/src/sync.rs with changes to event filtering in historical event processing.

Suggested reviewers

ryardley
ctrlc03

Poem

🐰 Async hops across the event bus so fast,
Traps removed, do_sends unsurpassed!
Sequencers spin with mailbox grace,
History publishes at measured pace—
Awaiting results in the async embrace! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix: replace panic crash in snapshot buffer with do_send' directly and clearly summarizes the main change: replacing panic-based error handling with do_send calls throughout the snapshot buffer pipeline to avoid crashes during burst load.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix/batch-rtr-backpressure

📝 Coding Plan

Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Tip

You can enable review details to help with troubleshooting, context usage and more.

Enable the reviews.review_details setting to include review details such as the model used, the time taken for each step and more in the review comments.

coderabbitai

🧹 Nitpick comments (1)

crates/events/src/snapshot_buffer/timelock_queue.rs (1)
161-167: Verify: Popped timelocks cannot be retried if do_send silently drops.

The timelock is popped from the heap before do_send. If the BatchRouter mailbox were full, do_send would silently drop the FlushSeq and the batch would never flush.

However, this is an acceptable trade-off because:

BatchRouter has MAILBOX_LIMIT (256) capacity

The previous try_send + trap approach crashed the entire system

During normal operation, having 256+ pending flush messages is unlikely

The current approach prevents crashes at the cost of a theoretical (but highly improbable) silent drop scenario.
💡 Optional: Add trace logging for observability
         while self.timelocks.len() > 0 && self.next_timelock_lt(now_time) {
             if let Some(tl) = self.timelocks.pop() {
                 let seq = tl.0.seq;
                 debug!("Flushing seq {}", seq);
                 self.batch_router.do_send(FlushSeq(seq));
             }
         }
Consider adding a trace log after do_send to aid debugging if flush issues arise in production.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/events/src/snapshot_buffer/timelock_queue.rs` around lines 161 - 167,
The loop currently pops timelocks and calls
self.batch_router.do_send(FlushSeq(seq)) which can silently drop messages; keep
the current pop-and-fire behavior but add an observability trace: after calling
do_send in the while loop inside timelock processing (references: timelocks,
next_timelock_lt, batch_router, do_send, FlushSeq, MAILBOX_LIMIT) emit a
trace/debug log that records the seq being flushed and current timelocks length
(and optionally note MAILBOX_LIMIT) so dropped flushes are discoverable in logs
without changing the send semantics.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@crates/events/src/snapshot_buffer/timelock_queue.rs`:
- Around line 161-167: The loop currently pops timelocks and calls
self.batch_router.do_send(FlushSeq(seq)) which can silently drop messages; keep
the current pop-and-fire behavior but add an observability trace: after calling
do_send in the while loop inside timelock processing (references: timelocks,
next_timelock_lt, batch_router, do_send, FlushSeq, MAILBOX_LIMIT) emit a
trace/debug log that records the seq being flushed and current timelocks length
(and optionally note MAILBOX_LIMIT) so dropped flushes are discoverable in logs
without changing the send semantics.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 383c1baf-0d78-4869-b9f1-7d239e995f0f

📥 Commits

Reviewing files that changed from the base of the PR and between e06bcbc and e48db65.

📒 Files selected for processing (7)

crates/events/src/bus_handle.rs
crates/events/src/sequencer.rs
crates/events/src/snapshot_buffer/batch.rs
crates/events/src/snapshot_buffer/batch_router.rs
crates/events/src/snapshot_buffer/snapshot_buffer.rs
crates/events/src/snapshot_buffer/timelock_queue.rs
crates/sync/src/sync.rs

fix: replace panic in snapshot buffer

e48db65

hmzakhalid requested a review from ryardley March 16, 2026 17:32

hmzakhalid changed the title ~~fix: replace panic crash in snapshot buffer with do_send~~ fix: replace panic crash in snapshot buffer with do_send Mar 16, 2026

coderabbitai Bot reviewed Mar 16, 2026

View reviewed changes

Merge branch 'main' into fix/batch-rtr-backpressure

f28a2ba

ryardley approved these changes Mar 17, 2026

View reviewed changes

vercel Bot deployed to Preview – enclave-docs March 17, 2026 01:42 View deployment

vercel Bot deployed to Preview – crisp March 17, 2026 01:43 View deployment

ryardley enabled auto-merge (squash) March 17, 2026 02:06

ryardley merged commit b6d3cd0 into main Mar 17, 2026
27 checks passed

github-actions Bot deleted the fix/batch-rtr-backpressure branch March 24, 2026 03:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: replace panic crash in snapshot buffer with do_send#1430

fix: replace panic crash in snapshot buffer with do_send#1430
ryardley merged 2 commits into
mainfrom
fix/batch-rtr-backpressure

hmzakhalid commented Mar 16, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

vercel Bot commented Mar 16, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Mar 16, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

hmzakhalid commented Mar 16, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

vercel Bot commented Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

hmzakhalid commented Mar 16, 2026 •

edited by coderabbitai Bot

Loading

vercel Bot commented Mar 16, 2026 •

edited

Loading

coderabbitai Bot commented Mar 16, 2026 •

edited

Loading