Skip to content

fix: prevent SyncEnded dedup that blocks live EVM events after restart#1343

Merged
ryardley merged 3 commits into
mainfrom
hmza/fix-sync-dedup
Feb 18, 2026
Merged

fix: prevent SyncEnded dedup that blocks live EVM events after restart#1343
ryardley merged 3 commits into
mainfrom
hmza/fix-sync-dedup

Conversation

@hmzakhalid

@hmzakhalid hmzakhalid commented Feb 17, 2026

Copy link
Copy Markdown
Collaborator

On restart, the sync process replayed persisted infrastructure events (SyncEnded, EffectsEnabled) from the EventStore back through the EventBus. Since these events have no fields, they always produce the same EventId hash, poisoning the bloom filter and causing the freshly published SyncEnded to be silently dropped. With the EvmChainGateway never receiving SyncEnded, it remained stuck in BufferUntilLive and all live EVM events were silently buffered rather than dispatched.

Summary by CodeRabbit

  • Bug Fixes

    • Infrastructure events no longer trigger deduplication issues during event replay, ensuring proper re-publication.
  • Refactor

    • Simplified network interface shutdown handling.
    • Enhanced sync event tracking with correlation IDs for improved traceability.
  • Tests

    • Added tests validating infrastructure event detection and filtering during replay.

@vercel

vercel Bot commented Feb 17, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
crisp Ready Ready Preview, Comment Feb 18, 2026 7:33am
enclave-docs Ready Ready Preview, Comment Feb 18, 2026 7:33am

Request Review

@coderabbitai

coderabbitai Bot commented Feb 17, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

These changes refactor event handling and network interface logic: adding infrastructure event filtering during EventStore replay in the sync module, removing internal shutdown state management from the network interface, and extending event structures with correlation ID tracking fields.

Changes

Cohort / File(s) Summary
Infrastructure Event Filtering
crates/sync/src/sync.rs
Introduces is_infrastructure_event helper to identify and skip infrastructure events (SyncEnded, EffectsEnabled, HistoricalEvmSyncStart) during EventStore replay, prevents deduplication issues, adds explicit SyncEnded publication at process end, includes validation and filtering tests.
Network Interface Shutdown Refactor
crates/net/src/net_interface.rs
Removes Arc shutdown flag state management; updates process_swarm_event, process_swarm_command, and handle_shutdown signatures to remove shutdown_flag parameter; changes process_swarm_command return type from Result to Result<()>; consolidates shutdown logic to break loop directly on shutdown command.
Event Structure Enhancements
crates/events/src/enclave_event/enable_effects.rs, crates/events/src/enclave_event/sync_end.rs
Converts EffectsEnabled and SyncEnded from unit-like structs to structs containing pub correlation_id: CorrelationId field; updates constructors and documentation; enables correlation tracking for these infrastructure events.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

bug

Suggested reviewers

  • ctrlc03
  • ryardley

Poem

🐰 Infrastructure events now skip the replay dance,
Correlation IDs track their advance,
Shutdown flags fade like morning dew,
Clean refactored flows breaking through! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: prevent SyncEnded dedup that blocks live EVM events after restart' directly and specifically addresses the core issue being fixed: preventing deduplication of SyncEnded events that was blocking live EVM events after restart. The title accurately summarizes the main change.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch hmza/fix-sync-dedup

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
crates/sync/src/sync.rs (1)

26-33: Add HistoricalNetSyncStart to the filter when re-enabling net sync.

The three variants (SyncEnded, EffectsEnabled, HistoricalEvmSyncStart) map exactly to the events re-published at steps 10, 8, and 5 respectively — this is correct for the current codebase state.

However, the commented-out step 6 block (lines 97–105) would re-publish HistoricalNetSyncStart. When that code is re-enabled, HistoricalNetSyncStart must be added here or the same bloom-filter poisoning will recur for net-sync events.

♻️ Suggested addition when re-enabling net sync
 fn is_infrastructure_event(event: &EnclaveEvent) -> bool {
     matches!(
         event.get_data(),
         EnclaveEventData::SyncEnded(_)
             | EnclaveEventData::EffectsEnabled(_)
             | EnclaveEventData::HistoricalEvmSyncStart(_)
+            | EnclaveEventData::HistoricalNetSyncStart(_)
     )
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/sync/src/sync.rs` around lines 26 - 33, The is_infrastructure_event
filter currently matches EnclaveEventData::SyncEnded, ::EffectsEnabled, and
::HistoricalEvmSyncStart but omits ::HistoricalNetSyncStart; update the matches
in the is_infrastructure_event function to include
EnclaveEventData::HistoricalNetSyncStart so net-sync events are treated as
infrastructure events (preventing bloom-filter poisoning when the commented
net-sync re-publish block is re-enabled).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@crates/sync/src/sync.rs`:
- Around line 26-33: The is_infrastructure_event filter currently matches
EnclaveEventData::SyncEnded, ::EffectsEnabled, and ::HistoricalEvmSyncStart but
omits ::HistoricalNetSyncStart; update the matches in the
is_infrastructure_event function to include
EnclaveEventData::HistoricalNetSyncStart so net-sync events are treated as
infrastructure events (preventing bloom-filter poisoning when the commented
net-sync re-publish block is re-enabled).

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/net/src/net_interface.rs (1)

154-175: ⚠️ Potential issue | 🔴 Critical

Ensure start returns Result<()> after the new break.

Because of the new break on Line 155, the loop no longer diverges, so start now ends with () instead of Result<()>. Add an explicit Ok(()) after the loop (and terminate the loop statement) or break Ok(()) to keep the return type correct.

✅ Suggested fix
-        loop {
+        loop {
             if shutdown_flag.load(Ordering::Relaxed) {
                 break;
             }
             select! {
                 // Process commands
                 Some(command) = cmd_rx.recv() => {
                     match process_swarm_command(&mut self.swarm, &event_tx, &shutdown_flag, &mut correlator, command).await {
                         Ok(_) => (),
                         Err(e) => error!("Error processing NetCommand: {e}")
                     }
                 }
                 // Process events
                 event = self.swarm.select_next_some() =>  {
                     match process_swarm_event(&mut self.swarm, &event_tx, &mut correlator, event).await {
                         Ok(_) => (),
                         Err(e) => error!("Error processing NetEvent: {e}")
                     }
                 }
             }
-        }
+        };
+        Ok(())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/net/src/net_interface.rs` around lines 154 - 175, The loop in start
now can exit via the new break when shutdown_flag is set, so the function's
return type must still be Result<()>; update the loop exit to return Ok(()) by
either using break Ok(()) instead of break or by adding an explicit Ok(()) after
the loop. Locate the start function and the loop that checks shutdown_flag (and
references process_swarm_command/process_swarm_event, cmd_rx, event_tx,
correlator) and ensure the function returns Ok(()) to match Result<()> (e.g.,
change the break to break Ok(()) or add return Ok(()) immediately after the
loop).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@crates/net/src/net_interface.rs`:
- Around line 154-175: The loop in start now can exit via the new break when
shutdown_flag is set, so the function's return type must still be Result<()>;
update the loop exit to return Ok(()) by either using break Ok(()) instead of
break or by adding an explicit Ok(()) after the loop. Locate the start function
and the loop that checks shutdown_flag (and references
process_swarm_command/process_swarm_event, cmd_rx, event_tx, correlator) and
ensure the function returns Ok(()) to match Result<()> (e.g., change the break
to break Ok(()) or add return Ok(()) immediately after the loop).

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/sync/src/sync.rs (1)

26-33: Consider including HistoricalNetSyncStart for completeness.

HistoricalNetSyncStart is structurally identical to HistoricalEvmSyncStart — it also carries a channel sender that becomes stale across restarts. While net sync is currently disabled, if re-enabled, a stored HistoricalNetSyncStart event would replay a dead channel reference. Adding it here prevents a future footgun at zero cost.

♻️ Proposed addition
 fn is_infrastructure_event(event: &EnclaveEvent) -> bool {
     matches!(
         event.get_data(),
         EnclaveEventData::SyncEnded(_)
             | EnclaveEventData::EffectsEnabled(_)
             | EnclaveEventData::HistoricalEvmSyncStart(_)
+            | EnclaveEventData::HistoricalNetSyncStart(_)
     )
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/sync/src/sync.rs` around lines 26 - 33, The is_infrastructure_event
function currently matches EnclaveEventData::HistoricalEvmSyncStart but omits
HistoricalNetSyncStart; update the matches in is_infrastructure_event to also
include EnclaveEventData::HistoricalNetSyncStart so that those events are
treated as infrastructure events (this mirrors the existing handling of
HistoricalEvmSyncStart and prevents replaying stale channel senders).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/sync/src/sync.rs`:
- Around line 68-74: The comment mentions "steps 5, 8, 10" but there is no "//
10." marker and SyncEnded is published after the "// 9." block, so update the
comment to accurately reference the actual step labels or add a "// 10." label
where SyncEnded is published; specifically, edit the block that lists the
skipped events (SyncEnded, EffectsEnabled, HistoricalEvmSyncStart) to either
change "10" to "9" (or the correct number) or add a corresponding "// 10."
comment before the SyncEnded publish to keep the step mapping consistent with
the code (refer to the SyncEnded symbol and the surrounding "// 5." and "// 8."
markers to align numbering).

---

Nitpick comments:
In `@crates/sync/src/sync.rs`:
- Around line 26-33: The is_infrastructure_event function currently matches
EnclaveEventData::HistoricalEvmSyncStart but omits HistoricalNetSyncStart;
update the matches in is_infrastructure_event to also include
EnclaveEventData::HistoricalNetSyncStart so that those events are treated as
infrastructure events (this mirrors the existing handling of
HistoricalEvmSyncStart and prevents replaying stale channel senders).

Comment thread crates/sync/src/sync.rs

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
crates/events/src/enclave_event/mod.rs (1)

431-441: Duplicated cause_fingerprint construction — consider extracting a helper.

The identical tuple construction appears in both from_error (lines 431–440) and new_with_timestamp (lines 630–639). A small helper (e.g., on EventContext) would eliminate the duplication and ensure both paths stay in sync if the fingerprint definition evolves.

♻️ Suggested helper extraction

Add a method on EventContext<Sequenced>:

impl EventContext<Sequenced> {
    pub fn fingerprint(&self) -> (EventId, EventId, EventId, u128, Option<u64>, EventSource) {
        (
            self.id(),
            self.origin_id(),
            self.causation_id(),
            self.ts(),
            self.block(),
            self.source(),
        )
    }
}

Then in both call sites:

-        let cause_fingerprint = caused_by.as_ref().map(|cause| {
-            (
-                cause.id(),
-                cause.origin_id(),
-                cause.causation_id(),
-                cause.ts(),
-                cause.block(),
-                cause.source(),
-            )
-        });
+        let cause_fingerprint = caused_by.as_ref().map(|cause| cause.fingerprint());

Also applies to: 630-640

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/events/src/enclave_event/mod.rs` around lines 431 - 441, Duplicate
tuple construction for the cause fingerprint is used in from_error and
new_with_timestamp; extract a helper on EventContext<Sequenced> (e.g.,
fingerprint or cause_fingerprint) that returns the tuple (EventId, EventId,
EventId, u128, Option<u64>, EventSource) using self.id(), self.origin_id(),
self.causation_id(), self.ts(), self.block(), self.source(), then replace the
inline caused_by.as_ref().map(|cause| ( ... )) and the corresponding
construction in new_with_timestamp with a call to that helper before passing it
to EventId::hash so both code paths use the same fingerprint logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@crates/sync/src/sync.rs`:
- Around line 68-78: Update the explanatory comment for the infrastructure-event
skip in the for loop that uses is_infrastructure_event(&event): replace the
parenthetical "(payload hash)" with a brief note that EventId now includes
timestamp and cause context (so replayed and newly published events won't
necessarily produce identical hashes), but keep the rationale that skipping
these events remains a defensive measure to avoid deduplication/bloom-filter
issues (e.g., SyncEnded). Reference the loop over events and the
is_infrastructure_event function and the EventId change in mod.rs when making
the wording clearer.

---

Nitpick comments:
In `@crates/events/src/enclave_event/mod.rs`:
- Around line 431-441: Duplicate tuple construction for the cause fingerprint is
used in from_error and new_with_timestamp; extract a helper on
EventContext<Sequenced> (e.g., fingerprint or cause_fingerprint) that returns
the tuple (EventId, EventId, EventId, u128, Option<u64>, EventSource) using
self.id(), self.origin_id(), self.causation_id(), self.ts(), self.block(),
self.source(), then replace the inline caused_by.as_ref().map(|cause| ( ... ))
and the corresponding construction in new_with_timestamp with a call to that
helper before passing it to EventId::hash so both code paths use the same
fingerprint logic.

Comment thread crates/events/src/enclave_event/mod.rs Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
crates/events/src/enclave_event/sync_end.rs (1)

19-25: Consider implementing Default via new()

Same as EffectsEnabled: a no-arg new() should have a matching Default implementation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/events/src/enclave_event/sync_end.rs` around lines 19 - 25, The
SyncEnded struct has a no-arg constructor SyncEnded::new() but no Default impl;
add impl Default for SyncEnded that simply calls SyncEnded::new() so callers can
use default(), mirroring the pattern used for EffectsEnabled. Locate the
SyncEnded struct and its impl block containing pub fn new() -> Self and add an
impl Default for SyncEnded { fn default() -> Self { Self::new() } } to provide
the standard Default behavior.
crates/events/src/enclave_event/enable_effects.rs (1)

19-25: Consider implementing Default via new()

Since new() takes no arguments, the idiomatic Rust convention is to also provide impl Default delegating to it. This enables use in generic bounds (T: Default), struct field defaults, and ..Default::default() spread syntax.

♻️ Proposed addition
+impl Default for EffectsEnabled {
+    fn default() -> Self {
+        Self::new()
+    }
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/events/src/enclave_event/enable_effects.rs` around lines 19 - 25, Add
an impl Default for EffectsEnabled that delegates to EffectsEnabled::new();
specifically implement pub fn default() -> Self { Self::new() } so callers can
rely on the Default trait (e.g., generic bounds, ..Default::default() and struct
field defaults) while reusing the existing new() constructor; modify/locate the
impl for EffectsEnabled (the new() method) to add this Default implementation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/events/src/enclave_event/enable_effects.rs`:
- Around line 13-17: EffectsEnabled and SyncEnded were added with a new
correlation_id field but lack serde defaults, causing old empty-event
deserialization to fail; implement Default for CorrelationId (returning a
nil/zero value or provide a helper fn that creates a placeholder) and add
#[serde(default)] to the correlation_id field in both the EffectsEnabled and
SyncEnded structs so old persisted empty objects deserialize to a placeholder
CorrelationId and the existing filtering in sync.rs can handle them; update
correlation_id.rs with the Default impl (or helper) and then add
#[serde(default)] to the correlation_id field declarations in enable_effects.rs
and sync_end.rs.

In `@crates/events/src/enclave_event/sync_end.rs`:
- Around line 13-17: Add serde backward-compatibility to the SyncEnded event by
marking its correlation_id field with a serde default so old persisted events
without that field can deserialize; update the SyncEnded struct (symbol:
SyncEnded) to apply #[serde(default)] or #[serde(default =
"CorrelationId::new")] to the correlation_id field (same approach used for
EffectsEnabled) so deserialization of legacy events succeeds and the sync filter
in sync.rs can run.

---

Nitpick comments:
In `@crates/events/src/enclave_event/enable_effects.rs`:
- Around line 19-25: Add an impl Default for EffectsEnabled that delegates to
EffectsEnabled::new(); specifically implement pub fn default() -> Self {
Self::new() } so callers can rely on the Default trait (e.g., generic bounds,
..Default::default() and struct field defaults) while reusing the existing new()
constructor; modify/locate the impl for EffectsEnabled (the new() method) to add
this Default implementation.

In `@crates/events/src/enclave_event/sync_end.rs`:
- Around line 19-25: The SyncEnded struct has a no-arg constructor
SyncEnded::new() but no Default impl; add impl Default for SyncEnded that simply
calls SyncEnded::new() so callers can use default(), mirroring the pattern used
for EffectsEnabled. Locate the SyncEnded struct and its impl block containing
pub fn new() -> Self and add an impl Default for SyncEnded { fn default() ->
Self { Self::new() } } to provide the standard Default behavior.

Comment thread crates/events/src/enclave_event/enable_effects.rs
Comment thread crates/events/src/enclave_event/sync_end.rs

@ryardley ryardley left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one!

@ryardley ryardley merged commit c205e60 into main Feb 18, 2026
27 checks passed
@github-actions github-actions Bot deleted the hmza/fix-sync-dedup branch February 26, 2026 03:13
@coderabbitai coderabbitai Bot mentioned this pull request Mar 3, 2026
14 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants