fix: prevent SyncEnded dedup that blocks live EVM events after restart#1343
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThese changes refactor event handling and network interface logic: adding infrastructure event filtering during EventStore replay in the sync module, removing internal shutdown state management from the network interface, and extending event structures with correlation ID tracking fields. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/sync/src/sync.rs (1)
26-33: AddHistoricalNetSyncStartto the filter when re-enabling net sync.The three variants (
SyncEnded,EffectsEnabled,HistoricalEvmSyncStart) map exactly to the events re-published at steps 10, 8, and 5 respectively — this is correct for the current codebase state.However, the commented-out step 6 block (lines 97–105) would re-publish
HistoricalNetSyncStart. When that code is re-enabled,HistoricalNetSyncStartmust be added here or the same bloom-filter poisoning will recur for net-sync events.♻️ Suggested addition when re-enabling net sync
fn is_infrastructure_event(event: &EnclaveEvent) -> bool { matches!( event.get_data(), EnclaveEventData::SyncEnded(_) | EnclaveEventData::EffectsEnabled(_) | EnclaveEventData::HistoricalEvmSyncStart(_) + | EnclaveEventData::HistoricalNetSyncStart(_) ) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/sync/src/sync.rs` around lines 26 - 33, The is_infrastructure_event filter currently matches EnclaveEventData::SyncEnded, ::EffectsEnabled, and ::HistoricalEvmSyncStart but omits ::HistoricalNetSyncStart; update the matches in the is_infrastructure_event function to include EnclaveEventData::HistoricalNetSyncStart so net-sync events are treated as infrastructure events (preventing bloom-filter poisoning when the commented net-sync re-publish block is re-enabled).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@crates/sync/src/sync.rs`:
- Around line 26-33: The is_infrastructure_event filter currently matches
EnclaveEventData::SyncEnded, ::EffectsEnabled, and ::HistoricalEvmSyncStart but
omits ::HistoricalNetSyncStart; update the matches in the
is_infrastructure_event function to include
EnclaveEventData::HistoricalNetSyncStart so net-sync events are treated as
infrastructure events (preventing bloom-filter poisoning when the commented
net-sync re-publish block is re-enabled).
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/net/src/net_interface.rs (1)
154-175:⚠️ Potential issue | 🔴 CriticalEnsure
startreturnsResult<()>after the newbreak.Because of the new
breakon Line 155, the loop no longer diverges, sostartnow ends with()instead ofResult<()>. Add an explicitOk(())after the loop (and terminate the loop statement) orbreak Ok(())to keep the return type correct.✅ Suggested fix
- loop { + loop { if shutdown_flag.load(Ordering::Relaxed) { break; } select! { // Process commands Some(command) = cmd_rx.recv() => { match process_swarm_command(&mut self.swarm, &event_tx, &shutdown_flag, &mut correlator, command).await { Ok(_) => (), Err(e) => error!("Error processing NetCommand: {e}") } } // Process events event = self.swarm.select_next_some() => { match process_swarm_event(&mut self.swarm, &event_tx, &mut correlator, event).await { Ok(_) => (), Err(e) => error!("Error processing NetEvent: {e}") } } } - } + }; + Ok(())🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/net/src/net_interface.rs` around lines 154 - 175, The loop in start now can exit via the new break when shutdown_flag is set, so the function's return type must still be Result<()>; update the loop exit to return Ok(()) by either using break Ok(()) instead of break or by adding an explicit Ok(()) after the loop. Locate the start function and the loop that checks shutdown_flag (and references process_swarm_command/process_swarm_event, cmd_rx, event_tx, correlator) and ensure the function returns Ok(()) to match Result<()> (e.g., change the break to break Ok(()) or add return Ok(()) immediately after the loop).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@crates/net/src/net_interface.rs`:
- Around line 154-175: The loop in start now can exit via the new break when
shutdown_flag is set, so the function's return type must still be Result<()>;
update the loop exit to return Ok(()) by either using break Ok(()) instead of
break or by adding an explicit Ok(()) after the loop. Locate the start function
and the loop that checks shutdown_flag (and references
process_swarm_command/process_swarm_event, cmd_rx, event_tx, correlator) and
ensure the function returns Ok(()) to match Result<()> (e.g., change the break
to break Ok(()) or add return Ok(()) immediately after the loop).
2779934 to
709fbfc
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
crates/sync/src/sync.rs (1)
26-33: Consider includingHistoricalNetSyncStartfor completeness.
HistoricalNetSyncStartis structurally identical toHistoricalEvmSyncStart— it also carries a channel sender that becomes stale across restarts. While net sync is currently disabled, if re-enabled, a storedHistoricalNetSyncStartevent would replay a dead channel reference. Adding it here prevents a future footgun at zero cost.♻️ Proposed addition
fn is_infrastructure_event(event: &EnclaveEvent) -> bool { matches!( event.get_data(), EnclaveEventData::SyncEnded(_) | EnclaveEventData::EffectsEnabled(_) | EnclaveEventData::HistoricalEvmSyncStart(_) + | EnclaveEventData::HistoricalNetSyncStart(_) ) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/sync/src/sync.rs` around lines 26 - 33, The is_infrastructure_event function currently matches EnclaveEventData::HistoricalEvmSyncStart but omits HistoricalNetSyncStart; update the matches in is_infrastructure_event to also include EnclaveEventData::HistoricalNetSyncStart so that those events are treated as infrastructure events (this mirrors the existing handling of HistoricalEvmSyncStart and prevents replaying stale channel senders).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/sync/src/sync.rs`:
- Around line 68-74: The comment mentions "steps 5, 8, 10" but there is no "//
10." marker and SyncEnded is published after the "// 9." block, so update the
comment to accurately reference the actual step labels or add a "// 10." label
where SyncEnded is published; specifically, edit the block that lists the
skipped events (SyncEnded, EffectsEnabled, HistoricalEvmSyncStart) to either
change "10" to "9" (or the correct number) or add a corresponding "// 10."
comment before the SyncEnded publish to keep the step mapping consistent with
the code (refer to the SyncEnded symbol and the surrounding "// 5." and "// 8."
markers to align numbering).
---
Nitpick comments:
In `@crates/sync/src/sync.rs`:
- Around line 26-33: The is_infrastructure_event function currently matches
EnclaveEventData::HistoricalEvmSyncStart but omits HistoricalNetSyncStart;
update the matches in is_infrastructure_event to also include
EnclaveEventData::HistoricalNetSyncStart so that those events are treated as
infrastructure events (this mirrors the existing handling of
HistoricalEvmSyncStart and prevents replaying stale channel senders).
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/events/src/enclave_event/mod.rs (1)
431-441: Duplicatedcause_fingerprintconstruction — consider extracting a helper.The identical tuple construction appears in both
from_error(lines 431–440) andnew_with_timestamp(lines 630–639). A small helper (e.g., onEventContext) would eliminate the duplication and ensure both paths stay in sync if the fingerprint definition evolves.♻️ Suggested helper extraction
Add a method on
EventContext<Sequenced>:impl EventContext<Sequenced> { pub fn fingerprint(&self) -> (EventId, EventId, EventId, u128, Option<u64>, EventSource) { ( self.id(), self.origin_id(), self.causation_id(), self.ts(), self.block(), self.source(), ) } }Then in both call sites:
- let cause_fingerprint = caused_by.as_ref().map(|cause| { - ( - cause.id(), - cause.origin_id(), - cause.causation_id(), - cause.ts(), - cause.block(), - cause.source(), - ) - }); + let cause_fingerprint = caused_by.as_ref().map(|cause| cause.fingerprint());Also applies to: 630-640
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/events/src/enclave_event/mod.rs` around lines 431 - 441, Duplicate tuple construction for the cause fingerprint is used in from_error and new_with_timestamp; extract a helper on EventContext<Sequenced> (e.g., fingerprint or cause_fingerprint) that returns the tuple (EventId, EventId, EventId, u128, Option<u64>, EventSource) using self.id(), self.origin_id(), self.causation_id(), self.ts(), self.block(), self.source(), then replace the inline caused_by.as_ref().map(|cause| ( ... )) and the corresponding construction in new_with_timestamp with a call to that helper before passing it to EventId::hash so both code paths use the same fingerprint logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@crates/sync/src/sync.rs`:
- Around line 68-78: Update the explanatory comment for the infrastructure-event
skip in the for loop that uses is_infrastructure_event(&event): replace the
parenthetical "(payload hash)" with a brief note that EventId now includes
timestamp and cause context (so replayed and newly published events won't
necessarily produce identical hashes), but keep the rationale that skipping
these events remains a defensive measure to avoid deduplication/bloom-filter
issues (e.g., SyncEnded). Reference the loop over events and the
is_infrastructure_event function and the EventId change in mod.rs when making
the wording clearer.
---
Nitpick comments:
In `@crates/events/src/enclave_event/mod.rs`:
- Around line 431-441: Duplicate tuple construction for the cause fingerprint is
used in from_error and new_with_timestamp; extract a helper on
EventContext<Sequenced> (e.g., fingerprint or cause_fingerprint) that returns
the tuple (EventId, EventId, EventId, u128, Option<u64>, EventSource) using
self.id(), self.origin_id(), self.causation_id(), self.ts(), self.block(),
self.source(), then replace the inline caused_by.as_ref().map(|cause| ( ... ))
and the corresponding construction in new_with_timestamp with a call to that
helper before passing it to EventId::hash so both code paths use the same
fingerprint logic.
c29a942 to
dfa0cac
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
crates/events/src/enclave_event/sync_end.rs (1)
19-25: Consider implementingDefaultvianew()Same as
EffectsEnabled: a no-argnew()should have a matchingDefaultimplementation.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/events/src/enclave_event/sync_end.rs` around lines 19 - 25, The SyncEnded struct has a no-arg constructor SyncEnded::new() but no Default impl; add impl Default for SyncEnded that simply calls SyncEnded::new() so callers can use default(), mirroring the pattern used for EffectsEnabled. Locate the SyncEnded struct and its impl block containing pub fn new() -> Self and add an impl Default for SyncEnded { fn default() -> Self { Self::new() } } to provide the standard Default behavior.crates/events/src/enclave_event/enable_effects.rs (1)
19-25: Consider implementingDefaultvianew()Since
new()takes no arguments, the idiomatic Rust convention is to also provideimpl Defaultdelegating to it. This enables use in generic bounds (T: Default), struct field defaults, and..Default::default()spread syntax.♻️ Proposed addition
+impl Default for EffectsEnabled { + fn default() -> Self { + Self::new() + } +}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/events/src/enclave_event/enable_effects.rs` around lines 19 - 25, Add an impl Default for EffectsEnabled that delegates to EffectsEnabled::new(); specifically implement pub fn default() -> Self { Self::new() } so callers can rely on the Default trait (e.g., generic bounds, ..Default::default() and struct field defaults) while reusing the existing new() constructor; modify/locate the impl for EffectsEnabled (the new() method) to add this Default implementation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/events/src/enclave_event/enable_effects.rs`:
- Around line 13-17: EffectsEnabled and SyncEnded were added with a new
correlation_id field but lack serde defaults, causing old empty-event
deserialization to fail; implement Default for CorrelationId (returning a
nil/zero value or provide a helper fn that creates a placeholder) and add
#[serde(default)] to the correlation_id field in both the EffectsEnabled and
SyncEnded structs so old persisted empty objects deserialize to a placeholder
CorrelationId and the existing filtering in sync.rs can handle them; update
correlation_id.rs with the Default impl (or helper) and then add
#[serde(default)] to the correlation_id field declarations in enable_effects.rs
and sync_end.rs.
In `@crates/events/src/enclave_event/sync_end.rs`:
- Around line 13-17: Add serde backward-compatibility to the SyncEnded event by
marking its correlation_id field with a serde default so old persisted events
without that field can deserialize; update the SyncEnded struct (symbol:
SyncEnded) to apply #[serde(default)] or #[serde(default =
"CorrelationId::new")] to the correlation_id field (same approach used for
EffectsEnabled) so deserialization of legacy events succeeds and the sync filter
in sync.rs can run.
---
Nitpick comments:
In `@crates/events/src/enclave_event/enable_effects.rs`:
- Around line 19-25: Add an impl Default for EffectsEnabled that delegates to
EffectsEnabled::new(); specifically implement pub fn default() -> Self {
Self::new() } so callers can rely on the Default trait (e.g., generic bounds,
..Default::default() and struct field defaults) while reusing the existing new()
constructor; modify/locate the impl for EffectsEnabled (the new() method) to add
this Default implementation.
In `@crates/events/src/enclave_event/sync_end.rs`:
- Around line 19-25: The SyncEnded struct has a no-arg constructor
SyncEnded::new() but no Default impl; add impl Default for SyncEnded that simply
calls SyncEnded::new() so callers can use default(), mirroring the pattern used
for EffectsEnabled. Locate the SyncEnded struct and its impl block containing
pub fn new() -> Self and add an impl Default for SyncEnded { fn default() ->
Self { Self::new() } } to provide the standard Default behavior.
On restart, the sync process replayed persisted infrastructure events (
SyncEnded,EffectsEnabled) from the EventStore back through the EventBus. Since these events have no fields, they always produce the sameEventIdhash, poisoning the bloom filter and causing the freshly publishedSyncEndedto be silently dropped. With theEvmChainGatewaynever receivingSyncEnded, it remained stuck inBufferUntilLiveand all live EVM events were silently buffered rather than dispatched.Summary by CodeRabbit
Bug Fixes
Refactor
Tests