feat: replace centralized ag with per-E3 ag committee [skip-line-limit]#1494
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughPer-E3 aggregation decentralization: committee members buffer keyshare/decryption-share events; an on-chain–selected active aggregator (lowest non-expelled party in score-sorted committee) performs aggregation and permissionless publication (publishCommittee / publishPlaintextOutput) protected by proof checks and single-publish guards. Changes
Sequence Diagram(s)sequenceDiagram
participant Committee as Committee Members
participant Selector as CiphernodeSelector
participant Node as Node (candidate)
participant Aggregator as ActiveAggregator
participant Chain as CiphernodeRegistry (on-chain)
rect rgba(200,200,255,0.5)
Committee->>Selector: emit CommitteeFinalized(e3Id, committee, ...)
Selector->>Selector: sort by score, persist committee & expelled
Selector->>Committee: emit AggregatorChanged(e3Id, is_aggregator)
end
rect rgba(200,255,200,0.5)
Committee->>Node: broadcast KeyshareCreated / DecryptionshareCreated (buffered)
Node->>Node: observe AggregatorChanged(true)
Node->>Aggregator: flush buffered shares/keyshares
Aggregator->>Aggregator: verify & aggregate (C5/C6/C7)
end
rect rgba(255,200,200,0.5)
Aggregator->>Chain: call publishCommittee / publishPlaintextOutput
Chain->>Chain: verify proofs & single-publish guard (publicKeyHashes == 0)
Chain-->>Aggregator: tx accepted / reverted
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/enclave-contracts/contracts/Enclave.sol (1)
108-108:⚠️ Potential issue | 🟡 MinorName the mapping key to clear the current solhint warning.
committeeThresholdscurrently triggersnamed-parameters-mappingin CI. Naming the key keeps lint output clean and improves readability.Suggested fix
-mapping(CommitteeSize => uint32[2] threshold) public committeeThresholds; +mapping(CommitteeSize committeeSize => uint32[2] threshold) public committeeThresholds;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/enclave-contracts/contracts/Enclave.sol` at line 108, The mapping declaration committeeThresholds currently triggers the solhint named-parameters-mapping rule; update the mapping signature in Enclave.sol to give the key parameter a name (referencing CommitteeSize and committeeThresholds) so the key is explicitly named (e.g., size) while keeping the value name (threshold) and visibility intact; locate the mapping declaration for committeeThresholds and add the key identifier to satisfy the linter.
🧹 Nitpick comments (9)
deploy/copy-secrets.sh (1)
27-27: Quote array elements to prevent word splitting.Shellcheck SC2206 flags unquoted variables in array assignment. While these specific keys won't cause issues, quoting is safer practice.
🔧 Proposed fix
-NET_KEYS=($NETWORK_KEY_CN1 $NETWORK_KEY_CN2 $NETWORK_KEY_CN3 $NETWORK_KEY_CN4) +NET_KEYS=("$NETWORK_KEY_CN1" "$NETWORK_KEY_CN2" "$NETWORK_KEY_CN3" "$NETWORK_KEY_CN4")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy/copy-secrets.sh` at line 27, NET_KEYS is being assigned from unquoted variables which can trigger word-splitting; update the NET_KEYS assignment so each element is quoted (i.e., assign NET_KEYS with each of $NETWORK_KEY_CN1, $NETWORK_KEY_CN2, $NETWORK_KEY_CN3, $NETWORK_KEY_CN4 wrapped in double quotes) to ensure each value becomes a single array element and avoid SC2206 warnings.crates/events/src/enclave_event/ticket_generated.rs (1)
23-24: Add a replay-compat test for the new event field.
TicketGeneratedis persisted and replayed, andparty_indexnow affects scheduling. A fixture that deserializes an older payload without this field and asserts it defaults cleanly would make upgrades/restarts much safer.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/events/src/enclave_event/ticket_generated.rs` around lines 23 - 24, Add a replay-compat unit test for the TicketGenerated event that deserializes an older serialized payload missing the new party_index field and asserts it defaults cleanly (None) and does not change scheduling behavior; specifically create a test (e.g., test_replay_compat_ticket_generated) that loads a byte fixture representing the pre-change TicketGenerated payload, deserializes it into the TicketGenerated struct, asserts ticket_generated.party_index.is_none(), and then runs the same scheduling check/path that would be used during replay to ensure behavior is unchanged.tests/integration/base.sh (1)
77-92: Please add a failover integration case.This still only proves the happy path with every committee member alive. The critical behavior in this PR is that
party_id1..N-M-1take over after the primary times out, so add a companion case that stops or isolates the primary and verifies both waits still succeed.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/base.sh` around lines 77 - 92, Add a failover integration case that simulates the primary (party_id 0) timing out and verifies the backup parties (party_id 1..N-M-1) take over: create a new test block that reuses wait_for_committee_pubkey, fake_encrypt.sh, pnpm e3-program:publishInput and pnpm e3:publishCiphertext but before publishing stop or isolate the primary node/process (the one corresponding to party_id 0) — e.g., kill or block its network — then run the publish steps and assert waiton "$SCRIPT_DIR/output/output.bin" and wait_for_plaintext_output 0 "$SCRIPT_DIR/output/plaintext.txt" still succeed; ensure the test cleans up by restarting/unblocking the primary after the case so other tests are unaffected.tests/integration/fns.sh (1)
101-119: Consider separating declaration and assignment to avoid masking return values.The static analyzer flagged that
local start_time=$(date +%s)combines declaration and assignment, which can mask the return value if the command fails.♻️ Proposed fix
wait_for_committee_pubkey() { local e3_id="$1" local out_file="$2" local timeout="${3:-1300}" - local start_time=$(date +%s) + local start_time + start_time=$(date +%s) while true; doThe same pattern applies to
wait_for_plaintext_output(line 125) andwait_for_active_aggregator_address(line 144).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/fns.sh` around lines 101 - 119, The issue is that lines like "local start_time=$(date +%s)" combine declaration and command substitution which can mask the command's return status; change each to declare the local variable first then assign the timestamp in a separate statement (e.g., in wait_for_committee_pubkey(): use "local start_time" then "start_time=$(date +%s)" and handle a possible failure from date if desired), and apply the same pattern to wait_for_plaintext_output and wait_for_active_aggregator_address so the date command's exit code is not swallowed.crates/aggregator/src/keyshare_created_filter_buffer.rs (1)
75-93: Simplify theKeyshareCreatedmatch arms.Lines 76-85 and 86-91 have overlapping logic. The first arm forwards when
is_aggregator && committee.contains && !expelled, while the third arm buffers whencommittee.contains && !expelled(without aggregator check). This could be consolidated:
- If
is_aggregator→ forward (if valid)- Else → buffer (if valid)
- Otherwise → drop
Suggested simplification
EnclaveEventData::KeyshareCreated(data) => match &self.committee { - Some(committee) - if self.is_aggregator - && committee.contains(&data.node) - && !self.expelled_nodes.contains(&data.node) => - { - self.dest.do_send(msg); - } - None => { - self.buffer.push(msg); - } - Some(committee) - if committee.contains(&data.node) - && !self.expelled_nodes.contains(&data.node) => - { - self.buffer.push(msg); + Some(committee) if committee.contains(&data.node) && !self.expelled_nodes.contains(&data.node) => { + if self.is_aggregator { + self.dest.do_send(msg); + } else { + self.buffer.push(msg); + } } + None => { + self.buffer.push(msg); + } _ => {} },🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/aggregator/src/keyshare_created_filter_buffer.rs` around lines 75 - 93, The KeyshareCreated match arm is duplicated; consolidate by checking EnclaveEventData::KeyshareCreated(data) then if let Some(committee) = &self.committee and committee.contains(&data.node) and !self.expelled_nodes.contains(&data.node) then dispatch based on self.is_aggregator: if self.is_aggregator call self.dest.do_send(msg) else call self.buffer.push(msg); otherwise drop (no action). This replaces the two overlapping Some(committee) arms and the None branch with a single clear conditional using self.committee, self.is_aggregator, self.expelled_nodes, self.dest.do_send, and self.buffer.push.crates/aggregator/src/threshold_plaintext_aggregator.rs (1)
266-329: Verify threshold semantics whenthreshold_nreaches zero.Line 306 checks
threshold_n == 0to stay inCollecting, but if all parties are expelled,threshold_ncould reach zero through repeated decrements. At that point, the aggregator is stuck inCollectingwith no path forward. Consider whether an explicit failure transition is needed whenthreshold_ndrops belowthreshold_m(line 289 logs a warning but continues).The current behavior is safe (stays in
Collecting), but may benefit from emitting an event or transitioning to a failed state for observability.crates/aggregator/src/decryptionshare_created_buffer.rs (1)
31-52: Consider retaining non-forwarded events for late aggregator promotion.In
flush(), events that don't match the forwarding criteria (lines 49-50) are silently dropped. If this node later receivesAggregatorChanged(is_aggregator=true)again (e.g., after a failed handoff), those dropped events won't be available.Currently,
flush()is only called once when becoming aggregator, so this may be acceptable. However, if the buffer could be re-used or if aggregator role can toggle, consider whether dropped events should be retained or logged.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/aggregator/src/decryptionshare_created_buffer.rs` around lines 31 - 52, flush() currently drains and drops any events that don't match forwarding criteria, losing them if this node later becomes aggregator again; change the logic in flush() to only remove and forward matching events and retain non-forwarded events in self.buffer for potential later promotion: iterate over a drained list or use a temporary Vec (e.g., retained_events) and for each event match on event.get_data() (EnclaveEventData::DecryptionshareCreated, ::CommitteeMemberExpelled, ::E3RequestComplete, ::Shutdown) and call self.dest.do_send(event) only for forwarded cases, otherwise push the event onto retained_events; after the loop assign retained_events back to self.buffer (or perform an in-place retain) so non-forwarded events are not lost; keep using self.expelled_parties and self.is_aggregator checks as before.crates/evm/src/ciphernode_registry_sol.rs (1)
437-461: Preflight error allows finalization to proceed.When
should_finalize_committeereturnsErr, the code logs an error but falls through to attempt finalization anyway. This is likely intentional (fail-open for resilience), but could lead to wasted gas on transactions that will revert.Consider documenting this intentional fail-open behavior or returning early on preflight errors:
Err(err) => { error!("Failed to preflight finalizeCommittee: {}", format_evm_error(&err)); return; // or continue with caution }The current behavior may be acceptable if the on-chain call has its own guards.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/evm/src/ciphernode_registry_sol.rs` around lines 437 - 461, The match in handle (handling CommitteeFinalizeRequested) calls should_finalize_committee and currently logs an error on Err but continues to attempt finalization; update the Err branch to return early (i.e., after error log, stop the async task) to avoid hitting finalizeCommittee on preflight failure, referencing the handle method, CommitteeFinalizeRequested, should_finalize_committee, and format_evm_error so you locate the match and add the early return; alternatively, if you intend fail-open behavior, add a clear comment documenting that decision instead of changing control flow.packages/enclave-contracts/tasks/enclave.ts (1)
39-54: Consider returning thesignerfrom the connection helper.The
getRegistryConnectionhelper returnsethers,deployment, andregistry, but not thesigner. While the registry is already connected with the signer, some callers might need direct access to the signer for other operations. This is a minor observation as current usage doesn't require it.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/enclave-contracts/tasks/enclave.ts` around lines 39 - 54, The helper getRegistryConnection should also return the signer so callers can use it directly; update the return object in getRegistryConnection to include the signer (the local variable signer obtained from ethers.getSigners()) alongside ethers, deployment, and registry, and adjust any callers of getRegistryConnection if they need to destructure or use signer.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/evm/src/ciphernode_registry_sol.rs`:
- Around line 626-638: The should_publish_committee function currently treats
any error from ICiphernodeRegistry::committeePublicKey as "not published";
change it to match on the call() result so only an explicit empty key
(Bytes::default() or equivalent) returns Ok(true), a non-empty key returns
Ok(false), and all other Err(e) are propagated (Err(e.into())); update
references in should_publish_committee to use the matched result of
contract.committeePublicKey(e3_id_u256).call().await and add a short comment
clarifying which error cases are considered real failures versus "not
published".
In `@packages/enclave-contracts/contracts/registry/CiphernodeRegistryOwnable.sol`:
- Line 192: The doc comment on CiphernodeRegistryOwnable (the line starting "///
`@dev` Permissionless once the committee is finalized. Verification of C5 proof is
done in Enclave.onCommitteePublished.") exceeds 120 chars; split it into two
shorter /// `@dev` comment lines so each is under the 120-character limit (for
example break after "finalized." and continue the rest on the next /// `@dev`
line) while preserving the exact wording and reference to
Enclave.onCommitteePublished.
In `@packages/enclave-contracts/tasks/enclave.ts`:
- Around line 443-455: The mapping over activeNodes uses activeScores[index]
without validating lengths, which can produce undefined scores and break the
sort; update the block around activeNodes/activeScores to first validate that
activeScores.length >= activeNodes.length (or filter/map only entries where
activeScores[index] is defined) before building the objects, and handle
mismatches by either throwing a clear error or assigning a safe default score;
ensure the change touches the logic that constructs activeAggregator (the map +
sort that references activeNodes, activeScores, index) so you never access
undefined scores and logging of activeAggregator.node remains safe.
In `@tests/integration/persist.sh`:
- Around line 74-79: The script assigns ACTIVE_AGG_ADDRESS and ACTIVE_AGG via
wait_for_active_aggregator_address and node_name_for_address but doesn't fail if
node_name_for_address fails, so add an explicit fail-fast guard right after the
ACTIVE_AGG assignment: check the exit status ($?) or test that ACTIVE_AGG is
non-empty/valid (or matches expected node name pattern), and if the resolution
failed, print an error (including ACTIVE_AGG_ADDRESS) and exit 1; do the same
guard for the analogous stop/start pair that uses the resolved node name (the
second instance around the enclave_nodes_start/enclave_nodes_stop usage) so
enclave_nodes_stop/enclave_nodes_start never receive an invalid node name.
---
Outside diff comments:
In `@packages/enclave-contracts/contracts/Enclave.sol`:
- Line 108: The mapping declaration committeeThresholds currently triggers the
solhint named-parameters-mapping rule; update the mapping signature in
Enclave.sol to give the key parameter a name (referencing CommitteeSize and
committeeThresholds) so the key is explicitly named (e.g., size) while keeping
the value name (threshold) and visibility intact; locate the mapping declaration
for committeeThresholds and add the key identifier to satisfy the linter.
---
Nitpick comments:
In `@crates/aggregator/src/decryptionshare_created_buffer.rs`:
- Around line 31-52: flush() currently drains and drops any events that don't
match forwarding criteria, losing them if this node later becomes aggregator
again; change the logic in flush() to only remove and forward matching events
and retain non-forwarded events in self.buffer for potential later promotion:
iterate over a drained list or use a temporary Vec (e.g., retained_events) and
for each event match on event.get_data()
(EnclaveEventData::DecryptionshareCreated, ::CommitteeMemberExpelled,
::E3RequestComplete, ::Shutdown) and call self.dest.do_send(event) only for
forwarded cases, otherwise push the event onto retained_events; after the loop
assign retained_events back to self.buffer (or perform an in-place retain) so
non-forwarded events are not lost; keep using self.expelled_parties and
self.is_aggregator checks as before.
In `@crates/aggregator/src/keyshare_created_filter_buffer.rs`:
- Around line 75-93: The KeyshareCreated match arm is duplicated; consolidate by
checking EnclaveEventData::KeyshareCreated(data) then if let Some(committee) =
&self.committee and committee.contains(&data.node) and
!self.expelled_nodes.contains(&data.node) then dispatch based on
self.is_aggregator: if self.is_aggregator call self.dest.do_send(msg) else call
self.buffer.push(msg); otherwise drop (no action). This replaces the two
overlapping Some(committee) arms and the None branch with a single clear
conditional using self.committee, self.is_aggregator, self.expelled_nodes,
self.dest.do_send, and self.buffer.push.
In `@crates/events/src/enclave_event/ticket_generated.rs`:
- Around line 23-24: Add a replay-compat unit test for the TicketGenerated event
that deserializes an older serialized payload missing the new party_index field
and asserts it defaults cleanly (None) and does not change scheduling behavior;
specifically create a test (e.g., test_replay_compat_ticket_generated) that
loads a byte fixture representing the pre-change TicketGenerated payload,
deserializes it into the TicketGenerated struct, asserts
ticket_generated.party_index.is_none(), and then runs the same scheduling
check/path that would be used during replay to ensure behavior is unchanged.
In `@crates/evm/src/ciphernode_registry_sol.rs`:
- Around line 437-461: The match in handle (handling CommitteeFinalizeRequested)
calls should_finalize_committee and currently logs an error on Err but continues
to attempt finalization; update the Err branch to return early (i.e., after
error log, stop the async task) to avoid hitting finalizeCommittee on preflight
failure, referencing the handle method, CommitteeFinalizeRequested,
should_finalize_committee, and format_evm_error so you locate the match and add
the early return; alternatively, if you intend fail-open behavior, add a clear
comment documenting that decision instead of changing control flow.
In `@deploy/copy-secrets.sh`:
- Line 27: NET_KEYS is being assigned from unquoted variables which can trigger
word-splitting; update the NET_KEYS assignment so each element is quoted (i.e.,
assign NET_KEYS with each of $NETWORK_KEY_CN1, $NETWORK_KEY_CN2,
$NETWORK_KEY_CN3, $NETWORK_KEY_CN4 wrapped in double quotes) to ensure each
value becomes a single array element and avoid SC2206 warnings.
In `@packages/enclave-contracts/tasks/enclave.ts`:
- Around line 39-54: The helper getRegistryConnection should also return the
signer so callers can use it directly; update the return object in
getRegistryConnection to include the signer (the local variable signer obtained
from ethers.getSigners()) alongside ethers, deployment, and registry, and adjust
any callers of getRegistryConnection if they need to destructure or use signer.
In `@tests/integration/base.sh`:
- Around line 77-92: Add a failover integration case that simulates the primary
(party_id 0) timing out and verifies the backup parties (party_id 1..N-M-1) take
over: create a new test block that reuses wait_for_committee_pubkey,
fake_encrypt.sh, pnpm e3-program:publishInput and pnpm e3:publishCiphertext but
before publishing stop or isolate the primary node/process (the one
corresponding to party_id 0) — e.g., kill or block its network — then run the
publish steps and assert waiton "$SCRIPT_DIR/output/output.bin" and
wait_for_plaintext_output 0 "$SCRIPT_DIR/output/plaintext.txt" still succeed;
ensure the test cleans up by restarting/unblocking the primary after the case so
other tests are unaffected.
In `@tests/integration/fns.sh`:
- Around line 101-119: The issue is that lines like "local start_time=$(date
+%s)" combine declaration and command substitution which can mask the command's
return status; change each to declare the local variable first then assign the
timestamp in a separate statement (e.g., in wait_for_committee_pubkey(): use
"local start_time" then "start_time=$(date +%s)" and handle a possible failure
from date if desired), and apply the same pattern to wait_for_plaintext_output
and wait_for_active_aggregator_address so the date command's exit code is not
swallowed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9917a1ef-5503-4ddd-9bbb-0a69304c1c01
📒 Files selected for processing (70)
agent/flow-trace/00_INDEX.mdagent/flow-trace/03_E3_REQUEST_AND_COMMITTEE.mdagent/flow-trace/04_DKG_AND_COMPUTATION.mdagent/flow-trace/05_FAILURE_REFUND_SLASHING.mdagent/flow-trace/06_DEACTIVATION_AND_COMPLETION.mdcrates/aggregator/src/committee_finalizer.rscrates/aggregator/src/decryptionshare_created_buffer.rscrates/aggregator/src/ext.rscrates/aggregator/src/keyshare_created_filter_buffer.rscrates/aggregator/src/lib.rscrates/aggregator/src/threshold_plaintext_aggregator.rscrates/ciphernode-builder/src/ciphernode_builder.rscrates/ciphernode-entrypoint.shcrates/cli/src/start.rscrates/config/src/app_config.rscrates/entrypoint/src/start/aggregator_start.rscrates/entrypoint/src/start/mod.rscrates/entrypoint/src/start/start.rscrates/events/src/committee.rscrates/events/src/enclave_event/aggregator_changed.rscrates/events/src/enclave_event/mod.rscrates/events/src/enclave_event/ticket_generated.rscrates/evm/src/ciphernode_registry_sol.rscrates/evm/src/enclave_sol_writer.rscrates/sortition/src/ciphernode_selector.rscrates/sortition/src/repo.rscrates/sortition/src/sortition.rsdappnode/README.mddappnode/config.template.yamldappnode/docker-compose.ymldappnode/entrypoint.shdappnode/setup-wizard.ymldeploy/cn2.yamldeploy/cn3.yamldeploy/cn4.yamldeploy/copy-secrets.shdeploy/docker-compose.ymldeploy/inspect.shdeploy/local/nodes.shdeploy/local/start.shdeploy/swarm_deployment.mddocs/pages/ciphernode-operators/running.mdxdocs/pages/project-template.mdxexamples/CRISP/.env.exampleexamples/CRISP/Readme.mdexamples/CRISP/enclave.config.yamlexamples/CRISP/scripts/dev_cipher.shexamples/CRISP/scripts/setup_testnet.shpackage.jsonpackages/enclave-contracts/artifacts/contracts/Enclave.sol/Enclave.jsonpackages/enclave-contracts/artifacts/contracts/interfaces/IBondingRegistry.sol/IBondingRegistry.jsonpackages/enclave-contracts/artifacts/contracts/interfaces/ICiphernodeRegistry.sol/ICiphernodeRegistry.jsonpackages/enclave-contracts/artifacts/contracts/interfaces/IEnclave.sol/IEnclave.jsonpackages/enclave-contracts/artifacts/contracts/interfaces/ISlashingManager.sol/ISlashingManager.jsonpackages/enclave-contracts/artifacts/contracts/registry/CiphernodeRegistryOwnable.sol/CiphernodeRegistryOwnable.jsonpackages/enclave-contracts/contracts/Enclave.solpackages/enclave-contracts/contracts/interfaces/ICiphernodeRegistry.solpackages/enclave-contracts/contracts/registry/CiphernodeRegistryOwnable.solpackages/enclave-contracts/contracts/test/MockCiphernodeRegistry.solpackages/enclave-contracts/hardhat.config.tspackages/enclave-contracts/package.jsonpackages/enclave-contracts/tasks/enclave.tspackages/enclave-contracts/test/Registry/CiphernodeRegistryOwnable.spec.tspackages/enclave-contracts/test/Slashing/CommitteeExpulsion.spec.tstemplates/default/enclave.config.yamltemplates/default/scripts/dev_ciphernodes.shtests/integration/base.shtests/integration/enclave.config.yamltests/integration/fns.shtests/integration/persist.sh
💤 Files with no reviewable changes (14)
- deploy/cn4.yaml
- crates/entrypoint/src/start/mod.rs
- dappnode/config.template.yaml
- dappnode/setup-wizard.yml
- deploy/local/nodes.sh
- templates/default/enclave.config.yaml
- dappnode/entrypoint.sh
- dappnode/docker-compose.yml
- examples/CRISP/enclave.config.yaml
- templates/default/scripts/dev_ciphernodes.sh
- examples/CRISP/scripts/dev_cipher.sh
- tests/integration/enclave.config.yaml
- crates/config/src/app_config.rs
- crates/entrypoint/src/start/aggregator_start.rs
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/evm/src/ciphernode_registry_sol.rs (1)
447-461: Preflight error handling continues despite errors — intentional resilience?When the preflight check fails with an error (lines 454-459), the code logs the error but continues to attempt finalization. This provides resilience against transient RPC issues but may cause unnecessary failed transactions if the preflight fails due to a persistent problem.
Consider whether the
Errcase should alsoreturnearly, or add a comment clarifying that continuing on preflight errors is intentional for resilience:Suggested clarification
Err(err) => { + // Preflight failed, but proceed anyway for resilience against transient RPC issues error!( "Failed to preflight finalizeCommittee: {}", format_evm_error(&err) ); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/evm/src/ciphernode_registry_sol.rs` around lines 447 - 461, The preflight error branch currently logs the error from should_finalize_committee (using format_evm_error) but then continues to attempt finalizeCommittee; update the Err arm to either return early after logging to avoid sending likely-failing transactions, or explicitly add a clear comment above the match explaining that continuing on preflight errors is an intentional resilience decision (include rationale), and ensure the log includes e3_id and sufficient context; edit the match handling in the async block that calls should_finalize_committee and the Err arm that formats the error.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@crates/evm/src/ciphernode_registry_sol.rs`:
- Around line 447-461: The preflight error branch currently logs the error from
should_finalize_committee (using format_evm_error) but then continues to attempt
finalizeCommittee; update the Err arm to either return early after logging to
avoid sending likely-failing transactions, or explicitly add a clear comment
above the match explaining that continuing on preflight errors is an intentional
resilience decision (include rationale), and ensure the log includes e3_id and
sufficient context; edit the match handling in the async block that calls
should_finalize_committee and the Err arm that formats the error.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 19fccd47-1cbf-417d-ac24-1ae810004e23
📒 Files selected for processing (4)
crates/evm/src/ciphernode_registry_sol.rspackages/enclave-contracts/contracts/registry/CiphernodeRegistryOwnable.solpackages/enclave-contracts/tasks/enclave.tstests/integration/persist.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/integration/persist.sh
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/evm/src/ciphernode_registry_sol.rs`:
- Around line 447-460: The caller around the finalizeCommittee flow calls
should_finalize_committee(provider, contract_address, e3_id) and treats any Err
as a logged preflight failure but then proceeds to submit finalizeCommittee;
change should_finalize_committee so that when the underlying error is the
ThresholdNotMet condition it returns Ok(false) (i.e., map ThresholdNotMet ->
Ok(false)) instead of Err, so callers (the finalizeCommittee submission code)
will skip sending finalizeCommittee for committees that haven't reached
threshold; update the same behavior for the other identical call site that
checks finalizability before calling finalizeCommittee as well.
- Around line 275-276: The active_aggregators map is only populated from future
AggregatorChanged events, so late/restarted writers start with an empty
active_aggregators and is_active_aggregator_for() wrongly returns false; fix by
hydrating active_aggregators on attach/startup from current on-chain/stateful
information (e.g., query the current aggregator status or scan past
AggregatorChanged/PublicKeyAggregated state) and populate the HashMap<E3id,
bool> before returning from attach(); ensure the same key names
(active_aggregators, attach, is_active_aggregator_for, AggregatorChanged,
PublicKeyAggregated) are updated so existing logic that checks
active_aggregators behaves correctly for already-eligible nodes.
In `@crates/keyshare/src/threshold_keyshare.rs`:
- Around line 580-591: The handler currently drops ThresholdShareCreated when
state.state isn't GeneratingThresholdShare or AggregatingDecryptionKey; update
the guard to also accept KeyshareState::CollectingEncryptionKeys so early
single-publish shares aren't lost—i.e., change the matches(...) to include
KeyshareState::CollectingEncryptionKeys (and adjust the trace message if
desired), ensuring ThresholdShareCreated messages with msg.share.party_id are
processed or queued instead of returned early while state.variant_name() is
CollectingEncryptionKeys.
- Around line 625-633: The handler for EncryptionKeyCreated currently ignores
messages unless state.state is KeyshareState::CollectingEncryptionKeys, which
drops legitimate early arrivals; update the check to also accept the pre-init
window by allowing KeyshareState::Init (or other appropriate pre-collection
variants) and, if in Init, stash the incoming key (e.g., push into a
pending_encryption_keys collection on state) so it will be processed once the
state transitions to KeyshareState::CollectingEncryptionKeys; keep the existing
trace logging (using state.e3_id, state.variant_name(), msg.key.party_id) but
remove the unconditional return for pre-init cases so early keys are preserved
and applied during the collection phase.
In `@crates/net/src/net_sync_manager.rs`:
- Around line 401-404: The DirectRequester retry configuration (the
DirectRequester::builder(...) call that sets
.max_retries(3).retry_timeout(Duration::from_secs(5))) is too aggressive and
makes historical sync brittle; change the behavior to either restore a larger
retry window (e.g., increase max_retries back toward the previous value or
increase retry_timeout) and/or modify the fallback logic in the NetSyncManager's
connection handling (the code that reacts to ConnectionEstablished and the
fallback block currently between the initial request and the
ConnectionEstablished wait) to continue retrying requests against an
already-connected peer instead of only waiting for a new ConnectionEstablished
event; update DirectRequester::builder(...) and the fallback retry loop so
requests keep being retried while the peer remains connected or extend
max_retries/retry_timeout to cover transient stalls.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3736614f-62ab-4c50-a884-c443c45502ef
📒 Files selected for processing (4)
crates/evm/src/ciphernode_registry_sol.rscrates/keyshare/src/threshold_keyshare.rscrates/net/src/net_sync_manager.rstests/integration/persist.sh
ca5b1ca to
a5e609b
Compare
a5e609b to
1c1625a
Compare
1c1625a to
0a835f1
Compare
Closes #1476
Summary by CodeRabbit
Refactor
New Features
Documentation