Skip to content

feat: replace centralized ag with per-E3 ag committee [skip-line-limit]#1494

Merged
ctrlc03 merged 12 commits into
mainfrom
feat/1476-ag-committee
Apr 1, 2026
Merged

feat: replace centralized ag with per-E3 ag committee [skip-line-limit]#1494
ctrlc03 merged 12 commits into
mainfrom
feat/1476-ag-committee

Conversation

@hmzakhalid

@hmzakhalid hmzakhalid commented Mar 29, 2026

Copy link
Copy Markdown
Collaborator

Closes #1476

Summary by CodeRabbit

  • Refactor

    • Unified node model: removed dedicated aggregator role; committee publication and plaintext submission are permissionless, protected by on-chain proof checks and duplicate-publish guards, with an active aggregator deterministically selected from the committee.
  • New Features

    • CLI/Hardhat tasks and scripts to fetch committee public keys, query the active aggregator, and retrieve decoded plaintext output.
  • Documentation

    • Removed role-specific setup docs and updated templates/examples to the unified ciphernode model.

@vercel

vercel Bot commented Mar 29, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
crisp Ready Ready Preview, Comment Apr 1, 2026 9:26am
enclave-docs Ready Ready Preview, Comment Apr 1, 2026 9:26am

Request Review

@coderabbitai

coderabbitai Bot commented Mar 29, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Per-E3 aggregation decentralization: committee members buffer keyshare/decryption-share events; an on-chain–selected active aggregator (lowest non-expelled party in score-sorted committee) performs aggregation and permissionless publication (publishCommittee / publishPlaintextOutput) protected by proof checks and single-publish guards.

Changes

Cohort / File(s) Summary
Flow-trace docs
agent/flow-trace/00_INDEX.md, agent/flow-trace/03_E3_REQUEST_AND_COMMITTEE.md, agent/flow-trace/04_DKG_AND_COMPUTATION.md, agent/flow-trace/05_FAILURE_REFUND_SLASHING.md, agent/flow-trace/06_DEACTIVATION_AND_COMPLETION.md
Document shift from a central aggregator to per‑E3 active-aggregator model, buffered keyshare/decryptionshare flows, permissionless on-chain publication, and associated minor event/payload changes.
Aggregator crate — buffering & finalization
crates/aggregator/src/committee_finalizer.rs, crates/aggregator/src/keyshare_created_filter_buffer.rs, crates/aggregator/src/decryptionshare_created_buffer.rs, crates/aggregator/src/threshold_plaintext_aggregator.rs, crates/aggregator/src/ext.rs, crates/aggregator/src/lib.rs
Add buffering actors for keyshares/decryptionshares; convert committee finalization scheduling to per‑party staggered timers; gate buffer flush on AggregatorChanged; handle expelled members and update aggregator aggregation handlers.
Event types & selector/sortition
crates/events/src/enclave_event/aggregator_changed.rs, crates/events/src/enclave_event/mod.rs, crates/events/src/enclave_event/ticket_generated.rs, crates/events/src/committee.rs, crates/sortition/src/ciphernode_selector.rs, crates/sortition/src/sortition.rs, crates/sortition/src/repo.rs
Add AggregatorChanged event, include party_index in TicketGenerated, persist selector state (committees/expelled/is_aggregator), sort committees by score, and expose active-aggregator selection logic.
EVM writers / runtime gating
crates/evm/src/ciphernode_registry_sol.rs, crates/evm/src/enclave_sol_writer.rs
Writers track effects_enabled and per‑E3 active_aggregators, subscribe to AggregatorChanged, perform on‑chain preflight checks, and only publish when node is active aggregator.
Contracts / ABI & runtime contract behavior
packages/enclave-contracts/contracts/registry/CiphernodeRegistryOwnable.sol, packages/enclave-contracts/contracts/interfaces/ICiphernodeRegistry.sol, packages/enclave-contracts/artifacts/.../ICiphernodeRegistry.json, packages/enclave-contracts/contracts/Enclave.sol, packages/enclave-contracts/contracts/test/*, packages/enclave-contracts/test/*
Made publishCommittee permissionless (onlyOwner removed); getActiveCommitteeNodes now returns (address[] nodes, uint256[] scores); callers/tests updated to destructure and ignore scores when appropriate.
Entrypoint / config / CLI
crates/cli/src/start.rs, crates/entrypoint/src/start/aggregator_start.rs (removed), crates/entrypoint/src/start/start.rs, crates/ciphernode-builder/src/ciphernode_builder.rs, crates/config/src/app_config.rs, crates/ciphernode-entrypoint.sh
Remove explicit aggregator role/config; unify startup to a single ciphernode binary; drop NodeRole and aggregator-specific entrypoint; builder enables pubkey/plaintext aggregation and emits persisted aggregator state on start.
Tests, scripts & integration helpers
packages/enclave-contracts/tasks/enclave.ts, package.json, tests/integration/*, tests/integration/fns.sh, tests/integration/base.sh, tests/integration/persist.sh
Add Hardhat tasks to query committee public key / active aggregator / plaintext; add polling helpers and integration script changes to discover active aggregator and wait for outputs.
Examples, templates & deployment
examples/CRISP/*, templates/default/*, dappnode/*, deploy/*, deploy/docker-compose.yml, deploy/cn*.yaml
Remove dedicated aggregator node from examples/templates; rename aggregator → cn4 in deployments; update docker/compose, scripts, and dappnode templates to remove NODE_ROLE handling.
Misc & artifacts
crates/aggregator/src/*, crates/lib.rs, packages/enclave-contracts/artifacts/*, crates/net/src/net_sync_manager.rs
Add decryptionshare buffer module, mailbox capacity tweaks, artifact buildInfoId updates, and a reduced event-fetch retry count in net sync manager.

Sequence Diagram(s)

sequenceDiagram
    participant Committee as Committee Members
    participant Selector as CiphernodeSelector
    participant Node as Node (candidate)
    participant Aggregator as ActiveAggregator
    participant Chain as CiphernodeRegistry (on-chain)

    rect rgba(200,200,255,0.5)
    Committee->>Selector: emit CommitteeFinalized(e3Id, committee, ...)
    Selector->>Selector: sort by score, persist committee & expelled
    Selector->>Committee: emit AggregatorChanged(e3Id, is_aggregator)
    end

    rect rgba(200,255,200,0.5)
    Committee->>Node: broadcast KeyshareCreated / DecryptionshareCreated (buffered)
    Node->>Node: observe AggregatorChanged(true)
    Node->>Aggregator: flush buffered shares/keyshares
    Aggregator->>Aggregator: verify & aggregate (C5/C6/C7)
    end

    rect rgba(255,200,200,0.5)
    Aggregator->>Chain: call publishCommittee / publishPlaintextOutput
    Chain->>Chain: verify proofs & single-publish guard (publicKeyHashes == 0)
    Chain-->>Aggregator: tx accepted / reverted
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

ciphernode, contracts

Suggested reviewers

  • ctrlc03

Poem

"I hopped through logs and buffered night,
Collected shares until the light.
The lowest ticket raised its paw,
and any node can now withdraw.
🐇✨ — a rabbit's tiny byte"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 7.77% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main architectural shift: replacing a single centralized aggregator with a per-E3 committee-based aggregator system, which is the primary change across all modifications.
Linked Issues check ✅ Passed All code changes successfully implement the objectives from #1476: remove centralized aggregator role, enable per-E3 aggregator selection from committee members by party_id, add permissionless publication with C5 proof verification, and maintain single unified binary for all nodes.
Out of Scope Changes check ✅ Passed All changes are directly scoped to the decentralized aggregator architecture: aggregator selection logic, event buffering, contract interface updates, deployment/configuration cleanup, and integration test updates align with issue #1476 objectives.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/1476-ag-committee

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/enclave-contracts/contracts/Enclave.sol (1)

108-108: ⚠️ Potential issue | 🟡 Minor

Name the mapping key to clear the current solhint warning.

committeeThresholds currently triggers named-parameters-mapping in CI. Naming the key keeps lint output clean and improves readability.

Suggested fix
-mapping(CommitteeSize => uint32[2] threshold) public committeeThresholds;
+mapping(CommitteeSize committeeSize => uint32[2] threshold) public committeeThresholds;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/enclave-contracts/contracts/Enclave.sol` at line 108, The mapping
declaration committeeThresholds currently triggers the solhint
named-parameters-mapping rule; update the mapping signature in Enclave.sol to
give the key parameter a name (referencing CommitteeSize and
committeeThresholds) so the key is explicitly named (e.g., size) while keeping
the value name (threshold) and visibility intact; locate the mapping declaration
for committeeThresholds and add the key identifier to satisfy the linter.
🧹 Nitpick comments (9)
deploy/copy-secrets.sh (1)

27-27: Quote array elements to prevent word splitting.

Shellcheck SC2206 flags unquoted variables in array assignment. While these specific keys won't cause issues, quoting is safer practice.

🔧 Proposed fix
-NET_KEYS=($NETWORK_KEY_CN1 $NETWORK_KEY_CN2 $NETWORK_KEY_CN3 $NETWORK_KEY_CN4)
+NET_KEYS=("$NETWORK_KEY_CN1" "$NETWORK_KEY_CN2" "$NETWORK_KEY_CN3" "$NETWORK_KEY_CN4")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/copy-secrets.sh` at line 27, NET_KEYS is being assigned from unquoted
variables which can trigger word-splitting; update the NET_KEYS assignment so
each element is quoted (i.e., assign NET_KEYS with each of $NETWORK_KEY_CN1,
$NETWORK_KEY_CN2, $NETWORK_KEY_CN3, $NETWORK_KEY_CN4 wrapped in double quotes)
to ensure each value becomes a single array element and avoid SC2206 warnings.
crates/events/src/enclave_event/ticket_generated.rs (1)

23-24: Add a replay-compat test for the new event field.

TicketGenerated is persisted and replayed, and party_index now affects scheduling. A fixture that deserializes an older payload without this field and asserts it defaults cleanly would make upgrades/restarts much safer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/events/src/enclave_event/ticket_generated.rs` around lines 23 - 24,
Add a replay-compat unit test for the TicketGenerated event that deserializes an
older serialized payload missing the new party_index field and asserts it
defaults cleanly (None) and does not change scheduling behavior; specifically
create a test (e.g., test_replay_compat_ticket_generated) that loads a byte
fixture representing the pre-change TicketGenerated payload, deserializes it
into the TicketGenerated struct, asserts ticket_generated.party_index.is_none(),
and then runs the same scheduling check/path that would be used during replay to
ensure behavior is unchanged.
tests/integration/base.sh (1)

77-92: Please add a failover integration case.

This still only proves the happy path with every committee member alive. The critical behavior in this PR is that party_id 1..N-M-1 take over after the primary times out, so add a companion case that stops or isolates the primary and verifies both waits still succeed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/base.sh` around lines 77 - 92, Add a failover integration
case that simulates the primary (party_id 0) timing out and verifies the backup
parties (party_id 1..N-M-1) take over: create a new test block that reuses
wait_for_committee_pubkey, fake_encrypt.sh, pnpm e3-program:publishInput and
pnpm e3:publishCiphertext but before publishing stop or isolate the primary
node/process (the one corresponding to party_id 0) — e.g., kill or block its
network — then run the publish steps and assert waiton
"$SCRIPT_DIR/output/output.bin" and wait_for_plaintext_output 0
"$SCRIPT_DIR/output/plaintext.txt" still succeed; ensure the test cleans up by
restarting/unblocking the primary after the case so other tests are unaffected.
tests/integration/fns.sh (1)

101-119: Consider separating declaration and assignment to avoid masking return values.

The static analyzer flagged that local start_time=$(date +%s) combines declaration and assignment, which can mask the return value if the command fails.

♻️ Proposed fix
   wait_for_committee_pubkey() {
     local e3_id="$1"
     local out_file="$2"
     local timeout="${3:-1300}"
-    local start_time=$(date +%s)
+    local start_time
+    start_time=$(date +%s)

     while true; do

The same pattern applies to wait_for_plaintext_output (line 125) and wait_for_active_aggregator_address (line 144).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/fns.sh` around lines 101 - 119, The issue is that lines
like "local start_time=$(date +%s)" combine declaration and command substitution
which can mask the command's return status; change each to declare the local
variable first then assign the timestamp in a separate statement (e.g., in
wait_for_committee_pubkey(): use "local start_time" then "start_time=$(date
+%s)" and handle a possible failure from date if desired), and apply the same
pattern to wait_for_plaintext_output and wait_for_active_aggregator_address so
the date command's exit code is not swallowed.
crates/aggregator/src/keyshare_created_filter_buffer.rs (1)

75-93: Simplify the KeyshareCreated match arms.

Lines 76-85 and 86-91 have overlapping logic. The first arm forwards when is_aggregator && committee.contains && !expelled, while the third arm buffers when committee.contains && !expelled (without aggregator check). This could be consolidated:

  • If is_aggregator → forward (if valid)
  • Else → buffer (if valid)
  • Otherwise → drop
Suggested simplification
 EnclaveEventData::KeyshareCreated(data) => match &self.committee {
-    Some(committee)
-        if self.is_aggregator
-            && committee.contains(&data.node)
-            && !self.expelled_nodes.contains(&data.node) =>
-    {
-        self.dest.do_send(msg);
-    }
-    None => {
-        self.buffer.push(msg);
-    }
-    Some(committee)
-        if committee.contains(&data.node)
-            && !self.expelled_nodes.contains(&data.node) =>
-    {
-        self.buffer.push(msg);
+    Some(committee) if committee.contains(&data.node) && !self.expelled_nodes.contains(&data.node) => {
+        if self.is_aggregator {
+            self.dest.do_send(msg);
+        } else {
+            self.buffer.push(msg);
+        }
     }
+    None => {
+        self.buffer.push(msg);
+    }
     _ => {}
 },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/aggregator/src/keyshare_created_filter_buffer.rs` around lines 75 -
93, The KeyshareCreated match arm is duplicated; consolidate by checking
EnclaveEventData::KeyshareCreated(data) then if let Some(committee) =
&self.committee and committee.contains(&data.node) and
!self.expelled_nodes.contains(&data.node) then dispatch based on
self.is_aggregator: if self.is_aggregator call self.dest.do_send(msg) else call
self.buffer.push(msg); otherwise drop (no action). This replaces the two
overlapping Some(committee) arms and the None branch with a single clear
conditional using self.committee, self.is_aggregator, self.expelled_nodes,
self.dest.do_send, and self.buffer.push.
crates/aggregator/src/threshold_plaintext_aggregator.rs (1)

266-329: Verify threshold semantics when threshold_n reaches zero.

Line 306 checks threshold_n == 0 to stay in Collecting, but if all parties are expelled, threshold_n could reach zero through repeated decrements. At that point, the aggregator is stuck in Collecting with no path forward. Consider whether an explicit failure transition is needed when threshold_n drops below threshold_m (line 289 logs a warning but continues).

The current behavior is safe (stays in Collecting), but may benefit from emitting an event or transitioning to a failed state for observability.

crates/aggregator/src/decryptionshare_created_buffer.rs (1)

31-52: Consider retaining non-forwarded events for late aggregator promotion.

In flush(), events that don't match the forwarding criteria (lines 49-50) are silently dropped. If this node later receives AggregatorChanged(is_aggregator=true) again (e.g., after a failed handoff), those dropped events won't be available.

Currently, flush() is only called once when becoming aggregator, so this may be acceptable. However, if the buffer could be re-used or if aggregator role can toggle, consider whether dropped events should be retained or logged.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/aggregator/src/decryptionshare_created_buffer.rs` around lines 31 -
52, flush() currently drains and drops any events that don't match forwarding
criteria, losing them if this node later becomes aggregator again; change the
logic in flush() to only remove and forward matching events and retain
non-forwarded events in self.buffer for potential later promotion: iterate over
a drained list or use a temporary Vec (e.g., retained_events) and for each event
match on event.get_data() (EnclaveEventData::DecryptionshareCreated,
::CommitteeMemberExpelled, ::E3RequestComplete, ::Shutdown) and call
self.dest.do_send(event) only for forwarded cases, otherwise push the event onto
retained_events; after the loop assign retained_events back to self.buffer (or
perform an in-place retain) so non-forwarded events are not lost; keep using
self.expelled_parties and self.is_aggregator checks as before.
crates/evm/src/ciphernode_registry_sol.rs (1)

437-461: Preflight error allows finalization to proceed.

When should_finalize_committee returns Err, the code logs an error but falls through to attempt finalization anyway. This is likely intentional (fail-open for resilience), but could lead to wasted gas on transactions that will revert.

Consider documenting this intentional fail-open behavior or returning early on preflight errors:

Err(err) => {
    error!("Failed to preflight finalizeCommittee: {}", format_evm_error(&err));
    return; // or continue with caution
}

The current behavior may be acceptable if the on-chain call has its own guards.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/evm/src/ciphernode_registry_sol.rs` around lines 437 - 461, The match
in handle (handling CommitteeFinalizeRequested) calls should_finalize_committee
and currently logs an error on Err but continues to attempt finalization; update
the Err branch to return early (i.e., after error log, stop the async task) to
avoid hitting finalizeCommittee on preflight failure, referencing the handle
method, CommitteeFinalizeRequested, should_finalize_committee, and
format_evm_error so you locate the match and add the early return;
alternatively, if you intend fail-open behavior, add a clear comment documenting
that decision instead of changing control flow.
packages/enclave-contracts/tasks/enclave.ts (1)

39-54: Consider returning the signer from the connection helper.

The getRegistryConnection helper returns ethers, deployment, and registry, but not the signer. While the registry is already connected with the signer, some callers might need direct access to the signer for other operations. This is a minor observation as current usage doesn't require it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/enclave-contracts/tasks/enclave.ts` around lines 39 - 54, The helper
getRegistryConnection should also return the signer so callers can use it
directly; update the return object in getRegistryConnection to include the
signer (the local variable signer obtained from ethers.getSigners()) alongside
ethers, deployment, and registry, and adjust any callers of
getRegistryConnection if they need to destructure or use signer.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/evm/src/ciphernode_registry_sol.rs`:
- Around line 626-638: The should_publish_committee function currently treats
any error from ICiphernodeRegistry::committeePublicKey as "not published";
change it to match on the call() result so only an explicit empty key
(Bytes::default() or equivalent) returns Ok(true), a non-empty key returns
Ok(false), and all other Err(e) are propagated (Err(e.into())); update
references in should_publish_committee to use the matched result of
contract.committeePublicKey(e3_id_u256).call().await and add a short comment
clarifying which error cases are considered real failures versus "not
published".

In `@packages/enclave-contracts/contracts/registry/CiphernodeRegistryOwnable.sol`:
- Line 192: The doc comment on CiphernodeRegistryOwnable (the line starting "///
`@dev` Permissionless once the committee is finalized. Verification of C5 proof is
done in Enclave.onCommitteePublished.") exceeds 120 chars; split it into two
shorter /// `@dev` comment lines so each is under the 120-character limit (for
example break after "finalized." and continue the rest on the next /// `@dev`
line) while preserving the exact wording and reference to
Enclave.onCommitteePublished.

In `@packages/enclave-contracts/tasks/enclave.ts`:
- Around line 443-455: The mapping over activeNodes uses activeScores[index]
without validating lengths, which can produce undefined scores and break the
sort; update the block around activeNodes/activeScores to first validate that
activeScores.length >= activeNodes.length (or filter/map only entries where
activeScores[index] is defined) before building the objects, and handle
mismatches by either throwing a clear error or assigning a safe default score;
ensure the change touches the logic that constructs activeAggregator (the map +
sort that references activeNodes, activeScores, index) so you never access
undefined scores and logging of activeAggregator.node remains safe.

In `@tests/integration/persist.sh`:
- Around line 74-79: The script assigns ACTIVE_AGG_ADDRESS and ACTIVE_AGG via
wait_for_active_aggregator_address and node_name_for_address but doesn't fail if
node_name_for_address fails, so add an explicit fail-fast guard right after the
ACTIVE_AGG assignment: check the exit status ($?) or test that ACTIVE_AGG is
non-empty/valid (or matches expected node name pattern), and if the resolution
failed, print an error (including ACTIVE_AGG_ADDRESS) and exit 1; do the same
guard for the analogous stop/start pair that uses the resolved node name (the
second instance around the enclave_nodes_start/enclave_nodes_stop usage) so
enclave_nodes_stop/enclave_nodes_start never receive an invalid node name.

---

Outside diff comments:
In `@packages/enclave-contracts/contracts/Enclave.sol`:
- Line 108: The mapping declaration committeeThresholds currently triggers the
solhint named-parameters-mapping rule; update the mapping signature in
Enclave.sol to give the key parameter a name (referencing CommitteeSize and
committeeThresholds) so the key is explicitly named (e.g., size) while keeping
the value name (threshold) and visibility intact; locate the mapping declaration
for committeeThresholds and add the key identifier to satisfy the linter.

---

Nitpick comments:
In `@crates/aggregator/src/decryptionshare_created_buffer.rs`:
- Around line 31-52: flush() currently drains and drops any events that don't
match forwarding criteria, losing them if this node later becomes aggregator
again; change the logic in flush() to only remove and forward matching events
and retain non-forwarded events in self.buffer for potential later promotion:
iterate over a drained list or use a temporary Vec (e.g., retained_events) and
for each event match on event.get_data()
(EnclaveEventData::DecryptionshareCreated, ::CommitteeMemberExpelled,
::E3RequestComplete, ::Shutdown) and call self.dest.do_send(event) only for
forwarded cases, otherwise push the event onto retained_events; after the loop
assign retained_events back to self.buffer (or perform an in-place retain) so
non-forwarded events are not lost; keep using self.expelled_parties and
self.is_aggregator checks as before.

In `@crates/aggregator/src/keyshare_created_filter_buffer.rs`:
- Around line 75-93: The KeyshareCreated match arm is duplicated; consolidate by
checking EnclaveEventData::KeyshareCreated(data) then if let Some(committee) =
&self.committee and committee.contains(&data.node) and
!self.expelled_nodes.contains(&data.node) then dispatch based on
self.is_aggregator: if self.is_aggregator call self.dest.do_send(msg) else call
self.buffer.push(msg); otherwise drop (no action). This replaces the two
overlapping Some(committee) arms and the None branch with a single clear
conditional using self.committee, self.is_aggregator, self.expelled_nodes,
self.dest.do_send, and self.buffer.push.

In `@crates/events/src/enclave_event/ticket_generated.rs`:
- Around line 23-24: Add a replay-compat unit test for the TicketGenerated event
that deserializes an older serialized payload missing the new party_index field
and asserts it defaults cleanly (None) and does not change scheduling behavior;
specifically create a test (e.g., test_replay_compat_ticket_generated) that
loads a byte fixture representing the pre-change TicketGenerated payload,
deserializes it into the TicketGenerated struct, asserts
ticket_generated.party_index.is_none(), and then runs the same scheduling
check/path that would be used during replay to ensure behavior is unchanged.

In `@crates/evm/src/ciphernode_registry_sol.rs`:
- Around line 437-461: The match in handle (handling CommitteeFinalizeRequested)
calls should_finalize_committee and currently logs an error on Err but continues
to attempt finalization; update the Err branch to return early (i.e., after
error log, stop the async task) to avoid hitting finalizeCommittee on preflight
failure, referencing the handle method, CommitteeFinalizeRequested,
should_finalize_committee, and format_evm_error so you locate the match and add
the early return; alternatively, if you intend fail-open behavior, add a clear
comment documenting that decision instead of changing control flow.

In `@deploy/copy-secrets.sh`:
- Line 27: NET_KEYS is being assigned from unquoted variables which can trigger
word-splitting; update the NET_KEYS assignment so each element is quoted (i.e.,
assign NET_KEYS with each of $NETWORK_KEY_CN1, $NETWORK_KEY_CN2,
$NETWORK_KEY_CN3, $NETWORK_KEY_CN4 wrapped in double quotes) to ensure each
value becomes a single array element and avoid SC2206 warnings.

In `@packages/enclave-contracts/tasks/enclave.ts`:
- Around line 39-54: The helper getRegistryConnection should also return the
signer so callers can use it directly; update the return object in
getRegistryConnection to include the signer (the local variable signer obtained
from ethers.getSigners()) alongside ethers, deployment, and registry, and adjust
any callers of getRegistryConnection if they need to destructure or use signer.

In `@tests/integration/base.sh`:
- Around line 77-92: Add a failover integration case that simulates the primary
(party_id 0) timing out and verifies the backup parties (party_id 1..N-M-1) take
over: create a new test block that reuses wait_for_committee_pubkey,
fake_encrypt.sh, pnpm e3-program:publishInput and pnpm e3:publishCiphertext but
before publishing stop or isolate the primary node/process (the one
corresponding to party_id 0) — e.g., kill or block its network — then run the
publish steps and assert waiton "$SCRIPT_DIR/output/output.bin" and
wait_for_plaintext_output 0 "$SCRIPT_DIR/output/plaintext.txt" still succeed;
ensure the test cleans up by restarting/unblocking the primary after the case so
other tests are unaffected.

In `@tests/integration/fns.sh`:
- Around line 101-119: The issue is that lines like "local start_time=$(date
+%s)" combine declaration and command substitution which can mask the command's
return status; change each to declare the local variable first then assign the
timestamp in a separate statement (e.g., in wait_for_committee_pubkey(): use
"local start_time" then "start_time=$(date +%s)" and handle a possible failure
from date if desired), and apply the same pattern to wait_for_plaintext_output
and wait_for_active_aggregator_address so the date command's exit code is not
swallowed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9917a1ef-5503-4ddd-9bbb-0a69304c1c01

📥 Commits

Reviewing files that changed from the base of the PR and between a8c3fc8 and 42ec426.

📒 Files selected for processing (70)
  • agent/flow-trace/00_INDEX.md
  • agent/flow-trace/03_E3_REQUEST_AND_COMMITTEE.md
  • agent/flow-trace/04_DKG_AND_COMPUTATION.md
  • agent/flow-trace/05_FAILURE_REFUND_SLASHING.md
  • agent/flow-trace/06_DEACTIVATION_AND_COMPLETION.md
  • crates/aggregator/src/committee_finalizer.rs
  • crates/aggregator/src/decryptionshare_created_buffer.rs
  • crates/aggregator/src/ext.rs
  • crates/aggregator/src/keyshare_created_filter_buffer.rs
  • crates/aggregator/src/lib.rs
  • crates/aggregator/src/threshold_plaintext_aggregator.rs
  • crates/ciphernode-builder/src/ciphernode_builder.rs
  • crates/ciphernode-entrypoint.sh
  • crates/cli/src/start.rs
  • crates/config/src/app_config.rs
  • crates/entrypoint/src/start/aggregator_start.rs
  • crates/entrypoint/src/start/mod.rs
  • crates/entrypoint/src/start/start.rs
  • crates/events/src/committee.rs
  • crates/events/src/enclave_event/aggregator_changed.rs
  • crates/events/src/enclave_event/mod.rs
  • crates/events/src/enclave_event/ticket_generated.rs
  • crates/evm/src/ciphernode_registry_sol.rs
  • crates/evm/src/enclave_sol_writer.rs
  • crates/sortition/src/ciphernode_selector.rs
  • crates/sortition/src/repo.rs
  • crates/sortition/src/sortition.rs
  • dappnode/README.md
  • dappnode/config.template.yaml
  • dappnode/docker-compose.yml
  • dappnode/entrypoint.sh
  • dappnode/setup-wizard.yml
  • deploy/cn2.yaml
  • deploy/cn3.yaml
  • deploy/cn4.yaml
  • deploy/copy-secrets.sh
  • deploy/docker-compose.yml
  • deploy/inspect.sh
  • deploy/local/nodes.sh
  • deploy/local/start.sh
  • deploy/swarm_deployment.md
  • docs/pages/ciphernode-operators/running.mdx
  • docs/pages/project-template.mdx
  • examples/CRISP/.env.example
  • examples/CRISP/Readme.md
  • examples/CRISP/enclave.config.yaml
  • examples/CRISP/scripts/dev_cipher.sh
  • examples/CRISP/scripts/setup_testnet.sh
  • package.json
  • packages/enclave-contracts/artifacts/contracts/Enclave.sol/Enclave.json
  • packages/enclave-contracts/artifacts/contracts/interfaces/IBondingRegistry.sol/IBondingRegistry.json
  • packages/enclave-contracts/artifacts/contracts/interfaces/ICiphernodeRegistry.sol/ICiphernodeRegistry.json
  • packages/enclave-contracts/artifacts/contracts/interfaces/IEnclave.sol/IEnclave.json
  • packages/enclave-contracts/artifacts/contracts/interfaces/ISlashingManager.sol/ISlashingManager.json
  • packages/enclave-contracts/artifacts/contracts/registry/CiphernodeRegistryOwnable.sol/CiphernodeRegistryOwnable.json
  • packages/enclave-contracts/contracts/Enclave.sol
  • packages/enclave-contracts/contracts/interfaces/ICiphernodeRegistry.sol
  • packages/enclave-contracts/contracts/registry/CiphernodeRegistryOwnable.sol
  • packages/enclave-contracts/contracts/test/MockCiphernodeRegistry.sol
  • packages/enclave-contracts/hardhat.config.ts
  • packages/enclave-contracts/package.json
  • packages/enclave-contracts/tasks/enclave.ts
  • packages/enclave-contracts/test/Registry/CiphernodeRegistryOwnable.spec.ts
  • packages/enclave-contracts/test/Slashing/CommitteeExpulsion.spec.ts
  • templates/default/enclave.config.yaml
  • templates/default/scripts/dev_ciphernodes.sh
  • tests/integration/base.sh
  • tests/integration/enclave.config.yaml
  • tests/integration/fns.sh
  • tests/integration/persist.sh
💤 Files with no reviewable changes (14)
  • deploy/cn4.yaml
  • crates/entrypoint/src/start/mod.rs
  • dappnode/config.template.yaml
  • dappnode/setup-wizard.yml
  • deploy/local/nodes.sh
  • templates/default/enclave.config.yaml
  • dappnode/entrypoint.sh
  • dappnode/docker-compose.yml
  • examples/CRISP/enclave.config.yaml
  • templates/default/scripts/dev_ciphernodes.sh
  • examples/CRISP/scripts/dev_cipher.sh
  • tests/integration/enclave.config.yaml
  • crates/config/src/app_config.rs
  • crates/entrypoint/src/start/aggregator_start.rs

Comment thread crates/evm/src/ciphernode_registry_sol.rs
Comment thread packages/enclave-contracts/contracts/registry/CiphernodeRegistryOwnable.sol Outdated
Comment thread packages/enclave-contracts/tasks/enclave.ts
Comment thread tests/integration/persist.sh

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
crates/evm/src/ciphernode_registry_sol.rs (1)

447-461: Preflight error handling continues despite errors — intentional resilience?

When the preflight check fails with an error (lines 454-459), the code logs the error but continues to attempt finalization. This provides resilience against transient RPC issues but may cause unnecessary failed transactions if the preflight fails due to a persistent problem.

Consider whether the Err case should also return early, or add a comment clarifying that continuing on preflight errors is intentional for resilience:

Suggested clarification
                 Err(err) => {
+                    // Preflight failed, but proceed anyway for resilience against transient RPC issues
                     error!(
                         "Failed to preflight finalizeCommittee: {}",
                         format_evm_error(&err)
                     );
                 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/evm/src/ciphernode_registry_sol.rs` around lines 447 - 461, The
preflight error branch currently logs the error from should_finalize_committee
(using format_evm_error) but then continues to attempt finalizeCommittee; update
the Err arm to either return early after logging to avoid sending likely-failing
transactions, or explicitly add a clear comment above the match explaining that
continuing on preflight errors is an intentional resilience decision (include
rationale), and ensure the log includes e3_id and sufficient context; edit the
match handling in the async block that calls should_finalize_committee and the
Err arm that formats the error.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@crates/evm/src/ciphernode_registry_sol.rs`:
- Around line 447-461: The preflight error branch currently logs the error from
should_finalize_committee (using format_evm_error) but then continues to attempt
finalizeCommittee; update the Err arm to either return early after logging to
avoid sending likely-failing transactions, or explicitly add a clear comment
above the match explaining that continuing on preflight errors is an intentional
resilience decision (include rationale), and ensure the log includes e3_id and
sufficient context; edit the match handling in the async block that calls
should_finalize_committee and the Err arm that formats the error.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 19fccd47-1cbf-417d-ac24-1ae810004e23

📥 Commits

Reviewing files that changed from the base of the PR and between 42ec426 and 384e52f.

📒 Files selected for processing (4)
  • crates/evm/src/ciphernode_registry_sol.rs
  • packages/enclave-contracts/contracts/registry/CiphernodeRegistryOwnable.sol
  • packages/enclave-contracts/tasks/enclave.ts
  • tests/integration/persist.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/integration/persist.sh

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/evm/src/ciphernode_registry_sol.rs`:
- Around line 447-460: The caller around the finalizeCommittee flow calls
should_finalize_committee(provider, contract_address, e3_id) and treats any Err
as a logged preflight failure but then proceeds to submit finalizeCommittee;
change should_finalize_committee so that when the underlying error is the
ThresholdNotMet condition it returns Ok(false) (i.e., map ThresholdNotMet ->
Ok(false)) instead of Err, so callers (the finalizeCommittee submission code)
will skip sending finalizeCommittee for committees that haven't reached
threshold; update the same behavior for the other identical call site that
checks finalizability before calling finalizeCommittee as well.
- Around line 275-276: The active_aggregators map is only populated from future
AggregatorChanged events, so late/restarted writers start with an empty
active_aggregators and is_active_aggregator_for() wrongly returns false; fix by
hydrating active_aggregators on attach/startup from current on-chain/stateful
information (e.g., query the current aggregator status or scan past
AggregatorChanged/PublicKeyAggregated state) and populate the HashMap<E3id,
bool> before returning from attach(); ensure the same key names
(active_aggregators, attach, is_active_aggregator_for, AggregatorChanged,
PublicKeyAggregated) are updated so existing logic that checks
active_aggregators behaves correctly for already-eligible nodes.

In `@crates/keyshare/src/threshold_keyshare.rs`:
- Around line 580-591: The handler currently drops ThresholdShareCreated when
state.state isn't GeneratingThresholdShare or AggregatingDecryptionKey; update
the guard to also accept KeyshareState::CollectingEncryptionKeys so early
single-publish shares aren't lost—i.e., change the matches(...) to include
KeyshareState::CollectingEncryptionKeys (and adjust the trace message if
desired), ensuring ThresholdShareCreated messages with msg.share.party_id are
processed or queued instead of returned early while state.variant_name() is
CollectingEncryptionKeys.
- Around line 625-633: The handler for EncryptionKeyCreated currently ignores
messages unless state.state is KeyshareState::CollectingEncryptionKeys, which
drops legitimate early arrivals; update the check to also accept the pre-init
window by allowing KeyshareState::Init (or other appropriate pre-collection
variants) and, if in Init, stash the incoming key (e.g., push into a
pending_encryption_keys collection on state) so it will be processed once the
state transitions to KeyshareState::CollectingEncryptionKeys; keep the existing
trace logging (using state.e3_id, state.variant_name(), msg.key.party_id) but
remove the unconditional return for pre-init cases so early keys are preserved
and applied during the collection phase.

In `@crates/net/src/net_sync_manager.rs`:
- Around line 401-404: The DirectRequester retry configuration (the
DirectRequester::builder(...) call that sets
.max_retries(3).retry_timeout(Duration::from_secs(5))) is too aggressive and
makes historical sync brittle; change the behavior to either restore a larger
retry window (e.g., increase max_retries back toward the previous value or
increase retry_timeout) and/or modify the fallback logic in the NetSyncManager's
connection handling (the code that reacts to ConnectionEstablished and the
fallback block currently between the initial request and the
ConnectionEstablished wait) to continue retrying requests against an
already-connected peer instead of only waiting for a new ConnectionEstablished
event; update DirectRequester::builder(...) and the fallback retry loop so
requests keep being retried while the peer remains connected or extend
max_retries/retry_timeout to cover transient stalls.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3736614f-62ab-4c50-a884-c443c45502ef

📥 Commits

Reviewing files that changed from the base of the PR and between 51047c1 and 76e92fd.

📒 Files selected for processing (4)
  • crates/evm/src/ciphernode_registry_sol.rs
  • crates/keyshare/src/threshold_keyshare.rs
  • crates/net/src/net_sync_manager.rs
  • tests/integration/persist.sh

Comment thread crates/evm/src/ciphernode_registry_sol.rs
Comment thread crates/evm/src/ciphernode_registry_sol.rs
Comment thread crates/keyshare/src/threshold_keyshare.rs
Comment thread crates/keyshare/src/threshold_keyshare.rs Outdated
Comment thread crates/net/src/net_sync_manager.rs Outdated
Comment thread crates/evm/src/enclave_sol_writer.rs Outdated
Comment thread packages/enclave-contracts/contracts/Enclave.sol Outdated
Comment thread crates/evm/src/ciphernode_registry_sol.rs

@cedoor cedoor left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK

@cedoor cedoor force-pushed the feat/1476-ag-committee branch from 1c1625a to 0a835f1 Compare April 1, 2026 09:24
@ctrlc03 ctrlc03 merged commit f7ba2f2 into main Apr 1, 2026
32 checks passed
@ctrlc03 ctrlc03 deleted the feat/1476-ag-committee branch April 1, 2026 12:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Replace centralized aggregator with per-E3 aggregator committee

3 participants