Skip to content

fix: restart issues#1467

Merged
hmzakhalid merged 9 commits into
mainfrom
fix/optional-aggregation-serde-serializing
Mar 21, 2026
Merged

fix: restart issues#1467
hmzakhalid merged 9 commits into
mainfrom
fix/optional-aggregation-serde-serializing

Conversation

@ctrlc03

@ctrlc03 ctrlc03 commented Mar 21, 2026

Copy link
Copy Markdown
Collaborator

fix issues on restart for the aggregator:

  • [ag] invalid value: integer 722558768, expected variant index 0 <= i < 13 (deserialization bug)
  • not reading events with proofs which are too big for read limit on restart

Summary by CodeRabbit

  • Bug Fixes

    • Improved retry error messages and returned errors when attempts exhaust or on non-retryable failures.
    • Added buffering and extra debug/info logs when querying and delivering stored events; info logs now include counts and timestamps for network-sourced events.
    • Emit a warning when an event’s store is missing and routing falls back to the default.
  • Chores

    • Two fields now follow standard serialization/deserialization behavior.
    • Several CI jobs had their conditional execution disabled and one job's runner selection changed.
  • Tests

    • Enabled proof aggregation in an integration test script.

@vercel

vercel Bot commented Mar 21, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
crisp Ready Ready Preview, Comment Mar 21, 2026 4:16pm
enclave-docs Ready Ready Preview, Comment Mar 21, 2026 4:16pm

Request Review

@coderabbitai

coderabbitai Bot commented Mar 21, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Removed Serde field attributes from a proof fold state; refactored retry error handling and logging; added buffering and extra logs in event queries and net sync handling; disabled CI job gating conditions; enabled proof aggregation in an integration test script.

Changes

Cohort / File(s) Summary
Serde Attribute Removal
crates/aggregator/src/proof_fold.rs
Removed serde(default) / skip_serializing_if attributes from ProofFoldState fields (total_steps, fold_input_was_empty), reverting to Serde's standard Option/bool behavior.
Retry Error Handling
crates/utils/src/retry.rs
Flattened Err handling to match re { ... }; removed a generic error log; on retry exhaustion logs and returns an anyhow error with the last error; on non-retryable failure logs and returns the error directly. Backoff loop retained.
Eventstore Buffering & Logging
crates/events/src/eventstore.rs
Eagerly buffers raw log items into a Vec before passing to collector; added debug logs for index misses and query result counts.
Event Routing Guard
crates/events/src/eventstore_router.rs
Checks for existence of a store address before dispatching; warns and falls back to AggregateId(0) if none found.
NetSyncManager Filtering & Reporting
crates/net/src/net_sync_manager.rs
Materializes store-returned events to compute total before filter, filters to EventSource::Net, applies take(limit), clones unsequenced events, and logs aggregate id with counts.
CI Job Gating Disabled / Runner Change
.github/workflows/ci.yml
Replaced conditional if: needs.detect_changes.outputs.<flag> == 'true' with if: false for many jobs; changed build_enclave_cli runner to self-hosted group: enclave-ci with labels: [enclave-ci-runner].
Integration Test Flag Change
tests/integration/persist.sh
Enabled proof aggregation by setting --proof-aggregation-enabled to true in pnpm committee:new invocation.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • ryardley

Poem

🐇 I nudged some attrs from fields tonight,
I tuned retries to log the final fight,
I gathered events into a little stack,
I quieted CI and flipped a flag back —
Hopping off now with a carrot light!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'fix: restart issues' is vague and generic, using non-descriptive terms that do not convey meaningful information about the specific changes made. Replace with a more specific title that reflects the actual changes, such as 'fix: remove skip_serializing_if attributes from ProofFoldState' or 'fix: serialization issues on aggregator restart'.
✅ Passed checks (2 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/optional-aggregation-serde-serializing

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/ci.yml:
- Around line 122-123: The workflow jobs were incorrectly hard-disabled by
replacing their conditional expressions with "if: false", which breaks dependent
jobs (e.g., zk_prover_e2e ← build_circuits, crisp_e2e ← build_crisp_sdk,
template_integration/test_enclave_init ← build_e3_support_dev) and triggers
actionlint errors; revert each "if: false" back to the original path-based or
expression condition used before (restore the condition for every affected job
occurrence instead of using a constant false), or if a temporary kill switch is
required, change the condition to reference a workflow input or repository
variable (e.g., inputs or env var) so you can toggle runs without committing a
constant false.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b6427d4d-0e0d-4e40-abe5-29f70f656821

📥 Commits

Reviewing files that changed from the base of the PR and between ae757d1 and 916132e.

📒 Files selected for processing (4)
  • .github/workflows/ci.yml
  • crates/events/src/eventstore.rs
  • crates/net/src/net_sync_manager.rs
  • tests/integration/persist.sh

Comment thread .github/workflows/ci.yml Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
.github/workflows/ci.yml (1)

122-123: ⚠️ Potential issue | 🟠 Major

Please restore the real job gates before merge.

These if: false guards still hard-disable a large slice of CI, and they also cascade into downstream skips for jobs that consume artifacts from build_circuits, build_crisp_sdk, and build_e3_support_dev. Revert them to the needs.detect_changes.outputs.* == 'true' conditions, or put the temporary kill switch behind a workflow input/repo variable instead of a constant.

Also applies to: 177-178, 221-222, 306-307, 350-351, 398-399, 589-590, 665-666, 781-782, 941-942, 1033-1034, 1084-1085, 1161-1162

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/ci.yml around lines 122 - 123, Restore the real CI job
gate conditions in .github/workflows/ci.yml by replacing the temporary "if:
false" guards with the original conditional expressions (e.g.,
needs.detect_changes.outputs.<job_name> == 'true') for each affected job (those
currently using "if: false" around build_circuits, build_crisp_sdk,
build_e3_support_dev and others listed), or alternatively switch the constant to
a workflow input/repo variable and use that variable in the job-level if:
expressions so you can toggle the gate without hard-disabling downstream jobs;
update every occurrence of "if: false" in the workflow to reference the
appropriate needs.detect_changes.outputs.* or the new input/secret variable so
artifact-producing jobs and their consumers are correctly gated instead of
permanently skipped.
🧹 Nitpick comments (1)
crates/events/src/eventstore_router.rs (1)

104-116: Consolidate store selection into a single lookup path.

This does two map lookups for the same key. You can merge warning + fallback into one match to reduce branching and keep the control flow tighter.

♻️ Proposed refactor
-        let has_store = self.stores.contains_key(&aggregate_id);
-        if !has_store {
-            tracing::warn!(
-                "EventStoreRouter: no store for aggregate={}, falling back to 0 (source={:?})",
-                aggregate_id,
-                msg.event.source()
-            );
-        }
-        let store_addr = self.stores.get(&aggregate_id).unwrap_or_else(|| {
-            self.stores
-                .get(&AggregateId::new(0))
-                .expect("Default EventStore for AggregateId(0) not found")
-        });
+        let store_addr = match self.stores.get(&aggregate_id) {
+            Some(addr) => addr,
+            None => {
+                tracing::warn!(
+                    "EventStoreRouter: no store for aggregate={}, falling back to 0 (source={:?})",
+                    aggregate_id,
+                    msg.event.source()
+                );
+                self.stores
+                    .get(&AggregateId::new(0))
+                    .expect("Default EventStore for AggregateId(0) not found")
+            }
+        };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/events/src/eventstore_router.rs` around lines 104 - 116, The current
code performs two map lookups for the same key (aggregate_id) and then falls
back to AggregateId(0); replace this with a single match on
self.stores.get(&aggregate_id) to choose the store: if Some(store) return it, if
None emit the tracing::warn! (using msg.event.source() and aggregate_id) and
then look up the default store via
self.stores.get(&AggregateId::new(0)).expect(...). Update the binding for
store_addr to use that match result so you only call self.stores.get for
aggregate_id once and keep the fallback lookup for the default store.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/ci.yml:
- Around line 546-548: The test_enclave_init job runs on ubuntu-latest while
build_enclave_cli produces the enclave-binary on the enclave-ci self-hosted
runner, causing potential binary incompatibility; update the test_enclave_init
job to use the same runner configuration (runs-on: group: enclave-ci with
labels: [enclave-ci-runner]) so it consumes the artifact on the matching
OS/arch, or alternatively modify build_enclave_cli to also produce an
ubuntu-latest build if test_enclave_init must remain on ubuntu-latest;
additionally, add an actionlint configuration declaring the custom
enclave-ci-runner label to avoid actionlint false positives.

In `@crates/events/src/eventstore.rs`:
- Around line 88-95: The code currently materializes all remaining events with
read_from(seq).collect(), which can OOM for large stores; instead pass the
streaming iterator directly into collect_events to preserve early stop/limit
behavior: replace the Vec materialization (read_from(seq).collect()) with a
boxed iterator created from self.log.read_from(seq) and call
collect_events(Box::new(self.log.read_from(seq)), filter, limit). Update the
tracing::debug log to stop using total (events.len()) since we no longer have a
Vec — log seq and result.len() (or compute a separate safe counter if needed)
and keep use of query_by_ts, read_from, and collect_events as the referenced
symbols.

---

Duplicate comments:
In @.github/workflows/ci.yml:
- Around line 122-123: Restore the real CI job gate conditions in
.github/workflows/ci.yml by replacing the temporary "if: false" guards with the
original conditional expressions (e.g., needs.detect_changes.outputs.<job_name>
== 'true') for each affected job (those currently using "if: false" around
build_circuits, build_crisp_sdk, build_e3_support_dev and others listed), or
alternatively switch the constant to a workflow input/repo variable and use that
variable in the job-level if: expressions so you can toggle the gate without
hard-disabling downstream jobs; update every occurrence of "if: false" in the
workflow to reference the appropriate needs.detect_changes.outputs.* or the new
input/secret variable so artifact-producing jobs and their consumers are
correctly gated instead of permanently skipped.

---

Nitpick comments:
In `@crates/events/src/eventstore_router.rs`:
- Around line 104-116: The current code performs two map lookups for the same
key (aggregate_id) and then falls back to AggregateId(0); replace this with a
single match on self.stores.get(&aggregate_id) to choose the store: if
Some(store) return it, if None emit the tracing::warn! (using msg.event.source()
and aggregate_id) and then look up the default store via
self.stores.get(&AggregateId::new(0)).expect(...). Update the binding for
store_addr to use that match result so you only call self.stores.get for
aggregate_id once and keep the fallback lookup for the default store.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 922b13ea-f377-4ba6-bb65-cce2b2f5515d

📥 Commits

Reviewing files that changed from the base of the PR and between 916132e and 5384883.

📒 Files selected for processing (3)
  • .github/workflows/ci.yml
  • crates/events/src/eventstore.rs
  • crates/events/src/eventstore_router.rs

Comment thread .github/workflows/ci.yml
Comment thread crates/events/src/eventstore.rs
@ctrlc03 ctrlc03 force-pushed the fix/optional-aggregation-serde-serializing branch from 59580c1 to eea0f2e Compare March 21, 2026 16:14
@ctrlc03 ctrlc03 changed the title fix: remove skip_serializing_if fix: restart issue Mar 21, 2026
@ctrlc03 ctrlc03 changed the title fix: restart issue fix: restart issues Mar 21, 2026
@hmzakhalid hmzakhalid merged commit 12e709b into main Mar 21, 2026
50 of 51 checks passed
@ctrlc03 ctrlc03 deleted the fix/optional-aggregation-serde-serializing branch March 28, 2026 10:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants