Skip to content

feat: handle E3RequestComplete for timeout failures [skip-line-limit]#1590

Merged
0xjei merged 4 commits into
mainfrom
feat/e3reqcomtimeout
Jun 11, 2026
Merged

feat: handle E3RequestComplete for timeout failures [skip-line-limit]#1590
0xjei merged 4 commits into
mainfrom
feat/e3reqcomtimeout

Conversation

@0xjei

@0xjei 0xjei commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

The problem was that E3Router never published E3RequestComplete for timeout failures, leaving per-E3 contexts alive indefinitely. This means a memory/resource leak for every timed-out E3 request!

I added a FailureReason::is_timeout() to distinguish pure timeout failures (which have no slashing lifecycle) from misbehaviour failures (e.g., slashing). Then, the RequestRouter::route handles it!

Summary by CodeRabbit

  • Bug Fixes

    • Improved failure reason reporting to distinguish timeout failures (committee formation, DKG, compute, decryption) from other failure types.
    • Fixed handling of late-arriving failure events after request completion to be properly ignored.
  • Improvements

    • Enhanced cleanup signal handling for timeout-related failures.

@0xjei 0xjei requested a review from hmzakhalid June 9, 2026 08:46
@0xjei 0xjei self-assigned this Jun 9, 2026
@vercel

vercel Bot commented Jun 9, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
crisp Ready Ready Preview, Comment Jun 11, 2026 3:48pm
enclave-docs Ready Ready Preview, Comment Jun 11, 2026 3:48pm
enclave-enclave-dashboard Ready Ready Preview, Comment Jun 11, 2026 3:48pm
interfold-dashboard Ready Ready Preview, Comment Jun 11, 2026 3:48pm

Request Review

@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

The PR refines timeout-driven failure handling in the request router and collector telemetry to distinguish timeout-based failures from misbehavior-based failures. It introduces a FailureReason::is_timeout() classifier, updates the router to publish cleanup signals only for timeout failures and silently ignore late terminal events, remaps collector failure reasons to timeout variants, and updates documentation.

Changes

Timeout Failure Handling and Context Cleanup

Layer / File(s) Summary
Failure Reason Timeout Classification
crates/events/src/interfold_event/e3_failed.rs
New is_timeout() method classifies CommitteeFormationTimeout, DKGTimeout, ComputeTimeout, and DecryptionTimeout variants as timeout-driven, enabling distinction from non-timeout failures.
Router Late-Terminal and Timeout Handling
crates/request/src/domain/routing.rs
Router now silently ignores late-arriving E3StageChanged(Complete|Failed) and timeout E3Failed events after request completion, publishes E3RequestComplete only for timeout E3Failed during post-forward decisions, and includes comprehensive test coverage with timeout-specific helper and assertions.
Collector Failure Telemetry Mapping
crates/keyshare/src/actors/threshold_keyshare.rs
Encryption and threshold collectors now emit DKGTimeout, and decryption collector now emits DecryptionTimeout in place of InsufficientCommitteeMembers. Unit tests updated to assert the new failure reason values.
Flow Documentation Updates
agent/flow-trace/05_FAILURE_REFUND_SLASHING.md
Documents that E3RequestComplete is published only for timeout E3Failed events while misbehavior failures skip cleanup to allow slashing to complete, and late terminal events after teardown are silently ignored.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A timeout found its rightful name,
No more confused with committee blame,
The router sighs, the late events fade,
Cleanup signals, properly made!
From DKG to decryption's gleam,
Timeouts now reign supreme. ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: handle E3RequestComplete for timeout failures' accurately reflects the main change: distinguishing timeout failures to enable proper E3RequestComplete signaling in the router. It directly addresses the primary objective of fixing the memory leak for timed-out E3 requests.
Docstring Coverage ✅ Passed Docstring coverage is 95.24% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/e3reqcomtimeout

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@0xjei 0xjei linked an issue Jun 9, 2026 that may be closed by this pull request

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/keyshare/src/actors/threshold_keyshare.rs`:
- Around line 2050-2051: Update the local test assertions that expect
FailureReason::InsufficientCommitteeMembers to instead expect
FailureReason::DKGTimeout where the handlers now emit timeout reasons;
specifically find assertions referencing
FailureReason::InsufficientCommitteeMembers in this file (e.g., inside the tests
around the threshold keyshare handlers) and replace them with
FailureReason::DKGTimeout, and run the test-suite to confirm the
timeout-classification behavior is validated; ensure any related pattern matches
or helper assertion utilities that check the reason are updated to accept
DKGTimeout.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b9eaad3a-b427-4afc-ada5-57ec88a7de00

📥 Commits

Reviewing files that changed from the base of the PR and between 209a130 and b15fa56.

📒 Files selected for processing (5)
  • agent/flow-trace/00_INDEX.md
  • agent/flow-trace/05_FAILURE_REFUND_SLASHING.md
  • crates/events/src/enclave_event/e3_failed.rs
  • crates/keyshare/src/actors/threshold_keyshare.rs
  • crates/request/src/domain/routing.rs

Comment thread crates/keyshare/src/actors/threshold_keyshare.rs
@0xjei

0xjei commented Jun 10, 2026

Copy link
Copy Markdown
Contributor Author

@hmzakhalid ready whenever!

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
agent/flow-trace/05_FAILURE_REFUND_SLASHING.md (1)

1071-1072: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Document E3StageChanged(Complete) as a late-terminal event.

The code treats both E3StageChanged(Complete) and E3StageChanged(Failed) as late-terminal events that are silently ignored after context teardown, but the documentation only mentions E3StageChanged(Failed).

📝 Proposed fix
-    │   └─ E3StageChanged(Failed) and E3Failed(timeout) arriving after context teardown
-    │       are silently ignored (expected on-chain lag)
+    │   └─ E3StageChanged(Complete|Failed) and E3Failed(timeout) arriving after context 
+    │       teardown are silently ignored (expected on-chain lag)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/flow-trace/05_FAILURE_REFUND_SLASHING.md` around lines 1071 - 1072,
Update the documentation to state that both E3StageChanged(Complete) and
E3StageChanged(Failed) are treated as late-terminal events that are silently
ignored after context teardown; find the section that currently only mentions
E3StageChanged(Failed) (the paragraph describing late-terminal events /
"silently ignored" behavior) and add E3StageChanged(Complete) alongside
E3StageChanged(Failed) so the docs match the actual handling in the code and
examples referencing context teardown.
🧹 Nitpick comments (1)
agent/flow-trace/05_FAILURE_REFUND_SLASHING.md (1)

83-83: 💤 Low value

Add language identifier to fenced code block.

The fenced code block is missing a language specification, which triggers a markdown linter warning (MD040).

📝 Proposed fix
-```
+```text
 Anyone calls: Interfold.processE3Failure(e3Id)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/flow-trace/05_FAILURE_REFUND_SLASHING.md` at line 83, The fenced code
block showing "Anyone calls: Interfold.processE3Failure(e3Id)" lacks a language
identifier and triggers MD040; update that fenced block to include a language
specifier (e.g., "text" or "bash") so the linter passes—locate the block
containing Interfold.processE3Failure(e3Id) in the file and change the opening
``` to ```text (or another appropriate language).

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@agent/flow-trace/05_FAILURE_REFUND_SLASHING.md`:
- Around line 1071-1072: Update the documentation to state that both
E3StageChanged(Complete) and E3StageChanged(Failed) are treated as late-terminal
events that are silently ignored after context teardown; find the section that
currently only mentions E3StageChanged(Failed) (the paragraph describing
late-terminal events / "silently ignored" behavior) and add
E3StageChanged(Complete) alongside E3StageChanged(Failed) so the docs match the
actual handling in the code and examples referencing context teardown.

---

Nitpick comments:
In `@agent/flow-trace/05_FAILURE_REFUND_SLASHING.md`:
- Line 83: The fenced code block showing "Anyone calls:
Interfold.processE3Failure(e3Id)" lacks a language identifier and triggers
MD040; update that fenced block to include a language specifier (e.g., "text" or
"bash") so the linter passes—locate the block containing
Interfold.processE3Failure(e3Id) in the file and change the opening ``` to
```text (or another appropriate language).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e51ceb28-eada-48d5-a0f9-297a69ab4d49

📥 Commits

Reviewing files that changed from the base of the PR and between 65771fc and c03ff38.

📒 Files selected for processing (4)
  • agent/flow-trace/05_FAILURE_REFUND_SLASHING.md
  • crates/events/src/interfold_event/e3_failed.rs
  • crates/keyshare/src/actors/threshold_keyshare.rs
  • crates/request/src/domain/routing.rs
💤 Files with no reviewable changes (1)
  • crates/events/src/interfold_event/e3_failed.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/keyshare/src/actors/threshold_keyshare.rs
  • crates/request/src/domain/routing.rs

@0xjei 0xjei added this to the PHASE 4: INVITE-ONLY TESTNET milestone Jun 11, 2026

@hmzakhalid hmzakhalid left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet utACK

@0xjei 0xjei merged commit bb67681 into main Jun 11, 2026
33 checks passed
@ctrlc03 ctrlc03 deleted the feat/e3reqcomtimeout branch June 16, 2026 09:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Trigger E3RequestCompleted based on timeout

2 participants