feat: handle E3RequestComplete for timeout failures [skip-line-limit]#1590
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
📝 WalkthroughWalkthroughThe PR refines timeout-driven failure handling in the request router and collector telemetry to distinguish timeout-based failures from misbehavior-based failures. It introduces a ChangesTimeout Failure Handling and Context Cleanup
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/keyshare/src/actors/threshold_keyshare.rs`:
- Around line 2050-2051: Update the local test assertions that expect
FailureReason::InsufficientCommitteeMembers to instead expect
FailureReason::DKGTimeout where the handlers now emit timeout reasons;
specifically find assertions referencing
FailureReason::InsufficientCommitteeMembers in this file (e.g., inside the tests
around the threshold keyshare handlers) and replace them with
FailureReason::DKGTimeout, and run the test-suite to confirm the
timeout-classification behavior is validated; ensure any related pattern matches
or helper assertion utilities that check the reason are updated to accept
DKGTimeout.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b9eaad3a-b427-4afc-ada5-57ec88a7de00
📒 Files selected for processing (5)
agent/flow-trace/00_INDEX.mdagent/flow-trace/05_FAILURE_REFUND_SLASHING.mdcrates/events/src/enclave_event/e3_failed.rscrates/keyshare/src/actors/threshold_keyshare.rscrates/request/src/domain/routing.rs
|
@hmzakhalid ready whenever! |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
agent/flow-trace/05_FAILURE_REFUND_SLASHING.md (1)
1071-1072:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDocument
E3StageChanged(Complete)as a late-terminal event.The code treats both
E3StageChanged(Complete)andE3StageChanged(Failed)as late-terminal events that are silently ignored after context teardown, but the documentation only mentionsE3StageChanged(Failed).📝 Proposed fix
- │ └─ E3StageChanged(Failed) and E3Failed(timeout) arriving after context teardown - │ are silently ignored (expected on-chain lag) + │ └─ E3StageChanged(Complete|Failed) and E3Failed(timeout) arriving after context + │ teardown are silently ignored (expected on-chain lag)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@agent/flow-trace/05_FAILURE_REFUND_SLASHING.md` around lines 1071 - 1072, Update the documentation to state that both E3StageChanged(Complete) and E3StageChanged(Failed) are treated as late-terminal events that are silently ignored after context teardown; find the section that currently only mentions E3StageChanged(Failed) (the paragraph describing late-terminal events / "silently ignored" behavior) and add E3StageChanged(Complete) alongside E3StageChanged(Failed) so the docs match the actual handling in the code and examples referencing context teardown.
🧹 Nitpick comments (1)
agent/flow-trace/05_FAILURE_REFUND_SLASHING.md (1)
83-83: 💤 Low valueAdd language identifier to fenced code block.
The fenced code block is missing a language specification, which triggers a markdown linter warning (MD040).
📝 Proposed fix
-``` +```text Anyone calls: Interfold.processE3Failure(e3Id)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@agent/flow-trace/05_FAILURE_REFUND_SLASHING.md` at line 83, The fenced code block showing "Anyone calls: Interfold.processE3Failure(e3Id)" lacks a language identifier and triggers MD040; update that fenced block to include a language specifier (e.g., "text" or "bash") so the linter passes—locate the block containing Interfold.processE3Failure(e3Id) in the file and change the opening ``` to ```text (or another appropriate language).Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@agent/flow-trace/05_FAILURE_REFUND_SLASHING.md`:
- Around line 1071-1072: Update the documentation to state that both
E3StageChanged(Complete) and E3StageChanged(Failed) are treated as late-terminal
events that are silently ignored after context teardown; find the section that
currently only mentions E3StageChanged(Failed) (the paragraph describing
late-terminal events / "silently ignored" behavior) and add
E3StageChanged(Complete) alongside E3StageChanged(Failed) so the docs match the
actual handling in the code and examples referencing context teardown.
---
Nitpick comments:
In `@agent/flow-trace/05_FAILURE_REFUND_SLASHING.md`:
- Line 83: The fenced code block showing "Anyone calls:
Interfold.processE3Failure(e3Id)" lacks a language identifier and triggers
MD040; update that fenced block to include a language specifier (e.g., "text" or
"bash") so the linter passes—locate the block containing
Interfold.processE3Failure(e3Id) in the file and change the opening ``` to
```text (or another appropriate language).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e51ceb28-eada-48d5-a0f9-297a69ab4d49
📒 Files selected for processing (4)
agent/flow-trace/05_FAILURE_REFUND_SLASHING.mdcrates/events/src/interfold_event/e3_failed.rscrates/keyshare/src/actors/threshold_keyshare.rscrates/request/src/domain/routing.rs
💤 Files with no reviewable changes (1)
- crates/events/src/interfold_event/e3_failed.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- crates/keyshare/src/actors/threshold_keyshare.rs
- crates/request/src/domain/routing.rs
The problem was that
E3Routernever publishedE3RequestCompletefor timeout failures, leaving per-E3 contexts alive indefinitely. This means a memory/resource leak for every timed-out E3 request!I added a
FailureReason::is_timeout()to distinguish pure timeout failures (which have no slashing lifecycle) from misbehaviour failures (e.g., slashing). Then, theRequestRouter::routehandles it!Summary by CodeRabbit
Bug Fixes
Improvements