fix: silent error path causes integration test to hang#1346
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
📝 WalkthroughWalkthroughCapture of compute requests before spawn and explicit TaskPool panic/spawn error propagation added; pool spawn errors are mapped to ComputeRequestError and published on the bus with early return. Unmatched correlation_id arrivals in zk-prover now log errors. CI runner targets changed to self-hosted; various net logs reduced to debug. Changes
Sequence Diagram(s)sequenceDiagram
participant Requester
participant MultithreadActor as Multithread
participant TaskPool
participant Bus
Requester->>Multithread: Submit ComputeRequest
Multithread->>Multithread: request_snapshot = clone(request)
Multithread->>TaskPool: spawn(task with request_snapshot)
alt spawn OK
TaskPool->>TaskPool: execute task (catch_unwind)
alt task panics
TaskPool->>Multithread: return Err(Panic(msg))
Multithread->>Bus: publish ComputeRequestError(kind, msg)
Multithread-->>Requester: return Ok(())
else task completes
TaskPool->>Multithread: return Ok(result)
Multithread->>Bus: publish ComputeResponse/result
Multithread-->>Requester: return Ok(())
end
else spawn Err(pool_err)
TaskPool-->>Multithread: Err(pool_err)
Multithread->>Bus: publish ComputeRequestError(mapped_kind, pool_err)
Multithread-->>Requester: return Ok(())
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/multithread/src/multithread.rs`:
- Around line 208-210: The code currently maps any TrBFV pool spawn error to the
hardcoded e3_trbfv::TrBFVError::GenPkShareAndSkSss variant, which is misleading;
update the match in the ComputeRequestKind::TrBFV arm to inspect
request_snapshot.request (the actual TrBFV variant) and map pool errors to the
corresponding TrBFV error variant for each operation (e.g., Convert to
GenEsiSss, CalculateDecryptionKey, CalculateDecryptionShare,
CalculateThresholdDecryption, or GenPkShareAndSkSss) instead of always using
GenPkShareAndSkSss so diagnostics reflect the real failing TrBFV operation.
In `@crates/zk-prover/src/actors/proof_request.rs`:
- Around line 137-140: In handle_pk_generation_response (the branch that looks
up pending_threshold), update the error log message that currently says
"Received PkBfv ComputeResponse..." to the correct wording "Received
PkGeneration ComputeResponse..." so the log matches the handler and pending
request type; modify the error! call that references correlation_id in that
function to use "PkGeneration" instead of "PkBfv".
There was a problem hiding this comment.
🧹 Nitpick comments (1)
.github/workflows/ci.yml (1)
36-36: Verify your self-hosted runner policy for public-repo PRs.Running CI on self-hosted runners is efficient, but for a public repository the
pull_requesttrigger means external contributors' PRs execute on your own infrastructure. If runners are configured to auto-approve workflows from all contributors (not just maintainers), a malicious PR can exfiltrate secrets and modify the runner environment. Additionally, thecontrib-readme-jobruns withpermissions: contents: writeon the self-hosted runner — ensure GITHUB_TOKEN scoping is tightly controlled.Recommended actions:
- Confirm GitHub's "Fork pull request workflows" setting is set to require approval from maintainers (Settings → Actions → Fork pull request workflows from outside collaborators).
- Consider keeping security-sensitive jobs (those with
secrets.GITHUB_TOKENwrite access) isolated or guarded with anif: github.event_name != 'pull_request' || github.event.pull_request.head.repo.full_name == github.repositorycondition.Also applies to: 79-79, 122-122, 151-151, 188-188, 228-228, 263-263, 276-276, 341-341, 403-403, 434-434, 507-507, 615-615, 641-641, 670-670, 719-719, 763-763, 823-823, 860-860
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/ci.yml at line 36, The workflow currently runs jobs on self-hosted runners (lines showing "runs-on: self-hosted") including the job named contrib-readme-job which has permissions: contents: write and may run for pull_request events; update those job definitions to guard against untrusted external PRs by adding an if condition such as verifying the event is not a forked PR (e.g., if: github.event_name != 'pull_request' || github.event.pull_request.head.repo.full_name == github.repository) and tighten permissions/GITHUB_TOKEN scope (remove write-level access where not strictly needed or use minimal permissions) so self-hosted runners do not execute privileged steps for external contributors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In @.github/workflows/ci.yml:
- Line 36: The workflow currently runs jobs on self-hosted runners (lines
showing "runs-on: self-hosted") including the job named contrib-readme-job which
has permissions: contents: write and may run for pull_request events; update
those job definitions to guard against untrusted external PRs by adding an if
condition such as verifying the event is not a forked PR (e.g., if:
github.event_name != 'pull_request' ||
github.event.pull_request.head.repo.full_name == github.repository) and tighten
permissions/GITHUB_TOKEN scope (remove write-level access where not strictly
needed or use minimal permissions) so self-hosted runners do not execute
privileged steps for external contributors.
if the rayon thread panics, it causes the
?operator to return early, before publishing either ComputeResponse or ComputeRequestError.Summary by CodeRabbit
Bug Fixes
Chores