Skip to content

fix: bump jest timeout for bundler-touching tests to avoid racing SLOW gRPC#210

Merged
chrisli30 merged 5 commits into
stagingfrom
fix/slow-test-timeout-209
Apr 9, 2026
Merged

fix: bump jest timeout for bundler-touching tests to avoid racing SLOW gRPC#210
chrisli30 merged 5 commits into
stagingfrom
fix/slow-test-timeout-209

Conversation

@chrisli30
Copy link
Copy Markdown
Member

Summary

  • Tests that submit real UserOps via the bundler use TimeoutPresets.SLOW on the gRPC client (~244s worst case) but jest was capped at TIMEOUT_DURATION = 60s. Under parallel load, when the aggregator's FlushStuckUserOps path engaged, jest aborted at 60s while the gRPC call still had ~3 minutes of headroom, producing rotating flakes.
  • Add TIMEOUT_DURATION_SLOW = 300000 (5 min) and apply it to the five files that submit real UserOps. Fast unit-style tests keep the 60s TIMEOUT_DURATION so they fail fast.

Fixes #209

Test plan

  • TEST_ENV=dev yarn test:nodes runs cleanly back-to-back without rotating flakes
  • TEST_ENV=dev yarn test:workflows clean
  • TEST_ENV=dev yarn test:executions clean

🤖 Generated with Claude Code

…W gRPC

Tests that submit real UserOps via the bundler use TimeoutPresets.SLOW
on the gRPC client (120s + 2 retries x 2s delay ~= 244s worst case),
but jest was capped at TIMEOUT_DURATION = 60s. Under parallel load, when
the aggregator's FlushStuckUserOps path engages, jest aborted at 60s
while the gRPC call still had ~3 minutes of headroom, causing rotating
flakes across deploy+trigger tests.

Add TIMEOUT_DURATION_SLOW = 300000 (5 min) and apply it via
jest.setTimeout in the five files that submit real UserOps:
- tests/nodes/contractWrite.test.ts
- tests/nodes/ethTransfer.test.ts
- tests/nodes/loopNode.test.ts
- tests/workflows/triggerWorkflow.test.ts
- tests/executions/runNodeWithInputs.test.ts

Fast unit-style tests continue to use the 60s TIMEOUT_DURATION so they
fail fast.

Fixes #209
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adjusts Jest timeouts for integration tests that submit real UserOps via the bundler so Jest no longer times out before TimeoutPresets.SLOW gRPC calls complete under load (reducing flaky failures).

Changes:

  • Introduces TIMEOUT_DURATION_SLOW (5 minutes) in shared test utilities.
  • Switches bundler-touching test suites to jest.setTimeout(TIMEOUT_DURATION_SLOW) while keeping fast tests on the existing 60s default.
  • Adds clarifying comments in affected test files referencing the flake/root cause.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/workflows/triggerWorkflow.test.ts Uses slow suite timeout for workflow-trigger tests that submit real UserOps.
tests/utils/utils.ts Adds TIMEOUT_DURATION_SLOW constant and documents why it’s needed.
tests/nodes/loopNode.test.ts Uses slow suite timeout for loop node tests that deploy/trigger real UserOps.
tests/nodes/ethTransfer.test.ts Uses slow suite timeout for ETH transfer tests that deploy/trigger real UserOps.
tests/nodes/contractWrite.test.ts Uses slow suite timeout for contract write tests that deploy/trigger real UserOps.
tests/executions/runNodeWithInputs.test.ts Uses slow suite timeout for runNodeWithInputs tests that can submit real UserOps.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/utils/utils.ts Outdated
Comment on lines +116 to +120
// For tests that submit real UserOps via the bundler. The gRPC client uses
// TimeoutPresets.SLOW (120s + 2 retries × 2s delay ≈ 244s worst case), so
// jest must allow more than that to avoid racing the gRPC call under load.
// See AvaProtocol/ava-sdk-js#209.
export const TIMEOUT_DURATION_SLOW = 300000; // 5 minutes
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rationale comment for TIMEOUT_DURATION_SLOW hard-codes a worst-case calculation ("120s + 2 retries × 2s delay ≈ 244s"). Given TimeoutPresets.SLOW is defined in @avaprotocol/types and retry semantics could change, this risks drifting out of sync. Consider deriving TIMEOUT_DURATION_SLOW from TimeoutPresets.SLOW (plus a safety buffer) and/or clarifying the calculation based on the current retry implementation so future changes don’t silently reintroduce timeouts.

Copilot uses AI. Check for mistakes.
Comment on lines 11 to 15
import {
getNextId,
TIMEOUT_DURATION,
TIMEOUT_DURATION_SLOW,
removeCreatedWorkflows,
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file now sets the suite timeout to TIMEOUT_DURATION_SLOW, but there are still individual tests using an explicit timeout of TIMEOUT_DURATION * 3 (180s). Per-test timeouts override the suite timeout, so those tests can still time out earlier than the gRPC TimeoutPresets.SLOW worst case under load. Consider updating those per-test overrides to use TIMEOUT_DURATION_SLOW (or removing them if the suite timeout is sufficient).

Copilot uses AI. Check for mistakes.
Comment thread tests/nodes/contractWrite.test.ts Outdated
@@ -13,6 +13,7 @@ import {
import {
getNextId,
TIMEOUT_DURATION,
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file now sets the suite timeout to TIMEOUT_DURATION_SLOW, but at least one test still uses an explicit timeout of TIMEOUT_DURATION * 3 (180s). Since per-test timeouts override the suite timeout, that test can still abort before the gRPC TimeoutPresets.SLOW request finishes under load. Consider switching those per-test overrides to TIMEOUT_DURATION_SLOW (or relying on the suite timeout).

Suggested change
TIMEOUT_DURATION,

Copilot uses AI. Check for mistakes.
- Derive TIMEOUT_DURATION_SLOW from TimeoutPresets.SLOW so retry-config changes stay in sync automatically.
- Replace per-test TIMEOUT_DURATION * 3 (180s) overrides in contractWrite and ethTransfer "consistent response format" tests with TIMEOUT_DURATION_SLOW; per-test overrides beat the suite-wide jest.setTimeout, so they were still racing the gRPC SLOW preset.
- Drop now-unused TIMEOUT_DURATION imports from those two files.
`docker compose exec` (without -T) allocates a pseudo-TTY which converts
\n to \r\n on stdout. The captured API_KEY then has a trailing \r, gets
written to $GITHUB_ENV, and the gRPC client sends a corrupted JWT in the
Authorization header. The aggregator rejects it with "API key is invalid",
causing every authenticated test in tests/core/auth.test.ts to fail.

Add -T so the key is captured cleanly.
The bare `client.withdrawFunds(req)` calls relied on the SDK default
gRPC timeout (~30-60s), which is shorter than the actual paymaster
withdrawal flow takes under load. The SDK was killing the request before
the bundler returned, causing 4 tests in withdraw.test.ts to fail with
"gRPC request timeout after 30000ms for method withdrawFunds".

Pass `{ timeout: TimeoutPresets.SLOW }` (3 minutes) to the four
happy-path call sites:
- Basic Tests: minimal-parameters ETH withdrawal
- Edge Cases: explicit gas refund recipient
- Authentication Tests: request-level auth key
- Response Validation: properly formatted response

Edge-case `rejects.toThrow()` callers are left untouched — those use
intentionally invalid input and should fail fast without the long retry.
withdraw.test.ts performs real on-chain ETH transfers via the bundler,
which (a) drains the dev wallet over time and (b) is the slowest test
in the suite. The 4 failing tests in CI are all in this file, and
they fail with "insufficient balance" rather than logic errors —
the wallet just runs out of funds across CI runs.

Add a separate `test:withdraw` script for manual / local invocation
and exclude withdraw.test.ts from `test:core` so the CI matrix
(which only runs predefined suites) skips it automatically. The prod
workflow also calls `yarn test:core` directly so it inherits the
exclusion.

To run withdraw tests manually: `TEST_ENV=dev yarn test:withdraw`
(make sure the dev smart wallet has enough ETH first).
@chrisli30 chrisli30 merged commit 44f535d into staging Apr 9, 2026
4 checks passed
@chrisli30 chrisli30 deleted the fix/slow-test-timeout-209 branch April 9, 2026 06:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants