CI: Rust tests job takes 26+ minutes; shard it and shrink real-time test windows #335

Description

@hardbyte

Problem

The PR-gating Rust tests job took 26m22s on #334 (run 27250885594), a docs/changelog/version-bump PR. That is too long for the inner loop, and the duration profile says most of it is serialized waiting, not useful work.

Data points from that run:

  • One test binary alone took 180.14s (41 tests, includes test_v031_backfills_queue_storage_failed_done_metric_index).
  • Other binaries: 60.36s (21 tests), 46.23s (18 tests), 32.56s (4 tests).
  • Many individual integration tests trip the harness's "has been running for over 60 seconds" warning and then pass: test_queue_storage_receipt_claims_rescue_after_grace_window, test_queue_storage_receipt_claims_retry_successfully, test_queue_storage_receipt_deadline_rescue_force_closes_expired_claim, test_queue_storage_register_callback_rejects_stale_lease, test_queue_storage_retry_from_dlq_surfaces_unique_conflict, test_queue_storage_runtime_callback_timeout_moves_to_dlq, and more.

A test that needs >60s of wall clock is almost always waiting on a real-time window (grace periods, rescue intervals, rotation cadence, poll intervals) rather than doing 60s of work.
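As a rough illustration of the point above, here is a minimal sketch of timing windows injected as configuration, so a test can pin the behavior (not rescued before the grace window, rescued after it) without waiting out a production-sized duration. `Windows`, its fields, the default values, and both helper functions are hypothetical names for illustration, not this project's actual types.

```rust
use std::time::{Duration, Instant};

/// Hypothetical timing knobs; the real config type and field names may differ.
#[derive(Debug, Clone)]
struct Windows {
    lease_grace: Duration,
    callback_timeout: Duration,
}

impl Windows {
    /// Stand-in for production-sized defaults (values are illustrative only).
    fn production_like() -> Self {
        Self {
            lease_grace: Duration::from_secs(30),
            callback_timeout: Duration::from_secs(60),
        }
    }

    /// Test fixture: same rules, windows shrunk to hundreds of milliseconds or less.
    fn for_tests() -> Self {
        Self {
            lease_grace: Duration::from_millis(100),
            callback_timeout: Duration::from_millis(250),
        }
    }
}

/// A claim becomes rescuable once its lease plus the grace window has elapsed.
fn is_rescuable(claimed_at: Instant, lease: Duration, now: Instant, w: &Windows) -> bool {
    now.duration_since(claimed_at) >= lease + w.lease_grace
}

/// A callback has timed out once the configured timeout has elapsed.
fn callback_timed_out(started_at: Instant, now: Instant, w: &Windows) -> bool {
    now.duration_since(started_at) >= w.callback_timeout
}

#[test]
fn rescue_respects_grace_window_not_its_default_length() {
    let w = Windows::for_tests();
    let t0 = Instant::now();
    let lease = Duration::from_millis(50);
    // Inside lease + grace (150ms): must not be rescued yet.
    assert!(!is_rescuable(t0, lease, t0 + Duration::from_millis(120), &w));
    // Past lease + grace: must be rescuable.
    assert!(is_rescuable(t0, lease, t0 + Duration::from_millis(200), &w));
    // The callback timeout follows the same pattern.
    assert!(callback_timed_out(t0, t0 + Duration::from_millis(300), &w));
}
```

The real integration tests go through Postgres and the wall clock, so they would still wait, but for the shrunk window rather than a production-sized one. With the production-like values in this sketch, the same assertions would need more than 30 seconds of real time to become true.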

Proposed work

  1. Shard the job. Split Rust tests into a matrix of concurrent CI jobs — per package or per test-binary group — each with its own Postgres service container so DB contention doesn't serialize across shards. Target: worst shard under ~8 minutes.
  2. Adopt cargo-nextest. It runs tests in parallel and reports per-test timing, which also gives us a durable list of the slowest tests per run instead of one-off log archaeology.
  3. Audit the >60s tests. For each, check whether the configured window (lease grace, deadline, rescue cadence, callback timeout, maintenance tick) can be shrunk to hundreds of milliseconds in the test fixture. These windows are configuration, not contract — the test should pin the behavior, not the production default duration.
  4. Keep an eye on per-binary DB setup. The 180s binary suggests migrations/setup may be re-running per test; worth checking whether schema setup can be done once per binary (or per shard) with per-test schema/database isolation.
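For item 4, a minimal sketch of the "set up once, isolate per test" shape, assuming the expensive part is running migrations. It uses `std::sync::OnceLock` so the setup closure runs at most once per process, and hands each test a uniquely named schema. `run_migrations_once`, `fresh_schema`, and the schema names are hypothetical; the body is a stand-in that does not talk to Postgres.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the expensive setup actually ran (for illustration).
static SETUP_RUNS: AtomicUsize = AtomicUsize::new(0);
/// Monotonic counter used to give every test its own schema name.
static NEXT_SCHEMA: AtomicUsize = AtomicUsize::new(0);
/// Name of the migrated template schema, initialized at most once per process.
static TEMPLATE: OnceLock<String> = OnceLock::new();

/// Hypothetical stand-in for running migrations against a template schema.
/// A real version would connect to Postgres here.
fn run_migrations_once() -> &'static str {
    TEMPLATE.get_or_init(|| {
        SETUP_RUNS.fetch_add(1, Ordering::SeqCst);
        "template_schema".to_string()
    })
}

/// Each caller gets a distinct schema name derived from the shared template,
/// so tests do not see each other's rows.
fn fresh_schema() -> String {
    let template = run_migrations_once();
    let id = NEXT_SCHEMA.fetch_add(1, Ordering::SeqCst);
    format!("{template}_test_{id}")
}

#[test]
fn setup_runs_once_and_schemas_are_distinct() {
    let a = fresh_schema();
    let b = fresh_schema();
    assert_ne!(a, b);
    assert_eq!(SETUP_RUNS.load(Ordering::SeqCst), 1);
}
```

One caveat if item 2 lands: cargo-nextest runs each test in its own process, so in-process state like this `OnceLock` would be re-initialized per test. Under nextest the once-only step would have to live outside the test process, for example in a CI step that migrates a template database before the test run.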

Related: test_weight_proportionality flaked on the same run for a different reason (drain-race in the assertion) — fixed on the #334 branch. Worth a quick pass over other finite-job-pool tests for the same assert-after-drain shape while doing item 3.
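To make the assert-after-drain shape concrete, here is a toy model. It is one plausible reading of that shape, not the actual test_weight_proportionality code. With a finite job pool, served counts only reflect the weights while every queue still has backlog. Once the pool is drained, the counts equal what was enqueued, so an assertion sampled at or after that point checks nothing about weighting. `run_picks` and its scheduling rule are assumptions made for this illustration.

```rust
/// Toy weighted scheduler over two queues: on each pick, serve the non-empty
/// queue with the lowest served/weight ratio (ties go to queue 0).
/// Returns how many jobs each queue had served after `picks` picks.
fn run_picks(weights: [u64; 2], backlog: [u64; 2], picks: u64) -> [u64; 2] {
    let mut remaining = backlog;
    let mut served = [0u64; 2];
    for _ in 0..picks {
        // served[i] / weights[i] is compared by cross-multiplying,
        // which avoids floating point.
        let next = (0..2usize)
            .filter(|&i| remaining[i] > 0)
            .min_by_key(|&i| served[i] * weights[1 - i]);
        match next {
            Some(i) => {
                served[i] += 1;
                remaining[i] -= 1;
            }
            None => break, // pool fully drained
        }
    }
    served
}

#[test]
fn proportionality_only_holds_while_backlog_remains() {
    // Sampled while both queues still have work: the 3:1 weights are visible.
    assert_eq!(run_picks([3, 1], [100, 100], 40), [30, 10]);
    // Sampled after the pool is drained: counts equal the enqueued totals,
    // whatever the weights were.
    assert_eq!(run_picks([3, 1], [100, 100], 200), [100, 100]);
}
```

In a real concurrent test the sampling point is not a fixed pick count, so the snapshot would have to be taken while backlog is known to remain, for example by enqueueing far more jobs than the test consumes before asserting.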

Metadata


Labels: operational (Operational tooling and configuration)
