Skip to content

test: balance unit test shards by isolating heavy test (and fix shard index bug)#563

Merged
fglock merged 1 commit intomasterfrom
feature/balanced-test-shards
Apr 26, 2026
Merged

test: balance unit test shards by isolating heavy test (and fix shard index bug)#563
fglock merged 1 commit intomasterfrom
feature/balanced-test-shards

Conversation

@fglock
Copy link
Copy Markdown
Owner

@fglock fglock commented Apr 26, 2026

Summary

Balances the 4 parallel unit-test shards run by make and fixes a related correctness bug that was silently skipping tests.

Background

Profiling the existing 4-shard layout from build/test-results/testUnitShard*/TEST-*.xml:

Shard Tests Time
0 49 16.6 s
1 49 16.6 s
2 49 31.8 s
3 48 18.6 s

Shard 2 was nearly 2x slower than the others. Cause: a single test, unit/code_too_large.t, takes ~15.7 s by itself — the next-slowest test is 1.45 s. Pure round-robin sharding cannot balance this; one shard's runtime is dominated by that one file.

Changes

1. Heavy-test isolation (5 shards)

Add a small HEAVY_TESTS list in PerlScriptExecutionTest.java. The last shard runs only those tests; the other shards round-robin the rest. Bumping to 5 shards lets the heavy shard run in parallel with 4 evenly-balanced light shards. Adding more heavy tests later is just a one-line edit.

2. Bug fix: off-by-one in shard index handling

While verifying the new layout I noticed the dedicated heavy shard wasn't actually running code_too_large.t. Root cause is pre-existing:

// Maven surefire.forkNumber is 1-indexed, convert to 0-indexed
if (shardIndex >= 1 && shardIndex <= shardTotal) {
    shardIndex = shardIndex - 1;
}

This conversion fired for any index in [1, shardTotal], including Gradle's already-0-indexed values. With the original 4 shards:

  • gradle index 0 → stays 0 → runs i % 4 == 0
  • gradle index 1 → becomes 0 → runs the same tests as shard 0
  • gradle index 2 → becomes 1 → runs i % 4 == 1
  • gradle index 3 → becomes 2 → runs i % 4 == 2
  • nothing runs i % 4 == 3about 25% of unit tests were silently skipped

This is consistent with what the surefire XMLs showed: the first 3 test names on shards 0 and 1 were identical. With the new 5-shard layout, the heavy test fell into the never-run bucket, exposing the bug.

Fix: standardize on 0-indexed in both build.gradle and pom.xml; remove the conversion.

Result

Verified by parsing build/test-results/testUnitShard*/TEST-*.xml after the new run:

Shard Tests Time Heaviest
0 49 23.5 s array.t (1.9s)
1 48 23.9 s array_autovivification.t (2.3s)
2 48 21.1 s bare_block_return.t (2.1s)
3 48 24.1 s autoload.t (1.7s)
4 (heavy) 1 14.9 s code_too_large.t (14.9s)

Total: 194 unique tests across all shards, 0 duplicates.

Wall time on make: ~32 s → ~22–26 s (≈25% faster), and now actually runs every test.

Test plan

  • make succeeds and reports all shards green
  • Verified per-shard timings via XML reports (table above)
  • Verified 194 unique tests run with 0 duplicates
  • Verified code_too_large.t runs on shard 4

Generated with Devin

Two related changes to the parallel unit test sharding:

1. Add a 5th shard dedicated to known-heavy tests (currently just
   unit/code_too_large.t at ~16s). Other shards round-robin the
   remaining tests. Heavy tests are listed in HEAVY_TESTS in
   PerlScriptExecutionTest.java and easy to extend.

2. Fix a pre-existing off-by-one bug in shard index handling.
   The "Maven surefire is 1-indexed" conversion triggered for
   any shardIndex in [1, shardTotal], which incorrectly caught
   Gradle's 0-indexed values. Result: shard 1 ran the same tests
   as shard 0 and the last shard's tests were never run at all
   (about 25% of unit tests silently skipped, including
   code_too_large.t after the 5-shard change).

   Both backends now pass 0-indexed shard.index values; the
   conversion is removed.

Wall-time improvement on `make`: ~32s -> ~22-26s, while also
fixing the silent test-skipping correctness bug.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@fglock fglock merged commit 294dfc0 into master Apr 26, 2026
2 checks passed
@fglock fglock deleted the feature/balanced-test-shards branch April 26, 2026 15:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant