test: balance unit test shards by isolating heavy test (and fix shard index bug) by fglock · Pull Request #563 · fglock/PerlOnJava

fglock · 2026-04-26T15:34:39Z

Summary

Balances the 4 parallel unit-test shards run by make and fixes a related correctness bug that was silently skipping tests.

Background

Profiling the existing 4-shard layout from build/test-results/testUnitShard*/TEST-*.xml:

Shard	Tests	Time
0	49	16.6 s
1	49	16.6 s
2	49	31.8 s
3	48	18.6 s

Shard 2 was nearly 2x slower than the others. Cause: a single test, unit/code_too_large.t, takes ~15.7 s by itself — the next-slowest test is 1.45 s. Pure round-robin sharding cannot balance this; one shard's runtime is dominated by that one file.

Changes

1. Heavy-test isolation (5 shards)

Add a small HEAVY_TESTS list in PerlScriptExecutionTest.java. The last shard runs only those tests; the other shards round-robin the rest. Bumping to 5 shards lets the heavy shard run in parallel with 4 evenly-balanced light shards. Adding more heavy tests later is just a one-line edit.

2. Bug fix: off-by-one in shard index handling

While verifying the new layout I noticed the dedicated heavy shard wasn't actually running code_too_large.t. Root cause is pre-existing:

// Maven surefire.forkNumber is 1-indexed, convert to 0-indexed
if (shardIndex >= 1 && shardIndex <= shardTotal) {
    shardIndex = shardIndex - 1;
}

This conversion fired for any index in [1, shardTotal], including Gradle's already-0-indexed values. With the original 4 shards:

gradle index 0 → stays 0 → runs i % 4 == 0
gradle index 1 → becomes 0 → runs the same tests as shard 0
gradle index 2 → becomes 1 → runs i % 4 == 1
gradle index 3 → becomes 2 → runs i % 4 == 2
nothing runs i % 4 == 3 → about 25% of unit tests were silently skipped

This is consistent with what the surefire XMLs showed: the first 3 test names on shards 0 and 1 were identical. With the new 5-shard layout, the heavy test fell into the never-run bucket, exposing the bug.

Fix: standardize on 0-indexed in both build.gradle and pom.xml; remove the conversion.

Result

Verified by parsing build/test-results/testUnitShard*/TEST-*.xml after the new run:

Shard	Tests	Time	Heaviest
0	49	23.5 s	array.t (1.9s)
1	48	23.9 s	array_autovivification.t (2.3s)
2	48	21.1 s	bare_block_return.t (2.1s)
3	48	24.1 s	autoload.t (1.7s)
4 (heavy)	1	14.9 s	code_too_large.t (14.9s)

Total: 194 unique tests across all shards, 0 duplicates.

Wall time on make: ~32 s → ~22–26 s (≈25% faster), and now actually runs every test.

Test plan

make succeeds and reports all shards green
Verified per-shard timings via XML reports (table above)
Verified 194 unique tests run with 0 duplicates
Verified code_too_large.t runs on shard 4

Generated with Devin

Two related changes to the parallel unit test sharding: 1. Add a 5th shard dedicated to known-heavy tests (currently just unit/code_too_large.t at ~16s). Other shards round-robin the remaining tests. Heavy tests are listed in HEAVY_TESTS in PerlScriptExecutionTest.java and easy to extend. 2. Fix a pre-existing off-by-one bug in shard index handling. The "Maven surefire is 1-indexed" conversion triggered for any shardIndex in [1, shardTotal], which incorrectly caught Gradle's 0-indexed values. Result: shard 1 ran the same tests as shard 0 and the last shard's tests were never run at all (about 25% of unit tests silently skipped, including code_too_large.t after the 5-shard change). Both backends now pass 0-indexed shard.index values; the conversion is removed. Wall-time improvement on `make`: ~32s -> ~22-26s, while also fixing the silent test-skipping correctness bug. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

fglock merged commit 294dfc0 into master Apr 26, 2026
2 checks passed

fglock deleted the feature/balanced-test-shards branch April 26, 2026 15:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: balance unit test shards by isolating heavy test (and fix shard index bug)#563

test: balance unit test shards by isolating heavy test (and fix shard index bug)#563
fglock merged 1 commit intomasterfrom
feature/balanced-test-shards

fglock commented Apr 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

fglock commented Apr 26, 2026

Summary

Background

Changes

Result

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant