test: balance unit test shards by isolating heavy test (and fix shard index bug)#563
Merged
test: balance unit test shards by isolating heavy test (and fix shard index bug)#563
Conversation
Two related changes to the parallel unit test sharding: 1. Add a 5th shard dedicated to known-heavy tests (currently just unit/code_too_large.t at ~16s). Other shards round-robin the remaining tests. Heavy tests are listed in HEAVY_TESTS in PerlScriptExecutionTest.java and easy to extend. 2. Fix a pre-existing off-by-one bug in shard index handling. The "Maven surefire is 1-indexed" conversion triggered for any shardIndex in [1, shardTotal], which incorrectly caught Gradle's 0-indexed values. Result: shard 1 ran the same tests as shard 0 and the last shard's tests were never run at all (about 25% of unit tests silently skipped, including code_too_large.t after the 5-shard change). Both backends now pass 0-indexed shard.index values; the conversion is removed. Wall-time improvement on `make`: ~32s -> ~22-26s, while also fixing the silent test-skipping correctness bug. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Balances the 4 parallel unit-test shards run by
makeand fixes a related correctness bug that was silently skipping tests.Background
Profiling the existing 4-shard layout from
build/test-results/testUnitShard*/TEST-*.xml:Shard 2 was nearly 2x slower than the others. Cause: a single test,
unit/code_too_large.t, takes ~15.7 s by itself — the next-slowest test is 1.45 s. Pure round-robin sharding cannot balance this; one shard's runtime is dominated by that one file.Changes
1. Heavy-test isolation (5 shards)
Add a small
HEAVY_TESTSlist inPerlScriptExecutionTest.java. The last shard runs only those tests; the other shards round-robin the rest. Bumping to 5 shards lets the heavy shard run in parallel with 4 evenly-balanced light shards. Adding more heavy tests later is just a one-line edit.2. Bug fix: off-by-one in shard index handling
While verifying the new layout I noticed the dedicated heavy shard wasn't actually running
code_too_large.t. Root cause is pre-existing:This conversion fired for any index in
[1, shardTotal], including Gradle's already-0-indexed values. With the original 4 shards:i % 4 == 0i % 4 == 1i % 4 == 2i % 4 == 3→ about 25% of unit tests were silently skippedThis is consistent with what the surefire XMLs showed: the first 3 test names on shards 0 and 1 were identical. With the new 5-shard layout, the heavy test fell into the never-run bucket, exposing the bug.
Fix: standardize on 0-indexed in both
build.gradleandpom.xml; remove the conversion.Result
Verified by parsing
build/test-results/testUnitShard*/TEST-*.xmlafter the new run:Total: 194 unique tests across all shards, 0 duplicates.
Wall time on
make: ~32 s → ~22–26 s (≈25% faster), and now actually runs every test.Test plan
makesucceeds and reports all shards greencode_too_large.truns on shard 4Generated with Devin