socat CI: run the test suite as parallel shards via parallel-make-check.py #10771
Open
julek-wolfssl wants to merge 6 commits into
…ck.py
The socat suite runs ~590 tests sequentially in one job and is
sleep-bound: a few tests sit in fixed waits (INTRANETRIPPER alone sleeps
~140s at -t 1.0) that dominate the ~10 min runtime.
Generalize the parallel runner so any command can ride its worker pool,
not just wolfSSL build configs. Three additive config keys (defaults keep
existing build configs behaving exactly as before):
build:false skip configure/make/check; run only the prepare+run commands
netns:true run each command under 'bwrap --unshare-net' so parallel
network tests can't collide on ports
shards:N fan a config out into N instances, each with $SHARD/$SHARDS
in its env and its own build-<name>-<k> dir; the pool bounds
how many run at once, so N>threads load-balances dynamically
socat.yml uses it: one config (build:false, netns:true, shards:12) runs a
round-robin slice of test.sh per shard, each in its own network namespace
and build-dir copy, with --no-fail-fast so every failure is reported as the
unsharded run did.
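The `shards:N` fan-out described above can be sketched as follows. This is a minimal illustration under stated assumptions: configs are assumed to be plain dicts with `name`, optional `env` and `shards` keys, and the `fan_out` function and `build_dir` field are hypothetical. They are not the actual parallel-make-check.py code.

```python
import copy

def fan_out(configs):
    """Expand every config that has a `shards` key into numbered instances.

    Hypothetical sketch: names and dict layout are illustrative, not the
    actual parallel-make-check.py implementation.
    """
    expanded = []
    for cfg in configs:
        if "shards" not in cfg:
            expanded.append(cfg)  # unsharded configs pass through unchanged
            continue
        n = int(cfg["shards"])
        for k in range(1, n + 1):
            inst = copy.deepcopy(cfg)
            del inst["shards"]
            inst["name"] = f"{cfg['name']}-{k}"
            # Each instance learns its position via $SHARD (1..N) and $SHARDS (N)
            inst["env"] = {**cfg.get("env", {}), "SHARD": str(k), "SHARDS": str(n)}
            # ...and gets a private build dir so generated files don't race
            inst["build_dir"] = f"build-{inst['name']}"
            expanded.append(inst)
    return expanded
```

For example, `{"name": "socat", "shards": 3}` becomes `socat-1` through `socat-3`, with build dirs `build-socat-1` through `build-socat-3`. The worker pool then treats each instance like any other config.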
retest this please
socat.yml sets shards to 3*nproc (a few per CPU so the pool always has
queued work to balance the slow tests against), and guards each shard's
slice with ${tests:-0} so a shard that draws no test numbers is a no-op
(test 0 matches nothing) instead of letting test.sh fall back to running
the whole suite.
parallel-make-check.py: guard the summary's occupancy/utilization ratios
against a zero wall time (every job a no-op when shards exceed the work)
so it can't divide by zero.
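A minimal sketch of the zero-wall-time guard follows. It assumes, hypothetically, that occupancy means "job-seconds divided by available thread-seconds". The summary's real formulas are not shown in this PR text.

```python
def safe_ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of raising ZeroDivisionError when nothing ran
    return numerator / denominator if denominator > 0 else 0.0

def occupancy(job_seconds, wall_seconds: float, threads: int) -> float:
    # Hypothetical definition: fraction of thread-seconds actually spent in jobs
    return safe_ratio(sum(job_seconds), wall_seconds * threads)
```

When every shard is a no-op, wall time can be `0.0`. In that case the ratio reports `0.0` instead of crashing the summary.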
The first CI run failed because each shard's bwrap --unshare-net netns differs from the host namespace the expect_fail lists were calibrated for:
- IPv6: the netns is IPv4-only, so ::1 and dual-stack (v6only) tests fail. Re-create IPv6 loopback in each shard (disable_ipv6=0, add ::1, bindv6only=0), best-effort (|| true) so IPv4-only runners still work.
- Timing/port flakiness: one thread per CPU oversubscribes during TLS handshakes (each shard runs a server+client socat). Cap --threads at half the CPUs.
Also turn on fail-fast (drop --no-fail-fast).
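The two sizing rules from these commits are `--threads` at half the CPUs and shards at 3*nproc. A rough sketch of them combined is below. The `pool_sizes` helper is hypothetical; the workflow itself computes these values in shell.

```python
import os

def pool_sizes(nproc=None):
    # Fall back to the machine's CPU count (or 1 if it is unknown)
    nproc = nproc or os.cpu_count() or 1
    threads = max(1, nproc // 2)  # half the CPUs, but never zero workers
    shards = 3 * nproc  # several shards per CPU so the queue never drains early
    return threads, shards
```

On an 8-CPU runner this gives 4 workers draining 24 shards. A worker that finishes a short slice immediately picks up the next one, while a slice with a long sleep occupies only a single worker.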
Address a review comment: the <name>-<k> instances from shard fan-out could collide with another config's name and share a build-<name> dir. Validate after fan-out, matching the duplicate-name check in load_configs.
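A post-fan-out uniqueness check might look like the following sketch. The `check_unique_names` name is hypothetical. It only mirrors the idea of the duplicate-name check in load_configs, not its actual code.

```python
from collections import Counter

def check_unique_names(configs):
    # Count every instance name produced by fan-out (plus unsharded configs)
    counts = Counter(cfg["name"] for cfg in configs)
    dupes = sorted(name for name, n in counts.items() if n > 1)
    if dupes:
        # Two instances with one name would also share one build-<name> dir
        raise ValueError(f"duplicate config names after shard fan-out: {', '.join(dupes)}")
```

Here is an example of the collision. A config literally named `socat-2` plus a `socat` config with `shards: 3` both yield `socat-2`, so the check fails before anything runs.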
Comment on lines +60 to +62:

    sparse-checkout: |
      .github/actions
      .github/scripts
Last run's IPv6 re-creation had no effect (netns stayed IPv4-only).
- parallel-make-check.py: add --cap-add CAP_NET_ADMIN to the netns bwrap so a shard can configure its own loopback.
- socat.yml: bring lo up, add ::1 plus a non-loopback ULA (so the resolver treats IPv6 as configured) and set bindv6only=0. Drop 2>/dev/null so the setup's success/failure shows in the log.
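The netns wrapping, with the "warn and fall back if bwrap is missing" behavior from the PR description, could be sketched like this. Assumptions: the `wrap_netns` helper is hypothetical, and `--dev-bind / /` is included only so the sandbox sees the host filesystem. The script's real bwrap argument list is not shown here.

```python
import shutil
import sys

def wrap_netns(cmd: list[str], netns: bool) -> list[str]:
    """Prefix cmd with a bwrap call that gives it a private network namespace."""
    if not netns:
        return cmd
    bwrap = shutil.which("bwrap")
    if bwrap is None:
        # Fall back to the shared namespace rather than failing the job
        print("warning: bwrap not found; running without a private netns", file=sys.stderr)
        return cmd
    return [
        bwrap,
        "--dev-bind", "/", "/",          # assumption: expose the host filesystem
        "--unshare-net",                 # fresh netns: ports can't collide across shards
        "--cap-add", "CAP_NET_ADMIN",    # lets the shard configure its own loopback
        "--",
        *cmd,
    ]
```

The returned list would be passed to `subprocess.run`. Because each shard gets a fresh namespace, two shards can both listen on the same port.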
The IPv6 re-creation failed with "Operation not permitted": Ubuntu 24.04 restricts unprivileged user namespaces via AppArmor, leaving CAP_NET_ADMIN ineffective inside bwrap's netns. Add the same kernel.apparmor_restrict_unprivileged_userns=0 step the other bwrap workflows use, so each shard can configure its netns loopback (::1, dual-stack) and the suite's IPv6 tests run.
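One way to confirm from inside a shard that the loopback setup actually took effect is to probe for IPv6 directly. The sketch below checks that `::1` is bindable and reads the default `IPV6_V6ONLY` of a new socket. On Linux that default follows `net.ipv6.bindv6only`. The helper names are hypothetical.

```python
import socket
from typing import Optional

def ipv6_loopback_usable() -> bool:
    """True if an IPv6 TCP socket can bind to ::1 in the current namespace."""
    if not socket.has_ipv6:
        return False
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as s:
            s.bind(("::1", 0))  # port 0: let the kernel pick a free port
        return True
    except OSError:
        return False  # e.g. ::1 not assigned or IPv6 disabled in this netns

def default_v6only() -> Optional[int]:
    """Default IPV6_V6ONLY of a new socket (0 means dual-stack binds work)."""
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as s:
            return s.getsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY)
    except (OSError, AttributeError):
        return None  # no IPv6 sockets, or the constant is unavailable on this OS

if __name__ == "__main__":
    print("::1 bindable:", ipv6_loopback_usable())
    print("default IPV6_V6ONLY:", default_v6only())
```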
The socat suite runs ~590 tests sequentially in a single job and is
sleep-bound: a handful of tests sit in fixed waits (INTRANETRIPPER alone sleeps
~140s at `-t 1.0`) that dominate the ~10 min runtime. This generalizes the shared parallel runner
(`.github/scripts/parallel-make-check.py`) so any command can ride its worker pool, not just wolfSSL build configs, and uses it to shard the socat tests
across a single runner.

`parallel-make-check.py`: three additive config keys

Defaults keep existing build configs behaving exactly as before:

- `build: false` skips configure/make/check and runs only the `prepare`/`run` commands, so an arbitrary command can use the pool.
- `netns: true` runs each command under `bwrap --unshare-net` (its own network namespace) so parallel network tests can't collide on ports. Needs
bubblewrap; warns and falls back to the shared namespace if `bwrap` is missing.
- `shards: N` fans a config out into N instances, each with `$SHARD` (1..N) and
`$SHARDS=N` in its env and its own `build-<name>-<k>` dir. The pool
(`--threads`) bounds how many run at once, so `N > threads` load-balances dynamically. Composes with the existing
`--shard` CI split.

`socat.yml`

One config (`build:false`, `netns:true`, `shards:12`) runs a round-robin slice of
`test.sh` per shard (`seq $SHARD $SHARDS 999`), each in its own network namespace and its own copy of the build dir (their generated certs/temp files
would otherwise race).
`--no-fail-fast` runs every shard so all unexpected failures are reported, as the unsharded run did. The job timeout drops from 30
to 15 min.
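The pool semantics described above can be sketched with `concurrent.futures`. More shards than `--threads` simply queue, and fail-fast versus `--no-fail-fast` decides whether queued shards still run after a failure. Jobs are modeled as zero-argument callables returning success, and `run_pool` is a hypothetical name, not the script's API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_pool(jobs, threads, fail_fast=False):
    """Run jobs (name -> callable returning True on success), at most `threads` at once."""
    results = {}
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Submitting more jobs than workers just queues them; each worker
        # pulls the next queued job when it finishes (dynamic load balancing).
        futures = {pool.submit(fn): name for name, fn in jobs.items()}
        for fut in as_completed(futures):
            name = futures[fut]
            try:
                results[name] = bool(fut.result())
            except Exception:
                results[name] = False  # a crashing job counts as a failure
            if fail_fast and not results[name]:
                for other in futures:
                    other.cancel()  # drops still-queued jobs; running ones finish
                break
    return results

# Usage: six shards on two workers; shard 3 fails.
jobs = {f"socat-{k}": (lambda k=k: k != 3) for k in range(1, 7)}
print(run_pool(jobs, threads=2))
```

With `fail_fast=False`, every shard reports, matching the `--no-fail-fast` behavior. With `fail_fast=True`, the first failure stops any shards that have not started yet.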