
chore(cpan-tester): add -k kill-after to timeout so wedged jperl is hard-killed #573

Merged: fglock merged 2 commits into master from chore/cpan-tester-timeout-kill-after on Apr 27, 2026
fglock commented Apr 27, 2026

Summary

The CPAN random tester (dev/tools/cpan_random_tester.pl) wrapped each jcpan -t invocation with a plain timeout ${secs}s ..., which only sends SIGTERM. A wedged JVM that ignores SIGTERM — or a child suspended via SIGTSTP — survives the timeout and lingers indefinitely, eating ~2GB of RAM and blocking reruns.

This PR mirrors the fix already merged into perl_test_runner.pl (#567):

  • Add -k 10s so the wrapper follows up SIGTERM with SIGKILL after a 10-second grace period.
  • Treat exit codes 124 (GNU timeout's own status when the command timed out), 137 (128 + 9, killed by SIGKILL) and 143 (128 + 15, killed by SIGTERM) as timeouts, not just 124.
  • Skip external-timeout detection on Windows ($^O eq 'MSWin32'), since Windows' timeout.exe is a sleep-with-countdown, not GNU coreutils. The script then falls through to the existing fork+alarm fallback.
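
As a rough illustration of the first two points, here is a minimal shell sketch. The real change lives in the Perl script, which this page does not show, so the helper names `run_with_hard_timeout` and `is_timeout_status` are hypothetical and only mirror the behavior described above. It assumes a GNU coreutils `timeout` on PATH.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not taken from cpan_random_tester.pl.

# Run a command under GNU timeout with a hard-kill follow-up.
# $1 is the time limit in seconds; the remaining arguments are the command.
run_with_hard_timeout() {
    secs="$1"
    shift
    # -k 10s: if the command is still alive 10 seconds after SIGTERM, send SIGKILL.
    timeout -k 10s "${secs}s" "$@"
}

# Succeed if an exit status should be counted as a timeout.
#   124 = GNU timeout's own "command timed out" status
#   137 = 128 + 9  (killed by SIGKILL)
#   143 = 128 + 15 (killed by SIGTERM)
is_timeout_status() {
    case "$1" in
        124|137|143) return 0 ;;
        *)           return 1 ;;
    esac
}
```

A caller might use it as `run_with_hard_timeout 300 jcpan -t Some::Module; is_timeout_status $? && echo "timed out"`, where `Some::Module` and the 300-second limit are placeholders.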

Cross-platform behavior:

  • Linux: uses timeout -k 10s ${secs}s (GNU coreutils).
  • macOS: uses timeout -k 10s if coreutils is installed via Homebrew, else gtimeout -k 10s. If neither, falls back to fork + setpgrp + kill(-pid).
  • Windows: skips external timeout, uses the alarm-based fallback.
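
The tool selection described in this list could look roughly like the following in shell. This is an assumption-laden sketch: `first_available` is a hypothetical helper, and the Windows check is omitted because the PR performs it in Perl via `$^O`.

```shell
#!/bin/sh
# Sketch only: first_available is a hypothetical helper name.

# Print the first candidate command name that exists on PATH.
# Returns non-zero (and prints nothing) if none of them is found.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Prefer `timeout`, then `gtimeout`. An empty value means neither exists,
# so the caller would have to use an in-process fallback instead.
timeout_cmd=$(first_available timeout gtimeout) || timeout_cmd=""
```

Checking only that the name exists does not prove the binary is GNU `timeout`, which is why the PR skips this lookup entirely on Windows.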

Also includes a refresh of dev/cpan-reports/ from the background tester (4147 modules tested, 1062 pass, 3085 fail).

Test plan

  • perl -c dev/tools/cpan_random_tester.pl — syntax OK
  • make passes (all unit tests green)

Generated with Devin

fglock and others added 2 commits April 27, 2026 16:05
…ard-killed

cpan_random_tester.pl wrapped each `jcpan -t` invocation with a plain
`timeout ${secs}s ...`, which only sends SIGTERM. A wedged JVM that
ignores SIGTERM (or a child process that has been suspended via
SIGTSTP) survives the timeout and lingers indefinitely, eating RAM
and blocking reruns.

Mirror the fix already in perl_test_runner.pl:

- Add `-k 10s` so the wrapper follows up SIGTERM with SIGKILL after a
  10-second grace period.
- Treat exit codes 124 (SIGTERM-then-exit), 137 (SIGKILL) and 143
  (SIGTERM) as timeouts instead of just 124.
- Skip external-`timeout` detection on Windows (`$^O eq 'MSWin32'`),
  because Windows' `timeout.exe` is a sleep-with-countdown, not GNU
  coreutils. Falls through to the existing fork+alarm fallback.

Cross-platform behavior:
- Linux: uses `timeout -k 10s ${secs}s` (GNU coreutils).
- macOS: uses `timeout -k 10s` if coreutils is installed via Homebrew,
  otherwise `gtimeout -k 10s`. If neither, falls back to
  fork+setpgrp+kill(-pid).
- Windows: skips external timeout, uses the alarm-based fallback.
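
For readers unfamiliar with what `-k` adds, the SIGTERM-then-SIGKILL escalation can be spelled out by hand. The sketch below is a plain-shell illustration, not the Perl fallback from the commit: the function name `run_escalating_kill` is hypothetical, and unlike fork+setpgrp+kill(-pid) it signals only the direct child, not a whole process group.

```shell
#!/bin/sh
# Sketch only: hand-rolled TERM-then-KILL escalation for a single child process.
# Usage: run_escalating_kill LIMIT_SECS GRACE_SECS command [args...]
run_escalating_kill() {
    limit="$1"
    grace="$2"
    shift 2

    # Start the command in the background and remember its PID.
    "$@" &
    cmd_pid=$!

    # Watchdog: ask politely after the limit, then force-kill after the grace period.
    # Its output is discarded so a lingering sleep cannot hold the caller's pipes open.
    (
        sleep "$limit"
        kill -TERM "$cmd_pid" 2>/dev/null
        sleep "$grace"
        kill -KILL "$cmd_pid" 2>/dev/null
    ) >/dev/null 2>&1 &
    watchdog_pid=$!

    # Collect the command's status; a signal death shows up as 128 + signal number.
    rc=0
    wait "$cmd_pid" || rc=$?

    # The command is gone, so the watchdog is no longer needed.
    kill "$watchdog_pid" 2>/dev/null
    wait "$watchdog_pid" 2>/dev/null
    return "$rc"
}
```

A command that exits on SIGTERM returns 143 here, and one that ignores SIGTERM returns 137 once the grace period expires, which matches two of the three statuses the commit treats as timeouts.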

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Modules tested: 4147 (+8). Pass: 1062 (+1). Fail: 3085 (+7).

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
fglock force-pushed the chore/cpan-tester-timeout-kill-after branch from c320cf0 to c4889bc on April 27, 2026 14:06
fglock merged commit 5facf9d into master on Apr 27, 2026
2 checks passed
fglock deleted the chore/cpan-tester-timeout-kill-after branch on April 27, 2026 14:07