
chore(test-runner): hard-kill wedged jperl with timeout -k #567

Merged
fglock merged 1 commit into master from chore/test-runner-timeout-kill-after
Apr 27, 2026

fglock commented Apr 27, 2026

Summary

GNU timeout sends only SIGTERM by default. If jperl/the JVM ignores SIGTERM or hangs while handling it (e.g. blocked in a JNI call or a stuck shutdown hook), the wrapped process can outlive the configured timeout indefinitely. We observed an op/gv.t worker still running ~10 hours after a 300s timeout.

This patches dev/tools/perl_test_runner.pl so wedged jperl processes are actually killed:

  • Pass -k 10s to both timeout and gtimeout, so SIGKILL follows SIGTERM after a short grace period.
  • Skip timeout detection on Windows ($^O is MSWin32, cygwin, or msys). Windows' built-in timeout.exe is a sleep-with-countdown utility, not GNU timeout, so invoking it as timeout 300s ... would not run the wrapped command as intended. Windows continues to use the existing alarm()-based fallback.
  • Treat exit codes 137 (128+SIGKILL) and 143 (128+SIGTERM) as timeouts alongside 124, so processes killed via -k are still classified as timeouts rather than generic errors.
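The -k flag and the exit-code mapping above can be sketched in Perl as follows. This is an illustrative approximation, not the code in dev/tools/perl_test_runner.pl (the PR description does not show it): `timeout_prefix` and `classify_exit` are hypothetical helper names, `jperl op/gv.t` are placeholder arguments, and the sketch assumes the runner inspects the raw `$?` after a `system()` call.

```perl
use strict;
use warnings;

# Hypothetical helper: build the wrapper prefix for a given timeout binary.
# -k 10s asks GNU timeout to send SIGKILL if the child is still alive
# 10 seconds after the initial SIGTERM.
sub timeout_prefix {
    my ($bin, $seconds) = @_;
    return () unless defined $bin;    # no usable timeout: caller uses alarm()
    return ($bin, '-k', '10s', "${seconds}s");
}

# Hypothetical helper: classify a raw $? value from system().
sub classify_exit {
    my ($raw) = @_;
    return 'ok'    if $raw == 0;
    return 'error' if $raw == -1;     # command could not be started
    # Normalize to the shell convention: death by signal N reads as 128+N,
    # a normal exit reads as its exit status (high byte of $?).
    my $status = ($raw & 127) ? 128 + ($raw & 127) : $raw >> 8;
    # 124: timed out; 137: 128+SIGKILL; 143: 128+SIGTERM
    return 'timeout' if $status == 124 || $status == 137 || $status == 143;
    return 'error';
}

# Placeholder command, for illustration only.
my @cmd = (timeout_prefix('timeout', 300), 'jperl', 'op/gv.t');
print "@cmd\n";    # prints: timeout -k 10s 300s jperl op/gv.t
```

Normalizing to 128+N means the same check works whether a SIGKILL reaches the runner as a signal in the low bits of `$?` or as exit status 137 reported by an intermediate shell.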

Test plan

  • perl -c dev/tools/perl_test_runner.pl — syntax OK
  • make — full unit test suite passes
  • Verified resulting command on macOS: timeout -k 10s 300s ...
  • Linux uses GNU timeout -k; macOS uses timeout -k if Homebrew coreutils is present, else falls through to gtimeout -k; Windows falls through to the alarm() fallback (unchanged behavior)
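The per-platform fallthrough in the last bullet might look roughly like this. It is a sketch under assumptions, not the runner's actual detection logic: `find_in_path` and `detect_timeout_bin` are hypothetical names, and scanning `PATH` with core `File::Spec->path` is an assumed lookup strategy.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the first executable named $name on PATH.
sub find_in_path {
    my ($name) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;    # reuse stat from -f
    }
    return undef;
}

# Hypothetical detection order:
#   Windows-like $^O -> undef (caller uses the alarm() fallback)
#   otherwise        -> timeout, then gtimeout, else undef
sub detect_timeout_bin {
    my ($os) = @_;
    return undef if $os =~ /^(?:MSWin32|cygwin|msys)$/;
    for my $name (qw(timeout gtimeout)) {
        my $path = find_in_path($name);
        return $path if defined $path;
    }
    return undef;
}

my $bin = detect_timeout_bin($^O);
print defined $bin ? "using $bin -k 10s\n" : "using alarm() fallback\n";
```

Returning `undef` for MSWin32, cygwin, and msys means Windows' `timeout.exe` is never matched, so those platforms always reach the `alarm()` path.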

Generated with Devin

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
fglock merged commit b0e4cbf into master Apr 27, 2026
2 checks passed
fglock deleted the chore/test-runner-timeout-kill-after branch April 27, 2026 08:08