feat(worker): add per-request worker_timeout (hard request timeout) #2476
mansurs wants to merge 1 commit
Force-pushed: 475c287 to a3b27ef
Add an experimental `worker_timeout` worker option: a hard per-request timeout for worker mode, the equivalent of PHP-FPM's request_terminate_timeout. When a worker request runs longer than the timeout it is aborted with a "Worker request timeout of N second(s) exceeded" fatal and the worker restarts cleanly for the next request.

Unlike max_execution_time, this also covers time spent blocked in an external call. A signal/EINTR alone cannot abort such a call (PHP retries EINTR, and mysqlnd even drops its socket from EG(regular_list)), so on Linux the watchdog inspects what the thread is parked in via /proc/self/task/<tid>/syscall and shuts down the socket(s) involved:

- read/recvfrom/recvmsg/connect: fd is the syscall's first argument;
- poll/ppoll: the pollfd array is read from the process's own memory with process_vm_readv(2) (PHP's stream layer, and Redis/HTTP/DB clients on it, always poll before reading). Both syscalls are matched: glibc and musl implement poll() via the dedicated poll syscall on arches that have one (e.g. amd64) and via ppoll only elsewhere (e.g. arm64);
- epoll_wait/epoll_pwait: watched fds are enumerated from /proc/self/fdinfo/<epfd> (covers curl_multi, gRPC).

Every fd is confirmed to be a socket, and after recovering a pointer/table-derived fd the thread's syscall is re-read to confirm it is still parked there before shutdown, so a stale pointer or reused fd cannot close an unrelated descriptor. The watchdog body runs under the same mutex as its cancellation, so a watchdog racing request completion can never interrupt the wrong request.

A long sleep() is woken by the realtime kill signal (Linux/FreeBSD). The fatal is raised at the next opcode via a custom zend_interrupt_function (guarded against double installation across embedded Init/Shutdown cycles). On macOS/Windows only the VM-interrupt flag is set (CPU-bound overruns are caught; a blocking syscall already in progress cannot be unblocked).

Configurable per worker via the Caddyfile `worker_timeout` directive and the WithWorkerTimeout API; defaults to 0 (disabled).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
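To make the epoll_wait/epoll_pwait case in the commit message concrete, here is a minimal Go sketch of enumerating watched fds from the text of /proc/self/fdinfo/<epfd>. It is an illustration only, not the code in this PR: the function name `epollWatchedFDs` is hypothetical, the sample text is made up, and the sketch assumes the `tfd:` line layout described in proc(5).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// epollWatchedFDs is a hypothetical helper (not FrankenPHP's code). It takes
// the contents of /proc/<pid>/fdinfo/<epfd> and returns the target fds listed
// on "tfd:" lines. Lines that do not match are skipped.
func epollWatchedFDs(fdinfo string) []int {
	var fds []int
	for _, line := range strings.Split(fdinfo, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 2 || fields[0] != "tfd:" {
			continue // pos:, flags:, mnt_id:, ... are not watched fds
		}
		fd, err := strconv.Atoi(fields[1]) // the fd is printed in decimal
		if err != nil {
			continue
		}
		fds = append(fds, fd)
	}
	return fds
}

func main() {
	// Made-up sample in the shape proc(5) documents for epoll descriptors.
	sample := "pos:\t0\nflags:\t02000002\nmnt_id:\t15\n" +
		"tfd:        5 events:       19 data:                5  pos:0 ino:3f sdev:8\n" +
		"tfd:        9 events:       19 data:                9  pos:0 ino:40 sdev:8\n"
	fmt.Println(epollWatchedFDs(sample)) // prints [5 9]
}
```

As the commit message notes, each fd recovered this way would still need to be confirmed as a socket, and the thread's syscall re-read, before any shutdown.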
Force-pushed: a3b27ef to 69dba08
Just quickly skimming over this: you should probably use Go timers, otherwise this ends up being way too complex.
Thanks for skimming through! It actually does use a Go timer already: the whole thing is driven by a […]. A timer alone just can't unblock a thread sitting in a blocking syscall. Cgo calls aren't preemptible and PHP happily retries […]
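For readers following the timer discussion, this is a minimal Go sketch of a per-request timer that is guarded by an epoch counter and runs its body under the same mutex as its cancellation. It is an assumption-laden illustration, not the PR's implementation: the `watchdog` type, `arm`, `cancel`, and `simulate` are hypothetical names, and the timeout callback only records that it fired instead of interrupting anything.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// watchdog is a hypothetical type for illustration only.
type watchdog struct {
	mu    sync.Mutex
	epoch uint64      // bumped on every arm and cancel
	timer *time.Timer // pending timer for the current request, if any
}

// arm starts a timer for one request. onTimeout runs while holding w.mu and
// only if no cancel or re-arm has happened since this call.
func (w *watchdog) arm(d time.Duration, onTimeout func()) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.epoch++
	armedAt := w.epoch
	w.timer = time.AfterFunc(d, func() {
		w.mu.Lock()
		defer w.mu.Unlock()
		if w.epoch != armedAt {
			return // stale fire: the request this timer belonged to is done
		}
		onTimeout()
	})
}

// cancel is called when the request finishes.
func (w *watchdog) cancel() {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.epoch++ // invalidates a callback that already started waiting on mu
	if w.timer != nil {
		w.timer.Stop()
		w.timer = nil
	}
}

// simulate runs one fake request of workMs milliseconds under a watchdog of
// timeoutMs milliseconds and reports whether the watchdog fired.
func simulate(timeoutMs, workMs int) bool {
	var w watchdog
	fired := make(chan struct{}, 1)
	w.arm(time.Duration(timeoutMs)*time.Millisecond, func() { fired <- struct{}{} })
	time.Sleep(time.Duration(workMs) * time.Millisecond) // the "request"
	w.cancel()
	select {
	case <-fired:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println("slow request timed out:", simulate(20, 200))
	fmt.Println("fast request timed out:", simulate(200, 5))
}
```

The sketch only shows why a late timer cannot hit the next request. It does nothing about a thread parked in a blocking syscall, which is the part the reply says a timer alone cannot solve.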
What

Adds an experimental `worker_timeout` worker option: a hard per-request timeout for worker mode, the worker-mode equivalent of PHP-FPM's `request_terminate_timeout`. When a worker request runs longer than the timeout, FrankenPHP aborts it with a fatal ("Worker request timeout of N second(s) exceeded") and the worker script restarts cleanly to serve the next request. No userland code is required.

Configurable per worker via the Caddyfile `worker_timeout` directive, and via the Go API: `WithWorkerTimeout(30 * time.Second)`. Defaults to `0` (disabled).

Why this is more than `max_execution_time`

`max_execution_time` does not count time spent inside a blocking call, so a worker stuck on a slow `SELECT SLEEP(30)`, a hung Redis/Elasticsearch/HTTP read, or a black-holed `connect()` holds its thread until the call returns on its own. Worse, a signal/`EINTR` alone cannot abort such a call: PHP retries `EINTR`, and mysqlnd even removes its socket from `EG(regular_list)`, so it isn't reachable via PHP's resource list. (Verified: even PHP's own `max_execution_time` can't stop a `SELECT SLEEP(30)`.)

How it works
A `time.AfterFunc` watchdog is armed per request (epoch-guarded, cancelled on finish). On fire it:

- Sets `EG(vm_interrupt)` (reusing the existing force-kill slot, with no new signal path), so a custom `zend_interrupt_function` raises the fatal at the next opcode boundary.
- On Linux, inspects what the thread is parked in via `/proc/self/task/<tid>/syscall` and shuts down the socket(s) involved so a retried blocking read fails terminally. Only sockets are aborted this way (a read blocked on a file or pipe is left alone):
  - `read`/`recvfrom`/`recvmsg`/`connect` → the fd is the syscall's first argument;
  - `poll`/`ppoll` → the `struct pollfd` array is read from the process's own address space with `process_vm_readv(2)` (PHP's stream layer, and the Redis/HTTP/DB clients built on it, always polls before reading). Both syscalls are matched: glibc/musl implement `poll()` via the dedicated `poll` syscall on arches that have one (amd64, 386, arm) and via `ppoll` only where they don't (arm64, riscv64, loong64);
  - `epoll_wait`/`epoll_pwait` → watched fds are enumerated from `/proc/self/fdinfo/<epfd>` (covers own-loop clients like `curl_multi`, gRPC).
- Wakes a long sleep (e.g. `sleep()`) via the realtime kill signal.

Safety: every fd is confirmed to be a socket before shutdown, and after recovering a pointer/table-derived fd the thread's syscall is re-read to confirm it is still parked there on the same argument, so a stale pointer or a reused fd cannot close an unrelated descriptor. The `/proc` and `process_vm_readv` reads are same-process, read-only, need no `ptrace` privilege, and fail closed under a restrictive seccomp policy.

Platform support / limits
- […] `sleep()` and CPU overruns via the realtime signal; the fd-shutdown is Linux-only.
- […] `select`-based event loops (rare on Linux, where `poll` is preferred) and tight CPU loops inside a C extension that swallow `EINTR`.

Tests

- `TestWorkerTimeout_*` (interrupts slow request, interrupts a blocking socket read, does-not-fire-on-fast, disabled, pool-does-not-cross-signals), all under `-race`.
- […] `process_vm_readv` round-trip, socket-vs-file classification, epoll `fdinfo` enumeration.
- […] `worker_timeout`.

Manual verification
Verified end-to-end on linux/arm64 against MariaDB 11.8 (PDO/mysqlnd), with `worker_timeout` set. Requests, in order:

- `SELECT SLEEP(0)`: 200, ~5 ms
- `SELECT SLEEP(30)`: Worker request timeout of 2 second(s) exceeded
- `SELECT SLEEP(0)`: 200, ~12 ms; worker reconnected and recovered

Docs added in `docs/worker.md` (and `docs/config.md`).
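As an illustration of the "fd is the syscall's first argument" case under How it works, the following minimal Go sketch parses one line of `/proc/self/task/<tid>/syscall` and extracts the fd. It is a sketch under stated assumptions, not the PR's code: `parseSyscallLine`, `blockedFD`, and the `fdFirstArg` table are hypothetical names, the syscall numbers are the amd64 ones only, and the `poll`/`ppoll` and `epoll` cases are deliberately left out.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// fdFirstArg lists amd64 syscall numbers whose first argument is the fd.
// Other architectures use different numbers; this table is illustrative.
var fdFirstArg = map[int64]string{
	0:  "read",
	42: "connect",
	45: "recvfrom",
	47: "recvmsg",
}

// parseSyscallLine parses a line such as "45 0x9 0x7ffc... ...". It returns
// the syscall number and first argument. ok is false for "running", for
// "-1 ..." (blocked outside a syscall), and for malformed input.
func parseSyscallLine(line string) (int64, uint64, bool) {
	fields := strings.Fields(line)
	if len(fields) < 2 {
		return 0, 0, false
	}
	nr, err := strconv.ParseInt(fields[0], 10, 64) // number is decimal
	if err != nil || nr < 0 {
		return 0, 0, false
	}
	arg0, err := strconv.ParseUint(fields[1], 0, 64) // base 0 accepts "0x"
	if err != nil {
		return 0, 0, false
	}
	return nr, arg0, true
}

// blockedFD returns the fd a thread is parked on, if the line names one of
// the syscalls in fdFirstArg.
func blockedFD(line string) (int, bool) {
	nr, arg0, ok := parseSyscallLine(line)
	if !ok {
		return 0, false
	}
	if _, known := fdFirstArg[nr]; !known {
		return 0, false
	}
	return int(arg0), true
}

func main() {
	// Made-up sample line: recvfrom (45) on fd 9.
	fd, ok := blockedFD("45 0x9 0x7ffc00001000 0x2000 0x0 0x0 0x0 0x7ffc00000f00 0x7f1200003456")
	fmt.Println(fd, ok) // prints 9 true
	_, ok = blockedFD("running")
	fmt.Println(ok) // prints false
}
```

Per the Safety paragraph, a real implementation would additionally confirm that the fd is a socket before shutting it down; the sketch stops at recovering the number.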