feat(worker): add per-request worker_timeout (hard request timeout) by mansurs · Pull Request #2476 · php/frankenphp

mansurs · 2026-06-10T11:43:56Z

What

Adds an experimental `worker_timeout` worker option: a hard per-request timeout for worker mode, equivalent to PHP-FPM's `request_terminate_timeout`. When a worker request runs longer than the timeout, FrankenPHP aborts it with a fatal error:

Worker request timeout of N second(s) exceeded

and the worker script restarts cleanly to serve the next request. No userland code is required.

Configurable per worker:

```caddyfile
frankenphp {
    worker {
        file /path/to/worker.php
        worker_timeout 30s
    }
}
```

It can also be set via the Go API with `WithWorkerTimeout(30 * time.Second)`. Defaults to `0` (disabled).

Why this is more than `max_execution_time`

max_execution_time does not count time spent inside a blocking call — so a worker stuck on a slow SELECT SLEEP(30), a hung Redis/Elasticsearch/HTTP read, or a black-holed connect() holds its thread until the call returns on its own. Worse, a signal/EINTR alone cannot abort such a call: PHP retries EINTR, and mysqlnd even removes its socket from EG(regular_list), so it isn't reachable via PHP's resource list. (Verified: even PHP's own max_execution_time can't stop a SELECT SLEEP(30).)

How it works

A `time.AfterFunc` watchdog is armed per request (epoch-guarded, cancelled on finish). When it fires, it:

- Sets a per-thread pending flag plus `EG(vm_interrupt)` (reusing the existing force-kill slot, so there is no new signal path), so that a custom `zend_interrupt_function` raises the fatal at the next opcode boundary.
- On Linux, inspects what the thread is parked in via `/proc/self/task/<tid>/syscall` and shuts down the socket(s) involved, so that a retried blocking read fails terminally. Only sockets are aborted this way (a read blocked on a file or pipe is left alone):
  - `read` / `recvfrom` / `recvmsg` / `connect`: the fd is the syscall's first argument;
  - `poll` / `ppoll`: the `struct pollfd` array is read from the process's own address space with `process_vm_readv(2)` (PHP's stream layer, and the Redis/HTTP/DB clients built on it, always polls before reading). Both syscalls are matched: glibc/musl implement `poll()` via the dedicated `poll` syscall on arches that have one (amd64, 386, arm) and via `ppoll` only where they don't (arm64, riscv64, loong64);
  - `epoll_wait` / `epoll_pwait`: watched fds are enumerated from `/proc/self/fdinfo/<epfd>` (covers own-loop clients like `curl_multi` and gRPC).
- Wakes EINTR-abortable waits (such as a long `sleep()`) via the realtime kill signal.

Safety: every fd is confirmed to be a socket before shutdown, and after recovering a pointer/table-derived fd the thread's syscall is re-read to confirm it is still parked there on the same argument — so a stale pointer or a reused fd cannot close an unrelated descriptor. The /proc and process_vm_readv reads are same-process, read-only, need no ptrace privilege, and fail closed under a restrictive seccomp policy.

Platform support / limits

- Linux: full support, including aborting an in-flight blocking socket read (the DB/Redis/HTTP case).
- FreeBSD: `sleep()` and CPU overruns via the realtime signal; the fd shutdown is Linux-only.
- macOS / Windows: only the VM-interrupt flag is set. CPU-bound overruns are caught at the next opcode boundary, but a blocking syscall already in progress cannot be unblocked.
- Not covered: `select`-based event loops (rare on Linux, where `poll` is preferred) and tight CPU loops inside a C extension that swallow EINTR.

Tests

- `TestWorkerTimeout_*` (interrupts slow request, interrupts a blocking socket read, does-not-fire-on-fast, disabled, pool-does-not-cross-signals), all run under `-race`.
- Unit tests for the Linux building blocks: `process_vm_readv` round-trip, socket-vs-file classification, epoll fdinfo enumeration.
- Caddyfile parsing tests for `worker_timeout`.
- No regressions in the existing worker / force-kill suites.

Manual verification

Verified end-to-end on linux/arm64 against MariaDB 11.8 (PDO/mysqlnd):

| Query | `worker_timeout` | Result |
|---|---|---|
| `SELECT SLEEP(0)` | 2s | `200`, ~5 ms |
| `SELECT SLEEP(30)` | 2s | aborts at 2.008 s: `Worker request timeout of 2 second(s) exceeded` |
| `SELECT SLEEP(0)` | 2s | `200`, ~12 ms (worker reconnected and recovered) |

Docs added in docs/worker.md (and docs/config.md).

Add an experimental `worker_timeout` worker option: a hard per-request timeout for worker mode, the equivalent of PHP-FPM's request_terminate_timeout. When a worker request runs longer than the timeout it is aborted with a "Worker request timeout of N second(s) exceeded" fatal and the worker restarts cleanly for the next request.

Unlike max_execution_time, this also covers time spent blocked in an external call. A signal/EINTR alone cannot abort such a call (PHP retries EINTR, and mysqlnd even drops its socket from EG(regular_list)), so on Linux the watchdog inspects what the thread is parked in via /proc/self/task/<tid>/syscall and shuts down the socket(s) involved:

- read/recvfrom/recvmsg/connect: fd is the syscall's first argument;
- poll/ppoll: the pollfd array is read from the process's own memory with process_vm_readv(2) (PHP's stream layer, and Redis/HTTP/DB clients on it, always poll before reading). Both syscalls are matched: glibc and musl implement poll() via the dedicated poll syscall on arches that have one (e.g. amd64) and via ppoll only elsewhere (e.g. arm64);
- epoll_wait/epoll_pwait: watched fds are enumerated from /proc/self/fdinfo/<epfd> (covers curl_multi, gRPC).

Every fd is confirmed to be a socket, and after recovering a pointer/table-derived fd the thread's syscall is re-read to confirm it is still parked there before shutdown, so a stale pointer or reused fd cannot close an unrelated descriptor. The watchdog body runs under the same mutex as its cancellation, so a watchdog racing request completion can never interrupt the wrong request.

A long sleep() is woken by the realtime kill signal (Linux/FreeBSD). The fatal is raised at the next opcode via a custom zend_interrupt_function (guarded against double installation across embedded Init/Shutdown cycles). On macOS/Windows only the VM-interrupt flag is set (CPU-bound overruns are caught; a blocking syscall already in progress cannot be unblocked).
Configurable per worker via the Caddyfile `worker_timeout` directive and the WithWorkerTimeout API; defaults to 0 (disabled).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

AlliBalliBaba · 2026-06-11T22:03:27Z

Just skimming over this quickly: you should probably use Go timers, otherwise this ends up being way too complex.

mansurs · 2026-06-12T07:58:53Z

Thanks for skimming through! It actually does use a Go timer already: the whole thing is driven by a time.AfterFunc that gets armed per request and cancelled on finish. The C and /proc stuff isn't there to detect the timeout, it's there for the hard part: actually getting the thread back.

A timer alone just can't unblock a thread sitting in a blocking syscall. Cgo calls aren't preemptible and PHP happily retries EINTR, so signals don't help either. The best a pure Go version could do is send the client a 504 and walk away while the thread stays stuck, potentially forever on a black-holed connection. That's the exact pool exhaustion this option is meant to prevent. And once the blocking call finally returns, the script would keep running blind and cause side effects long after the client got its error.

AlliBalliBaba · 2026-06-12T21:44:40Z

Oh, so all the platform-specific code is just for interrupting a syscall. I was under the impression that the kill signal here would be enough:

`frankenphp/frankenphp.c`, line 188 in 3f56208:

```c
void frankenphp_force_kill_thread(force_kill_slot slot) {
```

I haven't tried it yet with something like `SELECT SLEEP(30)`, though.

There's also a relevant PR in php-src on this topic.

mansurs force-pushed the feat/worker-request-terminate-timeout branch 4 times, most recently from 475c287 to a3b27ef (June 11, 2026 13:01)

mansurs force-pushed the feat/worker-request-terminate-timeout branch from a3b27ef to 69dba08 (June 11, 2026 13:25)

mansurs wants to merge 1 commit into php:main from mansurs:feat/worker-request-terminate-timeout

2 participants
