process: resolve wait_for_exit Future when child was reaped by another caller#3669
Open
HrachShah wants to merge 3 commits into
Open
process: resolve wait_for_exit Future when child was reaped by another caller#3669HrachShah wants to merge 3 commits into
HrachShah wants to merge 3 commits into
Conversation
added 3 commits
July 2, 2026 13:50
…t_timeout message Issue tornadoweb#3522 reports that when `connect_timeout` fires during the connect phase, the raised `HTTPTimeoutError` always carries the generic "Timeout while connecting" message. In practice the connect phase has two distinct steps — DNS resolution and TCP/TLS handshake — and a slow DNS server is a much more actionable failure mode than "the host didn't accept the connection in time". Users who hit `connect_timeout` had no way to tell the two apart from the message alone. This change makes `_HTTPConnection.run` pre-resolve the hostname through the TCP client's resolver (so `OverrideResolver` / `hostname_mapping` are honoured) and stores a `_connecting_phase` on the connection that tracks which step is in flight. The connect timeout callback is replaced with a thin `_on_connect_timeout` that reads the phase and forwards either "while resolving" or "while connecting" to the existing `_on_timeout` path. When the host is already a numeric IP literal (`is_valid_ip` is true), the resolver is skipped entirely so the timeout message still correctly says "connecting" and the original behaviour for IP literals is preserved. Happy-Eyeballs parallel address-family fallback inside `TCPClient.connect` is not reimplemented here; that fallback only matters on broken networks where DNS is unlikely to be the slow leg, so the trade-off is acceptable for the timeout-reporting use case. New test `test_connect_timeout_reports_resolving_phase` uses a hanging resolver and a non-IP hostname (`hostname.invalid`) to exercise the resolving-phase branch. The existing `test_connect_timeout` test continues to pass because `is_valid_ip` short-circuits the new code path for the IP-literal test URL. Verified with `pytest tornado/test/simple_httpclient_test.py`: 115 passed, 31 skipped. Without the fix, the new test fails with `'resolving' not found in 'Timeout while connecting'`.
…other caller os.waitpid(WNOHANG) raises ChildProcessError when the target pid has already been reaped. The previous handler dropped the exit callback on that branch, so a user who called Subprocess.proc.wait() or .communicate() before our SIGCHLD handler could pick the child up got a wait_for_exit() Future that never resolved (issue tornadoweb#3364). Fall back to the returncode that Popen already observed in that branch: the new _set_returncode_known helper mirrors _set_returncode but skips the WIFEXITED/WEXITSTATUS decoding (we never saw the status word — the kernel discarded it) and instead records the return code that Popen learned from .wait()/.communicate(). The pid is also removed from _waiting so the periodic cleanup pass won't try to waitpid it again. _try_cleanup_process keeps the existing return-and-stop behaviour for the case where the returncode is also not yet known (the child really hasn't exited yet from our perspective, so the next SIGCHLD will pick it up).
…eaping Issue tornadoweb#3364 reproduction. Call Subprocess.proc.wait() to reap the child
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #3364.
Subprocess.wait_for_exitreturns a Future that the SIGCHLD handler is supposed to resolve when the child exits. The cleanup path_try_cleanup_processcallsos.waitpid(pid, os.WNOHANG)to harvest the status; on the normal path the handler then schedules_set_returncode, which runs the exit callback and the Future resolves.If the caller has already reaped the child themselves (e.g. by calling
Subprocess.proc.wait()or.communicate()before our SIGCHLD handler can pick the child up),os.waitpidraisesChildProcessError: [Errno 10] No child processes. The previous code dropped the exit callback in that branch and returned, so the Future never resolved andwait_for_exit()blocked forever.This change makes the
ChildProcessErrorbranch fall back to the return code thatsubprocess.Popenalready observed via its own.wait()/.communicate(): the new helper_set_returncode_knownmirrors_set_returncodebut skips theWIFEXITED/WEXITSTATUSdecoding (we never saw the status word — the kernel discarded it when the caller reaped the child). If the Popen object has areturncode, we remove the pid from_waitingand schedule the callback sowait_for_exit()resolves. If the Popen object also has no returncode yet (the child really hasn't exited from our perspective), we keep the existing return-and-wait behavior and the next SIGCHLD will pick the child up.Verified:
PYTHONPATH=. python3 -m pytest tornado/test/process_test.py -v→ 10 passed (9 pre-existing + 1 newtest_wait_for_exit_after_proc_wait).TimeoutError: Operation timed out after 5 secondsfrom the 5-second test harness) and passes with the fix.Subprocess.proc.wait()followed bywait_for_exit(raise_error=False)returns the expected exit code instead of hanging.tornado/test/runtests.pyfailure thatpytestreports is a pre-existing meta-test for the test runner that requires being run viapython -m tornado.test.runtests, not viapytest.PRBODY