Skip to content

simple_httpclient: distinguish DNS-resolve from TCP-connect in connect_timeout#3668

Open
HrachShah wants to merge 1 commit into
tornadoweb:masterfrom
HrachShah:fix/connect-timeout-distinguish-dns
Open

simple_httpclient: distinguish DNS-resolve from TCP-connect in connect_timeout#3668
HrachShah wants to merge 1 commit into
tornadoweb:masterfrom
HrachShah:fix/connect-timeout-distinguish-dns

Conversation

@HrachShah

Copy link
Copy Markdown

Fixes #3522

When connect_timeout fires during the connect phase, the raised
HTTPTimeoutError always carries the generic "Timeout while
connecting" message. In practice the connect phase has two distinct
steps — DNS resolution and TCP/TLS handshake — and a slow DNS
server is a much more actionable failure mode than "the host didn't
accept the connection in time". Users hitting connect_timeout had
no way to tell the two apart from the message alone.

This change makes _HTTPConnection.run pre-resolve the hostname
through the TCP client's resolver (so OverrideResolver /
hostname_mapping are honoured) and stores a _connecting_phase on
the connection that tracks which step is in flight. The connect
timeout callback is replaced with a thin _on_connect_timeout that
reads the phase and forwards either "while resolving" or
"while connecting" to the existing _on_timeout path.

When the host is already a numeric IP literal (is_valid_ip is
true), the resolver is skipped entirely so the timeout message
still correctly says "connecting" and the original behaviour for IP
literals is preserved. Happy-Eyeballs parallel address-family
fallback inside TCPClient.connect is not reimplemented here; that
fallback only matters on broken networks where DNS is unlikely to
be the slow leg, so the trade-off is acceptable for the
timeout-reporting use case.

New test test_connect_timeout_reports_resolving_phase uses a
hanging resolver and a non-IP hostname (hostname.invalid) to
exercise the resolving-phase branch. The existing
test_connect_timeout test continues to pass because is_valid_ip
short-circuits the new code path for the IP-literal test URL.

Verified with pytest tornado/test/simple_httpclient_test.py:
115 passed, 31 skipped. Without the fix, the new test fails
with 'resolving' not found in 'Timeout while connecting'.

…t_timeout message

Issue tornadoweb#3522 reports that when `connect_timeout` fires during the
connect phase, the raised `HTTPTimeoutError` always carries the
generic "Timeout while connecting" message.  In practice the connect
phase has two distinct steps — DNS resolution and TCP/TLS handshake —
and a slow DNS server is a much more actionable failure mode than
"the host didn't accept the connection in time".  Users who hit
`connect_timeout` had no way to tell the two apart from the message
alone.

This change makes `_HTTPConnection.run` pre-resolve the hostname
through the TCP client's resolver (so `OverrideResolver` /
`hostname_mapping` are honoured) and stores a `_connecting_phase`
on the connection that tracks which step is in flight.  The connect
timeout callback is replaced with a thin `_on_connect_timeout` that
reads the phase and forwards either "while resolving" or
"while connecting" to the existing `_on_timeout` path.

When the host is already a numeric IP literal (`is_valid_ip` is
true), the resolver is skipped entirely so the timeout message still
correctly says "connecting" and the original behaviour for IP
literals is preserved.  Happy-Eyeballs parallel address-family
fallback inside `TCPClient.connect` is not reimplemented here;
that fallback only matters on broken networks where DNS is unlikely
to be the slow leg, so the trade-off is acceptable for the
timeout-reporting use case.

New test `test_connect_timeout_reports_resolving_phase` uses a
hanging resolver and a non-IP hostname (`hostname.invalid`) to
exercise the resolving-phase branch.  The existing
`test_connect_timeout` test continues to pass because `is_valid_ip`
short-circuits the new code path for the IP-literal test URL.
Verified with `pytest tornado/test/simple_httpclient_test.py`:
115 passed, 31 skipped.  Without the fix, the new test fails
with `'resolving' not found in 'Timeout while connecting'`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

simple_httpclient: Improve error reporting on connect timeout

1 participant