skip Content-Length size check when Content-Encoding indicates compression#1883
skip Content-Length size check when Content-Encoding indicates compression#1883HrachShah wants to merge 1 commit into
Conversation
…ssion When a server returns Content-Encoding: gzip despite an Accept-Encoding: identity request, requests still auto-decompresses the body in iter_content. Per RFC 9110 §8.6 the Content-Length header reflects the *encoded* payload size, but Downloader.chunk_downloaded counts the *decoded* bytes yielded by requests. The Downloader.interrupted check then compared a compressed size to a decompressed byte count and erroneously flagged every compressed download as incomplete. Resolve this by treating any non-identity Content-Encoding the same as a missing Content-Length: drop total_size so the interrupted check short-circuits and the progress display falls back to a spinner. Tests cover gzip, deflate, and br encodings plus a tolerance for casing and surrounding whitespace in the header value.
| # RFC 9110 §8.6. Comparing those two numbers would always mark the | ||
| # download as incomplete, so skip the size tracking in that case. | ||
| # See <https://github.com/httpie/cli/issues/423>. | ||
| content_encoding = ( |
There was a problem hiding this comment.
The fix assumes that whenever Content-Encoding is non-identity, requests has decompressed the body in iter_content(). That's true for gzip/deflate (built into urllib3), but br (Brotli) only gets decoded if the optional brotli/brotlicffi package is installed. If it isn't, requests passes the body through unmodified, Content-Length does match the byte count, and this change now silently disables truncation detection for a case where the old (broken) comparison would've actually been correct. Could you check whether decompression actually happened — e.g. by comparing len(response.raw._fp.read()) behavior, or checking response.raw.headers after the fact, rather than inferring it purely from the request header? At minimum, worth a code comment noting the assumption and which encodings it holds for in this codebase's supported environment
| except (KeyError, ValueError, TypeError): | ||
| total_size = None | ||
| else: | ||
| total_size = None |
There was a problem hiding this comment.
Suggestion: The tests confirm a complete compressed download is no longer flagged as incomplete. Is there a test (or manual check) confirming a genuinely truncated compressed download still gets caught somehow, e.g. via a decompression error bubbling up from requests/urllib3 mid-stream — now that size tracking is fully disabled for these responses? Worth one test for that case so we know truncation detection isn't silently lost altogether for compressed responses.
When the server returns
Content-Encoding: gzipeven though the request asks forAccept-Encoding: identity, the underlyingrequestssession still transparently decompresses the body insideiter_content(). Per RFC 9110 §8.6 theContent-Lengthheader still describes the encoded payload size, butDownloader.chunk_downloaded()accumulates the decoded bytes produced byrequests.Downloader.interruptedthen compared the encodedtotal_sizeto the decoded byte count and always flagged the download as incomplete, even when every byte was received successfully.This shows up in practice as the spurious
http: error: Incomplete download: size=5084527; downloaded=42846965message in #1642, and the existing# FIXME: some servers still might sent Content-Encoding: gzipinhttpie/downloads.pywas the only acknowledgement of the gap.The fix is to skip the size tracking whenever the response carries a non-identity
Content-Encoding.total_sizeis set toNone(the same fallback used for a missingContent-Length), which already letsinterruptedshort-circuit and lets the progress reporter fall back to an indeterminate spinner — matching the behaviour ofcurlandwgetfor compressed downloads.Changes:
httpie/downloads.py: readContent-Encoding, normalise it (strip + lower), and treat anything other thanidentityas "size unknown" for the duration of the download.tests/test_downloads.py: new regression tests coveringgzip,deflate,br, mixed-case header values, and surrounding whitespace.Fixes #1642 (and the underlying gap noted by the existing
# FIXMEcomment that referenced issue #423).