skip Content-Length size check when Content-Encoding indicates compression#1883

Open
HrachShah wants to merge 1 commit into
httpie:masterfrom
HrachShah:fix/downloads-content-encoding-skip-size-check

@HrachShah

When the server returns Content-Encoding: gzip even though the request asks for Accept-Encoding: identity, the underlying requests session still transparently decompresses the body inside iter_content(). Per RFC 9110 §8.6 the Content-Length header still describes the encoded payload size, but Downloader.chunk_downloaded() accumulates the decoded bytes produced by requests. Downloader.interrupted then compared the encoded total_size to the decoded byte count and always flagged the download as incomplete, even when every byte was received successfully.
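The mismatch described above can be reproduced with the standard library alone. This is a minimal sketch, not HTTPie's code: the payload is made up, and `looks_interrupted` is a hypothetical, simplified stand-in for the size comparison that `Downloader.interrupted` performs.

```python
import gzip

# Hypothetical payload standing in for the file being downloaded.
body = b"httpie download test line\n" * 4096
encoded = gzip.compress(body)

# What the server advertises in Content-Length: the encoded size.
content_length = len(encoded)

# What the client counts when the HTTP library decodes transparently.
downloaded = len(gzip.decompress(encoded))


def looks_interrupted(total_size, downloaded_bytes):
    """Simplified stand-in for the size check (an assumption, not HTTPie's code)."""
    return total_size is not None and downloaded_bytes != total_size


print(content_length, downloaded, looks_interrupted(content_length, downloaded))
```

Because the two numbers measure different things, the comparison reports an interruption even though every encoded byte arrived.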

This shows up in practice as the spurious http: error: Incomplete download: size=5084527; downloaded=42846965 message in #1642, and the existing # FIXME: some servers still might sent Content-Encoding: gzip in httpie/downloads.py was the only acknowledgement of the gap.

The fix is to skip the size tracking whenever the response carries a non-identity Content-Encoding. total_size is set to None (the same fallback used for a missing Content-Length), which already lets interrupted short-circuit and lets the progress reporter fall back to an indeterminate spinner — matching the behaviour of curl and wget for compressed downloads.
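A rough sketch of that decision, assuming a plain mapping of header names to values. The helper name `expected_size` is hypothetical and this is not the actual diff; note also that `requests` exposes headers through a case-insensitive mapping, whereas the plain `dict` used here is case-sensitive.

```python
from typing import Mapping, Optional


def expected_size(headers: Mapping[str, str]) -> Optional[int]:
    """Return the size to track, or None when it should be treated as unknown."""
    # Normalise the header value: tolerate casing and surrounding whitespace.
    encoding = (headers.get("Content-Encoding") or "").strip().lower()
    if encoding and encoding != "identity":
        # Content-Length describes the encoded payload, so it is not
        # comparable with a count of decoded bytes.
        return None
    try:
        return int(headers["Content-Length"])
    except (KeyError, ValueError, TypeError):
        # Missing or malformed Content-Length: size unknown.
        return None
```

For example, `expected_size({"Content-Length": "10", "Content-Encoding": " GZip "})` returns `None`, while the same headers with `identity` (or no `Content-Encoding` at all) return `10`.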

Changes:

  • httpie/downloads.py: read Content-Encoding, normalise it (strip + lower), and treat anything other than identity as "size unknown" for the duration of the download.
  • tests/test_downloads.py: new regression tests covering gzip, deflate, br, mixed-case header values, and surrounding whitespace.

Fixes #1642 (and the underlying gap noted by the existing # FIXME comment that referenced issue #423).

…ssion

When a server returns Content-Encoding: gzip despite an Accept-Encoding:
identity request, requests still auto-decompresses the body in
iter_content. Per RFC 9110 §8.6 the Content-Length header reflects the
*encoded* payload size, but Downloader.chunk_downloaded counts the
*decoded* bytes yielded by requests. The Downloader.interrupted check
then compared a compressed size to a decompressed byte count and
erroneously flagged every compressed download as incomplete.

Resolve this by treating any non-identity Content-Encoding the same as a
missing Content-Length: drop total_size so the interrupted check
short-circuits and the progress display falls back to a spinner.
Tests cover gzip, deflate, and br encodings plus a tolerance for casing
and surrounding whitespace in the header value.

@AhmadAL-Quraan AhmadAL-Quraan left a comment

Suggestions

Comment thread httpie/downloads.py
# RFC 9110 §8.6. Comparing those two numbers would always mark the
# download as incomplete, so skip the size tracking in that case.
# See <https://github.com/httpie/cli/issues/423>.
content_encoding = (

The fix assumes that whenever Content-Encoding is non-identity, requests has decompressed the body in iter_content(). That's true for gzip/deflate (built into urllib3), but br (Brotli) is only decoded if the optional brotli/brotlicffi package is installed. If it isn't, requests passes the body through unmodified, so Content-Length does match the byte count, and this change silently disables truncation detection for a case where the old (broken) comparison would actually have been correct. Could you check whether decompression actually happened (e.g. by looking at how response.raw._fp.read() behaves, or by checking response.raw.headers after the fact) rather than inferring it purely from the Content-Encoding response header? At minimum, it's worth a code comment noting the assumption and which encodings it holds for in this codebase's supported environments.
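One way to make the assumption explicit is sketched below. Both helper names are hypothetical, and the check only looks for an importable `brotli` or `brotlicffi` module, which may not match urllib3's own detection logic exactly.

```python
import importlib.util


def brotli_decoder_available() -> bool:
    """True if either optional Brotli binding can be imported."""
    return any(
        importlib.util.find_spec(name) is not None
        for name in ("brotlicffi", "brotli")
    )


def encodings_assumed_decoded() -> set:
    """Encodings assumed to be transparently decoded in this environment."""
    # gzip and deflate are taken as always decoded, per the comment above.
    decoded = {"gzip", "deflate"}
    if brotli_decoder_available():
        decoded.add("br")
    return decoded
```

With a helper like this, size tracking would only need to be skipped for encodings in the returned set, and a Brotli body that passes through undecoded would keep its Content-Length check.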

Comment thread httpie/downloads.py
    except (KeyError, ValueError, TypeError):
        total_size = None
else:
    total_size = None

Suggestion: The tests confirm a complete compressed download is no longer flagged as incomplete. Is there a test (or manual check) confirming a genuinely truncated compressed download still gets caught somehow, e.g. via a decompression error bubbling up from requests/urllib3 mid-stream — now that size tracking is fully disabled for these responses? Worth one test for that case so we know truncation detection isn't silently lost altogether for compressed responses.
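As a starting point for such a test, the standard library can already tell a complete gzip stream from a truncated one. This is a sketch under assumptions: `gzip_stream_complete` is a hypothetical helper, and whether requests/urllib3 surfaces a truncated stream as an error during `iter_content()` is exactly the open question, which this example does not answer.

```python
import gzip
import zlib


def gzip_stream_complete(chunks) -> bool:
    """Feed encoded chunks to a gzip decoder and report whether the stream ended cleanly."""
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    for chunk in chunks:
        decoder.decompress(chunk)
    # eof only becomes True once the end-of-stream marker has been consumed.
    return decoder.eof


payload = gzip.compress(b"x" * 10000)
print(gzip_stream_complete([payload]))                      # complete stream
print(gzip_stream_complete([payload[: len(payload) // 2]])) # truncated stream
```

A regression test could serve the truncated half as the response body and assert that the download is reported as failed by whatever mechanism ends up replacing the size check.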

Development

Successfully merging this pull request may close these issues.

🐛 Bug Report: http --download misinterprets Content-Length when Content-Encoding: gzip is set