GzipDecompressor: decompress concatenated gzip members instead of dropping them #3654

Open
HrachShah wants to merge 2 commits into tornadoweb:master from HrachShah:fix/gzip-concat-members
@HrachShah

Issue #3560. tornado.util.GzipDecompressor was silently dropping every gzip member after the first one. zlib.decompressobj signals end-of-member through its eof attribute and exposes the leftover bytes (which belong to the next member) through unused_data; the previous implementation never inspected either, so a response body made of several concatenated gzip members was truncated to the output of the first member.
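To make the failure mode concrete, here is a minimal sketch that uses only the standard library's zlib and gzip modules (not tornado itself). The payloads and variable names are made up for illustration; the point is only to show where the second member ends up when nothing inspects eof or unused_data.

```python
import gzip
import zlib

# Two independent gzip members back to back, as a server that compresses
# its response piece by piece might emit them.
first = gzip.compress(b"hello ")
second = gzip.compress(b"world")
body = first + second

# wbits = 16 + MAX_WBITS tells zlib to expect gzip framing.
inner = zlib.decompressobj(16 + zlib.MAX_WBITS)
out = inner.decompress(body)

# Only the first member was decoded. The decompressor has stopped at the
# first trailer, and the whole second member is parked in unused_data.
print(out)                          # b'hello '
print(inner.eof)                    # True
print(inner.unused_data == second)  # True
```

A caller that returns out and discards inner at this point loses b"world" without any error being raised.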

decompress now keeps a small _pending buffer, swaps in a fresh inner decompressor when the current member ends, and feeds it any bytes that arrived past the previous member's trailer. When called with no max_length cap, the inner loop drains across all consecutive members transparently; when called with a cap, the trailing bytes of the current call are staged in _pending for the next call so the caller can still drive the stream via unconsumed_tail. flush walks every remaining member and concatenates their output into the returned bytes. unconsumed_tail is the concatenation of _pending and the inner object's unconsumed_tail.
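The uncapped path described above can be sketched roughly as follows. This is a hypothetical stand-in written against plain zlib, not the code in this PR: the class name MultiMemberGunzip is invented, and it deliberately omits max_length, _pending, unconsumed_tail and flush to keep the member-switching loop visible.

```python
import gzip
import zlib


class MultiMemberGunzip:
    """Illustrative only: decodes back-to-back gzip members, no output cap."""

    def __init__(self) -> None:
        self._inner = zlib.decompressobj(16 + zlib.MAX_WBITS)

    def decompress(self, data: bytes) -> bytes:
        out = []
        while data:
            out.append(self._inner.decompress(data))
            if not self._inner.eof:
                # The current member is still open, so all input was
                # consumed; wait for the next call to supply more.
                break
            # The member ended. Bytes past its trailer belong to the next
            # member, so hand them to a fresh decompressor.
            data = self._inner.unused_data
            self._inner = zlib.decompressobj(16 + zlib.MAX_WBITS)
        return b"".join(out)


members = [gzip.compress(p) for p in (b"alpha ", b"beta ", b"gamma")]
blob = b"".join(members)

# All members in one call.
whole = MultiMemberGunzip().decompress(blob)

# The same bytes fed in 7-byte slices, so member boundaries fall mid-call.
streaming = MultiMemberGunzip()
pieces = b"".join(
    streaming.decompress(blob[i:i + 7]) for i in range(0, len(blob), 7)
)
```

Both whole and pieces should equal b"alpha beta gamma". In this sketch, bytes after the last member that are not a valid gzip header would raise zlib.error on the fresh decompressor; how the PR treats such trailing data is not stated here.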

Added GzipDecompressorTest in tornado/test/util_test.py covering: a single member (sanity), three members concatenated in one decompress call, and four members split across many decompress calls (the buffering path).

Zo Bot added 2 commits June 26, 2026 15:38
…pping them

Issue tornadoweb#3560. The previous implementation called
zlib.decompressobj.decompress once per chunk and returned whatever the
inner decompressor produced. When the wrapped HTTP response contains
two or more gzip members in sequence (some servers split a gzipped
response across members for obfuscation or to defeat naive transport
encoders), the inner decompressor reached EOF on the first member and
declared itself complete. Any bytes past the first trailer sat in the
inner unused_data and were never read, so the second member and every
member after it were silently dropped on the floor.

Rework GzipDecompressor so callers that pass max_length=0 (the default,
and the only mode used by HTTP1Connection) transparently advance across
consecutive members: after each member finishes, any leftover bytes in
the inner decompressobj's unused_data are folded back into the input
and a fresh zlib.decompressobj is wired up for the next member. The
public unconsumed_tail property continues to report what a caller
should re-feed on the next call (now backed by an explicit _pending
buffer combined with the inner tail). When max_length is non-zero the
single-call behaviour callers depend on is preserved and unused_data
is staged in _pending for the next call. flush advances across all
remaining members as well.
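For readers less familiar with the capped mode, the following sketch shows the plain zlib contract that a non-zero max_length relies on: each call returns at most max_length bytes, and input that could not be processed yet is exposed as unconsumed_tail for the caller to feed back. It uses only the standard library with a single gzip member; the payload and the 100-byte cap are arbitrary choices for illustration, and it does not reproduce the PR's _pending handling.

```python
import gzip
import zlib

payload = bytes(range(256)) * 8
compressed = gzip.compress(payload)

inner = zlib.decompressobj(16 + zlib.MAX_WBITS)

# First call: cap the output at 100 bytes.
chunks = [inner.decompress(compressed, 100)]

# Keep re-feeding whatever zlib reports as not yet consumed until the
# member's trailer has been reached.
while not inner.eof:
    chunks.append(inner.decompress(inner.unconsumed_tail, 100))

restored = b"".join(chunks)
```

Every element of chunks is at most 100 bytes long, and restored equals payload. With several members, the bytes after the first trailer would show up in unused_data rather than unconsumed_tail, which is why the description above stages them separately for the next call.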

Added GzipDecompressorTest covering a single member (baseline), two
concatenated members fed in one decompress call, and four members
split across alternating decompress calls to exercise the
between-call buffering path.