GzipDecompressor: decompress concatenated gzip members instead of dropping them by HrachShah · Pull Request #3654 · tornadoweb/tornado

HrachShah · 2026-06-30T03:49:49Z

Issue #3560. tornado.util.GzipDecompressor was silently dropping every gzip member after the first one. zlib.decompressobj reports end-of-member via eof and the leftover bytes (which belong to the next member) via unused_data; the previous implementation never inspected either, so a response body made of several concatenated gzip members was truncated after the first member.
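The failure mode can be reproduced with the standard library alone. The snippet below is a minimal illustration, not code from this PR. It feeds two concatenated gzip members to a single zlib.decompressobj configured for the gzip container (wbits = 16 + zlib.MAX_WBITS). The object stops at the first trailer and parks the second member in unused_data. By contrast, gzip.decompress handles multi-member input and recovers both.

```python
import gzip
import zlib

# Two independent gzip members written back to back.
first_member = gzip.compress(b"hello, ")
second_member = gzip.compress(b"world")
body = first_member + second_member

# 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
inner = zlib.decompressobj(16 + zlib.MAX_WBITS)
output = inner.decompress(body)

print(output)                              # b'hello, '
print(inner.eof)                           # True: the first member's trailer was read
print(inner.unused_data == second_member)  # True: the rest was never decompressed

# The standard library's one-shot helper does walk every member.
print(gzip.decompress(body))               # b'hello, world'
```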

decompress now keeps a small _pending buffer, swaps in a fresh inner decompressor when the current member ends, and feeds it any bytes that arrived past the previous member's trailer. When called with no max_length cap, the inner loop drains across all consecutive members transparently. When called with a cap, the trailing bytes of the current call are staged in _pending for the next call, so the caller can still drive the stream via unconsumed_tail. flush walks every remaining member and concatenates their output into the returned bytes. unconsumed_tail returns _pending concatenated with the inner object's unconsumed_tail.
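Below is a minimal sketch of the member-chaining idea. It is not the code in this PR, and the class name MultiMemberGzipDecompressor is hypothetical. It also differs from the PR in one respect: instead of an internal _pending buffer, it hands leftover bytes straight back through unconsumed_tail. That choice assumes the caller re-feeds unconsumed_tail on its next call in capped mode. The sketch further assumes that every byte after a trailer starts another gzip member; trailing garbage would make the fresh decompressobj raise zlib.error.

```python
import zlib


class MultiMemberGzipDecompressor:
    """Hypothetical sketch: chains zlib decompressobjs across gzip members.

    Assumes the caller re-feeds ``unconsumed_tail`` after a call with a
    non-zero ``max_length``, and that every byte after a member's trailer
    begins another gzip member.
    """

    def __init__(self) -> None:
        self._inner = self._new_member()
        self._tail = b""

    @staticmethod
    def _new_member():
        # 16 + MAX_WBITS: expect a gzip header and trailer.
        return zlib.decompressobj(16 + zlib.MAX_WBITS)

    @property
    def unconsumed_tail(self) -> bytes:
        # Input bytes the caller still has to pass back in.
        return self._tail

    def decompress(self, value: bytes, max_length: int = 0) -> bytes:
        out = []
        produced = 0
        data = value
        while True:
            # zlib treats max_length=0 as "no limit".
            budget = max_length - produced if max_length else 0
            chunk = self._inner.decompress(data, budget)
            out.append(chunk)
            produced += len(chunk)
            if not self._inner.eof:
                # Input exhausted (uncapped) or cap reached; in the capped
                # case zlib keeps the remaining input in unconsumed_tail.
                self._tail = self._inner.unconsumed_tail
                break
            # Member finished: bytes past its trailer begin the next member.
            data = self._inner.unused_data
            self._inner = self._new_member()
            if not data:
                self._tail = b""
                break
            if max_length and produced >= max_length:
                # Budget spent: hand the next member's bytes back to the caller.
                self._tail = data
                break
        return b"".join(out)

    def flush(self) -> bytes:
        # Decompress anything the caller never re-fed, then drain the
        # member currently in progress.
        out = self.decompress(self._tail) if self._tail else b""
        return out + self._inner.flush()
```

For example, passing the concatenation of gzip.compress(b"alpha ") and gzip.compress(b"beta") to one uncapped decompress call returns b"alpha beta". Returning the leftover through unconsumed_tail keeps a single source of truth for "bytes not yet consumed". The trade-off is that correctness depends on the caller actually re-feeding them.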

Added GzipDecompressorTest in tornado/test/util_test.py covering: a single member (sanity), three members concatenated in one decompress call, and four members split across many decompress calls (the buffering path).

…pping them

Issue tornadoweb#3560.

The previous implementation called zlib.decompressobj.decompress once per chunk and returned whatever the inner decompressor produced. When the wrapped HTTP response contains two or more gzip members in sequence (some servers split a gzipped response across members for obfuscation or to defeat naive transport encoders), the inner decompressor reached EOF on the first member and declared itself complete. Any bytes past the first trailer sat in the inner unused_data and were never read, so the second member and every member after it were silently dropped on the floor.

Rework GzipDecompressor so callers that pass max_length=0 (the default, and the only mode used by HTTP1Connection) transparently advance across consecutive members: after each member finishes, any leftover bytes in the inner decompressobj's unused_data are folded back into the input and a fresh zlib.decompressobj is wired up for the next member. The public unconsumed_tail property continues to report what a caller should re-feed on the next call (now backed by an explicit _pending buffer combined with the inner tail). When max_length is non-zero, the single-call behaviour callers depend on is preserved and unused_data is staged in _pending for the next call. flush advances across all remaining members as well.

Added GzipDecompressorTest covering a single member (baseline), two concatenated members fed in one decompress call, and four members split across alternating decompress calls to exercise the between-call buffering path.

Zo Bot added 2 commits June 26, 2026 15:38

HTTPHeaders: raise TypeError on non-string keys in set/get/delitem

289f318

HrachShah wants to merge 2 commits into tornadoweb:master from HrachShah:fix/gzip-concat-members

HrachShah commented Jun 30, 2026
