Unsupported content encoding to be treated as an error #698

Merged
ok2c merged 1 commit into apache:master from ok2c:unsupported_content_encoding_as_error on Aug 5, 2025

Conversation

@ok2c

@ok2c ok2c commented Aug 5, 2025

Member

Treat unsupported content encoding as an error. @arturobernalg Please double-check.

@ok2c ok2c requested a review from arturobernalg August 5, 2025 09:02

@arturobernalg arturobernalg left a comment

Member

Makes sense. LGTM @ok2c

@ok2c ok2c merged commit 97e4041 into apache:master Aug 5, 2025
10 checks passed
@ok2c ok2c deleted the unsupported_content_encoding_as_error branch October 26, 2025 17:31
@plesingr-aeb

plesingr-aeb commented Jan 12, 2026

Hi, I have a problem with this change.
Using HttpClients.custom() with some adaptations (useSystemProperties, setTargetAuthenticationStrategy, setRoutePlanner(new SystemDefaultRoutePlanner(proxySelector))), I have a simple request like

GET something
Accept: text/plain

without specifying any Accept-Encoding header. The client appears to fill in some default codecs in this header before the request is sent (it did so in the previous versions, too).
The response from the server is

Content-Encoding: UTF-8
Content-Type: text/plain
foo bar

It worked with the previous versions but now the request fails with Unsupported Content-Encoding: utf-8. Am I doing something wrong?

@ok2c

ok2c commented Jan 12, 2026

Member Author

> It worked with the previous versions but now the request fails with Unsupported Content-Encoding: utf-8. Am I doing something wrong?

@plesingr-aeb You are doing nothing wrong, but the origin server is. UTF-8 is a charset, not a content encoding. The previous versions of HttpClient ignored such violations. As of 5.6 HttpClient supports many different content encodings and the handling logic got stricter.
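
To illustrate the distinction between a charset and a content coding, here is a minimal, standard-library-only sketch. It is not HttpClient code: the class name `CharsetVersusCoding`, its helpers, and the UTF-8 fallback are assumptions made for illustration. The content coding (gzip) is undone on the bytes first; the charset, which belongs in the `Content-Type` parameter, is only used afterwards to turn bytes into characters.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, for illustration only.
final class CharsetVersusCoding {

    /** Content coding: a transformation of the bytes on the wire (gzip here). */
    static byte[] gzip(byte[] plain) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reverses the gzip content coding. */
    static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Charset: how the decoded bytes map to characters; read from Content-Type. */
    static Charset charsetOf(String contentType) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                return Charset.forName(p.substring(8).replace("\"", "").trim());
            }
        }
        return StandardCharsets.UTF_8; // fallback chosen for this sketch only
    }

    public static void main(String[] args) {
        byte[] wire = gzip("foo bar".getBytes(StandardCharsets.UTF_8));
        // Step 1: undo the content coding. Step 2: apply the charset.
        byte[] decoded = gunzip(wire);
        System.out.println(new String(decoded, charsetOf("text/plain; charset=UTF-8")));
    }
}
```

A server that wants to announce UTF-8 text would therefore send `Content-Type: text/plain; charset=UTF-8` and no `Content-Encoding` header at all.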

@plesingr-aeb

plesingr-aeb commented Jan 12, 2026

@ok2c Thanks for the explanation (in the meantime I also understood the issue and found a solution). But what should I do if I have such a legacy issue and can neither fix it nor use a previous client version (because of some internal update policies)?

@ok2c

ok2c commented Jan 12, 2026

Member Author

@plesingr-aeb One option is to turn off automatic content decompression. A better solution would be to introduce a custom exec interceptor that drops that silly header before the response gets processed by the standard execution pipeline.
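
The header-dropping step could look roughly like the sketch below. It deliberately uses only the standard library and models headers as a plain map, so it shows the logic rather than the HttpClient wiring; the class name `BogusEncodingHeaderDropper` and the `BOGUS` deny-list are hypothetical. In a real client this logic would sit inside the custom exec interceptor, ahead of the standard decompression step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, for illustration only; not part of HttpClient.
final class BogusEncodingHeaderDropper {

    // Assumption: values this particular application knows to be harmless no-ops.
    static final Set<String> BOGUS = Set.of("utf-8", "none");

    /** Returns a copy of the headers without bogus Content-Encoding values. */
    static Map<String, List<String>> dropBogusContentEncoding(Map<String, List<String>> headers) {
        // Header names are case-insensitive, so use a case-insensitive map.
        Map<String, List<String>> result = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        headers.forEach((name, values) -> result.put(name, new ArrayList<>(values)));

        List<String> encodings = result.get("Content-Encoding");
        if (encodings != null) {
            encodings.removeIf(v -> BOGUS.contains(v.trim().toLowerCase(Locale.ROOT)));
            if (encodings.isEmpty()) {
                // Nothing real is left, so drop the header entirely.
                result.remove("Content-Encoding");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> response = Map.of(
                "Content-Encoding", List.of("UTF-8"),
                "Content-Type", List.of("text/plain"));
        System.out.println(dropBogusContentEncoding(response));
    }
}
```

Real codings such as gzip are left untouched, so the standard pipeline can still decompress them.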

@plesingr-aeb

@ok2c Sounds good, thanks.

@strangelookingnerd

strangelookingnerd commented Feb 4, 2026

Contributor

> The previous versions of HttpClient ignored such violations. As of 5.6 HttpClient supports many different content encodings and the handling logic got stricter.

We also noticed this change in behavior and I wanted to clarify the reasoning behind it.
The client is now no longer able to ignore "invalid values" it receives. And since the client has no control over that, it just breaks? Is there any harm in not throwing an exception here and gracefully ignoring the value as before?

@ok2c

ok2c commented Feb 4, 2026

Member Author

@strangelookingnerd You see, the server is not supposed to apply any content encoding other than those the client explicitly requests with the Accept-Encoding header. If the server does so, it is in direct violation of the protocol. The client may not be able to correctly decode content encoded with an encoding it does not understand. As of version 5.6 HttpClient rejects such response messages as invalid because they basically are.
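
The rule described above can be sketched as a simple subset check. This is an illustration, not HttpClient's actual implementation: the class `ContentCodingCheck` and its methods are hypothetical, and quality weights such as `;q=0.8` are ignored for brevity.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, for illustration only.
final class ContentCodingCheck {

    /** Splits a comma-separated header value into lower-cased coding tokens. */
    static List<String> tokens(String headerValue) {
        List<String> result = new ArrayList<>();
        if (headerValue == null) {
            return result;
        }
        for (String item : headerValue.split(",")) {
            // Drop parameters such as ";q=0.8"; weights are ignored in this sketch.
            String token = item.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    /** Rejects a response whose codings were not offered in the request. */
    static void requireOffered(String acceptEncoding, String contentEncoding) {
        Set<String> offered = new HashSet<>(tokens(acceptEncoding));
        for (String coding : tokens(contentEncoding)) {
            if (!offered.contains(coding)) {
                throw new IllegalStateException("Unsupported Content-Encoding: " + coding);
            }
        }
    }

    public static void main(String[] args) {
        requireOffered("gzip, deflate;q=0.8", "gzip"); // passes silently
        try {
            requireOffered("gzip, deflate", "UTF-8");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this check, both `Content-Encoding: UTF-8` and `Content-Encoding: none` are rejected, because neither token was offered by the client.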

If, for whatever reason, your application needs to be able to accept such responses, you have two options. You can disable automatic content decompression and implement your own content processing logic. Alternatively, if you are sure the unexpected codecs are harmless, you can implement an execution interceptor that rewrites the Content-Encoding header and removes those codecs before the response reaches the decompression interceptor.
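
For the first option, application-side content processing might look like the following sketch, assuming automatic decompression has been switched off on the client. It relies only on `java.util.zip`; the class `ManualContentDecoder` and the `harmless` parameter are hypothetical names. Codings are undone in reverse order because the header lists them in the order they were applied.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Set;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper, for illustration only.
final class ManualContentDecoder {

    /** Opens a decoding stream for one coding, or fails for unknown ones. */
    private static InputStream open(String coding, byte[] data) throws IOException {
        switch (coding) {
            case "gzip":
            case "x-gzip":
                return new GZIPInputStream(new ByteArrayInputStream(data));
            case "deflate":
                return new InflaterInputStream(new ByteArrayInputStream(data));
            default:
                throw new IllegalStateException("Unsupported Content-Encoding: " + coding);
        }
    }

    /**
     * Decodes the body according to the Content-Encoding value. Tokens listed in
     * 'harmless' (an application-specific assumption, e.g. "none") are skipped.
     */
    static byte[] decode(byte[] body, String contentEncoding, Set<String> harmless) {
        if (contentEncoding == null || contentEncoding.trim().isEmpty()) {
            return body;
        }
        String[] codings = contentEncoding.split(",");
        byte[] current = body;
        try {
            // Undo the codings last-applied first.
            for (int i = codings.length - 1; i >= 0; i--) {
                String coding = codings[i].trim().toLowerCase(Locale.ROOT);
                if (coding.isEmpty() || harmless.contains(coding)) {
                    continue;
                }
                try (InputStream in = open(coding, current)) {
                    current = in.readAllBytes();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return current;
    }

    /** Test helper: produces gzip-coded bytes. */
    static byte[] gzip(byte[] plain) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

Passing an empty `harmless` set keeps the strict behavior; passing `Set.of("none")` accepts the one token the application has decided to tolerate while still failing on anything else it cannot decode.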

@strangelookingnerd

Contributor

I understand the reasoning but still wanted to point out that there might be corner cases where this new behavior makes things unnecessarily harder for clients.
In https://issues.jenkins.io/browse/JENKINS-76353 it is described that a client sending a request with no Accept-Encoding receives a response with Content-Encoding: none. While I'd agree that this is superfluous, it is not really wrong either.

@ok2c

ok2c commented Feb 4, 2026

Member Author

@strangelookingnerd Based on my interpretation of the spec it is actually very wrong, unless there is a statement in the specification I have overlooked. Has none been defined as a no-op codec anywhere in RFC 9110 [1]?

[1] https://www.rfc-editor.org/rfc/rfc9110.html

@strangelookingnerd

Contributor

There is no definition of no-op, yet there is no other definition either.

> The "Content-Encoding" header field indicates what content codings have been applied to the representation, beyond those inherent in the media type, and thus what decoding mechanisms have to be applied in order to obtain data in the media type referenced by the Content-Type header field.

One could argue that by the above logic none is as much a valid answer to "what encodings have been applied?" as gzip. At least there is some room for interpretation.

But again, I understand your reasoning. Would it be an option to keep the current behavior as default but make it again a configuration/option to ignore invalid values as it was before this change?

@ok2c

ok2c commented Feb 5, 2026

Member Author

> Would it be an option to keep the current behavior as default but make it again a configuration/option to ignore invalid values as it was before this change?

@strangelookingnerd If HttpClient ignores invalid codec values indiscriminately, it risks generating a garbage content stream. In your specific context the none codec may be harmless, but in others it may not be.
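
A small standard-library demonstration of that risk, as a sketch only (the class `SkippedDecodingDemo` is hypothetical): if a coding is ignored rather than undone, the application reads the still-encoded bytes as if they were the content.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, for illustration only.
final class SkippedDecodingDemo {

    /** Produces gzip-coded bytes, standing in for a coded response body. */
    static byte[] gzip(byte[] plain) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Checks for the two-byte gzip magic number 0x1f 0x8b. */
    static boolean looksLikeGzip(byte[] body) {
        return body.length >= 2 && (body[0] & 0xff) == 0x1f && (body[1] & 0xff) == 0x8b;
    }

    public static void main(String[] args) {
        byte[] wire = gzip("foo bar".getBytes(StandardCharsets.UTF_8));
        // Treating the coded bytes as text skips the decoding step entirely.
        String asText = new String(wire, StandardCharsets.UTF_8);
        System.out.println("matches original: " + asText.equals("foo bar"));
        System.out.println("still gzip on the wire: " + looksLikeGzip(wire));
    }
}
```

A client cannot tell from an unknown token alone whether it names a no-op or a real transformation, which is why silently ignoring it is unsafe in the general case.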

Just drop Content-Encoding: none headers with an interceptor or, better yet, ask the server-side developers to stop sending them in the first place.
