Summary
In UriModifyingContentModifier.modifyContent(byte[], MediaType), the incoming bytes are decoded using the charset declared on the Content-Type header, but the modified result is written back with String.getBytes() and no
charset argument, which falls back to the JVM's default file.encoding. When those two charsets differ, non-ASCII bytes in the body can end up corrupted in the recorded output.
The recently merged fix in #1034 (for #1033) addressed a very similar decode/encode mismatch in MockMvcRequestConverter, so I wanted to flag that the same shape of issue may also be present here.
Affected code
spring-restdocs-core/src/main/java/org/springframework/restdocs/operation/preprocess/UriModifyingOperationPreprocessor.java, lines 194–205:
@Override
public byte[] modifyContent(byte[] content, @Nullable MediaType contentType) {
	String input;
	if (contentType != null && contentType.getCharset() != null) {
		input = new String(content, contentType.getCharset()); // decode: uses declared charset
	}
	else {
		input = new String(content);
	}
	return modify(input).getBytes(); // encode: uses JVM default charset
}
A similar content modifier in the same package, PatternReplacingContentModifier, happens to handle this by using a single charset for both directions, which may be useful as a reference:
Charset charset = (contentType != null && contentType.getCharset() != null) ? contentType.getCharset()
		: this.fallbackCharset;
String original = new String(content, charset);
...
return builder.toString().getBytes(charset);
Scenario — non-ASCII content with a non-default charset
The issue shows up when a request or response declares a Content-Type charset that differs from the JVM's file.encoding and the body contains non-ASCII characters. Two cases that come to mind:
- A legacy service serving text/...; charset=ISO-8859-1, documented from a JVM running with file.encoding=UTF-8 (the default since JEP 400 / Java 18).
- A service serving application/json; charset=UTF-8, documented from a JVM whose default differs (e.g. file.encoding=MS949 on a Korean Windows machine, or Cp1252 on a Western Windows machine).
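To make the divergence concrete, here is a small standalone sketch (not part of REST Docs) showing that the same string produces different bytes under the two charsets involved in the first case above, which is exactly the gap that an asymmetric decode/encode pair falls into:

```java
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

public class EncodingDivergence {

	public static void main(String[] args) {
		String s = "café";
		// ISO-8859-1 encodes 'é' as the single byte 0xE9
		System.out.println(HexFormat.of().formatHex(s.getBytes(StandardCharsets.ISO_8859_1))); // 636166e9
		// UTF-8 encodes 'é' as the two bytes 0xC3 0xA9
		System.out.println(HexFormat.of().formatHex(s.getBytes(StandardCharsets.UTF_8))); // 636166c3a9
	}

}
```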
Here is a small test that reproduces the request side using ISO-8859-1, which makes the asymmetry observable regardless of the host platform:
@Test
void requestContentWithNonAsciiCharactersIsPreservedWhenCharsetIsIso88591() {
	this.preprocessor.scheme("https");
	String original = "café http://localhost:12345 done";
	HttpHeaders headers = new HttpHeaders();
	headers.setContentType(MediaType.parseMediaType("text/plain;charset=ISO-8859-1"));
	OperationRequest request = this.requestFactory.create(URI.create("http://localhost"), HttpMethod.GET,
			original.getBytes(StandardCharsets.ISO_8859_1), headers,
			Collections.<OperationRequestPart>emptyList());
	OperationRequest processed = this.preprocessor.preprocess(request);
	String result = new String(processed.getContent(), StandardCharsets.ISO_8859_1);
	assertThat(result).isEqualTo("café https://localhost:12345 done");
}
I tried this locally against main (commit 631b6c22). The Gradle test JVM runs with -Dfile.encoding=UTF-8, and the assertion fails as follows:
expected: "café https://localhost:12345 done"
 but was: "cafÃ© https://localhost:12345 done"
For what it's worth, the mechanics seem to line up with the code:
- The byte 0xE9 is decoded as ISO-8859-1, and the Java String ends up with the character é as expected.
- modify(...) returns the modified String with é intact.
- .getBytes() (no charset) then encodes é using the JVM default (UTF-8 in this environment), producing 0xC3 0xA9 (two bytes).
- Reading those bytes back as ISO-8859-1 yields Ã© instead of é, so the body emitted by the preprocessor no longer matches the encoding declared in the Content-Type header.
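The byte-level round trip described in these steps can be reproduced in isolation, independent of REST Docs, with a few lines of plain Java:

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

	public static void main(String[] args) {
		byte[] body = { (byte) 0xE9 }; // 'é' encoded as ISO-8859-1
		// Decode with the declared charset: the String holds 'é'
		String decoded = new String(body, StandardCharsets.ISO_8859_1);
		// Re-encode with a different (here: the default) charset: 0xC3 0xA9
		byte[] reencoded = decoded.getBytes(StandardCharsets.UTF_8);
		// A consumer honouring the declared ISO-8859-1 charset now sees mojibake
		System.out.println(new String(reencoded, StandardCharsets.ISO_8859_1)); // Ã©
	}

}
```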
Since UriModifyingContentModifier is used for both requests and responses, the same thing seems to happen on the response side too.
Possible fix
One option would be to use the same charset for both decoding and encoding, so that the bytes that come out continue to match the encoding declared in the Content-Type header. For example:
@Override
public byte[] modifyContent(byte[] content, @Nullable MediaType contentType) {
	Charset charset = (contentType != null && contentType.getCharset() != null) ? contentType.getCharset()
			: Charset.defaultCharset();
	return modify(new String(content, charset)).getBytes(charset);
}
I'm not sure what the preferred fallback would be when no charset is declared — Charset.defaultCharset() would preserve the current behaviour in that branch, but a fixed StandardCharsets.UTF_8 might be more in line with
how REST Docs handles encoding elsewhere. Happy to go with whichever you'd prefer.
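As a sanity check of the symmetric approach, here is a standalone sketch (with a hypothetical stand-in for modify(...) that only rewrites the scheme) showing that using one charset for both directions round-trips an ISO-8859-1 body unchanged:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SymmetricCharsetDemo {

	// Hypothetical stand-in for modify(...): rewrites the scheme, leaves other characters alone
	static String modify(String input) {
		return input.replace("http://", "https://");
	}

	public static void main(String[] args) {
		Charset charset = StandardCharsets.ISO_8859_1; // charset declared in Content-Type
		byte[] content = "café http://localhost:12345 done".getBytes(charset);
		// Same charset for decode and encode, mirroring the proposed fix
		byte[] result = modify(new String(content, charset)).getBytes(charset);
		System.out.println(new String(result, charset)); // café https://localhost:12345 done
	}

}
```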
Environment
- Spring REST Docs: main branch, commit 631b6c22
- Java 17, Gradle test JVM running with -Dfile.encoding=UTF-8
- Reproduced locally via two new tests in UriModifyingOperationPreprocessorTests covering the request and response paths; both fail against the current code at UriModifyingOperationPreprocessor.java:204.
This may well be the same root cause as #1033, just in a different class. If the analysis looks reasonable, I'd be glad to put together a PR with a fix and regression tests for both the request and response paths — but happy to wait for your thoughts first.