UriModifyingContentModifier.modifyContent decodes with the request charset but re-encodes with the platform default #1039

@config25

Description

Summary

In UriModifyingContentModifier.modifyContent(byte[], MediaType), the incoming bytes are decoded using the charset declared in the Content-Type header, but the modified result is written back with the no-argument String.getBytes(), which falls back to the JVM's default charset (file.encoding). When those two charsets differ, non-ASCII characters in the body end up corrupted in the recorded output.

The recently merged fix in #1034 (for #1033) addressed a very similar decode/encode mismatch in MockMvcRequestConverter, so I wanted to flag that the same shape of issue may also be present here.

Affected code

spring-restdocs-core/src/main/java/org/springframework/restdocs/operation/preprocess/UriModifyingOperationPreprocessor.java, lines 194–205:

@Override
public byte[] modifyContent(byte[] content, @Nullable MediaType contentType) {
    String input;
    if (contentType != null && contentType.getCharset() != null) {
        input = new String(content, contentType.getCharset());   // decode: uses declared charset
    }
    else {
        input = new String(content);
    }

    return modify(input).getBytes();   // encode: uses JVM default charset
}

A similar content modifier in the same package, PatternReplacingContentModifier, happens to handle this by using a single charset for both directions, which may be useful as a reference:

Charset charset = (contentType != null && contentType.getCharset() != null) ? contentType.getCharset()
        : this.fallbackCharset;
String original = new String(content, charset);
...
return builder.toString().getBytes(charset);

Scenario — non-ASCII content with a non-default charset

The issue shows up when a request or response declares a Content-Type charset that differs from the JVM's file.encoding and the body contains non-ASCII characters. Two cases that come to mind:

  • A legacy service serving text/...; charset=ISO-8859-1, documented from a JVM running with file.encoding=UTF-8 (the default since JEP 400 / Java 18).
  • A service serving application/json; charset=UTF-8, documented from a JVM whose default differs (e.g. file.encoding=MS949 on a Korean Windows machine, or Cp1252 on a Western Windows machine).
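
If it helps while triaging, the default that the no-argument String.getBytes() falls back to can be checked directly on the documenting JVM (plain Java, nothing REST Docs specific):

// The charset used by String.getBytes() when no argument is given:
System.out.println(java.nio.charset.Charset.defaultCharset()); // e.g. UTF-8 on Java 18+ (JEP 400)
System.out.println(System.getProperty("file.encoding"));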

Here is a small test that reproduces the request side using ISO-8859-1, which makes the asymmetry observable regardless of the host platform:

@Test
void requestContentWithNonAsciiCharactersIsPreservedWhenCharsetIsIso88591() {
    this.preprocessor.scheme("https");
    String original = "café http://localhost:12345 done";
    HttpHeaders headers = new HttpHeaders();
    headers.setContentType(MediaType.parseMediaType("text/plain;charset=ISO-8859-1"));
    OperationRequest request = this.requestFactory.create(URI.create("http://localhost"), HttpMethod.GET,
            original.getBytes(StandardCharsets.ISO_8859_1), headers,
            Collections.<OperationRequestPart>emptyList());
    OperationRequest processed = this.preprocessor.preprocess(request);
    String result = new String(processed.getContent(), StandardCharsets.ISO_8859_1);
    assertThat(result).isEqualTo("café https://localhost:12345 done");
}

I tried this locally against main (commit 631b6c22). The Gradle test JVM runs with -Dfile.encoding=UTF-8, and the assertion fails as follows:

expected: "café https://localhost:12345 done"
 but was: "café https://localhost:12345 done"

For what it's worth, the mechanics seem to line up with the code:

  1. The byte 0xE9 is decoded as ISO-8859-1, and the Java String ends up with the character é as expected.
  2. modify(...) returns the modified String with é intact.
  3. .getBytes() (no charset) then encodes é using the JVM default — UTF-8 in this environment — producing the two bytes 0xC3 0xA9.
  4. Reading those bytes back as ISO-8859-1 yields Ã© instead of é, so the body emitted by the preprocessor no longer matches the encoding declared in the Content-Type header (the standalone sketch below walks through the same round trip).
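
To sanity-check these steps outside REST Docs, here is a standalone sketch of the same round trip in plain Java. UTF-8 is pinned explicitly here so the result does not depend on the host's file.encoding:

import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {

    public static void main(String[] args) {
        byte[] body = { (byte) 0xE9 };                                   // 'é' in ISO-8859-1

        // Step 1: decode with the declared charset.
        String decoded = new String(body, StandardCharsets.ISO_8859_1);
        System.out.println(decoded);                                     // é

        // Step 3: encode with UTF-8, standing in for the JVM default.
        byte[] reencoded = decoded.getBytes(StandardCharsets.UTF_8);
        System.out.printf("%02X %02X%n", reencoded[0], reencoded[1]);    // C3 A9

        // Step 4: read those bytes back with the declared charset.
        System.out.println(new String(reencoded, StandardCharsets.ISO_8859_1)); // Ã©
    }

}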

Since UriModifyingContentModifier is used for both requests and responses, the same thing seems to happen on the response side too.
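
For completeness, here is a sketch of the response-side counterpart of the test above. It assumes an OperationResponseFactory field named responseFactory, mirroring the requestFactory fixture in the request-side test (the field name is illustrative):

@Test
void responseContentWithNonAsciiCharactersIsPreservedWhenCharsetIsIso88591() {
    this.preprocessor.scheme("https");
    String original = "café http://localhost:12345 done";
    HttpHeaders headers = new HttpHeaders();
    headers.setContentType(MediaType.parseMediaType("text/plain;charset=ISO-8859-1"));
    // responseFactory is assumed to be an OperationResponseFactory, analogous
    // to the requestFactory used in the request-side test.
    OperationResponse response = this.responseFactory.create(HttpStatus.OK, headers,
            original.getBytes(StandardCharsets.ISO_8859_1));
    OperationResponse processed = this.preprocessor.preprocess(response);
    String result = new String(processed.getContent(), StandardCharsets.ISO_8859_1);
    assertThat(result).isEqualTo("café https://localhost:12345 done");
}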

Possible fix

One option would be to use the same charset for both decoding and encoding, so that the bytes that come out continue to match the encoding declared in the Content-Type header. For example:

@Override
public byte[] modifyContent(byte[] content, @Nullable MediaType contentType) {
    Charset charset = (contentType != null && contentType.getCharset() != null) ? contentType.getCharset()
            : Charset.defaultCharset();
    return modify(new String(content, charset)).getBytes(charset);
}

I'm not sure what the preferred fallback would be when no charset is declared — Charset.defaultCharset() would preserve the current behaviour in that branch, but a fixed StandardCharsets.UTF_8 might be more in line with
how REST Docs handles encoding elsewhere. Happy to go with whichever you'd prefer.
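
For reference, the UTF-8 variant would differ only in the fallback expression (illustrative; whichever default is preferred):

Charset charset = (contentType != null && contentType.getCharset() != null) ? contentType.getCharset()
        : StandardCharsets.UTF_8;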

Environment

  • Spring REST Docs: main branch, commit 631b6c22
  • Java 17, Gradle test JVM running with -Dfile.encoding=UTF-8
  • Reproduced locally via two new tests in UriModifyingOperationPreprocessorTests covering the request and response paths; both fail against the current code at UriModifyingOperationPreprocessor.java:204.

This may well be the same root cause as #1033, just in a different class. If the analysis looks reasonable, I'd be glad to put together a PR with a fix and regression tests for both the request and response paths — but happy to wait for your thoughts first.

