AVRO-4258: [java] remove some allocations from decoder.getBytes() and Utf8 handling #3776
AVRO-4258: [java] remove some allocations from decoder.getBytes() and Utf8 handling #3776mkeskells wants to merge 9 commits into
Conversation
|
Hi @opwvhk |
|
I've taken the liberty to run spotless. |
|
Hi @opwvhk |
|
I have merged main |
Now colocated with `readBytes(ByteBuffer)` and `skipBytes()` implementations.
…58-getBytes # Conflicts: # lang/java/avro/src/main/java/org/apache/avro/io/BinaryDecoder.java # lang/java/avro/src/main/java/org/apache/avro/io/Decoder.java # lang/java/avro/src/main/java/org/apache/avro/io/ResolvingDecoder.java # lang/java/avro/src/main/java/org/apache/avro/util/Utf8.java
|
Hi @opwvhk |
|
Hi |
There was a problem hiding this comment.
Pull request overview
This PR introduces a new Decoder.readBytes() API that returns a raw byte[] to reduce allocations (avoiding intermediate ByteBuffer creation) and updates several call sites (bytes→string promotion, file header metadata parsing, conversions) to use it.
Changes:
- Add
Decoder.readBytes()and implement/override it in core decoder implementations (BinaryDecoder,DirectBinaryDecoder,JsonDecoder,ResolvingDecoder,ValidatingDecoder). - Switch internal call sites that immediately materialize
byte[](or promote bytes→string/Utf8) to use the newreadBytes()path. - Add small tests to ensure the new
readBytes()method is exercised for EOF/negative length/VM limit cases.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| lang/java/avro/src/main/java/org/apache/avro/io/Decoder.java | Adds new readBytes() API intended to avoid ByteBuffer allocations. |
| lang/java/avro/src/main/java/org/apache/avro/io/BinaryDecoder.java | Implements readBytes() returning an exactly-sized byte array. |
| lang/java/avro/src/main/java/org/apache/avro/io/DirectBinaryDecoder.java | Implements readBytes() for direct (non-buffering) decoding. |
| lang/java/avro/src/main/java/org/apache/avro/io/JsonDecoder.java | Implements readBytes() for JSON decoding. |
| lang/java/avro/src/main/java/org/apache/avro/io/ResolvingDecoder.java | Uses readBytes() for bytes/string resolution and removes some intermediate allocations. |
| lang/java/avro/src/main/java/org/apache/avro/io/ValidatingDecoder.java | Adds validating wrapper implementation of readBytes(). |
| lang/java/avro/src/main/java/org/apache/avro/io/FastReaderBuilder.java | Uses new raw-bytes API for bytes→string promotion and Utf8 reuse. |
| lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectDatumReader.java | Uses raw-bytes API when reader expects byte[]. |
| lang/java/avro/src/main/java/org/apache/avro/file/DataFileStream.java | Reads header metadata values via readBytes() directly into byte[]. |
| lang/java/avro/src/main/java/org/apache/avro/file/DataFileReader12.java | Reads metadata values via readBytes() directly into byte[]. |
| lang/java/avro/src/main/java/org/apache/avro/Conversions.java | Uses readBytes() directly instead of ByteBuffer.array(). |
| lang/java/avro/src/test/java/org/apache/avro/io/TestBinaryDecoder.java | Adds basic coverage for readBytes() EOF/negative length/VM limit. |
| lang/java/avro/src/test/java/org/apache/avro/util/TestUtf8.java | Adds light coverage to ensure the new Utf8.set(byte[]) path is exercised. |
| lang/java/avro/src/main/java/org/apache/avro/util/Utf8.java | Adds Utf8.set(byte[]) for reuse without extra wrapper allocations. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /** | ||
| * Reads a byte-string written by {@link Encoder#writeBytes}. | ||
| * <p> | ||
| * This is useful when you want to avoid the creation of a ByteBuffer, and only | ||
| * want the byte[], e.g.: | ||
| * | ||
| * <pre> | ||
| * ByteBuffer buffer = decoder.readBytes(null); | ||
| * byte[] array = buffer.array(); | ||
| * </pre> | ||
| * | ||
| * @throws AvroTypeException If this is a stateful reader and byte-string is not | ||
| * the type of the next value to be read | ||
| */ | ||
| public abstract byte[] readBytes() throws IOException; |
There was a problem hiding this comment.
I am not aware of the expectations on up/downgrade, and if this is appropriate. Happy to make a change if i is the right thing to do, but it looks to me as if if that is the case the default implementation should be removed on the next opportunity for a braking change, be that major or minor version, based on the policy of the avro project
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Contribution Checklist
What is the purpose of the change
To reduce some allocations found during profiling for other changes (#3746)
Verifying this change
This change is a trivial rework / code cleanup without any test coverage.
There is some trivial addition to the tests but only to ensure the methods are called
Note - I can't verify the change locally as Arvo doesn't build for me locally
Documentation