Skip to content

AVRO-4258: [java] remove some allocations from decoder.getBytes() and Utf8 handling #3776

Open
mkeskells wants to merge 9 commits into
apache:mainfrom
mkeskells:AVRO-4258-getBytes
Open

AVRO-4258: [java] remove some allocations from decoder.getBytes() and Utf8 handling #3776
mkeskells wants to merge 9 commits into
apache:mainfrom
mkeskells:AVRO-4258-getBytes

Conversation

@mkeskells

Copy link
Copy Markdown
Contributor

Contribution Checklist

What is the purpose of the change

To reduce some allocations found during profiling for other changes (#3746)

Verifying this change

This change is a trivial rework / code cleanup without any test coverage.
There is some trivial addition to the tests but only to ensure the methods are called

Note - I can't verify the change locally as Arvo doesn't build for me locally

Documentation

  • Does this pull request introduce a new feature? (no)

@github-actions github-actions Bot added the Java Pull Requests for Java binding label May 12, 2026
@mkeskells

Copy link
Copy Markdown
Contributor Author

Hi @opwvhk
I see spotless, and test errors on the run you started (thanks), but they seem to be on files I have not chnaged.
I am travelling today, so can only look on my phone so cant be sure

@opwvhk

opwvhk commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

I've taken the liberty to run spotless.

@mkeskells

Copy link
Copy Markdown
Contributor Author

Hi @opwvhk
feel free to make any chnages you feel appropriate, I am not precious on my PRs!
I cant run spotless before push, as it corrupts the code to a state where it doesnt compile
I am travelling and on hoiday, so I dont have a dev env with me, so happy to take any input on the 3 PRs I have open

@mkeskells

Copy link
Copy Markdown
Contributor Author

I have merged main

mkeskells and others added 6 commits June 1, 2026 12:00
Now colocated with `readBytes(ByteBuffer)` and `skipBytes()`
implementations.
move the implementation to child classes
reset utf8 string
…58-getBytes

# Conflicts:
#	lang/java/avro/src/main/java/org/apache/avro/io/BinaryDecoder.java
#	lang/java/avro/src/main/java/org/apache/avro/io/Decoder.java
#	lang/java/avro/src/main/java/org/apache/avro/io/ResolvingDecoder.java
#	lang/java/avro/src/main/java/org/apache/avro/util/Utf8.java
@mkeskells

Copy link
Copy Markdown
Contributor Author

Hi @opwvhk
I did some chnages while offline (much the same, push down and utf fixes), and merged with yours

@Fokko Fokko left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@mkeskells

mkeskells commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

Hi
I don't know the process here, is this able to be merged?
Also if someone can advise me on the best way to progress these issues, I can learn and use that knowledge on the other PRs that I have :-)

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new Decoder.readBytes() API that returns a raw byte[] to reduce allocations (avoiding intermediate ByteBuffer creation) and updates several call sites (bytes→string promotion, file header metadata parsing, conversions) to use it.

Changes:

  • Add Decoder.readBytes() and implement/override it in core decoder implementations (BinaryDecoder, DirectBinaryDecoder, JsonDecoder, ResolvingDecoder, ValidatingDecoder).
  • Switch internal call sites that immediately materialize byte[] (or promote bytes→string/Utf8) to use the new readBytes() path.
  • Add small tests to ensure the new readBytes() method is exercised for EOF/negative length/VM limit cases.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
lang/java/avro/src/main/java/org/apache/avro/io/Decoder.java Adds new readBytes() API intended to avoid ByteBuffer allocations.
lang/java/avro/src/main/java/org/apache/avro/io/BinaryDecoder.java Implements readBytes() returning an exactly-sized byte array.
lang/java/avro/src/main/java/org/apache/avro/io/DirectBinaryDecoder.java Implements readBytes() for direct (non-buffering) decoding.
lang/java/avro/src/main/java/org/apache/avro/io/JsonDecoder.java Implements readBytes() for JSON decoding.
lang/java/avro/src/main/java/org/apache/avro/io/ResolvingDecoder.java Uses readBytes() for bytes/string resolution and removes some intermediate allocations.
lang/java/avro/src/main/java/org/apache/avro/io/ValidatingDecoder.java Adds validating wrapper implementation of readBytes().
lang/java/avro/src/main/java/org/apache/avro/io/FastReaderBuilder.java Uses new raw-bytes API for bytes→string promotion and Utf8 reuse.
lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectDatumReader.java Uses raw-bytes API when reader expects byte[].
lang/java/avro/src/main/java/org/apache/avro/file/DataFileStream.java Reads header metadata values via readBytes() directly into byte[].
lang/java/avro/src/main/java/org/apache/avro/file/DataFileReader12.java Reads metadata values via readBytes() directly into byte[].
lang/java/avro/src/main/java/org/apache/avro/Conversions.java Uses readBytes() directly instead of ByteBuffer.array().
lang/java/avro/src/test/java/org/apache/avro/io/TestBinaryDecoder.java Adds basic coverage for readBytes() EOF/negative length/VM limit.
lang/java/avro/src/test/java/org/apache/avro/util/TestUtf8.java Adds light coverage to ensure the new Utf8.set(byte[]) path is exercised.
lang/java/avro/src/main/java/org/apache/avro/util/Utf8.java Adds Utf8.set(byte[]) for reuse without extra wrapper allocations.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +132 to +146
/**
* Reads a byte-string written by {@link Encoder#writeBytes}.
* <p>
* This is useful when you want to avoid the creation of a ByteBuffer, and only
* want the byte[], e.g.:
*
* <pre>
* ByteBuffer buffer = decoder.readBytes(null);
* byte[] array = buffer.array();
* </pre>
*
* @throws AvroTypeException If this is a stateful reader and byte-string is not
* the type of the next value to be read
*/
public abstract byte[] readBytes() throws IOException;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not aware of the expectations on up/downgrade, and if this is appropriate. Happy to make a change if i is the right thing to do, but it looks to me as if if that is the case the default implementation should be removed on the next opportunity for a braking change, be that major or minor version, based on the policy of the avro project

Comment thread lang/java/avro/src/main/java/org/apache/avro/io/Decoder.java Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Java Pull Requests for Java binding

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants