Use bulk writes instead of per-element writes in vector serialization by r-devulap · Pull Request #681 · datastax/jvector

r-devulap · 2026-06-17T04:12:48Z

Replace inefficient element-by-element write loops with bulk write operations in MemorySegmentVectorProvider:

- writeFloatVector: Extract underlying float array and use writeFloats() instead of looping with writeFloat()
- writeByteSequence: Extract underlying byte array and use write() instead of looping with writeByte()
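Here is a minimal sketch of the difference between the two write paths. It uses plain java.io instead of jvector's writer API: `BulkWriteSketch`, `writePerElement`, and `writeBulk` are hypothetical names, and `DataOutputStream` stands in for the writer that exposes `writeFloats()`. The per-element version makes one `writeFloat()` call per component. The bulk version encodes the whole vector once and hands it over in a single `write()`.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical illustration, not jvector's MemorySegmentVectorProvider code.
class BulkWriteSketch {
    // Per-element path: one writeFloat() call (4 bytes) per vector component.
    static void writePerElement(DataOutputStream out, float[] v) throws IOException {
        for (float f : v) {
            out.writeFloat(f);
        }
    }

    // Bulk path: encode the whole array into one byte[] and issue a single write().
    // Big-endian order matches DataOutputStream.writeFloat, so ordinary values encode
    // identically (only non-canonical NaN payloads could differ, because writeFloat
    // canonicalizes NaN).
    static void writeBulk(DataOutputStream out, float[] v) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(v.length * Float.BYTES).order(ByteOrder.BIG_ENDIAN);
        buf.asFloatBuffer().put(v);
        out.write(buf.array(), 0, buf.capacity());
    }

    // Serializes v with either path and returns the resulting bytes.
    static byte[] encode(float[] v, boolean bulk) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (bulk) {
                writeBulk(out, v);
            } else {
                writePerElement(out, v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        float[] v = {1.0f, -2.5f, 3.25f};
        byte[] a = encode(v, false);
        byte[] b = encode(v, true);
        System.out.println(a.length + " " + Arrays.equals(a, b)); // prints "12 true"
    }
}
```

Both paths produce the same bytes. The bulk version replaces N small calls with one, which is roughly the effect this PR relies on.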

With Highway as the SIMD backend (#668), the scalar writes show up as a bottleneck when building an index on an AWS x8i.24xlarge instance (2-socket, 96-core Intel GNR).

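| Dataset        | Index build time before (s) | Index build time after (s) | Speedup (x) |
|----------------|-----------------------------|----------------------------|-------------|
| openai-1536-1m | 105.59                      | 45.22                      | 2.34x       |
| openai-3072-1m | 164.63                      | 102.75                     | 1.60x       |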


github-actions · 2026-06-17T04:13:01Z

Before you submit for review:

- Does your PR follow guidelines from CONTRIBUTIONS.md?
- Did you summarize what this PR does clearly and concisely?
- Did you include performance data for changes which may be performance impacting?
- Did you include useful docs for any user-facing changes or features?
- Did you include useful javadocs for developer oriented changes, explaining new concepts or key changes?
- Did you trigger and review regression testing results against the base branch via Run Bench Main?
- Did you adhere to the code formatting guidelines (TBD)?
- Did you group your changes for easy review, providing meaningful descriptions for each commit?
- Did you ensure that all files contain the correct copyright header?

If you did not complete any of these, then please explain below.

tlwillke

writeFloatVector is fine. writeByteSequence has a bug. You are ignoring slice offsets. MemorySegmentByteSequence supports slicing, but .heapBase().get() returns the original unsliced array. I have suggested a change that fixes this and added minimal testing.
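The same pitfall is easy to reproduce with java.nio.ByteBuffer, whose `array()` on a slice also returns the full backing array. The sketch below is an analogy, not jvector code and not the FFM API; `SliceWriteSketch`, `writeIgnoringOffset`, and `writeSlice` are hypothetical names. A bulk write of a slice has to start at `arrayOffset() + position()` and cover `remaining()` bytes. Starting at index 0 silently writes the wrong bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical illustration of the slice-offset bug, using ByteBuffer instead of MemorySegment.
class SliceWriteSketch {
    // Buggy variant: takes the backing array but ignores where the slice starts in it.
    static byte[] writeIgnoringOffset(ByteBuffer slice) {
        return write(slice.array(), 0, slice.remaining());
    }

    // Correct variant: honors the slice's offset within the backing array.
    static byte[] writeSlice(ByteBuffer slice) {
        return write(slice.array(), slice.arrayOffset() + slice.position(), slice.remaining());
    }

    // Performs one bulk write of src[off, off + len) and returns the bytes written.
    private static byte[] write(byte[] src, int off, int len) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.write(src, off, len);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] backing = {10, 20, 30, 40, 50, 60};
        // A view over backing[2..5): {30, 40, 50}, with arrayOffset() == 2.
        ByteBuffer slice = ByteBuffer.wrap(backing, 2, 3).slice();
        System.out.println(java.util.Arrays.toString(writeIgnoringOffset(slice))); // [10, 20, 30]
        System.out.println(java.util.Arrays.toString(writeSlice(slice)));          // [30, 40, 50]
    }
}
```

The fix that landed in this PR (df2aed3) goes through `.asByteBuffer()` instead of `.heapBase().get()`, so the slice bounds travel with the buffer rather than being reapplied by hand.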

Once this is addressed and the tests pass, it is good to go.

On performance, while I appreciate the end-to-end benchmarking and consider it necessary, I would also like to see a unit-level microbenchmark isolating writeFloatVector and writeByteSequence to quantify the exact bulk-write speedup.


r-devulap · 2026-06-18T04:44:14Z

I had to move MemorySegmentVectorProviderTest to the jvector-native module because it fails to run on Java 20.


r-devulap · 2026-06-18T07:59:38Z

Added microbenchmarks. Let me know if that is what you were looking for. Here is the comparison with the main branch on an x8i.24xlarge (96-core Intel GNR):

| Benchmark        | Length | Mode  | Cnt | Branch Score | Main Score  | Speedup (x) | Units |
|------------------|--------|-------|-----|--------------|-------------|-------------|-------|
| writeByteVector  | 512    | thrpt | 10  | 6029596.287  | 3904220.037 | 1.54x       | ops/s |
| writeByteVector  | 1024   | thrpt | 10  | 3230877.500  | 1924332.336 | 1.68x       | ops/s |
| writeByteVector  | 1536   | thrpt | 10  | 2076349.428  | 1273820.153 | 1.63x       | ops/s |
| writeFloatVector | 512    | thrpt | 10  | 1996524.347  | 1486490.747 | 1.34x       | ops/s |
| writeFloatVector | 1024   | thrpt | 10  | 978881.406   | 520671.373  | 1.88x       | ops/s |
| writeFloatVector | 1536   | thrpt | 10  | 672933.883   | 438508.050  | 1.53x       | ops/s |

tlwillke

Thanks for the fix and adding the microbenchmarks. Looks good.

MarkWolters

Looks fine to me

jshook

Looks good

r-devulap requested review from MarkWolters, ashkrisk, jshook and tlwillke as code owners June 17, 2026 04:12

Adding essential tests.

13bde4a

tlwillke requested changes Jun 17, 2026

Comment thread jvector-native/src/main/java/io/github/jbellis/jvector/vector/MemorySegmentVectorProvider.java

tlwillke assigned r-devulap Jun 17, 2026

tlwillke added the "performance improvement" label ("A contribution that aims to improve library performance, possibly along with functionality.") Jun 17, 2026

Use .asByteBuffer() instead of .heapBase().get() for MemorySegmentByteSequence; heapBase().get() ignores slicing and returns the base of the original ByteArray.

df2aed3

r-devulap added 2 commits June 18, 2026 05:41

Move MemorySegmentVectorProviderTest to jvector-native module

cd46c3a

Add license

3c0057b

r-devulap force-pushed the bulk-writes-memorySegment branch from 4d5f59d to 3c0057b June 18, 2026 05:53

Add microbenchmark to measure MemorySegmentVectorProvider's writeFloatVector and writeByteSequence

29e3ef9

Moved test to the vector package.

c8ee4bb

tlwillke approved these changes Jun 18, 2026

MarkWolters approved these changes Jun 18, 2026

jshook approved these changes Jun 18, 2026

r-devulap wants to merge 7 commits into main from bulk-writes-memorySegment

4 participants
