perf(arrow): avoid per-element allocation in BFloat16Array::from by LuciferYang · Pull Request #7500 · lance-format/lance

LuciferYang · 2026-06-26T09:24:00Z

What

impl From<Vec<bf16>> for BFloat16Array built its byte buffer with:

let bytes = data.iter().flat_map(|val| {
    let bytes = val.to_bits().to_le_bytes();  // [u8; 2] on the stack
    bytes.to_vec()                            // heap allocation per element
});
buffer.extend(bytes);

to_vec() heap-allocates a 2-byte Vec for every element. On the vector-ingestion path (coerce_float_vector → BFloat16Array::from) that is ~2M tiny allocations for a 4096×512 bf16 batch.

Fix

Write each value's little-endian bytes straight into the pre-sized MutableBuffer:

for val in &data {
    buffer.extend_from_slice(&val.to_bits().to_le_bytes());
}

No per-element allocation; the bytes produced are identical.
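
A minimal, std-only sketch of why the two shapes yield the same bytes. It makes two assumptions that are not part of the PR: plain `u16` values stand in for what `bf16::to_bits()` returns, and a `Vec<u8>` stands in for arrow's `MutableBuffer`. The function names are hypothetical.

```rust
// Assumption: each `u16` is a raw bf16 bit pattern (what `to_bits()` would return),
// and `Vec<u8>` plays the role of the pre-sized `MutableBuffer`.

/// Old shape: `to_vec()` heap-allocates a 2-byte `Vec` for every element.
pub fn encode_with_per_element_vec(bits: &[u16]) -> Vec<u8> {
    let mut buffer = Vec::with_capacity(bits.len() * 2);
    buffer.extend(bits.iter().flat_map(|b| b.to_le_bytes().to_vec()));
    buffer
}

/// New shape: the `[u8; 2]` stays on the stack and is copied straight into the buffer.
pub fn encode_in_place(bits: &[u16]) -> Vec<u8> {
    let mut buffer = Vec::with_capacity(bits.len() * 2);
    for b in bits {
        buffer.extend_from_slice(&b.to_le_bytes());
    }
    buffer
}
```

Both functions visit the elements in order and emit the same two little-endian bytes per value, so the only difference is the temporary `Vec` created per element in the first one.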

Test

Behavior-preserving — the existing test_basics already asserts BFloat16Array::from(values) == BFloat16Array::from_iter_values(values) and checks the decoded values, and test_nulls / test_coerce_float_vector_bfloat16 cover the surrounding paths. All pass. cargo clippy -p lance-arrow --tests -- -D warnings and cargo fmt -- --check are green.

…bf16>) The From<Vec<bf16>> impl built the byte buffer via flat_map + to_vec, heap-allocating a 2-byte Vec for every element (~2M allocations for a 4096x512 bf16 batch on the vector path). Write each value's little-endian bytes straight into the MutableBuffer with extend_from_slice. Behavior is unchanged; test_basics already pins from() == from_iter_values().

codecov · 2026-06-26T10:02:48Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


LuciferYang · 2026-06-29T05:24:46Z

cc @Xuanwo

wjones127 · 2026-06-29T15:36:32Z

+        // Write each value's little-endian bytes straight into the buffer. Going
+        // through an intermediate `Vec` per element would allocate once per value.
+        for val in &data {
+            buffer.extend_from_slice(&val.to_bits().to_le_bytes());
+        }


suggestion: I wonder if we can do one better and just pass ownership of the buffer directly via Buffer::from_vec. This would require the memory-layout to be the same, but I would think it is.

LuciferYang · 2026-06-30 (edited)

It seems we can't go direct in this PR: bf16 doesn't implement ArrowNativeType, so Buffer::from_vec rejects it (among the floats, arrow-rs implements the trait for f16, f32, and f64, but not bf16). The root cause sits one layer deeper: Apache Arrow's logical type system has no bf16 primitive at all, which is why Lance falls back to FixedSizeBinary(2) in the first place.

There are probably two ways to bridge it for a true zero-copy path, both larger than this PR's scope:

Option 1: unsafely transmute Vec<bf16> → Vec<u16> via Vec::from_raw_parts, then call Buffer::from_vec. This is sound because bf16 is #[repr(transparent)] over u16 (same size and alignment, so the allocator Layout is preserved), but it adds a new unsafe block, which feels out of scope for a perf-only change.

Option 2: enable the bytemuck feature on half in the workspace and use bytemuck::cast_vec::<bf16, u16>. This is safe at the call site; the layouts match statically via repr(transparent), so the runtime size/align check can't fire. The cost is a workspace dependency feature bump.

This PR already removes the per-element allocation, which was the hot-path cost on 4096×512 batches. What's left is one linear pass over ~4 MB of bf16 data, likely a smaller win than the allocation fix this PR captures.

Personally, I prefer to keep the current PR as-is and pursue one of the options as a follow-up, such as Option 2. WDYT?
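
The Vec::from_raw_parts route described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the follow-up implementation: `Bf16` is a hypothetical local stand-in for `half::bf16`, and `into_u16_vec` is a made-up helper name. In the real code the resulting `Vec<u16>` would then be handed to Buffer::from_vec.

```rust
use std::mem::ManuallyDrop;

/// Hypothetical stand-in for `half::bf16`: a transparent wrapper over `u16`.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Bf16(pub u16);

/// Reinterpret the allocation of a `Vec<Bf16>` as a `Vec<u16>` without copying.
pub fn into_u16_vec(data: Vec<Bf16>) -> Vec<u16> {
    // Keep the original Vec from freeing the allocation that is about to be reused.
    let mut data = ManuallyDrop::new(data);
    let (ptr, len, cap) = (data.as_mut_ptr(), data.len(), data.capacity());
    // SAFETY: `Bf16` is `#[repr(transparent)]` over `u16`, so size and alignment
    // (and therefore the allocation layout) are identical, and every bit pattern
    // of `Bf16` is a valid `u16`.
    unsafe { Vec::from_raw_parts(ptr.cast::<u16>(), len, cap) }
}
```

The returned vector points at the same allocation as the input, so no bytes are copied; the trade-off is the new unsafe block mentioned in the comment.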

github-actions Bot added the performance label Jun 26, 2026

wjones127 approved these changes Jun 29, 2026

LuciferYang wants to merge 1 commit into lance-format:main from LuciferYang:perf/bf16-from-vec-no-per-elem-alloc
