Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 5 additions & 5 deletions rust/lance-arrow/src/bfloat16.rs
Original file line number Diff line number Diff line change
Expand Up @@ -161,12 +161,12 @@ impl From<Vec<bf16>> for BFloat16Array {
fn from(data: Vec<bf16>) -> Self {
let mut buffer = MutableBuffer::with_capacity(data.len() * 2);

let bytes = data.iter().flat_map(|val| {
let bytes = val.to_bits().to_le_bytes();
bytes.to_vec()
});
// Write each value's little-endian bytes straight into the buffer. Going
// through an intermediate `Vec` per element would allocate once per value.
for val in &data {
buffer.extend_from_slice(&val.to_bits().to_le_bytes());
}
Comment on lines +164 to +168

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: I wonder if we can do one better and just pass ownership of the buffer directly via Buffer::from_vec. This would require the memory-layout to be the same, but I would think it is.

@LuciferYang LuciferYang Jun 30, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems can't go direct in this pr, bf16 doesn't implement ArrowNativeType, so Buffer::from_vec rejects it (arrow-rs registers the trait for f16/f32/f64 among the floats, but not bf16). The root cause sits one layer deeper: Apache Arrow's logical type system has no bf16 primitive at all, which is why Lance falls back to FixedSizeBinary(2) in the first place.

Maybe two ways to bridge it for a true zero-copy path, both larger than this pr's scope:

  1. unsafe transmute Vec<bf16>Vec<u16> via Vec::from_raw_parts, then Buffer::from_vec. Sound because bf16 is #[repr(transparent)] over u16 — same size and alignment, so the allocator Layout is preserved — but adds a new unsafe block, which feels out of scope for a perf-only change.

  2. Enable the bytemuck feature on half in the workspace and use bytemuck::cast_vec::<bf16, u16>. Safe at the call site; the layouts match statically via repr(transparent), so the runtime size/align check can't fire. Cost is a workspace dependency feature bump.

This pr already removes the per-element allocation, which was the hot-path cost on 4096×512 batches. What's left is one linear pass over ~4 MB of bf16 data — likely a smaller win than the allocation fix this
PR captures.

Personally, I prefer to keep the current PR as-is and pursue one of the options as a follow-up, such as Option 2. WDYT?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine to keep as is. I will merge. But either of those sound good as potential follow ups.


buffer.extend(bytes);
let array_data = ArrayData::builder(DataType::FixedSizeBinary(2))
.len(data.len())
.add_buffer(buffer.into());
Expand Down
Loading