perf(arrow-ipc): Avoid copies and write dictionary batches directly to writers when possible#10128
perf(arrow-ipc): Avoid copies and write dictionary batches directly to writers when possible#10128JakeDern wants to merge 19 commits into
Conversation
|
Pretty good improvement - ~42% for the dictionary case and ~20% for delta dictionary cases. Not 100% sure why less improvement on the delta side yet, but I think this is worth it to take on its own and can investigate further later. Perf results from #10122: perf results from this branch: |
|
CC: @alamb and @Rich-T-kid - I think we got pretty good results here! I also tried to clean up a few things here and there where I could like removing some unnecessary parameter drilling. This has the benchmarks from #10122 as well, will rebase once that goes in. |
|
I can take a look at this early next week. |
Rich-T-kid
left a comment
There was a problem hiding this comment.
I think this PR looks mostly fine. it would be nice to include a round trip test similar to ##10097 that validate that nothing is being broken.
left a couple non-blocking comments. Ill try and take a second pass through tommorow if I can get a chance/ if this is still open
| // Collect all dicts that need a dictionary message i.e. ones that were | ||
| // either not in the tracker previously or ones that were but need a | ||
| // replacement or delta. | ||
| #[allow(clippy::too_many_arguments)] |
There was a problem hiding this comment.
Is it possible to remove the lint bypass? I know its not tied directly to this PR but it'd be nice to incrementally clean up this file.
| compression_context, | ||
| )?); | ||
| } | ||
| let mut fbb = FlatBufferBuilder::new(); |

Which issue does this PR close?
Rationale for this change
This is a follow on to #10044, applying basically the same optimization for dictionary batches.
This needs to wait for #10122 before merge.
What changes are included in this PR?
Are these changes tested?
Yes, existing unit tests should cover the change.
Are there any user-facing changes?
No.