feat: reduce temp copy when decompressing lz4 frame#9308
feat: reduce temp copy when decompressing lz4 frame#9308zhaohaidao wants to merge 1 commit intoapache:mainfrom
Conversation
alamb
left a comment
There was a problem hiding this comment.
Thanks for this contribution @zhaohaidao
Did you run any benchmarks to measure the performance improvement of this PR?
If you need some new benchmarks, it would be nice to add them as a separate PR so that we can use our automated testing infrastructure to compare
This strongly suggests the streaming FrameDecoder path (state machine + Read trait + buffer resize/zero-init) adds significant overhead beyond core LZ4 block decompression.
If the issue is optimizing the lz4 frame decoding, that might be better done in the lz4 layer 🤔 Have considered contributing this change upstream to the lz4_flex crate?
| [features] | ||
| default = [] | ||
| lz4 = ["lz4_flex"] | ||
| lz4_direct = ["lz4", "twox-hash"] |
There was a problem hiding this comment.
Why is this feature flagged? Is there any reason a user would NOT want this feature?
If we are going to add a new feature it also need to be documented
|
Marking as draft as I think this PR is no longer waiting on feedback and I am trying to make it easier to find PRs in need of review. Please mark it as ready for review when it is ready for another look |
Thanks for marking it. I'm currently adding benchmarks based on the comments. The hotspot I'm encountering right now might be related to buffer/IO management in the arrow IPC path, so I'm still inclined to optimize arrow-rs for now. However, I will follow your suggestion and add benchmarks. |
Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?