Add zstd windowLog override to WriterProperties#9367
Add zstd windowLog override to WriterProperties#9367amirshukayev wants to merge 3 commits intoapache:mainfrom
Conversation
|
Thanks @amirshukayev, this looks interesting. Since the compression codec itself is already on I also wonder if we should instead modify |
f18a35c to
00aca79
Compare
Great point! I've made the update to allow it to be set on
I think this makes sense, this the approach I started with. Whats your take here? would it be worth breaking things? I'd be happy to implement it like that. This is the approach I used before, which I assume is what you mean? // Before:
pub struct ZstdLevel(i32);
// After (vendored):
pub struct ZstdLevel {
level: i32,
window_log_override: Option<u32>,
} |
❤️
Yes, this is what I was thinking. If we did this, then users could simply use I wonder if others could opine here before starting down that road. CC @alamb @jhorstmann @Dandandan @Jefffrey |
Which issue does this PR close?
Rationale for this change
Tuning window size with zstd is important for wider columns such as html strings. In my personal uses I saw a 32% size decrease with small performance penalty with level = 9 and windowLog = 27 (128 MB windows up from 8 MB default for level 9).
What changes are included in this PR?
We add a
zstd_window_log_override: Option<u32>param toCodecOptionsto just allow the value to be set through tozstd-safe. Just plumbing.Are these changes tested?
Pass all the cargo tests, one unit test provided, just to show a usage example/guard the behaviour, doesn't necessarily test the behaviour of the underlying zstd library.
Are there any user-facing changes?
No backwards incompatible changes, but this change adds a new param to
CodecOptionsBuilder, and makes changes to cargo docs.notes
I think taking the approach of making this an override makes the most sense, for the following reasons:
CodecOptionswhich also includes an encoder specific flag inbackward_compatible_lz4.WriterProperties, which I'm not sure is entirely natural, but I couldn't think of another way to do this. We could guard this with thezstdflag but I'm not sure if that is consistent with the other zstd code.Please let me know if you think of a better way to do this.