perf: Optimizes Huffman decoding and compression margins #301
Merged
Codecov Report: All modified and coverable lines are covered by tests.
Rearrange the `len_total` field in Huffman decoder lookup table entries to occupy the top nibble. This allows the hot decode loop to extract the per-symbol shift amount using a direct right shift (`>> 28`) without an additional mask, trimming one ALU operation from the critical path recurrence and improving decode performance.
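A minimal sketch of the top-nibble idea, assuming a 32-bit entry with the symbol in the low byte. The helper names and the rest of the layout are hypothetical and are not taken from the ZXC sources; only the `>> 28` extraction comes from the commit message.

```c
#include <stdint.h>

/* Hypothetical 32-bit LUT entry: bits 31..28 hold `len_total` (must be
 * <= 15 to fit a nibble), bits 7..0 hold the decoded symbol. The real
 * ZXC entry may carry other fields in between. */
static inline uint32_t lut_entry_pack(uint32_t len_total, uint8_t symbol) {
    return (len_total << 28) | symbol;
}

/* Top-nibble layout: one right shift isolates the length, no mask. */
static inline uint32_t lut_entry_len(uint32_t entry) {
    return entry >> 28;
}

/* For contrast: a length kept in bits 11..8 needs a shift AND a mask. */
static inline uint32_t lut_entry_len_masked(uint32_t entry) {
    return (entry >> 8) & 0xFu;
}

static inline uint8_t lut_entry_symbol(uint32_t entry) {
    return (uint8_t)(entry & 0xFFu);
}
```

Because nothing sits above the top nibble, the shift alone discards every other field, which is where the saved ALU operation comes from.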
…allelism Moves the decoded symbol's output storage and pointer increment operations to occur after the input bitstream accumulator shift. This reordering allows these independent operations to potentially execute in parallel with the accumulator update, improving instruction-level parallelism and overall decoder performance.
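The ordering described above can be sketched with a toy decoder. The table, the 2-bit width, and the function name are assumptions made for illustration; the compiler and CPU remain free to reorder these statements, so source order is only a hint.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_LUT_BITS 2 /* hypothetical toy table width */

/* Toy LUT for a 3-symbol prefix code read from the low bits of `acc`.
 * Index 0 and 2 (bit0 = 0) decode 'a' in 1 bit; index 1 decodes 'b' and
 * index 3 decodes 'c', each in 2 bits.
 * Assumed entry layout: length in the top nibble, symbol in the low byte. */
static const uint32_t toy_lut[1u << TOY_LUT_BITS] = {
    (1u << 28) | 'a', (2u << 28) | 'b', (1u << 28) | 'a', (2u << 28) | 'c',
};

static void toy_decode(uint64_t acc, uint8_t *dst, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uint32_t e = toy_lut[acc & ((1u << TOY_LUT_BITS) - 1u)];
        acc >>= e >> 28;      /* critical path: the next lookup needs this */
        *dst++ = (uint8_t)e;  /* does not depend on acc; placed after the shift */
    }
}
```

The store and pointer increment do not feed the next table lookup, so placing them after the shift keeps the loop-carried dependency (`acc` to index to entry to `acc`) as short as possible in source order.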
Replaces hardcoded integer division with named margin shift constants (`ZXC_RLE_MARGIN_SHIFT`, `ZXC_HUF_MARGIN_SHIFT`). This clarifies the encoding selection thresholds for RLE and Huffman. Also updates `ZXC_HUF_MIN_LITERALS` to depend on these new constants.
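As a rough illustration of a shift-based margin, the sketch below accepts an encoding only when it beats the raw size by more than `raw_size / 2^shift`. The constant values and the helper `beats_raw_by_margin` are assumptions; this page does not show the values the PR actually uses or the exact comparison.

```c
#include <stddef.h>

/* Assumed values for illustration only. */
#define ZXC_RLE_MARGIN_SHIFT 4 /* assumed: require a saving above 1/16 */
#define ZXC_HUF_MARGIN_SHIFT 3 /* assumed: require a saving above 1/8 */

/* Returns 1 when `encoded_size` is smaller than `raw_size` minus the
 * margin raw_size >> shift, otherwise 0. */
static int beats_raw_by_margin(size_t encoded_size, size_t raw_size,
                               unsigned shift) {
    return encoded_size < raw_size - (raw_size >> shift);
}
```

With a 1024-byte section and the assumed shifts, RLE would need to come in under 960 bytes and Huffman under 896 bytes. Naming the shift makes that threshold visible at the call site instead of hiding it in a literal divisor.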
Removes intermediate `slX` accumulators for consumed bits within the hot decode loop. Instead of collecting consumed bits in `slX` and applying the total to `bbX` once per batch, `bbX` is now directly decremented by `_tX` in each step. This change reduces variable overhead and eliminates several addition/subtraction operations within the critical path, improving decoder performance.
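The two bookkeeping styles can be compared in a small sketch. It assumes `bb` is a count of available bits and `t[i]` is the bit length consumed by step `i`; the function names and the batch size of four are hypothetical.

```c
#include <stdint.h>

/* Before: consumed bits are summed in a side accumulator (`sl`) and
 * subtracted from the bit budget (`bb`) once per batch. */
static int bits_left_batched(int bb, const uint8_t t[4]) {
    int sl = 0;
    for (int i = 0; i < 4; i++) {
        sl += t[i];
    }
    return bb - sl;
}

/* After: each step decrements `bb` directly, so no `sl` variable and no
 * final subtraction are needed. */
static int bits_left_direct(int bb, const uint8_t t[4]) {
    for (int i = 0; i < 4; i++) {
        bb -= t[i];
    }
    return bb;
}
```

Both forms compute the same remaining bit count; the direct form simply drops one live variable and the end-of-batch subtraction.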
This change excludes OpenZL from the benchmark workflow, streamlining builds and focusing performance testing efforts on Huffman optimizations.
Introduces a new macro `ZXC_HUF_LUT_IDX` to compute the Huffman lookup table index. This macro leverages the `_bzhi_u64` instruction from the BMI2 instruction set when available. This improves performance in the hot decoding loops on supported architectures.
The logic for choosing `ZXC_SECTION_ENCODING_HUFFMAN_DICT` is refactored for clarity and consistency. This change consolidates the comparison thresholds against existing encodings (block-Huffman, RLE, RAW).
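One way such a consolidated comparison could look is a single predicate that checks the dictionary-Huffman size against the best of the other candidates. This is a sketch under stated assumptions: the function name, the parameters, and the strict "smaller than all others" rule are hypothetical, and the actual ZXC thresholds may include margins not shown here.

```c
#include <stddef.h>

/* Returns 1 when the dictionary-Huffman size is strictly smaller than
 * every other candidate encoding (block-Huffman, RLE, RAW), else 0. */
static int huffman_dict_wins(size_t dict_size, size_t block_huf_size,
                             size_t rle_size, size_t raw_size) {
    size_t best_other = raw_size;
    if (rle_size < best_other) {
        best_other = rle_size;
    }
    if (block_huf_size < best_other) {
        best_other = block_huf_size;
    }
    return dict_size < best_other;
}
```

Reducing the alternatives to one `best_other` value keeps the decision in a single comparison rather than several scattered ones.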
Adjusts the `ZXC_HUF_LUT_IDX` macro to use `_bzhi_u32` instead of `_bzhi_u64` for the BMI2-optimized Huffman lookup table index calculation. The Huffman lookup bits typically fit within a 32-bit range, making `_bzhi_u32` sufficient and potentially more efficient than `_bzhi_u64`.
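A hedged sketch of what a BMI2-gated index macro can look like. The macro name `SKETCH_HUF_LUT_IDX`, the 11-bit table width, and the `__BMI2__` guard are assumptions for illustration; the real `ZXC_HUF_LUT_IDX` definition is not shown on this page.

```c
#include <stdint.h>
#if defined(__BMI2__)
#include <immintrin.h>
#endif

#define SKETCH_HUF_LUT_BITS 11u /* hypothetical table width */

#if defined(__BMI2__)
/* BZHI zeroes every bit at position >= n, keeping only the low n bits. */
#define SKETCH_HUF_LUT_IDX(acc) \
    _bzhi_u32((uint32_t)(acc), SKETCH_HUF_LUT_BITS)
#else
/* Portable fallback: the same result with an explicit mask. */
#define SKETCH_HUF_LUT_IDX(acc) \
    ((uint32_t)(acc) & ((1u << SKETCH_HUF_LUT_BITS) - 1u))
#endif
```

Both branches yield the low 11 bits of the accumulator, so the decoder behaves identically whether or not the build enables BMI2 (for example with `-mbmi2` on GCC or Clang). Truncating to 32 bits first is safe here because the index width is far below 32.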



Enhances performance and configurability across compression components:
Huffman Decoder Optimizations:
- `_bzhi_u64` for accelerated lookup table indexing where supported.
Configurable Encoding Selection Margins:
- `ZXC_RLE_MARGIN_SHIFT` and `ZXC_HUF_MARGIN_SHIFT` constants.
- `ZXC_HUF_MIN_LITERALS` calculation to automatically adapt to the new margin shifts.
Documentation Improvements:
Benchmark Workflow Update: