
perf: Optimizes Huffman decoding and compression margins #301

Merged: hellobertrand merged 9 commits into main from perf/huffman-optim on Jul 2, 2026

@hellobertrand

Owner

Enhances performance and configurability across compression components:

  • Huffman Decoder Optimizations:

    • Reorders decode steps within the hot loop to improve instruction-level parallelism.
    • Optimizes bit length extraction from the lookup table for reduced ALU operations.
    • Directly updates bit buffers in the decoder, eliminating intermediate accumulators.
    • Introduces BMI2 _bzhi_u64 for accelerated lookup table indexing where supported.
  • Configurable Encoding Selection Margins:

    • Introduces ZXC_RLE_MARGIN_SHIFT and ZXC_HUF_MARGIN_SHIFT constants.
    • Allows dynamic adjustment of thresholds for RLE, Huffman, and Huffman-Dict encoding choices, replacing hardcoded values.
    • Updates the ZXC_HUF_MIN_LITERALS calculation to automatically adapt to the new margin shifts.
  • Documentation Improvements:

    • Provides clearer documentation for the Huffman bit writer struct and related constants.
  • Benchmark Workflow Update:

    • Skips building the OpenZL codec in benchmark workflows.

@codecov

codecov Bot commented Jun 29, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.


Rearrange the `len_total` field in Huffman decoder lookup table entries to occupy the top nibble. This allows the hot decode loop to extract the per-symbol shift amount using a direct right shift (`>> 28`) without an additional mask, trimming one ALU operation from the critical path recurrence and improving decode performance.
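
As a rough illustration of the layout change described above, here is a minimal sketch assuming 32-bit table entries with `len_total` in bits 31..28. The helper names (`ex_lut_pack`, `ex_lut_len_total`, `ex_lut_payload`, `ex_lut_len_total_masked`) and the contents of the low 28 bits are hypothetical; the actual entry layout is not shown in this PR text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical entry layout (an assumption, not the real ZXC layout):
 *   bits 31..28  len_total (0..15)
 *   bits 27..0   payload (decoded symbols, per-symbol lengths, ...) */
static inline uint32_t ex_lut_pack(uint32_t len_total, uint32_t payload) {
    assert(len_total <= 15u);          /* must fit in one nibble */
    assert(payload < (1u << 28));      /* must not spill into the nibble */
    return (len_total << 28) | payload;
}

/* Top-nibble extraction: a single shift, because no bits exist above
 * bit 31 that would need masking off. */
static inline uint32_t ex_lut_len_total(uint32_t entry) {
    return entry >> 28;
}

static inline uint32_t ex_lut_payload(uint32_t entry) {
    return entry & 0x0FFFFFFFu;
}

/* For contrast: if the same field lived in, say, bits 11..8, reading it
 * would take a shift and a mask (two ALU operations). */
static inline uint32_t ex_lut_len_total_masked(uint32_t entry) {
    return (entry >> 8) & 0xFu;
}
```

The saving is one operation per decoded step, which matters only because the extracted length feeds the accumulator shift that the next lookup depends on.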
…allelism

Moves the decoded symbol's output storage and pointer increment operations to occur after the input bitstream accumulator shift. This reordering allows these independent operations to potentially execute in parallel with the accumulator update, improving instruction-level parallelism and overall decoder performance.
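
The ordering described above can be sketched with a toy decoder. Everything here is an assumption made for illustration: the 2-bit table, the LSB-first bit order, and the names `ex_entry`, `ex_lut`, and `ex_decode` are not taken from the ZXC sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy LSB-first prefix code (hypothetical):
 *   'a' = 0 (1 bit), 'b' = 1,0 (2 bits), 'c' = 1,1 (2 bits). */
typedef struct { unsigned char sym; unsigned char len; } ex_entry;

static const ex_entry ex_lut[4] = {
    {'a', 1}, /* index 0b00: first bit is 0 */
    {'b', 2}, /* index 0b01: bits 1 then 0 */
    {'a', 1}, /* index 0b10: first bit is 0 */
    {'c', 2}, /* index 0b11: bits 1 then 1 */
};

/* Decodes n symbols from acc into out and returns the remaining bits.
 * The caller must ensure acc holds at least n valid codes. */
static uint64_t ex_decode(uint64_t acc, size_t n, unsigned char *out) {
    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        const ex_entry e = ex_lut[acc & 3u];
        acc >>= e.len;   /* first: the shift the next lookup depends on */
        *out++ = e.sym;  /* then: store and pointer bump, independent of acc */
    }
    return acc;
}
```

A compiler or out-of-order core may reorder these statements anyway, so the source-level order is a hint rather than a guarantee; the point is that the store does not sit between the lookup and the shift on the dependency chain.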
Replaces hardcoded integer division with named margin shift constants (`ZXC_RLE_MARGIN_SHIFT`, `ZXC_HUF_MARGIN_SHIFT`). This clarifies the encoding selection thresholds for RLE and Huffman. Also updates `ZXC_HUF_MIN_LITERALS` to depend on these new constants.
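
A minimal sketch of how a margin shift can express such a threshold. The constant values, the `EX_` names, and the exact comparison form are assumptions for illustration; this PR text does not show the real values or the `ZXC_HUF_MIN_LITERALS` formula.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, for illustration only. A shift of k means the
 * candidate must save at least baseline / 2^k bytes to be chosen. */
#define EX_RLE_MARGIN_SHIFT 4u  /* 1/16 margin */
#define EX_HUF_MARGIN_SHIFT 5u  /* 1/32 margin */

/* Returns 1 if `candidate` is smaller than `baseline` by more than the
 * margin implied by `shift`, else 0. */
static int ex_beats_with_margin(size_t candidate, size_t baseline,
                                unsigned shift) {
    assert(shift < sizeof(size_t) * 8u);
    return candidate < baseline - (baseline >> shift);
}
```

With a 1000-byte baseline and a shift of 4, the margin is 1000 >> 4 = 62 bytes, so a candidate must be 937 bytes or smaller to be selected. Changing the shift adjusts every threshold that uses it, which is the practical benefit over scattered literal divisors.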
Removes intermediate `slX` accumulators for consumed bits within the hot decode loop. Instead of collecting consumed bits in `slX` and applying the total to `bbX` once per batch, `bbX` is now directly decremented by `_tX` in each step.

This change reduces variable overhead and eliminates several addition/subtraction operations within the critical path, improving decoder performance.
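
The before/after shape of this change might look roughly like the sketch below. It assumes `bbX` is a count of valid bits in the accumulator and `_tX` is the bit length consumed by one step; the `ex_` names, the four-step batch, and the LSB-first shift direction are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint64_t acc; unsigned bits; } ex_bitstate;

static ex_bitstate ex_state(uint64_t acc, unsigned bits) {
    ex_bitstate s = { acc, bits };
    return s;
}

/* Before: consumed lengths are summed in a side accumulator and the
 * bit count is updated once at the end of the batch. */
static ex_bitstate ex_consume_batched(ex_bitstate s, unsigned t0, unsigned t1,
                                      unsigned t2, unsigned t3) {
    unsigned consumed = 0;                 /* plays the role of slX */
    s.acc >>= t0; consumed += t0;
    s.acc >>= t1; consumed += t1;
    s.acc >>= t2; consumed += t2;
    s.acc >>= t3; consumed += t3;
    assert(consumed <= s.bits);
    s.bits -= consumed;
    return s;
}

/* After: the bit count is decremented directly in each step, so no
 * extra variable is live across the batch. */
static ex_bitstate ex_consume_direct(ex_bitstate s, unsigned t0, unsigned t1,
                                     unsigned t2, unsigned t3) {
    assert(t0 + t1 + t2 + t3 <= s.bits);
    s.acc >>= t0; s.bits -= t0;
    s.acc >>= t1; s.bits -= t1;
    s.acc >>= t2; s.bits -= t2;
    s.acc >>= t3; s.bits -= t3;
    return s;
}
```

Both variants leave the state identical; the difference is in how many values stay live and how many additions are issued per batch.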

This change excludes OpenZL from the benchmark workflow, streamlining builds and focusing performance testing efforts on Huffman optimizations.

Introduces a new macro `ZXC_HUF_LUT_IDX` to compute the Huffman lookup table index. This macro leverages the `_bzhi_u64` instruction from the BMI2 instruction set when available. This improves performance in the hot decoding loops on supported architectures.

The logic for choosing `ZXC_SECTION_ENCODING_HUFFMAN_DICT` is refactored for clarity and consistency. This change consolidates the comparison thresholds against existing encodings (block-Huffman, RLE, RAW).

Adjusts the `ZXC_HUF_LUT_IDX` macro to use `_bzhi_u32` instead of `_bzhi_u64` for the BMI2-optimized Huffman lookup table index calculation. The Huffman lookup bits typically fit within a 32-bit range, making `_bzhi_u32` sufficient and potentially more efficient than `_bzhi_u64`.
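
A sketch of what an index macro of this kind could look like, with a portable mask fallback. The `EX_` names and the 11-bit table width are assumptions; the real `ZXC_HUF_LUT_IDX` definition and its feature-detection guard are not shown in this PR text. `_bzhi_u32(a, n)` keeps the low `n` bits of `a` and is declared in `<immintrin.h>`; GCC and Clang define `__BMI2__` when BMI2 code generation is enabled (for example with `-mbmi2`).

```c
#include <assert.h>
#include <stdint.h>
#if defined(__BMI2__)
#include <immintrin.h>
#endif

#define EX_HUF_LOOKUP_BITS 11u  /* assumed table width: 2^11 entries */

#if defined(__BMI2__)
/* BMI2 path: zero all bits from position EX_HUF_LOOKUP_BITS upward. */
#define EX_HUF_LUT_IDX(acc) \
    _bzhi_u32((uint32_t)(acc), EX_HUF_LOOKUP_BITS)
#else
/* Portable path: the same result with a plain mask. */
#define EX_HUF_LUT_IDX(acc) \
    ((uint32_t)(acc) & ((1u << EX_HUF_LOOKUP_BITS) - 1u))
#endif

/* Wrapper that checks the index stays inside the table. */
static inline uint32_t ex_lut_index(uint64_t acc) {
    const uint32_t idx = EX_HUF_LUT_IDX(acc);
    assert(idx < (1u << EX_HUF_LOOKUP_BITS));
    return idx;
}
```

Truncating the accumulator to 32 bits before extraction is safe here because the index only ever uses the low `EX_HUF_LOOKUP_BITS` bits, which is the same reasoning the note above gives for preferring the 32-bit form.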
@sonarqubecloud

sonarqubecloud Bot commented Jul 1, 2026


@hellobertrand hellobertrand merged commit a45592f into main Jul 2, 2026
84 checks passed
@hellobertrand hellobertrand deleted the perf/huffman-optim branch July 2, 2026 15:37