perf: Optimizes Huffman decoding and compression margins by hellobertrand · Pull Request #301 · hellobertrand/zxc

hellobertrand · 2026-06-29T06:03:42Z

Enhances performance and configurability across compression components:

Huffman Decoder Optimizations:
- Reorders decode steps within the hot loop to improve instruction-level parallelism.
- Optimizes bit length extraction from the lookup table for reduced ALU operations.
- Directly updates bit buffers in the decoder, eliminating intermediate accumulators.
- Introduces BMI2 `_bzhi_u64` for accelerated lookup table indexing where supported.

Configurable Encoding Selection Margins:
- Introduces `ZXC_RLE_MARGIN_SHIFT` and `ZXC_HUF_MARGIN_SHIFT` constants.
- Allows dynamic adjustment of thresholds for RLE, Huffman, and Huffman-Dict encoding choices, replacing hardcoded values.
- Updates the `ZXC_HUF_MIN_LITERALS` calculation to automatically adapt to the new margin shifts.

Documentation Improvements:
- Provides clearer documentation for the Huffman bit writer struct and related constants.

Benchmark Workflow Update:
- Skips building the OpenZL codec in benchmark workflows.

codecov · 2026-06-29T06:05:29Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

Rearrange the `len_total` field in Huffman decoder lookup table entries to occupy the top nibble. This allows the hot decode loop to extract the per-symbol shift amount using a direct right shift (`>> 28`) without an additional mask, trimming one ALU operation from the critical path recurrence and improving decode performance.
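The layout change can be sketched as follows. This is a minimal illustration, assuming a 32-bit table entry (inferred from the `>> 28` shift; the real entry type is not shown in this PR text). The helper names `ex_lut_pack`, `ex_lut_len_top`, `ex_lut_len_mid`, and `ex_lut_payload` are hypothetical, and the "mid" layout is only a stand-in for whatever the previous layout was.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit LUT entry: bits 31..28 hold the length field,
 * bits 27..0 hold the rest of the entry (symbol payload). */
static inline uint32_t ex_lut_pack(uint32_t len, uint32_t payload)
{
    assert(len <= 15u);            /* the length must fit in one nibble */
    assert(payload < (1u << 28));  /* the payload must not touch the nibble */
    return (len << 28) | payload;
}

/* Length in the top nibble: one right shift, no mask. */
static inline uint32_t ex_lut_len_top(uint32_t entry)
{
    return entry >> 28;
}

/* For contrast: a length stored in bits 11..8 needs a shift and a mask. */
static inline uint32_t ex_lut_len_mid(uint32_t entry)
{
    return (entry >> 8) & 0xFu;
}

/* The payload now needs the mask instead, but it sits off the
 * shift-amount dependency chain of the decode loop. */
static inline uint32_t ex_lut_payload(uint32_t entry)
{
    return entry & 0x0FFFFFFFu;
}
```

The shift amount feeds the next accumulator update, so it is the value that sits on the loop-carried dependency; moving the mask onto the payload side is the trade this sketch is meant to show.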

…allelism

Moves the store of the decoded symbol and the output pointer increment so that they occur after the input bitstream accumulator shift. The store and increment do not depend on the shifted accumulator, so this ordering lets them execute in parallel with the accumulator update, improving instruction-level parallelism and overall decoder performance.

Replaces hardcoded integer division with named margin shift constants (`ZXC_RLE_MARGIN_SHIFT`, `ZXC_HUF_MARGIN_SHIFT`). This clarifies the encoding selection thresholds for RLE and Huffman. Also updates `ZXC_HUF_MIN_LITERALS` to depend on these new constants.
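A margin expressed as a shift might look like the sketch below. The `EX_*` names and their values are hypothetical; the actual values of `ZXC_RLE_MARGIN_SHIFT` and `ZXC_HUF_MARGIN_SHIFT`, and the way `ZXC_HUF_MIN_LITERALS` is derived from them, are not shown in this PR text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shift values, chosen only for illustration. */
#define EX_RLE_MARGIN_SHIFT 4u /* margin = baseline / 16 */
#define EX_HUF_MARGIN_SHIFT 5u /* margin = baseline / 32 */

/* Returns 1 when `candidate` is smaller than `baseline` by more than
 * (baseline >> shift) bytes, i.e. when the alternative encoding saves
 * enough to be worth selecting. */
static int ex_beats_with_margin(size_t candidate, size_t baseline, unsigned shift)
{
    assert(shift < sizeof(size_t) * 8u);
    size_t margin = baseline >> shift;  /* replaces a hardcoded division */
    /* margin <= baseline, so the subtraction cannot wrap around. */
    return candidate < baseline - margin;
}
```

With a baseline of 1024 bytes and a shift of 4, the margin is 64 bytes, so a 900-byte candidate is accepted and a 1000-byte candidate is rejected.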

Removes intermediate `slX` accumulators for consumed bits within the hot decode loop. Instead of collecting consumed bits in `slX` and applying the total to `bbX` once per batch, `bbX` is now directly decremented by `_tX` in each step. This change reduces variable overhead and eliminates several addition/subtraction operations within the critical path, improving decoder performance.
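To make the direct update concrete, here is a toy single-stream decoder, not the zxc decoder itself. It assumes LSB-first bit order, a 2-bit lookup table, and a 32-bit entry with the code length in the top nibble; `ex_lut`, `EX_E`, and `ex_decode` are hypothetical names. The bit count `bb` is decremented by the per-symbol length `t` in each step rather than through a separate per-batch accumulator, and the output store is placed after the accumulator shift.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry layout: code length in bits 31..28, symbol in bits 7..0. */
#define EX_E(len, sym) (((uint32_t)(len) << 28) | (uint32_t)(sym))

/* Toy prefix code, read LSB-first: A = 0, B = 1 0, C = 1 1. */
static const uint32_t ex_lut[4] = {
    EX_E(1, 'A'), /* index 0b00 */
    EX_E(2, 'B'), /* index 0b01 */
    EX_E(1, 'A'), /* index 0b10 */
    EX_E(2, 'C'), /* index 0b11 */
};

/* Decodes up to `cap` symbols from the `bb` valid low bits of `acc`.
 * Returns the number of symbols written to `out`. */
static size_t ex_decode(uint64_t acc, unsigned bb, uint8_t *out, size_t cap)
{
    size_t n = 0;
    while (n < cap && bb > 0) {
        uint32_t e = ex_lut[acc & 3u];
        uint32_t t = e >> 28;    /* bits consumed by this symbol */
        if (t > bb) break;       /* not enough bits left for this code */
        acc >>= t;               /* accumulator update comes first */
        bb -= t;                 /* direct decrement, no side accumulator */
        out[n++] = (uint8_t)e;   /* store and pointer bump after the shift */
    }
    return n;
}
```

For example, `ex_decode(0x19, 6, out, 8)` writes the four symbols `B`, `A`, `C`, `A` and returns 4.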

This change excludes OpenZL from the benchmark workflow, streamlining builds and focusing performance testing efforts on Huffman optimizations.

Introduces a new macro `ZXC_HUF_LUT_IDX` to compute the Huffman lookup table index. This macro leverages the `_bzhi_u64` instruction from the BMI2 instruction set when available. This improves performance in the hot decoding loops on supported architectures.

The logic for choosing `ZXC_SECTION_ENCODING_HUFFMAN_DICT` is refactored for clarity and consistency. This change consolidates the comparison thresholds against existing encodings (block-Huffman, RLE, RAW).
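One way such a consolidated comparison could be structured is sketched below. This is a guess at the shape of the logic, not the code from the commit: the enum, `EX_DICT_MARGIN_SHIFT`, and `ex_select_encoding` are hypothetical, and the sketch assumes the dictionary candidate is compared once against the best of the other three encodings.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    EX_ENC_RAW,
    EX_ENC_RLE,
    EX_ENC_HUFFMAN,
    EX_ENC_HUFFMAN_DICT
} ex_encoding;

/* Hypothetical margin: the dictionary variant must save more than 1/32. */
#define EX_DICT_MARGIN_SHIFT 5u

/* Picks an encoding from the estimated encoded sizes (in bytes). */
static ex_encoding ex_select_encoding(size_t raw, size_t rle, size_t huf, size_t dict)
{
    /* Step 1: best of the existing encodings. */
    ex_encoding best = EX_ENC_RAW;
    size_t best_size = raw;
    if (rle < best_size) { best = EX_ENC_RLE; best_size = rle; }
    if (huf < best_size) { best = EX_ENC_HUFFMAN; best_size = huf; }

    /* Step 2: a single threshold test for the dictionary candidate,
     * instead of separate tests against RAW, RLE, and block-Huffman. */
    size_t margin = best_size >> EX_DICT_MARGIN_SHIFT;
    if (dict < best_size - margin)
        return EX_ENC_HUFFMAN_DICT;
    return best;
}
```

With sizes of 1000 (RAW), 1200 (RLE), and 800 (block-Huffman), the threshold is 775 bytes, so a 700-byte dictionary candidate is selected while a 790-byte one is not.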

Adjusts the `ZXC_HUF_LUT_IDX` macro to use `_bzhi_u32` instead of `_bzhi_u64` for the BMI2-optimized Huffman lookup table index calculation. The Huffman lookup bits typically fit within a 32-bit range, making `_bzhi_u32` sufficient and potentially more efficient than `_bzhi_u64`.
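A macro of this kind could be written along the following lines. This is a sketch, not the definition from the commit: `EX_HUF_LUT_IDX` and `ex_lut_idx` are hypothetical names, and the `__BMI2__` guard is the GCC/Clang convention (other compilers signal BMI2 support differently). `_bzhi_u32(a, n)` from `<immintrin.h>` clears every bit of `a` at position `n` and above, which matches masking with `(1u << n) - 1` for `n` below 32.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__BMI2__)
#include <immintrin.h>
/* BMI2 path: BZHI keeps the low n bits in one instruction,
 * without materialising the mask. */
#define EX_HUF_LUT_IDX(acc, n) _bzhi_u32((uint32_t)(acc), (uint32_t)(n))
#else
/* Portable fallback: build the mask explicitly (requires n < 32). */
#define EX_HUF_LUT_IDX(acc, n) ((uint32_t)(acc) & ((1u << (n)) - 1u))
#endif

/* Returns the low `n` bits of the 64-bit bit accumulator as a table index. */
static inline uint32_t ex_lut_idx(uint64_t acc, unsigned n)
{
    assert(n > 0u && n < 32u);
    return EX_HUF_LUT_IDX(acc, n);
}
```

Truncating the accumulator to 32 bits before the mask is safe here because only the low `n` bits survive either way; for instance, `ex_lut_idx(0xABCDEF0123456789, 12)` gives `0x789` on both paths.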

sonarqubecloud · 2026-07-01T14:36:54Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

hellobertrand force-pushed the perf/huffman-optim branch from fc16b45 to ec02712 on June 29, 2026 06:09

hellobertrand added 9 commits July 1, 2026 16:36

doc: Improve Huffman bit writer struct and constant group documentation

53fe588

perf: Skip OpenZL codec build in benchmarks

4c2bc63

perf: Refactor Huffman dictionary encoding selection

179991a

hellobertrand force-pushed the perf/huffman-optim branch from fe2b32e to 04e85bb on July 1, 2026 14:36

hellobertrand merged commit a45592f into main Jul 2, 2026
84 checks passed

hellobertrand deleted the perf/huffman-optim branch July 2, 2026 15:37
