perf(serialize): optimized sorting algorithm by zsilbi · Pull Request #151 · unjs/ohash

zsilbi · 2025-03-09T22:38:27Z

For smaller object keys and Setvalues with less than ~10 elements (which is quite common) we can use a simple insertion sort.
Otherwise we fall back to the native sort().

Compared to target: #144

 ✓ test/benchmarks.bench.ts > benchmarks > serialize - presets > count:1, size:small 2190ms
     name                       hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · ohash @ d7f7f9d  2,734,883.45  0.0003  3.0094  0.0004  0.0003  0.0007  0.0008  0.0090  ±1.55%  1367442
   · ohash @ dev      4,263,019.50  0.0002  0.2779  0.0002  0.0002  0.0004  0.0005  0.0006  ±0.21%  2131510   fastest

 ✓ test/benchmarks.bench.ts > benchmarks > serialize - presets > count:256, size:small 1210ms
     name                    hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · ohash @ d7f7f9d  13,446.55  0.0647  0.4242  0.0744  0.0702  0.2418  0.2760  0.3444  ±0.89%     6727
   · ohash @ dev      18,352.54  0.0480  0.4818  0.0545  0.0513  0.2242  0.2515  0.3314  ±0.93%     9177   fastest

 BENCH  Summary

  ohash @ dev - test/benchmarks.bench.ts > benchmarks > serialize - presets > count:1, size:small
    1.56x faster than ohash @ d7f7f9d

  ohash @ dev - test/benchmarks.bench.ts > benchmarks > serialize - presets > count:256, size:small
    1.36x faster than ohash @ d7f7f9d

clk: ~5.31 GHz
cpu: AMD Ryzen 9 7950X 16-Core Processor
runtime: bun 1.2.5 (x64-linux)

benchmark                   avg (min … max) p75 / p99    (min … top 1%)
------------------------------------------- -------------------------------
• count:1, size:small
------------------------------------------- -------------------------------
ohash @ d7f7f9d              207.86 ns/iter 199.94 ns █▇
                    (190.37 ns … 560.65 ns) 427.13 ns ██
                    (  0.00  b … 396.00  b)   4.83  b ██▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

ohash @ dev                  170.54 ns/iter 160.53 ns █▇
                    (149.46 ns … 601.63 ns) 415.21 ns ██
                    (  0.00  b … 726.00  b)  11.54  b ██▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

summary
  ohash @ dev
   1.22x faster than ohash @ d7f7f9d

• count:256, size:small
------------------------------------------- -------------------------------
ohash @ d7f7f9d               66.88 µs/iter  59.40 µs  █
                       (50.93 µs … 2.02 ms) 141.50 µs  █
                    (  0.00  b … 528.00 kb)  13.95 kb ▃█▅▂▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

ohash @ dev                   49.34 µs/iter  49.51 µs        █ █
                      (48.01 µs … 51.76 µs)  50.43 µs ▅  ▅   █▅█ ▅▅ ▅     ▅
                    (619.00  b …  13.08 kb)   2.48 kb █▁▁█▁▁▁███▁██▁█▁▁▁▁▁█

summary
  ohash @ dev
   1.36x faster than ohash @ d7f7f9d

pi0 · 2025-03-25T20:46:39Z

Do you recall if we have the same trade-off in v8 based runtimes / Node.js specifically?

zsilbi · 2025-03-25T21:07:14Z

Do you recall if we have the same trade-off in v8 based runtimes / Node.js specifically?

I ran this again on node with the latest benchmark from #142 (this includes est. instruction counts too)
Also tested v20, and the difference is the same.

This is small object with 5 keys in random order.

clk: ~5.25 GHz
cpu: AMD Ryzen 9 7950X 16-Core Processor
runtime: node 23.9.0 (x64-linux)

benchmark                   avg (min … max) p75 / p99    (min … top 1%)
------------------------------------------- -------------------------------
• count:1, size:small
------------------------------------------- -------------------------------
ohash @ d7f7f9d              260.16 ns/iter 260.28 ns  ▃█
                    (239.77 ns … 686.80 ns) 434.16 ns ▆██
                    (  1.78 kb …   2.09 kb)   1.84 kb ███▅▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
                   4.16 ipc ( 95.84% cache)    0.64 branch misses
          1.37k cycles   5.69k instructions   73.30 c-refs    3.05 c-misses

ohash @ dev                  193.24 ns/iter 200.30 ns   █
                    (181.85 ns … 286.39 ns) 227.37 ns  ▅██
                    (787.89  b …   1.29 kb) 992.75  b ▃███▆▄▃▃▆▇▆▅▃▂▂▁▂▁▁▁▁
                   4.41 ipc ( 96.45% cache)    0.18 branch misses
          1.02k cycles   4.48k instructions   41.53 c-refs    1.48 c-misses

summary
  ohash @ dev
   1.35x faster than ohash @ d7f7f9d

• count:256, size:small
------------------------------------------- -------------------------------
ohash @ d7f7f9d               74.17 µs/iter  69.04 µs █▆
                     (63.98 µs … 459.12 µs) 215.41 µs ██
                    ( 14.24 kb … 779.95 kb) 461.35 kb ██▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
                   3.92 ipc ( 95.74% cache)  526.21 branch misses
        390.40k cycles   1.53M instructions  22.40k c-refs  953.59 c-misses

ohash @ dev                   57.23 µs/iter  54.75 µs █
                     (50.63 µs … 377.78 µs) 217.18 µs █
                    ( 13.18 kb … 450.00 kb) 239.15 kb █▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
                   4.17 ipc ( 96.48% cache)  367.38 branch misses
        299.92k cycles   1.25M instructions  13.98k c-refs  492.69 c-misses

summary
  ohash @ dev
   1.3x faster than ohash @ d7f7f9d

zsilbi added 3 commits March 9, 2025 23:27

test(serialize): added sorting tests with more elements

e23974c

perf(serialize): use insertion sort for small arrays

b7ec6f5

test: update bunde size

c808dbb

zsilbi changed the title ~~perf(serialize): optimized sorting for small objects~~ perf(serialize): optimized sorting for small objects and arrays Mar 9, 2025

zsilbi mentioned this pull request Mar 9, 2025

perf(serialize): faster compare and sorting #144

Draft

1 task

zsilbi added 3 commits March 10, 2025 00:31

perf(serialize): increase max length to 20

5de60d1

chore: edit comment

8b6e95c

perf(serialize): optimized compare

9d08e0b

zsilbi marked this pull request as ready for review March 10, 2025 10:50

zsilbi added 7 commits March 10, 2025 16:58

perf(serialize): remove unnecessary param

a6831d8

perf(serialize): remove unnecessary param

fa61821

perf(serialize): reduce bundle size by keeping it as fast as possible

7418b42

style(serialize): improve readability

e2d4b89

perf(serialize): adjust arr length limit

b7055ea

perf(serialize): remove else from loop

68d3276

perf(serialize): change paths in loop

552c05b

zsilbi force-pushed the perf/optimized-sort branch from 74c561b to 552c05b Compare March 11, 2025 16:56

perf(serialize): opt. bundle size

2477251

zsilbi changed the title ~~perf(serialize): optimized sorting for small objects and arrays~~ perf(serialize): optimized sorting algorithm Mar 14, 2025

zsilbi and others added 5 commits March 14, 2025 15:47

chore: update eslint config

2751bc1

perf(serialize): optimized string compare

9c68b1b

chore: apply automated updates

c56f25b

perf(serialize): change comparison order

e0c21b1

perf(serialize): revert to default string compare

d76fb98

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(serialize): optimized sorting algorithm#151

perf(serialize): optimized sorting algorithm#151
zsilbi wants to merge 19 commits intoperf/faster-comparefrom
perf/optimized-sort

zsilbi commented Mar 9, 2025 •

edited

Loading

Uh oh!

pi0 commented Mar 25, 2025

Uh oh!

zsilbi commented Mar 25, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

zsilbi commented Mar 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Compared to target: #144

Uh oh!

pi0 commented Mar 25, 2025

Uh oh!

zsilbi commented Mar 25, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

zsilbi commented Mar 9, 2025 •

edited

Loading