Skip to content

perf(serialize): optimized sorting algorithm#151

Open
zsilbi wants to merge 19 commits intoperf/faster-comparefrom
perf/optimized-sort
Open

perf(serialize): optimized sorting algorithm#151
zsilbi wants to merge 19 commits intoperf/faster-comparefrom
perf/optimized-sort

Conversation

@zsilbi
Copy link
Copy Markdown
Collaborator

@zsilbi zsilbi commented Mar 9, 2025

For smaller object keys and Setvalues with less than ~10 elements (which is quite common) we can use a simple insertion sort.
Otherwise we fall back to the native sort().

Compared to target: #144

  test/benchmarks.bench.ts > benchmarks > serialize - presets > count:1, size:small 2190ms
     name                       hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · ohash @ d7f7f9d  2,734,883.45  0.0003  3.0094  0.0004  0.0003  0.0007  0.0008  0.0090  ±1.55%  1367442
   · ohash @ dev      4,263,019.50  0.0002  0.2779  0.0002  0.0002  0.0004  0.0005  0.0006  ±0.21%  2131510   fastest

  test/benchmarks.bench.ts > benchmarks > serialize - presets > count:256, size:small 1210ms
     name                    hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · ohash @ d7f7f9d  13,446.55  0.0647  0.4242  0.0744  0.0702  0.2418  0.2760  0.3444  ±0.89%     6727
   · ohash @ dev      18,352.54  0.0480  0.4818  0.0545  0.0513  0.2242  0.2515  0.3314  ±0.93%     9177   fastest

 BENCH  Summary

  ohash @ dev - test/benchmarks.bench.ts > benchmarks > serialize - presets > count:1, size:small
    1.56x faster than ohash @ d7f7f9d

  ohash @ dev - test/benchmarks.bench.ts > benchmarks > serialize - presets > count:256, size:small
    1.36x faster than ohash @ d7f7f9d
clk: ~5.31 GHz
cpu: AMD Ryzen 9 7950X 16-Core Processor
runtime: bun 1.2.5 (x64-linux)

benchmark                   avg (min … max) p75 / p99    (min … top 1%)
------------------------------------------- -------------------------------
• count:1, size:small
------------------------------------------- -------------------------------
ohash @ d7f7f9d              207.86 ns/iter 199.94 ns █▇
                    (190.37 ns … 560.65 ns) 427.13 ns ██
                    (  0.00  b … 396.00  b)   4.83  b ██▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

ohash @ dev                  170.54 ns/iter 160.53 ns █▇
                    (149.46 ns … 601.63 ns) 415.21 ns ██
                    (  0.00  b … 726.00  b)  11.54  b ██▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

summary
  ohash @ dev
   1.22x faster than ohash @ d7f7f9d

• count:256, size:small
------------------------------------------- -------------------------------
ohash @ d7f7f9d               66.88 µs/iter  59.40 µs  
                       (50.93 µs … 2.02 ms) 141.50 µs  
                    (  0.00  b … 528.00 kb)  13.95 kb ▃█▅▂▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

ohash @ dev                   49.34 µs/iter  49.51 µs         
                      (48.01 µs … 51.76 µs)  50.43 µs      █▅█ ▅▅      
                    (619.00  b …  13.08 kb)   2.48 kb █▁▁█▁▁▁███▁██▁█▁▁▁▁▁█

summary
  ohash @ dev
   1.36x faster than ohash @ d7f7f9d

@zsilbi zsilbi changed the title perf(serialize): optimized sorting for small objects perf(serialize): optimized sorting for small objects and arrays Mar 9, 2025
@zsilbi zsilbi marked this pull request as ready for review March 10, 2025 10:50
@zsilbi zsilbi force-pushed the perf/optimized-sort branch from 74c561b to 552c05b Compare March 11, 2025 16:56
@zsilbi zsilbi changed the title perf(serialize): optimized sorting for small objects and arrays perf(serialize): optimized sorting algorithm Mar 14, 2025
@pi0
Copy link
Copy Markdown
Member

pi0 commented Mar 25, 2025

Do you recall if we have the same trade-off in v8 based runtimes / Node.js specifically?

@zsilbi
Copy link
Copy Markdown
Collaborator Author

zsilbi commented Mar 25, 2025

Do you recall if we have the same trade-off in v8 based runtimes / Node.js specifically?

I ran this again on node with the latest benchmark from #142 (this includes est. instruction counts too)
Also tested v20, and the difference is the same.

This is small object with 5 keys in random order.

clk: ~5.25 GHz
cpu: AMD Ryzen 9 7950X 16-Core Processor
runtime: node 23.9.0 (x64-linux)

benchmark                   avg (min … max) p75 / p99    (min … top 1%)
------------------------------------------- -------------------------------
• count:1, size:small
------------------------------------------- -------------------------------
ohash @ d7f7f9d              260.16 ns/iter 260.28 ns  ▃█
                    (239.77 ns … 686.80 ns) 434.16 ns ▆██
                    (  1.78 kb …   2.09 kb)   1.84 kb ███▅▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
                   4.16 ipc ( 95.84% cache)    0.64 branch misses
          1.37k cycles   5.69k instructions   73.30 c-refs    3.05 c-misses

ohash @ dev                  193.24 ns/iter 200.30 ns   
                    (181.85 ns … 286.39 ns) 227.37 ns  ▅██
                    (787.89  b …   1.29 kb) 992.75  b ▃███▆▄▃▃▆▇▆▅▃▂▂▁▂▁▁▁▁
                   4.41 ipc ( 96.45% cache)    0.18 branch misses
          1.02k cycles   4.48k instructions   41.53 c-refs    1.48 c-misses

summary
  ohash @ dev
   1.35x faster than ohash @ d7f7f9d

• count:256, size:small
------------------------------------------- -------------------------------
ohash @ d7f7f9d               74.17 µs/iter  69.04 µs █▆
                     (63.98 µs … 459.12 µs) 215.41 µs ██
                    ( 14.24 kb … 779.95 kb) 461.35 kb ██▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
                   3.92 ipc ( 95.74% cache)  526.21 branch misses
        390.40k cycles   1.53M instructions  22.40k c-refs  953.59 c-misses

ohash @ dev                   57.23 µs/iter  54.75 µs 
                     (50.63 µs … 377.78 µs) 217.18 µs 
                    ( 13.18 kb … 450.00 kb) 239.15 kb █▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
                   4.17 ipc ( 96.48% cache)  367.38 branch misses
        299.92k cycles   1.25M instructions  13.98k c-refs  492.69 c-misses

summary
  ohash @ dev
   1.3x faster than ohash @ d7f7f9d

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants