A physics-inspired parallel sorting algorithm for 32-bit unsigned integers that adapts its bucket spacing to the input distribution in a single O(n) pass.
Paper: adaptive_gravity_sort_paper.tex (compile with pdflatex)
Integer values are treated as particles dropped into a viscous fluid. Under Stokes drag, heavier particles (larger values) settle lower. Bucket boundaries placed at equal time-slices of fall time correspond to a power-law spacing of the value range, parameterised by a single exponent λ estimated from the arithmetic mean of the input:
λ̂ = 2(μ − x_min) / (x_max − x_min)
- λ̂ ≈ 1 → linear spacing (recovers Flash Sort for uniform data)
- λ̂ ≈ 0 → logarithmic spacing (optimal for power-law data)
- λ̂ ≈ 0.5 → quadratic spacing (Stokes drag model, optimal for clustered/bimodal)
No prior knowledge of the distribution is needed — the mean encodes it.
| Distribution | std::sort (ms) | AGS speedup |
|---|---|---|
| Pipe-organ | 23.6 | 9.16× |
| Bimodal | 29.9 | 9.07× |
| Clustered (32 groups) | 22.9 | 7.74× |
| Small range [0, 65535] | 28.6 | 5.69× |
| Uniform random | 29.8 | 5.05× |
| Power-law | 30.1 | 4.77× |
Geometric mean across all 7 disordered distributions: 6.07× over std::sort.
AGS is slower than std::sort on pre-sorted and reverse-sorted inputs (structural limitation of all distribution sorts — see paper for details).
GPU-AGS achieves 75–343× over CPU std::sort at n = 500K–50M. However, CUB DeviceRadixSort is 4–10× faster than GPU-AGS on GPU. The cache-pressure advantage of AGS over radix sort on CPU does not transfer to GPU, where large L2 cache eliminates the bottleneck. See Section 8 of the paper for the full analysis.
CPU benchmark:
g++ -std=c++17 -O3 -march=native -o gravity_bench gravity_sort_benchmark.cpp
./gravity_bench 500000 --reps 5 --warmup 1
./gravity_bench --fuzz # correctness: 713/713 expectedGPU benchmark (requires CUDA 13+ and NVIDIA GPU):
nvcc -O3 -arch=sm_89 -std=c++17 gpu_sort_bench.cu -o gpu_bench
./gpu_bench # outputs JSON resultsStreamlit visualisation:
pip install streamlit
streamlit run gravity_sort_app.py| File | Description |
|---|---|
gravity_sort_benchmark.cpp |
All CPU algorithms + benchmark harness (C++17, no dependencies) |
gpu_sort_bench.cu |
GPU benchmark: thrust::sort, CUB radix, GPU-AGS |
gpu_bench_results.json |
Raw results from L40S benchmark run |
adaptive_gravity_sort_paper.tex |
Full paper (LaTeX) |
gravity_sort_app.py |
Streamlit visualisation app |
ideas.md |
Development log: physics analogies tried, rejected, and integrated |
If you use this code, algorithm, or paper in any work — including research, software, products, or presentations — you must cite:
@techreport{newmannorlund2025gravitysort,
author = {Newman-Norlund, Roger},
title = {Adaptive Gravity Sort: A Physics-Inspired Distribution Sort
with Automatic Bucket-Spacing from the Arithmetic Mean},
institution = {Independent Research},
year = {2025},
url = {https://github.com/rnorlund/adaptive-gravity-sort}
}See CITATION.cff for additional citation formats. Use of this work without attribution violates the license terms.
Academic Attribution License — free to use, modify, and distribute for any purpose, with mandatory attribution and citation. See LICENSE for full terms.
This algorithm was developed over approximately one week, iterating through several physics analogies before finding the viscous fluid model:
- Particle collisions → pairwise comparisons = bubble sort (rejected)
- Brownian motion → randomised pivot = known quicksort trick (rejected)
- Centrifuge → bit rotation breaks sort order, reduces to radix sort (rejected)
- Magnetic deflection → Lorentz force is 2D, not mappable to 1D ordering (rejected)
- Osmosis → O(n) per recursion level = O(n log n), no improvement (rejected)
- Shock waves → histogram overhead, bimodal regression 9.5→2.8× (rejected)
- Isotope separation → entropy-optimal bit ≠ arithmetic sort order, produces wrong output (rejected)
- Viscous fluid settling → monotone, closed-form, distribution-aware ✓
The full exploration is documented in ideas.md.