High-performance memory allocators competitive with mimalloc and tcmalloc
Public source release with Ubuntu/Linux, macOS, and Windows-native build and benchmark entrypoints.
Part of the hakorune project.
- hz3 (hakozuna): Local-heavy performance + minimal RSS footprint. Default for most workloads.
- hz4 (hakozuna-mt): Message-passing, remote-heavy scaling (best at high thread counts).
- Profile selection guide: PROFILE_GUIDE.md
- Ubuntu/Linux: public build and preload entrypoints under `linux/` (x86_64 and arm64 lanes)
- Windows native: public build and benchmark entrypoints under `win/`
- macOS: public build and preload entrypoints under `mac/` (separate Apple Silicon development lane)
- Windows guide: `docs/WINDOWS_BUILD.md`
- Windows public summaries: `docs/benchmarks/windows/`
./linux/build_linux_release_lane.sh
./linux/run_linux_preload_smoke.sh hz3 /bin/true
./linux/run_linux_preload_smoke.sh hz4 /bin/true
./linux/run_linux_bench_compare.sh

Notes:
- Ubuntu/Linux uses a single entrypoint layer for both `x86_64` and `arm64`.
- On Ubuntu arm64, use `./linux/build_linux_arm64_release_lane.sh` for the explicit lane wrapper.
- For benchmark compare runs, `./linux/run_linux_bench_compare.sh` prepares local `mimalloc`/`tcmalloc` caches and uses the CRT smoke binary.
- Record the CPU architecture in Linux benchmark summaries to keep lanes separate.
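One way to follow the last note is to stamp every summary with the machine architecture at the moment it is written. The sketch below is illustrative only: `lane_for_machine`, `summary_header`, and the header format are hypothetical helpers, not part of the repository's scripts.

```python
import platform
from typing import Optional

# Hypothetical mapping from `platform.machine()` values to the lane names
# used in this README. Linux reports 64-bit Arm as "aarch64".
_LANE_ALIASES = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def lane_for_machine(machine: str) -> str:
    """Return the README lane name for a raw machine string."""
    key = machine.strip().lower()
    if key not in _LANE_ALIASES:
        raise ValueError(f"unrecognized architecture: {machine!r}")
    return _LANE_ALIASES[key]


def summary_header(allocator: str, machine: Optional[str] = None) -> str:
    """Build a one-line header that keeps x86_64 and arm64 results apart."""
    if machine is None:
        machine = platform.machine()  # architecture of the current host
    return f"allocator={allocator} arch={lane_for_machine(machine)}"
```

Calling `summary_header("hz3")` on the benchmark host and writing the result as the first line of a summary is enough to tell the two lanes apart later.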
./mac/check_mac_env.sh
./mac/build_mac_release_lane.sh
./mac/run_mac_preload_smoke.sh hz3 /usr/bin/env true

powershell -ExecutionPolicy Bypass -File .\win\check_windows_env.ps1
powershell -ExecutionPolicy Bypass -File .\win\build_win_allocator_suite.ps1
powershell -ExecutionPolicy Bypass -File .\win\run_win_allocator_matrix.ps1

This repository already includes public Windows-native allocator comparisons and paper-aligned benchmark lanes.
- GitHub Issues are welcome for bugs, performance regressions, and compatibility trouble.
- Issue templates are available from the `New issue` page.
- For benchmark or integration reports, start from `docs/REPRO_REPORT_TEMPLATE.md`.
- Please include allocator, platform, commit or release, workload or lane, exact command, and median result when possible.
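As a rough illustration of the fields requested above, the following sketch collects them into a plain-text block and computes the median over repeated runs. The function name, the `key: value` layout, and the sample numbers are all assumptions made for this example; the actual layout is defined by `docs/REPRO_REPORT_TEMPLATE.md`.

```python
from statistics import median


def format_report(allocator, platform_name, revision, lane, command, runs):
    """Render the requested report fields as plain `key: value` lines."""
    if not runs:
        raise ValueError("at least one run is required")
    fields = [
        ("allocator", allocator),
        ("platform", platform_name),
        ("commit_or_release", revision),
        ("lane", lane),
        ("command", command),
        ("runs", len(runs)),
        ("median_ops_per_s", f"{median(runs):.1f}"),
    ]
    return "\n".join(f"{key}: {value}" for key, value in fields)


# Placeholder numbers for illustration only; these are not measured results.
sample_runs = [100.0, 104.0, 98.0, 102.0, 101.0]
report = format_report(
    allocator="hz3",
    platform_name="Ubuntu x86_64",
    revision="v3.3",
    lane="main_r0",
    command="./linux/run_linux_bench_compare.sh",
    runs=sample_runs,
)
print(report)
```

Reporting the median rather than the mean keeps a single noisy run from skewing the number.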
- ACE-Alloc Paper (English): `docs/paper/main_en.pdf`
- ACE-Alloc Paper (Japanese): `docs/paper/main_ja.pdf`
- Local paper workspace: `private/paper/`
- Public paper PDFs currently match the `v3.2` paper revision; `v3.3` is a source/artifact release focused on Linux arm64 coverage and ownership-routing bug fixes
- Latest archived Zenodo record (v3.3): https://zenodo.org/records/19139939
- DOI (v3.3): https://doi.org/10.5281/zenodo.19139939
- GitHub Releases: https://github.com/hakorune/hakozuna/releases
- Citation metadata: `CITATION.cff`
- Changelog: `CHANGELOG.md` (BREAKING changes are explicitly listed per release)
- GitHub Release body template: `docs/releases/GITHUB_RELEASE_v3.3.md`
Latest matrix (RUNS=10, MT lane x remote%) and redis-like (RUNS=10, memtier 15s) show a clear split:
- `hz3`: strongest in local-heavy and redis-like workloads.
- `hz4`: strongest in remote-heavy and high-thread cross workloads.
- Full benchmark log: `docs/benchmarks/2026-02-18_PAPER_BENCH_RESULTS.md`
| Lane | hz3 | hz4 | mimalloc | tcmalloc |
|---|---|---|---|---|
| `main_r0` | 375.4M | 137.4M | 224.2M | 232.7M |
| `main_r50` | 66.5M | 78.1M | 17.9M | 84.3M |
| `main_r90` | 62.6M | 67.6M | 13.0M | 54.9M |
| `guard_r0` | 376.4M | 266.7M | 310.0M | 372.0M |
| `cross128_r90` | 1.80M | 50.65M | 10.94M | 7.50M |
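To read the split directly off the matrix, a short script can rank the allocators in each lane. This is a minimal sketch using the figures from the table above; `best_per_lane` is a name introduced here for illustration and is not part of the benchmark tooling.

```python
# Throughput figures copied from the matrix table (M = millions, as printed).
RESULTS = {
    "main_r0": {"hz3": 375.4, "hz4": 137.4, "mimalloc": 224.2, "tcmalloc": 232.7},
    "main_r50": {"hz3": 66.5, "hz4": 78.1, "mimalloc": 17.9, "tcmalloc": 84.3},
    "main_r90": {"hz3": 62.6, "hz4": 67.6, "mimalloc": 13.0, "tcmalloc": 54.9},
    "guard_r0": {"hz3": 376.4, "hz4": 266.7, "mimalloc": 310.0, "tcmalloc": 372.0},
    "cross128_r90": {"hz3": 1.80, "hz4": 50.65, "mimalloc": 10.94, "tcmalloc": 7.50},
}


def best_per_lane(results):
    """Map each lane to (winner, ratio of the winner to the runner-up)."""
    summary = {}
    for lane, scores in results.items():
        ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
        (winner, top), (_, second) = ranked[0], ranked[1]
        summary[lane] = (winner, round(top / second, 2))
    return summary


for lane, (winner, ratio) in best_per_lane(RESULTS).items():
    print(f"{lane}: {winner} ({ratio}x the runner-up)")
```

The ratio column makes it easy to see which lanes are close calls and which are decided by a wide margin.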
Lane legend:
- `r0`/`r50`/`r90`: target remote-free ratio of `0%`, `50%`, and `90%`
- `main_*`: standard MT `random_mixed` lane at `T=16`, size range `16..32768`
- `guard_*`: small-only guard lane at `T=16`, size range `16..2048`, used to isolate small-object fixed cost
- `cross128_*`: harsher cross-thread lane at `T=16`, size range `16..131072`, used to stress mixed large-path and cross-thread behavior
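The naming scheme in the legend is regular enough to decode mechanically. The sketch below turns a lane name into its parameters; the `Lane` dataclass and `parse_lane` are hypothetical names used only for this example, while the thread count and size ranges are taken from the legend.

```python
import re
from dataclasses import dataclass

# Size ranges (bytes) and thread count as stated in the lane legend.
_FAMILIES = {
    "main": (16, 32768),
    "guard": (16, 2048),
    "cross128": (16, 131072),
}
_THREADS = 16
_LANE_RE = re.compile(r"^(?P<family>[a-z0-9]+)_r(?P<remote>\d+)$")


@dataclass(frozen=True)
class Lane:
    family: str
    remote_pct: int  # target remote-free ratio in percent
    threads: int
    min_size: int
    max_size: int


def parse_lane(name: str) -> Lane:
    """Decode a lane name such as `cross128_r90` into its parameters."""
    match = _LANE_RE.match(name)
    if match is None or match["family"] not in _FAMILIES:
        raise ValueError(f"unknown lane: {name!r}")
    remote = int(match["remote"])
    if not 0 <= remote <= 100:
        raise ValueError(f"remote ratio out of range in {name!r}")
    min_size, max_size = _FAMILIES[match["family"]]
    return Lane(match["family"], remote, _THREADS, min_size, max_size)
```

For example, `parse_lane("guard_r0")` yields a lane with `remote_pct=0` and `max_size=2048`.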
| Allocator | ops/s |
|---|---|
| hz3 | 571,199 |
| mimalloc | 568,740 |
| tcmalloc | 568,052 |
| hz4 | 560,576 |
- Default profile: `hz3` (`scale` lane).
- Remote-heavy / high-thread profile: `hz4`.
- `hz4` redis preload crash (`rc=139`) was fixed via `malloc_usable_size` interpose; redis-like rerun is now stable (`n_ok=10`).
- Linux Entrypoints
- Windows Build
- Mac Build
- Mac Entrypoints
- Repo Structure
- Windows Redis Matrix
- Windows Memcached Recovery
- Windows Memcached libevent
- Windows Memcached Native MSVC Plan
- Windows Memcached Shim
- Windows Memcached Minimal Main
- Build Flags Index
- Paper Notes
- Repro Report Template
- Profile Guide
- Safe Defaults
- Compatibility Notes
- Boundary concentration: Minimize hot path / control layer crossings
- Reversibility: All optimizations toggleable via compile-time flags
- Observability: SSOT (atexit one-shot stats) for reproducible profiling
- Fail-fast: Detect invalid states at boundaries, abort early
Some performance flags disable debug invariants:
- `HZ3_S97_REMOTE_STASH_SKIP_TAIL_NULL=1` is incompatible with `HZ3_LIST_FAILFAST`, `HZ3_CENTRAL_DEBUG`, `HZ3_XFER_DEBUG`
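A build wrapper could reject such combinations before compiling. The following is a minimal sketch under two assumptions: flags are available as a name-to-integer mapping, and a debug flag counts as enabled when its value is non-zero. `find_conflicts` is a hypothetical helper, not something the repository provides.

```python
# Incompatibility stated in the note above: the performance flag (key)
# cannot be combined with any of the listed debug flags (values).
INCOMPATIBLE = {
    "HZ3_S97_REMOTE_STASH_SKIP_TAIL_NULL": (
        "HZ3_LIST_FAILFAST",
        "HZ3_CENTRAL_DEBUG",
        "HZ3_XFER_DEBUG",
    ),
}


def find_conflicts(flags):
    """Return (perf_flag, debug_flag) pairs that are both enabled.

    `flags` maps flag names to integer values; a missing flag counts as 0.
    Treating any non-zero debug value as "enabled" is an assumption.
    """
    conflicts = []
    for perf_flag, debug_flags in INCOMPATIBLE.items():
        if flags.get(perf_flag, 0) != 1:
            continue  # performance flag is off, nothing to check
        for debug_flag in debug_flags:
            if flags.get(debug_flag, 0):
                conflicts.append((perf_flag, debug_flag))
    return conflicts
```

An empty result means the flag set is consistent with the note; any returned pair names the two flags that should not be enabled together.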
Apache License 2.0
Version: 2026.02.18 (release anchor: 3.0)