hakorune/hakozuna

hakozuna (hz3) / hakozuna-mt (hz4)

High-performance memory allocators competitive with mimalloc and tcmalloc

Public source release with Ubuntu/Linux, macOS, and Windows-native build and benchmark entrypoints.

Part of the hakorune project.


Variants

  • hz3 (hakozuna): Local-heavy performance + minimal RSS footprint. Default for most workloads.
  • hz4 (hakozuna-mt): Message-passing, remote-heavy scaling (best at high thread counts).
  • Profile selection guide: PROFILE_GUIDE.md

Platform Support

  • Ubuntu/Linux: public build and preload entrypoints under linux/ (x86_64 and arm64 lanes)
  • Windows native: public build and benchmark entrypoints under win/
  • macOS: public build and preload entrypoints under mac/ (separate Apple Silicon development lane)
  • Windows guide: docs/WINDOWS_BUILD.md
  • Windows public summaries: docs/benchmarks/windows/

Quick Start

Ubuntu/Linux

./linux/build_linux_release_lane.sh
./linux/run_linux_preload_smoke.sh hz3 /bin/true
./linux/run_linux_preload_smoke.sh hz4 /bin/true
./linux/run_linux_bench_compare.sh

Notes:

  • Ubuntu/Linux uses a single entrypoint layer for both x86_64 and arm64.
  • On Ubuntu arm64, use ./linux/build_linux_arm64_release_lane.sh for the explicit lane wrapper.
  • For benchmark compare runs, ./linux/run_linux_bench_compare.sh prepares local mimalloc / tcmalloc caches and uses the CRT smoke binary.
  • Record the CPU architecture in Linux benchmark summaries to keep lanes separate.

macOS

./mac/check_mac_env.sh
./mac/build_mac_release_lane.sh
./mac/run_mac_preload_smoke.sh hz3 /usr/bin/env true

Windows native

powershell -ExecutionPolicy Bypass -File .\win\check_windows_env.ps1
powershell -ExecutionPolicy Bypass -File .\win\build_win_allocator_suite.ps1
powershell -ExecutionPolicy Bypass -File .\win\run_win_allocator_matrix.ps1

This repository already includes public Windows-native allocator comparisons and paper-aligned benchmark lanes.

Feedback / Repro Reports

  • GitHub Issues are welcome for bugs, performance regressions, and compatibility trouble.
  • Issue templates are available from the New issue page.
  • For benchmark or integration reports, start from docs/REPRO_REPORT_TEMPLATE.md.
  • Please include allocator, platform, commit or release, workload or lane, exact command, and median result when possible.
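A minimal sketch of how a report's median could be computed from repeated runs. The helper name `summarize_runs` and the sample values are illustrative assumptions, not part of the repository's tooling; `n_ok` simply mirrors the run-count notation used elsewhere in this README.

```python
from statistics import median

def summarize_runs(ops_per_sec):
    """Return the run count and median throughput for a list of per-run results."""
    if not ops_per_sec:
        raise ValueError("need at least one run")
    return {"n_ok": len(ops_per_sec), "median_ops_s": median(ops_per_sec)}

# Hypothetical per-run throughput (ops/s) from repeating one exact command.
# These numbers are made up for illustration; they are not measured data.
runs = [101.0, 98.5, 103.2, 99.9, 100.4]

summary = summarize_runs(runs)
print(f"n_ok={summary['n_ok']} median={summary['median_ops_s']:.1f} ops/s")
```

Reporting the median rather than the mean keeps a single outlier run from skewing the number.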

Paper / Artifacts

  • ACE-Alloc Paper (English): docs/paper/main_en.pdf
  • ACE-Alloc Paper (Japanese): docs/paper/main_ja.pdf
  • Local paper workspace: private/paper/
  • Public paper PDFs currently match the v3.2 paper revision; v3.3 is a source/artifact release focused on Linux arm64 coverage and ownership-routing bug fixes
  • Latest archived Zenodo record (v3.3): https://zenodo.org/records/19139939
  • DOI (v3.3): https://doi.org/10.5281/zenodo.19139939
  • GitHub Releases: https://github.com/hakorune/hakozuna/releases
  • Citation metadata: CITATION.cff
  • Changelog: CHANGELOG.md (BREAKING changes are explicitly listed per release)
  • GitHub Release body template: docs/releases/GITHUB_RELEASE_v3.3.md

Benchmark Snapshot (2026-02-18, Ubuntu native)

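The latest MT matrix (RUNS=10, lane x remote%) and redis-like run (RUNS=10, memtier 15s) show a clear split:

  • hz3: strongest in the local-only lanes (main_r0, guard_r0) and narrowly ahead in the redis-like workload, where all four allocators land within about 2% of each other.
  • hz4: strongest in the 90%-remote lanes (main_r90, cross128_r90); at main_r50 it beats hz3 but trails tcmalloc.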
  • Full benchmark log: docs/benchmarks/2026-02-18_PAPER_BENCH_RESULTS.md

MT lane x remote% (median ops/s)

Lane           hz3       hz4       mimalloc  tcmalloc
main_r0        375.4M    137.4M    224.2M    232.7M
main_r50       66.5M     78.1M     17.9M     84.3M
main_r90       62.6M     67.6M     13.0M     54.9M
guard_r0       376.4M    266.7M    310.0M    372.0M
cross128_r90   1.80M     50.65M    10.94M    7.50M
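
As a quick cross-check of the split, the sketch below picks the fastest allocator per lane from the medians in the table above. The values are transcribed from that table (in millions of ops/s); the names `results` and `lane_winner` are illustrative only.

```python
# Median ops/s in millions, transcribed from the MT lane x remote% table.
results = {
    "main_r0":      {"hz3": 375.4, "hz4": 137.4, "mimalloc": 224.2, "tcmalloc": 232.7},
    "main_r50":     {"hz3": 66.5,  "hz4": 78.1,  "mimalloc": 17.9,  "tcmalloc": 84.3},
    "main_r90":     {"hz3": 62.6,  "hz4": 67.6,  "mimalloc": 13.0,  "tcmalloc": 54.9},
    "guard_r0":     {"hz3": 376.4, "hz4": 266.7, "mimalloc": 310.0, "tcmalloc": 372.0},
    "cross128_r90": {"hz3": 1.80,  "hz4": 50.65, "mimalloc": 10.94, "tcmalloc": 7.50},
}

def lane_winner(row):
    """Return the allocator with the highest median throughput in one lane."""
    return max(row, key=row.get)

winners = {lane: lane_winner(row) for lane, row in results.items()}
for lane, name in winners.items():
    print(f"{lane:13s} -> {name}")

# How far apart hz4 and hz3 are in the harshest cross-thread lane.
cross_ratio = results["cross128_r90"]["hz4"] / results["cross128_r90"]["hz3"]
print(f"cross128_r90 hz4/hz3 = {cross_ratio:.1f}x")
```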

Lane legend:

  • r0 / r50 / r90: target remote-free ratio of 0%, 50%, and 90%
  • main_*: standard MT random_mixed lane at T=16, size range 16..32768
  • guard_*: small-only guard lane at T=16, size range 16..2048, used to isolate small-object fixed cost
  • cross128_*: harsher cross-thread lane at T=16, size range 16..131072, used to stress mixed large-path and cross-thread behavior
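
The lane naming scheme in the legend can be read mechanically. The following sketch decodes a lane name into its parameters; the table of families is transcribed from the legend, while `LANE_FAMILIES` and `parse_lane` are hypothetical names and not part of the benchmark scripts.

```python
import re

# Lane families and their parameters, transcribed from the legend above.
LANE_FAMILIES = {
    "main":     {"threads": 16, "min_size": 16, "max_size": 32768},
    "guard":    {"threads": 16, "min_size": 16, "max_size": 2048},
    "cross128": {"threads": 16, "min_size": 16, "max_size": 131072},
}

def parse_lane(name):
    """Split a lane name like 'main_r50' into family parameters and remote-free %."""
    match = re.fullmatch(r"([a-z0-9]+)_r(\d+)", name)
    if match is None or match.group(1) not in LANE_FAMILIES:
        raise ValueError(f"unknown lane: {name!r}")
    params = dict(LANE_FAMILIES[match.group(1)])  # copy so callers cannot mutate the table
    params["remote_pct"] = int(match.group(2))
    return params

print(parse_lane("cross128_r90"))
```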

Redis-like (median ops/s, RUNS=10)

Allocator   ops/s
hz3         571,199
mimalloc    568,740
tcmalloc    568,052
hz4         560,576

Practical profile guidance

  • Default profile: hz3 (scale lane).
  • Remote-heavy / high-thread profile: hz4.
  • hz4 redis preload crash (rc=139) was fixed via malloc_usable_size interpose; redis-like rerun is now stable (n_ok=10).
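
For readers triaging similar crashes: shells conventionally report a process killed by signal N as exit status 128 + N. The sketch below decodes such a status; `describe_exit_code` is an illustrative helper, not something shipped in this repository.

```python
import signal

def describe_exit_code(rc):
    """Map a shell-style exit status to a signal name when rc is above 128."""
    if rc > 128:
        try:
            return signal.Signals(rc - 128).name
        except ValueError:
            # Not a signal number known on this platform.
            return f"signal {rc - 128}"
    return f"exit {rc}"

print(describe_exit_code(139))
```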

Documentation

Design Principles (Box Theory)

  1. Boundary concentration: Minimize hot path / control layer crossings
  2. Reversibility: All optimizations toggleable via compile-time flags
  3. Observability: SSOT (atexit one-shot stats) for reproducible profiling
  4. Fail-fast: Detect invalid states at boundaries, abort early
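
To make principles 3 and 4 concrete, here is a language-neutral illustration in Python. It is only a sketch of the ideas: the class and function names are invented for this example and do not correspond to the allocator's implementation, which would abort the process rather than raise an exception.

```python
import atexit
import sys

class OneShotStats:
    """Counters that are dumped exactly once, e.g. from an exit hook."""

    def __init__(self, out=None):
        self.counters = {}
        self._dumped = False
        self._out = out

    def bump(self, name, n=1):
        self.counters[name] = self.counters.get(name, 0) + n

    def dump_once(self):
        """Print all counters on the first call; later calls do nothing."""
        if self._dumped:
            return False
        self._dumped = True
        line = " ".join(f"{k}={v}" for k, v in sorted(self.counters.items()))
        print(f"[stats] {line}", file=self._out or sys.stderr)
        return True

def check_boundary(size, max_size):
    """Fail fast: reject an invalid request at the boundary, before any work."""
    if not (0 < size <= max_size):
        raise ValueError(f"invalid size {size} (max {max_size})")
    return size

stats = OneShotStats()
atexit.register(stats.dump_once)  # one summary line when the process exits

stats.bump("alloc")
stats.bump("free")
check_boundary(64, 32768)
```

Keeping the dump in a single exit hook means every run emits one comparable summary line, which is what makes profiles reproducible across runs.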

Safety Notes

Some performance flags disable debug invariants and cannot be combined with the debug options that rely on them:

  • HZ3_S97_REMOTE_STASH_SKIP_TAIL_NULL=1 is incompatible with HZ3_LIST_FAILFAST, HZ3_CENTRAL_DEBUG, HZ3_XFER_DEBUG
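
A build wrapper could guard against this combination before compiling. The sketch below is a hypothetical pre-check: the flag names come from the note above, but `INCOMPATIBLE`, `find_conflicts`, and the idea of passing flags as a dictionary are assumptions for illustration, not the repository's build logic.

```python
# Conflict from the safety note: the performance flag (key) must not be
# enabled together with any of the listed debug flags (values).
INCOMPATIBLE = {
    "HZ3_S97_REMOTE_STASH_SKIP_TAIL_NULL": (
        "HZ3_LIST_FAILFAST",
        "HZ3_CENTRAL_DEBUG",
        "HZ3_XFER_DEBUG",
    ),
}

def find_conflicts(flags):
    """Return (perf_flag, debug_flag) pairs that are both enabled in `flags`."""
    conflicts = []
    for perf, debug_flags in INCOMPATIBLE.items():
        if flags.get(perf, 0):
            conflicts.extend((perf, d) for d in debug_flags if flags.get(d, 0))
    return conflicts

# Hypothetical flag set for one build.
build_flags = {"HZ3_S97_REMOTE_STASH_SKIP_TAIL_NULL": 1, "HZ3_LIST_FAILFAST": 1}
for perf, debug in find_conflicts(build_flags):
    print(f"conflict: {perf}=1 cannot be combined with {debug}")
```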

License

Apache License 2.0


Version: 2026.02.18 (release anchor: 3.0)

About

Box Theory-based allocator for multi-threaded apps. 3-layer design: fast TLS cache, PTAG32 O(1) ptr->bin lookup, MPSC remote-free queues. Competitive with mimalloc/tcmalloc. Emphasizes boundary concentration, reversible optimizations, and fail-fast checks. LD_PRELOAD-ready with multiple lanes.
