Skip to content

HolosStack/binaryfuse

Repository files navigation

Binary Fuse Filter

A fast and space-efficient Binary Fuse Filter implementation in Rust with Python bindings. Based on the paper "Binary Fuse Filters: Fast and Smaller Than Xor Filters" by Daniel Lemire et al.

Binary Fuse Filters are approximate set membership structures (like Bloom filters) but are smaller and faster for lookups. They are particularly effective when the set of keys is known in advance (static filters).

Features

  • High Performance: Implementation in Rust for maximum speed.
  • Space Efficient: Only ~1.15x overhead over the theoretical minimum (better than Bloom/Xor filters).
  • Python Bindings: Easy to use from Python with BinaryFuse8 (8-bit fingerprints, ~0.4% FPR) and BinaryFuse16 (16-bit fingerprints, ~0.0015% FPR).
  • Serialization: Full support for Python's pickle module.

Installation

pip install binaryfuse

Usage (Python)

from binary_fuse import BinaryFuse8

# List of integer keys
keys = [1, 2, 3, 100, 200, 300]

# Build the filter
bf = BinaryFuse8(keys)

# Check membership
print(1 in bf)    # True
print(101 in bf)  # False (most likely)

# Serialization
import pickle
data = pickle.dumps(bf)
bf2 = pickle.loads(data)
print(1 in bf2)   # True

Compatibility & Zero-Compile Installation

We ship abi3 wheels (Stable ABI), which means:

  • One binary works everywhere: A single wheel supports all Python versions from 3.8 to 3.13+.
  • No compilation needed: Works out-of-the-box on Linux (glibc/musl), macOS (Intel/Apple Silicon), and Windows.
  • Future-proof: The wheel you install today will likely work on Python 3.14+ without any updates.

Technical Details

Binary Fuse Filters use a fused segment layout where each key is mapped to three locations ($h_0, h_1, h_2$) in a linear array. The locations are chosen from overlapping segments, which improves the success rate of the construction algorithm compared to standard Xor filters.

This implementation uses:

  • Arity 3: Each key maps to 3 locations.
  • 32 Segments: Standard for modern Binary Fuse Filter implementations.
  • Hashing: Robust mixing of 64-bit hashes to ensure uniform distribution.

Comparative Results (100,000 keys)

Filter Build (s) Hit (s) Miss (s) FPR (%) Size (KB) Bits/Key
BinaryFuse8 (Ours) 0.0170 0.0029 0.0032 0.37% 131.0 10.7
BinaryFuse16 (Ours) 0.0132 0.0029 0.0030 0.00% 262.0 21.4
rbloom (Rust Bloom) 0.0053 0.0019 0.0026 1.00% 117.0 9.6
Fuse8 (Lemire C) 0.0195 0.0203 0.0206 0.40% 1056.0* N/A
pybloom-live (Py) 0.0915 0.0829 0.0602 0.99% 117.3 9.6

* Note: Some C-binding sizes (like Fuse8) are estimated via RSS delta and may be inaccurate.

Why use Binary Fuse Filter?

While standard Bloom filters (like rbloom) are slightly faster for lookups (~20ns vs ~30ns), Binary Fuse Filters are significantly more space-efficient for the same accuracy.

  • Accuracy vs Size: For roughly the same memory (~120-130KB), our BinaryFuse8 offers 0.37% False Positive Rate, while the Bloom filter has 1.00% error. You get 2.7x better accuracy for the same cost.
  • To match our accuracy: A Bloom filter would need ~11.5 bits/key, making it ~20% larger than our filter.
  • Immutability: Binary Fuse filters are static. If you need to add items incrementally, use a Bloom filter. If you have a static dataset (e.g., a daily blocklist, a dictionary), Binary Fuse is mathematically superior.

Thread Safety

Our implementation is thread-safe for concurrent lookups. Because the filter is immutable after construction, multiple Python threads can perform membership tests (key in bf) simultaneously. The underlying Rust logic does not require any write locks or synchronization, making it ideal for high-concurrency environments.

License

MIT