A fast and space-efficient Binary Fuse Filter implementation in Rust with Python bindings. Based on the paper "Binary Fuse Filters: Fast and Smaller Than Xor Filters" by Daniel Lemire et al.
Binary Fuse Filters are approximate set membership structures (like Bloom filters) but are smaller and faster for lookups. They are particularly effective when the set of keys is known in advance (static filters).
- High Performance: Implementation in Rust for maximum speed.
- Space Efficient: Only ~1.15x overhead over the theoretical minimum (better than Bloom/Xor filters).
- Python Bindings: Easy to use from Python with
BinaryFuse8(8-bit fingerprints, ~0.4% FPR) andBinaryFuse16(16-bit fingerprints, ~0.0015% FPR). - Serialization: Full support for Python's
picklemodule.
pip install binaryfusefrom binary_fuse import BinaryFuse8
# List of integer keys
keys = [1, 2, 3, 100, 200, 300]
# Build the filter
bf = BinaryFuse8(keys)
# Check membership
print(1 in bf) # True
print(101 in bf) # False (most likely)
# Serialization
import pickle
data = pickle.dumps(bf)
bf2 = pickle.loads(data)
print(1 in bf2) # TrueWe ship abi3 wheels (Stable ABI), which means:
- One binary works everywhere: A single wheel supports all Python versions from 3.8 to 3.13+.
- No compilation needed: Works out-of-the-box on Linux (glibc/musl), macOS (Intel/Apple Silicon), and Windows.
- Future-proof: The wheel you install today will likely work on Python 3.14+ without any updates.
Binary Fuse Filters use a fused segment layout where each key is mapped to three locations (
This implementation uses:
- Arity 3: Each key maps to 3 locations.
- 32 Segments: Standard for modern Binary Fuse Filter implementations.
- Hashing: Robust mixing of 64-bit hashes to ensure uniform distribution.
| Filter | Build (s) | Hit (s) | Miss (s) | FPR (%) | Size (KB) | Bits/Key |
|---|---|---|---|---|---|---|
| BinaryFuse8 (Ours) | 0.0170 | 0.0029 | 0.0032 | 0.37% | 131.0 | 10.7 |
| BinaryFuse16 (Ours) | 0.0132 | 0.0029 | 0.0030 | 0.00% | 262.0 | 21.4 |
| rbloom (Rust Bloom) | 0.0053 | 0.0019 | 0.0026 | 1.00% | 117.0 | 9.6 |
| Fuse8 (Lemire C) | 0.0195 | 0.0203 | 0.0206 | 0.40% | 1056.0* | N/A |
| pybloom-live (Py) | 0.0915 | 0.0829 | 0.0602 | 0.99% | 117.3 | 9.6 |
* Note: Some C-binding sizes (like Fuse8) are estimated via RSS delta and may be inaccurate.
While standard Bloom filters (like rbloom) are slightly faster for lookups (~20ns vs ~30ns), Binary Fuse Filters are significantly more space-efficient for the same accuracy.
- Accuracy vs Size: For roughly the same memory (~120-130KB), our
BinaryFuse8offers 0.37% False Positive Rate, while the Bloom filter has 1.00% error. You get 2.7x better accuracy for the same cost. - To match our accuracy: A Bloom filter would need ~11.5 bits/key, making it ~20% larger than our filter.
- Immutability: Binary Fuse filters are static. If you need to add items incrementally, use a Bloom filter. If you have a static dataset (e.g., a daily blocklist, a dictionary), Binary Fuse is mathematically superior.
Our implementation is thread-safe for concurrent lookups.
Because the filter is immutable after construction, multiple Python threads can perform membership tests (key in bf) simultaneously. The underlying Rust logic does not require any write locks or synchronization, making it ideal for high-concurrency environments.
MIT