Benchmark Results: SQ uint8 vs Float32
Setup: 3x r6i.4xlarge (16 vCPU, 128GB RAM), JVM 16GB heap, 1 segment/shard (~45.4M docs each), total 136,362,605 docs (MSMarco V2 SPLADE).
Query: 3,903 queries, recall@10 against ground truth, top_n=3 (server-side token pruning).
Search: Recall & Latency (top_n=3)
| heap_factor | SQ uint8 recall | float32 recall | recall gap | SQ uint8 p50 | float32 p50 | latency delta |
|---|---|---|---|---|---|---|
| 1.03 | 0.8283 | 0.8413 | -1.3% | 10ms | 3ms | +7ms* |
| 1.05 | 0.8428 | 0.8559 | -1.3% | 2ms | 3ms | -1ms |
| 1.07 | 0.8552 | 0.8682 | -1.3% | 2ms | 3ms | -1ms |
| 1.08 | 0.8606 | 0.8736 | -1.3% | 2ms | 3ms | -1ms |
| 1.10 | 0.8700 | 0.8834 | -1.3% | 2ms | 3ms | -1ms |
| 1.15 | 0.8874 | 0.9036 | -1.6% | 2ms | 3ms | -1ms |
| 1.20 | 0.8998 | 0.9172 | -1.7% | 3ms | 3ms | same |
| 1.50 | 0.9262 | 0.9467 | -2.1% | 4ms | 4ms | same |
| 2.00 | 0.9311 | 0.9514 | -2.0% | 6ms | 7ms | -1ms |
*The SQ latency at hf=1.03 (heap_factor) is a warmup artifact: as the first point in the sweep, it pays the index loading cost.
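The recall gap column appears to be the absolute difference between the two recall columns, expressed in percentage points rather than as a relative change. A minimal sketch of both readings follows; the function names are hypothetical and not part of the benchmark harness.

```cpp
#include <cassert>
#include <cmath>

// Absolute recall gap in percentage points (how the table column appears to be computed).
double recall_gap_points(double sq_recall, double f32_recall) {
    assert(sq_recall >= 0.0 && sq_recall <= 1.0);
    assert(f32_recall >= 0.0 && f32_recall <= 1.0);
    return (sq_recall - f32_recall) * 100.0;
}

// Relative gap, as a percentage of the float32 recall (shown for comparison only).
double recall_gap_relative(double sq_recall, double f32_recall) {
    assert(f32_recall > 0.0);
    return (sq_recall - f32_recall) / f32_recall * 100.0;
}
```

For the hf=1.08 row, `recall_gap_points(0.8606, 0.8736)` gives about -1.3, which matches the table, while the relative gap is closer to -1.5%.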
Index Build Time (SQ uint8, optimized batched clustering)
| Phase | Duration | Notes |
|---|---|---|
| Lucene segment merge | 34 min | Segment I/O (3 shards in parallel) |
| Batch add (CSR construction) | 1 min | With `reserve()` pre-allocation (previously 10 min without pre-allocation) |
| K-means clustering (32 threads) | 15 min | Memory-aware batched clustering with per-batch inverted list construction + immediate free |
| Save to disk | 2 min | 32GB sequential write |
| Total | 51 min | |
Optimizations applied
- Batch add with `reserve()`: Pre-allocates CSR vectors storage based on estimated total NNZ from first batch. Reduces batch add from ~10 min to 1 min by eliminating repeated `std::vector` resizing and reallocation.
- Memory-aware batched clustering: Reads `/proc/meminfo` MemAvailable to determine batch count. Processes posting lists in batches that fit within available memory, freeing each batch's inverted lists immediately after clustering. Prevents glibc heap fragmentation that previously retained ~38GB unreturnable memory post-build.
- `release_build_memory()` after save: Explicitly releases `vectors_` and `clustered_inverted_lists` immediately after writing to disk (before `deleteIndex`), combined with `mallopt(M_MMAP_THRESHOLD, 128KB)` to force large allocations through mmap for individual reclamation.
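The first two optimizations can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: `CsrMatrix`, `reserve_from_first_batch`, `mem_available_bytes`, `batch_count`, the 5% slack, and the 50% usable-memory fraction are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical CSR container; the real index layout is not shown in the source.
struct CsrMatrix {
    std::vector<std::size_t> indptr{0};  // row offsets
    std::vector<std::uint32_t> indices;  // token ids
    std::vector<float> values;           // token weights
};

// Extrapolate total NNZ from the first batch and reserve once up front,
// so later appends do not trigger repeated std::vector reallocation.
void reserve_from_first_batch(CsrMatrix& m, std::size_t first_batch_rows,
                              std::size_t first_batch_nnz, std::size_t total_rows) {
    assert(first_batch_rows > 0);
    const double nnz_per_row =
        static_cast<double>(first_batch_nnz) / static_cast<double>(first_batch_rows);
    // 5% slack is an assumed safety margin, not a value from the source.
    const auto estimated_nnz =
        static_cast<std::size_t>(nnz_per_row * static_cast<double>(total_rows) * 1.05);
    m.indptr.reserve(total_rows + 1);
    m.indices.reserve(estimated_nnz);
    m.values.reserve(estimated_nnz);
}

// Parse the "MemAvailable:" line from a /proc/meminfo-style stream.
// Returns bytes, or 0 if the field is missing.
std::uint64_t mem_available_bytes(std::istream& meminfo) {
    const std::string key = "MemAvailable:";
    std::string line;
    while (std::getline(meminfo, line)) {
        if (line.rfind(key, 0) == 0) {  // line starts with the key
            std::istringstream fields(line.substr(key.size()));
            std::uint64_t kib = 0;
            fields >> kib;  // the kernel reports this value in kB
            return kib * 1024;
        }
    }
    return 0;
}

// Number of batches so that each batch's working set fits within an assumed
// fraction of available memory. Falls back to one batch if the budget is zero.
std::size_t batch_count(std::uint64_t working_set_bytes, std::uint64_t available_bytes,
                        double usable_fraction = 0.5) {
    const auto budget =
        static_cast<std::uint64_t>(static_cast<double>(available_bytes) * usable_fraction);
    if (budget == 0) return 1;
    const std::uint64_t batches = (working_set_bytes + budget - 1) / budget;  // ceiling division
    return static_cast<std::size_t>(std::max<std::uint64_t>(1, batches));
}
```

In a real build the stream would be an `std::ifstream` opened on `/proc/meminfo`; taking an `std::istream&` keeps the parser testable with an in-memory string.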
Resource Usage
| Metric | SQ uint8 |
|---|---|
| .nsparse file size per shard | 32 GB |
| Post-load RSS (search steady state) | 49.5 GB |
| Peak RSS during build | 103.8 GB |
Build Parameters
- SQ uint8: `idmap,seismic_sq,quantizer=8bit|vmin=0.0|vmax=4.0, quantization_ceiling_search=4.0`
- Float32: `idmap,seismic`
- Both: lambda=22,724 (auto), beta=2,272 (auto, 0.1×lambda), alpha=0.4, OMP_THREADS=32
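For readers unfamiliar with the `quantizer=8bit|vmin=0.0|vmax=4.0` setting, here is a minimal sketch of a uniform 8-bit scalar quantizer over a fixed range. It assumes linear spacing with clamping and round-to-nearest; the actual `seismic_sq` encoding may differ, and `ScalarQuantizer8` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical uniform 8-bit scalar quantizer over [vmin, vmax].
struct ScalarQuantizer8 {
    float vmin = 0.0f;
    float vmax = 4.0f;

    // Map a weight to a code in 0..255, clamping values outside [vmin, vmax].
    std::uint8_t encode(float w) const {
        assert(vmax > vmin);
        const float clamped = std::min(std::max(w, vmin), vmax);
        const float scaled = (clamped - vmin) / (vmax - vmin) * 255.0f;
        return static_cast<std::uint8_t>(std::lround(scaled));
    }

    // Map a code back to its reconstruction value in [vmin, vmax].
    float decode(std::uint8_t code) const {
        return vmin + static_cast<float>(code) / 255.0f * (vmax - vmin);
    }
};
```

Under these assumptions the step size is (vmax - vmin) / 255, about 0.0157 for the range above, so the round-trip error for in-range weights is at most half a step. Weights above `vmax` saturate at code 255.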
Summary
SQ uint8 trades about 1.3 points of recall@10 at hf ≤ 1.10 (up to 2.1 points at hf=1.50) for:
- ~33% lower p50 search latency (2ms vs 3ms at hf=1.08)
- ~48% less disk/RAM (32GB vs 62GB per shard)
Recommended operating points:
- hf=1.08: 86% recall @ 2ms p50 (best latency)
- hf=1.15: 89% recall @ 2ms p50 (best recall/latency balance)