Benchmark: SQ uint8 vs float32 on 138M docs (MSMarco V2 SPLADE)

## Benchmark Results: SQ uint8 vs Float32

**Setup**: 3x r6i.4xlarge (16 vCPU, 128GB RAM), JVM 16GB heap, 1 segment/shard (~45.4M docs each), total 136,362,605 docs (MSMarco V2 SPLADE).
**Query**: 3,903 queries, recall@10 against ground truth, top_n=3 (server-side token pruning).
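
The issue does not spell out how recall@10 is computed. The sketch below assumes the usual definition: the fraction of the exact (ground-truth) top-k document ids that also appear in the approximate top-k, averaged over all queries. The function names are illustrative and are not taken from the benchmark harness.

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of the ground-truth top-k ids found in the retrieved top-k."""
    truth = set(relevant[:k])
    if not truth:
        return 0.0
    hits = len(set(retrieved[:k]) & truth)
    return hits / len(truth)


def mean_recall(results, ground_truth, k=10):
    """Average recall@k over all queries (lists are aligned by query)."""
    scores = [recall_at_k(r, g, k) for r, g in zip(results, ground_truth)]
    return sum(scores) / len(scores)


# Toy data: two queries, k=4 for brevity (the benchmark uses k=10).
retrieved = [[1, 2, 3, 4], [9, 8, 7, 6]]
truth = [[1, 2, 5, 6], [9, 8, 7, 6]]
print(mean_recall(retrieved, truth, k=4))
```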

### Search: Recall & Latency (top_n=3)

| heap_factor | SQ uint8 recall | float32 recall | recall gap (absolute) | SQ uint8 p50 | float32 p50 | latency delta |
|---|---|---|---|---|---|---|
| 1.03 | 0.8283 | 0.8413 | -1.3% | 10ms | 3ms | +7ms* |
| 1.05 | 0.8428 | 0.8559 | -1.3% | 2ms | 3ms | -1ms |
| 1.07 | 0.8552 | 0.8682 | -1.3% | 2ms | 3ms | -1ms |
| **1.08** | **0.8606** | **0.8736** | **-1.3%** | **2ms** | **3ms** | **-1ms** |
| 1.10 | 0.8700 | 0.8834 | -1.3% | 2ms | 3ms | -1ms |
| 1.15 | 0.8874 | 0.9036 | -1.6% | 2ms | 3ms | -1ms |
| 1.20 | 0.8998 | 0.9172 | -1.7% | 3ms | 3ms | same |
| 1.50 | 0.9262 | 0.9467 | -2.1% | 4ms | 4ms | same |
| 2.00 | 0.9311 | 0.9514 | -2.0% | 6ms | 7ms | -1ms |

*The SQ uint8 p50 of 10ms at hf=1.03 is a warmup artifact: as the first point in the sweep, it pays the index loading cost.
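
The recall gap column is an absolute difference in recall (percentage points), not a relative change. A minimal sketch that recomputes the gap and latency delta from the recall and p50 values in the table; the helper names are illustrative, and the relative gap is shown only for comparison.

```python
# (heap_factor, SQ uint8 recall, float32 recall, SQ p50 ms, float32 p50 ms)
rows = [
    (1.03, 0.8283, 0.8413, 10, 3),
    (1.05, 0.8428, 0.8559, 2, 3),
    (1.07, 0.8552, 0.8682, 2, 3),
    (1.08, 0.8606, 0.8736, 2, 3),
    (1.10, 0.8700, 0.8834, 2, 3),
    (1.15, 0.8874, 0.9036, 2, 3),
    (1.20, 0.8998, 0.9172, 3, 3),
    (1.50, 0.9262, 0.9467, 4, 4),
    (2.00, 0.9311, 0.9514, 6, 7),
]


def recall_gap_pp(sq, f32):
    """Absolute recall difference in percentage points."""
    return round((sq - f32) * 100, 1)


def relative_gap_pct(sq, f32):
    """Recall difference relative to the float32 baseline, in percent."""
    return round((sq - f32) / f32 * 100, 1)


for hf, sq, f32, sq_ms, f32_ms in rows:
    print(
        f"hf={hf:.2f}  gap={recall_gap_pp(sq, f32):+.1f} pp  "
        f"relative={relative_gap_pct(sq, f32):+.1f}%  "
        f"latency delta={sq_ms - f32_ms:+d} ms"
    )
```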

### Index Build Time (SQ uint8, optimized batched clustering)

| Phase | Duration | Notes |
|-------|----------|-------|
| Lucene segment merge | 34 min | Segment I/O (3 shards in parallel) |
| Batch add (CSR construction) | 1 min | With `reserve()` pre-allocation (previously 10 min without pre-allocation) |
| K-means clustering (32 threads) | 15 min | Memory-aware batched clustering with per-batch inverted list construction + immediate free |
| Save to disk | 2 min | 32GB sequential write |
| **Total** | **51 min** | Rounded phase durations above sum to 52 min |

#### Optimizations applied

1. **Batch add with `reserve()`**: Pre-allocates CSR vector storage based on the total NNZ estimated from the first batch. This cuts batch add from ~10 min to 1 min by eliminating repeated `std::vector` growth and reallocation.
2. **Memory-aware batched clustering**: Reads `MemAvailable` from `/proc/meminfo` to determine the batch count, then processes posting lists in batches that fit within available memory and frees each batch's inverted lists immediately after clustering. This prevents the glibc heap fragmentation that previously left ~38GB of memory that could not be returned to the OS after the build.
3. **`release_build_memory()` after save**: Explicitly releases `vectors_` and `clustered_inverted_lists` immediately after writing to disk (before `deleteIndex`). This is combined with `mallopt(M_MMAP_THRESHOLD, 128KB)`, which forces large allocations through mmap so that each one can be reclaimed individually when freed.
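
The batch-count decision in item 2 can be sketched as follows. This is an illustration in Python, not the actual (C++) implementation: the `usable_fraction` headroom factor, the size estimate, and all function names are assumptions, since the issue only states that `MemAvailable` drives the batch count.

```python
import re


def mem_available_bytes(meminfo_text):
    """Parse the MemAvailable line of /proc/meminfo (reported in kB)."""
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemAvailable not found in meminfo text")
    return int(match.group(1)) * 1024


def batch_count(estimated_bytes, available_bytes, usable_fraction=0.5):
    """Number of batches so each batch fits in a share of available memory.

    usable_fraction is a hypothetical safety margin, not a documented setting.
    """
    budget = int(available_bytes * usable_fraction)
    if budget <= 0:
        raise ValueError("no memory budget available")
    return max(1, (estimated_bytes + budget - 1) // budget)  # integer ceiling


def split_into_batches(items, n_batches):
    """Split items into at most n_batches contiguous chunks."""
    if not items:
        return []
    size = (len(items) + n_batches - 1) // n_batches
    return [items[i:i + size] for i in range(0, len(items), size)]


# On Linux the text would come from open("/proc/meminfo").read();
# a fixed sample keeps the sketch runnable anywhere.
sample = (
    "MemTotal:       131072000 kB\n"
    "MemFree:         1048576 kB\n"
    "MemAvailable:   41943040 kB\n"
)
available = mem_available_bytes(sample)          # 40 GiB
n = batch_count(100 * 1024**3, available)        # 100 GiB of inverted lists
batches = split_into_batches(list(range(10)), 3)
print(n, [len(b) for b in batches])
```

Each batch would then be clustered and its inverted lists freed before the next batch is built, which keeps the peak working set bounded by the per-batch budget.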

### Resource Usage

| Metric | SQ uint8 |
|--------|----------|
| .nsparse file size per shard | 32 GB |
| Post-load RSS (search steady state) | 49.5 GB |
| Peak RSS during build | 103.8 GB |

### Build Parameters

- **SQ uint8**: `idmap,seismic_sq,quantizer=8bit|vmin=0.0|vmax=4.0`, quantization_ceiling_search=4.0
- **Float32**: `idmap,seismic`
- Both: lambda=22,724 (auto), beta=2,272 (auto, 0.1×lambda), alpha=0.4, OMP_THREADS=32
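
To illustrate what `quantizer=8bit|vmin=0.0|vmax=4.0` implies, here is a minimal sketch of uniform scalar quantization to uint8. It assumes a linear mapping of the clipped range [vmin, vmax] onto 256 levels with round-to-nearest; the actual `seismic_sq` encoding may differ, and the function names are hypothetical.

```python
def quantize(value, vmin=0.0, vmax=4.0, levels=256):
    """Map a float weight to an integer code in [0, levels - 1]."""
    clipped = min(max(value, vmin), vmax)       # weights above vmax saturate
    scale = (levels - 1) / (vmax - vmin)
    return int(round((clipped - vmin) * scale))


def dequantize(code, vmin=0.0, vmax=4.0, levels=256):
    """Recover the approximate float weight from an integer code."""
    step = (vmax - vmin) / (levels - 1)
    return vmin + code * step


weight = 1.234
code = quantize(weight)
restored = dequantize(code)
step = 4.0 / 255
print(code, restored, abs(restored - weight) <= step / 2)
```

Under these assumptions, in-range weights are reconstructed to within half a quantization step (about 0.008 for this range), and any weight above `vmax` is clipped to 4.0.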

### Summary

SQ uint8 trades about 1.3 points of recall@10 at hf ≤ 1.10 (rising to about 2 points at hf ≥ 1.50) for:
- **33% lower p50 search latency** (2ms vs 3ms at hf=1.08)
- **~48% less disk/RAM** (32GB vs 62GB per shard)

Recommended operating points:
- hf=1.08: 86% recall @ 2ms p50 (best latency)
- hf=1.15: 89% recall @ 2ms p50 (best recall/latency balance)
