bench: investigate synthetic benchmark deltas after 0.5.0 release #246

## Summary

After the 0.5.0 crate release, investigate the synthetic `bench_rank` deltas against the committed README baseline. The deltas are not yet explained and are worth understanding, but this is **not a 0.5.0 release blocker**.

A 10-run aggregate on the current 0.5.0-pre branch shows several large speedups that are plausible given recent kernel and two-stage optimization work, plus a few slower rows that should be explained before refreshing public synthetic claims.

## Evidence

Command:

```sh
cargo run --release --example bench_rank
```

Method used for this note:

- baseline: committed README / `benchmarks/rank_modes_results.txt` synthetic table
- current: 10 consecutive runs on the current release-prep branch
- machine: same local x86_64 AVX-512 desktop class as the existing synthetic fixture
- quality columns are deterministic for this seeded synthetic corpus; latency and throughput are wall-clock measurements

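Headline 10-run p50 aggregate on the current branch versus the README baseline. Delta is (current - README) / README, so a negative delta means the current build is faster: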

| Mode | Current p50 mean +/- sd ms | Current CV | README p50 ms | Delta vs README | Current R@10 |
| --- | ---: | ---: | ---: | ---: | ---: |
| Rank asym | 3.6400 +/- 0.0748 | 2.1% | 3.7118 | -1.9% | 0.8330 |
| RankQuant b=4 asym | 0.3258 +/- 0.0071 | 2.2% | 0.3132 | +4.0% | 0.8095 |
| RankQuant b=2 asym | 0.2640 +/- 0.0082 | 3.1% | 0.2382 | +10.8% | 0.5785 |
| RankQuant b=2 FastScan | 0.1086 +/- 0.0024 | 2.2% | 0.0901 | +20.6% | 0.5845 |
| RankQuant b=2 byte-LUT | 0.6212 +/- 0.0168 | 2.7% | 0.7542 | -17.6% | 0.5785 |
| RankQuant b=4 byte-LUT | 1.2246 +/- 0.0314 | 2.6% | 1.6437 | -25.5% | 0.8095 |
| Bitmap n_top=64 | 0.0634 +/- 0.0022 | 3.5% | 0.0812 | -21.9% | 0.2495 |
| SignBitmap probe | 0.0498 +/- 0.0015 | 3.0% | 0.0912 | -45.4% | 0.2745 |
| TwoStage b=2 M=100 | 0.0538 +/- 0.0012 | 2.3% | 0.0977 | -44.9% | 0.5795 |
| TwoStage b=2 M=500 | 0.0675 +/- 0.0024 | 3.5% | 0.1089 | -38.0% | 0.5785 |
| TwoStage b=2 M=1000 | 0.0804 +/- 0.0029 | 3.6% | 0.1225 | -34.4% | 0.5785 |
| TwoStage b=2 M=5000 | 0.1884 +/- 0.0073 | 3.8% | 0.2398 | -21.4% | 0.5785 |
| SignTwoStage b=2 M=500 | 0.0676 +/- 0.0022 | 3.2% | 0.1057 | -36.0% | 0.5785 |
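
The derived columns in the table can be recomputed from the raw per-run p50 values. The following is a minimal sketch, not the `bench_rank` implementation: the names `Summary`, `summarize`, and `delta_vs_baseline` are hypothetical, and it assumes the sample standard deviation (n - 1 denominator), which this note does not state.

```rust
/// Summary of repeated p50 latency samples, in milliseconds.
#[derive(Debug, Clone, Copy)]
pub struct Summary {
    pub mean: f64,
    pub sd: f64,
    /// Coefficient of variation as a fraction (0.021 means 2.1%).
    pub cv: f64,
}

/// Computes mean, sample standard deviation, and CV.
/// Assumption: n - 1 denominator; the original note does not say which was used.
/// Returns `None` for fewer than two samples or a zero mean.
pub fn summarize(samples: &[f64]) -> Option<Summary> {
    let n = samples.len();
    if n < 2 {
        return None;
    }
    let mean = samples.iter().sum::<f64>() / n as f64;
    if mean == 0.0 {
        return None;
    }
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n as f64 - 1.0);
    let sd = var.sqrt();
    Some(Summary { mean, sd, cv: sd / mean })
}

/// Relative change of `current` against `baseline`, as a fraction.
/// Negative means the current build is faster than the baseline.
pub fn delta_vs_baseline(current: f64, baseline: f64) -> f64 {
    (current - baseline) / baseline
}
```

For the SignBitmap probe row, `delta_vs_baseline(0.0498, 0.0912)` is about -0.454, which matches the -45.4% in the table.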

## Initial read

The large speedups in the bitmap, sign, two-stage, and byte-LUT rows are likely explained by recent kernel and caller-owned/two-stage optimization work. The b=2 asym (+10.8%) and FastScan (+20.6%) slowdowns are several times larger than their observed run-to-run CVs (3.1% and 2.2%), so they deserve a focused follow-up rather than being dismissed as jitter. The b=4 asym slowdown (+4.0% against a 2.2% CV) is much closer to the noise floor.
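
One rough way to separate slowdowns from jitter is to flag rows whose delta exceeds some multiple of the current CV. The sketch below is only a screen, not a significance test: the README baseline is a single figure with unknown variance, and the multiplier `k` is an arbitrary assumption. `Row`, `SLOWER_ROWS`, and `slowdowns_beyond_noise` are hypothetical names; the values are copied from the three slower rows in the table.

```rust
/// One table row: mode name, delta vs README, and current CV (both as fractions).
pub struct Row<'a> {
    pub mode: &'a str,
    pub delta: f64,
    pub cv: f64,
}

/// The three rows that are slower than the README baseline, taken from the table.
pub const SLOWER_ROWS: [Row<'static>; 3] = [
    Row { mode: "RankQuant b=4 asym", delta: 0.040, cv: 0.022 },
    Row { mode: "RankQuant b=2 asym", delta: 0.108, cv: 0.031 },
    Row { mode: "RankQuant b=2 FastScan", delta: 0.206, cv: 0.022 },
];

/// Returns the modes whose slowdown (positive delta) exceeds `k` times their CV.
/// `k` is an arbitrary screening threshold, not a statistical guarantee.
pub fn slowdowns_beyond_noise<'a>(rows: &[Row<'a>], k: f64) -> Vec<&'a str> {
    rows.iter()
        .filter(|r| r.delta > 0.0 && r.delta > k * r.cv)
        .map(|r| r.mode)
        .collect()
}
```

With `k = 3.0`, `slowdowns_beyond_noise(&SLOWER_ROWS, 3.0)` returns the b=2 asym and b=2 FastScan rows and leaves out b=4 asym, which is consistent with the reading above.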

## Suggested investigation after 0.5.0

- Make the built-in synthetic benchmark optionally run and emit N-sample aggregates, probably with a flag such as `--samples 10`.
- Preserve single-run output for quick local smoke runs.
- Compare current `main` against the README baseline and, where useful, against nearby historical commits around the kernel/two-stage changes.
- Attribute deltas to specific changes where possible: byte-LUT scoring, bitmap/sign probe kernels, two-stage candidate generation, exact rerank, FastScan layout/scoring, and RNG/corpus/ground-truth changes.
- Refresh `benchmarks/rank_modes_results.txt` and README synthetic rows only after the investigation separates expected kernel wins from benchmark harness or corpus drift.
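
The first two items could be sketched as follows, using only the standard library. This is an illustration under assumptions, not the existing `bench_rank` code: the `--samples` flag does not exist yet, and `parse_samples` and `collect_p50s` are hypothetical helpers. Defaulting to one sample keeps the single-run output for quick smoke runs.

```rust
/// Parses an optional `--samples N` flag (hypothetical; not an existing bench_rank flag).
/// Defaults to 1 so a plain invocation keeps today's single-run behavior.
pub fn parse_samples<I>(args: I) -> Result<usize, String>
where
    I: IntoIterator<Item = String>,
{
    let mut args = args.into_iter();
    let mut samples = 1;
    while let Some(arg) = args.next() {
        if arg == "--samples" {
            let value = args
                .next()
                .ok_or_else(|| "--samples requires a value".to_string())?;
            samples = value
                .parse::<usize>()
                .map_err(|e| format!("invalid --samples value {:?}: {}", value, e))?;
            if samples == 0 {
                return Err("--samples must be at least 1".to_string());
            }
        }
    }
    Ok(samples)
}

/// Calls `run_once` `samples` times and collects each run's p50 latency in ms.
/// `run_once` stands in for one full benchmark pass of a single mode.
pub fn collect_p50s<F: FnMut() -> f64>(samples: usize, mut run_once: F) -> Vec<f64> {
    (0..samples).map(|_| run_once()).collect()
}
```

In the example binary, the arguments would come from `std::env::args().skip(1)`, and the flag would be passed after cargo's `--` separator, for example `cargo run --release --example bench_rank -- --samples 10`.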

## Non-goals

- Do not block the 0.5.0 release on this.
- Do not update public performance claims from one-off local timings.
- Do not change BEIR/real-corpus claims as part of this issue; use the BEIR harness for those.

