bench: investigate synthetic benchmark deltas after 0.5.0 release #246

@Fieldnote-Echo

Summary

After the 0.5.0 crate release, investigate the synthetic bench_rank deltas against the committed README baseline. The deltas are worth understanding, but they are not a 0.5.0 release blocker.

A 10-run aggregate on the current 0.5.0-pre branch shows several large speedups that are plausible given recent kernel and two-stage optimization work, plus three slower rows (RankQuant b=4 asym, b=2 asym, and b=2 FastScan) that should be explained before refreshing public synthetic claims.

Evidence

Command:

cargo run --release --example bench_rank

Method used for this note:

  • baseline: committed README / benchmarks/rank_modes_results.txt synthetic table
  • current: 10 consecutive runs on the current release-prep branch
  • machine: same local x86_64 AVX-512 desktop class as the existing synthetic fixture
  • quality columns are deterministic for this seeded synthetic corpus; latency and throughput are wall-clock measurements

Headline current 10-run p50 aggregate versus README baseline:

| Mode | Current p50, mean +/- sd (ms) | Current CV | README p50 (ms) | Delta vs README | Current R@10 |
|---|---|---|---|---|---|
| Rank asym | 3.6400 +/- 0.0748 | 2.1% | 3.7118 | -1.9% | 0.8330 |
| RankQuant b=4 asym | 0.3258 +/- 0.0071 | 2.2% | 0.3132 | +4.0% | 0.8095 |
| RankQuant b=2 asym | 0.2640 +/- 0.0082 | 3.1% | 0.2382 | +10.8% | 0.5785 |
| RankQuant b=2 FastScan | 0.1086 +/- 0.0024 | 2.2% | 0.0901 | +20.6% | 0.5845 |
| RankQuant b=2 byte-LUT | 0.6212 +/- 0.0168 | 2.7% | 0.7542 | -17.6% | 0.5785 |
| RankQuant b=4 byte-LUT | 1.2246 +/- 0.0314 | 2.6% | 1.6437 | -25.5% | 0.8095 |
| Bitmap n_top=64 | 0.0634 +/- 0.0022 | 3.5% | 0.0812 | -21.9% | 0.2495 |
| SignBitmap probe | 0.0498 +/- 0.0015 | 3.0% | 0.0912 | -45.4% | 0.2745 |
| TwoStage b=2 M=100 | 0.0538 +/- 0.0012 | 2.3% | 0.0977 | -44.9% | 0.5795 |
| TwoStage b=2 M=500 | 0.0675 +/- 0.0024 | 3.5% | 0.1089 | -38.0% | 0.5785 |
| TwoStage b=2 M=1000 | 0.0804 +/- 0.0029 | 3.6% | 0.1225 | -34.4% | 0.5785 |
| TwoStage b=2 M=5000 | 0.1884 +/- 0.0073 | 3.8% | 0.2398 | -21.4% | 0.5785 |
| SignTwoStage b=2 M=500 | 0.0676 +/- 0.0022 | 3.2% | 0.1057 | -36.0% | 0.5785 |
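
For reference, the aggregate columns above (mean, sd, CV, and delta vs README) can be derived from per-run p50 values roughly as follows. This is a minimal sketch, not the code used to produce the table: the `Aggregate` struct and the `aggregate` and `delta_pct` functions are hypothetical names, the per-run values in `main` are illustrative only, and the use of the sample standard deviation (n - 1) is an assumption, since this note does not record which estimator was used.

```rust
/// Summary of per-run p50 latencies (ms) across N benchmark runs.
/// Hypothetical type for illustration; not part of the crate.
#[derive(Debug, Clone, Copy)]
struct Aggregate {
    mean: f64,
    sd: f64,
    cv_pct: f64,
}

/// Aggregate per-run p50 values.
/// Assumption: sample standard deviation (divides by n - 1).
fn aggregate(p50s: &[f64]) -> Option<Aggregate> {
    if p50s.len() < 2 {
        return None; // a spread estimate needs at least two runs
    }
    let n = p50s.len() as f64;
    let mean = p50s.iter().sum::<f64>() / n;
    let var = p50s.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    let sd = var.sqrt();
    Some(Aggregate { mean, sd, cv_pct: 100.0 * sd / mean })
}

/// Signed percent change of `current` relative to `baseline`.
/// Positive means the current run is slower than the baseline.
fn delta_pct(current: f64, baseline: f64) -> f64 {
    100.0 * (current - baseline) / baseline
}

fn main() {
    // Hypothetical per-run p50 values in ms, for illustration only.
    let runs = [0.26, 0.27, 0.25, 0.27, 0.26];
    let agg = aggregate(&runs).expect("need at least two runs");
    println!("{:.4} +/- {:.4} ms, CV {:.1}%", agg.mean, agg.sd, agg.cv_pct);

    // Values taken from the table: RankQuant b=2 asym, 0.2640 ms vs README 0.2382 ms.
    println!("{:+.1}%", delta_pct(0.2640, 0.2382));
}
```

Keeping the delta signed (positive = slower) matches the convention in the table, where latency regressions carry a plus sign.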

Initial read

The large speedups in the bitmap, sign, two-stage, and byte-LUT rows are likely explained by recent kernel and caller-owned/two-stage optimization work. The b=2 asym (+10.8%) and FastScan (+20.6%) slowdowns are several times the observed run-to-run CV for those rows (3.1% and 2.2%), so they deserve a focused follow-up rather than being dismissed as jitter. The b=4 asym slowdown (+4.0% against a 2.2% CV) is closer to the noise floor.
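
One rough way to make the "larger than CV" comparison explicit is to flag a row only when its delta exceeds some multiple of its CV. The sketch below is a heuristic under stated assumptions, not a statistical test: the `Verdict` enum, the `classify` function, and the threshold `k = 2.0` are all hypothetical choices, and the README baseline is a single value whose own variance is unknown.

```rust
/// Coarse classification of a latency delta relative to run-to-run noise.
/// Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
enum Verdict {
    WithinNoise,
    Slower,
    Faster,
}

/// Flag a delta only if its magnitude exceeds `k` times the row's CV.
/// Both inputs are percentages; `k` is an arbitrary threshold (assumption).
fn classify(delta_pct: f64, cv_pct: f64, k: f64) -> Verdict {
    if delta_pct.abs() <= k * cv_pct {
        Verdict::WithinNoise
    } else if delta_pct > 0.0 {
        Verdict::Slower
    } else {
        Verdict::Faster
    }
}

fn main() {
    // (mode, delta vs README in %, current CV in %), copied from the table.
    let rows = [
        ("Rank asym", -1.9, 2.1),
        ("RankQuant b=4 asym", 4.0, 2.2),
        ("RankQuant b=2 asym", 10.8, 3.1),
        ("RankQuant b=2 FastScan", 20.6, 2.2),
        ("SignBitmap probe", -45.4, 3.0),
    ];
    for (mode, delta, cv) in rows {
        println!("{mode}: {:?}", classify(delta, cv, 2.0));
    }
}
```

With `k = 2.0`, the b=2 asym and FastScan rows are flagged as slower, while Rank asym and b=4 asym fall within noise; a different threshold would move the b=4 asym row.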

Suggested investigation after 0.5.0

  • Make the built-in synthetic benchmark optionally run and emit N-sample aggregates, probably with a flag such as --samples 10.
  • Preserve single-run output for quick local smoke runs.
  • Compare current main against the README baseline and, where useful, against nearby historical commits around the kernel/two-stage changes.
  • Attribute deltas to specific changes where possible: byte-LUT scoring, bitmap/sign probe kernels, two-stage candidate generation, exact rerank, FastScan layout/scoring, and RNG/corpus/ground-truth changes.
  • Refresh benchmarks/rank_modes_results.txt and README synthetic rows only after the investigation separates expected kernel wins from benchmark harness or corpus drift.
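
As a starting point for the first two bullets, the flag could be parsed with the standard library alone. This is a sketch under assumptions: `--samples` is only the flag name floated above, `parse_samples` is a hypothetical helper, and the real bench_rank example may already handle arguments differently. The default of 1 keeps the existing single-run behavior.

```rust
/// Parse an optional `--samples N` flag from an argument list.
/// Returns 1 when the flag is absent, so quick smoke runs are unchanged.
/// Hypothetical helper; the flag name follows the suggestion in this issue.
fn parse_samples<I: IntoIterator<Item = String>>(args: I) -> Result<usize, String> {
    let mut args = args.into_iter();
    let mut samples = 1;
    while let Some(arg) = args.next() {
        if arg == "--samples" {
            let value = args.next().ok_or("--samples requires a value")?;
            samples = value
                .parse::<usize>()
                .map_err(|e| format!("invalid --samples value {value:?}: {e}"))?;
            if samples == 0 {
                return Err("--samples must be at least 1".to_string());
            }
        }
    }
    Ok(samples)
}

fn main() {
    // Skip argv[0] (the program name).
    let samples = match parse_samples(std::env::args().skip(1)) {
        Ok(n) => n,
        Err(msg) => {
            eprintln!("error: {msg}");
            std::process::exit(2);
        }
    };
    if samples == 1 {
        println!("single run: print the existing per-mode table");
    } else {
        println!("{samples} runs: print mean +/- sd and CV per mode");
    }
}
```

If the example adopted a flag like this, it would be passed after cargo's `--` separator, for example `cargo run --release --example bench_rank -- --samples 10`.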

Non-goals

  • Do not block the 0.5.0 release on this.
  • Do not update public performance claims from one-off local timings.
  • Do not change BEIR/real-corpus claims as part of this issue; use the BEIR harness for those.

Labels: perf (Performance-relevant: scan/SIMD/alloc/memory/parallelism), testing (Testing / CI / fuzz / bench)
