v0.2.0: Early termination + rayon parallelism for high-dim performance #1

Merged
tugberkcapraz merged 3 commits into main from feat/v0.2.0-perf-optimizations
Mar 13, 2026

@tugberkcapraz
Owner

Summary

  • Early termination in Euclidean distance computation: squared differences are accumulated in chunks of 4 dimensions, and the computation bails out as soon as the partial sum exceeds eps². For high-dimensional embeddings (996-dim, eps=1.2), most non-neighbor pairs are rejected after computing only 5-15% of dimensions.
  • Rayon parallel scan for datasets above 1,000 points: the brute-force neighbor query is parallelized across CPU cores.
  • Benchmark added (cargo bench --bench batch_scaling): 10 batches x 1,500 points, 996-dim L2-normalized vectors.
  • Version bumped to 0.2.0 across Cargo.toml, pyproject.toml, __init__.py.
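
The early-termination idea in the first bullet can be sketched as follows. This is a minimal illustration, not the code from this PR: the function name `within_eps` and the use of `f32` are assumptions, and only the chunk size of 4 and the comparison against eps² come from the description above.

```rust
/// Hypothetical helper (name and signature are assumptions, not from the PR).
/// Returns true when the Euclidean distance between `a` and `b` is at most `eps`.
fn within_eps(a: &[f32], b: &[f32], eps: f32) -> bool {
    debug_assert_eq!(a.len(), b.len());
    let eps_sq = eps * eps;
    let mut sum = 0.0f32;

    // Walk both vectors in chunks of 4 dimensions.
    let mut chunks_a = a.chunks_exact(4);
    let mut chunks_b = b.chunks_exact(4);
    for (x, y) in chunks_a.by_ref().zip(chunks_b.by_ref()) {
        for i in 0..4 {
            let d = x[i] - y[i];
            sum += d * d;
        }
        // The partial sum only grows, so once it exceeds eps² the pair
        // cannot be neighbors and the remaining dimensions can be skipped.
        if sum > eps_sq {
            return false;
        }
    }

    // Handle the tail when the dimension count is not a multiple of 4.
    for (x, y) in chunks_a.remainder().iter().zip(chunks_b.remainder()) {
        let d = x - y;
        sum += d * d;
    }
    sum <= eps_sq
}
```

Comparing squared values avoids a square root per pair, and checking once per chunk rather than once per dimension keeps the branch out of the innermost loop.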

Benchmark results (10 batches x 1500 points, 996-dim, eps=1.2)

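| Batch | Points | v0.1.0 | v0.2.0 | Speedup |
|-------|--------|--------|--------|---------|
| 1     | 1,500  | 1.6s   | 0.5s   | 3.0x    |
| 5     | 7,500  | 14.5s  | 2.3s   | 6.4x    |
| 10    | 15,000 | 29.6s  | 4.3s   | 6.8x    |
| Total |        | 160.2s | 24.9s  | 6.4x    |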

Test plan

  • All 25 Rust unit tests pass (including new test_early_termination_correctness)
  • CI: Rust tests pass on ubuntu
  • CI: Python tests pass across 3.9-3.13

🤖 Generated with Claude Code

tugberkcapraz and others added 3 commits March 13, 2026 15:35
…ormance

Spatial index query_radius now uses early termination for Euclidean distance:
squared differences are accumulated in chunks of 4 dimensions, and the scan
bails out early when the partial sum exceeds eps². For 996-dim embeddings with eps=1.2,
most non-neighbor pairs are rejected after ~5-15% of dimensions.

Above 1000 points, the brute-force scan is parallelized across CPU cores
via rayon. Below that threshold, sequential scan avoids thread pool overhead.
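
The size-based dispatch described above might look roughly like the sketch below. The commit uses rayon; to stay dependency-free, this sketch substitutes std::thread::scope from the standard library, so it illustrates the threshold logic rather than the actual implementation. The names PARALLEL_THRESHOLD, sq_dist, scan, and the query_radius signature are assumptions.

```rust
use std::thread;

// Assumed constant; the commit only says "above 1000 points".
const PARALLEL_THRESHOLD: usize = 1_000;

// Plain squared Euclidean distance (no early termination, for brevity).
fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

// Sequential brute-force scan over one slice of points.
// `offset` maps local positions back to indices in the full dataset.
fn scan(points: &[Vec<f32>], offset: usize, query: &[f32], eps_sq: f32) -> Vec<usize> {
    points
        .iter()
        .enumerate()
        .filter(|(_, p)| sq_dist(p, query) <= eps_sq)
        .map(|(i, _)| offset + i)
        .collect()
}

// Hypothetical signature: returns indices of all points within `eps` of `query`.
fn query_radius(points: &[Vec<f32>], query: &[f32], eps: f32) -> Vec<usize> {
    let eps_sq = eps * eps;
    // Small inputs: stay sequential to avoid thread start-up overhead.
    if points.len() <= PARALLEL_THRESHOLD {
        return scan(points, 0, query, eps_sq);
    }
    // Large inputs: split into one contiguous chunk per available core.
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let chunk_len = (points.len() + workers - 1) / workers;
    thread::scope(|s| {
        let handles: Vec<_> = points
            .chunks(chunk_len)
            .enumerate()
            .map(|(c, chunk)| s.spawn(move || scan(chunk, c * chunk_len, query, eps_sq)))
            .collect();
        // Joining in spawn order keeps the result indices ascending.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

With rayon, the chunking and joining would typically collapse into a parallel iterator over the points, and the worker threads would come from a reusable pool instead of being spawned per query.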

Benchmark (10 batches x 1500 points, 996-dim, eps=1.2):
  v0.1.0: 160.2s total
  v0.2.0:  24.9s total (6.4x faster)

Bump version to 0.2.0 across Cargo.toml, pyproject.toml, and __init__.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Build wheels for Linux (x86_64, aarch64), macOS (x86_64, aarch64), and
Windows (x86_64) on every push/PR. This catches packaging failures before
merge rather than at release time. No publishing — just build verification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Linux aarch64 cross-compilation container lacks Python interpreters,
causing maturin to fail. Specify -i 3.12 explicitly (matching the release
workflow pattern). One version is sufficient to verify the build.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@tugberkcapraz tugberkcapraz merged commit c21eba0 into main Mar 13, 2026
21 of 22 checks passed
@tugberkcapraz tugberkcapraz deleted the feat/v0.2.0-perf-optimizations branch March 13, 2026 14:50