perf(api): add SignProbeScratch and allocation-reusing sign candidate APIs #234

@Fieldnote-Echo

Description

Context

Downstream DB integrations now use the caller-owned two-stage path:

  • SignBitmap::top_m_candidates_batched_serial_csr() for stage-1 sign probing
  • RankQuant::search_asymmetric_subset_batched_serial_into() for rerank

The rerank half already takes a SubsetScratch and caller-provided output buffers. The sign-probe half still routes through top_m_candidates(), which allocates the following on every query:

  • query bitmap
  • scores: Vec<u32> of length n_vectors
  • idx: Vec<u32> of length n_vectors
  • head: Vec<u32> of length m

Now that the sign scan kernels are fast, these allocations and index materialization are a visible part of the integration overhead.

Proposed API

Add caller-owned scratch and _into APIs for sign candidate generation, something along these lines:

pub struct SignProbeScratch { ... }

impl SignBitmap {
    pub fn top_m_candidates_into(
        &self,
        query: &[f32],
        m: usize,
        scratch: &mut SignProbeScratch,
        out: &mut Vec<u32>,
    );

    pub fn top_m_candidates_batched_serial_csr_into(
        &self,
        queries: &[f32],
        m: usize,
        scratch: &mut SignProbeScratch,
        offsets: &mut Vec<usize>,
        candidates: &mut Vec<u32>,
    );
}

Exact shape can change; the important contract is caller-owned capacity reuse across repeated calls.
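
To make the capacity-reuse contract concrete, here is a minimal sketch on a toy index. Everything in it is hypothetical: ToySignIndex, ToyScratch, the field names, and the method names are stand-ins, not the crate's types. It also assumes things the issue does not state: a bit is set when a component is >= 0.0, candidates are ordered by ascending Hamming distance with ties broken by lower index, and the CSR output uses offsets of length nq + 1 into one flat candidates buffer. It does a full in-place sort where a real implementation would more likely use partial selection and the SIMD scan kernels.

```rust
/// Hypothetical caller-owned scratch; fields are illustrative only.
#[derive(Default)]
pub struct ToyScratch {
    query_bits: Vec<u64>,
    scores: Vec<u32>,
    idx: Vec<u32>,
}

/// Toy stand-in for a sign index: `words` u64 words per vector, row-major.
pub struct ToySignIndex {
    bits: Vec<u64>,
    dim: usize,
    words: usize,
    n_vectors: usize,
}

/// Pack sign bits into `out`, reusing its capacity (assumed rule: bit = x >= 0).
fn pack_signs(v: &[f32], out: &mut Vec<u64>) {
    out.clear();
    out.resize((v.len() + 63) / 64, 0);
    for (i, &x) in v.iter().enumerate() {
        if x >= 0.0 {
            out[i / 64] |= 1u64 << (i % 64);
        }
    }
}

impl ToySignIndex {
    /// Build from rows that are assumed to share one length.
    pub fn from_rows(rows: &[Vec<f32>]) -> Self {
        let dim = rows.first().map_or(0, |r| r.len());
        let words = (dim + 63) / 64;
        let mut bits = Vec::with_capacity(rows.len() * words);
        let mut tmp = Vec::new();
        for r in rows {
            pack_signs(r, &mut tmp);
            bits.extend_from_slice(&tmp);
        }
        Self { bits, dim, words, n_vectors: rows.len() }
    }

    /// Fill `s.idx` with all vector ids ordered by (Hamming distance, id).
    fn rank(&self, query: &[f32], s: &mut ToyScratch) {
        pack_signs(query, &mut s.query_bits);
        s.scores.clear();
        s.idx.clear();
        for i in 0..self.n_vectors {
            let row = &self.bits[i * self.words..(i + 1) * self.words];
            let d: u32 = row.iter().zip(&s.query_bits).map(|(a, b)| (a ^ b).count_ones()).sum();
            s.scores.push(d);
            s.idx.push(i as u32);
        }
        let scores = &s.scores;
        // In-place sort: no allocation, deterministic tie-break on id.
        s.idx.sort_unstable_by_key(|&i| (scores[i as usize], i));
    }

    pub fn top_m_into(&self, query: &[f32], m: usize, s: &mut ToyScratch, out: &mut Vec<u32>) {
        self.rank(query, s);
        out.clear();
        out.extend_from_slice(&s.idx[..m.min(s.idx.len())]);
    }

    /// `queries` is row-major, nq * dim floats; output is CSR (offsets, candidates).
    pub fn top_m_batched_csr_into(
        &self,
        queries: &[f32],
        m: usize,
        s: &mut ToyScratch,
        offsets: &mut Vec<usize>,
        candidates: &mut Vec<u32>,
    ) {
        offsets.clear();
        candidates.clear();
        offsets.push(0);
        if self.dim == 0 {
            return;
        }
        for q in queries.chunks_exact(self.dim) {
            self.rank(q, s);
            candidates.extend_from_slice(&s.idx[..m.min(s.idx.len())]);
            offsets.push(candidates.len());
        }
    }
}
```

Every buffer is cleared and refilled rather than recreated, so once the scratch and outputs have grown to fit a given (n_vectors, dim, m, nq), repeated calls stay within existing capacity. The allocating APIs could then be thin wrappers that create a fresh scratch and output and call the _into variant.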

Acceptance criteria

  • Same candidate order and deterministic tie policy as current top_m_candidates().
  • No heap allocation on warmed repeated calls for fixed (n_vectors, dim, m, nq) on the SIMD path.
  • Reuses score/index/query-bitmap buffers in scratch.
  • Keeps current allocating APIs as convenience wrappers.
  • Tests compare _into vs existing APIs across empty, small, tied, and normal cases.
  • Include a focused microbench at Harrier-1024 and BGE-768 shapes.
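
One possible way to check the no-allocation criterion in a test is a counting global allocator built on std::alloc. This is only a sketch: CountingAlloc, allocs_during, and fill_into are hypothetical names, and fill_into stands in for whichever _into API ends up being tested. The counter is process-wide, so the measurement is only meaningful when nothing else allocates concurrently, for example in a dedicated single-threaded test binary.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and counts alloc/realloc calls.
struct CountingAlloc;

static ALLOCS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        // Growth of an existing buffer counts as a heap allocation too.
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.realloc(ptr, layout, new_size) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Number of alloc/realloc calls observed while `f` runs.
pub fn allocs_during<F: FnOnce()>(f: F) -> usize {
    let before = ALLOCS.load(Ordering::Relaxed);
    f();
    ALLOCS.load(Ordering::Relaxed) - before
}

/// Hypothetical stand-in for an `_into` API: refills `out` in place.
pub fn fill_into(n: usize, out: &mut Vec<u32>) {
    out.clear();
    out.extend(0..n as u32);
}
```

A test would make one warm-up call to size the buffers, then assert that allocs_during reports zero over a loop of repeated calls with the same shape.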

Motivation

OrdinalDB can already use the new caller-owned rerank path, but stage-1 still has allocation overhead. This should be the next practical performance API for DB integrations after the 0.5.0 batched subset rerank work.

Labels

  • core-api: Core search/index public API surface (pre-1.0)
  • perf: Performance-relevant: scan/SIMD/alloc/memory/parallelism
  • rust: Pull requests that update rust code
