perf(api): add SignProbeScratch and allocation-reusing sign candidate APIs

## Context

Downstream DB integrations now use the caller-owned two-stage path:

- `SignBitmap::top_m_candidates_batched_serial_csr()` for stage-1 sign probing
- `RankQuant::search_asymmetric_subset_batched_serial_into()` for rerank

The rerank half has `SubsetScratch` and caller-provided output buffers. The sign-probe half still routes through `top_m_candidates()`, which allocates per query:

- query bitmap
- `scores: Vec<u32>(n_vectors)`
- `idx: Vec<u32>(n_vectors)`
- `head: Vec<u32>(m)`

Now that the sign scan kernels are fast, these per-query allocations and the index materialization account for a visible share of the integration overhead.

## Proposed API

Add caller-owned scratch and `_into` APIs for sign candidate generation, something along these lines:

```rust
pub struct SignProbeScratch { ... }

impl SignBitmap {
    pub fn top_m_candidates_into(
        &self,
        query: &[f32],
        m: usize,
        scratch: &mut SignProbeScratch,
        out: &mut Vec<u32>,
    );

    pub fn top_m_candidates_batched_serial_csr_into(
        &self,
        queries: &[f32],
        m: usize,
        scratch: &mut SignProbeScratch,
        offsets: &mut Vec<usize>,
        candidates: &mut Vec<u32>,
    );
}
```

The exact shape can change; the important contract is that the caller owns the buffers and their capacity is reused across repeated calls.
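To make the capacity-reuse contract concrete, here is a minimal self-contained sketch on a toy index. It is not the crate's implementation: `ToySignIndex`, `ToyScratch`, and `pack_signs` are hypothetical names, the score (count of matching sign bits) and the tie policy (lower index first) are assumptions, and the scan is scalar rather than SIMD. What it illustrates is the shape: every buffer is `clear()`ed and refilled, the sort is in place, and the allocating API becomes a thin wrapper over `_into`.

```rust
/// Hypothetical toy index: one sign bit per dimension, packed into u64 words.
pub struct ToySignIndex {
    dim: usize,
    words_per_vec: usize,
    bits: Vec<u64>, // n_vectors * words_per_vec words
}

/// Hypothetical scratch holding the per-query buffers listed in Context.
#[derive(Default)]
pub struct ToyScratch {
    query_bits: Vec<u64>,
    scores: Vec<u32>,
    idx: Vec<u32>,
}

/// Appends ceil(len / 64) words; bit j is set when v[j] >= 0.0 (assumed convention).
fn pack_signs(v: &[f32], out: &mut Vec<u64>) {
    let base = out.len();
    out.resize(base + (v.len() + 63) / 64, 0);
    for (j, &x) in v.iter().enumerate() {
        if x >= 0.0 {
            out[base + j / 64] |= 1u64 << (j % 64);
        }
    }
}

impl ToySignIndex {
    pub fn new(dim: usize, vectors: &[f32]) -> Self {
        assert!(dim > 0 && vectors.len() % dim == 0);
        let mut bits = Vec::new();
        for v in vectors.chunks_exact(dim) {
            pack_signs(v, &mut bits);
        }
        Self { dim, words_per_vec: (dim + 63) / 64, bits }
    }

    /// Fills `s.idx` with all vector ids ordered by score descending, id ascending.
    fn rank(&self, query: &[f32], s: &mut ToyScratch) {
        assert_eq!(query.len(), self.dim);
        s.query_bits.clear(); // clear() keeps capacity, so warmed calls reuse it
        pack_signs(query, &mut s.query_bits);
        s.scores.clear();
        s.idx.clear();
        for (i, v) in self.bits.chunks_exact(self.words_per_vec).enumerate() {
            let hamming: u32 = v.iter().zip(&s.query_bits).map(|(a, b)| (a ^ b).count_ones()).sum();
            s.scores.push(self.dim as u32 - hamming); // number of matching sign bits
            s.idx.push(i as u32);
        }
        let scores = &s.scores;
        // In-place sort; the id tie-break makes the order total and deterministic.
        s.idx.sort_unstable_by(|&a, &b| scores[b as usize].cmp(&scores[a as usize]).then(a.cmp(&b)));
    }

    pub fn top_m_into(&self, query: &[f32], m: usize, s: &mut ToyScratch, out: &mut Vec<u32>) {
        self.rank(query, s);
        out.clear();
        out.extend_from_slice(&s.idx[..m.min(s.idx.len())]);
    }

    /// CSR output: candidates of query q live in candidates[offsets[q]..offsets[q + 1]].
    pub fn top_m_batched_csr_into(
        &self,
        queries: &[f32], // assumed row-major, nq * dim
        m: usize,
        s: &mut ToyScratch,
        offsets: &mut Vec<usize>,
        candidates: &mut Vec<u32>,
    ) {
        assert!(queries.len() % self.dim == 0);
        offsets.clear();
        candidates.clear();
        offsets.push(0);
        for q in queries.chunks_exact(self.dim) {
            self.rank(q, s);
            candidates.extend_from_slice(&s.idx[..m.min(s.idx.len())]);
            offsets.push(candidates.len());
        }
    }

    /// Allocating convenience wrapper, kept for callers that do not hold a scratch.
    pub fn top_m(&self, query: &[f32], m: usize) -> Vec<u32> {
        let (mut s, mut out) = (ToyScratch::default(), Vec::new());
        self.top_m_into(query, m, &mut s, &mut out);
        out
    }
}
```

A caller would keep one scratch and one set of output vectors per worker thread and pass them to every call. Comparing `out.as_ptr()` and `out.capacity()` before and after a warmed call is a cheap way to confirm that no buffer was reallocated.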

## Acceptance criteria

- Same candidate order and deterministic tie policy as current `top_m_candidates()`.
- No heap allocation on warmed repeated calls for fixed `(n_vectors, dim, m, nq)` on the SIMD path.
- Reuses score/index/query-bitmap buffers in scratch.
- Keeps current allocating APIs as convenience wrappers.
- Tests compare `_into` vs existing APIs across empty, small, tied, and normal cases.
- Include a focused microbench at Harrier-1024 and BGE-768 shapes.
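
One way the "no heap allocation on warmed repeated calls" criterion could be checked in a test is a counting global allocator. The sketch below is an assumption about how such a test might look, not existing test infrastructure in the crate: `CountingAlloc`, `allocs_during`, and the stand-in `fill_into` are hypothetical. The counter is thread-local so that allocations made by other threads (for example the test harness) do not pollute the measurement.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;

thread_local! {
    // Const-initialized with no destructor, so touching it never allocates.
    static ALLOCS: Cell<usize> = const { Cell::new(0) };
}

/// Hypothetical wrapper around the system allocator that counts
/// allocations and reallocations made by the current thread.
pub struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // try_with avoids a panic if the slot is unavailable during thread teardown.
        let _ = ALLOCS.try_with(|c| c.set(c.get() + 1));
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }

    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        let _ = ALLOCS.try_with(|c| c.set(c.get() + 1));
        unsafe { System.realloc(ptr, layout, new_size) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `f` and returns how many allocations this thread made while it ran.
pub fn allocs_during<R>(f: impl FnOnce() -> R) -> (usize, R) {
    let before = ALLOCS.with(|c| c.get());
    let result = f();
    let after = ALLOCS.with(|c| c.get());
    (after - before, result)
}

/// Stand-in for an `_into` API: clears and refills a caller-owned buffer.
pub fn fill_into(n: usize, out: &mut Vec<u32>) {
    out.clear();
    out.extend(0..n as u32);
}
```

A test would run one warm-up call at the target shape, then wrap the repeated calls in `allocs_during` and assert that the count is zero. Because the allocator is process-wide, such a check is probably best kept in its own integration test binary.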

## Motivation

OrdinalDB can already use the new caller-owned rerank path, but stage-1 still has allocation overhead. This should be the next practical performance API for DB integrations after the 0.5.0 batched subset rerank work.

