6 changes: 6 additions & 0 deletions README.md
@@ -259,6 +259,8 @@ The runtime dependency floor is `numpy>=2.2`.
### Threading / concurrency

`ordvec` supports concurrent read-only/search use. Mutation is exclusive.
The consolidated cross-language ownership and lifetime contract is in
[`docs/bindings-safety.md`](docs/bindings-safety.md).

Python search, candidate-generation, and scoring methods release the GIL and
read NumPy inputs in place. Callers must not mutate query, corpus, candidate,
@@ -286,6 +288,10 @@ candidate slices passed to `Search` until the call returns.
through its own package gate; use the GitHub checkout for `ordvec-ffi/`,
`ordvec-go/`, and
[`docs/c-api.md`](https://github.com/Fieldnote-Echo/ordvec/blob/main/docs/c-api.md).
- **Bindings safety and ownership contract:**
[`docs/bindings-safety.md`](docs/bindings-safety.md)
- **Artifact and platform matrix:**
[`docs/artifact-platform-matrix.md`](docs/artifact-platform-matrix.md)
- **Pre-1.0 compatibility policy:**
[`docs/compatibility-policy.md`](docs/compatibility-policy.md) defines the
stable, experimental, repo-local sidecar, persisted-format, examples/docs,
2 changes: 2 additions & 0 deletions RELEASING.md
@@ -246,6 +246,8 @@ filename. Until a record is updated, the corresponding gated publish fails
`*.intoto.jsonl` all present);
- `gh attestation verify <file> -R Fieldnote-Echo/ordvec` on a downloaded
artifact;
- compare the observed release assets against
[`docs/artifact-platform-matrix.md`](docs/artifact-platform-matrix.md);
- for a coordinated release, the Zenodo deposit.

## Coordinated release note
85 changes: 85 additions & 0 deletions docs/artifact-platform-matrix.md
@@ -0,0 +1,85 @@
# Artifact and Platform Matrix

This matrix is the release-facing inventory for what `ordvec` publishes, what
is repo-local, and which platform expectations are checked by the release
workflow. It complements `RELEASING.md` and `docs/msrv-and-features.md`; the
workflow remains the source of truth for the exact build jobs. The matrix
documents packaging and distribution compatibility for release verification. It
is not a service support commitment, runtime SLA, or guarantee that every host
environment matching a platform family is supported.

## Published Artifacts

| Surface | Published where | Platform/build contract | Release verification |
| --- | --- | --- | --- |
| `ordvec` Rust crate | crates.io package `ordvec`; GitHub Release `.crate` asset | Rust 1.89 MSRV; default features empty; pure Rust, no BLAS/LAPACK/system numeric dependency | `cargo package --locked`; GitHub/Sigstore/SLSA provenance; pre-publish and post-publish byte identity against crates.io |
| `ordvec-manifest` Rust crate | crates.io package `ordvec-manifest`; GitHub Release `.crate` asset | Rust 1.89 MSRV; default features empty; optional `cli`, `sqlite`, and `sqlite-bundled` features | Built after matching `ordvec` exists; GitHub/Sigstore/SLSA provenance; byte identity against crates.io |
| Python `ordvec` | PyPI package `ordvec`; GitHub Release wheels and sdist | CPython 3.10+ abi3; `numpy>=2.2`; wheels for Linux x86_64 and Linux aarch64 are manylinux/glibc wheels; no musllinux/Alpine wheel is shipped yet; macOS aarch64 and Windows x64 wheels are also published; native extension modules are embedded in the wheel and do not load a separate `ordvec_ffi` library | Canonical wheel/sdist selection; linux/aarch64 native smoke; PyPI hash verification; PEP 740 attestation on fresh upload |
| Python `ordvec-manifest` | PyPI package `ordvec-manifest`; GitHub Release wheels and sdist | CPython 3.10+ abi3; Linux wheels are manylinux/glibc for x86_64 and aarch64; no musllinux/Alpine wheel is shipped yet; macOS aarch64 and Windows x64 wheels are also published; native extension modules are embedded in the wheel | Canonical wheel/sdist selection; linux/aarch64 native smoke; PyPI hash verification; PEP 740 attestation on fresh upload |
| Node/WASM | Not shipped; no npm package is published yet | Placeholder for issue #138; no JavaScript, TypeScript, or wasm package support is promised by this release | No release verification until a future packaging lane adds build jobs |
| JVM | Not shipped; no Maven/Gradle package is published yet | Placeholder for issue #139; no Java/Kotlin package support is promised by this release | No release verification until a future packaging lane adds build jobs |

The Python release currently expects exactly four wheels plus one sdist for
each Python package. There is no macOS x86_64 wheel leg in the current release
workflow. Linux users on musl-based distributions should build from source or
from the sdist unless a future release adds a `musllinux` wheel leg.

## Repo-Local Sidecars

| Surface | Published where | Platform/build contract | Release role |
| --- | --- | --- | --- |
| `ordvec-ffi` | Not published to crates.io; built from the repository | Rust 1.89; emits `rlib`, `cdylib`, and `staticlib`; ABI v1 header is committed under `ordvec-ffi/include/`; cdylibs are named `libordvec_ffi.so`, `libordvec_ffi.dylib`, or `ordvec_ffi.dll`; static archives have names such as `libordvec_ffi.a` | C ABI compatibility surface for embedders; CI checks header drift and C link smoke; embedders must pair header and native library from the same git tag, require `ordvec_abi_version() == 1`, and compare `ordvec_version_string()` with the packaged native library |
| `ordvec-go` | Not published as a Go module release from this repo | Thin cgo wrapper over `ordvec-ffi`; links the local Rust library from the same git tag and ABI version | Binding smoke and race/cgocheck coverage for the C ABI contract; consumers must not mix Go wrapper, generated header, and native library from different tags |
| `benchmarks/beir-bench` | Not shipped in the published crate or wheels | Workspace benchmark crate with `publish = false` | Release-adjacent benchmark harness only; not a shipped user dependency |
| `fuzz/` | Not a workspace member and not published | `cargo-fuzz` crate with its own lockfile | Loader and parser hardening gate; release workflow runs smoke fuzz jobs |

Loading/linking strategy for the repo-local native sidecars is part of the
release contract: C embedders load or link the `ordvec-ffi` dynamic/static
library that matches the checked-in ABI v1 header; Go uses cgo to link that
same local `ordvec-ffi` build; Python wheels embed their own native extension
modules and do not load the repo-local `ordvec_ffi` shared library. Keep the
header, Go wrapper, and native libraries on the same git tag and ABI version.

## Native Libraries and Version Alignment

- The Rust crates published to crates.io do not require a separately installed
native numeric library.
- The repo-local `ordvec-ffi` crate builds a library named `ordvec_ffi`; dynamic builds emit
`libordvec_ffi.so` on Linux, `libordvec_ffi.dylib` on macOS, and
`ordvec_ffi.dll` on Windows. Static builds emit archives such as
`libordvec_ffi.a`.
- Python wheels embed their native extension modules inside the wheel. They do
not load a separately installed `ordvec_ffi` shared library.
- The Go wrapper links through cgo against a local `ordvec-ffi` build. Use the
Go package, generated header, and native library from the same git tag and ABI
version.
- C and Go embedders should check `ordvec_abi_version() == 1` and
`ordvec_version_string()` against the packaged native library. Do not mix
headers, Go wrappers, and native libraries from different tags.

## SBOM Policy

The release workflow generates CycloneDX SBOMs for the Rust crate, manifest
crate, Python binding crate, and manifest Python binding crate as workflow
artifacts. Current PyPI distributions and GitHub Release assets do not embed or
attach those SBOM files. Published release assets are the canonical `.crate`,
wheel, and sdist files plus Sigstore and SLSA/in-toto provenance assets.

## Platform Notes

- SIMD dispatch in the core crate is not feature-gated. x86_64 dispatches
AVX-512 and AVX2 at runtime where available, aarch64 uses NEON, wasm32 can
use `simd128` when built with that target feature, and unsupported targets
use scalar fallback paths.
- Native library naming, loading/linking strategy, and same-tag version
alignment are documented above in "Native Libraries and Version Alignment";
those rules are part of this platform matrix, not optional packaging notes.
- Published Python wheels are abi3, so one wheel per platform covers CPython
3.10 and newer for that platform.
- The release workflow keeps the GitHub Release draft until both Rust crates
and both Python packages have published successfully. A registry failure
leaves the release draft unpublished.
- GitHub Release assets include the canonical crate, wheel, and sdist files
plus provenance/attestation assets generated by the workflow. Verify
downloaded assets with `gh attestation verify` and registry-served hash
checks before treating them as deployment inputs.
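
The hash half of that verification can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the release workflow's implementation: `sha256_of` and `matches_expected` are hypothetical helper names, and the expected digest is assumed to come from a source you already trust, such as the digest the registry serves for the file.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected_hex: str) -> bool:
    """Hypothetical helper: compare a file's digest with an expected hex digest."""
    # Normalize whitespace and case so copy-pasted digests compare cleanly.
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

A digest match only shows that the local bytes equal the bytes the registry describes; it does not replace the provenance check performed by `gh attestation verify`.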
105 changes: 105 additions & 0 deletions docs/bindings-safety.md
@@ -0,0 +1,105 @@
# Bindings Safety and Ownership

This is the cross-language contract for embedders using the Rust crate, Python
package, C ABI, or Go wrapper. It consolidates the binding notes that otherwise
live near each implementation. It does not add a new runtime policy: callers
still own scheduling, path trust, input mutability, and deployment provenance.

## Concurrency

`ordvec` is read-concurrent and mutation-exclusive.

- Rust index values can be searched concurrently through shared references.
Mutation methods such as `add` require exclusive access.
- Python search, candidate-generation, scoring, and `add` methods release the
GIL while Rust performs the heavy work. PyO3 still enforces object borrow
rules, but caller-owned NumPy arrays are read in place while the GIL is
released.
- The C ABI permits concurrent `ordvec_index_search`,
`ordvec_index_probe`, and `ordvec_index_info` calls on one loaded handle.
`ordvec_index_free` must not race with any other call on that handle.
- The Go wrapper serializes `Close` against `Search` and `Info`; after
`Close`, both methods return `ErrClosed`.
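
For hosts that share one index object across Python threads, the read-concurrent, mutation-exclusive rule can be enforced at the application layer. The sketch below is one possible approach and is not part of `ordvec`: `ReadWriteGuard` is a hypothetical helper built only on the standard library, and it assumes the host wraps every search in `read()` and every `add` in `write()`.

```python
import threading
from contextlib import contextmanager


class ReadWriteGuard:
    """Hypothetical guard: many concurrent readers or one exclusive writer."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writing = False

    @contextmanager
    def read(self):
        with self._cond:
            # Readers wait only while a writer holds the guard.
            while self._writing:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()

    @contextmanager
    def write(self):
        with self._cond:
            # A writer waits until no reader and no other writer is active.
            while self._writing or self._readers:
                self._cond.wait()
            self._writing = True
        try:
            yield
        finally:
            with self._cond:
                self._writing = False
                self._cond.notify_all()
```

This version does not prioritize writers, so a steady stream of searches can delay a pending mutation; a host with frequent writes may prefer a different policy.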

## Borrowed Inputs

Caller-provided buffers are borrowed for the duration of the call and are not
retained after the function returns.

- Do not mutate Rust slices, NumPy arrays, C buffers, or Go slices while a call
that received them is in progress.
- Query, corpus, candidate, output, hit, and stats buffers remain caller-owned
unless a specific API says otherwise.
- Candidate lists are entry lists, not sets. Duplicate candidate IDs are scored
independently, count toward candidate and vector-scored statistics, and can
produce duplicate hits. Deduplicate before calling when unique row IDs or
waste-free scoring matter.
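
Deduplicating before the call can be as small as the following sketch. It assumes candidates are available as plain Python integers; `dedupe_candidates` is a hypothetical helper name, not an `ordvec` API.

```python
from typing import Iterable, List


def dedupe_candidates(candidates: Iterable[int]) -> List[int]:
    """Drop repeated candidate IDs, keeping the first occurrence of each."""
    # dict preserves insertion order, so the first occurrence of an ID wins.
    return list(dict.fromkeys(candidates))
```

For example, `dedupe_candidates([7, 3, 7, 1, 3])` returns `[7, 3, 1]`, so each row is scored once and the original candidate order is kept.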

## Returned Memory

Current C ABI search calls do not return heap-owned result buffers. Callers
allocate and retain ownership of `hits_out`, `returned_out`, and `stats_out`.
`ordvec_index_load` returns an opaque handle that must be freed exactly once
with `ordvec_index_free` after all concurrent calls using that handle have
finished; ABI v1 has no general `ordvec_free` for result memory.

The Go wrapper copies C search hits into Go-owned slices. It frees temporary
`C.CString` values internally and releases the C index handle through `Close`
or its finalizer; Go callers do not free C result buffers.

## Rows and External IDs

Core search results use internal row ordinals. The primitive persisted formats
do not carry an application ID map.

`ordvec-manifest` can bind an application-owned ID sidecar as a required
auxiliary artifact, but the primitive Rust, C, Go, and Python search paths still
return row ordinals. Host systems should maintain their own row-to-application
ID map and verify it together with the index when crossing a trust boundary.
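
A host-side mapping from row ordinals to application IDs might look like the sketch below. The hit shape `(row, score)` and the helper name `attach_app_ids` are assumptions made for illustration; the actual result types of each binding are defined by that binding.

```python
from typing import List, Sequence, Tuple


def attach_app_ids(
    hits: Sequence[Tuple[int, float]],
    row_to_app_id: Sequence[str],
) -> List[Tuple[str, float]]:
    """Translate (row ordinal, score) pairs into (application ID, score) pairs."""
    resolved = []
    for row, score in hits:
        # Fail loudly if the ID map and the index disagree about row count.
        if not 0 <= row < len(row_to_app_id):
            raise IndexError(f"row {row} is outside the ID map")
        resolved.append((row_to_app_id[row], score))
    return resolved
```

Raising on an out-of-range row, rather than skipping it, surfaces a mismatched index and ID map early instead of silently dropping results.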

## Paths and Trust

`write` and `load` paths are trusted input. The core crate, Python binding, C
ABI, and Go wrapper forward paths to the filesystem without path traversal
sanitization or sandboxing.

Services that derive paths from user input should canonicalize and constrain
paths before calling `ordvec`, or use an application storage layer that never
exposes raw path choice to callers. Resolve paths against an allowed base
directory after symlink resolution, then reject any resolved path outside that
base. In Rust, use `std::fs::canonicalize`; in Python, use `pathlib.Path.resolve`;
in Go, combine lexical cleanup such as `filepath.Clean` with symlink-aware
resolution such as `filepath.EvalSymlinks`. For artifact integrity and sidecar
binding, use `ordvec-manifest`; it verifies hashes, declared metadata,
auxiliary artifacts, and attestation shape, but it does not sign files or
decide key policy.
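
The resolve-then-constrain step described above can be sketched in Python with `pathlib`. This is an illustration under stated assumptions, not an `ordvec` API: `resolve_under` is a hypothetical helper, and the base directory is assumed to exist and to be controlled by the service.

```python
from pathlib import Path


def resolve_under(base_dir, user_path) -> Path:
    """Resolve user_path against base_dir and reject anything outside it."""
    # strict=True requires the base directory to exist.
    base = Path(base_dir).resolve(strict=True)
    # resolve() follows symlinks and collapses ".." before the containment check.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate
```

The check is a point-in-time decision: if another process can replace directories or symlinks under the base between the check and the `load` or `write` call, the guarantee no longer holds, so the base directory itself should not be writable by untrusted parties.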

## Errors and Panics

- The Rust crate keeps fail-loud panicking constructors and methods where that
is the documented API. Existing `try_*` helpers return `OrdvecError` only
where explicitly provided.
- Python validates dimensions, dtypes, contiguity where required, finite
values, candidate ranges, and capacities at the boundary so common invalid
inputs raise typed Python exceptions instead of surfacing opaque Rust panics.
- The C ABI catches Rust panics at status-returning FFI boundaries and returns
`ORDVEC_STATUS_PANIC`; no Rust unwind crosses the ABI boundary. The same
thread's `ordvec_last_error()` is set to the panic payload when it is a string
or to fallback panic text otherwise. Successful status-returning calls clear
that thread-local error. Non-status helpers such as `ordvec_last_error()`,
version/status accessors, and `ordvec_index_free` do not report fallible
status.
- The Go wrapper maps C status values to Go errors and preserves the C ABI
pointer and lifetime rules.

## Release Review Checklist

When a change touches a binding, review these questions before release:

- Does the change preserve read-concurrent, mutation-exclusive behavior?
- Are borrowed buffers still borrowed only for the documented call duration?
- Are path-trust assumptions unchanged or documented?
- Are row ordinals, duplicate candidates, and result shapes still described
consistently across Rust, Python, C, and Go?
- If a validation rule changes, is it a documented hardening fix rather than a
silent compatibility break?
4 changes: 3 additions & 1 deletion docs/c-api.md
@@ -100,7 +100,9 @@ calling thread's `ordvec_last_error()` string. On failure, they set it to a
human-readable detail string. The pointer returned by `ordvec_last_error()` is
thread-local and valid until the next fallible `ordvec` C call on that same
thread. ABI v1 compatibility expectations are governed by the
-[pre-1.0 compatibility policy](compatibility-policy.md).
+[pre-1.0 compatibility policy](compatibility-policy.md). The cross-language
+ownership, threading, and path-trust contract is summarized in
+[bindings-safety.md](bindings-safety.md).

Panics are caught and returned as `ORDVEC_STATUS_PANIC`; no Rust unwind crosses
the C ABI. The library does not install a global panic hook, so the Rust
3 changes: 2 additions & 1 deletion docs/msrv-and-features.md
@@ -3,7 +3,8 @@
This matrix is the release-facing build contract for downstream embedders,
packagers, and host systems. It complements the
[pre-1.0 compatibility policy](compatibility-policy.md), which defines how
-compatibility-impacting changes are classified.
+compatibility-impacting changes are classified. The release artifact and wheel
+target inventory lives in [artifact-platform-matrix.md](artifact-platform-matrix.md).

Current MSRV: Rust 1.89.

21 changes: 20 additions & 1 deletion ordvec-ffi/src/lib.rs
@@ -978,7 +978,7 @@ pub unsafe extern "C" fn ordvec_index_search(
mod tests {
    use super::*;
    use ordvec::{Bitmap, Rank, SignBitmap};
-    use std::ffi::CString;
+    use std::ffi::{CStr, CString};
    use std::io::Write;

    fn temp_path(name: &str, ext: &str) -> std::path::PathBuf {
@@ -1042,6 +1042,25 @@ mod tests {
        assert_eq!(std::mem::size_of::<ordvec_search_stats_t>(), 184);
    }

    #[test]
    fn ffi_boundary_translates_panic_to_status_and_last_error() {
        clear_last_error();

        let status = ffi_boundary(|| -> Result<(), FfiError> {
            panic!("ffi boundary panic smoke");
        });
        assert_eq!(status, ORDVEC_STATUS_PANIC);

        let last_error = unsafe { CStr::from_ptr(ordvec_last_error()) }
            .to_str()
            .unwrap();
        assert_eq!(last_error, "ffi boundary panic smoke");

        assert_eq!(ffi_boundary(|| Ok(())), ORDVEC_STATUS_OK);
        let cleared = unsafe { CStr::from_ptr(ordvec_last_error()) };
        assert_eq!(cleared.to_bytes(), b"");
    }

    #[test]
    fn load_info_and_free_rankquant() {
        let path = make_rankquant_fixture();
2 changes: 2 additions & 0 deletions ordvec-go/README.md
@@ -1,6 +1,8 @@
# ordvec-go

Thin cgo wrapper over the local `ordvec-ffi` C ABI.
The shared Rust/Python/C/Go ownership and lifetime contract is documented in
[`../docs/bindings-safety.md`](../docs/bindings-safety.md).

Build the Rust library before running Go tests or linking a Go program:

3 changes: 3 additions & 0 deletions ordvec-go/doc.go
@@ -10,4 +10,7 @@
// Candidate slices are entry lists, not sets. Duplicate candidate IDs are scored
// independently and can produce duplicate hits; callers that require unique row
// IDs should deduplicate before Search.
//
// See ../docs/bindings-safety.md for the cross-language ownership and lifetime
// contract shared by the Rust, Python, C, and Go surfaces.
package ordvec
9 changes: 9 additions & 0 deletions ordvec-python/README.md
@@ -62,6 +62,15 @@ Wheels target CPython 3.10+ (abi3) and require `numpy>=2.2`. Building from
source needs a Rust toolchain (MSRV 1.89) and
[maturin](https://www.maturin.rs/).

## Safety contract

The Python binding releases the GIL while Rust searches, scores, and mutates
indexes. NumPy arrays passed to those methods are read in place while the call
is active; do not mutate them from another thread until the method returns.
The cross-language ownership and lifetime contract is maintained in
[`docs/bindings-safety.md`](https://github.com/Fieldnote-Echo/ordvec/blob/v0.5.0/docs/bindings-safety.md)
for this release line.

## Type stubs

The package ships hand-written type stubs (`_ordvec.pyi`) and a `py.typed`