Initial 0.1.0 release: EmbeddingGemma ONNX inference #1

Merged
uqio merged 11 commits into main from 0.1.0
May 2, 2026

Conversation

@uqio
Collaborator

@uqio uqio commented May 2, 2026

Summary

Replaces the template-rs scaffold with the initial release of
egemma — a Rust ONNX Runtime wrapper for Google's
google/embeddinggemma-300m text encoder. Produces 768-dim
L2-normalized sentence embeddings via [ort] and [tokenizers].

Mirrors the SigLIP2 text-tower API style (from_files /
from_files_with_options / from_ort_session_with_options,
embed / embed_batch / warmup), but tailored to the
[batch, sequence_length] → [batch, 768] contract of the
EmbeddingGemma ONNX export (input_ids + attention_mask in,
sentence_embedding out).
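
To make the [batch, sequence_length] input contract concrete, here is a minimal sketch of padding tokenized rows into flat input_ids and attention_mask buffers. The function name pad_batch, the caller-supplied pad_id, and the i64 element type are assumptions for illustration; the PR text does not show how the crate builds its tensors.

```rust
use std::iter;

/// Pads variable-length token-id rows into flat, row-major
/// `[batch, sequence_length]` buffers for `input_ids` and `attention_mask`.
/// `pad_id` is a caller-supplied placeholder, not the real tokenizer's pad id.
pub fn pad_batch(rows: &[Vec<i64>], pad_id: i64) -> (Vec<i64>, Vec<i64>, [usize; 2]) {
    let batch = rows.len();
    // The longest row sets the shared sequence length for the whole batch.
    let seq_len = rows.iter().map(|r| r.len()).max().unwrap_or(0);

    let mut input_ids = Vec::with_capacity(batch * seq_len);
    let mut attention_mask = Vec::with_capacity(batch * seq_len);
    for row in rows {
        let padding = seq_len - row.len();
        // Real tokens first, then padding up to `seq_len`.
        input_ids.extend_from_slice(row);
        input_ids.extend(iter::repeat(pad_id).take(padding));
        // The mask is 1 over real tokens and 0 over padding.
        attention_mask.extend(iter::repeat(1i64).take(row.len()));
        attention_mask.extend(iter::repeat(0i64).take(padding));
    }
    (input_ids, attention_mask, [batch, seq_len])
}
```

Both buffers share the returned shape, so each can be handed to a tensor constructor as-is; the model then maps them to a [batch, 768] output.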

Public API

  • TextEncoder — owns one ort::Session and one Tokenizer.
    Send + !Sync. Constructors: from_files, from_files_with_options,
    from_ort_session, from_ort_session_with_options. Methods:
    embed, embed_batch (chunked), warmup.
  • Embedding(Arc<[f32]>) — 768-dim, L2-normalized at construction.
    Zero panic surface: try_cosine returns Result<f32, Error> instead
    of panicking on dim mismatch. into_inner() exposes the Arc<[f32]>
    cheaply (no copy). TryFrom<Vec<f32>> validates dim and unit-norm
    for caller-supplied vectors. Embedding deliberately does not derive
    Serialize/Deserialize — round-trip via the inner slice so the
    invariants are re-checked.
  • Options / BatchOptions / ThreadOptions: const fn builders
    (with_*/set_*); zero public fields. Validation runs at encoder
    construction (batch_size ∈ 1..=max_batch_size, max_seq_len > 0).
  • Error: #[non_exhaustive], thiserror-derived. Includes
    EmbeddingDim, NotNormalized, BatchTooLarge, Batch { index, source },
    SessionShapeMismatch, SessionContractMismatch (dtype errors),
    InvalidBatchSize, InvalidMaxSeqLen, etc.
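
The Embedding invariants described above can be sketched with the standard library alone. This is not the crate's source: SketchError stands in for the real Error enum, NORM_TOLERANCE is a made-up value, and the field names on the error variants are guesses.

```rust
use std::convert::TryFrom;
use std::sync::Arc;

/// Output width stated in the PR description.
pub const DIM: usize = 768;
/// Hypothetical tolerance for the unit-norm check; the crate's value is not given.
const NORM_TOLERANCE: f32 = 1e-3;

/// Stand-in for the crate's `Error`; variant fields are assumptions.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    EmbeddingDim { expected: usize, got: usize },
    NotNormalized { norm: f32 },
}

#[derive(Debug, Clone)]
pub struct Embedding(Arc<[f32]>);

impl TryFrom<Vec<f32>> for Embedding {
    type Error = SketchError;

    /// Validates dimension and unit norm for caller-supplied vectors.
    fn try_from(values: Vec<f32>) -> Result<Self, Self::Error> {
        if values.len() != DIM {
            return Err(SketchError::EmbeddingDim { expected: DIM, got: values.len() });
        }
        let norm = values.iter().map(|x| x * x).sum::<f32>().sqrt();
        // Written as a negated `<=` so that a NaN norm is rejected as well.
        if !((norm - 1.0).abs() <= NORM_TOLERANCE) {
            return Err(SketchError::NotNormalized { norm });
        }
        Ok(Embedding(Arc::from(values)))
    }
}

impl Embedding {
    /// Cosine similarity; for unit vectors this reduces to the dot product.
    /// Returns an error instead of panicking on a length mismatch.
    pub fn try_cosine(&self, other: &Embedding) -> Result<f32, SketchError> {
        if self.0.len() != other.0.len() {
            return Err(SketchError::EmbeddingDim { expected: self.0.len(), got: other.0.len() });
        }
        let dot: f32 = self.0.iter().zip(other.0.iter()).map(|(a, b)| a * b).sum();
        Ok(dot)
    }

    /// Hands back the shared buffer; moving an `Arc` copies no floats.
    pub fn into_inner(self) -> Arc<[f32]> {
        self.0
    }
}
```

Because the only way in is a validating constructor, a serialized vector that is read back must pass through TryFrom again, which is the re-check the description refers to.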

Cargo features

| Feature | Default | Effect |
|---|---|---|
| inference | yes | Pulls ort + tokenizers; activates TextEncoder. Native targets only. |
| serde | no | Serialize / Deserialize on Options, BatchOptions, ThreadOptions. dep:serde is opt-in. |
| cuda | no | NVIDIA GPUs. Requires CUDA toolkit + cuDNN at build/run time. |
| tensorrt | no | NVIDIA, optimized inference. Falls back to CUDA, then CPU. |
| directml | no | Windows GPUs (any vendor) via DirectX 12. |
| rocm | no | AMD GPUs. Requires ROCm SDK. |
| coreml | no | macOS / iOS via Core ML (Neural Engine + GPU + Metal). |

The execution-provider features are off by default — none required for
CPU inference, and each requires its vendor SDK at build time.

SIMD

Embedding::try_cosine dispatches the 768-element f32 dot product
through a runtime-detected backend:

  • NEON on aarch64 (ISA baseline; always available).
  • AVX2 + FMA on x86_64, gated on is_x86_feature_detected! for both.
  • Scalar four-accumulator fallback elsewhere.

The unsafe per-arch kernels take &[f32; 768] rather than &[f32]: the
type-level length invariant is what makes the raw-pointer reads
sound, so a wrong-length slice can never reach the unsafe boundary.
(The alternative, a debug_assert! on the slice length, is compiled out
of release builds.) The dispatcher short-circuits to scalar under
cfg!(miri) so the Miri matrix exercises the same call sites without
entering platform intrinsics it can't model.
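
A rough sketch of the dispatch pattern follows, assuming function names (dot, dot_scalar, dot_avx2_fma) that are not taken from the crate. It covers the scalar and x86_64 paths only; the NEON kernel is omitted, so aarch64 falls through to scalar here, unlike the real dispatcher.

```rust
const DIM: usize = 768;

/// Scalar fallback with four independent accumulators.
pub fn dot_scalar(a: &[f32; DIM], b: &[f32; DIM]) -> f32 {
    let mut acc = [0.0f32; 4];
    // DIM is a multiple of 4, so no remainder loop is needed.
    for i in (0..DIM).step_by(4) {
        acc[0] += a[i] * b[i];
        acc[1] += a[i + 1] * b[i + 1];
        acc[2] += a[i + 2] * b[i + 2];
        acc[3] += a[i + 3] * b[i + 3];
    }
    (acc[0] + acc[1]) + (acc[2] + acc[3])
}

/// AVX2 + FMA kernel. Callers must have verified both CPU features.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2,fma")]
unsafe fn dot_avx2_fma(a: &[f32; DIM], b: &[f32; DIM]) -> f32 {
    use std::arch::x86_64::*;
    let mut lanes = [0.0f32; 8];
    // SAFETY: the array type fixes the length at 768, a multiple of 8,
    // so every 8-float unaligned load stays inside both buffers.
    unsafe {
        let mut acc = _mm256_setzero_ps();
        for i in (0..DIM).step_by(8) {
            let va = _mm256_loadu_ps(a.as_ptr().add(i));
            let vb = _mm256_loadu_ps(b.as_ptr().add(i));
            acc = _mm256_fmadd_ps(va, vb, acc);
        }
        _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    }
    lanes.iter().sum::<f32>()
}

/// Runtime dispatcher: scalar under Miri, AVX2 + FMA when detected, else scalar.
pub fn dot(a: &[f32; DIM], b: &[f32; DIM]) -> f32 {
    if cfg!(miri) {
        return dot_scalar(a, b);
    }
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            // SAFETY: both required features were detected on this CPU.
            return unsafe { dot_avx2_fma(a, b) };
        }
    }
    dot_scalar(a, b)
}
```

Taking &[f32; DIM] means the length check happens once, at the point where a slice is converted to an array reference, and the kernels need no bounds logic of their own.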

Target / feature contract

The inference feature is native-only. It pulls ort (ONNX
Runtime FFI) and tokenizers (which transitively depends on C-only
libraries like onig_sys), neither of which builds on wasm32-*.

Wasm consumers must opt out:

cargo check --target wasm32-unknown-unknown --no-default-features

Without inference, the public surface is Embedding, Options /
BatchOptions / ThreadOptions, and Error — useful for browser /
edge runtimes that compute embeddings server-side and need only the
value types and try_cosine.

Testing

  • Unit tests: 34 with default features, 24 without. Cover
    Embedding invariants, BatchOptions::validate, try_cosine error
    paths, SIMD agreement (scalar / NEON / dispatcher), tokenizer constants,
    per-row vs chunk-level error wrapping in embed_batch.
  • Integration tests: 4 tests gated on EGEMMA_MODEL_DIR. Cover the
    full from_files → embed_batch → try_cosine flow against the
    released embedding-gemma ONNX. Print a grep-able [INTEGRATION-SKIP]
    banner in CI logs when the env var is unset; it is the developer's
    responsibility to run them locally with the model present before
    merging changes to text_enc.rs / simd/ / embedding.rs.
  • Miri: tested on aarch64-apple-darwin, x86_64-linux, i686-linux,
    powerpc64-linux, s390x-linux, riscv64-linux. Runs with
    --no-default-features (Miri can't FFI into ort/tokenizers); the
    cfg!(miri) short-circuit covers the SIMD call sites via scalar.
  • Sanitizers: ASAN/LSAN/MSAN/TSAN on x86_64-linux with
    --features inference,serde (EP features need vendor SDKs).
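
The env-gated skip pattern for the integration tests could look roughly like the sketch below. The helper names and the choice to treat an empty value as unset are assumptions; only the EGEMMA_MODEL_DIR variable and the [INTEGRATION-SKIP] banner come from the description.

```rust
use std::env;
use std::path::PathBuf;

/// Environment variable named in the PR description.
pub const MODEL_DIR_VAR: &str = "EGEMMA_MODEL_DIR";

/// Resolves the model directory from a raw variable value. When the value
/// is missing (or empty, an assumption of this sketch), prints a grep-able
/// banner and returns `None` so the caller can skip.
pub fn resolve_model_dir(raw: Option<&str>, test_name: &str) -> Option<PathBuf> {
    match raw {
        Some(dir) if !dir.is_empty() => Some(PathBuf::from(dir)),
        _ => {
            eprintln!("[INTEGRATION-SKIP] {}: {} is unset", test_name, MODEL_DIR_VAR);
            None
        }
    }
}

/// Reads the variable from the process environment and resolves it.
pub fn model_dir_from_env(test_name: &str) -> Option<PathBuf> {
    let raw = env::var(MODEL_DIR_VAR).ok();
    resolve_model_dir(raw.as_deref(), test_name)
}

/// Shape of a gated test body: returns `Ok(())` early when skipping.
pub fn full_flow_sketch() -> Result<(), String> {
    let model_dir = match model_dir_from_env("full_flow_sketch") {
        Some(dir) => dir,
        None => return Ok(()),
    };
    // A real test would load the model from `model_dir` and run inference here.
    let _ = model_dir;
    Ok(())
}
```

A test that skips this way is still reported as passing, so the banner is the only signal in the logs that no inference ran.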

CI matrix

  • clippy / build / test: ubuntu-latest, macos-latest,
    windows-latest. Test job uses cargo hack --feature-powerset.
    Windows test step excludes inference and default to dodge an
    upstream MSVC C-runtime mismatch — ort_sys builds with /MD,
    esaxx-rs/onig_sys build with /MT, and cargo test triggers a
    link step (it compiles examples too) where MSVC rejects the mix.
    Lib-level Windows coverage is still provided by clippy and build.
  • cross: 15 targets (wasm32-* and tier-2/3 native) built with
    --no-default-features.
  • miri-tb / miri-sb: 7-target matrix with --no-default-features.
  • coverage (tarpaulin): --features inference,serde only, since
    EP features can't compile on a stock runner.

Caveats

  • Auto-discovery prefers model.onnx (canonical fp32 export from
    onnx-community/embeddinggemma-300m-ONNX). The model card flags fp16
    as an unsupported activation dtype; model_fp16.onnx is not
    auto-discovered — pass it explicitly via EGEMMA_MODEL_FILE only if
    you've validated quality for your workload.
  • EP features are not built in CI — they pull ort's vendor-SDK
    flags. Coverage for them lives on provisioned runners outside this
    repo.
  • docs.rs builds with inference,serde only, not all-features,
    for the same reason.
  • embed_batch failure indexing is row-precise for empty-text /
    per-row tokenizer / per-row normalization failures, and chunk-level
    (base_index) for tensor-build / ORT-run / output-extract failures.
    See the method's docstring.
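
The two indexing granularities for embed_batch failures can be illustrated with a small sketch. Everything here is hypothetical: the run_chunk closure stands in for tensor build, the ORT run, and output extraction, and the usize outputs stand in for embeddings.

```rust
/// Stand-in for the crate's `Error::Batch { index, source }` variant.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    Batch { index: usize, source: String },
}

/// Processes `texts` in chunks of `batch_size`.
/// Per-row failures report the exact row; chunk failures report `base_index`.
pub fn embed_batch_sketch<F>(
    texts: &[&str],
    batch_size: usize,
    run_chunk: F,
) -> Result<Vec<usize>, SketchError>
where
    F: Fn(&[&str]) -> Result<Vec<usize>, String>,
{
    // The real crate validates batch_size at construction; clamp here instead.
    let batch_size = batch_size.max(1);
    let mut out = Vec::with_capacity(texts.len());
    for (chunk_no, chunk) in texts.chunks(batch_size).enumerate() {
        let base_index = chunk_no * batch_size;
        // Row-precise: an empty text is attributed to its own position.
        for (offset, text) in chunk.iter().enumerate() {
            if text.is_empty() {
                return Err(SketchError::Batch {
                    index: base_index + offset,
                    source: "empty text".to_string(),
                });
            }
        }
        // Chunk-level: a failure of the whole run can only name the chunk start.
        let rows = run_chunk(chunk)
            .map_err(|source| SketchError::Batch { index: base_index, source })?;
        out.extend(rows);
    }
    Ok(out)
}
```

With a batch size of 2, an empty string at position 3 yields index 3, while a failed run on the second chunk yields index 2, the first row of that chunk.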

Migrations / breaking changes

This is the initial release; nothing to migrate from. The crate name
flipped from template-rs to egemma and the public surface is
entirely new.

@al8n al8n requested a review from Copilot May 2, 2026 02:08

Copilot AI left a comment

Pull request overview

This PR turns the template crate into the initial egemma 0.1.0 release: a Rust library for producing EmbeddingGemma text embeddings via ONNX Runtime, with a public Embedding/TextEncoder API and supporting options, error handling, SIMD, tests, and CI updates.

Changes:

  • Replaces the template library with the core egemma implementation: embeddings, text encoding, session construction, options, errors, and SIMD dot-product backends.
  • Adds developer-facing validation assets around the new API: unit tests, an opt-in integration test, and an example CLI.
  • Renames/package-configures the crate for release and updates CI to reflect the new feature matrix and target constraints.

Reviewed changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
tests/integration.rs Adds opt-in end-to-end inference tests against real model assets.
tests/foo.rs Removes template placeholder test file.
src/text_enc.rs Implements TextEncoder, batching, tokenizer setup, session validation, and embedding inference flow.
src/simd/x86.rs Adds AVX2+FMA x86_64 dot-product backend and tests.
src/simd/scalar.rs Adds scalar fallback/reference dot-product backend and tests.
src/simd/neon.rs Adds aarch64 NEON dot-product backend and tests.
src/simd/mod.rs Adds SIMD dispatch layer and shared backend documentation/tests.
src/session.rs Adds shared ONNX Runtime session builder and execution-provider registration.
src/options.rs Adds public batch/thread/session option types plus serde support and tests.
src/lib.rs Replaces template crate root with the new public API surface and crate docs.
src/error.rs Adds crate error types for inference, validation, and batching failures.
src/embedding.rs Adds validated Embedding type, cosine similarity, normalization, and tests.
examples/foo.rs Removes template placeholder example.
examples/embed_text.rs Adds a runnable embedding example CLI.
benches/foo.rs Removes template placeholder benchmark.
Cargo.toml Renames and configures the crate, dependencies, features, example/test targets, and docs metadata.
.gitignore Adds ignore rules related to local docs/tooling output.
.github/workflows/ci.yml Adjusts CI to the new feature matrix, cross-target contract, and coverage settings.

Comment thread src/error.rs Outdated
Comment thread src/error.rs
Comment on lines +27 to +31
#[error("session shape mismatch on `{input}`: expected {expected}, got {got:?}")]
SessionShapeMismatch {
input: &'static str,
expected: &'static str,
got: Vec<i64>,
Comment thread tests/integration.rs
Comment on lines +22 to +25
//! GitHub Actions does **not** set `EGEMMA_MODEL_DIR`. When unset, every
//! test in this file emits a `[INTEGRATION-SKIP]` banner and returns
//! `Ok(())` without loading a model. CI therefore reports them as
//! `ok` even though no `ort::Session::run` ever happened. This is a
Comment thread .github/workflows/ci.yml
Comment on lines 112 to +114
run: |
rustup target add ${{ matrix.target }}
cargo build --target ${{ matrix.target }}
cargo build --target ${{ matrix.target }} --no-default-features
Comment thread src/options.rs Outdated
Comment thread Cargo.toml
Comment on lines +15 to +16
"README.md",
"CHANGELOG.md",
Comment thread src/text_enc.rs Outdated
Comment thread src/simd/mod.rs Outdated
Comment thread Cargo.toml Outdated
Comment thread src/text_enc.rs Outdated
@codecov

codecov Bot commented May 2, 2026

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

ℹ️ You can also turn on project coverage checks and project coverage reporting on Pull Request comment

Thanks for integrating Codecov - We've got you covered ☂️

@uqio uqio changed the title from "0.1.0" to "Initial 0.1.0 release: EmbeddingGemma ONNX inference" May 2, 2026
@uqio uqio merged commit ec85c7a into main May 2, 2026
41 of 43 checks passed
@uqio uqio deleted the 0.1.0 branch May 2, 2026 09:11

2 participants