Initial 0.1.0 release: EmbeddingGemma ONNX inference by uqio · Pull Request #1 · Findit-AI/egemma

uqio · 2026-05-02T02:07:48Z

Summary

Replaces the template-rs scaffold with the initial release of
egemma, a Rust ONNX Runtime wrapper for Google's
google/embeddinggemma-300m text encoder. It produces 768-dim
L2-normalized sentence embeddings via the ort and tokenizers crates.

Mirrors the SigLIP2 text-tower API style (from_files /
from_files_with_options / from_ort_session_with_options,
embed / embed_batch / warmup), but tailored to the
[batch, sequence_length] → [batch, 768] contract of the
EmbeddingGemma ONNX export (input_ids + attention_mask in,
sentence_embedding out).

Public API

- TextEncoder: owns one ort::Session and one Tokenizer; Send + !Sync.
  Constructors: from_files, from_files_with_options, from_ort_session,
  from_ort_session_with_options. Methods: embed, embed_batch (chunked),
  warmup.
- Embedding(Arc<[f32]>): 768-dim, L2-normalized at construction.
  Zero panic surface: try_cosine returns Result<f32, Error> instead
  of panicking on a dimension mismatch. into_inner() exposes the
  Arc<[f32]> cheaply (no copy). TryFrom<Vec<f32>> validates dimension
  and unit norm for caller-supplied vectors. Embedding deliberately does
  not derive Serialize/Deserialize; round-trip via the inner slice so the
  invariants are re-checked.
- Options / BatchOptions / ThreadOptions: const fn builders
  (with_* / set_*) with no public fields. Validation runs at encoder
  construction (batch_size ∈ 1..=max_batch_size, max_seq_len > 0).
- Error: #[non_exhaustive], derived with thiserror. Variants include
  EmbeddingDim, NotNormalized, BatchTooLarge, Batch { index, source },
  SessionShapeMismatch, SessionContractMismatch (dtype errors),
  InvalidBatchSize, InvalidMaxSeqLen, and others.
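The Embedding invariants listed above (fixed dimension, unit norm checked on the way in, a fallible cosine, a copy-free into_inner) can be sketched in std-only Rust. This is a minimal illustration under stated assumptions, not egemma's implementation: SketchError and NORM_TOLERANCE are hypothetical stand-ins, since the crate's real Error type and its normalization tolerance are not shown here.

```rust
use std::sync::Arc;

/// Dimension from the description (EmbeddingGemma emits 768 floats).
const DIM: usize = 768;
/// Hypothetical tolerance on |norm - 1|; the crate's actual value is not stated.
const NORM_TOLERANCE: f32 = 1e-3;

/// Hypothetical stand-in for the crate's error variants of the same names.
#[derive(Debug, PartialEq)]
enum SketchError {
    EmbeddingDim { expected: usize, got: usize },
    NotNormalized { norm: f32 },
}

/// Cheaply clonable embedding; the only way to build one is `try_from`.
#[derive(Clone, Debug)]
struct Embedding(Arc<[f32]>);

impl TryFrom<Vec<f32>> for Embedding {
    type Error = SketchError;

    fn try_from(v: Vec<f32>) -> Result<Self, Self::Error> {
        if v.len() != DIM {
            return Err(SketchError::EmbeddingDim { expected: DIM, got: v.len() });
        }
        let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
        // NaN/inf fail the is_finite check; anything else must be ~unit length.
        if !norm.is_finite() || (norm - 1.0).abs() > NORM_TOLERANCE {
            return Err(SketchError::NotNormalized { norm });
        }
        // Arc::from copies the floats once into the Arc's own allocation.
        Ok(Embedding(Arc::from(v)))
    }
}

impl Embedding {
    /// For unit vectors, cosine similarity is just the dot product.
    /// A length mismatch is reported as an error instead of a panic.
    fn try_cosine(&self, other: &Embedding) -> Result<f32, SketchError> {
        if self.0.len() != other.0.len() {
            return Err(SketchError::EmbeddingDim { expected: self.0.len(), got: other.0.len() });
        }
        Ok(self.0.iter().zip(other.0.iter()).map(|(a, b)| a * b).sum::<f32>())
    }

    /// Hands back the shared buffer; no float data is copied.
    fn into_inner(self) -> Arc<[f32]> {
        self.0
    }
}
```

Because the constructor is the only entry point, every Embedding in circulation already satisfies the invariants, which is also why round-tripping through serde would need to go back through try_from rather than a derived Deserialize.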

Cargo features

| Feature | Default | Effect |
| --- | --- | --- |
| `inference` | ✅ | Pulls `ort` + `tokenizers`; activates `TextEncoder`. Native targets only. |
| `serde` | | `Serialize` / `Deserialize` on `Options`, `BatchOptions`, `ThreadOptions`. `dep:serde` is opt-in. |
| `cuda` | | NVIDIA GPUs. Requires CUDA toolkit + cuDNN at build/run time. |
| `tensorrt` | | NVIDIA, optimized inference. Falls back to CUDA, then CPU. |
| `directml` | | Windows GPUs (any vendor) via DirectX 12. |
| `rocm` | | AMD GPUs. Requires ROCm SDK. |
| `coreml` | | macOS / iOS via Core ML (Neural Engine + GPU + Metal). |

The execution-provider features are off by default: none is required for
CPU inference, and each requires its vendor SDK at build time.

SIMD

Embedding::try_cosine dispatches the 768-element f32 dot product
through a runtime-detected backend:

- NEON on aarch64 (ISA baseline; always available).
- AVX2 + FMA on x86_64, gated on is_x86_feature_detected! for both.
- Scalar four-accumulator fallback elsewhere.

The unsafe per-arch kernels take &[f32; 768] rather than &[f32]:
the type-level length invariant is what makes the raw-pointer reads
sound, so a wrong-length slice can never reach the unsafe boundary.
The alternative, a debug_assert! on the slice length, is compiled out
of release builds and would leave those reads unchecked. The
dispatcher short-circuits to scalar under cfg!(miri), so the Miri
matrix exercises the same call sites without entering platform
intrinsics that Miri can't model.
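A std-only sketch of the dispatch pattern described above, for x86_64 only: an array-typed AVX2 + FMA kernel, runtime detection of both features, a four-accumulator scalar fallback, and the cfg!(miri) short-circuit. The function names are hypothetical and the NEON arm is omitted; this illustrates the technique rather than reproducing the crate's simd/ module.

```rust
/// Embedding width from the description.
const DIM: usize = 768;

/// Portable reference: four independent accumulators shorten the
/// floating-point dependency chain so the loop can pipeline.
fn dot_scalar(a: &[f32; DIM], b: &[f32; DIM]) -> f32 {
    let mut acc = [0.0f32; 4];
    for (ca, cb) in a.chunks_exact(4).zip(b.chunks_exact(4)) {
        for lane in 0..4 {
            acc[lane] += ca[lane] * cb[lane];
        }
    }
    (acc[0] + acc[1]) + (acc[2] + acc[3])
}

/// AVX2 + FMA kernel. Taking `&[f32; DIM]` (not `&[f32]`) means the
/// 8-lane loads below can never run past the end of either input.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2,fma")]
unsafe fn dot_avx2_fma(a: &[f32; DIM], b: &[f32; DIM]) -> f32 {
    use std::arch::x86_64::{_mm256_fmadd_ps, _mm256_loadu_ps, _mm256_setzero_ps, _mm256_storeu_ps};
    unsafe {
        let mut acc = _mm256_setzero_ps();
        // DIM is a multiple of 8, so every load of i..i+8 stays in bounds.
        for i in (0..DIM).step_by(8) {
            let va = _mm256_loadu_ps(a.as_ptr().add(i));
            let vb = _mm256_loadu_ps(b.as_ptr().add(i));
            acc = _mm256_fmadd_ps(va, vb, acc);
        }
        // Horizontal sum of the 8 lanes.
        let mut lanes = [0.0f32; 8];
        _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
        lanes.iter().sum()
    }
}

/// Runtime dispatcher: scalar under Miri, AVX2 + FMA when both are
/// detected, scalar otherwise.
fn dot(a: &[f32; DIM], b: &[f32; DIM]) -> f32 {
    if cfg!(miri) {
        return dot_scalar(a, b);
    }
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            // SAFETY: both CPU features were detected at runtime, and the
            // array types guarantee DIM readable floats behind each pointer.
            return unsafe { dot_avx2_fma(a, b) };
        }
    }
    dot_scalar(a, b)
}
```

The SIMD and scalar paths sum in different orders, so for arbitrary inputs they can disagree in the last few bits; agreement tests between backends therefore need a tolerance rather than exact equality.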

Target / feature contract

The inference feature is native-only. It pulls ort (ONNX
Runtime FFI) and tokenizers (which transitively depends on C
libraries through crates like onig_sys), neither of which builds on wasm32-*.

Wasm consumers must opt out:

cargo check --target wasm32-unknown-unknown --no-default-features

Without inference, the public surface is Embedding, Options /
BatchOptions / ThreadOptions, and Error — useful for browser /
edge runtimes that compute embeddings server-side and need only the
value types and try_cosine.

Testing

- Unit tests: 34 with default features, 24 without. They cover
  Embedding invariants, BatchOptions::validate, try_cosine error
  paths, SIMD agreement (scalar / NEON / dispatcher), tokenizer
  constants, and per-row vs chunk-level error wrapping in embed_batch.
- Integration tests: 4 tests gated on EGEMMA_MODEL_DIR. They cover the
  full from_files → embed_batch → try_cosine flow against the
  released EmbeddingGemma ONNX export. When the env var is unset, they
  print a grep-able [INTEGRATION-SKIP] banner in CI logs; running them
  with the model present before merging changes to text_enc.rs, simd/,
  or embedding.rs is the developer's responsibility.
- Miri: tested on aarch64-apple-darwin, x86_64-linux, i686-linux,
  powerpc64-linux, s390x-linux, riscv64-linux. Runs with
  --no-default-features (Miri can't FFI into ort/tokenizers); the
  cfg!(miri) short-circuit covers the SIMD call sites via scalar.
- Sanitizers: ASAN/LSAN/MSAN/TSAN on x86_64-linux with
  --features inference,serde (EP features need vendor SDKs).
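The env-var gating of the integration tests can be sketched as a small helper. This is an assumption-labeled illustration: model_dir_from, model_dir, and embeds_end_to_end are hypothetical names, not the contents of tests/integration.rs, and the model-loading step is elided because its exact signature is not shown here. Passing the variable's value in as a parameter keeps the decision testable without mutating the process environment.

```rust
use std::ffi::OsString;
use std::path::PathBuf;

/// Hypothetical helper: turn the value of EGEMMA_MODEL_DIR into a path,
/// or print a grep-able skip banner and return None.
fn model_dir_from(var: Option<OsString>, test_name: &str) -> Option<PathBuf> {
    match var {
        Some(dir) if !dir.is_empty() => Some(PathBuf::from(dir)),
        _ => {
            eprintln!("[INTEGRATION-SKIP] {test_name}: EGEMMA_MODEL_DIR is not set");
            None
        }
    }
}

/// Thin wrapper that reads the real environment variable.
fn model_dir(test_name: &str) -> Option<PathBuf> {
    model_dir_from(std::env::var_os("EGEMMA_MODEL_DIR"), test_name)
}

/// Shape of one gated test body (hypothetical name).
fn embeds_end_to_end() -> Result<(), Box<dyn std::error::Error>> {
    let Some(dir) = model_dir("embeds_end_to_end") else {
        // The harness reports this as `ok`, which is why the banner matters.
        return Ok(());
    };
    // Model loading and assertions would go here; elided in this sketch.
    println!("would load model from {}", dir.display());
    Ok(())
}
```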

CI matrix

- clippy / build / test: ubuntu-latest, macos-latest,
  windows-latest. The test job uses cargo hack --feature-powerset.
  The Windows test step excludes the inference and default features to
  dodge an upstream MSVC C-runtime mismatch: ort_sys builds with /MD,
  esaxx-rs/onig_sys build with /MT, and cargo test triggers a link step
  (it compiles examples too) where MSVC rejects the mix. Lib-level
  Windows coverage is still provided by clippy and build.
- cross: 15 targets (wasm32-* and tier-2/3 native) built with
  --no-default-features.
- miri-tb / miri-sb: 7-target matrix with --no-default-features.
- coverage (tarpaulin): --features inference,serde only, since
  EP features can't compile on a stock runner.

Caveats

- Auto-discovery prefers model.onnx (the canonical fp32 export from
  onnx-community/embeddinggemma-300m-ONNX). The model card flags fp16
  as an unsupported activation dtype, so model_fp16.onnx is not
  auto-discovered; pass it explicitly via EGEMMA_MODEL_FILE only if
  you've validated quality for your workload.
- EP features are not built in CI because they pull ort's vendor-SDK
  flags. Coverage for them lives on provisioned runners outside this
  repo.
- docs.rs builds with inference,serde only, not all-features,
  for the same reason.
- embed_batch failure indexing is row-precise for empty-text,
  per-row tokenizer, and per-row normalization failures, and
  chunk-level (base_index) for tensor-build, ORT-run, and
  output-extract failures. See the method's docstring.
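The row-precise vs chunk-level indexing rule can be illustrated with a std-only sketch. BatchSketchError, embed_batch_sketch, and the run_chunk closure (standing in for tensor build, ORT run, and output extraction) are all hypothetical; batch_size is validated inline here for brevity, whereas the description says the crate validates it at encoder construction.

```rust
/// Hypothetical error shape mirroring the description's Batch { index, source }.
#[derive(Debug, PartialEq)]
enum BatchSketchError {
    InvalidBatchSize(usize),
    EmptyText,
    Run(String),
    Batch { index: usize, source: Box<BatchSketchError> },
}

/// Splits `texts` into chunks of `batch_size`; `run_chunk` stands in for
/// the whole-chunk work (tensor build, ORT run, output extraction).
fn embed_batch_sketch<F>(
    texts: &[&str],
    batch_size: usize,
    max_batch_size: usize,
    mut run_chunk: F,
) -> Result<Vec<Vec<f32>>, BatchSketchError>
where
    F: FnMut(&[&str]) -> Result<Vec<Vec<f32>>, String>,
{
    // Validate first: `chunks(0)` would panic.
    if batch_size == 0 || batch_size > max_batch_size {
        return Err(BatchSketchError::InvalidBatchSize(batch_size));
    }
    let mut out = Vec::with_capacity(texts.len());
    for (chunk_idx, chunk) in texts.chunks(batch_size).enumerate() {
        let base_index = chunk_idx * batch_size;
        // Per-row checks know exactly which row failed.
        if let Some(offset) = chunk.iter().position(|t| t.is_empty()) {
            return Err(BatchSketchError::Batch {
                index: base_index + offset,
                source: Box::new(BatchSketchError::EmptyText),
            });
        }
        // Whole-chunk failures can only be attributed to the chunk's first row.
        let rows = run_chunk(chunk).map_err(|e| BatchSketchError::Batch {
            index: base_index,
            source: Box::new(BatchSketchError::Run(e)),
        })?;
        out.extend(rows);
    }
    Ok(out)
}
```

For example, with batch_size = 2, an empty fourth text is reported at index 3, while a runtime failure anywhere in the second chunk is reported at index 2.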

Migrations / breaking changes

This is the initial release; nothing to migrate from. The crate name
flipped from template-rs to egemma and the public surface is
entirely new.

Copilot

Pull request overview

This PR turns the template crate into the initial egemma 0.1.0 release: a Rust library for producing EmbeddingGemma text embeddings via ONNX Runtime, with a public Embedding/TextEncoder API and supporting options, error handling, SIMD, tests, and CI updates.

Changes:

Replaces the template library with the core egemma implementation: embeddings, text encoding, session construction, options, errors, and SIMD dot-product backends.
Adds developer-facing validation assets around the new API: unit tests, an opt-in integration test, and an example CLI.
Renames/package-configures the crate for release and updates CI to reflect the new feature matrix and target constraints.

Reviewed changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated 11 comments.


| File | Description |
| --- | --- |
| `tests/integration.rs` | Adds opt-in end-to-end inference tests against real model assets. |
| `tests/foo.rs` | Removes template placeholder test file. |
| `src/text_enc.rs` | Implements `TextEncoder`, batching, tokenizer setup, session validation, and embedding inference flow. |
| `src/simd/x86.rs` | Adds AVX2+FMA x86_64 dot-product backend and tests. |
| `src/simd/scalar.rs` | Adds scalar fallback/reference dot-product backend and tests. |
| `src/simd/neon.rs` | Adds aarch64 NEON dot-product backend and tests. |
| `src/simd/mod.rs` | Adds SIMD dispatch layer and shared backend documentation/tests. |
| `src/session.rs` | Adds shared ONNX Runtime session builder and execution-provider registration. |
| `src/options.rs` | Adds public batch/thread/session option types plus serde support and tests. |
| `src/lib.rs` | Replaces template crate root with the new public API surface and crate docs. |
| `src/error.rs` | Adds crate error types for inference, validation, and batching failures. |
| `src/embedding.rs` | Adds validated `Embedding` type, cosine similarity, normalization, and tests. |
| `examples/foo.rs` | Removes template placeholder example. |
| `examples/embed_text.rs` | Adds a runnable embedding example CLI. |
| `benches/foo.rs` | Removes template placeholder benchmark. |
| `Cargo.toml` | Renames and configures the crate, dependencies, features, example/test targets, and docs metadata. |
| `.gitignore` | Adds ignore rules related to local docs/tooling output. |
| `.github/workflows/ci.yml` | Adjusts CI to the new feature matrix, cross-target contract, and coverage settings. |


+  #[error("session shape mismatch on `{input}`: expected {expected}, got {got:?}")]
+  SessionShapeMismatch {
+    input: &'static str,
+    expected: &'static str,
+    got: Vec<i64>,


+//! GitHub Actions does **not** set `EGEMMA_MODEL_DIR`. When unset, every
+//! test in this file emits a `[INTEGRATION-SKIP]` banner and returns
+//! `Ok(())` without loading a model. CI therefore reports them as
+//! `ok` even though no `ort::Session::run` ever happened. This is a


        run: |
          rustup target add ${{ matrix.target }}
-          cargo build --target ${{ matrix.target }}
+          cargo build --target ${{ matrix.target }} --no-default-features


+    "README.md",
+    "CHANGELOG.md",


codecov · 2026-05-02T04:52:24Z

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

ℹ️ You can also turn on project coverage checks and project coverage reporting on Pull Request comment

Thanks for integrating Codecov - We've got you covered ☂️

al8n requested a review from Copilot May 2, 2026 02:08

fix (befd6d1)

uqio force-pushed the 0.1.0 branch from b4062d3 to befd6d1 May 2, 2026 02:10

Copilot started reviewing on behalf of al8n May 2, 2026 02:12

Copilot AI reviewed May 2, 2026


uqio added 7 commits May 2, 2026 15:17

- cleanup (e1d64c2)
- cleanup (06ba1fe)
- cleanup (268fdbd)
- cleanup (342b564)
- cleanup (f23655f)
- cleanup (860d2b8)
- cleanup (a025d23)

uqio changed the title from 0.1.0 to Initial 0.1.0 release: EmbeddingGemma ONNX inference May 2, 2026

uqio added 3 commits May 2, 2026 18:08

- cleanup (1085964)
- cleanup (0297513)
- cleanup (4c2ec5f)

uqio merged commit ec85c7a into main May 2, 2026
41 of 43 checks passed

uqio deleted the 0.1.0 branch May 2, 2026 09:11

uqio added a commit that referenced this pull request May 2, 2026

Initial 0.1.0 release: EmbeddingGemma ONNX inference (#1) (3126adf)

uqio added a commit that referenced this pull request May 4, 2026

Initial 0.1.0 release: EmbeddingGemma ONNX inference (#1) (a603061)

uqio merged 11 commits into main from 0.1.0

2 participants
