Pull request overview
This PR turns the template crate into the initial egemma 0.1.0 release: a Rust library for producing EmbeddingGemma text embeddings via ONNX Runtime, with a public Embedding/TextEncoder API and supporting options, error handling, SIMD, tests, and CI updates.
Changes:
- Replaces the template library with the core `egemma` implementation: embeddings, text encoding, session construction, options, errors, and SIMD dot-product backends.
- Adds developer-facing validation assets around the new API: unit tests, an opt-in integration test, and an example CLI.
- Renames/package-configures the crate for release and updates CI to reflect the new feature matrix and target constraints.
Reviewed changes
Copilot reviewed 17 out of 18 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| tests/integration.rs | Adds opt-in end-to-end inference tests against real model assets. |
| tests/foo.rs | Removes template placeholder test file. |
| src/text_enc.rs | Implements TextEncoder, batching, tokenizer setup, session validation, and embedding inference flow. |
| src/simd/x86.rs | Adds AVX2+FMA x86_64 dot-product backend and tests. |
| src/simd/scalar.rs | Adds scalar fallback/reference dot-product backend and tests. |
| src/simd/neon.rs | Adds aarch64 NEON dot-product backend and tests. |
| src/simd/mod.rs | Adds SIMD dispatch layer and shared backend documentation/tests. |
| src/session.rs | Adds shared ONNX Runtime session builder and execution-provider registration. |
| src/options.rs | Adds public batch/thread/session option types plus serde support and tests. |
| src/lib.rs | Replaces template crate root with the new public API surface and crate docs. |
| src/error.rs | Adds crate error types for inference, validation, and batching failures. |
| src/embedding.rs | Adds validated Embedding type, cosine similarity, normalization, and tests. |
| examples/foo.rs | Removes template placeholder example. |
| examples/embed_text.rs | Adds a runnable embedding example CLI. |
| benches/foo.rs | Removes template placeholder benchmark. |
| Cargo.toml | Renames and configures the crate, dependencies, features, example/test targets, and docs metadata. |
| .gitignore | Adds ignore rules related to local docs/tooling output. |
| .github/workflows/ci.yml | Adjusts CI to the new feature matrix, cross-target contract, and coverage settings. |
    #[error("session shape mismatch on `{input}`: expected {expected}, got {got:?}")]
    SessionShapeMismatch {
        input: &'static str,
        expected: &'static str,
        got: Vec<i64>,

    //! GitHub Actions does **not** set `EGEMMA_MODEL_DIR`. When unset, every
    //! test in this file emits a `[INTEGRATION-SKIP]` banner and returns
    //! `Ok(())` without loading a model. CI therefore reports them as
    //! `ok` even though no `ort::Session::run` ever happened. This is a

    run: |
      rustup target add ${{ matrix.target }}
      cargo build --target ${{ matrix.target }}
      cargo build --target ${{ matrix.target }} --no-default-features

    "README.md",
    "CHANGELOG.md",
Summary
Replaces the `template-rs` scaffold with the initial release of `egemma`, a Rust ONNX Runtime wrapper for Google's `google/embeddinggemma-300m` text encoder. Produces 768-dim L2-normalized sentence embeddings via [`ort`] and [`tokenizers`].

Mirrors the SigLIP2 text-tower API style (`from_files` / `from_files_with_options` / `from_ort_session_with_options`, `embed` / `embed_batch` / `warmup`), but tailored to the `[batch, sequence_length] → [batch, 768]` contract of the EmbeddingGemma ONNX export (`input_ids` + `attention_mask` in, `sentence_embedding` out).

Public API
- `TextEncoder`: owns one `ort::Session` and one `Tokenizer`. `Send + !Sync`. Constructors: `from_files`, `from_files_with_options`, `from_ort_session`, `from_ort_session_with_options`. Methods: `embed`, `embed_batch` (chunked), `warmup`.
- `Embedding(Arc<[f32]>)`: 768-dim, L2-normalized at construction. Zero panic surface:
  - `try_cosine` returns `Result<f32, Error>` instead of panicking on dim mismatch.
  - `into_inner()` exposes the `Arc<[f32]>` cheaply (no copy).
  - `TryFrom<Vec<f32>>` validates dim and unit-norm for caller-supplied vectors.
  - `Embedding` deliberately does not derive `Serialize` / `Deserialize`; round-trip via the inner slice so the invariants are re-checked.
- `Options` / `BatchOptions` / `ThreadOptions`: `const fn` builders (`with_*` / `set_*`); zero public fields. Validation runs at encoder construction (`batch_size ∈ 1..=max_batch_size`, `max_seq_len > 0`).
- `Error`: `#[non_exhaustive]`, `thiserror`-derived. Includes `EmbeddingDim`, `NotNormalized`, `BatchTooLarge`, `Batch { index, source }`, `SessionShapeMismatch`, `SessionContractMismatch` (dtype errors), `InvalidBatchSize`, `InvalidMaxSeqLen`, etc.

Cargo features
| Feature | Description |
|---|---|
| `inference` | `ort` + `tokenizers`; activates `TextEncoder`. Native targets only. |
| `serde` | `Serialize` / `Deserialize` on `Options`, `BatchOptions`, `ThreadOptions`. `dep:serde` is opt-in. |
| `cuda`, `tensorrt`, `directml`, `rocm`, `coreml` | Execution-provider features. |

The execution-provider features are off by default: none is required for CPU inference, and each requires its vendor SDK at build time.
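For reviewers who want the `Embedding` invariants from the Public API list in concrete form, here is a minimal sketch. It is not the crate's source: the `Error` variant fields, the `NORM_TOLERANCE` value, and the plain `Vec<f32>` storage (the real type wraps `Arc<[f32]>`) are assumptions made for illustration.

```rust
use std::convert::TryFrom;

/// Embedding width stated in the PR description.
const DIM: usize = 768;
/// Hypothetical tolerance for the unit-norm check; the crate's value is not given here.
const NORM_TOLERANCE: f32 = 1e-3;

/// Illustrative error type. Variant names follow the PR text; the fields are assumed.
#[derive(Debug, PartialEq)]
pub enum Error {
    EmbeddingDim { expected: usize, got: usize },
    NotNormalized { norm: f32 },
}

/// Simplified stand-in for the crate's `Embedding`.
#[derive(Debug, Clone)]
pub struct Embedding(Vec<f32>);

impl TryFrom<Vec<f32>> for Embedding {
    type Error = Error;

    /// Rejects vectors with the wrong length or a norm too far from 1.
    fn try_from(v: Vec<f32>) -> Result<Self, Error> {
        if v.len() != DIM {
            return Err(Error::EmbeddingDim { expected: DIM, got: v.len() });
        }
        let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
        if (norm - 1.0).abs() > NORM_TOLERANCE {
            return Err(Error::NotNormalized { norm });
        }
        Ok(Embedding(v))
    }
}

impl Embedding {
    /// For unit vectors the cosine similarity is just the dot product.
    /// Returns an error instead of panicking if the lengths differ.
    pub fn try_cosine(&self, other: &Embedding) -> Result<f32, Error> {
        if self.0.len() != other.0.len() {
            return Err(Error::EmbeddingDim {
                expected: self.0.len(),
                got: other.0.len(),
            });
        }
        Ok(self.0.iter().zip(&other.0).map(|(a, b)| a * b).sum())
    }
}
```

Because every value passes through `try_from`, a caller-supplied vector cannot bypass the dimension and norm checks, which is the reason the description gives for not deriving `Deserialize` on the type.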
SIMD
`Embedding::try_cosine` dispatches the 768-element f32 dot product through a runtime-detected backend:

- x86_64: AVX2 + FMA (`src/simd/x86.rs`), gated on `is_x86_feature_detected!` for both.
- aarch64: NEON (`src/simd/neon.rs`).
- Otherwise: scalar fallback (`src/simd/scalar.rs`).

The unsafe per-arch kernels take `&[f32; 768]` rather than `&[f32]`: the type-level length invariant is what makes the raw-pointer reads sound. A wrong-length slice can never reach the unsafe boundary (the alternative, a `debug_assert!`, would be compiled out in release builds). The dispatcher short-circuits to scalar under `cfg!(miri)` so the Miri matrix exercises the same call sites without entering platform intrinsics it can't model.
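The role of the fixed-size array can be illustrated with a small sketch. The function names (`dot_scalar`, `dot_dispatch`, `try_dot`) are hypothetical, and only the scalar path is modelled: where a real backend would call an `unsafe` AVX2+FMA kernel, this sketch calls the scalar reference again.

```rust
use std::convert::TryInto;

const DIM: usize = 768;

/// Scalar reference kernel. The `&[f32; DIM]` parameters put the length in
/// the type, so no runtime length check is needed inside the kernel.
fn dot_scalar(a: &[f32; DIM], b: &[f32; DIM]) -> f32 {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}

/// Hypothetical dispatcher: scalar under Miri, otherwise whatever the CPU supports.
fn dot_dispatch(a: &[f32; DIM], b: &[f32; DIM]) -> f32 {
    // Miri cannot model platform intrinsics, so stay on the scalar path.
    if cfg!(miri) {
        return dot_scalar(a, b);
    }
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            // Stand-in for an `unsafe` AVX2+FMA kernel (not reproduced here).
            return dot_scalar(a, b);
        }
    }
    dot_scalar(a, b)
}

/// Safe entry point. The slice-to-array conversion is the only place where
/// the length is checked; a wrong-length slice never reaches a kernel.
fn try_dot(a: &[f32], b: &[f32]) -> Option<f32> {
    let a: &[f32; DIM] = a.try_into().ok()?;
    let b: &[f32; DIM] = b.try_into().ok()?;
    Some(dot_dispatch(a, b))
}
```

The conversion from `&[f32]` to `&[f32; DIM]` is a standard-library `TryFrom` impl, so the check survives release builds, unlike a `debug_assert!`.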
Target / feature contract
The `inference` feature is native-only. It pulls `ort` (ONNX Runtime FFI) and `tokenizers` (which transitively depends on C-only libraries like `onig_sys`), neither of which builds on `wasm32-*`. Wasm consumers must opt out by disabling default features (the `cross` CI job builds these targets with `--no-default-features`).

Without `inference`, the public surface is `Embedding`, `Options` / `BatchOptions` / `ThreadOptions`, and `Error`: useful for browser / edge runtimes that compute embeddings server-side and need only the value types and `try_cosine`.

Testing
- Unit tests: `Embedding` invariants, `BatchOptions::validate`, `try_cosine` error paths, SIMD agreement (scalar / NEON / dispatcher), tokenizer constants, per-row vs chunk-level error wrapping in `embed_batch`.
- Integration tests: opt-in via `EGEMMA_MODEL_DIR`. Cover the full `from_files` → `embed_batch` → `try_cosine` flow against the released `embedding-gemma` ONNX. They print a grep-able `[INTEGRATION-SKIP]` banner in CI logs when the env var is unset; running with the model present before merging changes to `text_enc.rs` / `simd/` / `embedding.rs` is a developer-local responsibility.
- Miri: 7-target matrix, including powerpc64-linux, s390x-linux, and riscv64-linux. Runs with `--no-default-features` (Miri can't FFI into `ort` / `tokenizers`); the `cfg!(miri)` short-circuit covers the SIMD call sites via scalar.
- Coverage: `--features inference,serde` (EP features need vendor SDKs).

CI matrix
- `clippy` / `build` / `test`: ubuntu-latest, macos-latest, windows-latest. Test job uses `cargo hack --feature-powerset`. Windows test step excludes `inference` and `default` to dodge an upstream MSVC C-runtime mismatch: `ort_sys` builds with `/MD`, `esaxx-rs` / `onig_sys` build with `/MT`, and `cargo test` triggers a link step (it compiles examples too) where MSVC rejects the mix. Lib-level Windows coverage is still provided by `clippy` and `build`.
- `cross`: 15 targets (wasm32-* and tier-2/3 native) built with `--no-default-features`.
- `miri-tb` / `miri-sb`: 7-target matrix with `--no-default-features`.
- `coverage` (tarpaulin): `--features inference,serde` only, since EP features can't compile on a stock runner.
Caveats
- Model file: `model.onnx` (canonical fp32 export from `onnx-community/embeddinggemma-300m-ONNX`). The model card flags fp16 as an unsupported activation dtype; `model_fp16.onnx` is not auto-discovered. Pass it explicitly via `EGEMMA_MODEL_FILE` only if you've validated quality for your workload.
- Execution-provider features depend on `ort`'s vendor-SDK flags. Coverage for them lives on provisioned runners outside this repo.
- `docs.rs` builds with `inference,serde` only, not `all-features`, for the same reason.
- `embed_batch` failure indexing is row-precise for empty-text / per-row tokenizer / per-row normalization failures, and chunk-level (`base_index`) for tensor-build / ORT-run / output-extract failures. See the method's docstring.
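The split between row-precise and chunk-level failure indices in the last caveat can be sketched as follows. This is a hypothetical model, not the crate's `embed_batch`: `BatchError`, the stubbed `run_chunk`, and the string failure sources are assumptions, and only the empty-text check stands in for the per-row failures.

```rust
/// Illustrative error: `index` is a position in the caller's input slice.
#[derive(Debug, PartialEq)]
pub struct BatchError {
    pub index: usize,
    pub source: &'static str,
}

/// Stub for the chunk-wide steps (tensor build, ORT run, output extraction).
/// Fails for the whole chunk when any text is "<fail>"; otherwise returns a
/// one-element placeholder vector per text.
fn run_chunk(chunk: &[&str]) -> Result<Vec<Vec<f32>>, &'static str> {
    if chunk.iter().any(|t| *t == "<fail>") {
        return Err("chunk inference failed");
    }
    Ok(chunk.iter().map(|t| vec![t.len() as f32]).collect())
}

/// Processes `texts` in chunks of `batch_size` and reports where a failure occurred.
pub fn embed_batch(texts: &[&str], batch_size: usize) -> Result<Vec<Vec<f32>>, BatchError> {
    // The PR validates batch_size at construction; clamp here so the sketch cannot panic.
    let batch_size = batch_size.max(1);
    let mut out = Vec::with_capacity(texts.len());
    for (chunk_no, chunk) in texts.chunks(batch_size).enumerate() {
        let base_index = chunk_no * batch_size;
        // Per-row failure: the offending row is known, so the index is exact.
        for (row, text) in chunk.iter().enumerate() {
            if text.is_empty() {
                return Err(BatchError { index: base_index + row, source: "empty text" });
            }
        }
        // Chunk-wide failure: only the start of the chunk can be reported.
        let rows = run_chunk(chunk)
            .map_err(|source| BatchError { index: base_index, source })?;
        out.extend(rows);
    }
    Ok(out)
}
```

With a batch size of 2, an empty string at position 3 is reported at index 3, while a chunk-wide failure caused by the same position is reported at index 2, the first row of its chunk.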
Migrations / breaking changes
This is the initial release; nothing to migrate from. The crate name flipped from `template-rs` to `egemma` and the public surface is entirely new.