Summary
When fastembed is used in a GPU environment alongside onnxruntime-gpu, the CUDAExecutionProvider silently disappears at runtime because pip/uv end up installing both onnxruntime (CPU) and onnxruntime-gpu simultaneously. Since both packages install into the same site-packages/onnxruntime/ directory, whichever is installed last wins — and in practice the CPU build's onnxruntime_pybind11_state.so overwrites the GPU build's, stripping CUDAExecutionProvider from the available providers list.
Root cause
fastembed declares a hard dependency on onnxruntime > 1.20.0 (by name). Up to and including onnxruntime-gpu 1.19.x, the GPU wheel declared Provides-Dist: onnxruntime in its metadata, which told pip/uv that onnxruntime-gpu satisfies any onnxruntime requirement. That metadata is absent from onnxruntime-gpu >= 1.20.0 (confirmed in 1.24.2). As a result:
- uv resolves onnxruntime > 1.20.0 and installs onnxruntime==1.24.2 (the CPU build)
- The user's project also requires onnxruntime-gpu==1.24.2
- Both wheels install into site-packages/onnxruntime/; the CPU onnxruntime_pybind11_state.so overwrites the GPU one
- ort.get_available_providers() returns ['CPUExecutionProvider'] instead of ['CUDAExecutionProvider', 'CPUExecutionProvider']
The failure mode is silent — no import error, no warning, just no GPU acceleration.
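One way to make the conflict loud is to check installed package metadata at startup. A minimal sketch using only the standard library (the helper names are mine, not part of fastembed or onnxruntime):

```python
from importlib import metadata

def installed_ort_wheels():
    """Return the onnxruntime wheels present, e.g. {'onnxruntime': '1.24.2'}."""
    found = {}
    for name in ("onnxruntime", "onnxruntime-gpu"):
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return found

def assert_single_ort_wheel():
    """Fail fast when both wheels are installed, since site-packages/onnxruntime/
    then belongs to whichever wheel was installed last."""
    wheels = installed_ort_wheels()
    if len(wheels) == 2:
        raise RuntimeError(f"conflicting onnxruntime wheels installed: {wheels}")
    return wheels
```

Calling assert_single_ort_wheel() early (e.g. at application startup) turns the silent CPU fallback into an explicit error.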
Minimal repro (Docker)
```dockerfile
FROM ghcr.io/astral-sh/uv:python3.13-bookworm-slim

# Project that depends on fastembed + onnxruntime-gpu
RUN uv pip install fastembed onnxruntime-gpu
RUN python -c "import onnxruntime as ort; print(ort.get_available_providers())"
# Output: ['CPUExecutionProvider'] <-- CUDAExecutionProvider is gone
```
The libonnxruntime_providers_cuda.so is present and all its .so dependencies resolve correctly, but it links against Provider_GetHost from libonnxruntime_providers_shared.so — which is only exported by the GPU pybind11 build, not the CPU one. So dlopen of the CUDA provider fails silently.
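Because the loader error is swallowed, it helps to reproduce the dlopen by hand. A small generic sketch using ctypes (not an onnxruntime API) surfaces the underlying error for any shared library:

```python
import ctypes

def try_dlopen(path):
    """Attempt to load a shared library; return (ok, loader error message)."""
    try:
        ctypes.CDLL(path)
        return True, ""
    except OSError as exc:
        return False, str(exc)

# Pointing this at libonnxruntime_providers_cuda.so under
# site-packages/onnxruntime/capi/ should report the unresolved
# Provider_GetHost symbol when the CPU pybind11 build has clobbered
# the GPU one.
```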
Workaround
Reinstall onnxruntime-gpu after uv sync to restore the GPU binaries:

```dockerfile
RUN uv sync --frozen --no-dev
RUN uv pip install --python .venv/bin/python --reinstall "onnxruntime-gpu[cuda,cudnn]==1.24.2"
```
This is fragile (order-dependent, easy to get wrong) and can't be expressed cleanly in pyproject.toml.
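Until a proper fix lands, an explicit build-time assertion at least makes a regression loud. A sketch of the extra Dockerfile step (note that get_available_providers() reports compiled-in providers, so this check works even on a build host without a GPU):

```dockerfile
RUN .venv/bin/python -c "import onnxruntime as ort; \
    providers = ort.get_available_providers(); \
    assert 'CUDAExecutionProvider' in providers, providers"
```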
Suggested fixes
Option A — fastembed side (preferred): Add a gpu extra that replaces the onnxruntime dependency with onnxruntime-gpu:

```toml
[project]
# Remove the direct onnxruntime pin here, or make it conditional
# dependencies = [...]

[project.optional-dependencies]
gpu = ["onnxruntime-gpu"]
```
And guard the import with try/except so either package works. This lets GPU users do pip install "fastembed[gpu]" and get a coherent environment.
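The import guard could look roughly like this (a sketch of hypothetical fastembed internals, since both wheels expose the same onnxruntime module name):

```python
def load_onnxruntime():
    """Import the onnxruntime module, whichever wheel provides it."""
    try:
        import onnxruntime
    except ImportError as exc:
        raise ImportError(
            "fastembed needs onnxruntime; install either "
            "'onnxruntime' (CPU) or 'fastembed[gpu]' / 'onnxruntime-gpu'."
        ) from exc
    return onnxruntime
```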
Option B — onnxruntime side: Restore Provides-Dist: onnxruntime in onnxruntime-gpu's wheel metadata so package managers treat them as interchangeable. This was present in onnxruntime-gpu <= 1.19.x. A related issue is tracked at microsoft/onnxruntime#22107.
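The Provides-Dist claim is easy to verify against any locally installed wheel; a sketch using the standard library (returns None when the distribution isn't installed):

```python
from importlib import metadata

def provides_dist(dist_name):
    """Return the Provides-Dist entries a wheel declares, or None if absent."""
    try:
        md = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return md.get_all("Provides-Dist") or []

# Per the report above, provides_dist("onnxruntime-gpu") is [] for
# >= 1.20.0 wheels, while <= 1.19.x wheels declared ["onnxruntime"].
```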
Environment
- fastembed==0.7.4, onnxruntime==1.24.2, onnxruntime-gpu==1.24.2
- Python 3.13, uv 0.6.x
- Docker image: ghcr.io/astral-sh/uv:python3.13-bookworm-slim
- Host: NVIDIA RTX 4090 + RTX 5090, driver 570.x, nvidia-container-toolkit 1.17.6