perf: make numpy imports lazy and optional #17

Open
isaacbmiller wants to merge 4 commits into main from devin/1775656111-lazy-numpy-import

@isaacbmiller isaacbmiller commented Apr 8, 2026

Summary

Makes numpy a fully optional dependency of dspy. Previously numpy was a hard requirement (dependencies in pyproject.toml) and imported eagerly at module level, costing ~170ms on every import dspy.

This PR:

  1. Moves numpy to an optional extra (pip install dspy[numpy] to opt in). Removed from core dependencies in pyproject.toml.
  2. Defers all numpy imports into the functions/methods that actually use them (lazy loading).
  3. Centralizes import guards via import_numpy() helper (dspy/utils/optional_imports.py) — a single function that returns the numpy module or raises ImportError with actionable install instructions. All 14 previous inline try/except blocks replaced with one-liner calls like np = import_numpy("embeddings").
  4. Replaces numpy with stdlib where possible (math.inf, math.log2, math.exp, sum()/len() for averages).
  5. Removes TYPE_CHECKING imports of numpy — the previous if TYPE_CHECKING: import numpy as np blocks would break type checkers for users without numpy installed. Removed from embedding.py and embeddings.py; np.ndarray type hints stripped from method signatures.
  6. Adds numpy to dev dependencies so CI test environments still have it installed (tests import numpy at the top level).
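
A minimal sketch of the kind of helper described in item 3, assuming only what the summary states (it returns the numpy module or raises ImportError with install instructions). The generic import_optional() function, the error message wording, and the normalize() caller are hypothetical illustrations, not the code in dspy/utils/optional_imports.py:

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, feature: str, extra: str) -> ModuleType:
    """Return the named module, or raise ImportError with install guidance."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Hypothetical wording; the PR's actual message is not shown in this thread.
        raise ImportError(
            f"{module_name} is required for {feature}. "
            f"Install it with: pip install dspy[{extra}]"
        ) from exc


def import_numpy(feature: str) -> ModuleType:
    """Same call shape as the helper described above: import_numpy("embeddings")."""
    return import_optional("numpy", feature, extra="numpy")


def normalize(vectors):
    """Hypothetical caller: scale each row of a 2-D input to unit length."""
    # numpy is imported only when this function runs, not at module import time.
    np = import_numpy("embeddings")
    arr = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(arr, axis=1, keepdims=True)
    return arr / np.maximum(norms, 1e-10)
```

Because the import happens inside normalize(), a module containing it can be imported without numpy present; the ImportError only surfaces when the numpy-backed code path is actually called.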

Files changed:

| File | Change |
| --- | --- |
| `pyproject.toml` | numpy moved from `dependencies` to `[project.optional-dependencies].numpy`; also added to dev extras so CI tests pass |
| `dspy/utils/optional_imports.py` | New file: `import_numpy(feature)` helper that returns numpy or raises ImportError with install guidance |
| `dspy/clients/embedding.py` | lazy import via helper in `_postprocess()`; removed TYPE_CHECKING block and `-> np.ndarray` return type |
| `dspy/retrievers/embeddings.py` | lazy import via helper in 6 methods; removed TYPE_CHECKING block and `np.ndarray` param type hints |
| `dspy/predict/knn.py` | lazy import via helper in `__init__()` and `__call__()` |
| `dspy/utils/dummies.py` | lazy import via helper in `DummyVectorizer.__call__()`; removed `-> np.ndarray` return type |
| `dspy/teleprompt/utils.py` | replaced with pure Python (`sum()`/`len()`); no numpy needed |
| `dspy/teleprompt/mipro_optimizer_v2.py` | lazy import via helper in `_set_random_seeds()`, `math.log2` replacement |
| `dspy/teleprompt/infer_rules.py` | replaced `np.inf` with `math.inf`; no numpy needed |
| `dspy/teleprompt/simba.py` | lazy import via helper in `compile()`, `math.exp` + pure Python percentile replacements |
| `dspy/teleprompt/copro_optimizer.py` | import via helper when `track_stats` is enabled |
| `dspy/teleprompt/gepa/gepa.py` | import via helper in `auto_budget()` |
| `dspy/dsp/colbertv2.py` | import via helper in `ColBERTv2RerankerLocal.forward()` |
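
The stdlib substitutions listed in the table can be sketched roughly as follows. The function names and the empty-list behavior are assumptions made for illustration; the actual call sites in the teleprompt modules are not shown in this thread:

```python
import math


def average(scores):
    """Plain-Python mean, standing in for np.average on a flat list of numbers."""
    # Returning 0.0 for an empty list is an assumption made for this sketch.
    return sum(scores) / len(scores) if scores else 0.0


def best_score(scores):
    """Track a running maximum starting from -math.inf instead of -np.inf."""
    best = -math.inf
    for s in scores:
        if s > best:
            best = s
    return best


print(average([0.5, 0.75, 1.0]))  # 0.75
print(math.log2(8))               # 3.0
print(math.exp(0.0))              # 1.0
```

These replacements only cover scalar and flat-list cases; anything that relies on array broadcasting would still need numpy.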

Review & Testing Checklist for Human

  • Breaking change for existing users: numpy is no longer auto-installed with pip install dspy. Users who depend on embeddings, KNN, retrievers, or optimizers (MIPROv2, SIMBA, COPRO, GEPA) will need pip install dspy[numpy]. Verify this is an acceptable UX tradeoff and consider whether a deprecation warning or migration note is needed.
  • Lost type annotations: Several methods (Embedder.__call__, DummyVectorizer.__call__, _faiss_search, _rerank_and_predict, _normalize) lost their np.ndarray type hints. Verify no downstream tooling or type-checked code depends on these annotations.
  • simba.py percentile replacement (lines ~219-222): The pure Python percentile (all_batch_scores[int(n * 0.1)]) uses simple index truncation, while np.percentile uses linear interpolation by default. This is a subtle behavioral difference that could affect optimization on small batches.
  • "No numpy" path is untested in CI: Because numpy is in dev deps, CI always has it installed. The except ImportError branches are never exercised. Recommend manually verifying: python -c "import dspy" succeeds in a venv without numpy, and that calling a guarded method produces the expected ImportError message.
  • gepa/gepa.py still uses np.log2 in auto_budget() — unlike mipro_optimizer_v2.py which was switched to math.log2. Not a bug (numpy will be imported via the helper), but is inconsistent.
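
To make the percentile difference in the checklist concrete, here is a small pure-Python sketch comparing index truncation with linear interpolation at position q * (n - 1), which is how np.percentile's default method places the percentile. Both function names are hypothetical, the input is assumed to be a non-empty, already sorted list, q is a fraction rather than a percentage, and the index clamp is an addition for safety at q = 1.0:

```python
import math


def percentile_truncated(sorted_scores, q):
    """Pick the element at index int(n * q), as in the expression quoted above."""
    n = len(sorted_scores)
    # Clamp so q = 1.0 does not index past the end (an addition for this sketch).
    return sorted_scores[min(int(n * q), n - 1)]


def percentile_linear(sorted_scores, q):
    """Interpolate linearly between the two neighbours of position q * (n - 1)."""
    n = len(sorted_scores)
    pos = q * (n - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_scores[lo] + frac * (sorted_scores[hi] - sorted_scores[lo])


scores = [0.0, 0.2, 0.4, 0.6, 1.0]
print(percentile_truncated(scores, 0.1))
print(percentile_linear(scores, 0.1))
```

With these five scores the truncated version returns the minimum (0.0) while the interpolated version returns roughly 0.08, so on small batches the two can select noticeably different thresholds.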

Notes

  • All pre-existing ruff lint warnings (trailing whitespace in simba.py docstrings, E712 in mipro_optimizer_v2.py, etc.) were left untouched — not related to this change.
  • from __future__ import annotations is used in files that previously had np.ndarray in type hints, making annotations lazy strings at runtime.
  • numpy is included in dev extras so that test files (which import numpy at the top level) can be collected by pytest. This does not affect end users.

Link to Devin session: https://app.devin.ai/sessions/c2be1624ef994c42bf6ac26d8a1b096d
Requested by: @isaacbmiller

Move all top-level 'import numpy as np' statements into the functions
and methods that actually use numpy. This avoids loading numpy (~170ms)
during 'import dspy' for users who don't use embedding, KNN, or
optimizer features.

Files changed:
- dspy/clients/embedding.py: lazy import in _postprocess(), TYPE_CHECKING for type hints
- dspy/retrievers/embeddings.py: lazy import in _batch_forward(), _build_faiss(), _rerank_and_predict(), _normalize(), save(), load()
- dspy/utils/dummies.py: lazy import in DummyVectorizer.__call__()
- dspy/predict/knn.py: lazy import in __init__() and __call__()
- dspy/teleprompt/utils.py: replaced np.array/np.average/np.sum with pure Python equivalents
- dspy/teleprompt/mipro_optimizer_v2.py: lazy import in _set_random_seeds(), replaced np.log2 with math.log2
- dspy/teleprompt/infer_rules.py: replaced np.inf with math.inf
- dspy/teleprompt/simba.py: lazy import in compile(), replaced np.exp with math.exp, np.percentile with pure Python

Co-Authored-By: Isaac Miller <isaacbmiller@gmail.com>
@isaacbmiller
Author

Test Results

Tested locally against the PR branch. All tests passed.

Lazy Import Verification
  • import dspy does NOT load numpy — PASSED. Verified 'numpy' not in sys.modules after import dspy.
  • On-demand numpy loading — PASSED. Verified: (1) numpy NOT in sys.modules after import dspy, (2) numpy IS in sys.modules after calling Embedder, (3) result is np.ndarray.
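
One way to automate the sys.modules check described above is to run the import in a fresh interpreter, so that modules already loaded by the test process cannot mask an eager import. This is a sketch only; the loads_module() name is hypothetical and is not part of the PR's test suite:

```python
import subprocess
import sys


def loads_module(package: str, module: str) -> bool:
    """Return True if importing `package` in a fresh interpreter also loads `module`."""
    code = f"import sys, {package}; print({module!r} in sys.modules)"
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    # Use the last line in case the package prints something while importing.
    return result.stdout.strip().splitlines()[-1] == "True"


# Intended use, assuming dspy is installed in the environment running the check:
#     assert not loads_module("dspy", "numpy")
```

A check like this fails as soon as any module on the import dspy path imports numpy at the top level, regardless of whether numpy happens to be installed in the environment.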
Regression Tests (15 tests total)
  • tests/clients/test_embedding.py — 4/4 passed (litellm embedding, callable embedding, invalid model, async embedding)
  • tests/predict/test_knn.py — 3/3 passed (initialization with isinstance(result, np.ndarray), query, query specificity)
  • tests/retrievers/test_embeddings.py — 7/7 passed (basic search, multithreaded, save/load, from_saved, nonexistent path, with_scores)
  • tests/teleprompt/test_knn_fewshot.py — 1/1 passed

CI: 13/13 checks passed (Python 3.10-3.14, builds, ruff, snyk).

Devin session

@isaacbmiller
Author

Make it an optional dep in pyproject.toml and include guards that will guide users there

- Move numpy from required to optional dependency in pyproject.toml (pip install dspy[numpy])
- Add try/except ImportError guards with helpful install messages in all files with lazy numpy imports
- Covers: embedding, retrievers, KNN, dummies, MIPROv2, SIMBA, COPRO, GEPA, ColBERTv2

Co-Authored-By: Isaac Miller <isaacbmiller@gmail.com>
@isaacbmiller
Author

Create a helper to reduce the repeated code.

Also if we have it in type checking, will that break type checking for people who don't have numpy installed?

Co-Authored-By: Isaac Miller <isaacbmiller@gmail.com>
@isaacbmiller
Author

Create a helper to reduce the repeated code.

Also if we have it in type checking, will that break type checking for people who don't have numpy installed?

… fix TYPE_CHECKING issue

- Add dspy/utils/optional_imports.py with import_numpy() helper that returns the numpy module or raises ImportError with install instructions
- Replace all 14 inline try/except ImportError blocks across 9 files with calls to import_numpy()
- Remove TYPE_CHECKING import of numpy from embedding.py and embeddings.py (fixes type checking for users without numpy)
- Remove np.ndarray type hints from method signatures that would break without numpy

Co-Authored-By: Isaac Miller <isaacbmiller@gmail.com>
@isaacbmiller
Author

Addressed both comments in 47b53d9:

1. Helper to reduce repeated code:
Created dspy/utils/optional_imports.py with an import_numpy(feature) helper that returns the numpy module or raises ImportError with install instructions. Replaced all 14 inline try/except ImportError blocks across 9 files with one-liner calls like np = import_numpy("embeddings").

2. TYPE_CHECKING breaking type checking without numpy:
Yes — if TYPE_CHECKING: import numpy as np would break type checkers (pyright/mypy) for users without numpy installed. Fixed by removing the TYPE_CHECKING blocks from both embedding.py and embeddings.py, and removing np.ndarray type hints from method signatures. Since these files use from __future__ import annotations, annotations are lazy strings at runtime and don't need numpy imported.

@isaacbmiller
Author

Are there any tests that will fail if someone tries to add a non-optional numpy import in the future?
