fix(#44): pytest passes out of the box — guard optional-dep tests#58
Conversation
A fresh clone running `pytest tests/` hit 2 collection errors (Azure) that
aborted the whole run, plus provider-test failures — all from *optional* deps,
not bugs.
- src/data/loader.py: lazy-import AzureBlobEmailLoader inside the azure_blob
branch (was a top-level import that pulled optional `azure-storage-blob`), so
`from src.data.loader import load_emails` works without Azure — the enron /
mail_archive_x paths need none. Keeps test_load_emails running on minimal envs.
- tests/test_azure_blob_loader.py: `pytest.importorskip("azure.storage.blob")`.
- tests/test_settings_embedding_provider.py:
`pytest.importorskip("llama_index.embeddings.openai")`.
Verified: full env 882 pass; minimal env (azure + cloud embeddings + nvidia +
pinecone blocked) 865 pass / 5 skip / 0 errors — clone → pytest → green.
Closes #44.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
| raise ValueError("backup_dir required for mail_archive_x source") | ||
| loader = MailArchiveXLoader(backup_dir) | ||
| elif source == "azure_blob": | ||
| from src.data.loaders import AzureBlobEmailLoader # noqa: PLC0415 (optional dep) |
There was a problem hiding this comment.
The azure_blob branch of load_emails isn't covered in test_load_emails.py. The other three branches (enron, mail_archive_x, invalid source) all have cases there, but this one doesn't.
A test can avoid the real Azure SDK by patching with create=True — no pytest.importorskip needed:
@patch("src.data.loaders.azure_blob.AzureBlobEmailLoader", create=True)
def test_load_emails_azure_blob_source(self, MockLoader):
mock_loader = MagicMock()
mock_loader.load.return_value = []
MockLoader.return_value = mock_loader
docs = load_emails(source="azure_blob")
MockLoader.assert_called_once()
mock_loader.load.assert_called_once_with(num_samples=None)
self.assertEqual(docs, [])Per the repo rule: modified logic → tests for happy path, edge cases, and invalid inputs.
There was a problem hiding this comment.
✅ Good catch — added test_load_emails_azure_blob_source in 7e7e350 covering the dispatch (constructs the loader, calls load(num_samples=...), maps to Documents).
One adjustment to your suggestion: I kept a per-test pytest.importorskip("azure.storage.blob") rather than relying on create=True alone. The reason is the lazy __getattr__ in src/data/loaders/__init__.py: load_emails resolves from src.data.loaders import AzureBlobEmailLoader at call time, and mock.patch saving the original attribute still triggers that getattr → imports azure_blob → azure.storage.blob. So an unguarded patch would re-introduce a minimal-env import error — the exact thing #44 fixes. Verified both ways: passes with Azure installed, skips cleanly when it's blocked. Module-level importorskip wasn't an option either (the enron / mail_archive_x tests in this file must still run without Azure).
Review: fix(#44) — pytest passes out of the boxOverall: Clean, minimal, correctly scoped. The three-file change does exactly what it says. One test-coverage gap to address; no correctness or security issues. What's correct
One gapMissing test for the
Minor note (no action needed)The comment in |
Address review on #58: the azure_blob branch (added lazy import) wasn't directly tested in test_load_emails.py. Add a dispatch test asserting it constructs the loader and calls load(num_samples=...). Per-test importorskip (not the reviewer's unguarded create=True): the loaders package __getattr__ means mock.patch saving the original still triggers the azure import, which would re-break the minimal env — the exact thing #44 fixes. Skips on minimal installs, runs where azure is present. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
| raise ValueError("backup_dir required for mail_archive_x source") | ||
| loader = MailArchiveXLoader(backup_dir) | ||
| elif source == "azure_blob": | ||
| from src.data.loaders import AzureBlobEmailLoader # noqa: PLC0415 (optional dep) |
There was a problem hiding this comment.
The lazy import and __getattr__ interaction here is correct. Minor follow-up: the source param in the docstring just above (line 28) still reads "enron" or "mail_archive_x" — "azure_blob" should be added now that it has test coverage. CLAUDE.md requires docstrings to be updated when behaviour is touched.
There was a problem hiding this comment.
✅ Fixed — the docstring now reads "enron", "mail_archive_x", or "azure_blob". Followed up in a separate docstring-only PR since #58 was already merged.
Review summaryGoal: make Correctness — one nit
Lazy-import +
|
Summary
A fresh clone running the documented
pytest tests/hit 2 collection errors (Azure) that aborted the whole run, plus provider-test failures — all from optional dependencies, not bugs. "Clone → pytest → red" is a poor first impression for a public repo.src/data/loader.py— lazy-importAzureBlobEmailLoaderinside theazure_blobbranch (it was a top-level import pulling the optionalazure-storage-blob). Nowfrom src.data.loader import load_emailsworks without Azure; the enron / mail_archive_x paths need none. This keepstest_load_emailsrunning on minimal envs (vs. skipping it).tests/test_azure_blob_loader.py—pytest.importorskip("azure.storage.blob").tests/test_settings_embedding_provider.py—pytest.importorskip("llama_index.embeddings.openai").(The NVIDIA and Office-parser optional tests were already guarded in earlier PRs.)
Verification
Closes #44.
🤖 Generated with Claude Code