Skip to content

fix(#44): pytest passes out of the box — guard optional-dep tests#58

Merged
fmasi merged 2 commits into
mainfrom
fix/pytest-out-of-the-box
Jun 12, 2026
Merged

fix(#44): pytest passes out of the box — guard optional-dep tests#58
fmasi merged 2 commits into
mainfrom
fix/pytest-out-of-the-box

Conversation

@fmasi

@fmasi fmasi commented Jun 12, 2026

Copy link
Copy Markdown
Owner

Summary

A fresh clone running the documented pytest tests/ hit 2 collection errors (Azure) that aborted the whole run, plus provider-test failures — all from optional dependencies, not bugs. "Clone → pytest → red" is a poor first impression for a public repo.

  • src/data/loader.py — lazy-import AzureBlobEmailLoader inside the azure_blob branch (it was a top-level import pulling the optional azure-storage-blob). Now from src.data.loader import load_emails works without Azure; the enron / mail_archive_x paths need none. This keeps test_load_emails running on minimal envs (vs. skipping it).
  • tests/test_azure_blob_loader.pypytest.importorskip("azure.storage.blob").
  • tests/test_settings_embedding_provider.pypytest.importorskip("llama_index.embeddings.openai").

(The NVIDIA and Office-parser optional tests were already guarded in earlier PRs.)

Verification

  • Full env: 882 pass.
  • Minimal env (Azure + cloud embeddings + NVIDIA + Pinecone blocked): 865 pass / 5 skip / 0 errors — clone → pytest → green.

Closes #44.

🤖 Generated with Claude Code

A fresh clone running `pytest tests/` hit 2 collection errors (Azure) that
aborted the whole run, plus provider-test failures — all from *optional* deps,
not bugs.

- src/data/loader.py: lazy-import AzureBlobEmailLoader inside the azure_blob
  branch (was a top-level import that pulled optional `azure-storage-blob`), so
  `from src.data.loader import load_emails` works without Azure — the enron /
  mail_archive_x paths need none. Keeps test_load_emails running on minimal envs.
- tests/test_azure_blob_loader.py: `pytest.importorskip("azure.storage.blob")`.
- tests/test_settings_embedding_provider.py:
  `pytest.importorskip("llama_index.embeddings.openai")`.

Verified: full env 882 pass; minimal env (azure + cloud embeddings + nvidia +
pinecone blocked) 865 pass / 5 skip / 0 errors — clone → pytest → green.

Closes #44.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread src/data/loader.py
raise ValueError("backup_dir required for mail_archive_x source")
loader = MailArchiveXLoader(backup_dir)
elif source == "azure_blob":
from src.data.loaders import AzureBlobEmailLoader # noqa: PLC0415 (optional dep)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The azure_blob branch of load_emails isn't covered in test_load_emails.py. The other three branches (enron, mail_archive_x, invalid source) all have cases there, but this one doesn't.

A test can avoid the real Azure SDK by patching with create=True — no pytest.importorskip needed:

@patch("src.data.loaders.azure_blob.AzureBlobEmailLoader", create=True)
def test_load_emails_azure_blob_source(self, MockLoader):
    mock_loader = MagicMock()
    mock_loader.load.return_value = []
    MockLoader.return_value = mock_loader

    docs = load_emails(source="azure_blob")

    MockLoader.assert_called_once()
    mock_loader.load.assert_called_once_with(num_samples=None)
    self.assertEqual(docs, [])

Per the repo rule: modified logic → tests for happy path, edge cases, and invalid inputs.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Good catch — added test_load_emails_azure_blob_source in 7e7e350 covering the dispatch (constructs the loader, calls load(num_samples=...), maps to Documents).

One adjustment to your suggestion: I kept a per-test pytest.importorskip("azure.storage.blob") rather than relying on create=True alone. The reason is the lazy __getattr__ in src/data/loaders/__init__.py: load_emails resolves from src.data.loaders import AzureBlobEmailLoader at call time, and mock.patch saving the original attribute still triggers that getattr → imports azure_blobazure.storage.blob. So an unguarded patch would re-introduce a minimal-env import error — the exact thing #44 fixes. Verified both ways: passes with Azure installed, skips cleanly when it's blocked. Module-level importorskip wasn't an option either (the enron / mail_archive_x tests in this file must still run without Azure).

@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

Review: fix(#44) — pytest passes out of the box

Overall: Clean, minimal, correctly scoped. The three-file change does exactly what it says. One test-coverage gap to address; no correctness or security issues.


What's correct

  • Lazy import in loader.py — moving AzureBlobEmailLoader inside the elif source == "azure_blob": branch is the right fix. The loaders/__init__.py already uses __getattr__ for deferred loading, so from src.data.loaders import AzureBlobEmailLoader inside the function resolves cleanly through that same mechanism.
  • pytest.importorskip placement — both guards are at module level, before any import that would fail at collection time. Correct placement; azure.storage.blob guard prevents the AzureBlobEmailLoader import from erroring out, and the llama_index.embeddings.openai guard prevents the @patch decorators from raising at test execution.
  • # noqa: PLC0415 — appropriate; lint suppression is scoped to the one non-top-level import that's intentionally deferred.

One gap

Missing test for the azure_blob path in test_load_emails.py (see inline comment on loader.py line 42).

test_load_emails.py covers enron, mail_archive_x, missing-backup_dir, and invalid source — but not azure_blob. The new lazy-import branch is only indirectly exercised by test_azure_blob_loader.py (which tests the loader class itself, not the load_emails dispatch). Per the repo rule, modified logic needs its own test; a create=True patch makes it runnable without the real Azure SDK.


Minor note (no action needed)

The comment in test_settings_embedding_provider.py — "must be importable to patch it" — is accurate: unittest.mock.patch does import the target module at application time, so the test would raise (not skip) on a missing dep without the guard. No change needed.

Address review on #58: the azure_blob branch (added lazy import) wasn't directly
tested in test_load_emails.py. Add a dispatch test asserting it constructs the
loader and calls load(num_samples=...). Per-test importorskip (not the reviewer's
unguarded create=True): the loaders package __getattr__ means mock.patch saving
the original still triggers the azure import, which would re-break the minimal
env — the exact thing #44 fixes. Skips on minimal installs, runs where azure is
present.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread src/data/loader.py
raise ValueError("backup_dir required for mail_archive_x source")
loader = MailArchiveXLoader(backup_dir)
elif source == "azure_blob":
from src.data.loaders import AzureBlobEmailLoader # noqa: PLC0415 (optional dep)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lazy import and __getattr__ interaction here is correct. Minor follow-up: the source param in the docstring just above (line 28) still reads "enron" or "mail_archive_x""azure_blob" should be added now that it has test coverage. CLAUDE.md requires docstrings to be updated when behaviour is touched.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Fixed — the docstring now reads "enron", "mail_archive_x", or "azure_blob". Followed up in a separate docstring-only PR since #58 was already merged.

@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

Review summary

Goal: make pytest tests/ pass on a minimal install by lazy-importing the Azure SDK and guarding the optional-dep tests. The approach is sound and the verification numbers (882 pass full / 865 pass + 5 skip minimal) are reassuring.


Correctness — one nit

load_emails docstring (loader.py:28) omits "azure_blob".
The source parameter description says "enron" or "mail_archive_x". Now that azure_blob has test coverage, the docstring should list it too. Flagged inline on the nearest diff line.


Lazy-import + __getattr__ interaction — correct

The loaders/__init__.py __getattr__ never caches its return value (each access re-imports from the submodule), so patching src.data.loaders.azure_blob.AzureBlobEmailLoader in the new test correctly intercepts the lazy import inside load_emails. The long docstring on the test explains the reasoning; the approach is valid even if the target looks unusual compared to the sibling tests.


Test coverage — good overall

The new test_load_emails_azure_blob_source covers the happy path with num_samples=None, including the dispatcher, the loader call, and the Document return. One assertion that could be added (not blocking): mock_normalized_email.to_document.assert_called_once_with(doc_id="azure_blob_0") would pin the doc-id construction logic f"{email.source}_{i}". The sibling enron/mail_archive_x tests also omit this, so it is a consistent gap rather than a regression.


No security or injection concerns.

@fmasi fmasi merged commit e67cb46 into main Jun 12, 2026
6 checks passed
@fmasi fmasi deleted the fix/pytest-out-of-the-box branch June 12, 2026 10:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make pytest tests/ pass out of the box (skip guards for optional deps)

1 participant