fix: replace pkg_resources with importlib.metadata (#1816) #1847

Open: jbbqqf wants to merge 1 commit into Data-Centric-AI-Community:develop from jbbqqf:feat/1816-importlib-metadata-pillow

jbbqqf commented May 9, 2026

Summary

Replace the runtime dependency on pkg_resources (removed in setuptools 81)
with the stdlib importlib.metadata, so import data_profiling no longer
crashes on fresh installs that pull in modern setuptools.

Fixes #1816 (Replace deprecated pkg_resources with importlib.metadata in profile_report.py).

Context

pkg_resources was removed from setuptools in v81 (changelog entry).
Since Aug 2025, a fresh pip install fg-data-profiling on a clean
environment routinely crashes at import time with
ModuleNotFoundError: No module named 'pkg_resources', because
src/data_profiling/profile_report.py:11 runs import pkg_resources
unconditionally and src/data_profiling/utils/versions.py keeps a
pkg_resources fallback.
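To check whether a given environment is affected, a reviewer can ask the import system whether pkg_resources is resolvable at all, without executing it. A minimal stdlib-only sketch; the helper name has_pkg_resources is hypothetical and not part of data_profiling.

```python
import importlib.util


def has_pkg_resources() -> bool:
    """Return True if `import pkg_resources` would find a module.

    For a top-level name, find_spec only locates the module; it does not run
    its code. It also returns None when sys.modules holds a None entry for
    the name (the usual way tests mark a module as unavailable).
    """
    return importlib.util.find_spec("pkg_resources") is not None


if __name__ == "__main__":
    # False means `import pkg_resources` fails here, i.e. the environment in
    # which the unconditional import in profile_report.py crashes.
    print("pkg_resources available:", has_pkg_resources())
```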

That fallback was meant to support Python < 3.8 (where
importlib.metadata did not exist), but pyproject.toml already pins
requires-python = ">=3.10,<3.14" since #1778, so the fallback branch is
dead code under any supported runtime.
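With requires-python >= 3.10, a version helper can import from importlib.metadata unconditionally. A minimal sketch of what such a helper could look like; the name get_installed_version and its return-None-when-missing behavior are assumptions for illustration, not the actual contents of src/data_profiling/utils/versions.py.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def get_installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string of `dist_name`, or None if absent.

    Hypothetical helper: importlib.metadata has been in the stdlib since
    Python 3.8, so no pkg_resources fallback is needed on Python >= 3.10.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None
```

Returning None instead of raising is a design choice for this sketch; a helper that lets PackageNotFoundError propagate would be equally valid if callers are expected to handle it.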

The original issue reporter validated the change locally with a 2195-test
green run on Python 3.13.

Changes

  • src/data_profiling/profile_report.py: drop import pkg_resources and
    use importlib.metadata.version("Pillow") in ProfileReport.to_file.
    Wrap the lookup in a PackageNotFoundError guard (Pillow is an indirect
    dependency, so a trimmed environment without it should skip the version
    check and continue, not crash). Parse numeric components defensively so
    pre-release versions like "11.0.0a1" don't make int() raise ValueError.
  • src/data_profiling/utils/versions.py: drop the pkg_resources fallback
    branch. importlib.metadata.version is unconditional now.
  • tests/issues/test_issue1816.py (new): three regression guards.
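The Pillow guard and defensive version parsing described above might look roughly like the sketch below. The function names numeric_parts and installed_pillow_version, and the choice to stop at the first non-numeric segment, are assumptions for illustration; the Pillow version threshold that to_file actually compares against is not reproduced here.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def numeric_parts(version_string: str) -> Tuple[int, ...]:
    """Collect leading integer components, stopping at the first non-numeric one.

    "11.0.0"   -> (11, 0, 0)
    "11.0.0a1" -> (11, 0)   ("0a1" is not a plain integer, so the loop stops)
    """
    parts = []
    for segment in version_string.split("."):
        if not segment.isdecimal():
            # Pre-release/dev/local segment: int(segment) would raise ValueError.
            break
        parts.append(int(segment))
    return tuple(parts)


def installed_pillow_version() -> Optional[Tuple[int, ...]]:
    """Return Pillow's numeric version parts, or None if Pillow is not installed."""
    try:
        raw = version("Pillow")
    except PackageNotFoundError:
        # Pillow is an indirect dependency; a trimmed environment skips the
        # version check instead of crashing inside to_file.
        return None
    return numeric_parts(raw)
```

A caller would then compare the returned tuple against its threshold only when it is not None, so a missing Pillow simply means no warning is emitted.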

Targeted inline comments explain why the guard exists (Pillow may be
absent) and why the numeric-parts loop exists (pre-release segments), so
a reviewer reading the diff cold doesn't have to re-derive the reasoning.

Reproduce BEFORE/AFTER yourself (copy-paste)

```bash
# --- one-time setup ---
git clone https://github.com/Data-Centric-AI-Community/fg-data-profiling.git /tmp/repro && cd /tmp/repro
python3 -m venv /tmp/repro-venv
source /tmp/repro-venv/bin/activate
pip install --upgrade 'setuptools>=81' pip   # the env that triggers the bug

# --- BEFORE (origin/develop) ---
git checkout origin/develop
pip install -q -e . pytest
python3 -c "import data_profiling"
# Expected: ModuleNotFoundError: No module named 'pkg_resources'
git fetch https://github.com/jbbqqf/fg-data-profiling.git feat/1816-importlib-metadata-pillow
git checkout FETCH_HEAD -- tests/issues/test_issue1816.py
python3 -m pytest tests/issues/test_issue1816.py -v 2>&1 | tail -5
# Expected: collection error / import error before any test runs

# --- AFTER (this PR) ---
git fetch https://github.com/jbbqqf/fg-data-profiling.git feat/1816-importlib-metadata-pillow
git checkout FETCH_HEAD
pip install -q --force-reinstall --no-deps -e .
python3 -c "import data_profiling; print('import OK')"
# Expected: prints "import OK"
python3 -m pytest tests/issues/test_issue1816.py -v 2>&1 | tail -5
# Expected: 3 passed
```

The block above is a single bash run a reviewer can paste verbatim. The
only thing that changes between BEFORE and AFTER is the checked-out git
ref; that is the load-bearing piece of evidence.

What I ran locally

  • pytest tests/issues/test_issue1816.py -v → 3/3 passed.
  • pytest tests/unit/test_describe.py tests/unit/test_html_export.py tests/issues/test_issue1816.py → 33/33 passed.
  • pytest tests/issues/ → 43 passed, 3 skipped, 1 unrelated failure
    (test_issue147, which needs pyarrow, and pyarrow isn't installed in my
    reproducible-build env; it fails the same way on origin/develop, so it
    is not introduced by this change).
  • Python 3.13.12, setuptools 82.0.1 (>= 81, the version that triggers the
    bug). pkg_resources is genuinely absent in this env, which is why the
    regression test is meaningful.
  • black --check clean on the three modified files.
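In an environment that still ships an older setuptools, the absence of pkg_resources can be simulated inside one process by placing None in sys.modules, which makes any later import pkg_resources raise ModuleNotFoundError. A minimal sketch, assuming a context-manager approach that is not necessarily how tests/issues/test_issue1816.py does it; the name block_module is hypothetical.

```python
import sys
from contextlib import contextmanager
from typing import Iterator

_MISSING = object()


@contextmanager
def block_module(name: str) -> Iterator[None]:
    """Make `import <name>` raise ModuleNotFoundError inside the block.

    A None entry in sys.modules tells the import system the module is known
    to be unavailable, so no finder is consulted. The previous entry (or its
    absence) is restored on exit.
    """
    saved = sys.modules.get(name, _MISSING)
    sys.modules[name] = None  # type: ignore[assignment]
    try:
        yield
    finally:
        if saved is _MISSING:
            sys.modules.pop(name, None)
        else:
            sys.modules[name] = saved
```

One caveat: the module under test (e.g. data_profiling) must not already be in sys.modules when the block is entered, otherwise its cached copy is reused and the blocked import is never attempted.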

GitHub Actions is not enabled on the fork, so the upstream pull-request.yml
workflow will provide the first end-to-end signal across the supported
Python matrix. Since pyproject pins >=3.10,<3.14, importlib.metadata is in
the stdlib on every supported version, so the fix applies to all of them.

Edge cases tested

| # | Scenario | Input | Expected | Verified by |
|---|---|---|---|---|
| 1 | setuptools >= 81 (no pkg_resources) | import data_profiling | succeeds; pkg_resources never imported | test_profile_report_module_does_not_import_pkg_resources |
| 2 | versions helper used elsewhere | import data_profiling.utils.versions | succeeds; no pkg_resources | test_versions_helper_does_not_import_pkg_resources |
| 3 | end-to-end report write | ProfileReport(df).to_file("...") on a 3-row df | writes HTML; sys.modules clean | test_to_file_runs_without_pkg_resources |
| 4 | Pillow absent | importlib.metadata.PackageNotFoundError | warning skipped; write continues | inline try/except PackageNotFoundError (covered by test 3 if Pillow is optional) |
| 5 | Pillow pre-release version | "11.0.0a1" | numeric-parts loop short-circuits cleanly | inline numeric_parts loop (no separate test; exercised whenever the installed Pillow is not a final release) |
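Guards 1 and 2 in the table boil down to importing a module in a clean interpreter and checking that pkg_resources never lands in sys.modules. A minimal sketch of that idea, assuming a subprocess-based check (the actual tests may inspect sys.modules in-process instead); the helper name import_pulls_in is hypothetical.

```python
import subprocess
import sys


def import_pulls_in(target: str, forbidden: str) -> bool:
    """Import `target` in a fresh interpreter; report whether `forbidden` got imported.

    A subprocess avoids false negatives from modules already cached in the
    current process's sys.modules.
    """
    code = (
        "import importlib, sys\n"
        f"importlib.import_module({target!r})\n"
        f"print({forbidden!r} in sys.modules)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # a failing import in the child raises CalledProcessError here
    )
    return result.stdout.strip() == "True"
```

Against this PR, the intended usage would be assert not import_pulls_in("data_profiling.profile_report", "pkg_resources"); on origin/develop under setuptools >= 81 the child import itself fails, which surfaces as CalledProcessError.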

Risk / blast radius

  • Additive at the import layer; no public API change.
  • The Pillow-version warning is preserved, just driven from
    importlib.metadata. pkg_resources and importlib.metadata agree on
    the version string for any package installed via PEP 517.
  • One behavior change: if Pillow is somehow not installed, the previous
    code would have raised pkg_resources.DistributionNotFound (an
    uncaught crash from inside to_file); the new code skips the warning and
    continues. That is the safer default for an optional Pillow code path,
    but it is worth a maintainer's nod, hence the inline comment.

Release note

fix: replace runtime pkg_resources lookups with importlib.metadata so
ProfileReport imports cleanly under setuptools >= 81. (#1816)

Upstream PR checklist (from .github/PULL_REQUEST_TEMPLATE/pull_request_template.md)

  • make lint (black): clean on the 3 modified files. (Isort drift that
    already exists on develop in profile_report.py is left alone to keep
    the diff scoped.)
  • make docs: n/a, no doc changes.
  • make test: pytest tests/issues/test_issue1816.py -v → 3/3 passed
    on Python 3.13 with setuptools 82 (the env that triggers the bug).
  • make examples: n/a, no example changes.

PR drafted with assistance from Claude Code. The change was reviewed
manually against the upstream codebase on develop and the setuptools
v81 release notes. The reproducer block above was used during development;
it is the same one a reviewer can paste verbatim.


Drop the runtime dependency on pkg_resources from profile_report.py and
utils/versions.py. pkg_resources was removed in setuptools >= 81 (Aug
2025), which made every fresh install raise
ModuleNotFoundError: No module named 'pkg_resources' on import of
data_profiling.

importlib.metadata.version provides the same lookup and ships in the
stdlib from Python 3.8 onward; pyproject.toml already pins
requires-python >= 3.10, so the previous pkg_resources fallback in
utils/versions.py was dead code under any supported runtime.

The Pillow-version branch in ProfileReport.to_file is also hardened to
tolerate non-numeric pre-release segments (e.g. "11.0.0a1") and to skip
the warning rather than crash when Pillow is not installed (it's an
indirect dependency, so trimmed environments may lack it).

Adds tests/issues/test_issue1816.py with three guards:
  - data_profiling.profile_report import must not pull pkg_resources in
  - data_profiling.utils.versions import must not pull pkg_resources in
  - ProfileReport.to_file end-to-end must not re-import pkg_resources

All three tests fail on origin/develop (collection error: the import
chain crashes when pkg_resources is unavailable) and pass on this
branch.

Co-Authored-By: Claude Code <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Replace deprecated pkg_resources with importlib.metadata in profile_report.p