fix: replace pkg_resources with importlib.metadata (#1816) #1847
Open
jbbqqf wants to merge 1 commit into
…ommunity#1816)
Drop the runtime dependency on pkg_resources from profile_report.py and utils/versions.py. pkg_resources was removed in setuptools >= 81 (Aug 2025), which made every fresh install raise ModuleNotFoundError: No module named 'pkg_resources' on import of data_profiling.
importlib.metadata.version provides the same lookup and ships in the stdlib from Python 3.8 onward; pyproject.toml already pins requires-python >= 3.10, so the previous pkg_resources fallback in utils/versions.py was dead code under any supported runtime.
The Pillow-version branch in ProfileReport.to_file is also hardened to tolerate non-numeric pre-release segments (e.g. "11.0.0a1") and to skip the warning rather than crash when Pillow is not installed (it's an indirect dependency, so trimmed environments may lack it).
Adds tests/issues/test_issue1816.py with three guards:
- data_profiling.profile_report import must not pull pkg_resources in
- data_profiling.utils.versions import must not pull pkg_resources in
- ProfileReport.to_file end-to-end must not re-import pkg_resources
All three tests fail on origin/develop (collection error: the import chain crashes when pkg_resources is unavailable) and pass on this branch.
Co-Authored-By: Claude Code <noreply@anthropic.com>
Summary
Replace the runtime dependency on pkg_resources (removed in setuptools 81) with the stdlib importlib.metadata, so import data_profiling no longer crashes on fresh installs that pull in modern setuptools.
Fixes #1816 — Replace deprecated pkg_resources with importlib.metadata in profile_report.py.
Context
pkg_resources was removed from setuptools in v81 (changelog entry). After Aug 2025, fresh pip install fg-data-profiling resolutions on a clean environment routinely crash at import time with ModuleNotFoundError: No module named 'pkg_resources', because src/data_profiling/profile_report.py:11 does import pkg_resources unconditionally and src/data_profiling/utils/versions.py keeps a pkg_resources fallback.
That fallback was meant to support Python < 3.8 (where importlib.metadata did not exist), but pyproject.toml already pins requires-python = ">=3.10,<3.14" since #1778, so the fallback branch is dead code under any supported runtime.
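For readers who have not used the stdlib API, here is a minimal sketch of a fallback-free version lookup. It is not the code in utils/versions.py; the helper name installed_version and the made-up distribution name are assumptions for illustration only.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str) -> str | None:
    """Return the installed version string of a distribution, or None if absent.

    Hypothetical helper: it relies only on importlib.metadata, which is in
    the stdlib on every Python covered by requires-python >= 3.10, so no
    pkg_resources fallback is needed.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # importlib.metadata's counterpart to pkg_resources.DistributionNotFound.
        return None


if __name__ == "__main__":
    # A made-up distribution name, used only to exercise the "not installed" path.
    print(installed_version("definitely-not-installed-xyz-0000"))
```

Returning None instead of re-raising is a design choice of this sketch; a caller that needs the package can let PackageNotFoundError propagate.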
The original issue reporter validated the change locally with a 2195-test
green run on Python 3.13.
Changes
- src/data_profiling/profile_report.py: drop import pkg_resources, use importlib.metadata.version("Pillow") in ProfileReport.to_file. Wrap the lookup in a PackageNotFoundError guard (Pillow is an indirect dependency, so a trimmed environment without it should skip the check and continue, not crash). Parse numeric components defensively so pre-release versions like "11.0.0a1" don't blow up int().
- src/data_profiling/utils/versions.py: drop the pkg_resources fallback branch. importlib.metadata.version is unconditional now.
- tests/issues/test_issue1816.py (new): three regression guards.
The targeted inline comments explain why the branch was kept (Pillow may be absent) and why the numeric-parts loop exists (pre-release segments), so a reviewer reading the diff cold doesn't have to re-derive the reasoning.
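To illustrate why a plain int() per segment is fragile, here is one possible way to extract the numeric prefix of a version string. It is a sketch under stated assumptions, not the loop from the diff: the function name numeric_parts is borrowed from this description, but the body is illustrative.

```python
from __future__ import annotations

import re


def numeric_parts(version_string: str) -> tuple[int, ...]:
    """Collect the leading integer of each dot-separated segment.

    Stops at the first segment that carries a non-numeric suffix
    (e.g. "0a1") or has no leading digits at all (e.g. "dev0").
    """
    parts: list[int] = []
    for segment in version_string.split("."):
        match = re.match(r"\d+", segment)
        if match is None:
            break  # segment such as "dev0": nothing numeric to keep
        parts.append(int(match.group()))
        if match.end() != len(segment):
            break  # segment such as "0a1": keep the 0, ignore what follows
    return tuple(parts)


if __name__ == "__main__":
    print(numeric_parts("11.0.0a1"))
    print(numeric_parts("9.5.0"))
```

For comparison, tuple(map(int, "11.0.0a1".split("."))) raises ValueError on the "0a1" segment, which is the failure mode the defensive parse avoids.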
Reproduce BEFORE/AFTER yourself (copy-paste)
The block above is a single bash run a reviewer can paste verbatim. The
only thing that changes between BEFORE and AFTER is the checked-out git
ref — that's the load-bearing piece of evidence.
What I ran locally
- pytest tests/issues/test_issue1816.py -v → 3/3 passed.
- pytest tests/unit/test_describe.py tests/unit/test_html_export.py tests/issues/test_issue1816.py → 33/33 passed.
- pytest tests/issues/ → 43 passed, 3 skipped, 1 unrelated failure (test_issue147: needs pyarrow, which isn't installed in my reproducible-build env; same failure on origin/develop, not introduced by this change).
bug). pkg_resources is genuinely absent in this env, which is why the
regression test is meaningful.
- black --check clean on the three modified files.
- GHA on the fork is not enabled, so the upstream pull-request.yml workflow will be the first end-to-end signal across the supported Python matrix; the pyproject pins >=3.10,<3.14, so the fix is in scope for every supported version.
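The regression guards in this PR assert that importing a module does not drag pkg_resources into the process. One way such a guard could be written is sketched below. The helper name import_pulls_in and the exit-code convention are assumptions of this sketch, not the contents of tests/issues/test_issue1816.py. A fresh interpreter is used so that modules already cached in the test runner's sys.modules cannot mask the result.

```python
import subprocess
import sys

FORBIDDEN_LOADED = 3  # arbitrary exit code chosen for this sketch


def import_pulls_in(module_name: str, forbidden: str) -> bool:
    """Import module_name in a fresh interpreter; report whether forbidden got loaded."""
    probe = (
        "import importlib, sys\n"
        f"importlib.import_module({module_name!r})\n"
        f"sys.exit({FORBIDDEN_LOADED} if {forbidden!r} in sys.modules else 0)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe], capture_output=True, text=True
    )
    if result.returncode not in (0, FORBIDDEN_LOADED):
        # The import itself failed (for example ModuleNotFoundError).
        raise RuntimeError(f"probe failed: {result.stderr.strip()}")
    return result.returncode == FORBIDDEN_LOADED


if __name__ == "__main__":
    # Stand-in modules from the stdlib; a project test would pass
    # "data_profiling.profile_report" and "pkg_resources" instead.
    print(import_pulls_in("json", "json"))
```

In an environment where pkg_resources is absent, an unconditional import pkg_resources would surface here as the RuntimeError branch rather than a True result, so a guard built this way fails in either case.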
Edge cases tested
- import data_profiling: test_profile_report_module_does_not_import_pkg_resources
- import data_profiling.utils.versions: test_versions_helper_does_not_import_pkg_resources
- ProfileReport(df).to_file("...") on a 3-row df: test_to_file_runs_without_pkg_resources
- importlib.metadata.PackageNotFoundError: try/except PackageNotFoundError (covered by test 3 if Pillow optional)
- "11.0.0a1": numeric_parts loop (no separate test, exercised whenever Pillow isn't a final release)
Risk / blast radius
- pkg_resources and importlib.metadata agree on the version string for any package installed via PEP 517.
- When Pillow is absent, the old code would have raised pkg_resources.DistributionNotFound (an uncaught crash from inside to_file); the new code warns nothing and continues. That is the safer default for a Pillow optional path. Worth a maintainer's nod, hence the inline comment.
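The behaviour change described above (skip the check when the package is missing, instead of crashing) can be sketched as follows. This is an illustration only: the function name is_older_than, the injectable lookup parameter, and the (9, 5, 0) threshold are all assumptions, not values taken from ProfileReport.to_file.

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, version
from typing import Callable


def is_older_than(
    dist_name: str,
    minimum: tuple[int, ...],
    lookup: Callable[[str], str] = version,
) -> bool:
    """Return True only when dist_name is installed and older than minimum.

    A missing distribution yields False, so the caller emits no warning
    and carries on instead of crashing.
    """
    try:
        raw = lookup(dist_name)
    except PackageNotFoundError:
        return False  # not installed: nothing to warn about
    # Keep only the leading dotted-integer prefix, e.g. "11.0.0" from "11.0.0a1".
    match = re.match(r"\d+(?:\.\d+)*", raw)
    if match is None:
        return False  # unparseable version: stay silent rather than guess
    installed = tuple(int(part) for part in match.group().split("."))
    return installed < minimum


if __name__ == "__main__":
    # (9, 5, 0) is a hypothetical threshold used only for this example.
    print(is_older_than("Pillow", (9, 5, 0), lookup=lambda name: "11.0.0a1"))
```

Passing lookup explicitly keeps the sketch testable without depending on which packages happen to be installed.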
Release note
Upstream PR checklist (from .github/PULL_REQUEST_TEMPLATE/pull_request_template.md)
- make lint (black): clean on the 3 modified files. (Pre-existing isort drift on profile_report.py on develop is not addressed here, to keep the diff scoped.)
- make docs: n/a, no doc changes.
- make test: pytest tests/issues/test_issue1816.py -v → 3/3 passed on Python 3.13 with setuptools 82 (the env that triggers the bug).
- make examples: n/a, no example changes.
PR drafted with assistance from Claude Code. The change was reviewed manually against the upstream codebase on develop and the setuptools v81 release notes. The reproducer block above was used during development; it is the same one a reviewer can paste verbatim.