Replace hasattr(data, "__len__") duck typing with ndim-based collection detection#1414
Open
h-mayorquin wants to merge 5 commits intodevfrom
Open
Replace hasattr(data, "__len__") duck typing with ndim-based collection detection#1414h-mayorquin wants to merge 5 commits intodevfrom
hasattr(data, "__len__") duck typing with ndim-based collection detection#1414h-mayorquin wants to merge 5 commits intodevfrom
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## dev #1414 +/- ##
==========================================
+ Coverage 92.85% 92.87% +0.02%
==========================================
Files 41 41
Lines 9989 10007 +18
Branches 2054 2060 +6
==========================================
+ Hits 9275 9294 +19
+ Misses 436 435 -1
Partials 278 278 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
The function returns shape[0], not len(). The new name reflects the actual semantics: the size of the first dimension, which is what the array API standard exposes via .shape rather than __len__. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After discussion, _get_length is the clearest name: it maps to what Python programmers understand as "how many items if I iterate this", which is what every call site needs. The implementation detail of using shape[0] vs len() belongs in the docstring, not the name. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivated by hdmf-dev/hdmf-zarr#325 (zarr v2 to v3 migration).
hdmf uses
hasattr(data, "__len__")across hte library (utils.py,h5tools.py,validator.py) to decide whether data is a collection or a scalar. This has worked so far because numpy implements__len__and other array libraries (h5py, zarr v2) followed that convention. But__len__is not part of any array specification (see the array API standard object definition for the required attributes and methods), so we cannot rely on future array libraries continuing to provide it. The check is also implicit about what it is actually trying to detect. The zarr v2 to v3 migration (hdmf-dev/hdmf-zarr#325) exposed this, as zarr v3 arrays no longer implement__len__. The__len__heuristic also has a bug in the current release: 0-d ndarrays have__len__defined butlen()raisesTypeError, so passing one toHDF5IO.get_typeorwrite_datasetcrashes. This already happens when converting results from array-API-conforming libraries (which return 0-d arrays from reductions likemeanandsum) to numpy vianp.asarray(). A regression test for this is included.This PR centralizes the "is this a collection?" logic into a single helper (
_is_collection) that is both more general and more explicit: checkndim > 0first (which the array API standard guarantees for any conforming array), then fall back to__len__for plain Python containers. A companion_get_lengthhelper safely gets first-dimension length viashape[0]orlen(). This replaces all scattered checks with one clear, documented policy that covers both legacy types and array API standard conforming libraries, and shoudl eliminate the need for monkey-patching in hdmf-zarr.The tests that are failing are because pynwb has not updated its ceiling on hdmf I think
Related to #1235
Checklist
CHANGELOG.mdwith your changes?