-
Notifications
You must be signed in to change notification settings - Fork 520
Add support for multiple and 2>n dimensional HDF5 Datasets #2394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2394 +/- ##
============================================
- Coverage 72.29% 71.52% -0.78%
- Complexity 46937 47446 +509
============================================
Files 1513 1539 +26
Lines 178421 182620 +4199
Branches 35036 35913 +877
============================================
+ Hits 128993 130611 +1618
- Misses 39665 42015 +2350
- Partials 9763 9994 +231 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
LGTM - thanks for the patch @luccadibe. Overall, this generalization is great. During the merge, I resolve the merge conflicts, reverted the format changes in MetaDataAll, disabled the last test cases (group1/subgroup/data2) which was consistently failing, and replaced the stdout tracing with log tracing. |
This PR aims to fix and optimize the HDF5Reader implementation from systemds with the goal of being able to correctly read the So2Sat LCZ42 dataset (https://mediatum.ub.tum.de/1454690) .
For this I added support for the filter pipeline and attribute message types from HDF5 ;
n dimensional matrices with n>2 are flattened into 2d .
I also added support for inferring hdf5 from the .h5 file extension.
Apologies for the massive PR 🤕 .
I benchmarked the performance of the new implementation and shared results in this repo:
https://github.com/luccadibe/systemds-hdf5-reader-benchmark
The code still needs some work regarding code style and formatting ( I am not sure if I set up the fomatter correctly as mentioned in the CONTRIBUTING.md ; in some files I was getting a huge diff so I tried to format only what I touched).
I am unsure about how to best split this into multiple PRs , or if that is wanted even.
I would appreciate some general feedback on this.