Skip to content

Conversation

@luccadibe
Copy link
Contributor

@luccadibe luccadibe commented Dec 1, 2025

This PR aims to add new tests to the HDF5Readers using new test input datasets generated with a new R script. It is meant as a first step for SYSTEMDS-3929 .

The existing tests used three datasets which were commited , for example src/test/scripts/functions/io/hdf5/in/transfusion_1.h5 , I could not find a generator script in the codebase for these.

In this PR, the test .h5 files generated with the script include datasets with different schemas ( 2d,3d,4d) and datatypes (doubles, integers, strings) and is a table-driven test in a single file. All data is generated using R and the library "rhdf5" , which was the one already being used before for validation.

The test loads the dataset with systemds and R and verifies that the outputs match using TestUtils.compareMatrices .

Currently, all tests fail , due to some message types (11 and 12) not being supported by the HDF5 implementation in systemds.
Message 11 is the Filter Pipeline Message
Message 12 is the Attribute Message

These messages seem to be applied by default by rhdf5 version 2.54.0 .

I have worked on fixes to all of the bugs, enough so to be able to load the So2Sat LCZ42 dataset .
I can open different PRs for the fixes if it is desired.
The generator generates many test datasets, some of which are out of scope to test for.
For example, the chunked / compressed datasets. The test currently does not use them since they are out of scope for my project; however I think they can be useful to keep improving the support of HDF5 in systemds.

@luccadibe luccadibe marked this pull request as ready for review January 5, 2026 17:31
@mboehm7
Copy link
Contributor

mboehm7 commented Jan 16, 2026

LGTM - thanks for the additional tests @luccadibe, and great to hear that you fixed the issues already. During the merge I put a ignore annotation on the test to ensure the build is not failing. Interestingly, the package 'rhdf5' is not available for my up-to-date version of R yet.

@mboehm7 mboehm7 closed this in 5913c42 Jan 16, 2026
@github-project-automation github-project-automation bot moved this from In Progress to Done in SystemDS PR Queue Jan 16, 2026
@luccadibe
Copy link
Contributor Author

LGTM - thanks for the additional tests @luccadibe, and great to hear that you fixed the issues already. During the merge I put a ignore annotation on the test to ensure the build is not failing. Interestingly, the package 'rhdf5' is not available for my up-to-date version of R yet.

Hmm. Maybe because the package needs to be installed through a manager ? https://www.bioconductor.org/packages/release/bioc/html/rhdf5.html . That is how I installed it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants