For the compute benchmarks, we've been generating and persisting the data in memory for every combination of `chunk_size` and `chunking_scheme` prior to the computations:

`chunk_size`:
- 32MB
- 64MB
- 128MB
- 256MB

`chunking_scheme`:
- spatial
- temporal
- auto
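As a rough illustration (not the repo's actual generation code), the in-memory setup for one `(chunk_size, chunking_scheme)` combination might look like the sketch below; the array shape and the mapping from chunk size to chunk shape are assumptions for illustration only.

```python
import dask.array as da

# Illustrative sketch: build a random dask array chunked for one
# (chunk_size, chunking_scheme) combination, then persist it in memory.
# The shape and the size-to-chunks mapping are assumptions, not the
# repo's actual parameters.
def make_data(chunk_size_mb, chunking_scheme, shape=(100, 200, 300)):
    itemsize = 8  # float64 bytes
    target_bytes = chunk_size_mb * 2**20
    if chunking_scheme == "temporal":
        # full spatial slabs, chunked along the time axis
        nt = max(1, target_bytes // (shape[1] * shape[2] * itemsize))
        chunks = (min(nt, shape[0]), shape[1], shape[2])
    elif chunking_scheme == "spatial":
        # full time series per chunk, square-ish spatial blocks
        side = int((target_bytes / (shape[0] * itemsize)) ** 0.5)
        chunks = (shape[0],
                  min(max(1, side), shape[1]),
                  min(max(1, side), shape[2]))
    else:  # "auto": let dask decide
        chunks = "auto"
    return da.random.random(shape, chunks=chunks)

arr = make_data(32, "temporal").persist()
```

The point of `persist()` here is exactly what the I/O discussion below challenges: it assumes the whole dataset fits in memory.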
Per discussions with @rabernat, @kmpaul, @tinaok, @guillaumeeb, it is crucial to have an I/O component that emulates real use cases: the data will almost always live on the filesystem and be bigger than what we can persist into memory.
I/O benchmarks
A few months ago, @kmpaul and @halehawk conducted an IOR-based I/O scaling study (C/MPI-based code) that compared:
- Z5
- netCDF4
- HDF5
- PnetCDF
- MPIIO
- POSIX
In zarr-hdf-benchmarks (Python/mpi4py-based code), @rabernat compared both the `write` and `read` components.
How should we go about incorporating an I/O component into the compute benchmarks?
- Should we focus on the `read` component by generating a dataset with the same chunking and compression in both netCDF4 and Zarr for every `chunk_size` and `chunking_scheme` combination, and then testing a variety of access approaches?
- Should the `write` component be taken into consideration too?
- One of our long-term goals for this repo is that the benchmarks should be runnable on different platforms (HPC, Cloud) and storage systems. Both https://github.com/rabernat/zarr_hdf_benchmarks and https://github.com/NCAR/ior_scaling are MPI-dependent, and I was wondering whether the I/O components for these benchmarks could be Python/Dask-based?