problem
Git is not made for storing large binary files: every time a binary file changes, the entire file is stored again. `git clone` downloads the entire history, and with it every version of every binary file.
Small files that never change, like the current files in this repository, are an acceptable overhead, but I think VIP_extras will grow over time; e.g. the 4D IFS cube I have ready weighs 22 MB (cropped).
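To illustrate why this hurts, here is a minimal Python sketch of how git identifies and stores file contents: each version of a file becomes its own "blob" object, named by the SHA-1 of a short header plus the full content, and a loose object on disk is that header plus content compressed with zlib. The 1 MB of random bytes is an assumed stand-in for a real binary data file. Flipping a single byte produces a completely new object of roughly the full size. (Git can later delta-compress objects inside packfiles, but already-compressed formats such as `.npz` usually gain little from that.)

```python
import hashlib
import os
import zlib


def git_blob_id(data: bytes) -> str:
    """Object id git assigns to a file with this content (SHA-1 object format)."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def loose_object_size(data: bytes) -> int:
    """Approximate on-disk size of the loose object: header + content, zlib-compressed."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return len(zlib.compress(header + data))


# Assumed stand-in for a binary data file: 1 MB of incompressible random bytes
v1 = os.urandom(1_000_000)
# "Edit" a single byte to get version 2
v2 = bytes([v1[0] ^ 0xFF]) + v1[1:]

print(git_blob_id(v1) != git_blob_id(v2))  # True: the edit creates a new object
print(loose_object_size(v1), loose_object_size(v2))  # both close to 1 MB
```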
alternatives
git-lfs
git-lfs ("Git Large File Storage") is an extension for git, developed (and supported) by GitHub to address exactly this problem. Once git-lfs is set up for a repository (e.g. "track every .fits and .npz file"), one can use the regular git commands as before. Under the hood, the large files are not stored in the repository; only a small reference to each file is committed, while the large files themselves are uploaded to a separate server.
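The "reference" that git-lfs commits is a tiny text pointer holding the SHA-256 and size of the real file (git-lfs pointer spec v1), and `git lfs track "*.fits"` records the pattern in `.gitattributes`. The sketch below only rebuilds those two text snippets in Python to show how little ends up in the repository; it does not talk to an LFS server, and the all-zero 22 MB "cube" is a placeholder, not real data.

```python
import hashlib


def lfs_pointer(data: bytes) -> str:
    """Build the small text pointer that git-lfs commits in place of a large file."""
    oid = hashlib.sha256(data).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(data)}\n"
    )


def gitattributes_line(pattern: str) -> str:
    """The .gitattributes entry written by `git lfs track <pattern>`."""
    return f"{pattern} filter=lfs diff=lfs merge=lfs -text"


# Placeholder for a 22 MB cube (zeros instead of real data)
cube = bytes(22 * 1024 * 1024)
pointer = lfs_pointer(cube)
print(pointer)
print(len(pointer.encode()), "bytes committed instead of", len(cube))

for pattern in ("*.fits", "*.npz"):
    print(gitattributes_line(pattern))
```

The pointer stays around a hundred bytes no matter how large the file is, which is why the repository itself does not grow.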
advantages
- regular git
- as large files are not part of the repo, the repo size does not grow with new or changed files
disadvantages
- slightly more complicated setup (moving existing files to lfs is difficult → history has to be rewritten, etc.)
- users need git-lfs installed to clone the repo
- Binder does not seem to support lfs
bintray
advantages
- free for open source, tightly integrated with GitHub (e.g. organizations)
- simple to use (web interface for uploading, `curl` for downloading, and `astropy.utils.data.download_file` for Python)
- keeps multiple file versions (like git or git-lfs)
disadvantages
demo
I created a bintray project for VIP, and uploaded the IFS cube for testing.
Take a look at the project site: https://bintray.com/r4lv/vip/data-cubes
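In Python, a file can then be used like this:

```python
import vip_hci as vip
from astropy.utils.data import download_file

# Download the cube (pass cache=True to keep a local copy between sessions)
fn = download_file("https://dl.bintray.com/r4lv/vip/IFS_HD64568.vip.npz")
# Load the downloaded file as a VIP dataset
dataset = vip.HCIDataset.load(fn)
```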
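Where astropy is not available, the download can also be done with the Python standard library alone. The sketch below is a hypothetical helper (`cached_download` and the `~/.cache/vip_extras` directory are made-up names for illustration) that downloads a URL once and reuses the local copy afterwards, roughly what astropy offers with `download_file(..., cache=True)`. Its cache layout is an assumption of this sketch, not astropy's.

```python
import hashlib
import pathlib
import shutil
import urllib.parse
import urllib.request


def cached_download(url: str, cache_dir: str = "~/.cache/vip_extras") -> pathlib.Path:
    """Download `url` once and return the path of the local copy (hypothetical helper)."""
    cache = pathlib.Path(cache_dir).expanduser()
    cache.mkdir(parents=True, exist_ok=True)
    # Keep the original file name, prefixed by a short hash of the URL to avoid clashes
    name = pathlib.PurePosixPath(urllib.parse.urlparse(url).path).name
    target = cache / f"{hashlib.sha256(url.encode()).hexdigest()[:12]}_{name}"
    if not target.exists():
        # Write to a temporary name first so an interrupted download is never reused
        tmp = target.parent / (target.name + ".part")
        with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
            shutil.copyfileobj(response, out)
        tmp.replace(target)
    return target
```

Usage would be `fn = cached_download("https://dl.bintray.com/r4lv/vip/IFS_HD64568.vip.npz")`, followed by `vip.HCIDataset.load(str(fn))` as above. Unlike astropy's cache, this sketch never checks whether the remote file has changed.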