IDEA: move hosting of data cubes to bintray.com #7

@r4lv

problem

Git is not made for storing large binary files: every time a binary file changes, the entire file is stored again as a new object. When the repository is cloned, the entire history is downloaded, and with it every version of every binary file.
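To make this concrete, here is a minimal sketch of how Git names file contents, assuming a repository that uses the default SHA-1 object format. The helper name `git_blob_id` and the two stand-in "files" are made up for illustration. Git can later delta-compress objects when packing them, but that often works poorly for compressed or otherwise high-entropy binary data.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object id Git assigns to a file with this content (SHA-1 repos)."""
    # Git hashes a short header ("blob <size>\0") followed by the raw content.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# Two versions of a stand-in binary file that differ in a single byte.
v1 = bytes(1000)
v2 = bytes(999) + b"\x01"

print(git_blob_id(v1))
print(git_blob_id(v2))  # a different id, so a second, separate object is stored
```

Both objects stay in the history, so a clone has to fetch both versions.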

Small files that never change, like the files currently in this repository, are an acceptable overhead, but I think VIP_extras will grow over time. For example, the 4D IFS cube I have ready weighs 22 MB (cropped).

alternatives

git-lfs

git-lfs is the "large file storage" extension for git, developed (and supported) by GitHub to address exactly this problem. Once git-lfs is set up for a repository (e.g. "track every .fits and .npz file"), one can use the regular git commands as before. Under the hood, the large files themselves are not stored in the repository, only a small reference to each of them, while git-lfs uploads the actual file contents to a separate server.
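As an illustration of what "only a reference" means, the following sketch builds the text of a git-lfs pointer file, assuming the v1 pointer format (a version line, a SHA-256 object id, and the size in bytes). The function name `lfs_pointer` and the zero-filled stand-in cube are hypothetical; git-lfs itself generates these pointers automatically.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Build the pointer text that would be committed in place of the large file."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


cube = bytes(22 * 1000 * 1000)  # stand-in for a 22 MB data cube
pointer = lfs_pointer(cube)

print(pointer)
print(len(pointer), "bytes in the repository instead of", len(cube))
```

The repository history then only accumulates these short text files, regardless of how often the cube changes.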

advantages

  • regular git
  • as large files are not part of the repo, the repo size does not increase with new/changed files

disadvantages

  • slightly more complicated setup (difficult to move existing files to lfs → rewrite history, etc.)
  • users need git-lfs installed to clone the repo
  • Binder does not seem to support lfs

bintray

advantages

  • free for open source, tightly integrated with GitHub (e.g. organizations)
  • simple to use (web interface for uploading; curl, or astropy.utils.data.download_file in Python, for downloading)
  • keeps multiple file versions (like git or git-lfs)

disadvantages

  • none?

demo

I created a bintray project for VIP, and uploaded the IFS cube for testing.

Take a look at the project site: https://bintray.com/r4lv/vip/data-cubes

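Using the files in Python would look like this:

import vip  # or "import vip_hci as vip", depending on the installed version
from astropy.utils.data import download_file

# download the cube to a local temporary file and return its path
fn = download_file("https://dl.bintray.com/r4lv/vip/IFS_HD64568.vip.npz")
dataset = vip.HCIDataset.load(fn)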
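If more cubes are added, the download could be wrapped in a small helper. This is only a sketch: `BASE_URL`, `cube_url` and `fetch_cube` are hypothetical names, and the base URL is simply taken from the demo file above. It uses the `cache` option of `download_file` so that a cube is not downloaded again on every call.

```python
BASE_URL = "https://dl.bintray.com/r4lv/vip/"  # download root of the demo project


def cube_url(filename: str) -> str:
    """Return the download URL for a data cube hosted in the demo project."""
    return BASE_URL + filename


def fetch_cube(filename: str) -> str:
    """Download a data cube (or reuse the cached copy) and return its local path."""
    # Imported lazily so that building URLs does not require astropy.
    from astropy.utils.data import download_file

    # cache=True keeps a local copy, so repeated calls do not re-download.
    return download_file(cube_url(filename), cache=True)


print(cube_url("IFS_HD64568.vip.npz"))
```

The returned path can then be passed to the loader in the same way as `fn` above.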
