Current State
The current way that data references are stored in HIPPO is as follows:
Product:
sources: list[FileMetadata(id=..., name=..., ...)]
Optionally, some metadata types may have additional information creating a link between FileMetadata.name and a specific keyword. For instance, this is what the maps table in MapSet does:
Product:
metadata: MapSet:
maps: {
"coadd": "coadd_map.fits",
...
}
sources: list[FileMetadata(id=..., name="coadd_map.fits"), ...]
This is displayed quite nicely in the Web UI.
However, it's not really portable. One must know that, for MapSet, they should look into maps to find the coadd map and then filter the sources to extract the path that they actually want. This is complicated, requires a lot of knowledge about the system and the underlying database structure, and would need to be implemented separately for every multi-source product.
In practice it looks something like this. Say I wanted the coadd map:
file_name = product.metadata.maps["coadd"]
file_metadata = next(x for x in product.sources if x.name == file_name)
source = cache.get(file_metadata.uuid)
We should keep maps as it is a table that can store map-specific information. But we need an additional piece of metadata that is universal to all products to let us know how to map from product to file.
Proposal
I would like to propose storing a slug for each and every source. This slug would default to data.
Then, in the HIPPO python client, one would do:
where this returns one of:
- A local fully resolved path to a cached version of the product
- A BytesIO object that is filled with the bytes of the product. This would be the case if one was running in cache-less mode, where products are downloaded on demand as they are used.
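To make the two return types concrete, here is a minimal sketch of what slug-based resolution could look like. Everything in it is an assumption for illustration, not the HIPPO client API: the stand-in FileMetadata and Product classes, the resolve_source function, its cache_dir and download parameters, and the cache directory layout are all hypothetical.

```python
from dataclasses import dataclass
from io import BytesIO
from pathlib import Path
from typing import Callable, Optional, Union


@dataclass
class FileMetadata:
    # Stand-in for HIPPO's FileMetadata with only the fields used here.
    uuid: str
    name: str
    slug: str = "data"  # proposed default slug


@dataclass
class Product:
    # Stand-in for a HIPPO product; real products carry more metadata.
    sources: list[FileMetadata]


def resolve_source(
    product: Product,
    slug: str = "data",
    cache_dir: Optional[Path] = None,
    download: Optional[Callable[[FileMetadata], bytes]] = None,
) -> Union[Path, BytesIO]:
    """Hypothetical helper: map (product, slug) to a path or a byte buffer."""
    try:
        file_metadata = next(x for x in product.sources if x.slug == slug)
    except StopIteration:
        raise KeyError(f"No source with slug {slug!r}") from None

    if cache_dir is not None:
        # Cached mode: return a fully resolved local path.
        # The uuid/name layout is an assumption, not HIPPO's actual layout.
        return (cache_dir / file_metadata.uuid / file_metadata.name).resolve()

    if download is None:
        raise ValueError("Cache-less mode needs a download callable")

    # Cache-less mode: fetch on demand and hand back an in-memory buffer.
    return BytesIO(download(file_metadata))
```

The point of the sketch is that the lookup no longer depends on the metadata type: the caller only needs the slug, whether the product is a MapSet or anything else.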
To enable this, we would need to:
- Add a slug to every FileMetadata object, ingested at upload time
- Have a pre-defined set of allowed slugs in each Metadata type. These would be checked at upload. We would support a wildcard slug, but slugs should be validated to make sure they are snake_case
- Change all the examples and UI to use this new structure.
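The upload-time check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the ALLOWED_SLUGS table, the "*" wildcard marker, the "Simple" type name, the extra MapSet slugs beyond "coadd", and the validate_slug function are all hypothetical and not part of HIPPO today.

```python
import re

# snake_case: lowercase letters and digits, separated by single underscores.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")

# Hypothetical marker meaning "any snake_case slug is accepted".
WILDCARD = "*"

# Hypothetical per-metadata-type allow lists; only "coadd" comes from the
# proposal text, the other entries are placeholders.
ALLOWED_SLUGS = {
    "MapSet": {"coadd", "mask", "ivar"},
    "Simple": {WILDCARD},
}


def validate_slug(metadata_type: str, slug: str) -> str:
    """Return the slug if it is valid for the metadata type, else raise."""
    if not SNAKE_CASE.fullmatch(slug):
        raise ValueError(f"Slug {slug!r} is not snake_case")

    # Types without an explicit list only accept the default slug.
    allowed = ALLOWED_SLUGS.get(metadata_type, {"data"})

    if WILDCARD not in allowed and slug not in allowed:
        raise ValueError(f"Slug {slug!r} is not allowed for {metadata_type}")

    return slug
```

Note that the wildcard only relaxes the allow list; the snake_case check still applies to every slug.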