Change Proposal: Slugs for Data #47

@JBorrow

Current State

HIPPO currently stores data references as follows:

Product:
  sources: list[FileMetadata(id=..., name=..., ...)]

Optionally, some metadata types may have additional information creating a link between FileMetadata.name and a specific keyword. For instance, this is what the maps table in MapSet does:

Product:
  metadata: MapSet:
    maps: {
     "coadd": "coadd_map.fits",
     ...
    }
  sources: list[FileMetadata(id=..., name="coadd_map.fits"), ...]

This is displayed quite nicely in the Web UI as follows:

[Image: Web UI screenshot]

However, it's not really portable. One must know that, for MapSet, they should look down into maps to find the coadd map's file name and then filter the sources to extract the path that they actually want. This is complicated, it requires a lot of knowledge about the system and the underlying database structure, and it would need to be implemented separately for every multi-source product.

In practice, it looks something like this. Say I wanted the coadd map:

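# Look up the file name registered under the "coadd" keyword (MapSet-specific).
file_name = product.metadata.maps["coadd"]
# Find the source whose name matches; next() raises StopIteration if none does.
file_metadata = next(x for x in product.sources if x.name == file_name)
# Resolve the file through the cache.
source = cache.get(file_metadata.uuid)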

We should keep maps, as it is a table that can store map-specific information. But we also need a piece of metadata that is universal to all products and tells us how to map from product to file.

Proposal

I would like to propose storing a slug for each and every source. This slug would default to data.

Then, in the HIPPO Python client, one would do:

product.slug

where slug is replaced by the slug of the desired source (data by default), and this returns one of:

  • A local fully resolved path to a cached version of the product
  • A BytesIO object that is filled with the bytes of the product. This would be the case if one was running in cache-less mode where products are downloaded on-demand as they are used.
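The two return modes above can be sketched roughly as follows. This is only an illustration, not the HIPPO client: the FileMetadata class is a simplified stand-in, and cache_root, fetch, and the cache layout (cache_root / uuid / name) are hypothetical names and assumptions introduced for the example.

```python
from dataclasses import dataclass
from io import BytesIO
from pathlib import Path
from typing import Callable, Optional, Union


@dataclass
class FileMetadata:
    """Simplified stand-in for HIPPO's FileMetadata (assumed fields only)."""

    uuid: str
    name: str
    slug: str = "data"  # the proposed default slug


class Product:
    """Toy product that resolves `product.<slug>` to a path or a BytesIO."""

    def __init__(
        self,
        sources: list[FileMetadata],
        cache_root: Optional[Path] = None,
        fetch: Optional[Callable[[str], bytes]] = None,
    ):
        self.sources = sources
        self.cache_root = cache_root  # hypothetical local cache directory
        self.fetch = fetch  # hypothetical downloader: uuid -> bytes

    def __getattr__(self, slug: str) -> Union[Path, BytesIO]:
        # Only called when normal attribute lookup fails, so real attributes win.
        # Reading through __dict__ avoids recursing if `sources` is not set yet.
        sources = self.__dict__.get("sources", [])
        match = next((s for s in sources if s.slug == slug), None)
        if match is None:
            raise AttributeError(f"no source with slug {slug!r}")
        if self.cache_root is not None:
            # Cached mode: return a fully resolved local path (assumed layout).
            return (self.cache_root / match.uuid / match.name).resolve()
        if self.fetch is None:
            raise RuntimeError("cache-less mode needs a fetch callable")
        # Cache-less mode: download on demand into an in-memory buffer.
        return BytesIO(self.fetch(match.uuid))


# Usage: one source with slug "coadd", accessed in cache-less mode.
product = Product(
    sources=[FileMetadata(uuid="abc", name="coadd_map.fits", slug="coadd")],
    fetch=lambda uuid: b"bytes for " + uuid.encode(),
)
coadd = product.coadd  # a BytesIO holding the downloaded bytes
```

With this shape, the caller never touches maps or filters sources by hand; the slug is the only thing they need to know.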

To enable this, we would need to:

  1. Add a slug to every FileMetadata object, ingested at upload time.
  2. Have a pre-defined set of allowed slugs in each Metadata type. These would be checked at upload. We would support a wildcard slug, but slugs should still be validated to make sure they are snake_case.
  3. Change all the examples and the UI to use this new structure.
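The upload-time check in step 2 could look something like the sketch below. It is a minimal illustration under assumptions: the ALLOWED_SLUGS table, the WILDCARD marker, the validate_slug helper, and every metadata type and slug other than MapSet and coadd are hypothetical, and the regular expression is one possible reading of "snake_case".

```python
import re

# One possible definition of snake_case: lowercase words joined by single underscores.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")

# Hypothetical marker meaning "any snake_case slug is accepted".
WILDCARD = "*"

# Hypothetical allowed-slug sets, keyed by metadata type name.
ALLOWED_SLUGS = {
    "MapSet": {"coadd", "split", "mask"},
    "Simple": {"data"},
    "Free": {WILDCARD},
}


def validate_slug(metadata_type: str, slug: str) -> None:
    """Raise ValueError if the slug is malformed or not allowed for the type."""
    # The format check always applies, even when the type accepts a wildcard.
    if not SNAKE_CASE.fullmatch(slug):
        raise ValueError(f"slug {slug!r} is not snake_case")
    allowed = ALLOWED_SLUGS[metadata_type]
    if WILDCARD not in allowed and slug not in allowed:
        raise ValueError(f"slug {slug!r} is not allowed for {metadata_type}")


# Usage: both calls pass silently; a bad slug would raise ValueError.
validate_slug("MapSet", "coadd")
validate_slug("Free", "anything_goes")
```

Keeping the format check separate from the allow-list check means a wildcard type still rejects slugs such as "CoAdd" or "coadd-map".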
