modelmeta is a Python package that provides an Object Relational Mapping (ORM) layer for accessing the Pacific Climate Impacts Consortium (PCIC)'s database of coverage data metadata. The package provides model classes for each of the tables in the database. With this package, one can recreate the database schema in PostgreSQL or SQLite and/or use the package as an object mapper for programmatic database access.

modelmeta uses SQLAlchemy to provide the ORM layer, and Alembic to manage database creation and migration (see the section below).
The intent of the database itself is to separate the small, inexpensive, structured metadata and attribute information (stored in the database) from the expensive-to-access bulk spatiotemporal data (stored on disk in multidimensional files). It provides an efficiently searchable index of the bulk data files, and separates storage from indexing.
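For illustration, here is a minimal sketch of programmatic access through the ORM layer. It assumes modelmeta exposes a model class named `DataFile` with a `filename` column (check the package's module contents for the exact names in your version) and that you have a DSN for an existing database:

```python
# Minimal sketch of ORM-based access; DataFile is an assumed class name.
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from modelmeta import DataFile  # assumed import; verify in your version

engine = create_engine("postgresql://username:password@host/database")
Session = sessionmaker(bind=engine)
session = Session()

# Query the metadata index without touching the bulk data files on disk.
for data_file in session.query(DataFile).limit(10):
    print(data_file.filename)

session.close()
```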
Installation of this package, including system dependencies, is automated through either make or poethepoet for Ubuntu 24.04:

```bash
$ make        # option 1
$ poe setup   # option 2
```
If you wish to install modelmeta manually, follow the steps below.

- Clone the repository:

  ```bash
  $ git clone https://github.com/pacificclimate/modelmeta
  ```

- Create and activate a virtual environment, then install the package:

  ```bash
  $ cd modelmeta
  $ python3 -m venv venv
  $ source venv/bin/activate
  $ pip install poetry
  $ poetry install  # add --with=test for development and testing
  ```
This repository contains two convenient scripts that add data files to an existing modelmeta database so that our websites can access data from them. They are installed when the package is installed.
index_netcdf adds one or more netCDF climate data files to a PCIC modelmeta-format database:

```bash
index_netcdf -d postgresql://username:password@monsoon.pcic.uvic.ca/database /path/to/files/*.nc
```
Usernames and passwords can be found in Team Password Manager. To add files to the data portal, use database `pcic_meta`; to add files to PCEX or Plan2adapt, use database `ce_meta_12f290b63791`.
In order to determine the metadata of the file, the index_netcdf script scans its netCDF attributes. If the file does not have all the required attributes, index_netcdf will be unable to proceed. You can look at a file's attributes with the command:

```bash
ncdump -h file.nc
```
and update attributes using the update_metadata script in the climate-explorer-data-prep repository. If you update file attributes, log your update YAML and a list of the files you used it with in the data-prep-actions repository, in case you need to reprocess or check the files later.
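If you prefer to check attributes programmatically, a sketch like the following, using the netCDF4 package, can help. The attribute list here is illustrative only; the authoritative list of required attributes is whatever index_netcdf reports as missing:

```python
# Sketch: check a file's global attributes for required entries.
# The `required` list is illustrative, not the real requirement.
import netCDF4

nc = netCDF4.Dataset("file.nc")
required = ["institution", "model_id", "experiment_id"]  # illustrative
missing = [attr for attr in required if attr not in nc.ncattrs()]
print("Missing global attributes:", missing or "none")
nc.close()
```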
Once files have been indexed into the database, they need to be added to individual ensembles; each ensemble is associated with a particular project or website and contains all data files needed to support the functioning of that component. In most cases, a file will be added to more than one ensemble:
```bash
associate_ensemble -n ensemble_name -v 1 -d postgresql://username:password@db.pcic.uvic.ca/database /path/to/files/*.nc
```
Available ensembles, or where should I put this data anyway?
Most ensembles represent groupings of related files that users can interact with (view maps, download data, create graphs, etc) using a specific PCIC tool. Plan2adapt, the data portal, SCIP, and PCEX all use ensembles in this way.
Plan2adapt uses a single ensemble which represents a list of all the files a user can view. The name of this ensemble is set when plan2adapt is deployed, as the environment variable `REACT_APP_ENSEMBLE_NAME`. You can see the environment variables for a docker container running plan2adapt with `docker exec container_name env`.
The data portal uses a separate ensemble for each portal, which represents a list of the data files a user can download from that portal. Each portal's ensemble is hard-coded in that portal's definition file, in `pdp/portals`.
PCEX is flexible about which ensembles it uses. A PCEX URL encodes both a UI and an ensemble that specifies which data is to be viewed with that UI. In theory you can look at any ensemble with any UI, but in practice UIs make assumptions about the type of data available and most combinations won't work. In most cases, users access the various PCEX UI pages via the link bar at the top of the page.
The `navSubpath` variable has the UI before the slash and the ensemble after it. PCEX UIs that display hydrological data for a watershed also have an additional ensemble that contains files describing the geography of the watershed; this data cannot be directly viewed by the user but is required for some calculations. A list of these geographic ensembles can be found in `getWatershedGeographyName()`.
SCIP is currently hardcoded to use an ensemble named "scip_fraser_bccoast".
There are also two special ensembles used by PCIC internal tools, not web portals.
- The `all_files` ensemble in the `ce_meta_12f290b63791` database contains every file in the database, with the exception of time-invariant files. It is used with various scripts that test new functionality across all files.
- The `p2a_rules` ensemble in the `ce_meta_12f290b63791` database contains all the information needed by plan2adapt's rules engine; it is used by the scripts which pre-generate rules engine results for plan2adapt, which are too slow to process in real time.
Unfortunately, we don't currently have a script that can delete files
from the databases. If you accidentally index a file with bad metadata
and need to get rid of it, at present the only way is to log on to the
database directly with psql or pgadmin.
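If you are more comfortable with the ORM than with raw SQL, a hedged sketch like the following can serve the same purpose; the `DataFile` class and its `filename` column are assumed names, and foreign-key constraints may force you to delete dependent rows first:

```python
# Hedged sketch of removing a mis-indexed file via the ORM.
# DataFile and filename are assumed names; check the actual schema,
# and mind foreign-key constraints on related tables.
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from modelmeta import DataFile  # assumed import

engine = create_engine("postgresql://username:password@host/database")
session = sessionmaker(bind=engine)()

bad = session.query(DataFile).filter_by(filename="/path/to/bad.nc").one()
session.delete(bad)
session.commit()
```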
Climate coverage data (or "raster data" or "spatiotemporal data") consist of large data fields, typically over two or three dimensions in space plus a time dimension. Depending on the resolution of each axis, the data can be quite large. Typically there are several-to-many output quantities (e.g. temperature, precipitation, wind speed/direction), and often there are multiple scenarios, multiple model implementations, and multiple runs of each model, further increasing the size of the data.
Modifications to modelmeta's schema definition are now managed using
Alembic, a database migration tool based on SQLAlchemy.
In short, Alembic supports and disciplines two processes of database schema change:
- Creation of database migration scripts (Python programs) that modify the schema of a database.
- Application of migrations to specific database instances.
- In particular, Alembic can be used to create a new instance of a `modelmeta` database by migrating an empty database to the current state. This is described in detail below.
For more information, see the Alembic tutorial.
The existing instance of a modelmeta database (monsoon/pcic_meta)
was created prior to the adoption of Alembic, and therefore the timeline
for Alembic database migrations is slightly confusing.
Timeline:
- the distant past: `pcic_meta` is created by mysterious primeval processes.
- somewhat later: `modelmeta` is defined using SQLAlchemy, mapping most (but not all) features of the existing `pcic_meta` database into an ORM.
- 2017-07-18:
  - Alembic is introduced.
  - Alembic is used to create migration `614911daf883`, which adds item `seasonal` to `timescale` Enum.
- 2017-08-01:
  - The SQLAlchemy ORM is updated to reflect all features of the `pcic_meta` database. This mainly involves adding some missing indexes and constraints.
  - Alembic is used to create a logically-previous migration `7847aa3c1b39` that creates the initial database schema from an empty database.
  - The add-seasonal migration is modified to logically follow the initial-create migration.
A Postgres database is somewhat more elaborate to set up, but it is also the foundation of a production database, not least because we use PostGIS.
Instructions:
- Choose a name for your new database/schema, e.g., `ce_meta`.

- On the server of your choice (e.g., `monsoon`):

  Note: These operations must be performed with high-level permissions. See the System Administrator to have these done or to obtain permissions. For a record of such a creation, see Redmine Issue 696. Permission setup was more complicated than anticipated.

  a. Create a new database with the chosen name, e.g., `ce_meta`.

  b. Within that database, create a new schema with the chosen name, e.g., `ce_meta`.

  c. Create new users, with the following permissions:

     - `ce_meta` (database owner): full permissions for table creation and read-write permissions in schemas `ce_meta` and `public`
     - `ce_meta_rw` (database writer): read-write permissions in schemas `ce_meta` and `public`
     - `ce_meta_ro` (database reader): read-only permissions in schemas `ce_meta` and `public`

     and for each of them:

     - `search_path = ce_meta,public`

  d. Enable PostGIS in the new database:

     - `CREATE EXTENSION postgis;`
     - This creates the table `spatial_ref_sys` in schema `public`. Check that.
- Add a DSN for your new database, including the appropriate user name, to `alembic.ini`. For example:

  ```ini
  [prod_ce_meta]
  sqlalchemy.url = postgresql://ce_meta@monsoon.pcic.uvic.ca/ce_meta
  ```
- Create your new database with Alembic by upgrading the empty database to `head`:

  ```bash
  alembic -x db=prod_ce_meta upgrade head
  ```
- Have a beer.
A SQLite database is very simple to set up, but is normally used only for testing.
- Add a DSN for your new database to `alembic.ini`. This database need not exist yet (although the path to it does). For example:

  ```ini
  [my_test_database]
  sqlalchemy.url = sqlite:///path/to/test.sqlite
  ```
- Create your new database with Alembic by upgrading the non-existent database to `head`:

  ```bash
  alembic -x db=my_test_database upgrade head
  ```
- Have a beer. Or at least a soda.
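As a quick sanity check (a sketch, reusing the illustrative path above), you can list the tables the migration created:

```python
# List the tables Alembic created in the new SQLite database.
from sqlalchemy import create_engine, inspect

engine = create_engine("sqlite:///path/to/test.sqlite")
print(inspect(engine).get_table_names())
```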
DEPRECATED: Decision taken not to modify `pcic_meta`. This content is retained in case that decision is revised in future.
This section is only of interest to PCIC.
Status: NOT DONE
The following things need to be done ONCE in order to bring pcic_meta
under management by Alembic.
- The table `pcic_meta.alembic_version` has already been created in `pcic_meta` by earlier operations. Its content is currently `null`.
- Place the value `7847aa3c1b39` in the single row and column of table `pcic_meta.alembic_version` in `pcic_meta`.
  - This fakes the migration from an empty database to its nominal initial state (before the add-seasonal migration).
Once the initialization steps have been completed, ongoing migrations are simple and standard:
- Apply later migrations:

  ```bash
  alembic -x db=prod_pcic_meta upgrade head
  ```

  - At the time of this writing (2017-08-01), that would be migration `614911daf883`.
First create a new revision with this command:
```bash
alembic revision -m "<brief description of changes to the database>"
```
Alembic will create a new file in `alembic/versions` with stub `upgrade()` and `downgrade()` functions. Fill in the upgrade function to make the change you want the migration to create (e.g., drop a table and recreate it with different column names), and the downgrade function to return to the current database configuration from the new one (e.g., drop the table and recreate it with the old column names). The `alembic.op` object has various useful database manipulation functions for this.
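For illustration, a filled-in migration might look like the sketch below; the table and column names are hypothetical, not drawn from the actual modelmeta schema history:

```python
# Illustrative upgrade/downgrade pair; table and column names are
# hypothetical, not part of the real modelmeta schema.
from alembic import op
import sqlalchemy as sa


def upgrade():
    # Apply the schema change: add a nullable column.
    op.add_column('data_files',
                  sa.Column('index_time', sa.DateTime(), nullable=True))


def downgrade():
    # Revert to the previous schema: drop the column again.
    op.drop_column('data_files', 'index_time')
```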
You can migrate a database to the new revision with the `alembic upgrade head` command discussed earlier.