Faster definitive data DOIs #11

SimonFlower · 2026-03-02T10:36:57Z

SimonFlower
Mar 2, 2026
Maintainer

Options for faster DOI publication

Summary

The current format of archive files used to access Intermagnet DOI data (magYYYY_defYYYY.zip) will not allow the frequency of updates that we want. A new way of grouping data is needed. A proposal is made for keeping data in single observatory-year archive files. This would significantly simplify the publication process, but would require software to be created for users to access the data.

Requirements for Intermagnet definitive data publication

Publication of definitive data soon after it has been successfully checked
Adherence to DOI rules - data can be added, but not altered or removed
Simplicity of access to data for users
Clear citation for data providers

Current arrangements

The current formats for data mean that one year of Intermagnet definitive data occupy a large number of files (e.g. over 2,000 in 2019). The current data structure for definitive data publication is described here: https://tech-man.intermagnet.org/stable/appendices/archivedataformats.html#intermagnet-physical-media-directory-structure. Currently data for DOI publication is stored in annual zip archives. This leads to a manageable number of data files (under 100 for all DOI publications across all years).

The imcdview software currently requires data in the structure described in the Intermagnet manual, but only the data folders (and their observatory data sub-folders) are necessary - other folders (such as <obsy_inf> and <ctry_inf>) are not required.

For each year, data is requested from data providers until no further data can be expected before the entire annual data set is published.

Our current system can be divided into publication of two different types of data:

New data that has not previously been published - this occurs frequently
Corrections to data that has previously been published - this occurs infrequently

Discussion

In order to make management of the data possible for both administrators and users, data needs to be grouped into a small number of archive files. Publishing the data without grouping it into archive files would simplify adding new data to a DOI, but is unrealistic for maintenance of the DOI data archive and would make it complex for users to access the data. It is not acceptable to add data to a ZIP archive file - this constitutes a change in terms of the DOI.

Data is currently grouped into archive files by year. It could alternatively be grouped by observatory. However, there is no obvious advantage to grouping by observatory - doing so would not make it easier to add data to a DOI in a form that allows users easy access to the data that they want.

How frequently do we anticipate adding data to the DOIs? Answering this question will help define an appropriate way to structure the data. For previously unpublished data, we are agreed that we should stop waiting for all data from a year to be available before publishing. However, we cannot expect data to be published as soon as it is checked - this could lead to weekly or more frequent updates. I suggest that publishing either monthly or four times a year would be realistic. Either of these frequencies would make data available to users much more rapidly than our current publication system.

Corrections to previously published data are relatively infrequent, but once it is known that corrected data is available it is important to publish the corrections as quickly as possible. Corrected data is more complex to publish under a DOI as the previous (incorrect) data must be maintained alongside the new (correct) data.

Adding new definitive data as soon as it is made available by data checkers increases the complexity of publication. At present we collect data from all observatories for a given year into a single archive, waiting until the data set is as complete as possible before publication. Data is published in archive files with names of the form magYYYY_defYYYY.zip. We cannot continue to use this form of archive file if we plan to publish data more frequently than once a year. At the least, the file name would need to change to identify each publication within a year. Moreover, for each year of data, we will generate (and need to keep) many publications. For example, if the entire definitive data set for the year 2025 takes 4 years to collect and is published 4 times a year, there would be a 16-fold increase in the number of archive files we need to store.
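The arithmetic behind the 16-fold figure can be sketched as follows. This is only an illustration of the estimate above; the function name and its parameters are hypothetical, and it assumes one complete annual archive is re-published at every publication date.

```python
def archives_per_data_year(collection_years: int, publications_per_year: int) -> int:
    """Number of annual archive files kept for one year of data.

    Assumes (as in the example above) that every publication produces a
    new archive for the data year, and that all earlier archives must be
    kept because DOI content cannot be altered or removed.
    """
    return collection_years * publications_per_year


# Quarterly publication over a 4-year collection period,
# compared with the single archive produced by the current system.
quarterly = archives_per_data_year(collection_years=4, publications_per_year=4)
monthly = archives_per_data_year(collection_years=4, publications_per_year=12)
print(f"quarterly: {quarterly} archives, monthly: {monthly} archives")
```

Under the same assumption, monthly publication would give three times as many archive files as quarterly publication, which makes the 1-second data question below more pressing.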

This approach may be possible for minute data, where the size of data files is relatively small, but will it be acceptable for 1-second data?

Alternative proposal

Hold data in archives that contain one year of data from just one observatory. To allow the data repository to be searched, the archive file names must include:

The observatory IAGA code
The year the data was recorded
The year and month of publication

For example, in esk_2020_2023-03.zip:

esk is the observatory IAGA code
2020 is the year of the data
2023-03 is the year and month of publication
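As a rough sketch of how software might read and write these names, the following Python parses and builds file names of the form shown in the example. The regular expression, the `ArchiveName` type and both function names are hypothetical, and the assumption that an IAGA code is exactly three lower-case alphanumeric characters in the file name is mine, not part of the proposal.

```python
import re
from typing import NamedTuple

# Hypothetical pattern for names such as "esk_2020_2023-03.zip".
ARCHIVE_NAME = re.compile(
    r"^(?P<iaga>[a-z0-9]{3})_(?P<data_year>\d{4})"
    r"_(?P<pub_year>\d{4})-(?P<pub_month>0[1-9]|1[0-2])\.zip$"
)


class ArchiveName(NamedTuple):
    iaga: str        # observatory IAGA code
    data_year: int   # year the data was recorded
    pub_year: int    # year of publication
    pub_month: int   # month of publication (1-12)


def parse_archive_name(name: str) -> ArchiveName:
    """Split an observatory-year archive file name into its parts."""
    match = ARCHIVE_NAME.match(name.lower())
    if match is None:
        raise ValueError(f"not an observatory-year archive name: {name!r}")
    return ArchiveName(
        match["iaga"],
        int(match["data_year"]),
        int(match["pub_year"]),
        int(match["pub_month"]),
    )


def format_archive_name(archive: ArchiveName) -> str:
    """Build the file name for an observatory-year archive."""
    return (
        f"{archive.iaga.lower()}_{archive.data_year:04d}"
        f"_{archive.pub_year:04d}-{archive.pub_month:02d}.zip"
    )
```

Because the publication date is zero-padded as YYYY-MM, file names for the same observatory and data year sort in publication order, which keeps searching the repository simple.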

Data can be easily added to this archive as it becomes available. There will never be the need to "re-publish" anything; data will only ever be added. Corrections to data are applied by creating a new archive file with an updated year and month of publication. This will also reduce the volume of data that has to be published when correcting data. In the present system, correction of one observatory's data means that the entire year with all observatories is re-published, whereas this system will only add one "observatory-year" of data - a reduction of around 99%.

If we adopt this approach, a single DOI can be used for all ongoing Intermagnet definitive data publication.

The current use of symbolic links to show users the most recently published data is no longer required (further simplifying publication).

The number of archive files will make it too complex for users to download data directly. Instead an application (e.g. a web form) will be needed that allows users to enter:

A list of observatories
A range of years describing the data
A publication date

In response, the application will provide the user with the list of observatory-year archive files that were available at the publication date that the user chose. This system allows easy and exact repeatability of searches, meaning that authors can cite the Intermagnet DOI along with a publication date in their papers, and colleagues are able to exactly replicate the same data set. Similar search functionality can be built into the imcdview application (and MagPy?) to allow software to directly download a precise data set using the DOI.
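The search described above could be sketched along these lines. This is a minimal illustration, not a design: `select_archives` and its parameters are hypothetical, the repository listing is passed in as a plain list of file names, and it assumes that when several archives exist for the same observatory-year, the user wants only the most recent one published on or before the chosen date.

```python
def select_archives(names, observatories, first_year, last_year, as_of):
    """Return the archive files a user would have seen at a publication date.

    names          -- all file names in the repository, e.g. "esk_2020_2023-03.zip"
    observatories  -- IAGA codes requested by the user
    first_year, last_year -- inclusive range of data years
    as_of          -- publication date chosen by the user, as "YYYY-MM"
    """
    wanted = {code.lower() for code in observatories}
    latest = {}  # (iaga, data year) -> (publication date, file name)
    for name in names:
        stem = name.lower()[: -len(".zip")]
        iaga, data_year, published = stem.split("_")
        year = int(data_year)
        if iaga not in wanted or not first_year <= year <= last_year:
            continue
        # "YYYY-MM" strings compare correctly as plain text.
        if published > as_of:
            continue  # not yet published at the chosen date
        key = (iaga, year)
        if key not in latest or published > latest[key][0]:
            latest[key] = (published, name)
    return sorted(name for _, name in latest.values())


repository = [
    "esk_2020_2022-06.zip",
    "esk_2020_2023-03.zip",  # correction to the 2022-06 publication
    "esk_2021_2023-03.zip",
    "ler_2020_2022-09.zip",
    "had_2020_2022-09.zip",
]
print(select_archives(repository, ["ESK", "LER"], 2020, 2021, "2022-12"))
```

Repeating the same call with the same publication date always returns the same list, regardless of what has been added to the repository since, which is what makes a citation of "DOI plus publication date" reproducible.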
