initial plugin metadata extraction by JFRudzinski · Pull Request #159 · FAIRmat-NFDI/nomad-parser-plugins-simulation

JFRudzinski · 2026-03-19T14:19:54Z

Summary

This PR adds the first metadata extraction for nomad-simulation-parsers using the nomad-plugin-metadata pipeline.

What to review

Please review nomad_plugin_metadata.yaml.
This is the canonical, merged file intended for querying/registry usage.

How files work

.metadata/nomad_plugin_metadata.auto.yaml: Machine-generated metadata from package/plugin introspection.
.metadata/nomad_plugin_metadata.manual.yaml: Maintainer-owned manual curation/overrides (not machine-overwritten).
nomad_plugin_metadata.yaml: Final merged output (auto + manual; manual non-empty values take precedence).
.metadata/plugin-metadata.override-report.yaml: Report of conflicts where manual overrides auto.
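
The merge rule described above (manual non-empty values take precedence, conflicts are reported) can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual implementation: the function names `is_empty` and `merge_metadata`, the definition of "empty", the shape of the conflict records, and the example values are all assumptions.

```python
from typing import Any


def is_empty(value: Any) -> bool:
    """Assumed definition of 'empty': None, '', [] or {} (0 and False count as values)."""
    return value is None or value == "" or value == [] or value == {}


def merge_metadata(auto: dict, manual: dict, path: str = "") -> tuple[dict, list[dict]]:
    """Merge two metadata mappings so that non-empty manual values win.

    Returns the merged mapping plus a list of conflicts, i.e. fields where a
    manual value replaced a different, non-empty auto value.
    """
    merged = dict(auto)
    conflicts: list[dict] = []
    for key, manual_value in manual.items():
        key_path = f"{path}.{key}" if path else key
        auto_value = auto.get(key)
        if is_empty(manual_value):
            continue  # nothing curated here: keep the auto value
        if isinstance(manual_value, dict) and isinstance(auto_value, dict):
            # Recurse so that nested sections are merged field by field.
            merged[key], nested = merge_metadata(auto_value, manual_value, key_path)
            conflicts.extend(nested)
            continue
        if not is_empty(auto_value) and auto_value != manual_value:
            conflicts.append({"field": key_path, "auto": auto_value, "manual": manual_value})
        merged[key] = manual_value
    return merged, conflicts


# Hypothetical example values, not taken from the real metadata files.
auto = {"name": "nomad-simulation-parsers", "license": "Apache-2.0", "maturity": ""}
manual = {"name": "NOMAD simulation parsers", "license": "", "maturity": "beta"}
merged, report = merge_metadata(auto, manual)
print(merged)
print(report)
```

In this sketch the empty manual `license` leaves the auto value in place, `maturity` is filled in without a conflict because the auto value was empty, and only `name` ends up in the override report.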

Goal of this PR

Initial extraction pass for developer feedback before broader rollout:

verify extracted parser/file-format metadata
identify missing/incorrect fields
align expected manual curation scope

References

nomad-plugins-metadata: https://github.com/FAIRmat-NFDI/nomad-plugins-metadata, https://fairmat-nfdi.github.io/nomad-plugins-metadata/reference/schema_reference.html#full-schema-shaped-yaml-template
datatractor: https://github.com/datatractor

JFRudzinski · 2026-03-19T14:30:21Z

@ndaelman-hu If you want more context, see https://github.com/FAIRmat-NFDI/nomad-plugins-metadata -- inspired from and with export compatibility to datatractor ... for now I am mostly concerned with the schema itself and the accuracy/usefulness of the metadata that is extracted

ndaelman-hu · 2026-03-20T14:35:19Z

I see. So this is a design for our own in-house plugin metadata, with compatibility for datatractor towards the future in mind?
Even if the latter isn't the current focus, I would start there as it gives a template for such schemas. I'll then evaluate all your additions on top.

JFRudzinski · 2026-03-22T19:58:42Z

Yes, exactly. It was inspired by and built directly on datatractor, with tooling to generate datatractor-compliant metadata files, but with extensions for NOMAD-specific usage.

ndaelman-hu

Schema within NOMAD Plugins

I'm trying to understand the primary objective of this schema. I noticed that it extracts the following into a single YAML file:

endpoint metadata (names, etc.)
project metadata (authors, etc.)
GitHub telemetry

So, I surmise at this point that the schema aims to provide an intermediate (and centralized) artifact to generate a documentation page?

Schema vs DataTractor

Given that it seems partially inspired by DataTractor, I would like to contrast their goals.
DataTractor aims to provide an automated setup:

It provides project and author metadata, which may incentivize community building.
It captures all the information needed to install a file-format-specific parser.
It also provides instructions on how to call the parser once installed.

This yields a single interface that can deploy a whole parsing library environment, while avoiding excessive installation.

Final Objective

This objective is less relevant in our NOMAD universe, where plugins are already installed and integrated via entry points. If we do want to register NOMAD parsers in DataTractor in the future, though, we will also have to provide an installation and calling template.

Some additional questions to help focus the objective:

What are the intended consumers of this metadata (documentation generator, plugin registry, CI/CD)?
How will this be kept in sync with code changes (automated CI/CD or manual regeneration)?

ndaelman-hu · 2026-03-23T09:34:34Z

nomad_plugin_metadata.yaml

Comparison of metadata fields between this schema and DataTractor:

| NOMAD Field | Datatractor Field | Conversion | Quality |
| --- | --- | --- | --- |
| id | id | Direct | ✅ Perfect |
| name | name | Direct | ✅ Perfect |
| description | description | Direct | ✅ Perfect |
| subject | subject | Direct | ✅ Perfect |
| upstream_repository | source_repository | Direct | ✅ Perfect |
| documentation | documentation | Direct | ✅ Perfect |
| license | license.spdx | String → object | ✅ Good |
| supported_filetypes | supported_filetypes[].id | Direct | ✅ Perfect |
| file_format_support | FileType entries | Nested → standalone | ⚠️ Manual |
| schema_dependencies | installation | Declarative → executable | ❌ Lossy |
| parser_details | usage | Regex → templates | ❌ Cannot automate |
| entry_points | N/A | No mapping | ❌ NOMAD-specific |
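
The automatable rows of the comparison above could be expressed as a small conversion routine. The following is only a sketch under assumptions: the function name `to_datatractor`, the assumption that `supported_filetypes` is a flat list of ID strings on the NOMAD side, and all example values are hypothetical, and the target field names are taken from the chart rather than checked against the DataTractor schema.

```python
# Fields the chart marks as direct, same-name mappings.
DIRECT_FIELDS = ("id", "name", "description", "subject", "documentation")
# Fields the chart marks as manual, lossy, non-automatable, or NOMAD-specific.
UNMAPPED_FIELDS = (
    "file_format_support",
    "schema_dependencies",
    "parser_details",
    "entry_points",
)


def to_datatractor(nomad: dict) -> tuple[dict, list[str]]:
    """Convert the automatable fields and list the ones left for manual work."""
    record = {key: nomad[key] for key in DIRECT_FIELDS if key in nomad}
    if "upstream_repository" in nomad:
        record["source_repository"] = nomad["upstream_repository"]
    if "license" in nomad:
        # String -> object, as in the "license.spdx" row.
        record["license"] = {"spdx": nomad["license"]}
    if "supported_filetypes" in nomad:
        # Assumes a flat list of FileType IDs on the NOMAD side.
        record["supported_filetypes"] = [{"id": ft} for ft in nomad["supported_filetypes"]]
    skipped = [key for key in UNMAPPED_FIELDS if key in nomad]
    return record, skipped


# Hypothetical input, for illustration only.
nomad_metadata = {
    "id": "example-plugin",
    "name": "Example plugin",
    "license": "MIT",
    "upstream_repository": "https://example.org/repo",
    "supported_filetypes": ["example-filetype"],
    "entry_points": ["example_entry_point"],
}
record, skipped = to_datatractor(nomad_metadata)
print(record)
print(skipped)
```

Returning the skipped fields separately keeps the lossy part of the conversion visible instead of silently dropping it.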

On coverage of the current schema:

Taken from entry point metadata

| Field | In __init__.py | In Metadata Schema |
| --- | --- | --- |
| Parser name | ✅ name='parsers/vasp' | ✅ parser_name |
| Filename pattern | ✅ mainfile_name_re | ✅ mainfile_name_re |
| Content pattern | ✅ mainfile_contents_re | ✅ mainfile_contents_re |
| MIME pattern | ✅ mainfile_mime_re | ✅ mainfile_mime_re |
| Binary header | ✅ mainfile_binary_header | ✅ mainfile_binary_header |
| Aliases | ✅ aliases=['parsers/vasp'] | ✅ parser_aliases |
| Compression | ✅ supported_compressions | ✅ compression_support |
| Level | ✅ level=0 | ✅ parser_level |

Gap: code_name and code_homepage exist in some __init__.py files but aren't extracted into the metadata schema.
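
To make the coverage mapping and the gap concrete, here is a rough sketch of how entry point attributes could be renamed into the schema fields listed above, with code_name and code_homepage carried over when present. `ParserEntryPointStub` is a hypothetical stand-in, not the real NOMAD entry point class, and its defaults, the function name `extract_parser_details`, and the example regex are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParserEntryPointStub:
    """Hypothetical stand-in for a parser entry point declared in __init__.py."""

    name: str
    mainfile_name_re: Optional[str] = None
    mainfile_contents_re: Optional[str] = None
    mainfile_mime_re: Optional[str] = None
    mainfile_binary_header: Optional[bytes] = None
    aliases: list[str] = field(default_factory=list)
    supported_compressions: list[str] = field(default_factory=list)
    level: int = 0
    code_name: Optional[str] = None
    code_homepage: Optional[str] = None


# Entry point attribute -> metadata schema field, following the coverage list.
FIELD_MAP = {
    "name": "parser_name",
    "mainfile_name_re": "mainfile_name_re",
    "mainfile_contents_re": "mainfile_contents_re",
    "mainfile_mime_re": "mainfile_mime_re",
    "mainfile_binary_header": "mainfile_binary_header",
    "aliases": "parser_aliases",
    "supported_compressions": "compression_support",
    "level": "parser_level",
}
# Attributes named in the gap note; copied only when the entry point sets them.
GAP_FIELDS = ("code_name", "code_homepage")


def extract_parser_details(entry_point: object) -> dict:
    """Collect the mapped fields, plus the gap fields where available."""
    details = {
        target: getattr(entry_point, source, None)
        for source, target in FIELD_MAP.items()
    }
    for name in GAP_FIELDS:
        value = getattr(entry_point, name, None)
        if value is not None:
            details[name] = value
    return details


# Hypothetical entry point; the filename regex is illustrative only.
stub = ParserEntryPointStub(
    name="parsers/vasp",
    aliases=["parsers/vasp"],
    mainfile_name_re=r".*OUTCAR.*",
    code_name="VASP",
)
details = extract_parser_details(stub)
print(details)
```

Using `getattr` with a default keeps the sketch tolerant of entry points that do not define the optional attributes.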

Taken from Git project metadata

| Category | Examples |
| --- | --- |
| Package info | plugin_version, license, upstream_repository |
| People | authors[], maintainers[] with emails/affiliations |
| GitHub telemetry | stars, created, last_updated, archived |
| Deployment | on_central, on_pypi, pypi_package |
| Discovery | suggested_usages, subject tags, maturity |
| Datatractor | supported_filetypes (FileType IDs) |
| Provenance | Where/when/how metadata was generated |

initial extraction

dc9b536

JFRudzinski requested a review from ndaelman-hu March 19, 2026 14:27

JFRudzinski marked this pull request as draft March 19, 2026 14:36

forgot manual sync?

28a52a7

ndaelman-hu reviewed Mar 23, 2026

JFRudzinski wants to merge 2 commits into develop from metadata-extractor


