@ndaelman-hu If you want more context, see https://github.com/FAIRmat-NFDI/nomad-plugins-metadata -- inspired by and export-compatible with datatractor ... for now I am mostly concerned with the schema itself and the accuracy/usefulness of the extracted metadata
I see. So this is a design for our own in-house plugin metadata, with future datatractor compatibility in mind?
Yes exactly, it was inspired by and built directly off of datatractor, with tooling to generate datatractor-compliant metadata files, but extended for nomad-specific usage.
ndaelman-hu left a comment
Schema within NOMAD Plugins
I'm trying to understand the primary objective of this schema. I noticed that it extracts the following into a single YAML file:
- endpoint metadata (names, etc.)
- project metadata (authors, etc.)
- GitHub telemetry
So, I surmise at this point that the schema aims to provide an intermediate (and centralized) artifact to generate a documentation page?
Schema vs DataTractor
Given that it seems partially inspired by DataTractor, I would like to contrast their goals.
DataTractor aims to provide an automated setup:
- provides project and author metadata, sure. This may incentivize community building.
- captures all the information needed to install a file-format-specific parser.
- also captures the instructions for calling it once installed.
This yields a single interface that can deploy a whole parsing library environment, while avoiding excessive installation.
Final Objective
This objective is less relevant in our NOMAD universe, where plugins are already installed and integrated via entry points. If we do want to register NOMAD parsers in DataTractor in the future, though, we will have to provide an installation and calling template too.
Some additional questions to help focus the objective:
- What are the intended consumers of this metadata (documentation generator, plugin registry, CI/CD)?
- How will this be kept in sync with code changes (automated CI/CD or manual regeneration)?
Comparison of metadata fields between this schema and DataTractor:
| NOMAD Field | Datatractor Field | Conversion | Quality |
|---|---|---|---|
| id | id | Direct | ✅ Perfect |
| name | name | Direct | ✅ Perfect |
| description | description | Direct | ✅ Perfect |
| subject | subject | Direct | ✅ Perfect |
| upstream_repository | source_repository | Direct | ✅ Perfect |
| documentation | documentation | Direct | ✅ Perfect |
| license | license.spdx | String → object | ✅ Good |
| supported_filetypes | supported_filetypes[].id | Direct | ✅ Perfect |
| file_format_support | FileType entries | Nested → standalone | |
| schema_dependencies | installation | Declarative → executable | ❌ Lossy |
| parser_details | usage | Regex → templates | ❌ Cannot automate |
| entry_points | N/A | No mapping | ❌ NOMAD-specific |
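To make the "Direct" and "String → object" rows concrete, here is a minimal sketch of what the export could look like. It is not the actual nomad-plugins-metadata tooling: `to_datatractor` and `DIRECT_FIELDS` are hypothetical names, the input is assumed to be a plain dict keyed by the NOMAD field names in the table, and the rows marked lossy or NOMAD-specific are deliberately skipped.

```python
# Rename rules copied from the "Direct" rows of the table (NOMAD -> Datatractor).
DIRECT_FIELDS = {
    "id": "id",
    "name": "name",
    "description": "description",
    "subject": "subject",
    "upstream_repository": "source_repository",
    "documentation": "documentation",
}


def to_datatractor(nomad: dict) -> dict:
    """Convert only the fields the table marks as cleanly convertible."""
    out = {
        target: nomad[source]
        for source, target in DIRECT_FIELDS.items()
        if source in nomad
    }
    # "String -> object": wrap the SPDX identifier in a license object.
    if "license" in nomad:
        out["license"] = {"spdx": nomad["license"]}
    # Assumption: supported_filetypes is a list of FileType IDs on the NOMAD side.
    if "supported_filetypes" in nomad:
        out["supported_filetypes"] = [{"id": ft} for ft in nomad["supported_filetypes"]]
    # schema_dependencies, parser_details and entry_points are intentionally
    # not converted: the table marks them as lossy or without a mapping.
    return out
```

Anything the function drops is exactly what would still need a hand-written installation and usage template on the Datatractor side.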
On coverage of current schema:
taken from entrypoints metadata
| Field | In __init__.py | In Metadata Schema |
|---|---|---|
| Parser name | ✅ name='parsers/vasp' | ✅ parser_name |
| Filename pattern | ✅ mainfile_name_re | ✅ mainfile_name_re |
| Content pattern | ✅ mainfile_contents_re | ✅ mainfile_contents_re |
| MIME pattern | ✅ mainfile_mime_re | ✅ mainfile_mime_re |
| Binary header | ✅ mainfile_binary_header | ✅ mainfile_binary_header |
| Aliases | ✅ aliases=['parsers/vasp'] | ✅ parser_aliases |
| Compression | ✅ supported_compressions | ✅ compression_support |
| Level | ✅ level=0 | ✅ parser_level |
Gap: code_name and code_homepage exist in some __init__.py files but aren't extracted into the metadata schema.
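A rough sketch of how that attribute-to-field mapping, plus the two gap fields, could be expressed. The names `ATTRIBUTE_TO_FIELD`, `GAP_ATTRIBUTES` and `extract_parser_details` are hypothetical, the entry point is stood in for by a plain `SimpleNamespace` rather than NOMAD's real entry point class, and keeping `code_name`/`code_homepage` under the same key names is only an assumption.

```python
from types import SimpleNamespace

# Entry point attribute -> metadata schema field, as listed in the table.
ATTRIBUTE_TO_FIELD = {
    "name": "parser_name",
    "mainfile_name_re": "mainfile_name_re",
    "mainfile_contents_re": "mainfile_contents_re",
    "mainfile_mime_re": "mainfile_mime_re",
    "mainfile_binary_header": "mainfile_binary_header",
    "aliases": "parser_aliases",
    "supported_compressions": "compression_support",
    "level": "parser_level",
}

# Attributes named in the gap; assumed to keep their names in the schema.
GAP_ATTRIBUTES = ("code_name", "code_homepage")


def extract_parser_details(entry_point, include_gap_fields=False):
    """Collect the attributes that are set (not None) on the entry point."""
    mapping = dict(ATTRIBUTE_TO_FIELD)
    if include_gap_fields:
        mapping.update({attr: attr for attr in GAP_ATTRIBUTES})
    details = {}
    for attr, field in mapping.items():
        value = getattr(entry_point, attr, None)
        if value is not None:  # level=0 is a real value and must be kept
            details[field] = value
    return details


# Stand-in object; the regex and code_name values are made up for illustration.
vasp = SimpleNamespace(
    name="parsers/vasp",
    aliases=["parsers/vasp"],
    level=0,
    mainfile_name_re=r".*OUTCAR.*",
    code_name="VASP",
)
current = extract_parser_details(vasp)
with_gap = extract_parser_details(vasp, include_gap_fields=True)
```

Comparing `current` and `with_gap` shows what closing the gap would add to the extracted metadata.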
taken from Git project metadata
| Category | Examples |
|---|---|
| Package info | plugin_version, license, upstream_repository |
| People | authors[], maintainers[] with emails/affiliations |
| GitHub telemetry | stars, created, last_updated, archived |
| Deployment | on_central, on_pypi, pypi_package |
| Discovery | suggested_usages, subject tags, maturity |
| Datatractor | supported_filetypes (FileType IDs) |
| Provenance | Where/when/how metadata was generated |
Summary
This PR adds the first metadata extraction for nomad-simulation-parsers using the nomad-plugin-metadata pipeline.
What to review
Please review nomad_plugin_metadata.yaml. This is the canonical, merged file intended for querying/registry usage.
How files work
- .metadata/nomad_plugin_metadata.auto.yaml: Machine-generated metadata from package/plugin introspection.
- .metadata/nomad_plugin_metadata.manual.yaml: Maintainer-owned manual curation/overrides (not machine-overwritten).
- nomad_plugin_metadata.yaml: Final merged output (auto + manual; manual non-empty values take precedence).
- .metadata/plugin-metadata.override-report.yaml: Report of conflicts where manual overrides auto.
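For reviewers, the merge rule can be sketched roughly as follows. This is an illustration of "manual non-empty values take precedence", not the pipeline's actual code: `merge_metadata` and `is_empty` are hypothetical names, both inputs are assumed to be already-loaded dicts, and treating `None`, empty strings and empty containers as "empty" is an assumption.

```python
from typing import Any


def is_empty(value: Any) -> bool:
    """Assumed notion of 'empty': None, '', [] or {}."""
    return value is None or value == "" or value == [] or value == {}


def merge_metadata(auto: dict, manual: dict, path: str = "") -> tuple[dict, list[dict]]:
    """Merge auto and manual metadata; non-empty manual values win.

    Returns the merged dict and a list of conflicts, i.e. fields where a
    manual value replaced a different, non-empty auto value.
    """
    merged = dict(auto)
    conflicts: list[dict] = []
    for key, manual_value in manual.items():
        key_path = f"{path}.{key}" if path else key
        if is_empty(manual_value):
            continue  # empty manual entries never override auto values
        auto_value = auto.get(key)
        if isinstance(auto_value, dict) and isinstance(manual_value, dict):
            # Recurse so that nested sections are merged field by field.
            merged[key], nested = merge_metadata(auto_value, manual_value, key_path)
            conflicts.extend(nested)
            continue
        if not is_empty(auto_value) and auto_value != manual_value:
            conflicts.append({"field": key_path, "auto": auto_value, "manual": manual_value})
        merged[key] = manual_value
    return merged, conflicts
```

The conflict list plays the role of the override report: it records only the places where a maintainer's value replaced something the extractor had actually found.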
Goal of this PR
Initial extraction pass for developer feedback before broader rollout:
References
- nomad-plugins-metadata: https://github.com/FAIRmat-NFDI/nomad-plugins-metadata, https://fairmat-nfdi.github.io/nomad-plugins-metadata/reference/schema_reference.html#full-schema-shaped-yaml-template
- datatractor: https://github.com/datatractor