This project explores methods for automatically annotating dataset descriptions and scientific texts related to Molecular Dynamics (MD).
A dataset of about 280 annotated texts is available in the annotations folder.
These texts were built from the titles and descriptions of molecular dynamics simulation datasets scraped from Zenodo and Figshare.
They have been manually annotated following a set of annotation rules.
Because Large Language Models (LLMs) are inherently non-deterministic, we aim to enforce structured and reproducible outputs using a strict Pydantic schema. Below is a Mermaid diagram that summarizes the schema used to capture detected entities:
classDiagram
    class ListOfEntities {
        entities: list[Molecule | SimulationTime | ForceFieldModel | Temperature | SoftwareName | SoftwareVersion]
    }
    class SoftwareVersion {
        label: str = 'SOFTVERS'
        text: str
    }
    class Temperature {
        label: str = 'TEMP'
        text: str
    }
    class SimulationTime {
        label: str = 'STIME'
        text: str
    }
    class Molecule {
        label: str = 'MOL'
        text: str
    }
    class SoftwareName {
        label: str = 'SOFTNAME'
        text: str
    }
    class ForceFieldModel {
        label: str = 'FFM'
        text: str
    }
    ListOfEntities ..> Molecule
    ListOfEntities ..> SoftwareVersion
    ListOfEntities ..> SimulationTime
    ListOfEntities ..> Temperature
    ListOfEntities ..> SoftwareName
    ListOfEntities ..> ForceFieldModel
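The diagram above maps naturally onto Pydantic models. Below is a minimal sketch of such a schema; the `Literal` label types and the exact class layout are assumptions for illustration, and the project's actual models may differ:

```python
from typing import Literal, Union
from pydantic import BaseModel

class Molecule(BaseModel):
    label: Literal["MOL"] = "MOL"
    text: str

class SimulationTime(BaseModel):
    label: Literal["STIME"] = "STIME"
    text: str

class ForceFieldModel(BaseModel):
    label: Literal["FFM"] = "FFM"
    text: str

class Temperature(BaseModel):
    label: Literal["TEMP"] = "TEMP"
    text: str

class SoftwareName(BaseModel):
    label: Literal["SOFTNAME"] = "SOFTNAME"
    text: str

class SoftwareVersion(BaseModel):
    label: Literal["SOFTVERS"] = "SOFTVERS"
    text: str

class ListOfEntities(BaseModel):
    entities: list[
        Union[Molecule, SimulationTime, ForceFieldModel,
              Temperature, SoftwareName, SoftwareVersion]
    ]

# The Literal labels let Pydantic route each dict to the right class.
parsed = ListOfEntities.model_validate(
    {"entities": [{"label": "MOL", "text": "DPPC"},
                  {"label": "TEMP", "text": "300 K"}]}
)
print([type(e).__name__ for e in parsed.entities])  # → ['Molecule', 'Temperature']
```

Constraining `label` to a `Literal` value means any LLM output with an unknown or misspelled label fails validation instead of slipping through silently.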
To assess robustness and accuracy, we benchmark several LLMs (GPT-5, Gemini 3 Pro, etc.) together with extraction libraries such as Instructor, LlamaIndex, and Pydantic. Our goal is to identify the best model–framework combinations for accurate, consistent, and schema-compliant Molecular Dynamics Named Entity Recognition (MDNER).
We use uv to manage dependencies and the project environment.
Clone the GitHub repository:
git clone git@github.com:MDverse/mdner_llm.git
cd mdner_llm
Sync dependencies:
uv sync
Create a .env file with valid OpenAI and OpenRouter API keys:
OPENAI_API_KEY=<your-openai-api-key>
OPENROUTER_API_KEY=<your-openrouter-api-key>
Remark: This .env file is ignored by git.
Perform quality control on manually annotated entities:
$ uv run validate-annotations --annotations-dir data/annotations
2026-04-10 15:36:46 | INFO | Validating all annotations in directory: data/annotations.
2026-04-10 15:36:46 | INFO | Found 372 JSON files to validate.
2026-04-10 15:36:47 | INFO | Total text mismatches: 0
2026-04-10 15:36:47 | INFO | Total span mismatches: 0
2026-04-10 15:36:47 | INFO | Total overlapping entities: 0
2026-04-10 15:36:47 | INFO | Total removed entities: 0
2026-04-10 15:36:47 | INFO | Total entities with invalid boundaries: 0
2026-04-10 15:36:47 | INFO | Total unknown categories: 0
2026-04-10 15:36:47 | SUCCESS | Validation completed successfully!
Build an inventory of all entities:
$ uv run build-entity-inventory --annotation-path data/groundtruth_paths.txt --out-path data/entities.tsv
2026-04-08 15:29:25 | INFO | Starting entity inventory.
2026-04-08 15:29:25 | INFO | Collecting entities.
2026-04-08 15:29:25 | INFO | Reading list of JSON files from data/groundtruth_paths.txt.
2026-04-08 15:29:25 | SUCCESS | Found 109 JSON files successfully.
2026-04-08 15:29:25 | SUCCESS | Collected 1708 entities
2026-04-08 15:29:25 | INFO | Writing entity inventory TSV file.
2026-04-08 15:29:25 | SUCCESS | Saved entity inventory in: data/entities.tsv
2026-04-08 15:29:25 | SUCCESS | Entity inventory completed successfully!
A list of entities per category can be found in notebooks/review/explore_entities_from_inventory.ipynb.
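Conceptually, the inventory step boils down to counting (label, text) pairs across the ground-truth annotations. A hypothetical sketch (the `entities` key is an assumption about the JSON layout; the actual CLI internals may differ):

```python
from collections import Counter

def build_inventory(docs: list[dict]) -> Counter:
    """Count (label, text) pairs across annotated documents.

    `docs` stands in for the parsed annotation JSON files;
    the 'entities' key is an assumed layout.
    """
    counts: Counter = Counter()
    for doc in docs:
        for ent in doc.get("entities", []):
            counts[(ent["label"], ent["text"])] += 1
    return counts

docs = [
    {"entities": [{"label": "MOL", "text": "DPPC"},
                  {"label": "FFM", "text": "AMBER"}]},
    {"entities": [{"label": "MOL", "text": "DPPC"}]},
]
print(build_inventory(docs).most_common(1))  # → [(('MOL', 'DPPC'), 2)]
```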
To extract structured entities from a single text using a specified LLM (any model available on OpenRouter) and framework, run:
uv run extract-entities \
--text-path data/annotations/figshare_121241.json \
--model openai/gpt-5.2 \
--framework instructor
This command generates two outputs: a .txt file containing the raw LLM response, and a .json file containing the extracted entities along with metadata about the extraction (model, framework, input file, and run details).
# TXT output example:
{
"entities": [
{
"label": "MOL",
"text": "Phosphatidylcholine"
},
{
"label": "MOL",
"text": "1,2-diauroyl-sn-glycero-3-phospocholine"
},
{
"label": "MOL",
"text": "DLPC"
},
{
"label": "MOL",
"text": "DMPC"
},
{
"label": "MOL",
"text": "DPPC"
},
{
"label": "FFM",
"text": "AMBER"
}
]
}
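Because the raw .txt response above is plain JSON, it can be validated back into a Pydantic schema like the one described earlier. A self-contained mini-version for illustration (these model classes are illustrative re-declarations, not the project's own):

```python
from typing import Literal, Union
from pydantic import BaseModel

class Molecule(BaseModel):
    label: Literal["MOL"] = "MOL"
    text: str

class ForceFieldModel(BaseModel):
    label: Literal["FFM"] = "FFM"
    text: str

class ListOfEntities(BaseModel):
    entities: list[Union[Molecule, ForceFieldModel]]

# A trimmed-down version of the raw LLM response shown above.
raw = '{"entities": [{"label": "MOL", "text": "DPPC"}, {"label": "FFM", "text": "AMBER"}]}'
parsed = ListOfEntities.model_validate_json(raw)
print([(e.label, e.text) for e in parsed.entities])  # → [('MOL', 'DPPC'), ('FFM', 'AMBER')]
```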
# JSON output example:
{
"timestamp": "2026-04-10T16:34:26.459776+00:00",
"json_path": "data/annotations/figshare_121241.json",
"text": "<text content of the dataset description>",
"url": "<url of the dataset description page>",
"model_name": "openai/gpt-5.2",
"framework_name": "none",
"prompt_path": "json_few_shot.txt",
"prompt_tag": "json",
"groundtruth": "<ground-truth entities string>",
"raw_llm_response": "<raw LLM response string>",
"llm_response": "<Parsed LLM response string>",
"inference_time_sec": 2.540970104979351,
"inference_cost_usd": 0.003835,
"response_file": "results/llm/annotations/figshare_121241_openai_gpt-5.2_none_2026-04-10_T16-34-26.txt"
}
To extract structured entities from multiple dataset descriptions listed in a paths file such as data/groundtruth_paths.txt, execute:
uv run extract-entities-all-texts \
--texts-path data/groundtruth_paths.txt \
--model openai/gpt-5.2 \
--framework instructor
To evaluate the quality of annotations produced by different LLMs and frameworks, run:
uv run evaluate-llm-and-framework \
--annotations-dir results/llm/annotations \
--results-dir results/llm/evaluation_stats
This command loads all LLM-generated JSON files in results/llm/annotations, computes per-annotation metrics against the ground truth, and saves the results in results/llm/evaluation_stats. It generates an Excel file with overall metrics for each entity class, and a Parquet file with detailed annotation results for each test sample and each label.
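At their core, the per-annotation metrics compare predicted entities against the ground truth. A simplified sketch of micro precision/recall/F1 over (label, text) pairs, using exact-match sets (the project's actual evaluation may use span-level or fuzzier matching):

```python
def micro_prf(gold: set, pred: set) -> tuple[float, float, float]:
    """Micro precision/recall/F1 over sets of (label, text) pairs."""
    tp = len(gold & pred)  # exact matches count as true positives
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = {("MOL", "DPPC"), ("MOL", "DMPC"), ("FFM", "AMBER")}
pred = {("MOL", "DPPC"), ("FFM", "AMBER"), ("TEMP", "300 K")}
print(tuple(round(x, 2) for x in micro_prf(gold, pred)))  # → (0.67, 0.67, 0.67)
```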
To train the Gliner2 model on the Molecular Dynamics annotations, run:
uv run train-gliner --config-path src/mdner_llm/gliner/training_config.yaml
This command trains the Gliner2 model using the configuration specified in src/mdner_llm/gliner/training_config.yaml and saves the trained model with the best validation performance.
To evaluate the performance of a trained Gliner2 model on a test set of annotations, run:
uv run evaluate-gliner \
--model-name gliner2_base_large \
--model-path fastino/gliner2-large-v1 \
--test-dataset data/gliner/test.jsonl \
--test-metadata-path data/gliner/test_metadata.txt
This command evaluates the specified Gliner2 model on the test dataset provided in data/gliner/test.jsonl. It computes evaluation metrics such as precision, recall, and F1-score for each entity class. It saves the annotation results (per test sample and per label) into a Parquet file, and generates a summary Excel file with overall metrics for each entity class.
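The per-class breakdown mentioned above can be sketched by grouping matches by label before scoring. This is illustrative only, again assuming exact (label, text) matching rather than the project's actual matching rules:

```python
def per_label_f1(gold: set, pred: set) -> dict:
    """F1 per label over (label, text) pairs (hypothetical matching rule)."""
    scores = {}
    for lab in sorted({l for l, _ in gold | pred}):
        g = {t for l, t in gold if l == lab}
        p = {t for l, t in pred if l == lab}
        tp = len(g & p)
        prec = tp / len(p) if p else 0.0
        rec = tp / len(g) if g else 0.0
        scores[lab] = (2 * prec * rec / (prec + rec)
                       if prec + rec else 0.0)
    return scores

gold = {("MOL", "DPPC"), ("MOL", "DMPC"), ("FFM", "AMBER")}
pred = {("MOL", "DPPC"), ("FFM", "AMBER")}
scores = per_label_f1(gold, pred)
print(scores["FFM"])  # → 1.0  (AMBER found exactly)
```

Here "MOL" scores below 1.0 because DMPC was missed, while "FFM" is perfect; the summary Excel file aggregates exactly this kind of per-class view.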