PathoLin is an automated and dynamic lineage assignment and nomenclature framework for viral pathogens based on haplotype network topology. From continuously evolving mutations to stable lineage labels, PathoLin is designed as a practical workflow you can run end to end.
Figure. End-to-end PathoLin workflow from preprocessing to dynamic lineage naming.
cd /path/to/HaploNet-PathoLin
conda create --name patholin python=3.8
conda activate patholin
pip install -r requirements.txt
This part tracks how SARS-CoV-2 sequence updates flow into lineage division and dynamic naming.
- Data storage directory: COVID-19/meta_and_mut
- Metadata filename: metadata.tsv
- Mutation information filename: Mutation.txt
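Before preprocessing, it can help to confirm that both input files are in place. The following is a minimal sketch, assuming metadata.tsv and Mutation.txt both sit directly under COVID-19/meta_and_mut. check_inputs is a hypothetical helper and is not part of the repository.

```shell
# check_inputs DIR FILE... : hypothetical helper, not part of PathoLin.
# Prints each missing or empty file and returns non-zero if any were found.
check_inputs() {
  dir="$1"
  shift
  missing=0
  for f in "$@"; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Assumed layout: both files live directly under the data storage directory.
check_inputs COVID-19/meta_and_mut metadata.tsv Mutation.txt \
  || echo "fix the inputs above before preprocessing" >&2
```

The same function can be pointed at another directory and file list, since nothing in it is specific to SARS-CoV-2.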
bash COVID-19/meta_mut_preprocess.sh
- Modify the loop timeframe as needed.
bash COVID-19/script_lihy_mcanbash_2021_to_2022.sh
- Modify the loop timeframe as needed.
bash COVID-19/script_Division_2021_to_2024.sh
- Entropy comparison: already included in COVID-19/script_Division_2021_to_2024.sh
- Cluster comprehensive distance D comparison:
bash COVID-19/ALL_distance.sh
This part mirrors the same logic for Mpox, so you can keep one mental model across pathogens.
- Metadata filename: Mpox/original_meta/meta.txt
- Mutation information filename: Mpox/original_mut/mutations.txt
- Lineage information filename: Mpox/ncbi_gisaid_mpxv/compare_re_clade.csv
bash Mpox/meta_mut_preprocess.sh
- Modify the loop timeframe as needed.
bash Mpox/script_monkey_bash_2022_to_2024.sh
- Modify the loop timeframe as needed.
bash Mpox/script_Division_2022_to_2024.sh
- Entropy comparison: already included in Mpox/script_Division_2022_to_2024.sh
- Cluster comprehensive distance D comparison:
bash Mpox/ALL_distance.sh
Use the trained model for SARS-CoV-2 lineage identification:
cd HaploGRU
python predicter.py
The predictor supports lineage identification from a single sequence, from multiple sequences, or from a mutation file. It is useful when you need quick lineage calls without running the full pipeline.
For other species, train with datasets generated after running dynamic lineage division:
pip install wandb
cd HaploGRU
python snp_lineage_dl.py
wandb sweep sweep.yaml
Notes:
- In HaploGRU/config.py, replace sequence_lineage and mutations with your own dataset paths.
- Parameter ranges in HaploGRU/sweep.yaml can be customized.
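Note that wandb sweep only registers the sweep and prints its ID; training runs start once an agent is attached to that ID. A minimal sketch of that follow-up step, assuming the wandb CLI is installed and logged in. run_sweep_agent is a hypothetical wrapper, and the sweep ID in the comment is a placeholder to replace with the value printed by wandb sweep.

```shell
# run_sweep_agent SWEEP_ID [COUNT] : hypothetical wrapper, not part of PathoLin.
# Starts a W&B agent for an existing sweep and stops after COUNT runs (default 1).
run_sweep_agent() {
  sweep_id="$1"
  count="${2:-1}"
  if [ -z "$sweep_id" ]; then
    echo "usage: run_sweep_agent SWEEP_ID [COUNT]" >&2
    return 2
  fi
  wandb agent --count "$count" "$sweep_id"
}

# Example (placeholder ID, copy the real one from the `wandb sweep` output):
# run_sweep_agent "my-entity/my-project/abc123xy" 5
```

Capping the run count keeps a first sweep short, which is convenient while the parameter ranges are still being adjusted.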
python find_family.py [Arg1] [Arg2] [Arg3]
- Arg1: data type (COVID-19 or Mpox)
- Arg2: reference folder year
- Arg3: reference folder month
Then enter the lineage to query when prompted.
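As a concrete illustration of the argument order, the sketch below wraps the call and rejects an unknown data type before Python starts. query_family is a hypothetical helper, and the year and month in the example are placeholder values. The exact format expected for them (for instance, whether months are zero-padded) is an assumption to check against the reference folder names in your checkout.

```shell
# query_family DATA_TYPE YEAR MONTH : hypothetical helper, not part of PathoLin.
# Validates the data type, then forwards the three arguments to find_family.py.
query_family() {
  if [ "$#" -ne 3 ]; then
    echo "usage: query_family DATA_TYPE YEAR MONTH" >&2
    return 2
  fi
  case "$1" in
    COVID-19|Mpox) ;;
    *) echo "data type must be COVID-19 or Mpox, got: $1" >&2; return 2 ;;
  esac
  python find_family.py "$1" "$2" "$3"
}

# Example with placeholder values:
# query_family Mpox 2024 6
```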
These helpers do not replace the validated workflow above; they are only for quick smoke testing.
python scripts/run_minimal_mpox_demo.py
python scripts/verify_minimal_outputs.py --mode minimal-mpox
- Workflows: docs/en/workflows.md
- Data availability: docs/en/data-availability.md
Manuscript metadata will be updated after submission/acceptance.
lihongyu2025@ia.ac.cn
MIT License. See LICENSE.
