AceLi12138/HaploNet-PathoLin

PathoLin

PathoLin is an automated, dynamic lineage assignment and nomenclature framework for viral pathogens, based on haplotype network topology. It is designed as a practical end-to-end workflow that turns continuously accumulating mutation data into stable lineage labels.

Workflow Overview

Figure. End-to-end PathoLin workflow from preprocessing to dynamic lineage naming.

Usage Instructions

Environment Configuration

cd /path/to/HaploNet-PathoLin
conda create --name patholin python=3.8
conda activate patholin
pip install -r requirements.txt

COVID-19 Data

This section covers how updated SARS-CoV-2 sequence data is processed through lineage division and dynamic naming.

Data Preprocessing

  • Data storage directory: COVID-19/meta_and_mut
  • Metadata filename: metadata.tsv
  • Mutation information filename: Mutation.txt
bash COVID-19/meta_mut_preprocess.sh
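
Before running the preprocessing script, it can help to confirm that both input files are where the list above says they should be. The following is a minimal sketch and not part of the repository: `missing_inputs` is a hypothetical helper, and the relative path assumes the check is run from the repository root.

```python
from pathlib import Path

# Directory and filenames as listed above; the path is assumed to be
# relative to the repository root.
DATA_DIR = Path("COVID-19/meta_and_mut")
REQUIRED_FILES = ["metadata.tsv", "Mutation.txt"]


def missing_inputs(data_dir, required):
    """Return the required filenames that are absent from data_dir."""
    data_dir = Path(data_dir)
    return [name for name in required if not (data_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_inputs(DATA_DIR, REQUIRED_FILES)
    if missing:
        print(f"Missing in {DATA_DIR}: {', '.join(missing)}")
    else:
        print("All preprocessing inputs found.")
```

The same helper can be pointed at the Mpox inputs by passing a different directory and file list.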

Lineage Division

  • Modify the loop timeframe as needed.
bash COVID-19/script_lihy_mcanbash_2021_to_2022.sh
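
The loop timeframe is the range of time windows the script iterates over. As an illustration only, the sketch below enumerates monthly (year, month) steps between two endpoints. The monthly granularity and the helper name `month_range` are assumptions, not taken from the script itself, so check the loop bounds in the shell script before editing them.

```python
def month_range(start, end):
    """Yield (year, month) tuples from start to end, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield year, month
        month += 1
        if month > 12:
            # Roll over into January of the next year.
            year, month = year + 1, 1


# Example endpoints chosen for illustration: November 2021 to February 2022.
for year, month in month_range((2021, 11), (2022, 2)):
    print(f"{year}-{month:02d}")
```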

Lineage Dynamic Naming

  • Modify the loop timeframe as needed.
bash COVID-19/script_Division_2021_to_2024.sh

Result Comparison

  • Entropy comparison: already included in COVID-19/script_Division_2021_to_2024.sh
  • Cluster comprehensive distance D comparison:
bash COVID-19/ALL_distance.sh
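
This README does not define the entropy measure used in the comparison. As a generic point of reference only, the sketch below computes the Shannon entropy of a list of lineage labels. Treating the comparison as Shannon entropy over label frequencies is an assumption, `shannon_entropy` is a hypothetical helper, and the labels are toy values rather than PathoLin output.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the label frequency distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Toy labels for illustration; two equally frequent labels give 1 bit.
toy_labels = ["A", "A", "B", "B"]
print(shannon_entropy(toy_labels))
```

For the actual computation, refer to COVID-19/script_Division_2021_to_2024.sh.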

Mpox Data

The Mpox workflow follows the same steps as the COVID-19 workflow, so the same mental model applies across both pathogens.

Data Preprocessing

  • Metadata filename: Mpox/original_meta/meta.txt
  • Mutation information filename: Mpox/original_mut/mutations.txt
  • Lineage information filename: Mpox/ncbi_gisaid_mpxv/compare_re_clade.csv
bash Mpox/meta_mut_preprocess.sh

Data Preprocessing and Lineage Division

  • Modify the loop timeframe as needed.
bash Mpox/script_monkey_bash_2022_to_2024.sh

Lineage Dynamic Naming

  • Modify the loop timeframe as needed.
bash Mpox/script_Division_2022_to_2024.sh

Result Comparison

  • Entropy comparison: already included in Mpox/script_Division_2022_to_2024.sh
  • Cluster comprehensive distance D comparison:
bash Mpox/ALL_distance.sh

Other Usage

Rapid Lineage Identification (PathoLin-GRU)

Use the trained model for SARS-CoV-2 lineage identification:

cd HaploGRU
python predicter.py

The predictor supports lineage identification for a single sequence, for multiple sequences, or from a mutation file. It is useful when you need quick lineage calls without running the full pipeline.

For other species, train a model on the datasets generated by dynamic lineage division:

pip install wandb
cd HaploGRU
python snp_lineage_dl.py
wandb sweep sweep.yaml

Notes:

  • In HaploGRU/config.py, replace sequence_lineage and mutations with your own dataset paths.
  • Parameter ranges in HaploGRU/sweep.yaml can be customized.

Family Query

python find_family.py [Arg1] [Arg2] [Arg3]
  • Arg1: data type (COVID-19 or Mpox)
  • Arg2: reference folder year
  • Arg3: reference folder month

Then enter the lineage to query when prompted.
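
If you call the query from another script, it can help to validate the three arguments first. The following is a minimal sketch under stated assumptions: `build_find_family_cmd` is a hypothetical helper, the month is passed without zero-padding (the README does not specify the expected format), and the example year and month are placeholders.

```python
# The two data types named in the argument list above.
VALID_DATA_TYPES = ("COVID-19", "Mpox")


def build_find_family_cmd(data_type, year, month):
    """Build the argument list for find_family.py after basic validation."""
    if data_type not in VALID_DATA_TYPES:
        raise ValueError(f"data type must be one of {VALID_DATA_TYPES}")
    if not 1 <= int(month) <= 12:
        raise ValueError("month must be between 1 and 12")
    return ["python", "find_family.py", data_type, str(year), str(month)]


if __name__ == "__main__":
    # Placeholder reference folder: year 2024, month 6.
    cmd = build_find_family_cmd("Mpox", 2024, 6)
    print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run`. The script still prompts for the lineage afterwards, so it needs an interactive terminal or input supplied on standard input.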

Optional Minimal Demo and Validation

These helpers do not replace the validated workflow above; they are only for quick smoke testing.

python scripts/run_minimal_mpox_demo.py
python scripts/verify_minimal_outputs.py --mode minimal-mpox

Citation

Manuscript metadata will be updated after submission/acceptance.

Contact

  • lihongyu2025@ia.ac.cn

License

MIT License. See LICENSE.
