PepBridge for peptide-bridged, unified and structure-aware modeling of pMHC-TCR recognition
PepBridge is a peptide-bridged, pair-aware deep learning framework for unified modeling of the pMHC-TCR recognition cascade. PepBridge jointly models multiple related tasks and incorporates structure-aware supervision through residue-level contact and distance signals.
PepBridge supports:
- peptide-MHC binding prediction
- peptide-TCR binding prediction
- MHC-peptide-TCR binding prediction
- epitope immunogenicity prediction
- residue-level contact and distance map prediction for selected interfaces
This repository provides the model code, inference pipeline, and training utilities for applying PepBridge to pMHC-TCR related prediction tasks.
You can clone the repository with:
git clone https://github.com/aapupu/PepBridge.git
cd PepBridge
The file esm_emb_HLAI.pkl is larger than 100 MB and may not be included in a normal GitHub clone.
You have two options:
If the repository tracks this file with Git LFS, install Git LFS first and then pull LFS files:
git lfs install
git clone https://github.com/aapupu/PepBridge.git
cd PepBridge
git lfs pull
If you do not use Git LFS, download esm_emb_HLAI.pkl separately from Zenodo and place it into the doc/ folder manually:
PepBridge/
└── doc/
    └── esm_emb_HLAI.pkl
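
A quick way to confirm that the embedding file arrived intact is to check whether it exists and whether it is still a Git LFS pointer stub. The sketch below is illustrative only: the function name `check_embedding_file` is hypothetical and not part of PepBridge. It relies on the fact that an unpulled LFS file is a small text file beginning with the LFS spec line.

```python
from pathlib import Path

# First line of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def check_embedding_file(path):
    """Return 'missing', 'lfs_pointer', or 'ok' for the given file path.

    Hypothetical helper, not part of the PepBridge code base.
    """
    p = Path(path)
    if not p.is_file():
        return "missing"
    with p.open("rb") as fh:
        head = fh.read(len(LFS_POINTER_PREFIX))
    if head == LFS_POINTER_PREFIX:
        # The clone contains only the pointer; run `git lfs pull`
        # or download the file manually.
        return "lfs_pointer"
    return "ok"


if __name__ == "__main__":
    print(check_embedding_file("doc/esm_emb_HLAI.pkl"))
```

Run it from the repository root. A result of `lfs_pointer` suggests that the real file still needs to be pulled or downloaded.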
The package list below is a suggested template and can be adjusted later according to your local environment.
We recommend using Python 3.10 or Python 3.11 with CUDA-enabled PyTorch.
conda create -n pepbridge python=3.10 -y
conda activate pepbridge
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2
pip install numpy==1.26.4 pandas==2.2.2 scikit-learn==1.4.2 scipy==1.13.1
pip install matplotlib==3.8 einops==0.8
If your local setup requires additional packages, such as ESM or other model-specific dependencies, please install them separately.
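
After installation, it can help to compare the installed versions with the suggested pins. The following is a minimal sketch using only the standard library plus an optional `torch` import. The helper name `report_versions` is hypothetical, and the pins are copied from the commands above rather than from any official requirements file.

```python
from importlib import metadata

# Suggested pins copied from the install commands in this README.
SUGGESTED = {
    "torch": "2.2.2",
    "numpy": "1.26.4",
    "pandas": "2.2.2",
    "scikit-learn": "1.4.2",
    "scipy": "1.13.1",
    "matplotlib": "3.8",
    "einops": "0.8",
}


def report_versions(expected=SUGGESTED):
    """Map each package name to (suggested, installed or None)."""
    rows = {}
    for name, want in expected.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None
        rows[name] = (want, have)
    return rows


if __name__ == "__main__":
    for name, (want, have) in report_versions().items():
        print(f"{name}: suggested {want}, installed {have or 'not found'}")
    try:
        import torch
        print("CUDA available:", torch.cuda.is_available())
    except ImportError:
        print("torch is not installed")
```

A mismatch is not necessarily a problem, since the package list is described as an adjustable template.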
PepBridge supports the following inference tasks:
- mp
- pt
- mpt
- imm
- mp_contact
- pt_contact
It supports running one or multiple tasks in a single command.
- binding-related outputs are merged into one CSV
- contact predictions are saved as per-sample matrices
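
When inference is launched from another script, the `key=value` command line used throughout this README can be assembled programmatically. The sketch below is one possible way to do that. `build_infer_command` is a hypothetical helper, and it only formats arguments that appear in this README (`task`, `input_csv`, `out_dir`, and options such as `batch_size` or `save_dist`).

```python
import shlex

# Task names as listed in this README.
KNOWN_TASKS = ("mp", "pt", "mpt", "imm", "mp_contact", "pt_contact")


def build_infer_command(tasks, input_csv, out_dir=None, **options):
    """Return an argument list for `python infer.py` (hypothetical helper)."""
    unknown = [t for t in tasks if t not in KNOWN_TASKS]
    if unknown:
        raise ValueError(f"unknown task(s): {unknown}")
    cmd = ["python", "infer.py", "task=" + ",".join(tasks),
           f"input_csv={input_csv}"]
    if out_dir is not None:
        cmd.append(f"out_dir={out_dir}")
    for key, value in options.items():
        # Booleans are written in lowercase, as in `save_dist=true`.
        if isinstance(value, bool):
            value = "true" if value else "false"
        cmd.append(f"{key}={value}")
    return cmd


if __name__ == "__main__":
    cmd = build_infer_command(["mp", "pt"], "example.csv", "./results",
                              batch_size=32)
    print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues.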
The inference script accepts flexible input column names and normalizes them automatically.
- MHC, HLA → MHC
- peptide, epitope → peptide
- cdr3, cdr3b → cdr3
- v_gene, trbv, bv, tcrbv → v_gene
For HLA-I related tasks, MHC will be automatically mapped to pseudo-MHC sequences using doc/pseudo_HLAI.csv.
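
The column-name normalization described above can be pictured as a simple alias lookup. The following sketch is not the code used by `infer.py`. The names `ALIASES`, `normalize_header`, and `read_normalized` are hypothetical, and matching names case-insensitively is an assumption.

```python
import csv
import io

# Alias table taken from the mapping in this README.
# Keys are lowercased because case-insensitive matching is assumed here.
ALIASES = {
    "mhc": "MHC", "hla": "MHC",
    "peptide": "peptide", "epitope": "peptide",
    "cdr3": "cdr3", "cdr3b": "cdr3",
    "v_gene": "v_gene", "trbv": "v_gene", "bv": "v_gene", "tcrbv": "v_gene",
}


def normalize_header(fieldnames):
    """Rename known aliases to canonical names; leave other columns as-is."""
    return [ALIASES.get(name.strip().lower(), name) for name in fieldnames]


def read_normalized(handle):
    """Read CSV rows from a file-like object as dicts with canonical keys."""
    reader = csv.reader(handle)
    header = normalize_header(next(reader))
    return [dict(zip(header, row)) for row in reader]


if __name__ == "__main__":
    sample = io.StringIO("HLA,epitope\nHLA-A02:01,LLFGYPVYV\n")
    print(read_normalized(sample))
```

Columns that are not in the alias table pass through unchanged, so extra metadata columns are kept.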
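Required input columns by task:
- mp: MHC, peptide
- imm: MHC, peptide
- pt: peptide, cdr3
- mpt: MHC, peptide, cdr3, v_gene
- mp_contact: MHC, peptide
- pt_contact: peptide, cdr3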
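
Before launching a long run, the input header can be checked against the columns each task needs. This is a minimal sketch under the assumption that the per-task requirements are as listed above. `REQUIRED` and `missing_columns` are hypothetical names, and the check expects column names that are already normalized.

```python
# Per-task column requirements as understood from this README (assumption).
REQUIRED = {
    "mp": {"MHC", "peptide"},
    "imm": {"MHC", "peptide"},
    "pt": {"peptide", "cdr3"},
    "mpt": {"MHC", "peptide", "cdr3", "v_gene"},
    "mp_contact": {"MHC", "peptide"},
    "pt_contact": {"peptide", "cdr3"},
}


def missing_columns(tasks, columns):
    """Return {task: sorted missing column names} for tasks that cannot run."""
    present = set(columns)
    report = {}
    for task in tasks:
        missing = REQUIRED[task] - present
        if missing:
            report[task] = sorted(missing)
    return report


if __name__ == "__main__":
    print(missing_columns(["mp", "pt", "mpt"], ["MHC", "peptide"]))
```

An empty dictionary means every requested task has the columns it needs.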
MHC,peptide
HLA-A24:02,QLPRLFPLL
HLA-A02:01,LLFGYPVYV

peptide,cdr3
QLPRLFPLL,CASSLHHEQYF
LLFGYPVYV,CASRPGLMSAQPEQYF

MHC,peptide,v_gene,cdr3
HLA-A24:02,QLPRLFPLL,TRBV7-9,CASSLHHEQYF
HLA-A02:01,LLFGYPVYV,TRBV5-1,CASRPGLMSAQPEQYF

A single CSV can contain all required columns:
MHC,peptide,v_gene,cdr3
HLA-A24:02,QLPRLFPLL,TRBV7-9,CASSLHHEQYF
HLA-A02:01,LLFGYPVYV,TRBV5-1,CASRPGLMSAQPEQYF

Run a single task:
python infer.py task=mp input_csv=example.csv out_dir=./results
Run multiple tasks:
python infer.py task=mp,pt,mpt,mp_contact,pt_contact,imm input_csv=example.csv out_dir=./results
Batch size for binding-related tasks:
python infer.py task=mp,pt,mpt,imm input_csv=example.csv batch_size=32
Batch size for contact-related tasks:
python infer.py task=mp_contact,pt_contact input_csv=example.csv contact_batch_size=8
Whether to save predicted distance matrices for contact tasks:
python infer.py task=mp_contact input_csv=example.csv save_dist=true
Whether to load LoRA adapters:
python infer.py task=mp input_csv=example.csv use_lora=true
Checkpoint path(s):
python infer.py task=mp input_csv=example.csv path=./doc/checkpoints_multi_lora_align3_lnor
python infer.py task=mp input_csv=example.csv paths=./ckpt_dir1,./ckpt_dir2

Assume:
input_csv=example.csv
out_dir=./results
task=mp,pt,mpt,imm,mp_contact,pt_contact
Then the outputs will look like:
results/
├── example_pred.csv
├── mp_contact/
│   ├── <pseudoMHC>_<peptide>_site.csv
│   ├── <pseudoMHC>_<peptide>_dist.csv
│   └── ...
└── pt_contact/
    ├── <peptide>_<cdr3>_site.csv
    ├── <peptide>_<cdr3>_dist.csv
    └── ...
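
The layout above can be predicted from the command-line arguments, which is handy when a downstream script needs to locate results. The sketch below assumes that the merged file is named after the input file stem with a `_pred.csv` suffix (as in `example_pred.csv`) and that each contact task writes to a subfolder named after the task. `expected_outputs` is a hypothetical helper.

```python
from pathlib import Path

# Task grouping as described in this README.
BINDING_TASKS = {"mp", "pt", "mpt", "imm"}
CONTACT_TASKS = ("mp_contact", "pt_contact")


def expected_outputs(input_csv, out_dir, tasks):
    """Return the output locations a run is expected to create.

    Naming rules are inferred from the example tree (assumption).
    """
    out = Path(out_dir)
    result = {}
    if any(task in BINDING_TASKS for task in tasks):
        # Binding-related predictions are merged into one CSV.
        result["pred_csv"] = out / f"{Path(input_csv).stem}_pred.csv"
    for task in CONTACT_TASKS:
        if task in tasks:
            # Per-sample matrices are stored in a folder per contact task.
            result[task] = out / task
    return result


if __name__ == "__main__":
    tasks = ["mp", "pt", "mpt", "imm", "mp_contact", "pt_contact"]
    for name, path in expected_outputs("example.csv", "./results", tasks).items():
        print(name, path)
```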
All binding-related tasks are merged into one CSV.
For example, running:
python infer.py task=mp,pt,mpt,imm input_csv=example.csv out_dir=./results
will produce:
results/example_pred.csv
with added columns such as:
- pred_mp_binding
- pred_pt_binding
- pred_mpt_binding
- pred_immunogenicity
Only the columns for the requested tasks are added.
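
Because the set of prediction columns depends on the requested tasks, a consumer of the merged CSV may want to discover them rather than hard-code them. The following sketch assumes only that prediction columns start with `pred_` and hold numeric scores. `load_predictions` and `top_rows` are hypothetical helpers.

```python
import csv


def load_predictions(pred_csv_path):
    """Return (prediction column names, rows as dicts) from a merged CSV."""
    with open(pred_csv_path, newline="") as fh:
        reader = csv.DictReader(fh)
        rows = list(reader)
        pred_cols = [c for c in (reader.fieldnames or [])
                     if c.startswith("pred_")]
    return pred_cols, rows


def top_rows(rows, column, n=5):
    """Return the n rows with the highest value in a prediction column."""
    return sorted(rows, key=lambda r: float(r[column]), reverse=True)[:n]


if __name__ == "__main__":
    cols, rows = load_predictions("results/example_pred.csv")
    print("prediction columns:", cols)
    for col in cols:
        print(col, top_rows(rows, col, n=3))
```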
For mp_contact, each sample produces:
- *_site.csv: predicted contact probability matrix
- *_dist.csv: predicted distance matrix, if save_dist=true
Rows correspond to pseudo-MHC residues.
Columns correspond to peptide residues.
For pt_contact, each sample produces:
- *_site.csv
- *_dist.csv
Rows correspond to peptide residues.
Columns correspond to CDR3 residues.
All contact matrices are cropped to the true sequence length.
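
A per-sample contact matrix can be turned into a list of likely contacting residue pairs by thresholding. The sketch below makes two assumptions that are not confirmed by this README: that a `*_site.csv` file contains only numeric values (no header row or index column), and that 0.5 is a sensible probability cut-off. Adjust both to match the actual files. `load_matrix` and `contacts_above` are hypothetical helpers.

```python
import csv


def load_matrix(path):
    """Read a purely numeric CSV into a list of rows of floats.

    Assumes no header row and no index column (unverified assumption).
    """
    with open(path, newline="") as fh:
        return [[float(x) for x in row] for row in csv.reader(fh) if row]


def contacts_above(matrix, threshold=0.5):
    """Return (row, column, probability) for entries at or above threshold.

    Indices are zero-based. For mp_contact, rows are pseudo-MHC residues and
    columns are peptide residues; for pt_contact, rows are peptide residues
    and columns are CDR3 residues.
    """
    return [(i, j, p)
            for i, row in enumerate(matrix)
            for j, p in enumerate(row)
            if p >= threshold]


if __name__ == "__main__":
    import sys
    for i, j, p in contacts_above(load_matrix(sys.argv[1])):
        print(i, j, p)
```

Since the matrices are cropped to the true sequence length, the indices map directly onto residue positions in the input sequences.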
python infer.py task=mp input_csv=example.csv out_dir=./results
python infer.py task=pt,pt_contact input_csv=example.csv out_dir=./results save_dist=true
python infer.py \
task=mp,pt,mpt,imm,mp_contact,pt_contact \
input_csv=example.csv \
out_dir=./results \
batch_size=32 \
contact_batch_size=8 \
save_dist=true

PepBridge for peptide-bridged, unified and structure-aware modeling of pMHC-TCR recognition
Wenpu Lai, Yang Li, Oscar Junhong Luo
For issues, suggestions, or collaboration, please use the GitHub repository issue tracker or contact us by e-mail: kyzy850520@163.com
