CycDiff-DPO

CycDiff-DPO uses preference-aligned diffusion to design target-specific macrocyclic peptides with enhanced membrane permeability, balancing binding competence and cell permeability in one framework.

Setup

Environment

conda env create -f env.yml
conda activate CycDiff_DPO

Note: This environment uses Python 3.9.19.

Data Download

Download datasets and pre-trained models from Zenodo:

# Download checkpoints
wget "https://zenodo.org/records/19429073/files/ckpts.tar.gz?download=1" -O ./ckpts.tar.gz
tar -xzf ./ckpts.tar.gz && rm ./ckpts.tar.gz

# Download train_valid dataset (for training DPO)
wget "https://zenodo.org/records/19429955/files/train_valid.tar.gz?download=1" -O ./datasets/train_valid.tar.gz
tar -xzf ./datasets/train_valid.tar.gz -C ./datasets/ && rm ./datasets/train_valid.tar.gz

# Download LNR_CPSea dataset
wget "https://zenodo.org/records/19429073/files/LNR_CPSea.tar.gz?download=1" -O ./datasets/LNR_CPSea.tar.gz
tar -xzf ./datasets/LNR_CPSea.tar.gz -C ./datasets/ && rm ./datasets/LNR_CPSea.tar.gz

# Download SciBERT model
wget "https://zenodo.org/records/19429073/files/scibert_model.tar.gz?download=1" -O ./scibert_model.tar.gz
tar -xzf ./scibert_model.tar.gz && rm ./scibert_model.tar.gz

Pre-trained Weights

The following pre-trained weights and preference data are provided:

File	Description
`./ckpts/base_model.ckpt`	Base model from CP-Composer
`./ckpts/autoencoder.pth`	Pre-trained full-atom autoencoder (from PepGLAD)
`./ckpts/dpo/epoch44_step513090.ckpt`	DPO-fine-tuned model (final checkpoint)
`./ckpts/xgboost_ensemble/`	XGBoost ensemble for membrane permeability prediction
`./datasets/train_valid/generated_pairs.pkl`	Pre-generated DPO preference pairs
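
Before running the pipeline, it can help to confirm that the files listed above are actually in place. The following is a minimal sketch, not part of the repository: the helper name `find_missing` is hypothetical, and the paths are copied from the table and assumed to be relative to the repository root.

```python
from pathlib import Path

# Paths copied from the table above, relative to the repository root.
EXPECTED_PATHS = [
    "ckpts/base_model.ckpt",
    "ckpts/autoencoder.pth",
    "ckpts/dpo/epoch44_step513090.ckpt",
    "ckpts/xgboost_ensemble",
    "datasets/train_valid/generated_pairs.pkl",
]


def find_missing(root, relative_paths=EXPECTED_PATHS):
    """Return the entries of relative_paths that do not exist under root."""
    root = Path(root)
    return [p for p in relative_paths if not (root / p).exists()]


if __name__ == "__main__":
    missing = find_missing(".")
    if missing:
        print("Missing files or directories:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All expected checkpoints and data files are present.")
```

Run it from the repository root; any path it reports as missing points to a download or extraction step that did not complete.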

Quick Start

The full pipeline consists of three steps: generation, filtering, and postprocessing. By default, it generates 5 samples per target on the LNR_CPSea test set.

Step 1: Generation

conda activate CycDiff_DPO
GPU=0 bash scripts/inference_forw.sh

Step 2: Filter

bash scripts/filter_success.sh ./results/LNR_CPSea/condition2_w5_5samples/results.jsonl
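
To get a quick overview of what the generation step wrote before filtering, a sketch like the one below can help. It assumes only that `results.jsonl` is a JSON Lines file (one JSON object per line); the field names inside each record are not documented here, so the helper (`summarize_jsonl`, a hypothetical name) simply reports whichever top-level keys it finds.

```python
import json
from collections import Counter


def summarize_jsonl(path):
    """Count records and how often each top-level key appears in a JSON Lines file."""
    n_records = 0
    key_counts = Counter()
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:  # tolerate blank lines
                continue
            record = json.loads(line)
            n_records += 1
            if isinstance(record, dict):
                key_counts.update(record.keys())
    return n_records, dict(key_counts)


# Example (path taken from the filter command above):
# n, keys = summarize_jsonl("./results/LNR_CPSea/condition2_w5_5samples/results.jsonl")
# print(n, keys)
```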

Step 3: Postprocessing

INPUT_DIR=./results/LNR_CPSea/condition2_w5_5samples/candidates \
OUTPUT_DIR=./results/LNR_CPSea/condition2_w5_5samples/relaxed \
NUM_CORES=10 bash scripts/batch_relax_good_results.sh

Training from Scratch

DPO training fine-tunes the pre-trained latent diffusion model (LDM) so that its samples align with membrane permeability preferences. We provide both the pre-generated preference pairs and the trained DPO model. To train from scratch:

conda activate CycDiff_DPO
GPU=0 bash scripts/train.sh
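
For readers unfamiliar with DPO, the sketch below shows the standard DPO objective for a single preference pair. It is an illustration only: the exact loss used by `scripts/train.sh` is not described in this README, and for diffusion models the log-probabilities are usually approximated through denoising errors rather than computed exactly. The function name and the value `beta=0.1` are assumptions made for this example.

```python
import math


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair.

    logp_w / logp_l: log-probabilities of the preferred / rejected sample
    under the model being trained; ref_logp_*: the same under the frozen
    reference model. beta is a hypothetical default, not a repository setting.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) equals softplus(-margin); this form is numerically stable.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

The loss falls as the trained model raises the likelihood of the preferred sample, relative to the reference model, more than that of the rejected one.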

Preference Pairs Construction

Preference pairs are used to train the DPO model. We provide pre-generated pairs at ./datasets/train_valid/generated_pairs.pkl.

To regenerate pairs with a custom permeability predictor, run:

bash scripts/run_build_pairs_xgboost.sh

This requires the training PDB structures. Download and place them in:

./datasets/train_valid/pdbs/   # Reference PDB structures
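
Conceptually, building preference pairs from a permeability predictor amounts to comparing candidates generated for the same target and keeping the pairs whose predicted scores differ clearly. The sketch below illustrates that idea only; it is not the logic of `scripts/run_build_pairs_xgboost.sh`. The function name, the input layout, the `min_margin` threshold, and the convention that a higher score means higher permeability are all assumptions.

```python
from itertools import combinations


def build_preference_pairs(scored, min_margin=0.5):
    """Form (target, preferred, rejected) triples from predicted scores.

    scored: dict mapping a target id to a list of (candidate_id, score).
    Assumes a higher score means better predicted permeability.
    """
    pairs = []
    for target_id, candidates in scored.items():
        for (id_a, score_a), (id_b, score_b) in combinations(candidates, 2):
            if abs(score_a - score_b) < min_margin:
                continue  # scores too close to express a clear preference
            if score_a > score_b:
                pairs.append((target_id, id_a, id_b))
            else:
                pairs.append((target_id, id_b, id_a))
    return pairs
```

A margin threshold like this is one common way to keep the predictor's noise from producing contradictory pairs.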

Permeability Predictor Training

We provide the trained XGBoost ensemble at ./ckpts/xgboost_ensemble/, which includes:

model_*.pkl — 10 individual XGBoost models
scaler.pkl — feature scaler
extractor.pkl — ECFP + descriptor feature extractor
config.json — ensemble configuration
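
As an illustration of how an ensemble like this can be queried, the sketch below loads every `model_*.pkl` file and averages the per-sample predictions. Several things are assumed rather than taken from the repository: that each pickle holds an object with a scikit-learn style `predict(features)` method, that the ensemble output is a plain mean (the actual rule may be set in `config.json`), and that `features` has already been produced by the extractor and scaler, whose interfaces are not documented here. Only unpickle files from a source you trust.

```python
import pickle
from pathlib import Path


def load_models(ensemble_dir):
    """Unpickle every model_*.pkl in ensemble_dir, in sorted filename order."""
    models = []
    for path in sorted(Path(ensemble_dir).glob("model_*.pkl")):
        with open(path, "rb") as handle:
            models.append(pickle.load(handle))
    return models


def ensemble_predict(models, features):
    """Average per-sample predictions over all models (assumed aggregation rule)."""
    if not models:
        raise ValueError("no models to average")
    per_model = [list(model.predict(features)) for model in models]
    # zip(*per_model) groups the predictions that belong to the same sample.
    return [sum(values) / len(values) for values in zip(*per_model)]
```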

To retrain from scratch using Caco-2 permeability data at ./datasets/caco2/caco2_dedup.csv:

bash scripts/train_xgb.sh
