DDI-FP-Graph

Updated training pipelines for the paper "Molecular Fingerprints Are a Simple Yet Effective Solution to the Drug–Drug Interaction Problem".

What's new?

Modern PyTorch Lightning workflows live under GPU/ with W&B integration and Bayesian sweeps, now using the unified lightning.pytorch API.
TPU-ready TensorFlow GNN pipeline under TPU/ for converting the dataset and running on modern TPUs, updated for TensorFlow 2.15 and TF-GNN 1.0.3.
Symmetric fingerprint fusion for the baseline models on both PyTorch and TPU stacks, combining union/intersection/exclusive fingerprints and post-encoder interactions that remain invariant to swapping the drug order.
Reproducible environments via the provided pyproject.toml and Dockerfile.
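
As a rough illustration of the symmetric fusion idea, the sketch below combines two binary fingerprints with element-wise union, intersection, and exclusive-or. Each of these is commutative, so the fused vector is identical whichever drug comes first. The function name `symmetric_fuse` and the plain-list representation are assumptions made for this example; the repository's own implementation may differ.

```python
from typing import List, Sequence


def symmetric_fuse(fp_a: Sequence[int], fp_b: Sequence[int]) -> List[int]:
    """Concatenate union, intersection, and exclusive bits of two binary fingerprints.

    Hypothetical helper for illustration; not taken from the repository.
    """
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    union = [a | b for a, b in zip(fp_a, fp_b)]         # bit set in either drug
    intersection = [a & b for a, b in zip(fp_a, fp_b)]  # bit set in both drugs
    exclusive = [a ^ b for a, b in zip(fp_a, fp_b)]     # bit set in exactly one drug
    return union + intersection + exclusive


drug_a = [1, 0, 1, 0]
drug_b = [1, 1, 0, 0]
fused = symmetric_fuse(drug_a, drug_b)

# Swapping the drug order leaves the fused representation unchanged.
assert fused == symmetric_fuse(drug_b, drug_a)
```

The fused vector is three times the length of a single fingerprint, so any downstream model sees the same input for the pairs (A, B) and (B, A).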

Quickstart

Install dependencies with Poetry (Python 3.10 through 3.12 are supported with the current TensorFlow stack):

poetry install

Train a graph model with PyTorch Lightning and log to Weights & Biases:

python -m GPU.train --config GPU/configs/graph.yaml --run-name dev-run

To use the Bayesian sweep configuration:

wandb sweep GPU/sweeps/graph_bayesian.yaml
wandb agent <entity/project>/<sweep_id>

You can also launch the sweep programmatically:

python GPU/sweeps/run_graph_sweep.py --entity <your-entity>

The sweep explores optimiser settings alongside the Morgan fingerprint radius and bit-length so the data pipeline stays in sync with the model hyperparameters.
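
For orientation, a Bayesian sweep of this kind can be written as a Python dictionary in the W&B sweep schema. The sketch below is not the content of `GPU/sweeps/graph_bayesian.yaml`: the metric name, parameter names, and ranges are all assumptions chosen to show how optimiser and fingerprint settings can sit in the same search space.

```python
# Hypothetical sweep definition; every name and range here is illustrative only.
sweep_config = {
    "method": "bayes",  # Bayesian optimisation over the parameters below
    "metric": {"name": "val/auroc", "goal": "maximize"},  # assumed metric name
    "parameters": {
        # Optimiser settings, sampled on a log scale.
        "learning_rate": {"distribution": "log_uniform_values", "min": 1e-5, "max": 1e-2},
        "weight_decay": {"distribution": "log_uniform_values", "min": 1e-6, "max": 1e-3},
        # Fingerprint settings, searched jointly with the optimiser settings.
        "fp_radius": {"values": [1, 2, 3]},
        "fp_bits": {"values": [512, 1024, 2048]},
    },
}
```

A dictionary like this can be registered with `wandb.sweep(sweep_config, project=...)`, which returns a sweep ID for `wandb agent`.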

To tune the fingerprint models, dedicated sweeps cover each gradient-boosting estimator:

# CatBoost search across depth, learning-rate, iterations, bagging temperature, and regularisation strength.
wandb sweep GBDT/sweeps/fp_catboost_bayesian.yaml

# LightGBM search for tree shape, learning-rate, sampling ratios, and L1/L2 penalties.
wandb sweep GBDT/sweeps/fp_lightgbm_bayesian.yaml

# XGBoost search over depth, shrinkage, sampling, and both L1/L2 regularisation.
wandb sweep GBDT/sweeps/fp_xgboost_bayesian.yaml

Each configuration keeps the fingerprint radius/bit-length coupled with the estimator-specific hyperparameters so Bayesian optimisation can explore compatible data/feature settings for the selected model (--model is fixed by the sweep command).

TPU workflow

Export the PyTorch Geometric dataset to NumPy archives compatible with TF-GNN:
```
python TPU/preprocess_to_npz.py --output-dir tf_dataset
```
Train the TF-GNN model. It runs on CPU/GPU by default; pass --tpu with the TPU name to target a TPU:
```
python TPU/train_tf.py --dataset tf_dataset --model fp_graph --epochs 50 --batch-size 128 --tpu your-tpu-name
```
The trainer now validates that --batch-size is a multiple of 64, matching Google’s TPU performance guidelines; 128 is the default for balanced per-core workloads.
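
A check like this can be expressed as an `argparse` type function. The following is a minimal sketch under that assumption, not the code in `TPU/train_tf.py`; the helper name `tpu_batch_size` is hypothetical.

```python
import argparse


def tpu_batch_size(value: str) -> int:
    """Parse a batch size and reject values that are not positive multiples of 64.

    Hypothetical validator for illustration; the real trainer may check this differently.
    """
    batch_size = int(value)
    if batch_size <= 0 or batch_size % 64 != 0:
        raise argparse.ArgumentTypeError(
            f"--batch-size must be a positive multiple of 64, got {batch_size}"
        )
    return batch_size


parser = argparse.ArgumentParser()
parser.add_argument("--batch-size", type=tpu_batch_size, default=128)

# A valid value is parsed to an int; an invalid one makes argparse exit with an error.
args = parser.parse_args(["--batch-size", "256"])
```

Validating at parse time means a bad value fails before any dataset loading or TPU initialisation starts.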

Use --model to mirror the PyTorch experiments exactly: fp_mlp (fingerprint MLP), graph (graph-only encoder), fp_graph (combined encoder), or ssiddi. All models share the same fusion modes, decoder widths, and metric suite as their Lightning counterparts, and additional knobs like --fusion, --final-concat, --gnn-layer, and --top-k match the PyTorch configuration options.
Run Bayesian optimisation to tune the TensorFlow hyperparameters with W&B sweeps:
```
python TPU/tune_tf.py --dataset tf_dataset --model fp_graph --wandb-project your-project --max-trials 40 --epochs 60
```
The CLI launches a W&B Bayesian sweep that samples the encoder width, depth, dropout, activations, attention heads, decoder size, optimiser learning rate, and the fingerprint radius/bit-length. Provide --raw-data-dir if you want the tuner to regenerate datasets for unseen fingerprint settings on the fly. Every trial logs metrics, artefacts, and the saved model to W&B; the best run is also exported locally under tpu_tuning/ by default.

Docker

Build and run the containerised environment:

docker build -t ddi-fp-graph .
docker run --gpus all -it --rm \
  -v "$(pwd)":/workspace ddi-fp-graph \
  --config GPU/configs/graph.yaml

The container entrypoint is python -m GPU.train, so any arguments placed after the image name in the docker run command are passed to the training script as CLI flags.
