This repository contains the code used for the ConforFormer project. It combines:
- a research fork of Uni-Mol with the model, task, loss, and data pipeline changes used in this work
- data-processing pipelines for the conformer and isomer datasets used by the paper
- lightweight 2D fingerprint baselines for the Uni-Mol MoleculeNet benchmark
Model checkpoints are published on Hugging Face. Dataset artifacts generated by the pipelines in this repository can be reproduced from the provided scripts.
| Path | Purpose |
|---|---|
| `unimol/` | Uni-Mol fork with the ConforFormer model code, training tasks, losses, and inference utilities |
| `baselines/` | 2D molecular fingerprint baselines based on CatBoost and XGBoost |
| `data_processing/` | Pipelines for reducing Uni-Mol data, filtering OpenMolecules conformers, generating isomer annotations, and building the contrastive benchmark |
| `analysis/` | Analysis scripts for enantiomer/optisomer labeling and benchmark evaluation |
| `example_scripts/` | Example shell scripts for training, fine-tuning, inference, and 2D baseline runs |
| `results/` | Checked-in baseline tuning runs and benchmark summaries |
The repository root includes a lightweight Python environment for the 2D baselines defined in `baselines/`. These scripts use:
- CatBoost with OpenBabel fingerprints (`FP2`, `FP3`, `FP4`, `MACCS`)
- XGBoost with RDKit Morgan fingerprints (`ECFP4_1024` in the paper setup)
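All of these fingerprints are fixed-length bit vectors over hashed substructure keys. As a rough illustration only (the real FP2/MACCS/Morgan encodings live in OpenBabel and RDKit, and the substructure keys below are made up), a folded bit fingerprint can be sketched as:

```python
import zlib

def fold_fingerprint(keys, n_bits=1024):
    """Fold arbitrary substructure keys into a fixed-size bit vector.

    Each key is hashed to one of n_bits positions; collisions simply
    set the same bit twice. This mirrors the folding idea behind
    ECFP-style fingerprints, not any specific toolkit's algorithm.
    """
    bits = [0] * n_bits
    for key in keys:
        bits[zlib.crc32(key.encode()) % n_bits] = 1
    return bits

# Hypothetical keys; a real pipeline would enumerate atom environments.
fp = fold_fingerprint(["C-O", "C=O", "c:c"])
```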
Install the baseline dependencies with uv:

```shell
uv sync
```

Run the OpenBabel CatBoost benchmark:

```shell
uv run python baselines/catboost_fp2_baseline.py \
  --data-root data_downloads/unimol/molecular_property_prediction \
  --output-dir results/catboost_fp2 \
  --feature-mode tanimoto \
  --fingerprint FP2 \
  --n-anchors 256
```

Run the RDKit ECFP4 XGBoost benchmark:

```shell
uv run python -m baselines.xgb_ecfp_baseline \
  --data-root data_downloads/unimol/molecular_property_prediction \
  --tasks all \
  --radius 2 \
  --fp-bits 1024 \
  --output-dir results/xgb_ecfp4_1024
```

Wrapper scripts are available in `example_scripts/baselines/`.
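One plausible reading of `--feature-mode tanimoto` with `--n-anchors 256` is that each molecule is featurized by its Tanimoto similarity to a fixed set of anchor fingerprints; the function names below are illustrative, not the repository's API:

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def tanimoto_features(fp, anchor_fps):
    # One dense feature per anchor fingerprint; in the assumed setup,
    # these rows would be fed to the CatBoost model.
    return [tanimoto(fp, anchor) for anchor in anchor_fps]
```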
See baselines/README.md for tuning workflows,
reference configs, and output locations.
The model code lives under unimol/. Use the Uni-Mol-specific
requirements and setup instructions in unimol/README.md
for:
- pretraining and fine-tuning
- conformer embedding and inference
- docking and pocket tasks
Example entry points are provided in `example_scripts/`.
data_processing/ contains the pipelines used to
reproduce the data assets referenced in the paper:
- reduced Uni-Mol splits
- OpenMolecules conformer filtering and grouping
- isomer lookup generation and tailored Uni-Mol datasets
- the contrastive benchmark generation workflow
Large pipelines are numbered in execution order inside their respective
directories. See data_processing/README.md for
dataset prerequisites.
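Because the stage scripts are numbered, they sort lexicographically into execution order. A minimal sketch of discovering a pipeline's stages this way (the `01_*.py` naming and directory layout are assumptions; see `data_processing/README.md` for what actually applies):

```python
from pathlib import Path

def ordered_stages(pipeline_dir):
    """Return a pipeline's numbered stage scripts in execution order.

    Assumes stages are named like 01_foo.py, 02_bar.py, ... so that
    lexicographic sorting matches the intended run order.
    """
    return sorted(p for p in Path(pipeline_dir).glob("[0-9]*_*.py"))
```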
The repository includes checked-in summaries for the 2D baseline experiments, including:
- fingerprint sweep comparisons in `results/fingerprint_sweep/`
- tuned CatBoost runs in `results/catboost_tuning_runs/`
- repeated-seed ECFP4 tuning runs for XGBoost and CatBoost in `results/`
These outputs are useful as references for reproducing the reported 2D baseline numbers without rerunning every sweep from scratch.
- Python `>=3.11` for the root baseline environment
- `uv` for dependency management at the repository root
- OpenBabel for the OpenBabel-based fingerprint baselines
- RDKit for ECFP/Morgan fingerprint baselines
For the Uni-Mol fork and its additional training requirements, refer to the
documentation in unimol/.
Original contributions in this repository are released under the MIT License.
See LICENSE. For project questions, contact
e.a.pidko@tudelft.nl.