CASyM: Chemotype Annotation Through Synthesis Mapping

Summary

This repository contains the code for the publication "CASyM: Chemotype Annotation Through Synthesis Mapping". The project introduces a synthesis-based approach to annotating chemotypes in drug discovery projects: it builds a chemotype graph from the major common intermediates found in the project synthesis data.

Requirements

This repository uses conda and poetry environments. To install:

git clone https://github.com/aidd-msca/CASyM.git
cd CASyM

conda env create -f environment.yml
conda activate casym

poetry install

How to Use

Given a collection of synthesis data, the package can be run from the command line:

python casym/main.py -cn config

The code assumes that the config.yaml file is stored in experiments, though this can be updated. For further details on the settings available in the config file, see the Config File section.

Data

The package expects a tab-separated file (.tsv) containing at least two columns titled "reactants" and "products". Additional columns are ignored unless named in the config file. The data can also be filtered by time, yield and project, provided these fields are present in the reaction data and the filter is specified in the config file. Further columns can be passed through and used as attributes in the chemotype graph; these are listed in the additional_data setting of the config file.
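
As an illustration of the expected input, the sketch below builds a tiny reaction table in memory, reads it back as tab-separated data, checks for the two required columns and applies a simple yield filter. The sample rows, the use of SMILES with "." between reactants, the "yield" column name, and the helper names `load_reactions` and `filter_by_yield` are all assumptions made for this example; they are not part of the CASyM code.

```python
import csv
import io

REQUIRED_COLUMNS = {"reactants", "products"}

# Hypothetical reaction data; "yield" is an optional extra column.
SAMPLE_TSV = (
    "reactants\tproducts\tyield\n"
    "CCO.CC(=O)O\tCCOC(C)=O\t85\n"
    "c1ccccc1Br.OB(O)c1ccccc1\tc1ccc(-c2ccccc2)cc1\t40\n"
)


def load_reactions(handle):
    """Read tab-separated reaction rows and check the required columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    return list(reader)


def filter_by_yield(rows, yield_col, min_yield):
    """Keep rows whose yield (in %) is at least min_yield."""
    return [row for row in rows if float(row[yield_col]) >= min_yield]


rows = load_reactions(io.StringIO(SAMPLE_TSV))
kept = filter_by_yield(rows, yield_col="yield", min_yield=50)
print(len(rows), len(kept))
```

To check a real file, the same `load_reactions` helper can be passed a handle opened with `open(path, newline="")`.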

Config File

The config file contains the following settings:

- reaction_file: File path to the reaction data
- targetmolecules: Settings and information regarding target molecules, otherwise null
  - targetmolecules_fp: File path to the target molecules
  - smiles_col: Name of the column containing SMILES in targetmolecules_fp
  - project_col: Name of the column containing the project in targetmolecules_fp; if all compounds in the file are relevant, use null
- time_col: Name of the column containing time information in targetmolecules_fp; if not used, use null
- project_col: Name of the column containing the project in reaction_file; if all compounds in the file are relevant, use null
- filter_time: Settings for filtering reaction data by time, otherwise null
  - min_time: Start date for the reaction data used
  - max_time: End date for the reaction data used
- assign_compounds_settings:
  - maximum_similarity: Whether to assign target molecules to chemotypes by maximum similarity or by minimizing score
- chemotype_steps: Number of reaction steps within which major common intermediates are linked as a single chemotype
- common_intermediate_min_connections: Minimum number of associated target molecules for a common intermediate to be considered a major common intermediate
- common_intermediate_max_connections: Maximum number of associated target molecules for a common intermediate to be considered a major common intermediate; above this threshold the common intermediate will always be considered major
- projects: Project(s) to analyze
- store_root: File path to store results, otherwise null
- similarity_threshold: Minimum proportion of maximum common substructure required to link major common intermediates under one chemotype
- additional_data: Columns of data in reaction_file to store in synthesis graphs and chemotype graphs, otherwise null
- filter_yield: Settings to filter reaction data by yield, otherwise null
  - yield_col: Name of the column containing yield data in reaction_file
  - min_yield: Minimum % yield for a reaction to be considered for processing
- create_report: Whether to create a markdown report summarizing the run
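
To show how these settings might fit together, here is a sketch of a configuration written as a Python dictionary and rendered as YAML-style text. The key names come from the list above; every value (paths, column names, dates, thresholds, project name) is a made-up placeholder, and the nesting is an assumption based on the list rather than a copy of the config file shipped in experiments. The `to_yaml` and `scalar` helpers are hypothetical and handle only the simple nesting used here.

```python
# All values are hypothetical placeholders; only the key names come from the README.
example_config = {
    "reaction_file": "data/reactions.tsv",
    "targetmolecules": {
        "targetmolecules_fp": "data/targets.tsv",
        "smiles_col": "smiles",
        "project_col": None,
    },
    "time_col": "date",
    "project_col": "project",
    "filter_time": {"min_time": "2020-01-01", "max_time": "2022-12-31"},
    "assign_compounds_settings": {"maximum_similarity": True},
    "chemotype_steps": 2,
    "common_intermediate_min_connections": 5,
    "common_intermediate_max_connections": 50,
    "projects": ["project_a"],
    "store_root": "results",
    "similarity_threshold": 0.7,
    "additional_data": None,
    "filter_yield": {"yield_col": "yield", "min_yield": 20},
    "create_report": True,
}


def scalar(value):
    """Map Python None and bool to their YAML spellings."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def to_yaml(mapping, indent=0):
    """Render a dict of scalars, lists and nested dicts as YAML-style lines."""
    lines = []
    pad = " " * indent
    for key, value in mapping.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml(value, indent + 2))
        elif isinstance(value, list):
            lines.append(f"{pad}{key}:")
            lines.extend(f"{pad}  - {scalar(item)}" for item in value)
        else:
            lines.append(f"{pad}{key}: {scalar(value)}")
    return lines


print("\n".join(to_yaml(example_config)))
```

Settings documented as "otherwise null" (for example filter_time or filter_yield) would be set to `None` in this sketch to switch the corresponding step off.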
