Overview

DeepDelta is a pairwise deep learning approach that takes two molecules as input simultaneously and learns to predict the difference in a property between them.

Figure 1: Traditional and Pairwise Architectures. (A) Traditional molecular machine learning models take single molecules as input and predict their absolute properties. Predicted property differences can then be calculated by subtracting the predicted values for two molecules. (B) Pairwise models train on property differences between pairs of molecules to directly predict the property changes of molecular derivatizations. (C) Molecules are cross-merged to create pairs only after cross-validation splits, to prevent data leakage during model evaluation. As a result, every molecule in the dataset occurs in pairs in either the training data or the testing data, but never both.
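The pairing step in panel (C) can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the function name `make_pairs`, the `(smiles, value)` tuple layout, the sign convention (second molecule minus first), the inclusion of same-molecule pairs, and the toy values are all hypothetical.

```python
from itertools import product


def make_pairs(molecules):
    """Cross-merge (smiles, value) tuples into all ordered pairs.

    Each pair is labelled with the property difference value_b - value_a
    (an assumed sign convention for this sketch).
    """
    return [
        (smi_a, smi_b, val_b - val_a)
        for (smi_a, val_a), (smi_b, val_b) in product(molecules, repeat=2)
    ]


# Toy data with made-up property values, already split into two folds.
train = [("CCO", 1.0), ("CCN", 2.5), ("CCC", 0.5)]
test = [("c1ccccc1", 3.0), ("CC(=O)O", 1.5)]

# Pair only within each split, so no molecule ends up on both sides.
train_pairs = make_pairs(train)
test_pairs = make_pairs(test)

# Collect the molecules seen in each set of pairs to confirm they are disjoint.
train_mols = {smi for a, b, _ in train_pairs for smi in (a, b)}
test_mols = {smi for a, b, _ in test_pairs for smi in (a, b)}
print(len(train_pairs), len(test_pairs), train_mols & test_mols)
```

Pairing before splitting would instead let the same molecule appear in both a training pair and a testing pair, which is the leakage the caption describes.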

On 10 pharmacokinetic benchmark tasks, our DeepDelta approach outperforms two established molecular machine learning algorithms, the message passing neural network (MPNN) ChemProp and Random Forest using radial fingerprints.

We also derive three simple computational tests of our models from mathematical first principles and show that compliance with these tests correlates with overall model performance. This provides an unsupervised and easily computable measure of expected model performance and applicability.

1. With the same molecule for both inputs, the prediction should be zero:

$$DeepDelta(x,x)= 0$$

2. With swapped input molecules, the prediction should change sign:

$$DeepDelta(x,y)= - DeepDelta(y,x)$$

3. Predicted differences among three molecules should be additive:

$$DeepDelta(x,y) + DeepDelta(y,z)= DeepDelta(x,z)$$
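The three tests above can be scored for any pairwise predictor without using measured labels. The sketch below is one possible way to do this, not the code shipped in this repository: `check_invariants`, `toy_predict`, and `TOY_VALUES` are hypothetical names, and the mean absolute violation is an assumed choice of summary statistic.

```python
from itertools import product
from statistics import mean


def check_invariants(predict, molecules):
    """Return the mean absolute violation of each test for predict(x, y)."""
    # Test 1: predict(x, x) should be 0.
    same = [abs(predict(x, x)) for x in molecules]
    # Test 2: predict(x, y) should equal -predict(y, x).
    swap = [
        abs(predict(x, y) + predict(y, x))
        for x, y in product(molecules, repeat=2)
    ]
    # Test 3: predict(x, y) + predict(y, z) should equal predict(x, z).
    additive = [
        abs(predict(x, y) + predict(y, z) - predict(x, z))
        for x, y, z in product(molecules, repeat=3)
    ]
    return {
        "same_molecule": mean(same),
        "reverse": mean(swap),
        "additivity": mean(additive),
    }


# Hypothetical stand-in for a trained model: the difference of made-up
# per-molecule values, which satisfies all three tests by construction.
TOY_VALUES = {"CCO": 1.0, "CCN": 2.5, "CCC": 0.5}


def toy_predict(x, y):
    return TOY_VALUES[y] - TOY_VALUES[x]


scores = check_invariants(toy_predict, list(TOY_VALUES))
print(scores)
```

In practice `predict` would wrap a trained pairwise model. A value of zero means the test is satisfied exactly, and larger values mean larger violations.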

For more information, please refer to: https://chemrxiv.org/engage/chemrxiv/article-details/642d823f0784a63aee949898

If you use this data or code, please kindly cite: Fralish Z, Chen A, Skaluba P, Reker D. DeepDelta: Predicting Pharmacokinetic Improvements of Molecular Derivatives with Deep Learning. ChemRxiv. Cambridge: Cambridge Open Engage; 2023

Requirements

RDKit
scikit-learn
numpy
pandas

Comparison Models

Random Forest
ChemProp v1.5.2
LightGBM

Given the larger size of delta datasets, we recommend using a GPU for significantly faster training.

To use ChemProp with GPUs, you will need:

cuda >= 8.0
cuDNN

Descriptions of Folders

Code

Python code for evaluating DeepDelta and traditional models based on their ability to predict property differences between two molecules.

Datasets

Curated data for 10 ADMET property benchmarking training sets and 2 external test sets.

Results

Results from 5x10-fold cross-validation, which are used in further analysis.

License

The copyrights of the software are owned by Duke University. As such, two licenses for this software are offered:

An open-source license under the GPLv2 license for non-commercial academic use.
A custom license with Duke University, for commercial use or uses without the GPLv2 license restrictions.
