Computer-Aided Drug Discovery research has proven to be a promising direction in drug discovery. In recent years, Deep Learning approaches have been applied to problems in the domain such as Drug-Target Indication Prediction and have shown improvements over traditional screening methods.
An existing challenge is how to represent compound-target pairs in deep learning models. Several representation methods exist, and the literature reports that these descriptor schemes often complement one another. In this project, we propose a multi-view architecture trained adversarially to leverage this complementary behavior for DTI prediction by integrating both differentiable and predefined molecular descriptors (fingerprints). Our empirical results show that this approach generally improves model accuracy.
This repository contains the accompanying code and other ancillary files of the aforementioned study.
Project/Module | Version |
---|---|
PyTorch | >= 1.1.0 |
NumPy | >= 1.15 |
DeepChem | >= 2.2.0 |
Padme | See the PADME project |
Pandas | >= 0.25.0 |
Seaborn | 0.9.0 |
Soek | See the Soek project |
torch-scatter | >= 1.3.1 |
tqdm | >= 4.x |
Note: The `dcCustom` package of the PADME project has been renamed `padme` in this project and should not be confused with any other module that bears the same name. We made this change for clarity, so that the package carries the name its authors gave the project.
The bash files found here are used for model training and evaluation of the baseline and the IVPGAN models. The bash files with the `padme_` prefix train the baseline models reflected in their names. For instance, `padme_cold_drug_gconv_cv_kiba` trains our implementation of the `GraphConv-PSC` model using k-fold cross-validation with a cold drug splitting scheme on the KIBA dataset. The IVPGAN models are trained using the bash files with the `integrated_` prefix. They follow the same naming pattern as the `padme_` files.
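
To make the cold drug splitting scheme mentioned above concrete, here is a minimal sketch in plain Python. It assumes that "cold drug" means every drug in a test fold is absent from the corresponding training fold. The function name `cold_drug_kfold` and the `(drug_id, target_id, label)` tuple layout are hypothetical choices for illustration; this is not the splitter used by PADME or by this repository.

```python
import random


def cold_drug_kfold(pairs, k, seed=0):
    """Yield (train_idx, test_idx) index lists for k cold-drug folds.

    `pairs` is a sequence of (drug_id, target_id, label) tuples
    (a hypothetical layout). A drug assigned to a test fold never
    appears in that fold's training indices.
    """
    # Collect the unique drugs and shuffle them reproducibly.
    drugs = sorted({drug for drug, _, _ in pairs})
    random.Random(seed).shuffle(drugs)

    # Assign each unique drug to exactly one fold, round-robin.
    fold_of = {drug: i % k for i, drug in enumerate(drugs)}

    for fold in range(k):
        test_idx = [i for i, (d, _, _) in enumerate(pairs) if fold_of[d] == fold]
        train_idx = [i for i, (d, _, _) in enumerate(pairs) if fold_of[d] != fold]
        yield train_idx, test_idx
```

A cold target split would follow the same idea with the grouping done on `target_id` instead, while a warm split shuffles the pairs themselves without grouping.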
The bash files with `_eval_` in their names evaluate a trained model. We use a resource tree structure to aggregate all training and evaluation statistics, which are then saved as JSON files for later analysis. For more on the resource tree structure, see `sim_data.py` and its usage in `singleview.py` and `train_joint_gan.py`. The performance data saved in the JSON file of each evaluated model is analysed using `worker.py`. The data that generates the reported results can be found here.
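
As a rough illustration of aggregating statistics in a tree and saving them as JSON, the sketch below uses nested dictionaries. It does not reproduce the classes in `sim_data.py`; the helpers `tree`, `record`, and `to_plain`, the key names, and the numeric values are all hypothetical placeholders.

```python
import json
from collections import defaultdict


def tree():
    """Return a dict that creates nested dicts on first access."""
    return defaultdict(tree)


def record(root, path, values):
    """Store a list of values at the leaf addressed by `path` (a tuple of keys)."""
    node = root
    for key in path[:-1]:
        node = node[key]
    node[path[-1]] = list(values)


def to_plain(node):
    """Convert nested defaultdicts to plain dicts before serialising."""
    if isinstance(node, dict):
        return {key: to_plain(value) for key, value in node.items()}
    return node


# Placeholder numbers, not results from the study.
stats = tree()
record(stats, ("kiba", "cold_drug", "fold_0", "rmse"), [0.61, 0.52, 0.47])
record(stats, ("kiba", "cold_drug", "fold_0", "ci"), [0.70, 0.74, 0.78])

text = json.dumps(to_plain(stats), indent=2)
```

Writing `text` to a file gives one JSON document per run, which an analysis script can later load with `json.loads` and walk by dataset, split, and fold.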

RMSE

Dataset | CV split type | ECFP8 | GraphConv | IVPGAN |
---|---|---|---|---|
Davis | Warm | 0.2216 ± 0.082 | 0.3537 ± 0.053 | 0.2014 ± 0.043 |
Davis | Cold drug | 0.3978 ± 0.105 | 0.4751 ± 0.123 | 0.2895 ± 0.163 |
Davis | Cold target | 0.5517 ± 0.088 | 0.5752 ± 0.101 | 0.2202 ± 0.139 |
Metz | Warm | 0.3321 ± 0.057 | 0.5537 ± 0.033 | 0.5529 ± 0.033 |
Metz | Cold drug | 0.3778 ± 0.097 | 0.5711 ± 0.057 | 0.5477 ± 0.064 |
Metz | Cold target | 0.6998 ± 0.065 | 0.7398 ± 0.047 | 0.5745 ± 0.054 |
KIBA | Warm | 0.4350 ± 0.086 | 0.5604 ± 0.120 | 0.4003 ± 0.082 |
KIBA | Cold drug | 0.4502 ± 0.128 | 0.552 ± 0.156 | 0.4690 ± 0.132 |
KIBA | Cold target | 0.6645 ± 0.137 | 0.7555 ± 0.153 | 0.4486 ± 0.106 |

Concordance Index

Dataset | CV split type | ECFP8 | GraphConv | IVPGAN |
---|---|---|---|---|
Davis | Warm | 0.9647 ± 0.020 | 0.9335 ± 0.011 | 0.9729 ± 0.008 |
Davis | Cold drug | 0.9099 ± 0.049 | 0.8784 ± 0.052 | 0.9493 ± 0.044 |
Davis | Cold target | 0.8683 ± 0.033 | 0.8480 ± 0.038 | 0.9631 ± 0.036 |
Metz | Warm | 0.8983 ± 0.033 | 0.7968 ± 0.027 | 0.7913 ± 0.029 |
Metz | Cold drug | 0.8730 ± 0.044 | 0.7850 ± 0.040 | 0.7894 ± 0.042 |
Metz | Cold target | 0.7304 ± 0.039 | 0.7084 ± 0.041 | 0.7776 ± 0.038 |
KIBA | Warm | 0.8322 ± 0.024 | 0.7873 ± 0.029 | 0.8433 ± 0.023 |
KIBA | Cold drug | 0.8132 ± 0.047 | 0.7736 ± 0.048 | 0.8070 ± 0.051 |
KIBA | Cold target | 0.7185 ± 0.044 | 0.6661 ± 0.052 | 0.8234 ± 0.044 |

R2

Dataset | CV split type | ECFP8 | GraphConv | IVPGAN |
---|---|---|---|---|
Davis | Warm | 0.9252 ± 0.061 | 0.8254 ± 0.039 | 0.9449 ± 0.021 |
Davis | Cold drug | 0.7573 ± 0.171 | 0.6773 ± 0.159 | 0.8635 ± 0.151 |
Davis | Cold target | 0.5916 ± 0.120 | 0.5423 ± 0.121 | 0.9059 ± 0.121 |
Metz | Warm | 0.8637 ± 0.057 | 0.6279 ± 0.075 | 0.6285 ± 0.078 |
Metz | Cold drug | 0.8124 ± 0.117 | 0.5860 ± 0.120 | 0.6166 ± 0.120 |
Metz | Cold target | 0.4259 ± 0.121 | 0.3619 ± 0.112 | 0.5931 ± 0.106 |
KIBA | Warm | 0.7212 ± 0.072 | 0.5513 ± 0.097 | 0.7658 ± 0.065 |
KIBA | Cold drug | 0.6677 ± 0.137 | 0.5026 ± 0.152 | 0.6475 ± 0.142 |
KIBA | Cold target | 0.3648 ± 0.128 | 0.1910 ± 0.088 | 0.7056 ± 0.113 |
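
For readers unfamiliar with the three metrics tabulated above, the following is a minimal pure-Python sketch of how each can be computed from true and predicted affinities. It uses the textbook definitions and is not the metric code taken from PADME. In particular, R2 is assumed here to be the coefficient of determination; the repository's metric procedures may use a different convention, such as the squared Pearson correlation.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error; lower is better."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predictions are ordered correctly.

    Pairs with equal true values are skipped; tied predictions count 0.5.
    """
    concordant, comparable = 0.0, 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue  # not comparable
            comparable += 1
            agreement = (y_true[i] - y_true[j]) * (y_pred[i] - y_pred[j])
            if agreement > 0:
                concordant += 1.0
            elif agreement == 0:
                concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def r2(y_true, y_pred):
    """Coefficient of determination (an assumed convention, see lead-in)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Lower RMSE is better, while higher concordance index and R2 are better, which is how the three tables should be read.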
- First two charts are for ECFP-PSC
- Second two charts are for GraphConv-PSC
- Last two charts are for IVPGAN
We would like to acknowledge the authors of the PADME project for their work. Our project uses the data, data loading, and metric procedures published in their work, and we are grateful. We also acknowledge the authors and contributors of the DeepChem project for their implementations of the Graph Convolution, Weave, and other featurization schemes; the `GraphConv` and `Weave` implementations in this work are essentially our PyTorch translations of their original implementations.
@inproceedings{Agyemang2019,
author = {Agyemang, Brighter and Wei-Ping, Wu and Kpiebaareh, Michael Y. and Nanor, Ebenezer},
title = {Drug-Target Indication Prediction by Integrating End-to-End Learning and Fingerprints},
year = {2019}
}