Drug-Target Indication Prediction by Integrating End-to-End Learning and Fingerprints

Computer-aided drug discovery has proven to be a promising research direction. In recent years, deep learning approaches have been applied to problems in this domain, such as Drug-Target Indication (DTI) prediction, and have shown improvements over traditional screening methods.

An open challenge is how to represent compound-target pairs in deep learning models. Several representation methods exist, and the literature reports that these descriptor schemes often complement one another. In this project, we propose a multi-view architecture, trained adversarially, that leverages this complementary behavior for DTI prediction by integrating both differentiable (learned end-to-end) and predefined molecular descriptors (fingerprints). Our results on the Davis, Metz, and KIBA datasets show that this approach generally improves model accuracy over the baselines.

This repository contains the accompanying code and ancillary files for the study.

ivpgan

Requirements

| Project/Module | Version |
|---|---|
| PyTorch | >= 1.1.0 |
| NumPy | >= 1.15 |
| DeepChem | >= 2.2.0 |
| Padme | See the PADME project |
| Pandas | >= 0.25.0 |
| Seaborn | 0.9.0 |
| Soek | See the Soek project |
| torch-scatter | >= 1.3.1 |
| tqdm | >= 4.x |

Note: The dcCustom package of the PADME project has been renamed to padme in this project and should not be confused with any other module bearing the same name. We made this change for clarity, so that the package carries the name its authors gave to the project.

Usage

The bash files found here are used to train and evaluate the baseline and IVPGAN models. Bash files with the padme_ prefix train the baseline model named in the file name. For instance, padme_cold_drug_gconv_cv_kiba trains our implementation of the GraphConv-PSC model using k-fold cross-validation with a cold drug splitting scheme on the KIBA dataset. The IVPGAN models are trained using the bash files with the integrated_ prefix, which follow the same naming pattern as the padme_ files.
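
To make the cold drug splitting scheme concrete, here is a minimal Python sketch, not the splitter used by this repository or by PADME. It assumes each sample is a (drug, target, label) tuple, and the function name cold_drug_kfold and the toy data are hypothetical. Folds are built over unique drugs, so no drug in a test fold also appears in the matching training fold.

```python
import random
from collections import defaultdict


def cold_drug_kfold(pairs, k=5, seed=0):
    """Yield (train, test) lists of (drug, target, label) tuples.

    Illustrative only: folds are formed over unique drugs, so every drug
    in a test fold is absent from the corresponding training fold.
    """
    # Group all samples by their drug identifier.
    by_drug = defaultdict(list)
    for pair in pairs:
        by_drug[pair[0]].append(pair)

    # Shuffle the unique drugs reproducibly and deal them into k folds.
    drugs = sorted(by_drug)
    random.Random(seed).shuffle(drugs)
    folds = [drugs[i::k] for i in range(k)]

    for held_out in folds:
        held_out_set = set(held_out)
        test = [p for d in held_out for p in by_drug[d]]
        train = [p for d in drugs if d not in held_out_set for p in by_drug[d]]
        yield train, test


# Toy data: 6 hypothetical drugs x 2 hypothetical targets.
toy_pairs = [
    (f"drug{i}", f"target{j}", float(i + j)) for i in range(6) for j in range(2)
]
splits = list(cold_drug_kfold(toy_pairs, k=3))
```

Grouping on the target instead of the drug (index 1 of each tuple) would give the analogous cold target split, while a warm split shuffles individual pairs without any grouping.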

The bash files with _eval_ in their names evaluate a trained model. We use a resource tree structure to aggregate all training and evaluation statistics, which are then saved as JSON files for later analysis. For more on the resource tree structure, see sim_data.py and its usage in singleview.py and train_joint_gan.py. The performance data saved in the JSON file of each evaluated model is analysed using worker.py. The data behind the reported results can be found here.
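
As a rough illustration of aggregating statistics in a tree and saving them as JSON, the sketch below uses a hypothetical StatNode class. It is not the API of sim_data.py; the class, its method names, the node labels, and the numbers are all made up for this example.

```python
import json


class StatNode:
    """Hypothetical tree node: a name, recorded values, and child nodes."""

    def __init__(self, name):
        self.name = name
        self.values = {}
        self.children = []

    def child(self, name):
        # Create a child node, attach it, and return it for further use.
        node = StatNode(name)
        self.children.append(node)
        return node

    def record(self, metric, value):
        # Append a value to the list kept for this metric.
        self.values.setdefault(metric, []).append(value)

    def to_dict(self):
        # Recursively convert the tree into JSON-serializable objects.
        return {
            "name": self.name,
            "values": self.values,
            "children": [c.to_dict() for c in self.children],
        }


root = StatNode("davis_cold_drug")
for fold, rmse_value in enumerate([0.31, 0.27]):  # made-up numbers
    fold_node = root.child(f"fold{fold}")
    fold_node.record("rmse", rmse_value)

serialized = json.dumps(root.to_dict(), indent=2)
restored = json.loads(serialized)  # what a later analysis step would read
```

Keeping one node per fold makes it straightforward to compute per-split means and standard deviations after loading the JSON file.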

Results

Quantitative results

RMSE

| Dataset | CV split type | ECFP8 | GraphConv | IVPGAN |
|---|---|---|---|---|
| Davis | Warm | 0.2216 ± 0.082 | 0.3537 ± 0.053 | 0.2014 ± 0.043 |
| | Cold drug | 0.3978 ± 0.105 | 0.4751 ± 0.123 | 0.2895 ± 0.163 |
| | Cold target | 0.5517 ± 0.088 | 0.5752 ± 0.101 | 0.2202 ± 0.139 |
| Metz | Warm | 0.3321 ± 0.057 | 0.5537 ± 0.033 | 0.5529 ± 0.033 |
| | Cold drug | 0.3778 ± 0.097 | 0.5711 ± 0.057 | 0.5477 ± 0.064 |
| | Cold target | 0.6998 ± 0.065 | 0.7398 ± 0.047 | 0.5745 ± 0.054 |
| KIBA | Warm | 0.4350 ± 0.086 | 0.5604 ± 0.120 | 0.4003 ± 0.082 |
| | Cold drug | 0.4502 ± 0.128 | 0.552 ± 0.156 | 0.4690 ± 0.132 |
| | Cold target | 0.6645 ± 0.137 | 0.7555 ± 0.153 | 0.4486 ± 0.106 |

Concordance Index

| Dataset | CV split type | ECFP8 | GraphConv | IVPGAN |
|---|---|---|---|---|
| Davis | Warm | 0.9647 ± 0.020 | 0.9335 ± 0.011 | 0.9729 ± 0.008 |
| | Cold drug | 0.9099 ± 0.049 | 0.8784 ± 0.052 | 0.9493 ± 0.044 |
| | Cold target | 0.8683 ± 0.033 | 0.8480 ± 0.038 | 0.9631 ± 0.036 |
| Metz | Warm | 0.8983 ± 0.033 | 0.7968 ± 0.027 | 0.7913 ± 0.029 |
| | Cold drug | 0.8730 ± 0.044 | 0.7850 ± 0.040 | 0.7894 ± 0.042 |
| | Cold target | 0.7304 ± 0.039 | 0.7084 ± 0.041 | 0.7776 ± 0.038 |
| KIBA | Warm | 0.8322 ± 0.024 | 0.7873 ± 0.029 | 0.8433 ± 0.023 |
| | Cold drug | 0.8132 ± 0.047 | 0.7736 ± 0.048 | 0.8070 ± 0.051 |
| | Cold target | 0.7185 ± 0.044 | 0.6661 ± 0.052 | 0.8234 ± 0.044 |

R2

| Dataset | CV split type | ECFP8 | GraphConv | IVPGAN |
|---|---|---|---|---|
| Davis | Warm | 0.9252 ± 0.061 | 0.8254 ± 0.039 | 0.9449 ± 0.021 |
| | Cold drug | 0.7573 ± 0.171 | 0.6773 ± 0.159 | 0.8635 ± 0.151 |
| | Cold target | 0.5916 ± 0.120 | 0.5423 ± 0.121 | 0.9059 ± 0.121 |
| Metz | Warm | 0.8637 ± 0.057 | 0.6279 ± 0.075 | 0.6285 ± 0.078 |
| | Cold drug | 0.8124 ± 0.117 | 0.5860 ± 0.120 | 0.6166 ± 0.120 |
| | Cold target | 0.4259 ± 0.121 | 0.3619 ± 0.112 | 0.5931 ± 0.106 |
| KIBA | Warm | 0.7212 ± 0.072 | 0.5513 ± 0.097 | 0.7658 ± 0.065 |
| | Cold drug | 0.6677 ± 0.137 | 0.5026 ± 0.152 | 0.6475 ± 0.142 |
| | Cold target | 0.3648 ± 0.128 | 0.1910 ± 0.088 | 0.7056 ± 0.113 |
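
For readers unfamiliar with the three metrics, the following NumPy sketch shows one common way to compute them. It is an illustration under assumptions, not the metric code used in this project (which comes from PADME): R2 is computed here as the coefficient of determination, whereas the project may instead report the squared Pearson correlation, and the function names and toy values are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error (lower is better)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predictions are ordered correctly.

    Pairs with equal true labels are skipped; tied predictions count 0.5.
    """
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    concordant, comparable = 0.0, 0
    for i in range(len(y_true)):
        for j in range(i + 1, len(y_true)):
            if y_true[i] == y_true[j]:
                continue
            comparable += 1
            product = (y_true[i] - y_true[j]) * (y_pred[i] - y_pred[j])
            if product > 0:
                concordant += 1.0
            elif product == 0:
                concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (an assumption)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Toy values, not taken from any dataset in this project.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
scores = {
    "rmse": rmse(y_true, y_pred),
    "ci": concordance_index(y_true, y_pred),
    "r2": r2(y_true, y_pred),
}
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration but slow on large test sets.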

Qualitative results

  • First two charts are for ECFP-PSC
  • Second two charts are for GraphConv-PSC
  • Last two charts are for IVPGAN

Davis

Warm split

Cold drug split

Cold target split

Metz

Warm split

Cold drug split

Cold target split

KIBA

Warm split

Cold drug split

Cold target split

Credits

We would like to acknowledge the authors of the PADME project for their work. Our project uses the data, data loading, and metric procedures they published, and we are grateful. We also acknowledge the authors and contributors of the DeepChem project for their implementations of the Graph Convolution, Weave, and other featurization schemes; the GraphConv and Weave implementations in this work are essentially PyTorch translations of their original implementations.

Cite

@inproceedings{Agyemang2019,
author = {Agyemang, Brighter and Wei-Ping, Wu and Kpiebaareh, Michael Y. and Nanor, Ebenezer},
title = {Drug-Target Indication Prediction by Integrating End-to-End Learning and Fingerprints},
year = {2019}
}