In this project, we predict drug-target interactions from a heterogeneous network with various types of nodes and edges. We learn node embeddings with a novel method called Edge2vec and then use an SVM classifier to make a prediction for every drug-protein pair.
The first step for this kind of problem is data gathering. We used the DTINet dataset because it has been validated and is compiled from several drug-related databases, so we expected it to give more reliable results. We used these files for the project:
drug.txt: list of drug names
protein.txt: list of protein names
disease.txt: list of disease names
se.txt: list of side effect names
mat_drug_se.txt: Drug-SideEffect association matrix
mat_protein_protein.txt: Protein-Protein interaction matrix
mat_drug_protein.txt: Drug-Protein interaction matrix
mat_drug_drug.txt: Drug-Drug interaction matrix
mat_protein_disease.txt: Protein-Disease association matrix
mat_drug_disease.txt: Drug-Disease association matrix
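As a minimal sketch of how files like these can be read, the snippet below assumes that each name file holds one name per line and that each matrix file is a whitespace-separated 0/1 matrix. The helper names `load_names` and `load_matrix` are hypothetical and not part of this project, and the demo uses small in-memory stand-ins rather than the real files.

```python
import io
import numpy as np

def load_names(source):
    """Read one name per line from a path or an open text file (hypothetical helper)."""
    if isinstance(source, str):
        with open(source) as fh:
            return [line.strip() for line in fh if line.strip()]
    return [line.strip() for line in source if line.strip()]

def load_matrix(source):
    """Read a whitespace-separated integer matrix (assumed file layout)."""
    return np.loadtxt(source, dtype=int, ndmin=2)

# In-memory stand-ins for drug.txt, protein.txt and mat_drug_protein.txt.
drugs = load_names(io.StringIO("drugA\ndrugB\n"))
proteins = load_names(io.StringIO("protX\nprotY\nprotZ\n"))
drug_protein = load_matrix(io.StringIO("1 0 1\n0 1 0\n"))

# Rows should line up with drugs and columns with proteins.
assert drug_protein.shape == (len(drugs), len(proteins))
```

With the real data, the same helpers would be called with file paths such as "drug.txt" and "mat_drug_protein.txt".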
Next, we built our heterogeneous network. We made a CSV file containing all of the edges in the graph. Each row of this file carries a number that indicates the type of the edge, and a further column holds the number of the row. Together, these columns define our heterogeneous network. All of these changes were applied to the main_db.csv file.
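The construction described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual script: the column names (`source`, `target`, `type`, `id`), the edge-type numbers, and the helpers `matrix_to_edges` and `write_edge_csv` are all hypothetical, and keeping only one direction of each symmetric pair is a design choice made here, not something stated in this document.

```python
import csv
import io
import numpy as np

def matrix_to_edges(mat, src_names, dst_names, edge_type, symmetric=False):
    """Turn the non-zero entries of a matrix into (source, target, type) tuples."""
    edges = []
    rows, cols = np.nonzero(mat)
    for i, j in zip(rows, cols):
        # For symmetric matrices keep each pair once and drop self-loops (assumption).
        if symmetric and i >= j:
            continue
        edges.append((src_names[i], dst_names[j], edge_type))
    return edges

def write_edge_csv(edges, fh):
    """Write the edges with a type column and a running row number."""
    writer = csv.writer(fh)
    writer.writerow(["source", "target", "type", "id"])  # hypothetical header
    for edge_id, (src, dst, edge_type) in enumerate(edges, start=1):
        writer.writerow([src, dst, edge_type, edge_id])

# Toy data standing in for the DTINet matrices.
drugs = ["drugA", "drugB"]
proteins = ["protX", "protY", "protZ"]
drug_protein = np.array([[1, 0, 1], [0, 1, 0]])
drug_drug = np.array([[0, 1], [1, 0]])

edges = matrix_to_edges(drug_protein, drugs, proteins, edge_type=1)
edges += matrix_to_edges(drug_drug, drugs, drugs, edge_type=2, symmetric=True)

buf = io.StringIO()  # a real run would open main_db.csv for writing instead
write_edge_csv(edges, buf)
```

Numbering the edge types per source matrix keeps the relation type attached to every edge, which is the information a type-aware embedding method needs.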
In this step, we should employ a node embedding method to describe our network in a lower dimension without losing important information. Because our graph is heterogeneous, we had several options. We chose a novel embedding method named "Edge2vec", whose innovation compared with other algorithms is that it considers the types of the relations between nodes. We applied this procedure to our network with these hyperparameters:
[1] Link to our paper: https://ieeexplore.ieee.org/abstract/document/9066013/