QuickCoy: A fast way of generating unbiased decoy molecules for better virtual screening using deep learning
A key step in training a virtual screening model is preparing a dataset of actives and decoys. Several such datasets already exist, but a method that can quickly generate unbiased decoys for a new set of actives is usually lacking. In addition, a growing number of studies have shown that previously released virtual screening datasets contain hidden bias, and that this bias leads to overestimating the performance of deep learning in virtual screening.

Two key tasks:
1. Generating unbiased decoys: property-matched, etc. The exact criteria may need more study.
2. Getting 3D binding poses: Smina, or molecular docking by deep learning, to generate a 3D binding pose for a given molecule in a protein binding site.
Our experimental results show that our decoys can improve the generalization ability of structure-based deep learning virtual screening models and reduce performance overestimation.
The encoder is a bidirectional Gated Recurrent Unit (GRU) with a linear output layer. The decoder is a 3-layer GRU RNN with 512 hidden dimensions and intermediate dropout layers with dropout probability 0.2. Training uses a batch size of 128, gradient clipping of 50, a KL-term weight of 1, and the Adam optimizer with a learning rate of 0.0003 for 50 epochs. The result is a trained CVAE model that can generate molecules conditioned on 27 properties.
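The loss-related hyperparameters above can be illustrated with a small, dependency-free sketch. It assumes the usual VAE objective (a reconstruction loss plus a weighted KL divergence between a diagonal Gaussian posterior and a standard normal prior) and assumes that "gradient clipping of 50" means clipping by global L2 norm. Neither detail is stated in this README, and all function names are hypothetical.

```python
import math

KL_WEIGHT = 1.0   # KL-term weight from the description above
CLIP_NORM = 50.0  # clipping threshold (assumed to be a global L2 norm)


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)) for one latent vector."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar)
    )


def vae_loss(recon_loss, mu, logvar, kl_weight=KL_WEIGHT):
    """Reconstruction loss plus the weighted KL term."""
    return recon_loss + kl_weight * kl_to_standard_normal(mu, logvar)


def clip_by_global_norm(grads, max_norm=CLIP_NORM):
    """Rescale gradients so their global L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]
```

With a KL weight of 1 the two loss terms are simply summed. In a real training loop the clipping step would be applied to the model gradients before each Adam update.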
Training QuickCoy
./inverse_design.py --n_batch 128 --checkpoint_dir /home/zdx/project/decoy_generation/result/REAL_discrete_13 --train_load /home/zdx/data/REAL/REAL_deepcoy_properties.csv --pro_cols carbon_n nitrogen_n oxygen_n fluorine_n sulfur_n chlorine_n bromine_n HBA HBD rings stereo_centers aromatic_rings NRB
DeepChem dataloader
DeepChem Dataset
DGL-LifeSci Dataset
MUV data generator
LIT-PCBA Paper
LIT-PCBA Download
Make sure every molecule in the training set has a value for each of the 27 properties.
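This requirement can be checked before training with a short script. The sketch below assumes the training file is a CSV with one column per property. The column list is copied from the `--pro_cols` argument of the training command and would need extending to cover all 27 properties. The function name is hypothetical and is not part of `inverse_design.py`.

```python
import csv
import io

# Property columns taken from the --pro_cols argument of the training command.
PROPERTY_COLUMNS = [
    "carbon_n", "nitrogen_n", "oxygen_n", "fluorine_n", "sulfur_n",
    "chlorine_n", "bromine_n", "HBA", "HBD", "rings", "stereo_centers",
    "aromatic_rings", "NRB",
]


def rows_with_missing_properties(csv_file, columns=PROPERTY_COLUMNS):
    """Return (row_number, missing_columns) for rows lacking a property value."""
    problems = []
    reader = csv.DictReader(csv_file)
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        # A column counts as missing if it is absent, None, or blank.
        missing = [c for c in columns if (row.get(c) or "").strip() == ""]
        if missing:
            problems.append((row_number, missing))
    return problems


# Tiny in-memory demo with two illustrative columns.
demo = io.StringIO("SMILES,HBA,HBD\nCCO,1,1\nCCN,,1\n")
print(rows_with_missing_properties(demo, columns=["HBA", "HBD"]))
# → [(3, ['HBA'])]
```

For a real file, open it with `open(path, newline="")` and pass the file object in. An empty result means every row has a value for each listed property.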
- InteractionGraphNet
- TocoDecoy paper
- TocoDecoy GitHub
- DrugSpaceX
- SELFIES
- Chemistry and Biology databases
- REAL COMPOUND LIBRARIES
- rdkit.Chem.Lipinski module
- VAE training trick
- MOSES VAE
- Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules
- Koes data
- A feature transferring workflow between data-poor compounds in various tasks
- AUTOQSAR/DEEPCHEM
- AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings
- AutoDock Vina Documentation