A conditional molecule generation model based on a CVAE

jooewood/QuickCoy

QuickCoy: A fast way of generating unbiased decoy molecules for better virtual screening using deep learning

Overview

A key step in training a virtual screening model is preparing a dataset of actives and decoys. Several such datasets already exist, but there is still no quick way to generate unbiased decoys for a new set of actives. In addition, a growing number of studies have shown that previously released virtual screening datasets contain hidden biases, and that these biases lead to overestimating the performance of deep learning in virtual screening.

Two key tasks:

  1. Generating unbiased decoys: property-matched, etc. This may need further study.
  2. Getting 3D binding poses: use Smina, or deep-learning-based molecular docking, to generate a 3D binding pose for a given molecule and protein binding site.
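To make "property-matched" concrete, here is a minimal sketch of a post-hoc check that keeps only candidate decoys whose properties lie close to those of an active. It is not taken from the QuickCoy code: the function names, the choice of properties, and the tolerances are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical tolerances per property; real thresholds would need tuning.
TOLERANCES: Dict[str, float] = {"HBA": 1, "HBD": 1, "rings": 1, "NRB": 2}


def is_property_matched(
    active: Dict[str, float],
    candidate: Dict[str, float],
    tolerances: Dict[str, float] = TOLERANCES,
) -> bool:
    """Return True if every listed property of the candidate is within
    the given tolerance of the active's value."""
    return all(
        abs(active[name] - candidate[name]) <= tol
        for name, tol in tolerances.items()
    )


def select_decoys(
    active: Dict[str, float],
    candidates: List[Dict[str, float]],
    tolerances: Dict[str, float] = TOLERANCES,
) -> List[Dict[str, float]]:
    """Keep only the candidates that are property-matched to the active."""
    return [c for c in candidates if is_property_matched(active, c, tolerances)]


active = {"HBA": 4, "HBD": 2, "rings": 3, "NRB": 5}
candidates = [
    {"HBA": 5, "HBD": 2, "rings": 3, "NRB": 6},  # within tolerance everywhere
    {"HBA": 8, "HBD": 2, "rings": 3, "NRB": 5},  # HBA differs by 4, rejected
]
matched = select_decoys(active, candidates)
print(matched)
```

A filter like this only tests property similarity; it says nothing about whether a candidate is structurally distinct from the active or truly inactive.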

Conclusion

Our experimental results show that our decoys can enhance the generalization ability of structure-based deep learning virtual screening models and reduce the problem of overestimation.

Model

The encoder is a bidirectional Gated Recurrent Unit (GRU) with a linear output layer. The decoder is a 3-layer GRU RNN with 512 hidden dimensions and intermediate dropout layers with dropout probability 0.2. Training uses a batch size of 128, gradient clipping of 50, and a KL-term weight of 1, and is optimized with Adam at a learning rate of 0.0003 for 50 epochs. The result is a trained CVAE model that can generate molecules conditioned on 27 properties.
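The hyperparameters above can be collected in one place, along with the way a KL-term weight typically enters a VAE loss. This is a sketch only. The field names are hypothetical (only `n_batch` appears in the training command), and the diagonal-Gaussian posterior with a standard-normal prior is an assumption, since the description does not state the prior.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TrainConfig:
    # Values quoted from the model description; field names are hypothetical.
    decoder_layers: int = 3
    decoder_hidden: int = 512
    dropout: float = 0.2
    n_batch: int = 128
    clip_grad: float = 50.0
    kl_weight: float = 1.0
    lr: float = 3e-4
    n_epoch: int = 50


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I)).

    Assumes a diagonal Gaussian posterior and a standard-normal prior.
    """
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar)
    )


def cvae_loss(
    recon_nll: float,
    mu: Sequence[float],
    logvar: Sequence[float],
    cfg: TrainConfig,
) -> float:
    """Reconstruction negative log-likelihood plus the weighted KL term."""
    return recon_nll + cfg.kl_weight * kl_to_standard_normal(mu, logvar)


cfg = TrainConfig()
# A posterior equal to the prior contributes zero KL, so the loss
# reduces to the reconstruction term.
loss = cvae_loss(recon_nll=2.5, mu=[0.0, 0.0], logvar=[0.0, 0.0], cfg=cfg)
print(loss)
```

With a KL weight of 1 the two terms are simply added, which corresponds to the unweighted evidence lower bound.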

Usage

Training QuickCoy

./inverse_design.py --n_batch 128 --checkpoint_dir /home/zdx/project/decoy_generation/result/REAL_discrete_13 --train_load /home/zdx/data/REAL/REAL_deepcoy_properties.csv --pro_cols carbon_n nitrogen_n oxygen_n fluorine_n sulfur_n chlorine_n bromine_n HBA HBD rings stereo_centers aromatic_rings NRB
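The paths in the command above are specific to the author's machine. As a sketch, the same invocation can be assembled with your own paths. The flags and column names are copied from the command above; `build_train_command` and the example paths are hypothetical.

```python
import shlex
from typing import List

# Property columns exactly as passed to --pro_cols above.
PRO_COLS = [
    "carbon_n", "nitrogen_n", "oxygen_n", "fluorine_n", "sulfur_n",
    "chlorine_n", "bromine_n", "HBA", "HBD", "rings", "stereo_centers",
    "aromatic_rings", "NRB",
]


def build_train_command(
    checkpoint_dir: str, train_csv: str, n_batch: int = 128
) -> List[str]:
    """Assemble the argument list for the training script (hypothetical helper)."""
    return [
        "./inverse_design.py",
        "--n_batch", str(n_batch),
        "--checkpoint_dir", checkpoint_dir,
        "--train_load", train_csv,
        "--pro_cols", *PRO_COLS,
    ]


# Placeholder paths; replace them with your own locations.
cmd = build_train_command("result/my_run", "data/my_properties.csv")
print(shlex.join(cmd))
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)` from the repository root.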

Data

DeepChem dataloader
DeepChem Dataset
DGL-LifeSci Dataset
MUV data generator
LIT-PCBA Paper
LIT-PCBA Download
Make sure that every molecule in the training set has values for all 27 properties.
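A quick way to check this requirement before training is to scan the CSV for rows with empty property cells. This is a sketch under stated assumptions: `find_incomplete_rows` is a hypothetical helper, the `SMILES` column in the sample is illustrative, and `PRO_COLS` lists only the 13 columns that appear in the training command, because the full set of 27 is not listed here.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

# The 13 columns from the training command; extend to the full property set.
PRO_COLS = [
    "carbon_n", "nitrogen_n", "oxygen_n", "fluorine_n", "sulfur_n",
    "chlorine_n", "bromine_n", "HBA", "HBD", "rings", "stereo_centers",
    "aromatic_rings", "NRB",
]


def find_incomplete_rows(
    rows: Iterable[Dict[str, Optional[str]]],
    cols: Sequence[str] = PRO_COLS,
) -> List[Tuple[int, List[str]]]:
    """Return (row_index, missing_columns) for each row with an empty
    or absent value in any of the required columns."""
    problems = []
    for i, row in enumerate(rows):
        missing = [c for c in cols if (row.get(c) or "").strip() == ""]
        if missing:
            problems.append((i, missing))
    return problems


# Tiny in-memory CSV standing in for a real training file.
sample = io.StringIO(
    "SMILES,HBA,HBD\n"
    "CCO,1,1\n"
    "c1ccccc1,0,\n"
)
problems = find_incomplete_rows(csv.DictReader(sample), cols=["HBA", "HBD"])
print(problems)
```

For a real file, open it with `open(path, newline="")` and pass `csv.DictReader(f)` in place of the in-memory sample.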

Resource

  1. InteractionGraphNet
  2. TocoDecoy paper
  3. TocoDecoy GitHub
  4. DrugSpaceX
  5. SELFIES
  6. Chemistry and Biology databases
  7. REAL COMPOUND LIBRARIES
  8. rdkit.Chem.Lipinski module
  9. VAE training trick
  10. MOSES VAE
  11. Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules
  12. Koes data
  13. A feature transferring workflow between data-poor compounds in various tasks
  14. AUTOQSAR/DEEPCHEM
  15. AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings
  16. AutoDock Vina Document
