Note: This talktorial is a part of TeachOpenCADD, a platform that aims to teach domain-specific skills and to provide pipeline templates as starting points for research projects.
Authors:
- Marina Gorostiola González, 2022, Computational Drug Discovery, Drug Discovery & Safety Leiden University (The Netherlands)
- Olivier J.M. Béquignon, 2022, Computational Drug Discovery, Drug Discovery & Safety Leiden University (The Netherlands)
- Willem Jespers, 2022, Computational Drug Discovery, Drug Discovery & Safety Leiden University (The Netherlands)
While activity data are abundant for some protein targets, many proteins remain underexplored, and the lack of data makes machine learning (ML) activity prediction for them very difficult. This issue can be partially addressed by leveraging similarities and differences between proteins. In this talktorial, we use proteochemometrics (PCM) modeling, which enriches activity models with protein data, to predict the activity of novel compounds against the four adenosine receptor isoforms (A1, A2A, A2B, A3).
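To make the idea of enriching activity models with protein data concrete, the following minimal sketch shows how one PCM training row per compound-target pair could be assembled by concatenating a compound descriptor vector with a protein descriptor vector. All identifiers, descriptor values, and activity values are hypothetical placeholders, not data from Papyrus, and the helper `build_pcm_rows` is an illustrative name rather than a function from the talktorial.

```python
# Toy descriptors; every value below is made up for illustration only.
compound_descriptors = {
    "cmpd_1": [0.12, 1.50, 3.0],
    "cmpd_2": [0.80, 0.30, 1.0],
}
protein_descriptors = {
    "A1": [0.9, 0.1],
    "A2A": [0.4, 0.6],
}

# Each record is (compound ID, target ID, activity value); all hypothetical.
activities = [
    ("cmpd_1", "A1", 7.2),
    ("cmpd_1", "A2A", 6.1),
    ("cmpd_2", "A2A", 8.0),
]


def build_pcm_rows(activities, compound_descriptors, protein_descriptors):
    """Return (features, labels) with one row per compound-target pair.

    Each feature row is the compound descriptor vector followed by the
    protein descriptor vector of the target it was measured against.
    """
    features, labels = [], []
    for compound_id, target_id, activity in activities:
        row = compound_descriptors[compound_id] + protein_descriptors[target_id]
        features.append(row)
        labels.append(activity)
    return features, labels


X, y = build_pcm_rows(activities, compound_descriptors, protein_descriptors)
```

In this sketch the same compound appears in two rows that differ only in the protein part of the vector, which is what allows a single regression model trained on `X` and `y` to share information across targets.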
- Proteochemometrics (PCM) modeling
- Data preparation
- Papyrus dataset
- Molecule encoding: molecular descriptors
- Protein encoding: protein descriptors
- Machine learning principles: regression
- Data splitting methods
- Regression evaluation metrics
- ML algorithm: Random Forest
- Applications of PCM in drug discovery
- Download Papyrus dataset
- Data preparation
- Filter activity data for targets of interest
- Align target sequences
- Calculate protein descriptors
- Calculate compound descriptors
- Proteochemometrics modeling
- Helper functions
- Preprocessing
- Model training and validation
- Random split PCM model
- Random split QSAR models
- Leave one target out split PCM model
- Papyrus scripts GitHub
- Papyrus dataset preprint: ChemRxiv (2021)
- Molecular descriptors (Mordred): J. Cheminf., 10, (2018)
- Protein descriptors (ProDEC) GitHub
- Regression metrics (scikit-learn)
- XGBoost Documentation
- Proteochemometrics review: Drug Discov. Today Technol. (2019), 32, 89-98