T032 · Compound activity: Proteochemometrics

Note: This talktorial is a part of TeachOpenCADD, a platform that aims to teach domain-specific skills and to provide pipeline templates as starting points for research projects.

Authors:

Marina Gorostiola González, 2022, Computational Drug Discovery, Drug Discovery & Safety, Leiden University (The Netherlands)
Olivier J.M. Béquignon, 2022, Computational Drug Discovery, Drug Discovery & Safety, Leiden University (The Netherlands)
Willem Jespers, 2022, Computational Drug Discovery, Drug Discovery & Safety, Leiden University (The Netherlands)

Aim of this talktorial

Activity data are abundant for some protein targets, but many underexplored proteins still have too little data for machine learning (ML) models to predict activity reliably. This issue can be partially addressed by leveraging similarities and differences between proteins. In this talktorial, we use proteochemometrics (PCM) modeling, which enriches activity models with protein information, to predict the activity of novel compounds against the four adenosine receptor isoforms (A1, A2A, A2B, A3).
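
To make the idea concrete, here is a minimal sketch of how a PCM feature matrix can be assembled: each data point pairs a compound with a target, and its feature vector is the compound descriptors concatenated with the protein descriptors. All identifiers and numbers below (`cpd_1`, the descriptor values, the activity values) are made up for illustration and are not taken from the talktorial; in practice the descriptors would come from tools such as Mordred and ProDEC.

```python
# Toy compound descriptors (hypothetical values, three per compound).
compound_descriptors = {
    "cpd_1": [0.12, 1.50, 3.00],
    "cpd_2": [0.80, 0.20, 1.00],
}

# Toy protein descriptors (hypothetical values, two per adenosine receptor).
protein_descriptors = {
    "A1": [0.5, -1.0],
    "A2A": [0.4, -0.8],
    "A2B": [0.1, 0.3],
    "A3": [-0.6, 0.9],
}


def pcm_features(compound_id, target_id):
    """Concatenate compound and protein descriptors into one feature vector."""
    return compound_descriptors[compound_id] + protein_descriptors[target_id]


# Hypothetical activity records: (compound, target, measured activity value).
activities = [
    ("cpd_1", "A1", 7.2),
    ("cpd_1", "A2A", 6.1),
    ("cpd_2", "A3", 5.4),
]

# One row per compound-target pair; a single model can be trained on all targets.
X = [pcm_features(compound, target) for compound, target, _ in activities]
y = [value for _, _, value in activities]
```

Because the protein part of each row differs between targets, the same compound (`cpd_1` here) contributes distinct rows for A1 and A2A. This is what lets one model share information across related receptors, in contrast to a QSAR model trained on a single target with compound descriptors only.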

Contents in Theory

Proteochemometrics (PCM) modeling
Data preparation
- Papyrus dataset
- Molecule encoding: molecular descriptors
- Protein encoding: protein descriptors
Machine learning principles: regression
- Data splitting methods
- Regression evaluation metrics
- ML algorithm: Random Forest
Applications of PCM in drug discovery

Contents in Practical

Download Papyrus dataset
Data preparation
- Filter activity data for targets of interest
- Align target sequences
- Calculate protein descriptors
- Calculate compound descriptors
Proteochemometrics modeling
- Helper functions
- Preprocessing
- Model training and validation
  - Random split PCM model
  - Random split QSAR models
  - Leave one target out split PCM model

References

Papyrus scripts GitHub
Papyrus dataset preprint: ChemRxiv (2021)
Molecular descriptors (Mordred): J. Cheminf., 10, 4 (2018)
Protein descriptors (ProDEC) GitHub
Regression metrics (scikit-learn)
XGBoost Documentation
Proteochemometrics review: Drug Discov. Today Technol. (2019), 32, 89-98
