T032 · Compound activity: Proteochemometrics

Note: This talktorial is part of TeachOpenCADD, a platform that aims to teach domain-specific skills and to provide pipeline templates as starting points for research projects.

Authors:

  • Marina Gorostiola González, 2022, Computational Drug Discovery, Drug Discovery & Safety, Leiden University (The Netherlands)
  • Olivier J.M. Béquignon, 2022, Computational Drug Discovery, Drug Discovery & Safety, Leiden University (The Netherlands)
  • Willem Jespers, 2022, Computational Drug Discovery, Drug Discovery & Safety, Leiden University (The Netherlands)

Aim of this talktorial

Activity data are abundant for some protein targets, but many underexplored proteins have too few data points for machine learning (ML) models to predict activity reliably. Leveraging the similarities and differences between proteins can partially solve this problem. In this talktorial, we use proteochemometrics (PCM) modeling, which enriches activity models with protein data, to predict the activity of novel compounds against the four adenosine receptor isoforms (A1, A2A, A2B, A3).
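
To make the idea concrete, here is a minimal sketch of how a PCM input row can be built: the descriptor vector of a compound is concatenated with the descriptor vector of its target, so a single model sees both. All descriptor values, identifiers, and the `build_pcm_matrix` helper are hypothetical and chosen for illustration only; they are not taken from the talktorial or from Papyrus.

```python
# Toy, made-up descriptors for illustration only (not real Papyrus data).
compound_descriptors = {
    "cmpd_1": [1.0, 0.0, 1.0],
    "cmpd_2": [0.0, 1.0, 1.0],
}
protein_descriptors = {
    "A1": [0.2, 0.7],
    "A2A": [0.9, 0.1],
}

# Each record: (compound id, target id, measured activity).
activities = [
    ("cmpd_1", "A1", 7.5),
    ("cmpd_1", "A2A", 6.1),
    ("cmpd_2", "A2A", 8.0),
]


def build_pcm_matrix(activities, compound_descriptors, protein_descriptors):
    """Concatenate compound and protein descriptors for every data point."""
    X, y = [], []
    for compound_id, target_id, activity in activities:
        # One PCM row = compound features followed by protein features.
        row = compound_descriptors[compound_id] + protein_descriptors[target_id]
        X.append(row)
        y.append(activity)
    return X, y


X, y = build_pcm_matrix(activities, compound_descriptors, protein_descriptors)
```

The resulting `X` and `y` could then be passed to any regressor, such as a Random Forest. Because the same compound appears with different protein features, the model can learn how activity changes across targets.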

Contents in Theory

  • Proteochemometrics (PCM) modeling
  • Data preparation
    • Papyrus dataset
    • Molecule encoding: molecular descriptors
    • Protein encoding: protein descriptors
  • Machine learning principles: regression
    • Data splitting methods
    • Regression evaluation metrics
    • ML algorithm: Random Forest
  • Applications of PCM in drug discovery
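
As a small illustration of the regression evaluation metrics topic, the sketch below computes two commonly used metrics, the root mean squared error (RMSE) and the coefficient of determination (R²). The list above does not name specific metrics, so this choice is an assumption, and the activity values are made up.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up observed and predicted activity values.
y_true = [6.0, 7.0, 8.0]
y_pred = [6.5, 7.0, 7.5]

model_rmse = rmse(y_true, y_pred)
model_r2 = r2_score(y_true, y_pred)  # 0.75 for these values
```

A lower RMSE and an R² closer to 1 both indicate predictions that track the observed activities more closely.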

Contents in Practical

  • Download Papyrus dataset
  • Data preparation
    • Filter activity data for targets of interest
    • Align target sequences
    • Calculate protein descriptors
    • Calculate compound descriptors
  • Proteochemometrics modeling
    • Helper functions
    • Preprocessing
    • Model training and validation
      • Random split PCM model
      • Random split QSAR models
      • Leave one target out split PCM model
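
The leave one target out split listed above can be sketched as follows: all data points of one target are held out for testing, and the model is trained on the remaining targets. The records and the `leave_one_target_out` helper are hypothetical and only illustrate the splitting logic, not the talktorial's actual implementation.

```python
# Made-up activity records for the four adenosine receptor isoforms.
records = [
    {"compound": "cmpd_1", "target": "A1", "activity": 7.5},
    {"compound": "cmpd_2", "target": "A2A", "activity": 6.1},
    {"compound": "cmpd_3", "target": "A2B", "activity": 5.4},
    {"compound": "cmpd_1", "target": "A3", "activity": 8.0},
    {"compound": "cmpd_2", "target": "A1", "activity": 6.9},
]


def leave_one_target_out(records, held_out_target):
    """Split records so that one target appears only in the test set."""
    train = [r for r in records if r["target"] != held_out_target]
    held_out = [r for r in records if r["target"] == held_out_target]
    return train, held_out


# Build one split per target; each target is held out exactly once.
targets = sorted({r["target"] for r in records})
splits = {t: leave_one_target_out(records, t) for t in targets}

train_set, test_set = splits["A3"]
```

This kind of split mimics predicting activity for a target with no training data of its own, which is the low-data scenario that motivates PCM.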

References