Machine Learning for prediction of the RMSD (Root Mean Square Deviation) of a decoy set using Physicochemical Properties of Protein Tertiary Structure Data Set
The workflow followed here is adapted from the book Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron.
Scope of this work : To lay out the machine learning workflow from start to end.
Target Audience : This work is intended for readers who already have some basic knowledge of machine learning practices. It is not a tutorial for absolute beginners.
Requirements : scikit-learn, NumPy, Matplotlib and pandas (see the attached requirements.txt file for the complete list).
The dataset used is the Physicochemical Properties of Protein Tertiary Structure Data Set.
This notebook is divided into two sections:
- Data preprocessing - Explore the dataset and prepare it for regression
- Model selection, training and fine-tuning - Try various models, compare their performance and fine-tune the most promising ones
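
As a rough orientation, the two sections above can be sketched in a few lines of scikit-learn. This is only a minimal sketch, not the notebook's actual code: it runs on synthetic data so that it is self-contained, and the column names (`F1` to `F9`, `RMSD`), the two candidate models and the small hyperparameter grid are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (assumption): nine numeric
# features and a non-negative, RMSD-like target. Column names are hypothetical.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 9))
y = np.abs(X @ rng.normal(size=9) + rng.normal(scale=0.5, size=500))
df = pd.DataFrame(X, columns=[f"F{i}" for i in range(1, 10)])
df["RMSD"] = y

# Section 1: data preprocessing. Separate features from the target and
# hold out a test set before any model sees the data.
features = df.drop(columns="RMSD")
target = df["RMSD"]
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42
)

# Section 2: model selection. Fit a few candidate models and compare
# their root mean squared error on the held-out set.
models = {
    "linear": make_pipeline(StandardScaler(), LinearRegression()),
    "forest": make_pipeline(
        StandardScaler(),
        RandomForestRegressor(n_estimators=50, random_state=42),
    ),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    scores[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))
best_name = min(scores, key=scores.get)

# Fine-tuning: a small cross-validated grid search over one hyperparameter
# of the random forest (the grid values are arbitrary examples).
search = GridSearchCV(
    models["forest"],
    param_grid={"randomforestregressor__max_depth": [3, None]},
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X_train, y_train)

print("Test RMSE per model:", scores)
print("Best model on the test set:", best_name)
print("Best forest parameters:", search.best_params_)
```

Wrapping the scaler and the regressor in one pipeline keeps the scaling statistics confined to the training folds during cross-validation, which avoids leaking information from the validation data.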