In this course we will introduce the basic ideas and algorithms of supervised learning and implement them using the R programming language. A brief theoretical overview of the learning setting will be provided; the main focus will then be on the practical analysis and modelling of healthcare data.
- To understand concepts of machine learning for healthcare, and to compare and test a range of techniques.
- To classify features of data sources, and to analyse and interpret the outputs of machine learning techniques in the context of practical solutions in healthcare.
What is machine learning? Types of machine learning. Classification and regression. Training and test sets. Model evaluation. Over-fitting. Overview of machine learning algorithms. No free lunch theorem. Cross-validation.
Simple and multiple linear regression. Polynomial regression. Parameter estimates. Residual analysis. Metrics for model evaluation. Plots and predictions. Feature selection.
Data analysis and pre-processing, exploratory data analysis, handling missing data.
Feature engineering techniques including but not limited to: transformations, feature extraction, reduction and selection.
Logistic Regression: why logistic regression; logistic function; simple logistic regression; multinomial logistic regression (tentative); ROC curve; feature interpretation; predictions using logistic regression.
Decision Trees: classification using decision trees; understanding and visualising decision trees; advantages and disadvantages of decision trees; predictions. Random Forests: from decision trees to random forests; training and tuning random forests; predictions.
Using decision trees and random forests for regression. Variable importance.
Regularisation and over-fitting. Ridge penalty and LASSO penalty. Elastic Nets. Tuning regularised models.