See course website for additional background: http://www.andrew.cmu.edu/user/lakoglu/courses/95828/index.htm
This course uses R as its primary teaching language (though other languages are accepted). To support this, we provide cheatsheets that introduce R for MLPS and collect the commands, brief examples, and references needed for each lecture; a small taste of their style appears after the list below.
To view the cheatsheets in your browser:
- Lecture 1 (Introduction to R, RStudio, Tidyverse)
- Lecture 2 (Visualization): (too large to preview; download the HTML file from GitHub and open it in your browser)
- Lecture 3 (Data Preparation)
- Lecture 4 (Linear Regression & Sparsity)
- Lecture 5 (Logistic & Non-Linear Regression)
- Lecture 6 (Model Selection)
- Lecture 7 (Model Evaluation)
- Lecture 8 (Tree-based Models)
- Lecture 9 (SVM & kernels)
- Lecture 10 (Instance Based Learning)
- Lecture 11 (Ensemble Learning)
- Lecture 12 (Clustering)
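As a taste of the style these cheatsheets cover, here is a minimal sketch using the tidyverse and the built-in `mtcars` data (illustrative only; it is not taken from the cheatsheets themselves):

```r
# Minimal tidyverse-style sketch in the spirit of the Lecture 1-3 cheatsheets
# (uses the built-in mtcars data; install.packages("tidyverse") if needed)
library(tidyverse)

mtcars %>%
  filter(cyl %in% c(4, 6)) %>%               # keep 4- and 6-cylinder cars
  group_by(cyl) %>%                          # summarise fuel efficiency by cylinder count
  summarise(mean_mpg = mean(mpg), n = n())

# quick plot in the spirit of the Lecture 2 (Visualization) cheatsheet
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
```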
Machine Learning (ML) centers on automated methods that improve their own performance by learning patterns in data, and then use the uncovered patterns to predict the future and make decisions. ML is heavily used across domains such as business, finance, healthcare, and security, for problems including display advertising, fraud detection, disease diagnosis and treatment, face/speech/handwriting/object recognition, and automated navigation, to name a few. See this for an extended introduction.
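As a minimal illustration of this "learn from data, then predict" loop (a sketch using base R and the built-in `mtcars` data; not part of the course materials):

```r
# Learn a pattern from data: model fuel efficiency as a function of weight and horsepower
model <- lm(mpg ~ wt + hp, data = mtcars)
summary(model)

# Use the learned pattern to make a prediction for a new, hypothetical car
new_car <- data.frame(wt = 2.8, hp = 150)
predict(model, newdata = new_car)
```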
"If I had an hour to solve a problem I'd spend 55 minutes thinking about the problem and 5 minutes thinking about solutions." -- Albert Einstein "A problem well put is half solved." -- John Dewey
This course aims to equip students with practical knowledge and experience in recognizing and formulating machine learning problems in the wild, as well as in applying machine learning techniques effectively in practice. The emphasis is on learning and practicing the machine learning process, involving the cycle of feature design, modeling, and scaling.
"All models are wrong, but some models are useful." -- George Box
Since there is "no free lunch", we will cover a wide range of models and learning algorithms that can be applied to a variety of problems and that have varying speed-accuracy-scalability-interpretability tradeoffs. In particular, the topics include generalized linear models, decision trees, Bayesian networks, feature selection, ensemble methods, semi-supervised learning, density estimation, latent factor models, network-based classification, and sequence models. See the syllabus for more.
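As a rough sketch of these tradeoffs, the snippet below fits two of the covered model families, a generalized linear model and a decision tree, on the built-in `iris` data (illustrative only; the course treats these models and their evaluation in depth):

```r
# A generalized linear model (logistic regression) vs. a decision tree on a binary task
library(rpart)

iris2 <- subset(iris, Species != "setosa")                # versicolor vs. virginica
iris2$Species <- droplevels(iris2$Species)

glm_fit  <- glm(Species ~ Petal.Length + Petal.Width,     # linear decision boundary, interpretable coefficients
                data = iris2, family = binomial)
tree_fit <- rpart(Species ~ Petal.Length + Petal.Width,   # axis-aligned splits, readable if/else rules
                  data = iris2)

coef(glm_fit)    # coefficients summarise the linear boundary
print(tree_fit)  # the tree prints as a set of splitting rules
```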
This course is designed to give graduate-level students a thorough grounding in the methodologies, technologies, and best practices used in machine learning. It does not assume any prior exposure to machine learning theory or practice. Undergraduates need the instructor's permission to enroll. PhD students can either enroll in the course or, with permission, audit it.