Introduction

Why learn Machine Learning?

Spreadsheets (Excel, CSV): store the data a business needs → humans can analyse the data to make business decisions
Relational DB (MySQL): a better way to organize that data → humans can still analyse it to make business decisions
Big Data (NoSQL): companies such as Facebook, Amazon and Twitter accumulate more and more data, such as user actions and purchase history, much of it unstructured → Machine Learning is needed instead of humans to make business decisions

Terms

AI

Machine Learning

A subset of AI: ML uses algorithms (computer programs) to learn patterns in data, then uses what it learned to make predictions or classifications on similar data.
ML suits tasks that are hard to describe to a computer as explicit instructions, such as:
- How to ask a computer to classify cat/dog images or product reviews

Difference between ML and Normal Algorithms

Normal Algorithm: a set of instructions for accomplishing a task: given input + set of instructions → output
ML Algorithm: given input + given output → learns the set of instructions that maps input to output
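
To make the contrast concrete, here is a minimal sketch in plain Python. The temperature example and the helper name `learn_linear_rule` are assumptions chosen for illustration, and least-squares line fitting stands in for "an ML algorithm" in general.

```python
# Normal algorithm: a human writes the rule (the instructions).
def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32


# ML-style algorithm: given inputs and outputs, recover the rule.
# Here the "learning" is an ordinary least-squares fit of y = slope * x + intercept.
def learn_linear_rule(inputs, outputs):
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(outputs) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
    variance = sum((x - mean_x) ** 2 for x in inputs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


inputs = [0, 10, 20, 30, 40]                          # given input
outputs = [celsius_to_fahrenheit(c) for c in inputs]  # given output
slope, intercept = learn_linear_rule(inputs, outputs)
print(slope, intercept)  # close to 1.8 and 32, the rule we never told it
```

The learned slope and intercept approximate the hand-written rule even though the fitting function only ever saw input/output pairs.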

Types of ML Problems

Supervised: data with labels
Unsupervised: data without labels, like a CSV with no target column
- Clustering: the machine decides the clusters/groups
- Association Rule Learning: associates different items to predict what customers might buy in the future
Reinforcement: teach the machine by trial and error (with rewards and penalties)
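
As an illustration of the machine deciding the groups itself, below is a small sketch of k-means clustering on one-dimensional data with two clusters. The function name `kmeans_1d`, the spending figures and the choice of initial centers (smallest and largest value) are assumptions for this example, not part of the original notes.

```python
def kmeans_1d(values, iterations=10):
    """Group 1-D values into two clusters without using any labels."""
    # Assumed initialisation: start the two centers at the extremes.
    centers = [min(values), max(values)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            for v in values
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(len(centers)):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = sum(members) / len(members)
    return labels, centers


# Hypothetical weekly spend per customer; no labels are provided.
spend = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
labels, centers = kmeans_1d(spend)
print(labels)   # low spenders share one label, high spenders the other
print(centers)
```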

Deep Learning

Data Science

Data Analysis: analysing data to gain an understanding of it
Data Science: running experiments on a set of data to find actionable insights within it
- Example: building ML models

Machine Learning Framework

Readings: (1) , (2)

Step 1: Problem Definition - Rephrase the business problem as a machine learning problem

What problem are we trying to solve?
- Supervised
- Unsupervised
- Classification
- Regression

Step 2: Data

What kind of data do we have?

Step 3: Evaluation

What defines success for us? Knowing which metrics to pay attention to tells you how to evaluate your machine learning project.

Step 4: Features

What features does your data have, and which can you use to build your model? The model turns features into patterns.
Three main types of features:
- Categorical features: one or the other(s)
  - For example, in a heart disease problem, the sex of the patient. Or, for an online store, whether or not someone has made a purchase.
- Continuous (or numerical) features: a numerical value such as average heart rate or the number of times logged in.
- Derived features: features you create from the data, often referred to as feature engineering.
  - Feature engineering is how a subject matter expert takes their knowledge and encodes it into the data. You might combine the number of times logged in with timestamps to make a feature called time since last login. Or turn dates from numbers into “is a weekday (yes)” and “is a weekday (no)”.
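
The two derived features mentioned above could be computed roughly like this. It is a sketch using only the standard library; the function name `derive_features`, the dictionary keys and the example dates are assumptions for illustration.

```python
from datetime import datetime


def derive_features(last_login, now):
    """Turn raw timestamps into features a model can use directly."""
    return {
        # Time since last login, in whole days.
        "days_since_last_login": (now - last_login).days,
        # weekday() returns 0 for Monday through 6 for Sunday.
        "is_weekday": last_login.weekday() < 5,
    }


now = datetime(2024, 1, 10)
saturday_login = derive_features(datetime(2024, 1, 6), now)
monday_login = derive_features(datetime(2024, 1, 8), now)
print(saturday_login)
print(monday_login)
```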

Step 5: Models

Figure out the right model for your problem

Step 6: Experimentation

How can we improve, or what can we do better?

Main Types of ML Problems

Supervised Learning:

(Input & Output) Data + Label → Classification, Regression

Unsupervised Learning:

(Only Input) Data → Clustering

Transfer Learning:

(My problem is similar to others) Leverage what other ML models have already learned

Reinforcement Learning:

Punishing & rewarding the ML model by updating its scores
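
A toy sketch of that reward/penalty loop is shown below. It is one simple way to update scores (an epsilon-greedy strategy with a running-average update), not the only one; the action names, reward values and parameters are assumptions for illustration.

```python
import random


def train_scores(rewards, steps=200, learning_rate=0.1, epsilon=0.2, seed=0):
    """Learn a score per action from rewards (+) and penalties (-)."""
    rng = random.Random(seed)
    scores = {action: 0.0 for action in rewards}
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.choice(list(scores))
        else:
            # Exploit: otherwise pick the best-scoring action so far.
            action = max(scores, key=scores.get)
        reward = rewards[action]
        # Nudge the action's score toward the reward it just received.
        scores[action] += learning_rate * (reward - scores[action])
    return scores


# Hypothetical environment: one action is rewarded, the other penalised.
scores = train_scores({"good_move": 1.0, "bad_move": -1.0})
print(scores)
```

After training, the rewarded action ends up with the higher score, so a greedy policy would prefer it.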

Evaluation

| Classification | Regression | Recommendation |
| --- | --- | --- |
| Accuracy | Mean Absolute Error (MAE) | Precision at K |
| Precision | Mean Squared Error (MSE) | |
| Recall | Root Mean Squared Error (RMSE) | |
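
The metrics in the table can be written out in a few lines of plain Python. This is a sketch for binary classification with 1 as the positive class; the function names and sample values are assumptions, and libraries such as scikit-learn provide tested versions of most of these.

```python
import math


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fp) if tp + fp else 0.0


def recall(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(mse(y_true, y_pred))


def precision_at_k(recommended, relevant, k):
    # Fraction of the top-k recommendations that are actually relevant.
    return sum(item in relevant for item in recommended[:k]) / k


labels, predictions = [1, 0, 1, 1, 0], [1, 1, 1, 0, 0]
print(accuracy(labels, predictions))   # 3 of 5 correct
print(precision(labels, predictions))  # 2 true positives out of 3 predicted positives
print(recall(labels, predictions))     # 2 true positives out of 3 actual positives

targets, estimates = [3.0, 5.0, 2.0], [2.0, 5.0, 4.0]
print(mae(targets, estimates), mse(targets, estimates), rmse(targets, estimates))

print(precision_at_k(["a", "b", "c", "d"], {"a", "c", "e"}, k=3))
```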

Features

Numerical Features
Categorical Features

Modelling

Splitting Data

3 sets: Training, Validation (hyperparameter tuning and experiment evaluation) & Test (model testing and comparison)
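
A three-way split might look like the sketch below. The 70/15/15 proportions, the fixed seed and the function name `train_val_test_split` are assumptions; the notes do not prescribe specific sizes.

```python
import random


def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle once, then cut the data into three non-overlapping sets."""
    rows = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(rows)  # seeded for reproducible splits
    n_test = round(len(rows) * test_frac)
    n_val = round(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


samples = list(range(100))
train, val, test = train_val_test_split(samples)
print(len(train), len(val), len(test))
```

The test set is put aside and only touched at the very end, when comparing finished models.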

Modelling

Choose models that work for your problem → train the model
Goal: minimise time between experiments
- Start small and add complexity (use a small part of your training set to start with)
- Start with the less complicated models

Tuning

Happens on the validation set (or on the training set, for example via cross-validation), never on the test set

Comparison

Measure model performance on the test set
Avoid overfitting & underfitting

Overfitting

Great performance on the training data but poor performance on the test data means your model doesn’t generalize well
Solution: Try a simpler model, or make sure the test data is of the same style as the data your model is trained on

Underfitting

Poor performance on the training data means the model hasn’t learned properly and is underfitting
Solution: Try a different model, improve the existing one through hyperparameter tuning, or collect more data.
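
Both failure modes can be seen on a toy regression problem. The sketch below uses synthetic data (y ≈ 2x plus noise) and three deliberately simple models, all of which are assumptions made for illustration: a memorizer (1-nearest-neighbour lookup) that tends to overfit, a constant mean predictor that underfits, and a straight-line fit as the reasonable middle.

```python
import random


def mae(rows, model):
    return sum(abs(y - model(x)) for x, y in rows) / len(rows)


def fit_memorizer(train):
    # Overfits: returns the target of the closest training point.
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]


def fit_mean(train):
    # Underfits: ignores x and always predicts the average target.
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train):
    # Least-squares straight line.
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    slope = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
    return lambda x: slope * x + (my - slope * mx)


rng = random.Random(0)
data = [(i / 10, 2 * (i / 10) + rng.gauss(0, 1)) for i in range(100)]
rng.shuffle(data)
train, test = data[:70], data[70:]

results = {}
for name, fit in [("memorizer", fit_memorizer), ("mean", fit_mean), ("line", fit_line)]:
    model = fit(train)
    results[name] = (mae(train, model), mae(test, model))
    print(name, "train MAE:", results[name][0], "test MAE:", results[name][1])
```

The memorizer scores a perfect zero error on the training data but not on the test data (the overfitting signature), while the mean predictor is already poor on the training data (the underfitting signature).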
