Introduction

Why learn Machine Learning?

  • Spreadsheets (Excel, CSV): store the data a business needs → humans can analyse the data to make business decisions
  • Relational databases (MySQL): a better way to organise data → humans can still analyse the data to make business decisions
  • Big Data (NoSQL): companies such as Facebook, Amazon, and Twitter accumulate more and more data (user actions, purchase history), much of it unstructured → Machine Learning is needed, rather than humans alone, to make business decisions

(Back to top)

Terms

AI

Machine Learning

  • A subset of AI: ML uses algorithms to learn patterns from data, then applies what it has learned to make predictions or classifications on similar data.
  • Useful for tasks that are hard to describe to a computer as explicit instructions, such as:
    • Classifying cat/dog images or product reviews

Difference between ML and Normal Algorithms

  • Normal algorithm: a set of instructions for accomplishing a task: given input + set of instructions → output
  • ML algorithm: given input + given output → the set of instructions that maps the input to the output
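
The contrast above can be sketched in Python. In the first function a person writes the rule; in the second, the rule (a slope and an intercept) is recovered from example input/output pairs with ordinary least squares. The Celsius-to-Fahrenheit task and every name below are assumptions chosen for illustration; they do not come from the original notes.

```python
def celsius_to_fahrenheit(celsius):
    """Normal algorithm: a human wrote the instructions."""
    return celsius * 9 / 5 + 32


def fit_line(inputs, outputs):
    """ML-style: recover slope and intercept from input/output pairs
    using ordinary least squares for a single feature."""
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(outputs) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
    variance = sum((x - mean_x) ** 2 for x in inputs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Known inputs and their known outputs (the "labels").
examples_c = [-40.0, 0.0, 37.0, 100.0]
examples_f = [-40.0, 32.0, 98.6, 212.0]

# The learned slope and intercept play the role of the hand-written rule.
slope, intercept = fit_line(examples_c, examples_f)
learned_prediction = slope * 25.0 + intercept
```

Because the example data is exactly linear, the learned rule should closely match the hand-written one. Real data is noisy, so a learned rule is normally only an approximation.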

Types of ML Problems

  • Supervised: data with labels
  • Unsupervised: data without labels (no known answer attached to each example)
    • Clustering: the machine decides the clusters/groups
    • Association rule learning: associates items that occur together to predict what customers might buy in the future
  • Reinforcement: teach the machine through trial and error (with rewards and penalties)
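
As a rough illustration of the clustering case, the sketch below groups unlabelled numbers with a naive one-dimensional k-means loop. The data, the starting centroids, and the function name are hypothetical. This is one simple clustering technique among many, not one prescribed by these notes.

```python
def k_means_1d(values, centroids, iterations=10):
    """Group 1-D values around the nearest centroid (naive k-means)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        # Assignment step: each value joins its closest centroid.
        for value in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical order totals with no labels attached.
order_totals = [5, 7, 6, 48, 52, 50]
centroids, clusters = k_means_1d(order_totals, centroids=[0, 100])
```

Nobody told the function which orders are "small" or "large"; the two groups emerge from the data alone.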

Deep Learning

Data Science

  • Data Analysis: analysing data to gain an understanding of it
  • Data Science: running experiments on a set of data to find actionable insights within it
    • Example: building ML models

(Back to top)

Machine Learning Framework

Step 1: Problem Definition - Rephrase the business problem as a machine learning problem

  • What problem are we trying to solve?
    • Supervised
    • Unsupervised
    • Classification
    • Regression

Step 2: Data

  • What kind of data do we have?

Step 3: Evaluation

  • What defines success for us? Knowing which metrics to pay attention to tells you how to evaluate your machine learning project.

Step 4: Features

  • What features does your data have, and which can you use to build your model? The model turns features → patterns.
  • Three main types of features:
    • Categorical features: one or the other(s)
      • For example, in a heart disease problem, the sex of the patient. Or, for an online store, whether or not someone has made a purchase.
    • Continuous (or numerical) features: a numerical value such as average heart rate or the number of times logged in.
    • Derived features: features you create from the data, a process often referred to as feature engineering.
      • Feature engineering is how a subject matter expert takes their knowledge and encodes it into the data. You might combine the number of times logged in with timestamps to make a feature called “time since last login”. Or turn dates into “is a weekday (yes)” and “is a weekday (no)”.
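
The two derived features mentioned above can be sketched with Python's standard `datetime` module. The function name, the dictionary keys, and the timestamps are hypothetical; a real project would adapt them to its own data.

```python
from datetime import datetime


def derive_features(last_login, now):
    """Turn raw timestamps into two derived features."""
    return {
        # Whole days elapsed since the user last logged in.
        "days_since_last_login": (now - last_login).days,
        # datetime.weekday() returns 0 (Monday) through 6 (Sunday).
        "is_weekday": last_login.weekday() < 5,
    }


# Hypothetical timestamps for one user.
last_login = datetime(2021, 3, 5, 7, 0)  # a Friday
now = datetime(2021, 3, 9, 8, 23)        # the following Tuesday
features = derive_features(last_login, now)
```

Neither value is stored in the raw data; both are computed from it, which is what makes them derived features.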

Step 5: Models

  • Figure out the right models for your problem

Step 6: Experimentation

  • How can we improve, and what could we do better?

Main Types of ML Problems

Supervised Learning:

  • (Input & output) Data + labels → classification, regression

Unsupervised Learning:

  • (Input only) Data → clustering

Transfer Learning:

  • (My problem is similar to others) Leverage what other ML models have already learned

Reinforcement Learning:

  • Rewarding and punishing the model for its actions by updating its score

Evaluation

| Classification | Regression | Recommendation |
| --- | --- | --- |
| Accuracy | Mean Absolute Error (MAE) | Precision at K |
| Precision | Mean Squared Error (MSE) | |
| Recall | Root Mean Squared Error (RMSE) | |
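
The metrics listed above can be written out in plain Python as a minimal sketch. The function names and the example data are assumptions for illustration; in practice a library such as scikit-learn would usually be used instead.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive=1):
    """Of the items predicted positive, the fraction that really are."""
    truths = [t for t, p in zip(y_true, y_pred) if p == positive]
    return sum(t == positive for t in truths) / len(truths) if truths else 0.0


def recall(y_true, y_pred, positive=1):
    """Of the items that really are positive, the fraction that were found."""
    preds = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds) / len(preds) if preds else 0.0


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    return math.sqrt(mean_squared_error(y_true, y_pred))


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    return sum(item in relevant for item in recommended[:k]) / k


# Hypothetical classification results (1 = positive, 0 = negative).
labels = [1, 0, 1, 1, 0]
predictions = [1, 1, 1, 0, 0]

# Hypothetical regression results.
actual = [3.0, 5.0, 2.0]
predicted = [2.0, 5.0, 4.0]
```

Here `accuracy(labels, predictions)` counts 3 matches out of 5, while `mean_absolute_error(actual, predicted)` averages the absolute errors 1, 0, and 2.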

(Back to top)

Features

  • Numerical Features
  • Categorical Features

(Back to top)

Modelling

Splitting Data

  • 3 sets: Training, Validation (model hyperparameter tuning and experiment evaluation) & Test (model testing and comparison)
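
A three-way split can be sketched with the standard library alone. The 70/15/15 proportions, the seed, and the function name are assumptions. The notes do not prescribe particular ratios.

```python
import random


def split_dataset(rows, train_fraction=0.7, validation_fraction=0.15, seed=42):
    """Shuffle rows and cut them into training, validation, and test sets."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between experiments.
    random.Random(seed).shuffle(shuffled)
    train_end = round(len(shuffled) * train_fraction)
    validation_end = train_end + round(len(shuffled) * validation_fraction)
    return (
        shuffled[:train_end],
        shuffled[train_end:validation_end],
        shuffled[validation_end:],  # whatever remains becomes the test set
    )


rows = list(range(100))  # stand-in for 100 data rows
train, validation, test = split_dataset(rows)
```

Every row lands in exactly one of the three sets, so the test set stays unseen during training and tuning.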

Modelling

  • Choose models that work for your problem → train the model
  • Goal: minimise the time between experiments
    • Start small and add complexity (begin with a small part of your training set)
    • Start with the less complicated models

Tuning

  • Happens on the Validation Set, or on the Training Set if no Validation Set is available; never on the Test Set
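
As a hedged sketch of tuning, the code below picks the hyperparameter `k` of a tiny one-dimensional k-nearest-neighbours classifier by comparing accuracy on a validation set. The classifier, the data, and all names are hypothetical, chosen only to keep the example self-contained.

```python
def knn_predict(train_points, query, k):
    """Majority vote among the k training points closest to query (1-D)."""
    neighbours = sorted(train_points, key=lambda point: abs(point[0] - query))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if votes * 2 > k else 0


def validation_accuracy(train_points, validation_points, k):
    """Fraction of validation points that are classified correctly."""
    correct = sum(knn_predict(train_points, x, k) == label for x, label in validation_points)
    return correct / len(validation_points)


def tune_k(train_points, validation_points, candidates):
    """Return the first candidate k with the highest validation accuracy."""
    return max(candidates, key=lambda k: validation_accuracy(train_points, validation_points, k))


# Hypothetical (value, label) pairs; the points at 3.0 and 7.0 are noisy labels.
train_points = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 0), (6.0, 1), (7.0, 0), (8.0, 1), (9.0, 1)]
validation_points = [(2.9, 0), (3.2, 0), (6.9, 1), (7.2, 1)]

best_k = tune_k(train_points, validation_points, candidates=[1, 3, 5])
```

With `k=1` the classifier copies the noisy training labels and misclassifies every validation point, while larger values of `k` smooth the noise out. The test set plays no part in this choice.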

Comparison

  • Measure model performance on the Test Set
  • Avoid overfitting & underfitting

Overfitting

  • Great performance on the training data but poor performance on the test data means your model doesn’t generalize well
  • Solution: try a simpler model, or make sure the test data is of the same style as the data your model is trained on

Underfitting

  • Poor performance on the training data means the model hasn’t learned properly and is underfitting
  • Solution: try a different model, improve the existing one through hyperparameter tuning, or collect more data
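
The two failure modes can be told apart by comparing training and test scores. The helper below is a minimal sketch: the function name and both thresholds (`good_enough`, `max_gap`) are illustrative assumptions, not standard values.

```python
def diagnose_fit(train_score, test_score, good_enough=0.8, max_gap=0.1):
    """Classify a model as underfitting, overfitting, or balanced.

    good_enough and max_gap are illustrative thresholds only.
    """
    # Poor training performance: the model has not learned the patterns.
    if train_score < good_enough:
        return "underfitting"
    # Strong on training data but much weaker on test data: poor generalisation.
    if train_score - test_score > max_gap:
        return "overfitting"
    return "balanced"


# Hypothetical accuracy scores for three models.
overfit_result = diagnose_fit(train_score=0.99, test_score=0.70)
underfit_result = diagnose_fit(train_score=0.60, test_score=0.58)
balanced_result = diagnose_fit(train_score=0.90, test_score=0.88)
```

Sensible thresholds depend on the problem and the metric, so the defaults above should be treated as placeholders.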