-> Regression is used when the prediction can take infinitely many possible values, i.e. a continuous output.
Types of regression
Simple Linear Regression
Multiple Linear Regression
Polynomial Regression
SLR is used when we have a "single input attribute" and we want to model a linear relationship between the variables.
2 variables: the dependent variable (what we predict) and the independent variable / explanatory variable (what we observe).
Simple Linear Regression follows the linear equation
Y = m x + C
Y = output variable to be predicted
x = input variable
m = slope
C = intercept
A line plotted through the data that passes through the intercept and the mean point (mean(x), mean(Y)) is known as the line of best fit.
The goal is to find the best estimates for the coefficients to minimize the errors in predicting y from x.
Slope
How a change in x translates into a change in Y, before the bias (intercept) is added.
b1 / m = Sum((x - mean(x)) * (y - mean(y))) / Sum((x - mean(x))^2)
Intercept
The intercept is the point where the line cuts the Y axis, i.e. the value of Y when x = 0.
C = mean(y)-m(mean(x))
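The slope and intercept formulas above can be sketched in plain Python (the function name `fit_slr` and the sample data are illustrative, not from any library):

```python
# Minimal sketch of simple linear regression using the closed-form
# formulas above; `fit_slr` is an illustrative name.
def fit_slr(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # m = Sum((x - mean(x)) * (y - mean(y))) / Sum((x - mean(x))^2)
    m = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
        / sum((xi - mean_x) ** 2 for xi in x)
    # C = mean(y) - m * mean(x)
    c = mean_y - m * mean_x
    return m, c

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]          # exactly y = 2x + 1
m, c = fit_slr(x, y)
print(m, c)                   # → 2.0 1.0
```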
1️⃣ Model should be Linear
2️⃣ Errors should be Independent
3️⃣ Error terms should be normally distributed
4️⃣ Homoscedasticity: constant variance of the error terms
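A quick numeric sanity check of the error-term assumptions can be sketched as follows (the fitted line y = 2x + 1 and the data are made up for illustration; in practice you would also plot the residuals against x):

```python
import statistics

# Residuals = observed y minus predicted y from a fitted line.
# The line y = 2x + 1 and these data points are illustrative assumptions.
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
pred = [2 * xi + 1 for xi in x]
residuals = [yi - pi for yi, pi in zip(y, pred)]

print(statistics.mean(residuals))   # should be near 0 for an unbiased fit
print(statistics.stdev(residuals))  # roughly constant across x if homoscedastic
```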
For greater numbers of independent variables, visual understanding becomes more abstract. For p independent variables, the data points (x1, x2, x3, …, xp, y) exist in a p + 1-dimensional space. What really matters is that the linear model (which is p-dimensional) can be represented by the p + 1 coefficients β0, β1, …, βp, so that y is approximated by the equation y = β0 + β1*x1 + … + βp*xp.
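A minimal sketch of fitting such a model with NumPy's least-squares solver, assuming synthetic data generated from y = 1 + 2*x1 + 3*x2:

```python
import numpy as np

# Sketch of multiple linear regression for p = 2 inputs using least squares.
# The synthetic data is illustrative: y = 1 + 2*x1 + 3*x2 exactly.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]

# Prepend a column of ones so beta0 (the intercept) is estimated too.
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(beta, 6))   # recovers [beta0, beta1, beta2] = [1, 2, 3]
```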
K-means is an unsupervised learning algorithm (meaning there are no target labels) that allows you to identify groups, or clusters, of similar data points within your data.
Algorithm
- We randomly initialize the K starting centroids, and each data point is assigned to its nearest centroid.
- The centroids are recomputed as the mean of the data points assigned to the respective cluster.
- Repeat the assignment and update steps until the stopping criterion is triggered.
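The steps above can be sketched in plain Python (1-D points for brevity; the name `fit_kmeans`, the convergence test, and the sample data are illustrative assumptions):

```python
import random

# Minimal sketch of the K-means loop described above.
def fit_kmeans(points, k, iters=100, seed=0):
    random.seed(seed)
    centroids = random.sample(points, k)          # random initialization
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid
        # (minimizing squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        # Update step: recompute centroids as cluster means.
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:            # stopping criterion: converged
            break
        centroids = new_centroids
    return sorted(centroids)

points = [1.0, 1.1, 0.9, 10.0, 10.2, 9.8]
print([round(c, 3) for c in fit_kmeans(points, k=2)])   # → [1.0, 10.0]
```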
The natural question is what distance we are optimizing for, and the answer is usually Euclidean distance, or squared Euclidean distance to be more precise. Data points are assigned to the cluster closest to them, in other words the cluster which minimizes this squared distance. We can write this more formally as: J = Σk Σ(x ∈ cluster k) ||x − μk||², where μk is the centroid of cluster k.
K-means Visualization
We have defined k = 2, so we assign the data to one of two clusters at each iteration. Figure (a) corresponds to randomly initializing the centroids. In (b) we assign the data points to their closest cluster, and in Figure (c) we recompute the centroids as the average of the data in each cluster. This continues until we reach our stopping criteria (minimizing the cost function J, or running for a predefined number of iterations). Hopefully the explanation above, coupled with the visualization, has given you a good understanding of what K-means is doing.
Projects: