-> Regression is used when the prediction can take infinitely many possible values, i.e. a continuous output.
Types of regression
Simple Linear Regression
Multiple Linear Regression
Polynomial Regression
SLR is used when we have a "single input attribute" and we want to model a linear relationship between the variables.
2 variables: the dependent variable (what we predict) and the independent variable / explanatory variable (what we observe).
Simple Linear Regression follows the linear equation
Y = m x + C
Y = output variable to be predicted
x = input variable
m = slope
C = intercept
A line plotted through the data that passes through the intercept and the mean point (mean(x), mean(Y)) is known as the line of best fit.
The goal is to find the best estimates for the coefficients to minimize the errors in predicting y from x.
Slope
How a change in x translates into a change in Y, before the bias (intercept) is added.
b1 / m = Sum((x - mean(x)) * (y - mean(y))) / Sum((x - mean(x))^2)
Intercept
The intercept is the point where the line cuts the Y axis, i.e. the value of Y when x = 0.
C = mean(y)-m(mean(x))
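The slope and intercept formulas above can be sketched in plain Python (the function name `fit_slr` and the sample data are illustrative, not from any library):

```python
# Minimal sketch of simple linear regression using the closed-form
# formulas above; `fit_slr` is an illustrative name.
def fit_slr(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # m = Sum((x - mean(x)) * (y - mean(y))) / Sum((x - mean(x))^2)
    m = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
        / sum((xi - mean_x) ** 2 for xi in x)
    # C = mean(y) - m * mean(x)
    c = mean_y - m * mean_x
    return m, c

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]          # exactly y = 2x + 1
m, c = fit_slr(x, y)
print(m, c)                   # → 2.0 1.0
```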
1️⃣ Model should be Linear
2️⃣ Errors should be Independent
3️⃣ Error terms should be normally distributed
4️⃣ Homoscedasticity: constant variance of the error terms
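A quick numeric sanity check of the error-term assumptions can be sketched as follows (the fitted line y = 2x + 1 and the data are made up for illustration; in practice you would also plot the residuals against x):

```python
import statistics

# Residuals = observed y minus predicted y from a fitted line.
# The line y = 2x + 1 and these data points are illustrative assumptions.
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
pred = [2 * xi + 1 for xi in x]
residuals = [yi - pi for yi, pi in zip(y, pred)]

print(statistics.mean(residuals))   # should be near 0 for an unbiased fit
print(statistics.stdev(residuals))  # roughly constant across x if homoscedastic
```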
For greater numbers of independent variables, visual understanding becomes more abstract. For p independent variables, the data points (x1, x2, x3, …, xp, y) exist in a p + 1-dimensional space. What really matters is that the linear model (which is p-dimensional) can be represented by the p + 1 coefficients β0, β1, …, βp, so that y is approximated by the equation y = β0 + β1*x1 + … + βp*xp.
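A minimal sketch of fitting such a model with NumPy's least-squares solver, assuming synthetic data generated from y = 1 + 2*x1 + 3*x2:

```python
import numpy as np

# Sketch of multiple linear regression for p = 2 inputs using least squares.
# The synthetic data is illustrative: y = 1 + 2*x1 + 3*x2 exactly.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]

# Prepend a column of ones so beta0 (the intercept) is estimated too.
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(beta, 6))   # recovers [beta0, beta1, beta2] = [1, 2, 3]
```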
K-means is an unsupervised learning algorithm (meaning there are no target labels) that allows you to identify groups, or clusters, of similar data points within your data.
Algorithm
- We randomly initialize the K starting centroids, and each data point is assigned to its nearest centroid.
- The centroids are recomputed as the mean of the data points assigned to the respective cluster.
- Repeat the assignment and update steps until the stopping criterion is triggered.
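The steps above can be sketched in plain Python (1-D points for brevity; the name `fit_kmeans`, the convergence test, and the sample data are illustrative assumptions):

```python
import random

# Minimal sketch of the K-means loop described above.
def fit_kmeans(points, k, iters=100, seed=0):
    random.seed(seed)
    centroids = random.sample(points, k)          # random initialization
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid
        # (minimizing squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        # Update step: recompute centroids as cluster means.
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:            # stopping criterion: converged
            break
        centroids = new_centroids
    return sorted(centroids)

points = [1.0, 1.1, 0.9, 10.0, 10.2, 9.8]
print([round(c, 3) for c in fit_kmeans(points, k=2)])   # → [1.0, 10.0]
```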
The natural question is what distance we are optimizing for, and the answer is usually Euclidean distance, or squared Euclidean distance to be more precise. Data points are assigned to the cluster closest to them, in other words the cluster which minimizes this squared distance. We can write this more formally as: J = Σk Σ(x ∈ cluster k) ||x − μk||², where μk is the centroid of cluster k.
K-means Visualization
We have defined k = 2, so we assign the data to one of two clusters at each iteration. Figure (a) corresponds to randomly initializing the centroids. In (b) we assign the data points to their closest cluster, and in Figure (c) we recompute the centroids as the average of the data in each cluster. This continues until we reach our stopping criteria (minimizing the cost function J, or running for a predefined number of iterations). Hopefully the explanation above, coupled with the visualization, has given you a good understanding of what K-means is doing.
Projects: