-
Notifications
You must be signed in to change notification settings - Fork 0
/
Machine Learning intro.py
212 lines (170 loc) · 8.81 KB
/
Machine Learning intro.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
# Machine Learning intro.py
## 1. What is Machine Learning?
"""
Imagine you're teaching a child to recognize different types of fruits.
You show them apples, bananas, oranges, and so on, and over time,
they start to understand the characteristics of each fruit -
color, shape, size, taste. Now, when you show them a fruit they've
never seen before, they can often guess what it is based on those characteristics.
Machine Learning (ML) works in a similar way.
It's a branch of artificial intelligence (AI) that focuses on the use of
data and algorithms to imitate the way humans learn.
In essence, ML algorithms use a set of data samples to make predictions
or decisions without being explicitly programmed to do so.
For example,
it can be trained to recognize the characteristics of different types of fruits.
In our case trained to predict and suggest tasks based on the user's schedule
and predicted time to complete that task and the due date of the task.
"""
## 2. Introduction to Scikit-learn
"""
Now that we understand what machine learning is,
let's talk about one of its libraries in Python - Scikit-learn.
Scikit-learn is like a toolbox full of machine learning tools.
You'd use Scikit-learn for tasks like classification, regression, clustering.
It provides various tools for model fitting, data preprocessing, model selection and
evaluation, making it a library for both supervised and unsupervised machine learning.
Scikit-learn is built on top of NumPy, SciPy, and Matplotlib,
"""
## 3. Basic Code Example Using Scikit-learn
"""
Let's look at an example of how Scikit-learn can be used.
We'll use the Linear Regression model as an example:
Run the code below to see the output.
"""
# Import the necessary libraries
from sklearn import linear_model
# pip install -U scikit-learn
import matplotlib.pyplot as plt
# pip install -U matplotlib
# Create a Linear Regression object
# Basically, we're creating a Linear Regression model: https://youtu.be/nk2CQITm_eo?si=uciJQKH6HVuyshdt, https://www.youtube.com/watch?v=NUXdtN1W1FE&ab_channel=Simplilearn
# Linear regression is a statistical method used to model and analyze the relationship
# between a dependent variable and one or more independent variables by fitting a linear equation to the observed data.
reg = linear_model.LinearRegression()
# Define the data points
# Basically, we're creating a dataset
# Features are the independent variables
# Target variable is the dependent variable
x = [[i, i] for i in range(10)] # Features
y = [i for i in range(10)] # Target variable
# Fit the model to the data
# Basically, we're training the model
reg.fit(x, y)
# Generate additional data points
# Basically, we're creating more data points
x_new = [[i, i+1] for i in range(10)] # Features
y_new = [i+1 for i in range(10)] # Target variable
# Generate additional data points
# Basically, we're creating even more data points
x_newer = [[i, i-1] for i in range(10)] # Features
y_newer = [i-1 for i in range(10)] # Target variable
# Plot the original data points
# We're plotting the original data points
plt.scatter([i[0] for i in x], y, color='black')
# Plot the additional data points
# We're plotting the additional data points
plt.scatter([i[0] for i in x_new], y_new, color='red')
plt.scatter([i[0] for i in x_newer], y_newer, color='green')
# Plot the regression line
# We're plotting the regression line
plt.plot([i[0] for i in x], reg.predict(x), color='blue', linewidth=3)
# Add labels and title
plt.xlabel('Feature')
plt.ylabel('Target')
plt.title('#3. Linear Regression')
# Show the plot
plt.show()
"""
In this code snippet, we first import the necessary library.
We then create an object of the Linear Regression model.
We fit this model on some data (x,y).
Finally, we print out the coefficients of the fitted model.
"""
## 4. Conclusion
"""
Helpful Sources:
(1) What is Machine Learning? | YouTube. https://www.youtube.com/watch?v=ukzFI9rgwfU&ab_channel=Simplilearn
(2) What is Machine Learning? | IBM. https://www.ibm.com/topics/machine-learning.
(3) An introduction to machine learning with scikit-learn. https://scikit-learn.org/stable/tutorial/basic/tutorial.html.
(4) Introduction to Scikit-Learn (sklearn) in Python • datagy. https://datagy.io/python-scikit-learn-introduction/.
(5) Introduction to Scikit learn in 6 minutes - YouTube. https://www.youtube.com/watch?v=oFa8NvTLkKE&ab_channel=TimesPro.
(6) Python SciKit Learn Tutorial | DigitalOcean. https://www.digitalocean.com/community/tutorials/python-scikit-learn-tutorial.
(7) scikit-learn Tutorials — scikit-learn 1.3.1 documentation. https://scikit-learn.org/stable/tutorial/index.html.
(8) INTRODUCTION TO SCIKIT-LEARN - GitHub Pages. https://tdmdal.github.io/sklearn-workshop/introtosklearn.pdf.
(9) How To Build a Machine Learning Classifier in Python with Scikit-learn. https://www.digitalocean.com/community/tutorials/how-to-build-a-machine-learning-classifier-in-python-with-scikit-learn.
(10) What Is Machine Learning? Definition, Types, and Examples. https://www.coursera.org/articles/what-is-machine-learning.
(11) What is machine learning? | Microsoft Azure. https://azure.microsoft.com/en-us/resources/cloud-computing-dictionary/what-is-machine-learning-platform/.
(12) Machine learning - Wikipedia. https://en.wikipedia.org/wiki/Machine_learning.
"""
# 5. Bonus: Examlpe of a Real World Example using Clustering (K-Means Clustering)
"""
This code demonstrates how machine learning can be used to group data into clusters.
Specifically, it takes a dataset of customer transactions and groups the customers
into clusters based on their transaction history. This can be useful for businesses
that want to identify groups of customers with similar purchasing behavior and tailor
their marketing strategies accordingly. Note that this is just an example and not necessarily
the best or only way to solve this problem.
Run the code below to see the output.
"""
# Import the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
# Generate customer transaction data
# This would typically be read from a database or file like a CSV
data = {
'Customer ID': list(range(1, 101)),
'Amount Spent': np.random.randint(0, 100, size=100),
'Items Purchased': np.random.randint(1, 20, size=100)
}
# Create a DataFrame
# Making a table
# Dataframe: https://pandas.pydata.org/docs/getting_started/intro_tutorials/01_table_oriented.html
df = pd.DataFrame(data)
# Extract the features
# Features are the independent variables
# Target variable is the dependent variable
X = df.iloc[:, 1:].values
# Scale the features
# Making the features have the same scale
scaler = StandardScaler()
# StandardScaler: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
X = scaler.fit_transform(X)
# Create a KMeans object
# KMeans model: https://youtu.be/4b5d3muPQmA?si=8f2f9f9f9f9f9f9f
kmeans = KMeans(n_clusters=4)
# Fit the model using the customer transaction data
kmeans.fit(X)
# Add the cluster labels to the original data
df['Cluster'] = kmeans.labels_
# Calculate the correlation within each cluster
# Group the data by cluster and calculate the correlation between Amount Spent and Items Purchased
# Ref: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html
correlations = df.groupby('Cluster')['Amount Spent', 'Items Purchased'].corr().iloc[0::2, -1]
avg_items = df.groupby('Cluster')['Items Purchased'].mean()
avg_spent = df.groupby('Cluster')['Amount Spent'].mean()
# Convert the indices to a flat index
# Flat Index: https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.to_flat_index.html
correlations.index = correlations.index.to_flat_index()
avg_items.index = avg_items.index.to_flat_index()
avg_spent.index = avg_spent.index.to_flat_index()
# Combine the correlations, average items purchased, and average amount spent into a single table
# Concat: https://pandas.pydata.org/docs/reference/api/pandas.concat.html
table = pd.concat([correlations, avg_items, avg_spent], axis=1)
table.columns = ['Correlation', 'Avg Items Purchased', 'Avg Amount Spent']
# Print the table
print(table)
# Plot the clusters
plt.scatter(df['Amount Spent'], df['Items Purchased'], c=df['Cluster'], cmap='viridis')
centroids = scaler.inverse_transform(kmeans.cluster_centers_)
plt.scatter(centroids[:, 0], centroids[:, 1], s=300, c='red', marker='*')
# Annotate: https://matplotlib.org/stable/users/explain/text/annotations.html#plotting-guide-annotation
for i, centroid in enumerate(centroids):
plt.annotate(f" {i}", (centroid[0], centroid[1]), fontsize=12, fontweight='bold', color='red')
plt.xlabel('Amount Spent')
plt.ylabel('Items Purchased')
plt.title('#5. Customer Segmentation using K-means clustering')
plt.show()