To review only [do not merge] #1

gmichaeljaison · 2017-08-30T03:00:37Z

No description provided.

gmichaeljaison · 2017-08-30T03:13:19Z

knn/knn.py

+
+
+def main():
+    if len(sys.argv) == 5:


#add
argparse is a better module for handling command-line arguments.
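
For illustration, a minimal sketch of how the four arguments read in main() could be declared with argparse. The argument names, help strings, and the sample file name `data.bin` are assumptions made for this example, not taken from the PR.

```python
import argparse


def parse_args(argv=None):
    """Parse the four positional arguments that main() reads from sys.argv."""
    parser = argparse.ArgumentParser(description="kNN image classification with PCA")
    parser.add_argument("k", type=int, help="number of nearest neighbors")
    parser.add_argument("d", type=int, help="number of PCA dimensions")
    parser.add_argument("n", type=int, help="number of test samples to consider")
    parser.add_argument("input_file", help="path to the input data file")
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)


# Example with an explicit argument list (hypothetical values).
args = parse_args(["3", "20", "100", "data.bin"])
print(args.k, args.d, args.n, args.input_file)  # 3 20 100 data.bin
```

With this, a wrong argument count or a non-integer value produces a usage message automatically, so the manual len(sys.argv) check is no longer needed.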

gmichaeljaison · 2017-08-30T03:16:53Z

knn/knn.py

+
+
+def main():
+    if len(sys.argv) == 5:


#style
Instead of indenting the entire function body by one level, you can return early when the argument count is wrong.

if len(sys.argv) != 5: return
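
A sketch of the guard-clause shape, assuming main() takes the argument list as a parameter and returns an exit code (both are choices made for this example; the usage string is hypothetical). Note that the early exit has to be a return, since continue is only valid inside a loop.

```python
import sys


def main(argv):
    # Guard clause: bail out first so the rest of the body needs no extra indent.
    if len(argv) != 5:
        print("usage: knn.py K D N INPUT_FILE", file=sys.stderr)
        return 1

    k = int(argv[1])  # number of nearest neighbors
    d = int(argv[2])  # number of PCA dimensions
    n = int(argv[3])  # number of test samples to consider
    input_file = argv[4]
    print(k, d, n, input_file)  # placeholder for the real work
    return 0
```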

gmichaeljaison · 2017-08-30T03:19:20Z

knn/knn.py

+        k = int(sys.argv[1])  # number of nearest neighbor
+        d = int(sys.argv[2])  # number of pca dimension
+        n = int(sys.argv[3])  # number of test samples to consider
+        input_data = sys.argv[4]


#style
Rename input_data to input_file.

gmichaeljaison · 2017-08-30T03:34:38Z

knn/knn.py

+
+        # convert the image to gray scale
+        train_grayed = convert_gray(train_data, 1000 - n, 1024)
+        test_grayed = convert_gray(test_data, n, 1024)


#fix
Convert the entire data matrix to grayscale first, then split it into train and test subsets.
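
A rough sketch of that order of operations. It assumes each row stores three 1024-value channel planes back to back, uses a plain channel average as the gray conversion, and takes the first n rows as the test set. None of these details are confirmed by the diff, and the helper names are made up for this example.

```python
import numpy as np


def to_gray(data):
    """Average the three channel planes of every row into one gray plane.

    Assumption: each row is laid out as [R plane | G plane | B plane].
    """
    data = np.asarray(data, dtype=float)
    n_pixels = data.shape[1] // 3
    channels = data.reshape(data.shape[0], 3, n_pixels)
    return channels.mean(axis=1)


def split_train_test(gray, n_test):
    """Return (train, test); the first n_test rows are used for testing."""
    return gray[n_test:], gray[:n_test]


# Random stand-in data: 1000 images, 3 * 1024 values each.
data = np.random.default_rng(0).integers(0, 256, size=(1000, 3072))
gray = to_gray(data)  # one conversion for the whole matrix
train_gray, test_gray = split_train_test(gray, n_test=100)
print(train_gray.shape, test_gray.shape)  # (900, 1024) (100, 1024)
```

Converting once means the row count no longer has to be passed to the conversion function at all.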

gmichaeljaison · 2017-08-30T03:35:57Z

knn/knn.py

+
+
+def convert_gray(data, row, column):
+    img = data[:]


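Why create a copy here? What is the use of this line?
img = data creates a reference to the same object.
img = data[:] creates a copy when data is a Python list. (If data is a NumPy array, the slice is a view that still shares memory with data.)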
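
A small demonstration of the difference, shown for both a plain list and a NumPy array, since the diff does not show which type data is.

```python
import numpy as np

# Plain Python list: [:] makes a new (shallow) list.
data_list = [1, 2, 3]
alias = data_list
list_copy = data_list[:]
alias[0] = 99
print(data_list[0])  # 99, alias and data_list are the same object
print(list_copy[0])  # 1, the slice copy is unaffected

# NumPy array: [:] returns a view, so writes go through to the original.
arr = np.array([1, 2, 3])
view = arr[:]
real_copy = arr.copy()
view[0] = 99
print(arr[0])        # 99
print(real_copy[0])  # 1
```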

gmichaeljaison · 2017-08-30T04:35:43Z

knn/knn.py

+
+def knn(train_data, test_data, test_labels, train_labels, k, d):
+    # do pca and reduce dimension for train data
+    pca_obj, train_pca = do_pca(train_data, d)


Do PCA on the train data here as well. Give do_pca a single responsibility.

You can remove a lot of functions here.
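
One possible reading of the single-responsibility suggestion, sketched with plain NumPy: one function only fits the PCA, and a second only projects data, so the same pair serves both the train and the test matrix. The function names and the SVD-based approach are assumptions for this sketch, not the PR's implementation.

```python
import numpy as np


def compute_pca(train, d):
    """Fit PCA on the training rows; return the mean and the top-d components."""
    mean = train.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:d]


def project(data, mean, components):
    """Project rows onto previously fitted components."""
    return (data - mean) @ components.T


rng = np.random.default_rng(0)
train_data = rng.normal(size=(20, 5))
test_data = rng.normal(size=(4, 5))

mean, components = compute_pca(train_data, d=2)
train_pca = project(train_data, mean, components)
test_pca = project(test_data, mean, components)
print(train_pca.shape, test_pca.shape)  # (20, 2) (4, 2)
```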

gmichaeljaison · 2017-08-30T04:35:50Z

knn/knn.py

+    return grayed
+
+
+def do_pca(gray_data, d):


Rename to compute_pca.

gmichaeljaison · 2017-08-30T04:42:38Z

knn/knn.py

+    distance = []
+    neighbors = []
+    for i in range(len(train_pca)):
+        distance.append((train_pca[i], calculate_distance(train_pca[i], test_pca), train_labels[i]))


#fix
An array of (index, distance) pairs is enough. Right now you are keeping another copy of each record inside the tuple.
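
A minimal sketch of the lighter structure, using only the standard library. The function name and the sample rows and labels are invented for this example.

```python
import math


def index_distances(train_rows, test_row):
    """Return (index, distance) pairs; records and labels are looked up by index later."""
    return [(i, math.dist(row, test_row)) for i, row in enumerate(train_rows)]


train_rows = [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]]
train_labels = ["cat", "dog", "cat"]

pairs = index_distances(train_rows, [0.0, 0.0])
print(pairs)  # [(0, 0.0), (1, 5.0), (2, 1.0)]

# The label of any neighbor is still reachable through its index.
print(train_labels[pairs[1][0]])  # dog
```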

gmichaeljaison · 2017-08-30T04:45:02Z

knn/knn.py

+    for i in range(len(train_pca)):
+        distance.append((train_pca[i], calculate_distance(train_pca[i], test_pca), train_labels[i]))
+    distance.sort(key=lambda x: x[1])
+    for x in range(k):


#clean
return distance[:k]
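
Sketched as a standalone helper, assuming the list holds (index, distance) pairs; the function name is made up for this example.

```python
def k_nearest(pairs, k):
    """Return the k (index, distance) pairs with the smallest distance."""
    return sorted(pairs, key=lambda pair: pair[1])[:k]


pairs = [(0, 2.5), (1, 0.5), (2, 1.0), (3, 4.0)]
print(k_nearest(pairs, 2))  # [(1, 0.5), (2, 1.0)]
```

Slicing also handles k larger than the list length without raising, which the explicit range(k) loop would not.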

gmichaeljaison · 2017-08-30T04:48:33Z

knn/knn.py

+    distance = []
+    neighbors = []
+    for i in range(len(train_pca)):
+        distance.append((train_pca[i], calculate_distance(train_pca[i], test_pca), train_labels[i]))


You can calculate the distance to every row of train_pca without a for loop. Sum along axis=1 so you get one distance per training row instead of a single scalar:
distance = np.sqrt(np.sum((train_mat - test_rec) ** 2, axis=1))

Can you explain?
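
The broadcasting behind the vectorized form can be sketched on a tiny example. train_mat and test_rec follow the names in the comment; the values are made up. The sum needs axis=1 to produce one distance per training row.

```python
import numpy as np

train_mat = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]])  # one record per row
test_rec = np.array([0.0, 0.0])                             # a single test record

# Broadcasting subtracts test_rec from every row; axis=1 sums within each row.
distance = np.sqrt(np.sum((train_mat - test_rec) ** 2, axis=1))
print(distance)  # [0. 5. 1.]

# Indices of the k closest training rows.
k = 2
nearest = np.argsort(distance)[:k]
print(nearest)  # [0 2]
```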

kNN for image classification using PCA

5f19dcd

gmichaeljaison commented Aug 30, 2017

kNN for text classification using cosine similarity

0a935b1

2 participants