
Chapter 8 examples, why do they all turn label into one_hot_labels #63

Open
qiulang opened this issue Jan 6, 2023 · 3 comments

qiulang commented Jan 6, 2023

Hi, I don't understand the following code in Chapter 8. Why does it turn the original labels, a (1000,) array, into a (1000, 10) two-dimensional array? What is the purpose of doing that?

Can someone shed some light on it? Thanks a lot.

images, labels = (x_train[0:1000].reshape(1000,28*28) / 255, y_train[0:1000])

one_hot_labels = np.zeros((len(labels),10))
for i,l in enumerate(labels):
    one_hot_labels[i][l] = 1
labels = one_hot_labels

marshallxxx commented Jan 8, 2023

It converts an integer into an array of zeros with a single 1, so the digit 2 is converted to [0,0,1,0,0,0,0,0,0,0].

The network's final prediction has the same shape, an array of size 10, so each digit gets its own prediction score.

Hopefully this helps.
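A minimal, self-contained sketch of the conversion (using three made-up labels rather than the chapter's MNIST data):

```python
import numpy as np

# Hypothetical small example: convert integer labels to one-hot rows.
labels = np.array([2, 0, 9])

one_hot = np.zeros((len(labels), 10))
for i, l in enumerate(labels):
    one_hot[i][l] = 1  # set the column matching the digit to 1

print(one_hot[0])  # [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
```

Each row now has the same shape as the network's 10-way output, so the error `labels[i] - layer_2` is a simple element-wise subtraction.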


qiulang commented Feb 1, 2023

@marshallxxx Three weeks later I still feel uncomfortable with that code, particularly this part:

for i in range(len(images)):
    layer_0 = images[i:i+1] # why not change to layer_0 = images[i]
    layer_1 = relu(np.dot(layer_0,weights_0_1))
    layer_2 = np.dot(layer_1,weights_1_2)
    error += np.sum((labels[i:i+1] - layer_2) ** 2)  # then we use labels[i]
    ...

If we change it to layer_0 = images[i], i.e. from a 2-D array to a 1-D array, then we need fewer matrix transposition operations and the implementation seems easier to understand.

So what is your opinion of using layer_0 = images[i:i+1] there?

for i in range(len(images)):
    layer_0 = images[i]
    layer_1 = relu(np.dot(layer_0,weights_0_1))
    layer_2 = np.dot(layer_1,weights_1_2)
    error += np.sum((labels[i] - layer_2) ** 2)
    ...

Thanks
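For reference, the two indexing styles really do produce different shapes. A quick sketch (using a zero-filled placeholder matrix instead of the chapter's x_train):

```python
import numpy as np

# Placeholder for the chapter's 1000 flattened 28x28 images.
images = np.zeros((1000, 784))

row_2d = images[0:1]  # slicing keeps the row as a 2-D matrix
row_1d = images[0]    # integer indexing drops to a 1-D vector

print(row_2d.shape)  # (1, 784)
print(row_1d.shape)  # (784,)
```

The forward pass works with either shape, but the (1, 784) row matters later for the weight updates, as discussed below.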


a-lobanova commented May 10, 2024

The matrices need compatible shapes for multiplication, and the transposes help adjust those shapes. If we feed in an input vector of shape (784,), we can still multiply layer_0 with weights_0_1, then multiply the result with weights_1_2, and compute layer_2_delta and layer_1_delta. The problem is the weight updates for weights_1_2 and weights_0_1, which need the transposes. With a hidden size of 10, layer_1.T.dot(layer_2_delta) on 1-D arrays is a dot product of two (10,) vectors, which returns a scalar instead of the (10, 10) update matrix. And weights_0_1 -= layer_0.T.dot(layer_1_delta) fails outright, because layer_0 has shape (784,) and layer_1_delta has shape (10,), which are not aligned for multiplication.
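A runnable sketch of both cases (with hypothetical constant-filled arrays, 784 inputs and hidden size 10 as above):

```python
import numpy as np

# 2-D row matrices, as in the book's images[i:i+1] convention.
layer_0_2d = np.full((1, 784), 0.5)
layer_1_delta_2d = np.ones((1, 10))

# .T turns (1, 784) into (784, 1), so the dot product yields
# the (784, 10) gradient matrix we need for weights_0_1.
grad = layer_0_2d.T.dot(layer_1_delta_2d)
print(grad.shape)  # (784, 10)

# With 1-D vectors, .T is a no-op on shape (784,), and dot()
# against a (10,) vector raises a shape-mismatch ValueError.
layer_0_1d = np.full(784, 0.5)
layer_1_delta_1d = np.ones(10)
try:
    layer_0_1d.T.dot(layer_1_delta_1d)
    failed = False
except ValueError:
    failed = True
print("1-D update failed:", failed)
```

np.outer(layer_0_1d, layer_1_delta_1d) would produce the (784, 10) matrix for the 1-D case, but the book's row-matrix convention keeps a single code path that also works unchanged for mini-batches.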
