fixed typos from issues #46, #49, #42 #57

Open · wants to merge 3 commits into base: master
4 changes: 2 additions & 2 deletions _book/convnets.md
@@ -112,7 +112,7 @@ Prior to this chapter, we've just looked at _fully-connected layers_, in which e

{% include figure_multi.md path1="/images/figures/weights_analogy_2.png" caption1="Weights analogy" %}

- We can interpret the set of weights as a _feature detector_ which is trying to detect the presence of a particular feature. We can visualize these feature detectors, as we did previously for MNIST and CIFAR. In a 1-layer fully-connected layer, the "features" are simply the the image classes themselves, and thus the weights appear as templates for the entire classes.
+ We can interpret the set of weights as a _feature detector_ which is trying to detect the presence of a particular feature. We can visualize these feature detectors, as we did previously for MNIST and CIFAR. In a 1-layer fully-connected layer, the "features" are simply the image classes themselves, and thus the weights appear as templates for the entire classes.

In convolutional layers, we instead have a collection of smaller feature detectors called _convolutional filters_ which we individually slide along the entire image and perform the same weighted sum operation as before, on each subregion of the image. Essentially, for each of these small filters, we generate a map of responses--called an _activation map_--which indicate the presence of that feature across the image.
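
As a rough sketch of the operation described above, the following NumPy snippet slides a single filter over a grayscale image and builds the map of responses; the function name `activation_map` and the hand-written edge filter are made up here purely for illustration.

```python
import numpy as np

def activation_map(image, filt):
    """Slide one convolutional filter across a grayscale image and
    return the map of responses (stride 1, no padding)."""
    ih, iw = image.shape
    fh, fw = filt.shape
    out = np.zeros((ih - fh + 1, iw - fw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + fh, x:x + fw]
            # same weighted-sum operation as a fully-connected unit,
            # but applied to one small subregion of the image
            out[y, x] = np.sum(patch * filt)
    return out

# example: a 3x3 vertical-edge filter applied to a random 8x8 "image"
image = np.random.rand(8, 8)
filt = np.array([[1.0, 0.0, -1.0],
                 [1.0, 0.0, -1.0],
                 [1.0, 0.0, -1.0]])
print(activation_map(image, filt).shape)  # -> (6, 6)
```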

@@ -216,7 +216,7 @@ Generative applications of convnets, including those in the image domain and ass

{% include further_reading.md title="Conv Nets: A Modular Perspective" author="Chris Olah" link="https://colah.github.io/posts/2014-07-Conv-Nets-Modular/" %}

{% include further_reading.md title="Visualizing what ConvNets learn (Stanford CS231n" author="Andrej Karpathy" link="https://cs231n.github.io/understanding-cnn/" %}
{% include further_reading.md title="Visualizing what ConvNets learn (Stanford CS231n)" author="Andrej Karpathy" link="https://cs231n.github.io/understanding-cnn/" %}

{% include further_reading.md title="How do Convolutional Neural Networks work?" author="Brandon Rohrer" link="https://brohrer.github.io/how_convolutional_neural_networks_work.html" %}

2 changes: 1 addition & 1 deletion _book/how_neural_networks_are_trained.md
@@ -167,7 +167,7 @@ So how do we actually calculate where that point at the bottom is exactly? There

## The curse of nonlinearity

- Alas, ordinary least squares cannot be used to optimize neural networks however, and so solving the above linear regression will be left as an exercise left to the reader. The reason we cannot use linear regression is that neural networks are nonlinear; Recall the essential difference between the linear equations we posed and a neural network is the presence of the activation function (e.g. sigmoid, tanh, ReLU, or others). Thus, whereas the linear equation above is simply $$y = b + W^\top X$$, a 1-layer neural network with a sigmoid activation function would be $$f(x) = \sigma (b + W^\top X) $$.
+ Alas, ordinary least squares cannot be used to optimize neural networks however, and so solving the above linear regression will be left as an exercise to the reader. The reason we cannot use linear regression is that neural networks are nonlinear; Recall that the essential difference between the linear equations we posed and a neural network is the presence of the activation function (e.g. sigmoid, tanh, ReLU, or others). Thus, whereas the linear equation above is simply $$y = b + W^\top X$$, a 1-layer neural network with a sigmoid activation function would be $$f(x) = \sigma (b + W^\top X) $$.
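
To make the comparison concrete, here is a minimal NumPy sketch (variable names chosen arbitrarily for illustration) that evaluates both the linear model $$y = b + W^\top X$$ and the 1-layer network $$f(x) = \sigma (b + W^\top X)$$ on a single input:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=3)   # one input with 3 features
W = rng.normal(size=3)   # weights
b = 0.5                  # bias

y_linear = b + W @ X             # linear model:  y    = b + W^T X
y_net    = sigmoid(b + W @ X)    # 1-layer net:   f(x) = sigma(b + W^T X)
print(y_linear, y_net)
```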

This nonlinearity means that the parameters do not act independently of each other in influencing the shape of the loss function. Rather than having a bowl shape, the loss function of a neural network is more complicated. It is bumpy and full of hills and troughs. The property of being "bowl-shaped" is called [convexity](https://en.wikipedia.org/wiki/Convex_function), and it is a highly prized convenience in multi-parameter optimization. A convex loss function ensures we have a global minimum (the bottom of the bowl), and that all roads downhill lead to it.
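
One way to see why convexity matters: gradient descent on a convex one-parameter loss reaches the same minimum from every starting point, whereas on a non-convex loss different starting points can settle into different troughs. A minimal sketch, using example functions chosen only for illustration:

```python
def gradient_descent(grad, w0, lr=0.05, steps=500):
    """Plain gradient descent on a one-parameter function,
    starting from w0 and following the negative gradient."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# convex "bowl": f(w) = w^2 -> every start reaches the same minimum (w = 0)
for start in (-4.0, 0.5, 6.0):
    print("convex, start", start, "->", round(gradient_descent(lambda w: 2 * w, start), 4))

# non-convex: f(w) = w^4 - 3w^2 + w -> different starts land in different troughs
for start in (-2.0, 2.0):
    print("non-convex, start", start, "->",
          round(gradient_descent(lambda w: 4 * w**3 - 6 * w + 1, start), 4))
```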
