Rename hyperparameter tuning to refine the model #380

Merged (2 commits) on Nov 1, 2023
14 changes: 10 additions & 4 deletions episodes/1-introduction.Rmd
@@ -38,7 +38,7 @@
The image below shows some differences between Artificial Intelligence, Machine Learning and Deep Learning.


![](../fig/01_AI_ML_DL_differences.png){alt='An infographics showing the relation of AI, ML, NN and DL. NN are methods in DL which is a subset of ML algorithms that falls within the umbrella of AI'}


The image above is by Tukijaaliwa, CC BY-SA 4.0, via Wikimedia Commons, [original source](https://en.wikipedia.org/wiki/File:AI-ML-DL.svg)

@@ -59,13 +59,13 @@
- one example equation to calculate the output for a neuron is: $output = ReLU(\sum_{i} (x_i*w_i) + bias)$
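The example equation above can be written as a short Python sketch (the input, weight, and bias values below are invented for illustration):

```python
def relu(x):
    # ReLU returns 0 for inputs at or below 0 and the input itself otherwise
    return max(0.0, x)

def neuron_output(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias, passed through the activation
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return relu(weighted_sum)

print(neuron_output([0.5, -1.0], [2.0, 1.0], 0.5))  # 0.5
```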


![](../fig/01_neuron.png){alt='A diagram of a single artificial neuron combining inputs and weights using an activation function.' width='600px'}


##### Combining multiple neurons into a network
Multiple neurons can be joined together by connecting the output of one to the input of another. These connections are associated with weights that determine the 'strength' of the connection; the weights are adjusted during training. In this way, the combination of neurons and connections describes a computational graph; an example can be seen in the image below. In most neural networks neurons are aggregated into layers. Signals travel from the input layer to the output layer, possibly through one or more intermediate layers called hidden layers.
The image below shows an example of a neural network with three layers: each circle is a neuron, each line is an edge, and the arrows indicate the direction data moves in.

![](../fig/01_neural_net.png){alt='A diagram of a three layer neural network with an input layer, one hidden layer, and an output layer.'}

The image above is by Glosser.ca, CC BY-SA 3.0 <https://creativecommons.org/licenses/by-sa/3.0>, via Wikimedia Commons, [original source](https://commons.wikimedia.org/wiki/File:Colored_neural_network.svg)
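The layered structure described above amounts to repeated matrix multiplications. A minimal sketch with randomly chosen weights (the layer sizes are arbitrary, not those of the pictured network):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

rng = np.random.default_rng(42)
# 4 inputs -> 5 hidden neurons -> 2 outputs, with random (untrained) weights
w_hidden = rng.normal(size=(4, 5))
w_output = rng.normal(size=(5, 2))

x = np.array([0.1, 0.2, 0.3, 0.4])
hidden = relu(x @ w_hidden)   # input layer -> hidden layer
output = hidden @ w_output    # hidden layer -> output layer
print(output.shape)  # (2,)
```

Training would adjust `w_hidden` and `w_output`; here they stay random, so only the shapes are meaningful.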

::: challenge
@@ -88,7 +88,7 @@

Have a look at the following network:

![](../fig/01_xor_exercise.png){alt='A diagram of a neural network with 2 inputs, 2 hidden layer neurons, and 1 output.' width='400px'}


a. Calculate the output of the network for the following combinations of inputs:

@@ -127,13 +127,13 @@
## Activation functions
Look at the following activation functions:

![](../fig/01_sigmoid.svg){alt='Plot of the sigmoid function' width='200px'}

A. Sigmoid activation function

![](../fig/01_relu.svg){alt='Plot of the ReLU function' width='200px'}

B. ReLU activation function

![](../fig/01_identity_function.svg){alt='Plot of the Identity function' width='200px'}

C. Identity (or linear) activation function

Match the following statements to the correct activation function:
@@ -172,7 +172,7 @@
The input (leftmost) layer of the network is an image and the final (rightmost) layer of the network outputs a zero or one to determine if the input data belongs to the class of data we are interested in.
This image is from the paper ["An Efficient Pedestrian Detection Method Based on YOLOv2" by Zhongmin Liu, Zhicai Chen, Zhanming Li, and Wenjin Hu published in Mathematical Problems in Engineering, Volume 2018](https://doi.org/10.1155/2018/3518959)

![](../fig/01_deep_network.png){alt='An example of a deep neural network'}


### How do neural networks learn?
What happens in a neural network during the training process?
@@ -207,7 +207,7 @@
Below you see the Huber loss (green, delta = 1) and Squared error loss (blue)
as a function of `y_true - y_pred`.

![](../fig/01_huber_loss.png){alt='Huber loss (green, delta = 1) and squared error loss (blue) as a function of y_true - y_pred' width='400px'}
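The two losses can also be checked numerically (a sketch; `huber` follows the standard piecewise definition with delta = 1):

```python
import numpy as np

def squared_error(diff):
    return diff ** 2

def huber(diff, delta=1.0):
    # Quadratic near zero, linear beyond +/- delta
    abs_diff = np.abs(diff)
    return np.where(abs_diff <= delta,
                    0.5 * diff ** 2,
                    delta * (abs_diff - 0.5 * delta))

diff = np.array([0.5, 2.0, 10.0])  # y_true - y_pred
print(squared_error(diff))  # 0.25, 4.0, 100.0
print(huber(diff))          # 0.125, 1.5, 9.5
```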

Which loss function is more sensitive to outliers?
@@ -325,7 +325,10 @@

Many datasets are not ready for immediate use in a neural network and will require some preparation. Neural networks can only really deal with numerical data, so any non-numerical data (for example words) will have to be somehow converted to numerical data.
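One common conversion for categorical words is one-hot encoding, sketched here with pandas (the column and category names are invented):

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})
# Each category becomes its own 0/1 column that a network can consume
encoded = pd.get_dummies(df, columns=["color"])
print(encoded.shape)  # (4, 3)
```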

Next we will need to divide the data into multiple sets. One of these will be used by the training process and we will call it the training set. Another will be used to evaluate the accuracy of the training and we will call that one the test set. Sometimes we will also use a 3rd set known as a validation set to tune hyperparameters.
Next we will need to divide the data into multiple sets.
One of these will be used by the training process and we will call it the training set.
Another will be used to evaluate the accuracy of the training and we will call that one the test set.
Sometimes we will also use a 3rd set known as a validation set to refine the model.
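A sketch of such a split using scikit-learn's `train_test_split` (the 70/15/15 proportions are an arbitrary choice):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(100, 1)
y = np.arange(100)

# First carve off 30% of the data, then split that half-and-half
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.3, random_state=0)
X_test, X_val, y_test, y_val = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

print(len(X_train), len(X_test), len(X_val))  # 70 15 15
```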

### 4. Choose a pre-trained model or build a new architecture from scratch

@@ -345,7 +348,7 @@

We can now go ahead and start training our neural network. We will probably keep doing this for a given number of iterations through our training dataset (referred to as _epochs_) or until the loss function gives a value under a certain threshold. The graph below shows the loss against the number of _epochs_; generally the loss will go down with each _epoch_, but occasionally it will rise slightly.

![](../fig/training-0_to_1500.svg){alt='A graph showing an exponentially decreasing loss over the first 1500 epochs of training an example network.'}
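The way the loss falls over epochs can be illustrated with a tiny hand-written gradient-descent loop (a toy one-weight model, not the network that produced the figure):

```python
# Fit a single weight w so that w * x approximates y = 2 * x
data = [(x, 2.0 * x) for x in range(1, 5)]
w = 0.0
learning_rate = 0.01

losses = []
for epoch in range(100):
    # Mean squared error over the data, then the gradient with respect to w
    loss = sum((w * x - y) ** 2 for x, y in data) / len(data)
    losses.append(loss)
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= learning_rate * grad

print(round(w, 3))  # 2.0 -- and losses[0] is far larger than losses[-1]
```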


### 7. Perform a Prediction/Classification

@@ -358,9 +361,12 @@

Once we have trained the network we want to measure its performance. To do this we use some additional data that was not part of the training; this is known as a test set. There are many different methods available for measuring performance and which one is best depends on the type of task we are attempting. These metrics are often published as an indication of how well our network performs.
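A sketch of computing performance metrics on a test set with scikit-learn (the label values here are invented):

```python
from sklearn.metrics import accuracy_score, confusion_matrix

y_test = [0, 1, 1, 0, 1, 0]   # true labels from the test set
y_pred = [0, 1, 0, 0, 1, 1]   # what the trained network predicted

print(accuracy_score(y_test, y_pred))  # 4 of 6 correct: 0.666...
print(confusion_matrix(y_test, y_pred))
```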

### 9. Tune Hyperparameters
### 9. Refine the model

Hyperparameters are all the parameters set by the person configuring the machine learning instead of those learned by the algorithm itself. The hyperparameters include the number of epochs or the parameters for the optimizer. It might be necessary to adjust these and re-run the training many times before we are happy with the result.
We can refine the model further, for example by slightly changing its architecture or the number of nodes in a layer.
Hyperparameters are all the parameters set by the person configuring the machine learning instead of those learned by the algorithm itself.
The hyperparameters include the number of epochs or the parameters for the optimizer.
It might be necessary to adjust these and re-run the training many times before we are happy with the result; this is often done automatically, which is referred to as hyperparameter tuning.
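At its simplest, automated hyperparameter tuning is a loop over candidate settings that keeps the best-scoring one. A sketch in which `train_and_score` is a made-up stand-in for a real training-plus-evaluation run:

```python
import itertools

def train_and_score(epochs, learning_rate):
    # Stand-in for an actual training run; a real version would fit the
    # network with these settings and return e.g. validation accuracy
    return 1.0 - abs(learning_rate - 0.01) - 0.001 * abs(epochs - 50)

grid = itertools.product([10, 50, 100], [0.1, 0.01, 0.001])
best = max(grid, key=lambda params: train_and_score(*params))
print(best)  # (50, 0.01)
```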

### 10. Share Model

@@ -452,7 +458,7 @@
- "Deep Learning is a machine learning technique based on using many artificial neurons arranged in layers."
- "Neural networks learn by minimizing a loss function."
- "Deep Learning is well suited to classification and prediction problems such as image recognition."
- "To use Deep Learning effectively we need to go through a workflow of: defining the problem, identifying inputs and outputs, preparing data, choosing the type of network, choosing a loss function, training the model, tuning Hyperparameters, measuring performance before we can classify data."
- "To use Deep Learning effectively we need to go through a workflow of: defining the problem, identifying inputs and outputs, preparing data, choosing the type of network, choosing a loss function, training the model, refining the model, and measuring performance before we can classify data."
- "Keras is a Deep Learning library that is easier to use than many of the alternatives such as TensorFlow and PyTorch."

::::::::::::::::::::::::::::::::::::::::::::::::
15 changes: 8 additions & 7 deletions episodes/2-keras.Rmd
@@ -40,7 +40,7 @@ As a reminder below are the steps of the deep learning workflow:
6. Train the model
7. Perform a Prediction/Classification
8. Measure performance
9. Tune hyperparameters
9. Refine the model
10. Save model

In this episode we will focus on a minimal example for each of these steps, later episodes will build on this knowledge to go into greater depth for some or all of these steps.
@@ -327,7 +327,7 @@ The instantiation here has 2 parameters and a seemingly strange combination of p
let us take a closer look.
The first parameter `10` is the number of neurons we want in this layer; this is one of the
hyperparameters of our system and needs to be chosen carefully. We will get back to this in the section
on refining the model.
The second parameter is the activation function to use, here we choose relu which is 0
for inputs that are 0 and below and the identity function (returning the same value)
for inputs above 0.
@@ -593,7 +593,7 @@ Length: 69, dtype: object
## 8. Measuring performance
Now that we have a trained neural network it is important to assess how well it performs.
We want to know how well it will perform in a realistic prediction scenario; measuring
performance will also come back when refining the model.
performance will also come back when refining the model.

We have created a test set (i.e. y_test) during the data preparation stage which we will use
now to create a confusion matrix.
@@ -667,18 +667,19 @@ We can try many things to improve the performance from here.
One of the first things we can try is to balance the dataset better.
Other options include changing the network architecture or changing the
training parameters.

Note that the outcome you have might be slightly different from what is shown in this tutorial.
dsmits marked this conversation as resolved.
::::
:::

## 9. Tune hyperparameters
## 9. Refine the model
As we discussed before the design and training of a neural network comes with
many hyper parameter choices.
We will go into more depth of these hyperparameters in later episodes.
many hyperparameter and model architecture choices.
We will go into more depth on these choices in later episodes.
For now it is important to realize that the parameters we chose were
somewhat arbitrary; more careful consideration needs to be given to
picking hyperparameter values.

Note that the outcome you have might be slightly different from what is shown in this tutorial.

## 10. Share model
It is very useful to be able to use the trained neural network at a later
2 changes: 1 addition & 1 deletion episodes/3-monitor-the-model.Rmd
@@ -505,7 +505,7 @@ randomly predicting a number, so the problem is not impossible to solve with mac
::::
:::

## 9. Tune hyperparameters
## 9. Refine the model

### Watch your model training closely

2 changes: 1 addition & 1 deletion episodes/4-advanced-layer-types.Rmd
@@ -465,7 +465,7 @@ As you can see the validation accuracy only reaches about 35%, whereas the CNN r
This demonstrates that convolutional layers are a big improvement over dense layers for this kind of dataset.
:::

## 9. Tune hyperparameters
## 9. Refine the model

::: challenge
## Network depth
4 changes: 2 additions & 2 deletions episodes/fig/graphviz/pipeline.dot
@@ -12,11 +12,11 @@ digraph {
train [label=<<B>Train</B><BR/>the model>]
predict [label=<<B>Perform</B><BR/>Prediction>]
quality [label=<<B>Measure</B><BR/>Performance>]
tune [label=<<B>Tune</B><BR/>Hyperparameters>]
refine [label=<<B>Refine</B><BR/>the model>]
share [label=<<B>Share</B><BR/>the model>]

#the graph
formulate -> i_o -> prepare
prepare -> create_model -> loss
loss -> train -> predict -> quality -> tune -> share
loss -> train -> predict -> quality -> refine -> share
}