Merge branch 'main' into image-paths-and-captions
svenvanderburg authored Nov 8, 2023
2 parents 1752301 + f1e061a commit 56eff9f
Showing 13 changed files with 302 additions and 170 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -6,7 +6,7 @@
This lesson gives an introduction to deep learning.

## Lesson Design
The design of this lesson can be found in the [lesson design](_extras/design.md)
The design of this lesson can be found in the [lesson design](https://carpentries-incubator.github.io/deep-learning-intro/design.html)

## Target Audience
The main audience of this carpentry lesson is PhD students that have little to no experience with
@@ -30,7 +30,7 @@ Please see the current list of
[issues](https://github.com/carpentries-incubator/deep-learning_intro/issues)
for ideas for contributing to this repository.

Please also familiarize yourself with the [lesson design](_extras/design.md)
Please also familiarize yourself with the [lesson design](https://carpentries-incubator.github.io/deep-learning-intro/design.html)

For making your contribution, we use the GitHub flow, which is nicely explained in the
chapter [Contributing to a Project](http://git-scm.com/book/en/v2/GitHub-Contributing-to-a-Project)
4 changes: 4 additions & 0 deletions episodes/1-introduction.Rmd
@@ -110,7 +110,11 @@ b. What logical problem does this network solve?

:::: solution
## Solution

#### 1: calculate the output for one neuron

You can calculate the output as follows:

* Weighted sum of input: `0 * (-1) + 0.5 * (-0.5) + 1 * 0.5 = 0.25`
* Add the bias: `0.25 + 1 = 1.25`
* Apply activation function: `max(1.25, 0) = 1.25`
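
The same calculation as a short Python sketch (the inputs, weights, and bias are taken from the exercise above; the variable names are just for illustration):

```python
# Inputs, weights, and bias from the exercise
inputs = [0, 0.5, 1]
weights = [-1, -0.5, 0.5]
bias = 1

weighted_sum = sum(x * w for x, w in zip(inputs, weights))  # 0.25
output = max(weighted_sum + bias, 0)  # ReLU activation: max(1.25, 0)
print(output)  # 1.25
```
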
52 changes: 40 additions & 12 deletions episodes/2-keras.Rmd
@@ -45,6 +45,14 @@ As a reminder below are the steps of the deep learning workflow:

In this episode we will focus on a minimal example for each of these steps; later episodes will build on this knowledge to go into greater depth for some or all of these steps.

::: instructor
This episode really aims to go through the whole process once, as quickly as possible.
In episode 3 we will expand on all the concepts that are lightly introduced in episode 2. Some concepts, like monitoring the training progress, optimization, and the learning rate, are explained in detail in episode 3.
It is good to stress this a few times, because learners will usually have a lot of questions like:
'Why don't we normalize our features?' or 'Why do we choose the Adam optimizer?'.
It can be a good idea to park some of these questions for discussion in episodes 3 and 4.
:::

::: callout
## GPU usage
For this lesson having a GPU (graphics card) available is not needed.
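
If you are curious whether TensorFlow can see a GPU on your machine, a quick check looks roughly like this (a sketch, not part of the lesson):

```python
import tensorflow as tf

# An empty list means TensorFlow will run on the CPU, which is fine for this lesson
print(tf.config.list_physical_devices('GPU'))
```
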
@@ -202,7 +210,7 @@ penguins_filtered = penguins_filtered.dropna()
Finally, we select only the features
```python
# Extract columns corresponding to features
penguins_features = penguins_filtered.drop(columns=['species'])
features = penguins_filtered.drop(columns=['species'])
```

### Prepare target data for training
@@ -236,7 +244,7 @@ How many output neurons will our network have now that we one-hot encoded the ta

:::: solution
## Solution
3, one for each output variable class
C: 3, one for each output variable class

::::
:::
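
One way the one-hot encoding itself could be done with pandas (a sketch; the exact call used in the lesson may differ):

```python
import pandas as pd

# One binary column per penguin species; three classes -> three output neurons
target = pd.get_dummies(penguins_filtered['species'])
print(target.head())
```
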
@@ -254,7 +262,7 @@ For this episode we will keep it at just a training and test set however.
To split the cleaned dataset into a training and test set we will use a very convenient
function from sklearn called `train_test_split`.
This function takes a number of parameters which are extensively explained [here](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) :

- The first two parameters are the dataset (in our case penguins_features) and the corresponding targets (i.e. defined as target).
- The first two parameters are the dataset (in our case features) and the corresponding targets (i.e. defined as target).
- Next is the named parameter `test_size` this is the fraction of the dataset that is
used for testing, in this case `0.2` means 20% of the data will be used for testing.
- `random_state` controls the shuffling of the dataset, setting this value will reproduce
@@ -265,7 +273,7 @@ the same results (assuming you give the same integer) every time it is called.
```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(penguins_features, target,test_size=0.2, random_state=0, shuffle=True, stratify=target)
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=0, shuffle=True, stratify=target)
```
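
A quick sanity check of the split (an optional sketch; it assumes `target` holds the one-hot encoded labels as above):

```python
print(X_train.shape, X_test.shape)  # roughly an 80/20 split of the rows

# Because of stratify=target, the class proportions should be
# nearly identical in the train and test sets
print(y_train.mean())
print(y_test.mean())
```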

## 4. Build an architecture from scratch or choose a pretrained model
@@ -389,11 +397,8 @@ where each layer has **exactly one input tensor and one output tensor**.

:::: solution
## Solution
Have a look at the output of `model.summary()`:
```python
inputs = keras.Input(shape=X_train.shape[1])
hidden_layer = keras.layers.Dense(10, activation="relu")(inputs)
output_layer = keras.layers.Dense(3, activation="softmax")(hidden_layer)
model = keras.Model(inputs=inputs, outputs=output_layer)
model.summary()
```

@@ -414,10 +419,14 @@ Non-trainable params: 0
_________________________________________________________________
```
The model has 83 trainable parameters.

If you increase the number of neurons in the hidden layer the number of
trainable parameters in both the hidden and output layer increases or
decreases accordingly of neurons.
The name in quotes within the string `Model: "model_1"` may be different in your view; this detail is not important.
decreases in accordance with the number of neurons added.
Each extra neuron has 4 weights connected to the input layer, 1 bias term, and 3 weights connected to the output layer.
So in total 8 extra parameters.

*The name in quotes within the string `Model: "model_1"` may be different in your view; this detail is not important.*
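
As a quick cross-check, the parameter count can also be computed by hand (a small sketch, not part of the original solution):

```python
n_inputs, n_hidden, n_outputs = 4, 10, 3

hidden_params = (n_inputs + 1) * n_hidden   # 4 weights + 1 bias per hidden neuron: 50
output_params = (n_hidden + 1) * n_outputs  # 10 weights + 1 bias per output neuron: 33
print(hidden_params + output_params)        # 83 trainable parameters
```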

#### (optional) Keras Sequential vs Functional API
3. This implements the same model using the Sequential API:
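
The code itself is collapsed in this diff view; a sketch of what such a Sequential definition could look like, assuming the same layer sizes as above:

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(X_train.shape[1],)),
    keras.layers.Dense(10, activation="relu"),
    keras.layers.Dense(3, activation="softmax"),
])
model.summary()
```
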
@@ -524,11 +533,30 @@ Looking at the training curve we have just made.
* Does the graph look very jittery?
2. Do you think the resulting trained network will work well on the test set?

When the training process does not go well:

3. (optional) Something went wrong here during training. What could be the problem, and how do you see that in the training curve?
Also compare the range on the y-axis with the previous training curve.
![](../fig/02_bad_training_history_1.png){alt='Very jittery training curve with the loss value jumping back and forth between 2 and 4. The range of the y-axis is from 2 to 4, whereas in the previous training curve it was from 0 to 2. The loss seems to decrease a little bit, but not as much as in the previous plot, where it dropped to almost 0. The minimum loss in the end is somewhere around 2.'}


:::: solution
## Solution
1. The loss curve should drop quite quickly in a smooth line with little jitter
1. The training loss decreases quickly. It drops in a smooth line with little jitter.
This is ideal for a training curve.
2. The results of the training give very little information on its performance on a test set.
You should be careful not to use it as an indication of a well trained network.
3. (optional) The loss does not go down at all, or only very slightly. This means that the model is not learning anything.
It could be that something went wrong in the data preparation (for example the labels are not attached to the right features).
In addition, the graph is very jittery. This means that for every update step,
the weights in the network are updated in such a way that the loss sometimes increases a lot and sometimes decreases a lot.
This could indicate that the weights are updated too much at every learning step and you need a smaller learning rate
(we will go into more details on this in the next episode).
Or there is a high variation in the data, leading the optimizer to change the weights in different directions at every learning step.
This could be addressed by presenting more data at every learning step (or in other words increasing the batch size).
In this case the graph was created by training on nonsense data, so this is a training curve for a problem where nothing can really be learned.

We will take a closer look at training curves in the next episode. Some of the concepts touched upon here will also be further explained there.

::::
:::
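
The two remedies mentioned in the solution, a smaller learning rate and a larger batch size, would look roughly like this in Keras (a sketch; the values are arbitrary and not from the lesson):

```python
from tensorflow import keras

# Smaller learning rate: weight updates become less aggressive
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
              loss=keras.losses.CategoricalCrossentropy())

# Larger batch size: each update averages the loss over more samples
history = model.fit(X_train, y_train, epochs=100, batch_size=64)
```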

@@ -736,7 +764,7 @@ Length: 69, dtype: object
[sex_pairplot]: fig/02_sex_pairplot.png "Pair plot grouped by sex"
{alt='Pair plot showing the separability of the two sexes of penguin for combinations of dataset attributes'}

[training_curve]: fig/training_curve.png "Training Curve"
[training_curve]: fig/02_training_curve.png "Training Curve"
{alt='Training loss curve of the neural network training which depicts exponential decrease in loss before a plateau from ~10 epochs'}

[confusion_matrix]: fig/confusion_matrix.png "Confusion Matrix"
30 changes: 20 additions & 10 deletions episodes/3-monitor-the-model.Rmd
@@ -23,6 +23,19 @@ exercises: 80
- "Implement basic strategies to prevent overfitting"
:::

::: instructor
## Copy-pasting code
In this episode we first introduce a simple approach to the problem,
then we iterate on it a few times, step by step,
working towards a more complex solution.
Unfortunately this involves reusing the same code over and over again,
only slightly adapting it.

To avoid too much typing, it can help to copy-paste code from higher up in the notebook.
Be sure to make it clear where you are copying from
and what you are actually changing in the copied code.
It can for example help to add a comment to the lines that you added.
:::

In this episode we will explore how to monitor the training progress, evaluate the model predictions, and fine-tune the model to avoid over-fitting. For that we will use a more complicated weather dataset.

@@ -281,10 +294,10 @@ Answer the following questions:
We want to move towards the global minimum, so in the opposite direction of the gradient.

3. Correct answer: B & D
- A. The number of samples in an epoch also increases (incorrect, an epoch is always defined as passing through the training data for one cycle)
- B. The number of batches in an epoch goes down (correct, the number of batches is the samples in an epoch divided by the batch size)
- C. The training progress is more jumpy, because more samples are consulted in each update step (one batch). (incorrect, more samples are consulted in each update step, but this makes the progress less jumpy since you get a more accurate estimate of the loss in the entire dataset)
- D. The memory load (memory as in computer hardware) of the training process is increased (correct, the data is being loaded one batch at a time, so more samples per batch means more memory usage)
- A. The number of samples in an epoch also increases (**incorrect**, an epoch is always defined as passing through the training data for one cycle)
- B. The number of batches in an epoch goes down (**correct**, the number of batches is the samples in an epoch divided by the batch size)
- C. The training progress is more jumpy, because more samples are consulted in each update step (one batch). (**incorrect**, more samples are consulted in each update step, but this makes the progress less jumpy since you get a more accurate estimate of the loss in the entire dataset)
- D. The memory load (memory as in computer hardware) of the training process is increased (**correct**, the data is being loaded one batch at a time, so more samples per batch means more memory usage)

::::
:::
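
To make the relation between batch size and the number of batches per epoch concrete, here is a tiny illustrative calculation (the numbers are made up, not from the lesson):

```python
import math

n_samples = 1000  # hypothetical size of the training set
for batch_size in (16, 32, 64):
    n_batches = math.ceil(n_samples / batch_size)  # batches needed for one epoch
    print(batch_size, n_batches)  # larger batches -> fewer update steps per epoch
```
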
@@ -428,7 +441,8 @@ plot_predictions(y_test_predicted, y_test, title='Predictions on the test set')
## Solution
While the performance on the train set seems reasonable, the performance on the test set is much worse.
This is a common problem called **overfitting**, which we will discuss in more detail later.
Optional exercise:

#### Optional exercise:
The metric that we are using (RMSE) would be a good one. You could also consider Mean Squared Error, which punishes large errors more (because large errors create even larger squared errors).
It is important that if the model improves in performance on the basis of this metric then that should also lead you a step closer to reaching your goal: to predict tomorrow's sunshine hours.
If you feel that improving the metric does not lead you closer to your goal, then it would be better to choose a different metric
@@ -702,10 +716,6 @@ An alternative, more common approach, is to add **BatchNormalization** layers ([
Similar to dropout, batch normalization is available as a network layer in Keras and can be added to the network in a similar way.
It does not require any additional parameter setting.

```python
from tensorflow.keras.layers import BatchNormalization
```

The `BatchNormalization` can be inserted as yet another layer into the architecture.

```python
@@ -714,7 +724,7 @@
inputs = keras.layers.Input(shape=(X_data.shape[1],), name='input')

# Dense layers
layers_dense = keras.layers.BatchNormalization()(inputs)
layers_dense = keras.layers.BatchNormalization()(inputs) # This is new!
layers_dense = keras.layers.Dense(100, 'relu')(layers_dense)
layers_dense = keras.layers.Dense(50, 'relu')(layers_dense)
