fix image file paths and captions (#426)
Co-authored-by: Sven van der Burg <[email protected]>
tobyhodges and svenvanderburg authored Nov 8, 2023
1 parent f1e061a commit 512a5c8
Showing 5 changed files with 64 additions and 57 deletions.
38 changes: 22 additions & 16 deletions episodes/1-introduction.Rmd
@@ -38,9 +38,12 @@ Deep Learning (DL) is just one of many techniques collectively known as machine
The image below shows some differences between artificial intelligence, Machine Learning and Deep Learning.


![](../fig/01_AI_ML_DL_differences.png){alt='An infographics showing the relation of AI, ML, NN and DL. NN are methods in DL which is a subset of ML algorithms that falls within the umbrella of AI'}

The image above is by Tukijaaliwa, CC BY-SA 4.0, via Wikimedia Commons, [original source]( https://en.wikipedia.org/wiki/File:AI-ML-DL.svg)
![
Image credit: Tukijaaliwa, CC BY-SA 4.0, via Wikimedia Commons,
[original source]( https://en.wikipedia.org/wiki/File:AI-ML-DL.svg)
](fig/01_AI_ML_DL_differences.png){
alt='An infographic showing the relation of AI, ML, NN and DL. NN are methods in DL which is a subset of ML algorithms that falls within the umbrella of AI'
}


#### Neural Networks
@@ -59,14 +62,18 @@ A neural network consists of connected computational units called **neurons**. E
- one example equation to calculate the output for a neuron is: $output = ReLU(\sum_{i} (x_i*w_i) + bias)$


![](../fig/01_neuron.png){alt='A diagram of a single artificial neuron combining inputs and weights using an activation function.' width='600px'}
![](fig/01_neuron.png){alt='A diagram of a single artificial neuron combining inputs and weights using an activation function.' width='600px'}
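
As an illustration, here is a minimal Python sketch of this formula; the input, weight, and bias values below are made up purely for demonstration.

```python
# A single neuron: output = ReLU(sum_i(x_i * w_i) + bias), with made-up values.
def relu(z):
    return max(0.0, z)

inputs = [0.5, -1.2, 3.0]   # x_i
weights = [0.8, 0.2, -0.5]  # w_i
bias = 0.1

weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
output = relu(weighted_sum)
print(weighted_sum, output)  # approximately -1.24 and 0.0 (ReLU clips negative values)
```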

##### Combining multiple neurons into a network
Multiple neurons can be joined together by connecting the output of one to the input of another. These connections are associated with weights that determine the 'strength' of the connection; the weights are adjusted during training. In this way, the combination of neurons and connections describes a computational graph, an example of which can be seen in the image below. In most neural networks neurons are aggregated into layers. Signals travel from the input layer to the output layer, possibly through one or more intermediate layers called hidden layers.
The image below shows an example of a neural network with three layers: each circle is a neuron, each line is an edge, and the arrows indicate the direction in which data moves.

![](../fig/01_neural_net.png){alt='A diagram of a three layer neural network with an input layer, one hidden layer, and an output layer.'}
The image above is by Glosser.ca, CC BY-SA 3.0 <https://creativecommons.org/licenses/by-sa/3.0>, via Wikimedia Commons, [original source](https://commons.wikimedia.org/wiki/File:Colored_neural_network.svg)
![
Image credit: Glosser.ca, CC BY-SA 3.0 <https://creativecommons.org/licenses/by-sa/3.0>, via Wikimedia Commons,
[original source](https://commons.wikimedia.org/wiki/File:Colored_neural_network.svg)
](fig/01_neural_net.png){
alt='A diagram of a three layer neural network with an input layer, one hidden layer, and an output layer.'
}
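
To see how such a computational graph is evaluated, here is a small NumPy sketch of a forward pass through a network with the same shape as the one pictured (3 inputs, 4 hidden neurons, 2 outputs); the input values and weights are random, purely for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

x = np.array([1.0, 0.5, -0.5])   # input layer (3 values)
W1 = np.random.randn(3, 4)       # weights from input to hidden layer (4 neurons)
b1 = np.zeros(4)                 # biases of the hidden layer
W2 = np.random.randn(4, 2)       # weights from hidden to output layer (2 neurons)
b2 = np.zeros(2)                 # biases of the output layer

hidden = relu(x @ W1 + b1)       # hidden layer activations
output = hidden @ W2 + b2        # output layer values (no activation applied here)
print(output)
```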

::: challenge
## Neural network calculations
@@ -88,7 +95,7 @@ _Note: You can use whatever you like: brain only, pen&paper, Python, Excel..._

Have a look at the following network:

![](../fig/01_xor_exercise.png){alt='A diagram of a neural network with 2 inputs, 2 hidden layer neurons, and 1 output.' width='400px'}
![](fig/01_xor_exercise.png){alt='A diagram of a neural network with 2 inputs, 2 hidden layer neurons, and 1 output.' width='400px'}

a. Calculate the output of the network for the following combinations of inputs:

@@ -131,14 +138,13 @@ b. This solves the XOR logical problem, the output is 1 if only one of the two i
## Activation functions
Look at the following activation functions:

![](../fig/01_sigmoid.svg){alt='Plot of the sigmoid function' width='200px'}
A. Sigmoid activation function
![A. Sigmoid activation function](fig/01_sigmoid.svg){alt='Plot of the sigmoid function' width='200px'}


![B. ReLU activation function](fig/01_relu.svg){alt='Plot of the ReLU function' width='200px'}

![](../fig/01_relu.svg){alt='Plot of the ReLU function' width='200px'}
B. ReLU activation function

![](../fig/01_identity_function.svg){alt='Plot of the Identity function' width='200px'}
C. Identity (or linear) activation function
![C. Identity (or linear) activation function](fig/01_identity_function.svg){alt='Plot of the Identity function' width='200px'}
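
If you want to experiment with these functions yourself, the following NumPy sketch gives minimal definitions of the three activations shown above; these are common textbook forms, not code taken from the exercise.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

def identity(z):
    return z

z = np.linspace(-5, 5, 11)
print(sigmoid(z))   # squashes values into the range (0, 1)
print(relu(z))      # zero for negative inputs, unchanged for positive inputs
print(identity(z))  # returns the input unchanged
```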

Match each of the following statements to the correct activation function:

@@ -176,7 +182,7 @@ The image below shows a diagram of all the layers (there are too many neurons to
The input (left-most) layer of the network is an image and the final (right-most) layer of the network outputs a zero or one to determine if the input data belongs to the class of data we are interested in.
This image is from the paper ["An Efficient Pedestrian Detection Method Based on YOLOv2" by Zhongmin Liu, Zhicai Chen, Zhanming Li, and Wenjin Hu published in Mathematical Problems in Engineering, Volume 2018](https://doi.org/10.1155/2018/3518959)

![](../fig/01_deep_network.png){alt='An example of a deep neural network'}
![](fig/01_deep_network.png){alt='An example of a deep neural network'}

### How do neural networks learn?
What happens in a neural network during the training process?
@@ -211,7 +217,7 @@ A more complicated and less used loss function for regression is the [Huber loss
Below you see the Huber loss (green, delta = 1) and Squared error loss (blue)
as a function of `y_true - y_pred`.

![](../fig/01_huber_loss.png){alt='Huber loss (green, delta = 1) and squared error loss (blue)
![](fig/01_huber_loss.png){alt='Huber loss (green, delta = 1) and squared error loss (blue)
as a function of y_true - y_pred' width='400px'}
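
For reference, here is a small NumPy sketch that evaluates both losses on a few residuals, using one common parameterization of the Huber loss; the residual values are made up for illustration.

```python
import numpy as np

def squared_error(residual):
    return residual ** 2

def huber(residual, delta=1.0):
    # quadratic for small residuals, linear for large ones
    return np.where(
        np.abs(residual) <= delta,
        0.5 * residual ** 2,
        delta * (np.abs(residual) - 0.5 * delta),
    )

residuals = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])  # y_true - y_pred
print(squared_error(residuals))  # [9.  1.  0.  1.  9. ]
print(huber(residuals))          # [2.5 0.5 0.  0.5 2.5]
```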

Which loss function is more sensitive to outliers?
@@ -352,7 +358,7 @@ The optimizer is responsible for taking the output of the loss function and then

We can now go ahead and start training our neural network. We will probably keep doing this for a given number of iterations through our training dataset (referred to as _epochs_) or until the loss function gives a value under a certain threshold. The graph below shows the loss against the number of _epochs_; generally the loss will go down with each _epoch_, but occasionally you will see a small rise.

![](../fig/training-0_to_1500.svg){alt='A graph showing an exponentially decreasing loss over the first 1500 epochs of training an example network.'}
![](fig/training-0_to_1500.svg){alt='A graph showing an exponentially decreasing loss over the first 1500 epochs of training an example network.'}

### 7. Perform a Prediction/Classification

20 changes: 10 additions & 10 deletions episodes/2-keras.Rmd
@@ -76,11 +76,11 @@ The goal is to predict a penguins' species using the attributes available in thi
The `palmerpenguins` data contains size measurements for three penguin species observed on three islands in the Palmer Archipelago, Antarctica.
The physical attributes measured are flipper length, beak length, beak width, body mass, and sex.

![][palmer-penguins]
*Artwork by @allison_horst*
![*Artwork by @allison_horst*][palmer-penguins]

![*Artwork by @allison_horst*][penguin-beaks]

![][penguin-beaks]
*Artwork by @allison_horst*

These data were collected from 2007 to 2009 by Dr. Kristen Gorman with the [Palmer Station Long Term Ecological Research Program](https://pal.lternet.edu/), part of the [US Long Term Ecological Research Network](https://lternet.edu/). The data were imported directly from the [Environmental Data Initiative](https://environmentaldatainitiative.org/) (EDI) Data Portal, and are available for use under a CC0 license ("No Rights Reserved") in accordance with the [Palmer Station Data Policy](https://pal.lternet.edu/data/policies).
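
If you would like to explore the data before modelling, one convenient way to obtain a copy is through seaborn's bundled datasets. This is only a sketch: it requires an internet connection, and the column names in this copy may differ slightly from the culmen-based names used above.

```python
import seaborn as sns

# Load a copy of the Palmer penguins data (downloaded by seaborn on first use).
penguins = sns.load_dataset("penguins")
print(penguins.head())
print(penguins["species"].value_counts())  # counts of Adelie, Gentoo, Chinstrap
```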

@@ -752,22 +752,22 @@ Length: 69, dtype: object
```


[palmer-penguins]: ../fig/palmer_penguins.png "Palmer Penguins"
[palmer-penguins]: fig/palmer_penguins.png "Palmer Penguins"
{alt='Illustration of the three species of penguins found in the Palmer Archipelago, Antarctica: Chinstrap, Gentoo and Adele'}

[penguin-beaks]: ../fig/culmen_depth.png "Culmen Depth"
[penguin-beaks]: fig/culmen_depth.png "Culmen Depth"
{alt='Illustration of the beak dimensions called culmen length and culmen depth in the dataset'}

[pairplot]: ../fig/pairplot.png "Pair Plot"
[pairplot]: fig/pairplot.png "Pair Plot"
{alt='Pair plot showing the separability of the three species of penguin for combinations of dataset attributes'}

[sex_pairplot]: ../fig/02_sex_pairplot.png "Pair plot grouped by sex"
[sex_pairplot]: fig/02_sex_pairplot.png "Pair plot grouped by sex"
{alt='Pair plot showing the separability of the two sexes of penguin for combinations of dataset attributes'}

[training_curve]: ../fig/02_training_curve.png "Training Curve"
[training_curve]: fig/02_training_curve.png "Training Curve"
{alt='Training loss curve of the neural network training which depicts exponential decrease in loss before a plateau from ~10 epochs'}

[confusion_matrix]: ../fig/confusion_matrix.png "Confusion Matrix"
[confusion_matrix]: fig/confusion_matrix.png "Confusion Matrix"
{alt='Confusion matrix of the test set with high accuracy for Adelie and Gentoo classification and no correctly predicted Chinstrap'}


26 changes: 13 additions & 13 deletions episodes/3-monitor-the-model.Rmd
@@ -46,7 +46,7 @@ Here we want to work with the *weather prediction dataset* (the light version) w
It contains daily weather observations from 11 different European cities or places through the
years 2000 to 2010. For all locations the data contains the variables ‘mean temperature’, ‘max temperature’, and ‘min temperature’. In addition, for multiple locations, the following variables are provided: 'cloud_cover', 'wind_speed', 'wind_gust', 'humidity', 'pressure', 'global_radiation', 'precipitation', 'sunshine', but not all of them are provided for every location. A more extensive description of the dataset, including the different physical units, is given in the accompanying metadata file. The full dataset comprises 10 years (3654 days) of collected weather data across Europe.

![European locations in the weather prediction dataset](../fig/03_weather_prediction_dataset_map.png){alt='18 European locations in the weather prediction dataset'}
![European locations in the weather prediction dataset](fig/03_weather_prediction_dataset_map.png){alt='18 European locations in the weather prediction dataset'}

A very common task with weather data is to make a prediction about the weather sometime in the future, say the next day. In this episode, we will try to predict tomorrow's sunshine hours, a challenging-to-predict feature, using a neural network with the available weather data for one location: BASEL.
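
To make the setup concrete, here is a small pandas sketch of how such a target could be constructed by pairing each day's observations with the next day's sunshine hours. The toy values and single-column frame are purely illustrative; the lesson's own preprocessing may differ.

```python
import pandas as pd

# Toy stand-in for the weather data: one row per day, one column for Basel sunshine.
data = pd.DataFrame({"BASEL_sunshine": [5.2, 0.0, 7.1, 3.4]})

# Features are today's observations; the target is tomorrow's sunshine hours,
# i.e. the 'BASEL_sunshine' column shifted up by one day (last row has no target).
X = data[:-1]
y = data["BASEL_sunshine"].shift(-1)[:-1]
print(X)
print(y)
```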

@@ -249,7 +249,7 @@ Then, we update the weight by taking a small step in the direction of the negati
This will slightly decrease the loss. This process is repeated until the loss function reaches a minimum.
The size of the step that is taken in each iteration is called the 'learning rate'.

![](../fig/03_gradient_descent.png){alt='Plot of the loss as a function of the weights. Through gradient descent the global loss minimum is found'}
![](fig/03_gradient_descent.png){alt='Plot of the loss as a function of the weights. Through gradient descent the global loss minimum is found'}
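
As a bare-bones illustration of this idea, here is a sketch of gradient descent on a single weight for a toy loss function; the loss, learning rate, and starting point are arbitrary choices for demonstration.

```python
# Minimize the toy loss L(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
def gradient(w):
    return 2 * (w - 3.0)

w = 0.0             # initial weight
learning_rate = 0.1 # size of each step
for step in range(50):
    w -= learning_rate * gradient(w)  # take a small step against the gradient

print(w)  # approaches 3, the minimum of the toy loss
```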

### Batch gradient descent
You could use the entire training dataset to perform one learning step in gradient descent,
@@ -388,7 +388,7 @@ def plot_history(history, metrics):
plot_history(history, 'root_mean_squared_error')
```

![](../fig/03_training_history_1_rmse.png){alt='Plot of the RMSE over epochs for the trained model that shows a decreasing error metric'}
![](fig/03_training_history_1_rmse.png){alt='Plot of the RMSE over epochs for the trained model that shows a decreasing error metric'}

This looks very promising! Our metric ("RMSE") is dropping nicely and, although it keeps fluctuating a bit, it ends up at fairly low *RMSE* values.
But the *RMSE* is just the root *mean* squared error, so we might want to look in a bit more detail at how well our newly trained model does at predicting the sunshine hours.
@@ -421,12 +421,12 @@ def plot_predictions(y_pred, y_true, title):
plot_predictions(y_train_predicted, y_train, title='Predictions on the training set')
```

![](../fig/03_regression_predictions_trainset.png){alt='Scatter plot between predictions and true sunshine hours in Basel on the train set showing a concise spread'}
![](fig/03_regression_predictions_trainset.png){alt='Scatter plot between predictions and true sunshine hours in Basel on the train set showing a concise spread'}

```python
plot_predictions(y_test_predicted, y_test, title='Predictions on the test set')
```
![](../fig/03_regression_predictions_testset.png){alt='Scatter plot between predictions and true sunshine hours in Basel on the test set showing a wide spread'}
![](fig/03_regression_predictions_testset.png){alt='Scatter plot between predictions and true sunshine hours in Basel on the test set showing a wide spread'}

::: challenge
## Exercise: Reflecting on our results
@@ -489,7 +489,7 @@ y_baseline_prediction = X_test['BASEL_sunshine']
plot_predictions(y_baseline_prediction, y_test, title='Baseline predictions on the test set')
```

![](../fig/03_regression_test_5_naive_baseline.png){alt="Scatter plot of predicted vs true sunshine hours in Basel for the test set where today's sunshine hours is considered as the true sunshine hours for tomorrow"}
![](fig/03_regression_test_5_naive_baseline.png){alt="Scatter plot of predicted vs true sunshine hours in Basel for the test set where today's sunshine hours is considered as the true sunshine hours for tomorrow"}

It is difficult to interpret from this plot whether our model is doing better than the baseline.
We can also have a look at the RMSE:
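
For example, here is a sketch of such a comparison, assuming scikit-learn is available and reusing the `y_test`, `y_test_predicted`, and `y_baseline_prediction` variables from above; the lesson may compute the RMSE differently.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Root mean squared error of the neural network vs. the naive baseline.
rmse_nn = np.sqrt(mean_squared_error(y_test, y_test_predicted))
rmse_baseline = np.sqrt(mean_squared_error(y_test, y_baseline_prediction))
print('NN RMSE:', rmse_nn, 'baseline RMSE:', rmse_baseline)
```
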
@@ -557,7 +557,7 @@ With this we can plot both the performance on the training data and on the valid
plot_history(history, ['root_mean_squared_error', 'val_root_mean_squared_error'])
```

![](../fig/03_training_history_2_rmse.png){alt='Plot of RMSE vs epochs for the training set and the validation set which depicts a divergence between the two around 10 epochs.'}
![](fig/03_training_history_2_rmse.png){alt='Plot of RMSE vs epochs for the training set and the validation set which depicts a divergence between the two around 10 epochs.'}

::: challenge
## Exercise: plot the training progress.
@@ -646,7 +646,7 @@ history = model.fit(X_train, y_train,
plot_history(history, ['root_mean_squared_error', 'val_root_mean_squared_error'])
```

![](../fig/03_training_history_3_rmse_smaller_model.png){alt='Plot of RMSE vs epochs for the training set and the validation set with similar performance across the two sets.'}
![](fig/03_training_history_3_rmse_smaller_model.png){alt='Plot of RMSE vs epochs for the training set and the validation set with similar performance across the two sets.'}

1. With this smaller model we have reduced overfitting a bit, since the training and validation loss are now closer to each other, and the validation loss now reaches a plateau and does not increase further.
We have not completely avoided overfitting though.
@@ -695,7 +695,7 @@ As before, we can plot the losses during training:
plot_history(history, ['root_mean_squared_error', 'val_root_mean_squared_error'])
```

![](fig/03_training_history_3_rmse_early_stopping.png){alt='Plot of RMSE vs epochs for the training set and the validation set, with training stopping early once the validation error stops improving.'}
![](fig/03_training_history_3_rmse_early_stopping.png){alt='Plot of RMSE vs epochs for the training set and the validation set displaying similar performance across the two sets.'}

This still seems to reveal the onset of overfitting, but the training stops before the discrepancy between training and validation loss can grow further.
Besides avoiding severe cases of overfitting, early stopping has the additional advantage that the number of training epochs is regulated automatically.
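
In Keras, early stopping is typically added as a callback to `model.fit`; here is a minimal sketch. The patience value and the `X_val`/`y_val` validation split are illustrative assumptions, not necessarily the lesson's exact settings.

```python
from tensorflow import keras

# Stop training once the validation loss has not improved for 'patience'
# consecutive epochs. 'model', 'X_train', and 'y_train' are the objects
# defined earlier in the episode.
earlystopper = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)

history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=200,
                    callbacks=[earlystopper])
```
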
@@ -772,7 +772,7 @@ history = model.fit(X_train, y_train,
plot_history(history, ['root_mean_squared_error', 'val_root_mean_squared_error'])
```

![](../fig/03_training_history_5_rmse_batchnorm.png){alt='Output of plotting sample'}
![](fig/03_training_history_5_rmse_batchnorm.png){alt='Output of plotting sample'}
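
For orientation, here is a sketch of one possible placement of a `BatchNormalization` layer in a fully-connected model like the ones above; the layer sizes are illustrative, not necessarily the lesson's exact architecture.

```python
from tensorflow import keras

# Place a BatchNormalization layer directly after the input, so each batch of
# features is normalized before the dense layers. 'X_train' is assumed to be
# the training data defined earlier in the episode.
inputs = keras.Input(shape=(X_train.shape[1],))
hidden = keras.layers.BatchNormalization()(inputs)
hidden = keras.layers.Dense(100, activation='relu')(hidden)
hidden = keras.layers.Dense(50, activation='relu')(hidden)
outputs = keras.layers.Dense(1)(hidden)
model = keras.Model(inputs=inputs, outputs=outputs)
```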

::: callout
## Batchnorm parameters
@@ -794,7 +794,7 @@ y_test_predicted = model.predict(X_test)
plot_predictions(y_test_predicted, y_test, title='Predictions on the test set')
```

![](../fig/03_regression_test_5_dropout_batchnorm.png){alt='Scatter plot between predictions and true sunshine hours for Basel on the test set'}
![](fig/03_regression_test_5_dropout_batchnorm.png){alt='Scatter plot between predictions and true sunshine hours for Basel on the test set'}

Well, the above is certainly not perfect. But how good or bad is this? Maybe not good enough to plan your picnic for tomorrow.
But let's compare it to the naive baseline we created at the beginning. What would you say: did we improve on that?
@@ -876,7 +876,7 @@ Create a scatter plot to compare with true observations:
y_test_predicted = model.predict(X_test)
plot_predictions(y_test_predicted, y_test, title='Predictions on the test set')
```
![](../fig/03_scatter_plot_basel_model.png){alt='Scatterplot of predictions and true number of sunshine hours'}
![](fig/03_scatter_plot_basel_model.png){alt='Scatterplot of predictions and true number of sunshine hours'}


Compute the RMSE on the test set:
@@ -939,7 +939,7 @@ You can launch the tensorboard interface from a Jupyter notebook, showing all tr
%tensorboard --logdir logs/fit
```
This will show an interface that looks something like this:
![](../fig/03_tensorboard.png){alt='Screenshot of tensorboard'}
![](fig/03_tensorboard.png){alt='Screenshot of tensorboard'}
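
The logs read by the `%tensorboard` command are written by a callback passed to `model.fit`; here is a minimal sketch (the exact log-directory naming scheme used in the lesson may differ).

```python
from tensorflow import keras

# Callback that records metrics under 'logs/fit' during training, for TensorBoard.
tensorboard_callback = keras.callbacks.TensorBoard(log_dir='logs/fit')
# e.g. model.fit(X_train, y_train, epochs=50, callbacks=[tensorboard_callback])
```
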
:::

## 10. Save model
