carpentries-incubator · cforgaci · Jan 29, 2024 · Oct 30, 2023 · Oct 31, 2023 · Jan 15, 2024
diff --git a/episodes/09-open-and-plot-vector-layers.Rmd b/episodes/09-open-and-plot-vector-layers.Rmd
@@ -12,7 +12,9 @@
 
 ::::::::::::::::::::::::::::::::::::: objectives
 
-- Know the difference between point, line, and polygon vector data.
+After completing this episode, participants should be able to…
+
+- Differentiate between point, line, and polygon vector data.
 - Load vector data into R.
 - Access the attributes of a vector object in R.
 
@@ -21,61 +23,66 @@
 :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: instructor
 
 Make sure that the `sf` package and its dependencies are installed before the 
-workshop. The installation can take quite some time, so allocate enough extra
-time before the workshop for solving installation problems. We recommend one
-or two installation 'walk-in' hours on a day before the workshop and 15-30 
+workshop. The installation can be lengthy, so allocate enough extra time 
+before the workshop for solving installation problems. We recommend one
+or two installation 'walk-in' hours on a day before the workshop. Also, 15-30 
 minutes at the beginning of the first workshop day should be enough to tackle 
-installation issues.
+last-minute installation issues.
+
 
 ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
 
 ::: prereq
 
-If you have not installed the `sf` package yet, run `install.packages("sf")` first. Note that the `sf` package has some external dependencies, namely GEOS, PROJ.4, GDAL and UDUNITS, which need to be installed beforehand. Follow the workshop [setup instructions]() for the installation of `sf` and its dependencies.
+In this lesson you will work with the `sf` package. Note that the `sf` package has some external dependencies, namely GEOS, PROJ.4, GDAL and UDUNITS, which need to be installed beforehand. Before starting the lesson, follow the workshop [setup instructions](../learners/setup.md) for the installation of `sf` and its dependencies.
 
 :::
 
-First we need to load the packages we will use in this lesson. We will use the packages `tidyverse` and `here` with which you are already familiar from the previous lesson. In addition, we need to load the [`sf`](https://r-spatial.github.io/sf/) package for working with spatial vector data. 
+First we need to load the packages we will use in this lesson. We will use the `tidyverse` package with which you are already familiar from the previous lesson. In addition, we need to load the [`sf`](https://r-spatial.github.io/sf/) package for working with spatial vector data. 
 
 ```{r lib, message=FALSE}
-library(tidyverse)  # tools for wrangling, reshaping and visualizing data
-library(here)       # managing paths
+library(tidyverse)  # wrangle, reshape and visualize data
 library(sf)         # work with spatial vector data
 ```
 
 ::: callout
 
 # The `sf` package
 
-`sf` stands for Simple Features which is a standard defined by the Open Geospatial Consortium for storing and accessing geospatial vector data. PostGIS uses the same standard; so those of you who used PostGIS, might find some resemblances with the functions used by the `sf` package. 
+`sf` stands for Simple Features which is a standard defined by the Open Geospatial Consortium for storing and accessing geospatial vector data. PostGIS, for instance, uses the same standard, so PostGIS users might recognise the functions used by the `sf` package. 
 
 :::
 
 ## Import shapefiles
 
-Let's start by opening a shapefile. Shapefiles a common file format to store spatial vector data used in GIS software. We will read a shapefile with the administrative boundary of Delft with the function `st_read()` from the `sf` package. 
+Let's start by opening a shapefile. Shapefiles are a common file format to store spatial vector data used in GIS software. Note that a shapefile consists of multiple files and it is important to keep them all together in the same location. We will read a shapefile with the administrative boundary of Delft with the function `st_read()` from the `sf` package. 
 
-```{r results='hide'}
-boundary_Delft <- st_read("data/delft-boundary.shp")
+```{r}
+boundary_Delft <- st_read("data/delft-boundary.shp", quiet = TRUE)
 ```
 
 ::: callout
 
 # All `sf` functions start with `st_`
 
-Note that all functions from the `sf` package start with the standard prefix `st_` which stands for Spatial Type. This is helpful in at least two ways: (1) it  makes the interaction with or translation to/from software using the simple features standard like PostGIS easy, and (2) it allows for easy autocompletion of function names in RStudio.
+Note that all functions from the `sf` package start with the standard prefix `st_` which stands for Spatial Type. This is helpful in at least two ways:  
+
+1. it allows for easy autocompletion of function names in RStudio, and
+2. it makes the interaction with or translation to/from software using the simple features standard like PostGIS easy.
 
 :::
 
 ## Spatial Metadata
 
-The `st_read()` function gave us a message with a summary of metadata about the file that was read in. To examine the metadata in more detail, we can use other, more specialised, functions from the `sf` package. The `st_geometry_type()` function, for instance, gives us information about the geometry type, which in this case is `POLYGON`.
+By default (with `quiet = FALSE`), the `st_read()` function provides a message with a summary of metadata about the file that was read in. To examine the metadata in more detail, we can use other, more specialised, functions from the `sf` package.  
+
+The `st_geometry_type()` function, for instance, gives us information about the geometry type, which in this case is `POLYGON`.
 
 ```{r}
 st_geometry_type(boundary_Delft)
 ```
 
-The `st_crs()` function returns the coordinate reference system (CRS) used by the shapefile, which in this case is `WGS84` and has the unique reference code `EPSG: 4326`. 
+The `st_crs()` function returns the coordinate reference system (CRS) used by the shapefile, which in this case is ``r st_crs(boundary_Delft)$Name`` and has the unique reference code ``r paste0("EPSG: ", st_crs(boundary_Delft)$epsg)``. 
 
 ```{r}
 st_crs(boundary_Delft)
@@ -89,23 +96,27 @@
 
 :::
 
-The `st_bbox()` function shows the extent of the layer. As `WGS84` is a **geographic CRS**, the extent of the shapefile is displayed in degrees.
+The `st_bbox()` function shows the extent of the layer. As `WGS 84` is a **geographic CRS**, the extent of the shapefile is displayed in degrees.
+
 
 ```{r}
 st_bbox(boundary_Delft)
 ```
 
 We need a **projected CRS**, which in the case of the Netherlands is typically the Amersfort / RD New projection. To reproject our shapefile, we will use the `st_transform()` function. For the `crs` argument we can use the EPSG code of the CRS we want to use, which is `28992` for the `Amersfort / RD New` projection. 
+To check the EPSG code of any CRS, we can check this website: https://epsg.io/
 
 ```{r}
 boundary_Delft <- st_transform(boundary_Delft, 28992)
-st_crs(boundary_Delft)
+st_crs(boundary_Delft)$Name
+st_crs(boundary_Delft)$epsg
 ```
 
 Notice that the bounding box is measured in meters after the transformation.
 
 ```{r}
 st_bbox(boundary_Delft)
+st_crs(boundary_Delft)$units_gdal
 ```
 
 We confirm the transformation by examining the reprojected shapefile.
@@ -116,30 +127,28 @@
 
 ::: callout
 
-More about CRS in [Handling Spatial Projection & CRS]().
+Read more about Coordinate Reference Systems [here](https://datacarpentry.org/organization-geospatial/03-crs.html). We will also practice transformation between CRS in [Handling Spatial Projection & CRS](../episodes/12-handling-spatial-projection-and-crs.Rmd).
 
 :::
 
 
 
 ## Plot a vector layer
 
-Now, let's plot this shapefile. You are already familiar with the `ggplot2` package from [Introduction to Visualisation](). `ggplot2` has special `geom_` functions for spatial data. We will use the `geom_sf()` function for `sf` data.
+Now, let's plot this shapefile. You are already familiar with the `ggplot2` package from [Introduction to Visualisation](../episodes/04-intro-to-visualisation.Rmd). `ggplot2` has special `geom_` functions for spatial data. We will use the `geom_sf()` function for `sf` data.
 
 ```{r}
-
 ggplot(data = boundary_Delft) +
   geom_sf(size = 3, color = "black", fill = "cyan1") +
   labs(title = "Delft Administrative Boundary") +
   coord_sf(datum = st_crs(28992))  # this is needed to display the axes in meters
-
 ```
 
 ::::::::::::::::::::::::::::::::::::: challenge 
 
-### Challenge 1: Import line and point vector layers
+### Challenge: Import line and point vector layers
 
-Read in `delft-streets.shp` and `delft-leisure.shp` and assign them to `lines_Delft` and `point_Delft` respectively. Answer the following questions:
+Read in `delft-streets.shp` and `delft-leisure.shp` and assign them to `lines_Delft` and `points_Delft` respectively. Answer the following questions:
 
 1. What type of R spatial object is created when you import each layer?
 2. What is the CRS and extent for each object?
@@ -150,28 +159,28 @@
 
 ```{r results='hide'}
 lines_Delft <- st_read("data/delft-streets.shp")
-point_Delft <- st_read("data/delft-leisure.shp")
+points_Delft <- st_read("data/delft-leisure.shp")
 ```
 
-We can check the type of data with the `class()` function from base R. Both `lines_Delft` and `point_Delft` are objects of class `"sf"`, which extends the `"data.frame"` class. 
+We can check the type of type of geometry with the `st_geometry_type()` function. `lines_Delft` contains `"LINESTRING"` geometry and `points_Delft` is made of `"POINT"` geometries. 
 
 ```{r}
-class(lines_Delft)
-class(point_Delft)
+st_geometry_type(lines_Delft)[1]
+st_geometry_type(points_Delft)[2]
 ```
 
-`lines_Delft` and `point_Delft` are in the correct CRS.
+`lines_Delft` and `points_Delft` are in the correct CRS.
 
 ```{r}
 st_crs(lines_Delft)$epsg
-st_crs(point_Delft)$epsg
+st_crs(points_Delft)$epsg
 ```
 
-When looking at the bounding boxes with the `st_bbox()` function, we see the spatial extent of the two objects in a projected CRS using meters as units. `lines_Delft()` and `point_Delft` have similar extents.
+When looking at the bounding boxes with the `st_bbox()` function, we see the spatial extent of the two objects in a projected CRS using meters as units. `lines_Delft()` and `points_Delft` have similar extents.
 
 ```{r}
 st_bbox(lines_Delft)
-st_bbox(point_Delft)
+st_bbox(points_Delft)
 ```
 
 :::::::::::::::::::::::::::::::::
@@ -184,6 +193,6 @@
 
 - Metadata for vector layers include geometry type, CRS, and extent.
 - Load spatial objects into R with the `st_read()` function.
-- Spatial objects can be plotted directly with `ggplot` using the `geom_sf()` function. No need to convert to a data frame.
+- Spatial objects can be plotted directly with `ggplot2` using the `geom_sf()` function. No need to convert to a data frame.
 
 ::::::::::::::::::::::::::::::::::::::::::::::::
diff --git a/episodes/10-explore-and-plot-by-vector-layer-attributes.Rmd b/episodes/10-explore-and-plot-by-vector-layer-attributes.Rmd
@@ -1,7 +1,7 @@
 ---
 title: 'Explore and plot by vector layer attributes'
-teaching: 20
-exercises: 10
+teaching: 30
+exercises: 20
 ---
 
 ```{r setup, include=FALSE}
@@ -23,6 +23,8 @@ point_Delft <- st_read("data/delft-leisure.shp")
 
 ::::::::::::::::::::::::::::::::::::: objectives
 
+After completing this episode, participants should be able to…
+
 - Query attributes of a vector object.
 
 - Subset vector objects using specific attribute values.
@@ -60,10 +62,10 @@ attribute table.
 
 :::
 
-We can also preview the content of the object by looking at the first 6 rows with the `head()` function, which is in the case of an `sf` object the same as examining the object directly.
+We can also preview the content of the object by looking at the first 6 rows with the `head()` function, which in the case of an `sf` object is similar to examining the object directly.
 
 ```{r}
-head(lines_Delft)
+head (lines_Delft)
 ```
 
 
@@ -90,6 +92,7 @@ R is also able to handle categorical variables called factors. With factors, we
 ```{r}
 levels(factor(lines_Delft$highway))
 ```
+Note that this way the values are shown by default in alphabetical order and `NA`s are not displayed, whereas using `unique()` returns unique values in the order of their occurrence in the data frame and it also shows `NA` values.
 
 ::::::::::::::::::::::::::::::::
 
@@ -129,16 +132,16 @@ head(point_Delft, 10)  # you might be lucky to see three distinct values
 # point_Delft
 ```
 
-However, this is not a good approach as the first rows might still have many `NA`s and three distinct values might still not be present in the first `n` rows of the data frame. To remove `NA`s, we can use the function `na.omit()` on the leisure column to remove `NA`s completely. Note that we use the `$` operator to examine the content of a single variable.
+We have our answer (`sports_centre` is the third value), but in general this is not a good approach as the first rows might still have many `NA`s and three distinct values might still not be present in the first `n` rows of the data frame. To remove `NA`s, we can use the function `na.omit()` on the leisure column to remove `NA`s completely. Note that we use the `$` operator to examine the content of a single variable.
 
 ```{r}
 head(na.omit(point_Delft$leisure))  # this is better
 ```
 
-To show only unique values, we can use the `unique()` function to only see the first occurrence of each distinct value (note that `NA` will still be included once).
+To show only unique values, we can use the `levels()` function on a factor to only see the first occurrence of each distinct value. Note `NA`s are dropped in this case and that we get the first three of the unique alphabetically ordered values.
 
 ```{r}
-head(unique(point_Delft$leisure), n = 3)   # this is even better
+head(levels(factor(point_Delft$leisure)), n = 3)   # this is even better
 ```
 
 3. To see a list of all attribute names, we can use the `names()` function.
@@ -190,6 +193,8 @@ ggplot(data = cycleway_Delft) +
 ::: challenge
 <!-- 7 minutes -->
 
+Challenge: Now with motorways (and pedestrian streets)
+
 1. Create a new object that only contains the motorways in Delft. 
 2. How many features does the new object have?
 3. What is the total length of motorways?
@@ -227,7 +232,7 @@ motorway_Delft_length <- motorway_Delft %>%
 nrow(motorway_Delft)
 ```
 
-4. Plot the motorways
+4. Plot the motorways.
 
 ```{r}
 ggplot(data = motorway_Delft) + 
@@ -236,7 +241,7 @@ ggplot(data = motorway_Delft) +
   coord_sf(datum = st_crs(28992))
 ```
 
-5. Same steps with pedestrian streets
+5. Follow the same steps with pedestrian streets.
 
 ```{r}
 pedestrian_Delft <- lines_Delft %>% 
@@ -269,7 +274,7 @@ Let's say that we want to color different road types with different colors and t
 unique(lines_Delft$highway)
 ```
 
-If we look at all the unique values of the highway field of our street network we see more than 20 values. Let's focus on a subset of four values to illustrate the use of distinct colors. We use a piped expression in which we only filter the rows of our data frame that have one of the four given values `"motorway"`, `"primary"`, `"secondary"`, and `"cycleway"`. We also make sure that the highway column is a factor column.
+If we look at all the unique values of the highway field of our street network we see more than 20 values. Let's focus on a subset of four values to illustrate the use of distinct colors. We use a piped expression in which we only filter the rows of our data frame that have one of the four given values `"motorway"`, `"primary"`, `"secondary"`, and `"cycleway"`. Note that we do this with the `%in` operator which is a more compact equivalent of a series of conditions joined by the `|` (or) operator. We also make sure that the highway column is a factor column.
 
 ```{r}
 road_types <- c("motorway", "primary", "secondary", "cycleway")
@@ -279,7 +284,7 @@ lines_Delft_selection <- lines_Delft %>%
   mutate(highway = factor(highway, levels = road_types))
 ```
 
-Next we define the four colors we want to use, one for each type of road in our vector object. Note that in R you can use named colors like `"blue"`, `"green"`, `"navy"`, and `"purple"`. A full list of named colors can be listed with the `colors()` function.
+Next we define the four colors we want to use, one for each type of road in our vector object. Note that in R you can use named colors like `"blue"`, `"green"`, `"navy"`, and `"purple"`. If you are using RStudio, you will see the named colors previewed in line. A full list of named colors can be listed with the `colors()` function.
 
 ```{r}
 road_colors <- c("blue", "green", "navy", "purple")
@@ -319,7 +324,7 @@ ggplot(data = lines_Delft_selection) +
 
 ::: challenge
 
-# Plot line width by attribute
+# Challenge: Plot line width by attribute
 
 In the example above, we set the line widths to be 1, 0.75, 0.5, and 0.25. In our case line thicknesses are consistent with the hierarchy of the selected road types, but in some cases we might want to show a different hierarchy.
 
@@ -359,44 +364,26 @@ ggplot(data = lines_Delft_selection) +
 Let’s add a legend to our plot. We will use the `road_colors` object that we created above to color the legend. We can customize the appearance of our legend by manually setting different parameters.
 
 ```{r fig.cap="Mobility network in Delft using thicker lines than the previous example."}
-ggplot(data = lines_Delft_selection) + 
+p1 <- ggplot(data = lines_Delft_selection) + 
   geom_sf(aes(color = highway), linewidth = 1.5) +
   scale_color_manual(values = road_colors) +
   labs(color = 'Road Type') + 
   labs(title = "Mobility network of Delft", 
        subtitle = "Roads & Cycleways - Default Legend") + 
   coord_sf(datum = st_crs(28992))
+p1
 ```
 
 ```{r fig.cap="Map of the mobility network in Delft with large-font and border around the legend."}
-ggplot(data = lines_Delft_selection) + 
-  geom_sf(aes(color = highway), linewidth = 1.5) +
-  scale_color_manual(values = road_colors) + 
-  labs(color = 'Road Type') +
-  theme(legend.text = element_text(size = 20), 
-        legend.box.background = element_rect(size = 1)) + 
-  labs(title = "Mobility network of Delft", 
-       subtitle = "Roads & Cycleways - Modified Legend") +
-  coord_sf(datum = st_crs(28992))
-```
-
-```{r fig.cap="Map of the mobility network in Delft using a different color palette."}
-new_colors <- c("springgreen", "blue", "magenta", "orange")
-
-ggplot(data = lines_Delft_selection) + 
-  geom_sf(aes(color = highway), linewidth = 1.5) + 
-  scale_color_manual(values = new_colors) +
-  labs(color = 'Road Type') +
+p2 <- p1 +
   theme(legend.text = element_text(size = 20), 
-        legend.box.background = element_rect(size = 1)) + 
-  labs(title = "Mobility network of Delft", 
-       subtitle = "Roads & Cycleways - Modified Legend") +
-  coord_sf(datum = st_crs(28992))
+        legend.box.background = element_rect(size = 1))
+p2
 ```
 
 ::: challenge
 
-# Plot lines by attributes
+# Challenge: Plot lines by attributes
 <!-- 5 minutes -->
 
 Create a plot that emphasizes only roads where bicycles are allowed. To emphasize this, make the lines where bicycles are not allowed THINNER than the roads where bicycles are allowed. Be sure to add a title and legend to your map. You might consider a color palette that has all bike-friendly roads displayed in a bright color. All other lines can be black.
@@ -431,15 +418,15 @@ ggplot(data = lines_Delft) +
 
 ::: challenge
 
-# Plot polygon by attribute
+# Challenge: Plot polygon by attribute
 <!-- 5 minutes -->
 
 Create a map of the municipal boundaries in the Netherlands using the data located in your data folder: `nl-gemeenten.shp`. Apply a line color to each state using its region value. Add a legend.
 
 ::: solution
 
 ```{r}
-municipal_boundaries_NL <- st_read(here("data", "nl-gemeenten.shp"))
+municipal_boundaries_NL <- st_read("data/nl-gemeenten.shp")
 ```
 
 ```{r}