Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Part 2 working with vector data #1

Merged
merged 23 commits into from
Jan 29, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
749a200
Remove here from episode 9
cforgaci Oct 30, 2023
cc6b095
Remove example lesson
cforgaci Oct 31, 2023
92b3d82
Add missing link in episodes/09-open-and-plot-vector-layers.Rmd
cforgaci Jan 15, 2024
87af22a
Fix grammar in episodes/09-open-and-plot-vector-layers.Rmd
cforgaci Jan 15, 2024
51c6005
Make numbered list to improve readability
cforgaci Jan 15, 2024
9f4d210
Add missing link in episodes/11-plot-multiple-shape-files.Rmd
cforgaci Jan 15, 2024
6b6ff74
Add missing link in episodes/11-plot-multiple-shape-files.Rmd
cforgaci Jan 15, 2024
b32f72c
Revise episode 9
cforgaci Jan 15, 2024
825152a
Update wording of objectives in episodes/11-plot-multiple-shape-files…
cforgaci Jan 18, 2024
3ebbaa6
Add link to epsg.io
cforgaci Jan 18, 2024
98fc642
Update wording of objectives in episodes/12-handling-spatial-projecti…
cforgaci Jan 18, 2024
ab4b998
Explain `driver` argument in `st_write()`
cforgaci Jan 18, 2024
b0a222d
Apply Ana's suggestions from code review
cforgaci Jan 22, 2024
7db3b32
Update phrasing of opbjectives in episodes/11-plot-multiple-shape-fil…
cforgaci Jan 22, 2024
87b1323
Merge branch 'main' into 2-vector-cf
cforgaci Jan 22, 2024
07f818f
Change question to first-person singular in episodes/09-open-and-plot…
cforgaci Jan 22, 2024
5226f7e
Add explanation about shapefiles as suggested by Ana
cforgaci Jan 22, 2024
e7e240c
Initialise episode on geospatial concepts
Jan 23, 2024
46e131e
Update vector episodes
cforgaci Jan 24, 2024
ce9d883
Add vector slides
cforgaci Jan 29, 2024
bc5a446
Update vector slides
cforgaci Jan 29, 2024
5c47460
Merge branch 'main' into 2-vector-cf
cforgaci Jan 29, 2024
a17f609
remove usage of emo package
cforgaci Jan 29, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
75 changes: 42 additions & 33 deletions episodes/09-open-and-plot-vector-layers.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,9 @@

::::::::::::::::::::::::::::::::::::: objectives

- Know the difference between point, line, and polygon vector data.
After completing this episode, participants should be able to…

- Differentiate between point, line, and polygon vector data.
- Load vector data into R.
- Access the attributes of a vector object in R.

Expand All @@ -21,61 +23,66 @@
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: instructor

Make sure that the `sf` package and its dependencies are installed before the
workshop. The installation can take quite some time, so allocate enough extra
time before the workshop for solving installation problems. We recommend one
or two installation 'walk-in' hours on a day before the workshop and 15-30
workshop. The installation can be lengthy, so allocate enough extra time
before the workshop for solving installation problems. We recommend one
or two installation 'walk-in' hours on a day before the workshop. Also, 15-30
minutes at the beginning of the first workshop day should be enough to tackle
installation issues.
last-minute installation issues.


::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

::: prereq

If you have not installed the `sf` package yet, run `install.packages("sf")` first. Note that the `sf` package has some external dependencies, namely GEOS, PROJ.4, GDAL and UDUNITS, which need to be installed beforehand. Follow the workshop [setup instructions]() for the installation of `sf` and its dependencies.
In this lesson you will work with the `sf` package. Note that the `sf` package has some external dependencies, namely GEOS, PROJ.4, GDAL and UDUNITS, which need to be installed beforehand. Before starting the lesson, follow the workshop [setup instructions](../learners/setup.md) for the installation of `sf` and its dependencies.

:::

First we need to load the packages we will use in this lesson. We will use the packages `tidyverse` and `here` with which you are already familiar from the previous lesson. In addition, we need to load the [`sf`](https://r-spatial.github.io/sf/) package for working with spatial vector data.
First we need to load the packages we will use in this lesson. We will use the `tidyverse` package with which you are already familiar from the previous lesson. In addition, we need to load the [`sf`](https://r-spatial.github.io/sf/) package for working with spatial vector data.

```{r lib, message=FALSE}
library(tidyverse) # tools for wrangling, reshaping and visualizing data
library(here) # managing paths
library(tidyverse) # wrangle, reshape and visualize data
library(sf) # work with spatial vector data
```

::: callout

# The `sf` package

`sf` stands for Simple Features which is a standard defined by the Open Geospatial Consortium for storing and accessing geospatial vector data. PostGIS uses the same standard; so those of you who used PostGIS, might find some resemblances with the functions used by the `sf` package.
`sf` stands for Simple Features which is a standard defined by the Open Geospatial Consortium for storing and accessing geospatial vector data. PostGIS, for instance, uses the same standard, so PostGIS users might recognise the functions used by the `sf` package.

:::

## Import shapefiles

Let's start by opening a shapefile. Shapefiles a common file format to store spatial vector data used in GIS software. We will read a shapefile with the administrative boundary of Delft with the function `st_read()` from the `sf` package.
Let's start by opening a shapefile. Shapefiles are a common file format to store spatial vector data used in GIS software. Note that a shapefile consists of multiple files and it is important to keep them all together in the same location. We will read a shapefile with the administrative boundary of Delft with the function `st_read()` from the `sf` package.

```{r results='hide'}
boundary_Delft <- st_read("data/delft-boundary.shp")
```{r}
boundary_Delft <- st_read("data/delft-boundary.shp", quiet = TRUE)
cforgaci marked this conversation as resolved.
Show resolved Hide resolved
```

::: callout

# All `sf` functions start with `st_`
cforgaci marked this conversation as resolved.
Show resolved Hide resolved

Note that all functions from the `sf` package start with the standard prefix `st_` which stands for Spatial Type. This is helpful in at least two ways: (1) it makes the interaction with or translation to/from software using the simple features standard like PostGIS easy, and (2) it allows for easy autocompletion of function names in RStudio.
Note that all functions from the `sf` package start with the standard prefix `st_` which stands for Spatial Type. This is helpful in at least two ways:

1. it allows for easy autocompletion of function names in RStudio, and
2. it makes the interaction with or translation to/from software using the simple features standard like PostGIS easy.

:::

## Spatial Metadata

The `st_read()` function gave us a message with a summary of metadata about the file that was read in. To examine the metadata in more detail, we can use other, more specialised, functions from the `sf` package. The `st_geometry_type()` function, for instance, gives us information about the geometry type, which in this case is `POLYGON`.
By default (with `quiet = FALSE`), the `st_read()` function provides a message with a summary of metadata about the file that was read in. To examine the metadata in more detail, we can use other, more specialised, functions from the `sf` package.

The `st_geometry_type()` function, for instance, gives us information about the geometry type, which in this case is `POLYGON`.

```{r}
st_geometry_type(boundary_Delft)
```

The `st_crs()` function returns the coordinate reference system (CRS) used by the shapefile, which in this case is `WGS84` and has the unique reference code `EPSG: 4326`.
The `st_crs()` function returns the coordinate reference system (CRS) used by the shapefile, which in this case is ``r st_crs(boundary_Delft)$Name`` and has the unique reference code ``r paste0("EPSG: ", st_crs(boundary_Delft)$epsg)``.

```{r}
st_crs(boundary_Delft)
Expand All @@ -89,23 +96,27 @@

:::

The `st_bbox()` function shows the extent of the layer. As `WGS84` is a **geographic CRS**, the extent of the shapefile is displayed in degrees.
The `st_bbox()` function shows the extent of the layer. As `WGS 84` is a **geographic CRS**, the extent of the shapefile is displayed in degrees.


```{r}
st_bbox(boundary_Delft)
```

We need a **projected CRS**, which in the case of the Netherlands is typically the Amersfort / RD New projection. To reproject our shapefile, we will use the `st_transform()` function. For the `crs` argument we can use the EPSG code of the CRS we want to use, which is `28992` for the `Amersfort / RD New` projection.
cforgaci marked this conversation as resolved.
Show resolved Hide resolved
cforgaci marked this conversation as resolved.
Show resolved Hide resolved
To check the EPSG code of any CRS, we can check this website: https://epsg.io/

```{r}
boundary_Delft <- st_transform(boundary_Delft, 28992)
st_crs(boundary_Delft)
st_crs(boundary_Delft)$Name
st_crs(boundary_Delft)$epsg
```

Notice that the bounding box is measured in meters after the transformation.

```{r}
st_bbox(boundary_Delft)
st_crs(boundary_Delft)$units_gdal
```

We confirm the transformation by examining the reprojected shapefile.
Expand All @@ -116,30 +127,28 @@

::: callout

More about CRS in [Handling Spatial Projection & CRS]().
Read more about Coordinate Reference Systems [here](https://datacarpentry.org/organization-geospatial/03-crs.html). We will also practice transformation between CRS in [Handling Spatial Projection & CRS](../episodes/12-handling-spatial-projection-and-crs.Rmd).

Check warning on line 130 in episodes/09-open-and-plot-vector-layers.Rmd

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

[uninformative link text]: [here](https://datacarpentry.org/organization-geospatial/03-crs.html)
cforgaci marked this conversation as resolved.
Show resolved Hide resolved

:::



## Plot a vector layer

Now, let's plot this shapefile. You are already familiar with the `ggplot2` package from [Introduction to Visualisation](). `ggplot2` has special `geom_` functions for spatial data. We will use the `geom_sf()` function for `sf` data.
Now, let's plot this shapefile. You are already familiar with the `ggplot2` package from [Introduction to Visualisation](../episodes/04-intro-to-visualisation.Rmd). `ggplot2` has special `geom_` functions for spatial data. We will use the `geom_sf()` function for `sf` data.

Check warning on line 138 in episodes/09-open-and-plot-vector-layers.Rmd

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

[missing file]: [Introduction to Visualisation](../episodes/04-intro-to-visualisation.Rmd)

```{r}

ggplot(data = boundary_Delft) +
geom_sf(size = 3, color = "black", fill = "cyan1") +
labs(title = "Delft Administrative Boundary") +
coord_sf(datum = st_crs(28992)) # this is needed to display the axes in meters

```

::::::::::::::::::::::::::::::::::::: challenge

### Challenge 1: Import line and point vector layers
### Challenge: Import line and point vector layers

Read in `delft-streets.shp` and `delft-leisure.shp` and assign them to `lines_Delft` and `point_Delft` respectively. Answer the following questions:
Read in `delft-streets.shp` and `delft-leisure.shp` and assign them to `lines_Delft` and `points_Delft` respectively. Answer the following questions:

1. What type of R spatial object is created when you import each layer?
2. What is the CRS and extent for each object?
Expand All @@ -150,28 +159,28 @@

```{r results='hide'}
lines_Delft <- st_read("data/delft-streets.shp")
point_Delft <- st_read("data/delft-leisure.shp")
points_Delft <- st_read("data/delft-leisure.shp")
```

We can check the type of data with the `class()` function from base R. Both `lines_Delft` and `point_Delft` are objects of class `"sf"`, which extends the `"data.frame"` class.
We can check the type of type of geometry with the `st_geometry_type()` function. `lines_Delft` contains `"LINESTRING"` geometry and `points_Delft` is made of `"POINT"` geometries.

```{r}
class(lines_Delft)
class(point_Delft)
st_geometry_type(lines_Delft)[1]
st_geometry_type(points_Delft)[2]
```

`lines_Delft` and `point_Delft` are in the correct CRS.
`lines_Delft` and `points_Delft` are in the correct CRS.

```{r}
st_crs(lines_Delft)$epsg
st_crs(point_Delft)$epsg
st_crs(points_Delft)$epsg
```

When looking at the bounding boxes with the `st_bbox()` function, we see the spatial extent of the two objects in a projected CRS using meters as units. `lines_Delft()` and `point_Delft` have similar extents.
When looking at the bounding boxes with the `st_bbox()` function, we see the spatial extent of the two objects in a projected CRS using meters as units. `lines_Delft()` and `points_Delft` have similar extents.

```{r}
st_bbox(lines_Delft)
st_bbox(point_Delft)
st_bbox(points_Delft)
```

:::::::::::::::::::::::::::::::::
Expand All @@ -184,6 +193,6 @@

- Metadata for vector layers include geometry type, CRS, and extent.
- Load spatial objects into R with the `st_read()` function.
- Spatial objects can be plotted directly with `ggplot` using the `geom_sf()` function. No need to convert to a data frame.
- Spatial objects can be plotted directly with `ggplot2` using the `geom_sf()` function. No need to convert to a data frame.

::::::::::::::::::::::::::::::::::::::::::::::::
63 changes: 25 additions & 38 deletions episodes/10-explore-and-plot-by-vector-layer-attributes.Rmd
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
title: 'Explore and plot by vector layer attributes'
teaching: 20
exercises: 10
teaching: 30
exercises: 20
---

```{r setup, include=FALSE}
Expand All @@ -23,6 +23,8 @@ point_Delft <- st_read("data/delft-leisure.shp")

::::::::::::::::::::::::::::::::::::: objectives

After completing this episode, participants should be able to…

- Query attributes of a vector object.
cforgaci marked this conversation as resolved.
Show resolved Hide resolved

- Subset vector objects using specific attribute values.
Expand Down Expand Up @@ -60,10 +62,10 @@ attribute table.

:::

We can also preview the content of the object by looking at the first 6 rows with the `head()` function, which is in the case of an `sf` object the same as examining the object directly.
We can also preview the content of the object by looking at the first 6 rows with the `head()` function, which in the case of an `sf` object is similar to examining the object directly.

```{r}
head(lines_Delft)
head (lines_Delft)
```


Expand All @@ -90,6 +92,7 @@ R is also able to handle categorical variables called factors. With factors, we
```{r}
levels(factor(lines_Delft$highway))
```
Note that this way the values are shown by default in alphabetical order and `NA`s are not displayed, whereas using `unique()` returns unique values in the order of their occurrence in the data frame and it also shows `NA` values.

::::::::::::::::::::::::::::::::

Expand Down Expand Up @@ -129,16 +132,16 @@ head(point_Delft, 10) # you might be lucky to see three distinct values
# point_Delft
```

However, this is not a good approach as the first rows might still have many `NA`s and three distinct values might still not be present in the first `n` rows of the data frame. To remove `NA`s, we can use the function `na.omit()` on the leisure column to remove `NA`s completely. Note that we use the `$` operator to examine the content of a single variable.
We have our answer (`sports_centre` is the third value), but in general this is not a good approach as the first rows might still have many `NA`s and three distinct values might still not be present in the first `n` rows of the data frame. To remove `NA`s, we can use the function `na.omit()` on the leisure column to remove `NA`s completely. Note that we use the `$` operator to examine the content of a single variable.

```{r}
head(na.omit(point_Delft$leisure)) # this is better
```

To show only unique values, we can use the `unique()` function to only see the first occurrence of each distinct value (note that `NA` will still be included once).
To show only unique values, we can use the `levels()` function on a factor to only see the first occurrence of each distinct value. Note `NA`s are dropped in this case and that we get the first three of the unique alphabetically ordered values.

```{r}
head(unique(point_Delft$leisure), n = 3) # this is even better
head(levels(factor(point_Delft$leisure)), n = 3) # this is even better
```

3. To see a list of all attribute names, we can use the `names()` function.
Expand Down Expand Up @@ -190,6 +193,8 @@ ggplot(data = cycleway_Delft) +
::: challenge
<!-- 7 minutes -->

Challenge: Now with motorways (and pedestrian streets)

1. Create a new object that only contains the motorways in Delft.
2. How many features does the new object have?
3. What is the total length of motorways?
Expand Down Expand Up @@ -227,7 +232,7 @@ motorway_Delft_length <- motorway_Delft %>%
nrow(motorway_Delft)
```

4. Plot the motorways
4. Plot the motorways.

```{r}
ggplot(data = motorway_Delft) +
Expand All @@ -236,7 +241,7 @@ ggplot(data = motorway_Delft) +
coord_sf(datum = st_crs(28992))
```

5. Same steps with pedestrian streets
5. Follow the same steps with pedestrian streets.

```{r}
pedestrian_Delft <- lines_Delft %>%
Expand Down Expand Up @@ -269,7 +274,7 @@ Let's say that we want to color different road types with different colors and t
unique(lines_Delft$highway)
```

If we look at all the unique values of the highway field of our street network we see more than 20 values. Let's focus on a subset of four values to illustrate the use of distinct colors. We use a piped expression in which we only filter the rows of our data frame that have one of the four given values `"motorway"`, `"primary"`, `"secondary"`, and `"cycleway"`. We also make sure that the highway column is a factor column.
If we look at all the unique values of the highway field of our street network we see more than 20 values. Let's focus on a subset of four values to illustrate the use of distinct colors. We use a piped expression in which we only filter the rows of our data frame that have one of the four given values `"motorway"`, `"primary"`, `"secondary"`, and `"cycleway"`. Note that we do this with the `%in` operator which is a more compact equivalent of a series of conditions joined by the `|` (or) operator. We also make sure that the highway column is a factor column.

```{r}
road_types <- c("motorway", "primary", "secondary", "cycleway")
Expand All @@ -279,7 +284,7 @@ lines_Delft_selection <- lines_Delft %>%
mutate(highway = factor(highway, levels = road_types))
```

Next we define the four colors we want to use, one for each type of road in our vector object. Note that in R you can use named colors like `"blue"`, `"green"`, `"navy"`, and `"purple"`. A full list of named colors can be listed with the `colors()` function.
Next we define the four colors we want to use, one for each type of road in our vector object. Note that in R you can use named colors like `"blue"`, `"green"`, `"navy"`, and `"purple"`. If you are using RStudio, you will see the named colors previewed in line. A full list of named colors can be listed with the `colors()` function.

```{r}
road_colors <- c("blue", "green", "navy", "purple")
Expand Down Expand Up @@ -319,7 +324,7 @@ ggplot(data = lines_Delft_selection) +

::: challenge

# Plot line width by attribute
# Challenge: Plot line width by attribute

In the example above, we set the line widths to be 1, 0.75, 0.5, and 0.25. In our case line thicknesses are consistent with the hierarchy of the selected road types, but in some cases we might want to show a different hierarchy.

Expand Down Expand Up @@ -359,44 +364,26 @@ ggplot(data = lines_Delft_selection) +
Let’s add a legend to our plot. We will use the `road_colors` object that we created above to color the legend. We can customize the appearance of our legend by manually setting different parameters.

```{r fig.cap="Mobility network in Delft using thicker lines than the previous example."}
ggplot(data = lines_Delft_selection) +
p1 <- ggplot(data = lines_Delft_selection) +
geom_sf(aes(color = highway), linewidth = 1.5) +
scale_color_manual(values = road_colors) +
labs(color = 'Road Type') +
labs(title = "Mobility network of Delft",
subtitle = "Roads & Cycleways - Default Legend") +
coord_sf(datum = st_crs(28992))
p1
```

```{r fig.cap="Map of the mobility network in Delft with large-font and border around the legend."}
ggplot(data = lines_Delft_selection) +
geom_sf(aes(color = highway), linewidth = 1.5) +
scale_color_manual(values = road_colors) +
labs(color = 'Road Type') +
theme(legend.text = element_text(size = 20),
legend.box.background = element_rect(size = 1)) +
labs(title = "Mobility network of Delft",
subtitle = "Roads & Cycleways - Modified Legend") +
coord_sf(datum = st_crs(28992))
```

```{r fig.cap="Map of the mobility network in Delft using a different color palette."}
new_colors <- c("springgreen", "blue", "magenta", "orange")

ggplot(data = lines_Delft_selection) +
geom_sf(aes(color = highway), linewidth = 1.5) +
scale_color_manual(values = new_colors) +
labs(color = 'Road Type') +
p2 <- p1 +
theme(legend.text = element_text(size = 20),
legend.box.background = element_rect(size = 1)) +
labs(title = "Mobility network of Delft",
subtitle = "Roads & Cycleways - Modified Legend") +
coord_sf(datum = st_crs(28992))
legend.box.background = element_rect(size = 1))
p2
```

::: challenge

# Plot lines by attributes
# Challenge: Plot lines by attributes
<!-- 5 minutes -->

Create a plot that emphasizes only roads where bicycles are allowed. To emphasize this, make the lines where bicycles are not allowed THINNER than the roads where bicycles are allowed. Be sure to add a title and legend to your map. You might consider a color palette that has all bike-friendly roads displayed in a bright color. All other lines can be black.
Expand Down Expand Up @@ -431,15 +418,15 @@ ggplot(data = lines_Delft) +

::: challenge

# Plot polygon by attribute
# Challenge: Plot polygon by attribute
<!-- 5 minutes -->

Create a map of the municipal boundaries in the Netherlands using the data located in your data folder: `nl-gemeenten.shp`. Apply a line color to each state using its region value. Add a legend.

::: solution

```{r}
municipal_boundaries_NL <- st_read(here("data", "nl-gemeenten.shp"))
municipal_boundaries_NL <- st_read("data/nl-gemeenten.shp")
```

```{r}
Expand Down
Loading
Loading