forked from Rbanism/geospatial-data-carpentry-tud
-
Notifications
You must be signed in to change notification settings - Fork 4
/
Session1.Rmd
741 lines (475 loc) · 25.7 KB
/
Session1.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
---
title: 'Geospatial Carpentry: Session 1: Intro to R'
author: "Aleksandra Wilczynska"
date: "2023-06-05"
output:
html_document:
toc: true
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
echo = TRUE,
message = FALSE,
warning = FALSE
)
# Load libraries ----------------------------------------------------------
# Package names
packages <- c("tidyverse", "here")
# Install packages not yet installed
installed_packages <- packages %in% rownames(installed.packages())
if (any(installed_packages == FALSE)) {
install.packages(packages[!installed_packages])
}
# Packages loading
invisible(lapply(packages, library, character.only = TRUE))
```
# Resources
- [Coordination document](https://docs.google.com/document/d/1x4mRcsSTyUueN_mCiZHE1TOTvF-OZRwq2gV_J9svPRI/edit#heading=h.rv3i711oni3d)
- [Introduction to R for Geospatial Data](https://datacarpentry.org/r-intro-geospatial/)
- [Rolling notes](https://docs.google.com/document/d/1P9cPJAbGmPlmG-qik1SPLsAowD2THdzSWjdEikVMt8g/edit?usp=sharing)
<!-- need to reduce this part by 15 minutes -> to 80 minutes -->
# [Introduction to RStudio](https://datacarpentry.org/r-intro-geospatial/01-rstudio-intro/index.html) and [Project management](https://datacarpentry.org/r-intro-geospatial/02-project-intro/index.html)
## Project management
RStudio is an integrated development environment (IDE). It provides a (much prettier) interface for the R software. R is integrated into RStudio, so you never actually have to open R.
R Studio gives a functionality of creating projects: self-contained working space (i.e. working directory), to which R will refer to, when looking for and saving files. You can create projects in existing directories (folders) or create a directory anew.
We’re going to create a project in RStudio in the existing directory:
- `File`
- `New Project`
- `Existing directory`
- browse for directory directory you created where downloaded the data: `gdc`
-`Create project`
<!-- ### Tips for project organisation -->
<!-- + **Raw data** is read only -->
<!-- + **Cleaned data** as read only -->
<!-- + **Generated output** is disposable -->
<!-- + **Related data together** (Some GIS file formats are really 3-6 files that need to be kept together and have the same name, e.g. shapefiles) -->
<!-- + Keep consistent **naming schema** -->
<!-- + Name all files to reflect their content or function -->
<!-- + Stage your scripts: Creating separate R scripts or Rmarkdown documents for different stages of a project -->
### Organising project/working directory
This is one suggestion of how your R project can look like. Your data folder is already there. Let's go ahead and create the other folders:
- `data` - should be where your raw data is. **READ ONLY**
- `data_output` - should be where your data output is saved **READ AND WRITE**
- `documents` - all the documentation associated with the project (e.g. cookbook)
- `fig_output` - your figure outputs go here **WRITE ONLY**
- `scripts` - all your code goes here **READ AND WRITE**
![](assets/img/rstudio_project_files.jpeg)
<font size="3">*Source*:[Data Carpentry R for Social Scientists ](https://datacarpentry.org/r-socialsci/00-intro/index.html#organizing-your-working-directory)</font>
### Two main ways to interact with R
- Test and play within the **interactive R console** (chat)
- **Pros:** immediate results
- **Cons:** work lost once we close RStudio
- Start writing in an **.R file** (email)
- File still will be executed in the console
- **Pros:** complete record of what you did!
- **Cons:** Can be messy if we're just want to print things out
### Running the code
- In the console: `Enter`
- In the script:
- `Ctrl` + `Enter` (for MAC users: `Command` + `Enter`)
- `Run` button on right left - current line or selection
### Creating a script
We're going to work with a script. Let's create one now and save it in the `scripts` directory.
- `File`
- `New File`
- `R Script`
- A new `Untitled` script will appear in the source pane. Save it using floppy disc icon.
- Name it `intro-to-r.R`
<!-- ### Escaping -->
<!-- The console shows it's ready to get new commands with `>` sign. It will show `+` sign if it still requires input for the command to be executed. -->
<!-- Sometimes you don't know what is missing/ you change your mind and want to run something else, or your code is running much too long and you just want it to stop. The way to do it is to hit `Esc`. -->
### Packages
A great power of R lays in **packages: add-on sets of functions** that are build by the community and once they go through a quality process they are available to download from a repository called `CRAN`. They need to be explicitly activated.
Now, we will be using `tidyverse` package, which is actually a collection of useful packages. Another package that will be useful for us is `here`.
If you have have not installed this package earlier, please do so. You can check if you have it installed in the `Packages` pane in the bottom-right window.
```{r install-package, eval=FALSE}
# install.packages('tidyverse')
install.packages('here')
```
You need to install package only once, but you will need to load it each time you want to use its functionalities. To do that you use `library()` command:
```{r load-package}
library(tidyverse)
library(here)
```
### Handling paths
![](assets/img/relative_root.png)
<font size="3">*Source*:[kaggle.com](https://www.kaggle.com/code/rtatman/reproducibility-tips-absolute-vs-relative-paths/notebook)</font>
You have created a project which is your working directory, and a number of sub-folders, that will help you organise your project better. But now each time you will save or retrieve a file from those folders, you will need to specify the path from the folder you are in (most likely `scripts`).
That can become complicated and can become a reproducibility problem if the person using your code (e.g. future you) is working in a different sub-folder.
`here()` to the rescue! This package provides absolute paths from the root (main directory) of your project.
![](assets/img/here.png)
<font size="3">*Source*:[Allison horst](https://github.com/allisonhorst)</font>
```{r here}
here('data')
```
### Download files
We still need to download data for the first part of the workshop. You can do it with the function `download.file()`. We will save it in the `data` folder, where the **raw** data should go.
```{r download-files}
download.file('https://bit.ly/geospatial_data', here('data','gapminder_data.csv'), mode = 'wb')
```
# [Intro to R](https://datacarpentry.org/r-intro-geospatial/01-rstudio-intro/index.html)
## Use R as a calculator
```{r calculator}
1+100
```
## Variables and assignment
We can store values in variables using the assignment operator `<-`, like this:
```{r asignment-operator}
x <- 1/40
```
Notice that assignment does not print a value. Instead, we stored it for later in something called a variable. `x` now contains the value `0.025`:
```{r asignment-operator2}
x
```
Look for the `Environment` tab in the upper pane of RStudio, and you will see that `x` and its value have appeared.
Our variable `x` can be used in place of a number in any calculation that expects a number, e.g. when calculating a square root:
```{r use-variable}
sqrt(x)
```
Variables can be also reassigned:
```{r reassign}
x <- 100
x
```
You can use the 'old' value when creating a new object
```{r reassign2}
y <- sqrt(x) # you can use value stored in object x to create y
y
```
# [Data Structures](https://datacarpentry.org/r-intro-geospatial/03-data-structures-part1/index.html)
## Vectors
So far we've looked on individual values. Now we will move to a data structure called vectors. Vectors are arrays of values of a same data type (will explain in a second :) ) .
You can create a vector with a `c()` function.
```{r vectors}
numeric_vector <- c(2, 6, 3) # vector of numbers - numeric data type.
numeric_vector
character_vector <- c('banana', 'apple', 'orange') # vector of words - or strings of characters- character data type
character_vector
logical_vector <- c(TRUE, FALSE, TRUE) # vector of logical values (is something true or false?)- logical data type.
logical_vector
```
### Combining vectors
The combine function, `c()`, will also append things to an existing vector:
```{r combine-vectors}
ab_vector <- c('a', 'b')
ab_vector
abcd_vector <- c(ab_vector, 'c', 'd')
abcd_vector
```
### Retrieving elements of a vector
### Missing values
A common operation you want to perform is to remove all the missing values (in R denoted as `NA`). Let's have a look how to do it:
```{r remove-na}
with_na <- c(1, 2, 1, 1, NA, 3, NA ) # vector including missing value
```
First, let's try to calculate mean for the values in this vector
```{r remove-na1}
mean(with_na) # mean() function cannot interpret the missing values
mean(with_na, na.rm = T) # You can add the argument na.rm=TRUE to calculate the result while ignoring the missing values.
```
However, sometimes, you would like to have the `NA` completely removed from your vector. for this you need to identify which elements of the vector hold missing values with `is.na()` function.
```{r remove-na2}
is.na(with_na) # This will produce a vector of logical values, stating if a statement 'This element of the vector is a missing value' is true or not
!is.na(with_na) # # The ! operator means negation ,i.e. not is.na(with_na)
```
We know which elements in the vectors are `NA`. Now we need to retrieve the subset of the `with_na` vector that is not `NA`.
Sub-setting in `R` is done with square brackets`[ ]`.
```{r remove-na3}
without_na <- with_na[ !is.na(with_na) ] # this notation will return only the elements that have TRUE on their respective positions
without_na
```
## Factors (adapted from [Starting with Data](https://datacarpentry.org/r-socialsci/02-starting-with-data/index.html))
Another important data structure is called a **factor**. Factors look like character data, but are used to represent categorical information.
Factors create a structured relation between the different levels (values) of a categorical variable, such as days of the week or responses to a question in a survey. While factors look (and often behave) like character vectors, they are actually treated as numbers by `R`. So you need to be very careful when treating them as strings.
### Create factors
Once created, factors can only contain a pre-defined set of values, known as levels.
```{r factor-create}
nordic_str <- c('Norway', 'Sweden', 'Norway', 'Denmark', 'Sweden')
nordic_str # regular character vectors printed out
nordic_cat <- factor(nordic_str) # factor() function converts a vector to factor data type
nordic_cat # With factors, R prints out additional information - 'Levels'
```
### Inspect factors
R will treat each unique value from a factor vector as a **level** and (silently) assign numerical values to it. This can come in handy when performing statistical analysis. You can inspect and adapt levels of the factor.
```{r factor-inspect}
levels(nordic_cat) # returns all levels of a factor vector.
nlevels(nordic_cat) # returns number of levels in a vector
```
### Reorder levels
Note that `R` sorts the levels in the alphabetic order, not in the order of occurrence in the vector. `R` assigns value of:
- 1 to level 'Denmark',
- 2 to 'Norway'
- 3 to 'Sweden'.
This is important as it can affect e.g. the order in which categories are displayed in a plot or which category is taken as a baseline in a statistical model.
You can reorder the categories using `factor()` function.
```{r factor-reorder}
nordic_cat <- factor(nordic_cat, levels = c('Norway' , 'Denmark', 'Sweden')) # now Norway should be the first category, Denmark second and Sweden third
# OR
# nordic_cat <- fct_relevel(nordic_cat, 'Norway' , 'Denmark', 'Sweden') # now Norway should be the first category, Denmark second and Sweden third
nordic_cat
str(nordic_cat) # you can also inspect vectors with str() function. In factor vectors, it shows the underlying values of each category. You can also see the structure in the environment tab of RStudio.
```
There is more than one way to reorder factors. Later in the lesson, we will use `fct_relevel()` function from `forcats` package to do the reordering.
### Note of caution (optional)
Remember that once created, factors can only contain a pre-defined set of values, known as levels.
It means that whenever you try to add something to the factor vector outside of this set, it will become an unknown/missing value detonated by `R` as `NA`.
```{r factor-missing-level}
nordic_str
nordic_cat2 <- factor(nordic_str, levels = c('Norway', 'Denmark'))
nordic_cat2 # since we have not included Sweden in the list of factor levels, it has become NA.
```
# [Exploring Data frames](https://datacarpentry.org/r-intro-geospatial/04-data-structures-part2/index.html)
Now we turn to the bread-and-butter of working with `R`: working with tabular data. In `R` data are stored in a data structure called **data frames**.
A data frame is a representation of data in the format of a **table** where the columns are **vectors** that all have the **same length**.
Because columns are vectors, each column must contain a **single type of data** (e.g., characters, numeric, factors).
For example, here is a figure depicting a data frame comprising a numeric, a character, and a logical vector.
![](assets/img/data-frame.svg)
<font size="3">*Source*:[Data Carpentry R for Social Scientists ](https://datacarpentry.org/r-socialsci/02-starting-with-data/index.html#what-are-data-frames-and-tibbles)</font>
## Reading data
`read.csv()` is a function used to read coma separated data files (`.csv` format)). There are other functions for files separated with other delimiters.
We're gonna read in the `gapminder` data set with information about countries' size, GDP and average life expectancy in different years.
```{r reading-data}
gapminder <- read.csv(here('data','gapminder_data.csv') )
```
## Exploring dataset
Let’s investigate the `gapminder` data frame a bit; the first thing we should always do is check out what the data looks like.
It is important to see if all the variables (columns) have the data type that we require. Otherwise we can run into trouble.
```{r inspecting-data-str}
str(gapminder)
```
We can see that the `gapminder` object is a data.frame with `r nrow(gapminder)` observations/ rows and `r ncol(gapminder)` variables/columns.
In each line after a `$` sign, we see the name of each column, its type and first few values.
### First look at the dataset
There are multiple ways to explore a data set. Here are just a few examples:
```{r}
head(gapminder) # see first 5 rows of the data set
summary(gapminder) # gives basic statistical information about each column. Information format differes by data type.
nrow(gapminder) # returns number of rows in a dataset
ncol(gapminder) # returns number of columns in a dataset
```
### Dollar sign ($)
When you're analyzing a data set, you often need to access its specific columns.
One handy way to access a column is using it's name and a dollar sign `$`:
```{r subset-dollar-sign}
country_vec <- gapminder$country # Notation means: From dataset gapminder, give me column country. You can see that the column accessed in this way is just a vector of characters.
head(country_vec)
```
Note that the calling a column with a `$` sign will return a *vector*, it's not a data frame anymore.
# [Data frame Manipulation with dplyr](https://datacarpentry.org/r-intro-geospatial/06-dplyr/index.html)
## Select
Let's start manipulating the data.
First, we will adapt our data set, by keeping only the columns we're interested in, using the `select()` function from the `dplyr` package:
```{r dplyr-select}
year_country_gdp <- select(gapminder, year, country, gdpPercap)
head(year_country_gdp)
```
## Pipe
Now, this is not the most common notation when working with `dplyr` package. `dplyr` offers an operator `%>%` called a pipe, which allows you build up very complicated commands in a readable way.
In newer installation of `R` you can also find a notation `|>` . This pipe works in a similar way. The main difference is that you don't need to load any packages to have it available.
The `select()` statement with pipe would look like that:
```{r dplyr-pipe}
year_country_gdp <- gapminder %>%
select(year,country,gdpPercap)
head(year_country_gdp)
```
First we define data set, then - with the use of pipe we pass it on to the `select()` function. This way we can chain multiple functions together, which we will be doing now.
## Filter
We already now how to select only the needed columns. But now, we also want to filter the data set via certain condition with `filter()` function. Instead doing it in separate steps, we can do it all together.
In the `gapminder` data set, we want to see the results from outside of Europe for the 21st century.
```{r}
year_country_gdp_euro <- gapminder %>%
filter(continent != "Europe" & year > 2000) %>% # & operator (AND) - both conditions must be met
select(year, country, gdpPercap)
head(year_country_gdp_euro)
```
Let's now find all the observations from Eurasia:
```{r}
year_country_gdp_eurasia <- gapminder %>%
filter(continent == "Europe" | continent == "Asia") %>% # I operator (OR) - one of the conditions must be met
select(year, country, gdpPercap)
head(year_country_gdp_eurasia)
```
### Exercise 1
<div class="alert alert-info">
<strong>Challenge</strong>
Write a single command (which can span multiple lines and includes pipes) that will produce a dataframe that has the African values for life expectancy, country and year, but not for other Continents. How many rows does your data frame have and why?
<strong>Solution</strong>
</div>
```{r ex5, class.source="bg-info"}
year_country_lifeExp_Africa <- gapminder %>%
filter(continent=="Africa" ) %>%
select(year,country,lifeExp)
nrow(year_country_lifeExp_Africa)
```
## Group and summarize
So far, we have created a data frame for one of the continents represented in the `gapminder` data set. But often instead of doing that, we would like to know statistics about all of the continents, presented by group.
```{r dplyr-group}
gapminder %>% # select the dataset
group_by(continent) %>% # group by continent
summarize(avg_gdpPercap = mean(gdpPercap)) # summarize function creates statistics for the data set
```
### Exercise 2
<div class="alert alert-info">
<strong>Challenge</strong>
Calculate the average life expectancy per country. Which country has the longest average life expectancy and which has the shortest average life expectancy?
<strong>Hint</strong> Use `max()` and `min()` functions to find minimum and maximum.
<strong>Solution</strong>
</div>
```{r ex6 , class.source="bg-info"}
lifeExp_bycountry <- gapminder %>%
group_by(country) %>%
summarize(avg_lifeExp=mean(lifeExp))
lifeExp_bycountry %>%
filter(avg_lifeExp == min(avg_lifeExp) | avg_lifeExp == max(avg_lifeExp))
```
### Multiple groups and summary variables
You can also group by multiple columns:
```{r dplyr-group-multi}
gapminder %>%
group_by(continent, year) %>%
summarize(avg_gdpPercap = mean(gdpPercap))
```
On top of this, you can also make multiple summaries of those groups:
```{r dplyr-summ}
gdp_pop_bycontinents_byyear <- gapminder %>%
group_by(continent,year) %>%
summarize(
avg_gdpPercap = mean(gdpPercap),
sd_gdpPercap = sd(gdpPercap),
avg_pop = mean(pop),
sd_pop = sd(pop),
n_obs = n()
)
```
## Frequencies
If you need only a number of observations per group, you can use the `count()` function
```{r dplyr-count}
gapminder %>%
group_by(continent) %>%
count()
```
## Mutate
Frequently you’ll want to create new columns based on the values in existing columns, for example to do unit conversions, or to find the ratio of values in two columns. For this we’ll use `mutate()`.
```{r dplyr-mutate}
gapminder_gdp <- gapminder %>%
mutate(gdpBillion = gdpPercap*pop/10^9)
head(gapminder_gdp)
```
# [Introduction to Visualisation](https://datacarpentry.org/r-intro-geospatial/07-plot-ggplot2/index.html)
Package `ggplot2` is a powerful plotting system. I will introduce key features of `ggplot`. In the following parts of this workshop, you will use this package to visualize geo-spatial data.
`gg` stands for grammar of graphics, the idea that three components needed to create a graph are:
- data
- aesthetics - coordinate system on which we map the data (what is represented on x axis, what on y axis)
- geometries - visual representation of the data (points, bars, etc.)
Fun part about `ggplot` is that you can then add additional layers to the plot providing more information and make it more beautiful.
First, lets plot distribution of life expectancy in the `gapminder` data set.
```{r ggplot}
ggplot(data = gapminder, aes(x = lifeExp) ) + # aesthetics layer
geom_histogram() # geometry layer
```
You can see that in `ggplot` you use `+` as a pipe, to add layers. Within `ggplot` call, it is the only pipe that will work.
But, it is possible to chain operations on a data set with a pipe that we have already learned: `%>%` ( or `|>`) and follow them by ggplot grammar.
Let's create another plot, this time only on a subset of observations:
```{r ggplot-col}
gapminder %>% # we select a data set
filter(year == 2007 &
continent == 'Americas') %>% # and filter it to keep only one year and one continent
ggplot(aes(x = country, y = gdpPercap)) + # we create aesthetics, both x and y axis represent values of columns
geom_col() # we select a column graph as a geometry
```
Now, you can iteratively improve how the plot looks like. For example, you might want to flip it, to better display the labels.
```{r ggplot-coord-flip}
gapminder %>%
filter(year == 2007,
continent == 'Americas') %>%
ggplot(aes(x = country, y = gdpPercap)) +
geom_col()+
coord_flip()
```
One thing you might want to change here is the order in which countries are displayed. It would be easier to compare GDP per capita, if they were showed in order.
To do that, we need to reorder factor levels (you remember, we've already done this before).
Now the order of the levels will depend on another variable - GDP per capita.
```{r ggplot-color}
gapminder %>%
filter(year == 2007,
continent == 'Americas') %>%
mutate(country = fct_reorder(country, gdpPercap )) %>%
ggplot(aes(x = country , y = gdpPercap)) +
geom_col() +
coord_flip()
```
Let's make things more colorful - let's represent the average life expectancy of a country by color
```{r ggplot-colors}
gapminder %>%
filter(year == 2007,
continent == 'Americas') %>%
mutate(country = fct_reorder(country, gdpPercap )) %>%
ggplot(aes(x = country, y = gdpPercap, fill = lifeExp )) + # fill argument for coloring surfaces, color for points and lines
geom_col()+
coord_flip()
```
We can also adapt the color scale. Common choice that is used for its readibility and colorblind-proofness are the pallettes available in the `viridis` package.
```{r ggplot-colors-adapt}
plot_c <-
gapminder %>%
filter(year == 2007,
continent == 'Americas') %>%
mutate(country = fct_reorder(country, gdpPercap )) %>%
ggplot(aes(x = country, y = gdpPercap, fill = lifeExp )) +
geom_col()+
coord_flip()+
scale_fill_viridis_c() # _c stands for continuous scale
```
Maybe we don't need that much information about the life expectancy. We only want to know if it's below or above average.
```{r ggplot-colors-discrete}
plot_d <- # this time let's save the plot in the object.
gapminder %>%
filter(year == 2007 &
continent == 'Americas') %>%
mutate(country_reordered = fct_reorder(country, gdpPercap ),
lifeExpCat = if_else(lifeExp >= mean(lifeExp), 'high', 'low' )
) %>%
ggplot(aes(x = country_reordered, y = gdpPercap, fill = lifeExpCat )) +
geom_col()+
coord_flip()+
scale_fill_manual(values = c('light blue', 'orange'))
```
Since we saved a plot as an object, nothing has been printed out. Just like with any other object in `R`, if you want to see it, you need to call it.
```{r ggplot-call}
plot_d
```
Now we can make use of the saved object and add things to it.
Let's also give it a title and name the axes:
```{r ggplot-titles}
plot_d <-
plot_d +
ggtitle('GDP per capita in Americas', subtitle = 'Year 2007') +
xlab('Country')+
ylab('GDP per capita')
plot_d
```
# [Writing data](https://datacarpentry.org/r-intro-geospatial/08-writing-data/index.html)
## Saving the plot
Once we are happy with our plot we can save it in a format of our choice. Remember to save it in the dedicated folder.
```{r save-plot}
ggsave(plot = plot_d,
filename = here('fig_output','plot_americas_2007.pdf') ) # By default, ggsave() saves the last displayed plot, but you can also explicitly name the plot you want to save
```
### Using help documentation
My saved plot is not very readable. We can see why it happened by exploring the help documentation. We can do it by writing directly in the Console:
```{r help, eval=FALSE}
?ggsave
```
We can read that it uses the "size of the current graphics device". That would explain why our saved plots look slightly different. Feel free to explore the documentation to see how to adapt the size e.g. by adapting `width`, `height` and `units` parameter.
## Saving the data
Another output of your work you want to save is a cleaned data set. In your analysis, you can then load directly that data set. Say we want to save the data only for Australia:
```{r writing-data}
gapminder_amr_2007 <- gapminder %>%
filter(year == 2007 &
continent == 'Americas') %>%
mutate(country_reordered = fct_reorder(country, gdpPercap ),
lifeExpCat = if_else(lifeExp >= mean(lifeExp), 'high', 'low' ))
write.csv(gapminder_amr_2007,
here('data_output', 'gapminder_americas_2007.csv'),
row.names=FALSE)
```