
Commit

cforgaci committed Feb 18, 2024
2 parents 1a31814 + 49d967f commit 1f6f14e
Showing 8 changed files with 277 additions and 111 deletions.
6 changes: 3 additions & 3 deletions episodes/01-intro-to-r.Rmd
@@ -148,8 +148,8 @@ Each of the modes o interactions has its advantages and drawbacks.

| | Console | R script|
|--------|---------|---------|
|**Pros**|Immediate results|Work lost once you close RStudio |
|**Cons**|Complete record of your work |Messy if you just want to print things out|
|**Pros**|Immediate results| Complete record of your work |
|**Cons**| Work lost once you close RStudio | Messy if you just want to print things out|



@@ -312,7 +312,7 @@ In the script, we will write:
```{r download-files}
# Download the data
download.file('https://bit.ly/geospatial_data',
here('episodes', 'data','gapminder_data.csv'))
here('data','gapminder_data.csv'))
```

55 changes: 42 additions & 13 deletions episodes/02-data-structures.Rmd
@@ -65,13 +65,16 @@ You can create a vector with a `c()` function.

```{r vectors}
numeric_vector <- c(2, 6, 3) # vector of numbers - numeric data type.
# vector of numbers - numeric data type.
numeric_vector <- c(2, 6, 3)
numeric_vector
character_vector <- c('banana', 'apple', 'orange') # vector of words - or strings of characters- character data type
# vector of words - or strings of characters- character data type
character_vector <- c('banana', 'apple', 'orange')
character_vector
logical_vector <- c(TRUE, FALSE, TRUE) # vector of logical values (is something true or false?)- logical data type.
# vector of logical values (is something true or false?)- logical data type.
logical_vector <- c(TRUE, FALSE, TRUE)
logical_vector
```
@@ -121,7 +124,9 @@ First, let's try to calculate mean for the values in this vector
```{r remove-na1}
mean(with_na) # mean() function cannot interpret the missing values
mean(with_na, na.rm = T) # You can add the argument na.rm=TRUE to calculate the result while ignoring the missing values.
# You can add the argument na.rm=TRUE to calculate the result while
# ignoring the missing values.
mean(with_na, na.rm = T)
```

However, sometimes, you would like to have the `NA`
@@ -130,9 +135,11 @@ For this you need to identify which elements of the vector hold missing values
with `is.na()` function.

```{r remove-na2}
is.na(with_na) # This will produce a vector of logical values, stating if a statement 'This element of the vector is a missing value' is true or not
is.na(with_na) # This will produce a vector of logical values,
# stating if a statement 'This element of the vector is a missing value'
# is true or not
!is.na(with_na) # # The ! operator means negation ,i.e. not is.na(with_na)
!is.na(with_na) # The ! operator means negation, i.e. not is.na(with_na)
```
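A minimal sketch of what the comments in these two hunks describe. The values in `with_na` are made up for illustration, since the lesson's own vector is defined outside the lines shown in this diff.

```r
# Hypothetical vector; the lesson's real `with_na` is not visible in this diff.
with_na <- c(1, 2, 1, 1, NA, 3, NA)

# A single missing value makes the plain mean missing as well.
mean_with_na <- mean(with_na)

# na.rm = TRUE drops the missing values before averaging.
mean_without_na <- mean(with_na, na.rm = TRUE)

# is.na() marks missing elements; ! flips TRUE and FALSE.
missing_mask <- is.na(with_na)

# Logical sub-setting keeps only the positions that are TRUE.
without_na <- with_na[!missing_mask]

print(mean_with_na)
print(mean_without_na)
print(without_na)
```

Spelling out `TRUE` instead of the shorthand `T` is the more defensive choice, because `T` is an ordinary variable that can be reassigned, whereas `TRUE` is a reserved word.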

@@ -142,7 +149,8 @@ Sub-setting in `R` is done with square brackets`[ ]`.

```{r remove-na3}
without_na <- with_na[ !is.na(with_na) ] # this notation will return only the elements that have TRUE on their respective positions
without_na <- with_na[ !is.na(with_na) ] # this notation will return only
# the elements that have TRUE on their respective positions
without_na
@@ -170,7 +178,8 @@ known as levels.
nordic_str <- c('Norway', 'Sweden', 'Norway', 'Denmark', 'Sweden')
nordic_str # regular character vectors printed out
nordic_cat <- factor(nordic_str) # factor() function converts a vector to factor data type
# factor() function converts a vector to factor data type
nordic_cat <- factor(nordic_str)
nordic_cat # With factors, R prints out additional information - 'Levels'
```
@@ -201,8 +210,14 @@ displayed in a plot or which category is taken as a baseline in a statistical mo
You can reorder the categories using `factor()` function. This can be useful, for instance, to select a reference category (first level) in a regression model or for ordering legend items in a plot, rather than using the default category systematically (i.e. based on alphabetical order).

```{r factor-reorder1}
nordic_cat <- factor(nordic_cat, levels = c('Norway' , 'Denmark', 'Sweden')) # now Norway should be the first category, Denmark second and Sweden third
nordic_cat <- factor(
nordic_cat, levels = c(
'Norway',
'Denmark',
'Sweden'
))
# now Norway will be the first category, Denmark second and Sweden third
nordic_cat
```
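A small sketch of the reordering this hunk reformats, using base R only. It reuses the `nordic_str` vector shown earlier in the diff; the variable names `default_levels` and `codes` are introduced here for illustration.

```r
nordic_str <- c('Norway', 'Sweden', 'Norway', 'Denmark', 'Sweden')

# Without explicit levels, factor() sorts the distinct values.
nordic_cat <- factor(nordic_str)
default_levels <- levels(nordic_cat)

# Passing levels = ... fixes the order; the first level is the reference category.
nordic_cat <- factor(nordic_cat, levels = c('Norway', 'Denmark', 'Sweden'))

# The underlying integer codes follow the new level order.
codes <- as.integer(nordic_cat)

print(default_levels)
print(levels(nordic_cat))
print(codes)
```

The values themselves do not change; only the order of the levels, and therefore the integer code behind each value, does.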

@@ -212,7 +227,15 @@ There is more than one way to reorder factors. Later in the lesson,
we will use `fct_relevel()` function from `forcats` package to do the reordering.

```{r factor-reorder2}
# nordic_cat <- fct_relevel(nordic_cat, 'Norway' , 'Denmark', 'Sweden') # now Norway should be the first category, Denmark second and Sweden third
library(forcats)
nordic_cat <- fct_relevel(
nordic_cat,
'Norway' ,
'Denmark',
'Sweden'
) # With this, Norway will be first category,
# Denmark second and Sweden third
nordic_cat
```
@@ -239,8 +262,14 @@ outside of this set, it will become an unknown/missing value detonated by

```{r factor-missing-level}
nordic_str
nordic_cat2 <- factor(nordic_str, levels = c('Norway', 'Denmark'))
nordic_cat2 # since we have not included Sweden in the list of factor levels, it has become NA.
nordic_cat2 <- factor(
nordic_str,
levels = c('Norway', 'Denmark')
)
# because we did not include Sweden in the list of
# factor levels, it has become NA.
nordic_cat2
```
::::::::::::::::::::::::::::::::::::::::::::::::::::
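To make the effect of a missing level easier to see, here is a short sketch that counts the resulting `NA` values. It reuses `nordic_str` from the diff; `n_missing` is a name introduced here.

```r
nordic_str <- c('Norway', 'Sweden', 'Norway', 'Denmark', 'Sweden')

# 'Sweden' is not among the allowed levels, so both occurrences become NA.
nordic_cat2 <- factor(nordic_str, levels = c('Norway', 'Denmark'))

# Count how many values were lost.
n_missing <- sum(is.na(nordic_cat2))

print(nordic_cat2)
print(n_missing)

# useNA = 'ifany' adds an NA column to the frequency table when needed.
print(table(nordic_cat2, useNA = 'ifany'))
```

Checking `sum(is.na(...))` after a conversion like this is a cheap way to catch a misspelled or forgotten level.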

27 changes: 17 additions & 10 deletions episodes/03-explore-data.Rmd
@@ -59,7 +59,7 @@ Because columns are vectors, each column must contain a **single type of data**
For example, here is a figure depicting a data frame comprising a numeric, a character, and a logical vector.

![](fig/data-frame.svg)
<br><font size="3">*Source*:[Data Carpentry R for Social Scientists ](https://datacarpentry.org/r-socialsci/02-starting-with-data/index.html#what-are-data-frames-and-tibbles)</font>
<br><font size="3">*Source*: [Data Carpentry R for Social Scientists ](https://datacarpentry.org/r-socialsci/02-starting-with-data/index.html#what-are-data-frames-and-tibbles)</font>


## Reading data
@@ -68,7 +68,7 @@ For example, here is a figure depicting a data frame comprising a numeric, a cha
We're gonna read in the `gapminder` data set with information about countries' size, GDP and average life expectancy in different years.

```{r reading-data}
gapminder <- read_csv("data/gapminder_data.csv")
gapminder <- read.csv("data/gapminder_data.csv")
```
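This hunk swaps readr's `read_csv()` for base R's `read.csv()`. The sketch below illustrates one practical consequence: `read.csv()` returns a plain `data.frame`, whereas `read_csv()` returns a tibble. The file contents are made up and written to a temporary file so the example does not depend on the lesson's data.

```r
# Write a tiny, made-up CSV to a temporary file.
tmp <- tempfile(fileext = ".csv")
writeLines(
  c("country,year,lifeExp",
    "Norway,2007,80.2",
    "Sweden,2007,80.9"),
  tmp
)

# Base R reader: no extra package needed.
toy <- read.csv(tmp)

print(class(toy))  # a plain data.frame, not a tibble
str(toy)

unlink(tmp)        # remove the temporary file
```

dplyr verbs such as `filter()` and `select()` accept plain data frames too, so the pipelines later in the episode should work with either reader.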

@@ -92,9 +92,11 @@ There are multiple ways to explore a data set. Here are just a few examples:


```{r}
head(gapminder) # see first 6 rows of the data set
summary(gapminder) # gives basic statistical information about each column. Information format differes by data type.
head(gapminder) # shows first 6 rows of the data set
summary(gapminder) # basic statistical information about each column.
# Information format differs by data type.
nrow(gapminder) # returns number of rows in a dataset
@@ -108,7 +110,9 @@ When you're analyzing a data set, you often need to access its specific columns.

One handy way to access a column is using its name and a dollar sign `$`:
```{r subset-dollar-sign}
country_vec <- gapminder$country # Notation means: From dataset gapminder, give me column country. You can see that the column accessed in this way is just a vector of characters.
# This notation means: From dataset gapminder, give me column country. You can
# see that the column accessed in this way is just a vector of characters.
country_vec <- gapminder$country
head(country_vec)
@@ -157,8 +161,9 @@ We already know how to select only the needed columns. But now, we also want to
In the `gapminder` data set, we want to see the results from outside of Europe for the 21st century.
```{r}
year_country_gdp_euro <- gapminder %>%
filter(continent != "Europe" & year >= 2000) %>% # & operator (AND) - both conditions must be met
filter(continent != "Europe" & year >= 2000) %>%
select(year, country, gdpPercap)
# '&' operator (AND) - both conditions must be met
head(year_country_gdp_euro)
```
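A sketch of how `&` and `|` behave inside `filter()`, run on a four-row data frame with made-up values rather than the real `gapminder` data. It assumes the dplyr package is installed; `toy`, `outside_europe_21st` and `eurasia` are names introduced here.

```r
library(dplyr)

# Made-up rows, only to show which ones each condition keeps.
toy <- data.frame(
  country   = c("Norway", "Japan", "Japan", "Kenya"),
  continent = c("Europe", "Asia", "Asia", "Africa"),
  year      = c(2007, 1997, 2007, 2002),
  gdpPercap = c(49000, 28000, 31000, 1200)
)

# '&' (AND): a row is kept only if both conditions are TRUE.
outside_europe_21st <- toy %>%
  filter(continent != "Europe" & year >= 2000) %>%
  select(year, country, gdpPercap)

# '|' (OR): a row is kept if at least one condition is TRUE.
eurasia <- toy %>%
  filter(continent == "Europe" | continent == "Asia") %>%
  select(year, country, gdpPercap)

print(outside_europe_21st)
print(eurasia)
```

Placing the explanatory comment after the pipeline, as the commit does, is harmless: a comment on a line that follows the last piped expression does not interrupt the chain.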
@@ -177,8 +182,9 @@ Write a single command (which can span multiple lines and includes pipes) that w

```{r ex5, class.source="bg-info"}
year_country_gdp_eurasia <- gapminder %>%
filter(continent == "Europe" | continent == "Asia") %>% # | operator (OR) - one of the conditions must be met
select(year, country, gdpPercap)
filter(continent == "Europe" | continent == "Asia") %>%
select(year, country, gdpPercap)
# '|' operator (OR) - one of the conditions must be met
nrow(year_country_gdp_eurasia)
```
@@ -191,7 +197,7 @@ So far, we have provided summary statistics on the whole dataset, selected colum
```{r dplyr-group}
gapminder %>% # select the dataset
group_by(continent) %>% # group by continent
summarize(avg_gdpPercap = mean(gdpPercap)) # summarize function creates statistics for the data set
summarize(avg_gdpPercap = mean(gdpPercap)) # create basic stats
```

@@ -211,7 +217,8 @@ Calculate the average life expectancy per country. Which country has the longest
gapminder %>%
group_by(country) %>%
summarize(avg_lifeExp=mean(lifeExp)) %>%
filter(avg_lifeExp == min(avg_lifeExp) | avg_lifeExp == max(avg_lifeExp))
filter(avg_lifeExp == min(avg_lifeExp) |
avg_lifeExp == max(avg_lifeExp) )
```
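The same group-summarize-filter pattern, traced on a toy data frame so the intermediate result can be checked by hand. The countries `A`, `B`, `C` and their values are invented, and the sketch assumes dplyr is installed.

```r
library(dplyr)

# Invented data: two observations per country.
toy <- data.frame(
  country = c("A", "A", "B", "B", "C", "C"),
  lifeExp = c(50, 60, 70, 80, 60, 70)
)

# One row per country with its mean life expectancy.
avg_by_country <- toy %>%
  group_by(country) %>%
  summarize(avg_lifeExp = mean(lifeExp))

# Keep only the countries with the lowest and highest average.
extremes <- avg_by_country %>%
  filter(avg_lifeExp == min(avg_lifeExp) |
           avg_lifeExp == max(avg_lifeExp))

print(avg_by_country)
print(extremes)
```

After `summarize()` the result has one row per group and is no longer grouped, so `min()` and `max()` inside `filter()` are computed over all countries rather than within each one.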

### Multiple groups and summary variables