ae-10b-lifecycle.qmd

---
title: "Pair Programming - Data Science Lifecycle"
format: html
editor: visual
bibliography: references.bib
execute:
  echo: true
  message: false
  warning: false
editor_options: 
  chunk_output_type: console
---

# Task 0: Load R packages

The required R packages are loaded at the beginning of the script.

1.  Run the code contained in the code-chunk and fix any errors. (**Tip: Use the green play button in the top right corner of the code-chunk.**)

```{r}

library(readr)
library(dplyr)
library(ggplot2)
```

# Task 1: Data import

**Fill in the blanks**

1. A code-chunk has already been created
2. Import the CSV file titled 'country_level_data_0.csv', contained in the 'raw_data' directory, using the `read_csv()` function
3. Use the assignment operator (`<-`) to assign the data to an object named `global_waste_data`
4. Run the code contained in the code-chunk and fix any errors
5. Next to the code-chunk option `#| eval:` change the value from `false` to `true` (**Info: This tells Quarto to evaluate the code-chunk when the file is rendered**)
6. **Render:** Render this file to HTML
7. **Add:** Open the Git pane and add all files to the staging area (**Tip: Tick all checkboxes in the 'Staged' column**)
8. **Commit:** Commit pending changes; add a meaningful commit message
9. **Push:** Push changes to the remote repository (i.e. GitHub)
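
The import pattern from steps 2 and 3 can be sketched on a small stand-in file. This is only an illustration: the file contents, the temporary path, and the object name `toy_data` are made up so that the example runs without the course data.

```r
library(readr)

# Write a tiny, made-up CSV to a temporary file (illustration only)
toy_path <- tempfile(fileext = ".csv")
writeLines(c("country,value", "A,1", "B,2"), toy_path)

# read_csv() takes the file path as a string;
# the assignment operator stores the result in an object
toy_data <- read_csv(toy_path, show_col_types = FALSE)

toy_data
```

For the task itself, the path points to the file inside the 'raw_data' directory instead of a temporary file.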

```{r}
#| eval: false

___ <- read_csv("___")
```

# Task 2: Data tidying (and some transformation)

**Fill in the blanks**

1. A code-chunk has already been created
2. Start with the `global_waste_data` object
3. Add the pipe operator (`%>%`) and on a new line use the `select()` function
4. Inside the parentheses write the names of the following variables:
    - `country_name`
    - `iso3c`
    - `income_id`
    - `total_msw_total_msw_generated_tons_year`
    - `population_population_number_of_people`
5. Add the pipe operator (`%>%`) and on a new line use the `rename()` function
6. Rename two variables:
    - (1) from `total_msw_total_msw_generated_tons_year` to `msw_tons_year`
    - (2) from `population_population_number_of_people` to `population`
7. Use the assignment operator (`<-`) to assign the data to an object named `global_waste_data_small`
8. Run the code contained in the code-chunk and fix any errors
9. Next to the code-chunk option `#| eval:` change the value from `false` to `true`
10. Render
11. Add, Commit, Push
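
The `select()` and `rename()` steps above can be sketched on a small made-up data frame. All column and object names in this example (`toy_data`, `site_name`, `weight_total_weight_kg`, `notes`, `toy_small`) are hypothetical and chosen only to mirror the pattern, not the course data.

```r
library(dplyr)

# Made-up data with a deliberately long column name (illustration only)
toy_data <- tibble(
  site_name = c("A", "B"),
  weight_total_weight_kg = c(10, 20),
  notes = c("x", "y")
)

toy_small <- toy_data %>%
  # select() keeps only the listed columns, in the order given
  select(site_name, weight_total_weight_kg) %>%
  # rename() follows the pattern new_name = old_name
  rename(weight_kg = weight_total_weight_kg)

toy_small
```

Note that each pipe operator sits at the end of a line, so R knows the expression continues on the next line.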


```{r}
#| eval: false

global_waste_data_small <- global_waste_data %>% 
  select(country_name, 
         iso3c, 
         ___, 
         ___, 
         population_population_number_of_people) ___ 
  rename(___ = total_msw_total_msw_generated_tons_year,
         ___ = population_population_number_of_people) 
```

# Task 3: Data transformation

**Fill in the blanks**

1. A code-chunk has already been created
2. Start with the `global_waste_data_small` object created in Task 2
3. Add the pipe operator (`%>%`) and on a new line use the `mutate()` function
4. Create a new variable named 'capita_kg_year' by dividing 'msw_tons_year' by 'population' and multiplying the result by [?]
5. Run the code contained in the code-chunk and fix any errors
6. Next to the code-chunk option `#| eval:` change the value from `false` to `true`
7. Render
8. Add, Commit, Push
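
The `mutate()` pattern used in this task (divide, multiply by a unit conversion factor, then fix the order of a categorical variable) can be sketched with made-up data. The variables here (`duration_h`, `trips`, `group`) and the hours-to-minutes conversion are assumptions for illustration and are unrelated to the waste data.

```r
library(dplyr)

# Made-up data (illustration only)
toy_data <- tibble(
  group = c("low", "high", "mid"),
  duration_h = c(1.5, 3, 0.2),
  trips = c(3, 6, 1)
)

toy_per_trip <- toy_data %>%
  # new variable: hours per trip, converted to minutes (60 minutes per hour)
  mutate(minutes_per_trip = duration_h / trips * 60) %>%
  # factor() with explicit levels sets the order used later, e.g. on a plot axis
  mutate(group = factor(group, levels = c("low", "mid", "high")))

toy_per_trip
```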

```{r}
#| eval: false

global_waste_data_kg_year <- ___ %>% 
  ___(___ = msw_tons_year / population * ___) %>% 
  mutate(income_id = factor(income_id, 
                            levels = c("HIC", "UMC", "LMC", "LIC")))

```

# Task 4: Data visualisation

**Fill in the blanks**

1. A code-chunk has already been created
2. Use the `global_waste_data_kg_year` object
3. Use aesthetic mappings to plot the income category on the x-axis and MSW generation per capita on the y-axis
4. Run the code contained in the code-chunk and fix any errors
5. Use a search engine of your choice to figure out how to remove the legend from the plot (**Tip:** Add the name of the ggplot2 package to your query and focus on results from Stack Overflow)
6. Run the code contained in the code-chunk and fix any errors
7. Next to the code-chunk option `#| eval:` change the value from `false` to `true`
8. Render
9. Add, Commit, Push
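
The aesthetic mapping in step 3 can be sketched with the `mpg` dataset that ships with ggplot2, which stands in for the course data here. The choice of `drv` and `hwy` and the object name `toy_plot` are assumptions made only for this illustration.

```r
library(ggplot2)

# `mpg` is a built-in ggplot2 dataset, used here instead of the waste data
toy_plot <- ggplot(data = mpg,
                   mapping = aes(x = drv,      # categorical variable on the x-axis
                                 y = hwy)) +   # numeric variable on the y-axis
  geom_boxplot(outlier.shape = NA) +           # hide boxplot outliers; jitter shows all points
  geom_jitter(width = 0.1, alpha = 1/4) +
  labs(x = "drive train",
       y = "highway miles per gallon")

toy_plot
```

Each variable named inside `aes()` must be a column of the object passed to `data`.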

```{r}
#| eval: false

ggplot(data = global_waste_data_kg_year,
       mapping = aes(x = ___, 
                     y = ___,
                     color = income_id)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.1, alpha = 1/4, size = 3) +
  labs(x = "income category",
       y = "MSW generation per capita [kg/yr]") +
  theme_minimal(base_size = 14) 
```

# Task 5: Data communication

1. In the YAML header, replace `html` with `gfm` (GitHub Flavored Markdown)
2. Render, and close the pop-up window that opens
3. Add, Commit, Push
4. Open your GitHub repository for this exercise, and click on the file with the `.md` ending (`ae-10b-lifecycle.md`)

# Task 6: Complete assignment

1. Open an issue on the repo for this exercise to let us know you completed it. Use the @larnsce mention.

# References