opencasestudies/ocs-ripples-ancestry

README

Your case study project should include a README.md file. The README from the CO2 Emissions case study is provided here as an example/template:

Important links

Disclaimer

The purpose of the Open Case Studies project is to demonstrate the use of various data science methods, tools, and software in the context of messy, real-world data. A given case study does not cover all aspects of the research process, is not claiming to be the most appropriate way to analyze a given dataset, and should not be used in the context of making policy decisions without external consultation from scientific experts.

License

This case study is part of the OpenCaseStudies project. This work is licensed under the Creative Commons Attribution-NonCommercial 3.0 (CC BY-NC 3.0) United States License.

Citation

To cite this case study:

Wright, Carrie and Ontiveros, Michael and Jager, Leah and Taub, Margaret and Hicks, Stephanie. (2020). https://github.com/opencasestudies/ocs-bp-co2-emissions. Exploring CO2 emissions across time (Version v1.0.0).

Acknowledgments

Title

Exploring CO2 emissions across time

Motivation

In this case study we...

Motivating questions

Data

In this case study we will be using this data...

This case study uses data from blank that was originally obtained from blank.

Learning Objectives

The skills, methods, and concepts that students will be familiar with by the end of this case study are:

Data Science Learning Objectives:

  1. Importing data from various types of Excel files and CSV files
  2. Applying action verbs in dplyr for data wrangling
  3. How to pivot between "long" and "wide" datasets
  4. Joining together multiple datasets using dplyr
  5. How to create effective longitudinal data visualizations with ggplot2
  6. How to add text, color, and labels to ggplot2 plots
  7. How to create faceted ggplot2 plots

Statistical Learning Objectives:

  1. Introduction to correlation coefficient as a summary statistic
  2. Relationship between correlation and linear regression
  3. Correlation is not causation

Data import

Data from several .xlsx files and a couple of .csv files were imported using readxl and readr respectively.
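A minimal sketch of the import pattern described above. The file names and columns are hypothetical, not the case study's real data: a tiny CSV is written to a temporary file so the example runs on its own, and the Excel path is a placeholder that is only read if it exists.

```r
library(readr)

# Hypothetical data: a small CSV written to a temporary file so the
# example does not depend on the case study's real files.
csv_path <- tempfile(fileext = ".csv")
write_csv(
  data.frame(country = c("A", "B"), emissions = c(1.5, 2.5)),
  csv_path
)

# read_csv() returns a tibble and guesses column types from the data.
emissions_csv <- read_csv(csv_path, show_col_types = FALSE)

# Excel files follow the same pattern with readxl. The path below is a
# placeholder, so the call only runs if such a file is present.
xlsx_path <- "data/example.xlsx"  # hypothetical path
if (requireNamespace("readxl", quietly = TRUE) && file.exists(xlsx_path)) {
  emissions_xlsx <- readxl::read_excel(xlsx_path, sheet = 1)
}
```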

Data wrangling

This case study particularly focuses on renaming variables, modifying variables, creating new variables, and modifying the shape of the data using functions from the dplyr and tidyr packages such as: rename() and mutate() (dplyr), and pivot_longer() and pivot_wider() (tidyr).
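As a rough illustration of these wrangling steps, the sketch below uses a made-up two-country table (the column names and values are assumptions, not the case study's data). Note that pivot_longer() and pivot_wider() are exported by tidyr, so both packages are loaded.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per country, one column per year.
wide <- tibble(
  country = c("A", "B"),
  `2000` = c(10, 20),
  `2001` = c(11, 21)
)

long <- wide %>%
  rename(Country = country) %>%             # rename(new_name = old_name)
  pivot_longer(
    cols = -Country,                        # every column except Country
    names_to = "Year",
    values_to = "Emissions"
  ) %>%
  mutate(
    Year = as.numeric(Year),                # modify an existing variable
    Emissions_scaled = Emissions / 1000     # new variable (arbitrary rescaling)
  )

# pivot_wider() reverses the reshaping done by pivot_longer().
wide_again <- long %>%
  select(-Emissions_scaled) %>%
  pivot_wider(names_from = Year, values_from = Emissions)
```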

This case study also covers combining data with bind_rows() and full_join() of the dplyr package, including a comparison of the two functions.
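The difference between the two functions is easiest to see on a small example. The tables below are invented for illustration: bind_rows() stacks them and fills missing columns with NA, while full_join() matches rows on the key columns and keeps unmatched rows from both sides.

```r
library(dplyr)

# Hypothetical tables that share the key columns country and year.
co2 <- tibble(country = c("A", "B"), year = 2000, co2 = c(10, 20))
gdp <- tibble(country = c("B", "C"), year = 2000, gdp = c(5, 6))

# bind_rows() appends the rows of gdp below co2: 4 rows, with NA wherever
# a table lacks a column.
stacked <- bind_rows(co2, gdp)

# full_join() aligns rows by key: 3 rows (A, B, C); country B has both
# values, while A and C each have one NA.
joined <- full_join(co2, gdp, by = c("country", "year"))
```

In short, stacking suits tables that hold the same variables for different observations, while joining suits tables that hold different variables for the same observations.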

We also cover filtering with the filter() function of the dplyr package, removing NA values with the drop_na() function of the tidyr package, arranging data with the arrange() function of the dplyr package, as well as grouping and summarizing data with the group_by() and summarize() functions of the dplyr package.
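These verbs are typically chained in one pipeline. The following is a sketch on invented data (the table and its values are assumptions), not the case study's actual code:

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format data with one missing value.
emissions <- tibble(
  country = c("A", "A", "B", "B", "C"),
  year = c(2000, 2001, 2000, 2001, 2001),
  co2 = c(10, 12, NA, 20, 5)
)

summary_tbl <- emissions %>%
  drop_na(co2) %>%                  # drop rows where co2 is NA
  filter(year >= 2000) %>%          # keep rows that satisfy a condition
  group_by(country) %>%             # one group per country
  summarize(
    mean_co2 = mean(co2),           # one summary row per group
    n_years = n()
  ) %>%
  arrange(desc(mean_co2))           # sort from highest to lowest mean
```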

Data Visualization

We include a thorough and introductory explanation of ggplot2 including how to add color, facets and labels to plots.
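For orientation, here is a small sketch of a plot that combines color, facets, and labels. The data frame and label text are hypothetical and only stand in for the case study's data.

```r
library(ggplot2)

# Hypothetical longitudinal data for two countries.
emissions <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year = rep(2000:2002, times = 2),
  co2 = c(10, 12, 13, 20, 19, 22)
)

p <- ggplot(emissions, aes(x = year, y = co2, color = country)) +
  geom_line() +                      # one line per country, colored by country
  facet_wrap(~ country) +            # one panel per country
  labs(
    title = "Emissions over time (illustrative data)",
    x = "Year",
    y = "CO2 emissions (hypothetical units)",
    color = "Country"
  )

# Printing p (or calling print(p)) draws the plot.
```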

Analysis

In this case study we look at the correlation between CO2 emissions and annual average temperatures in the US. We also evaluate the association between the two using a linear regression. We discuss the relationship between correlation and linear regression and how we interpret the findings.
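The link between the two can be checked numerically. In simple linear regression with one predictor, the slope equals the Pearson correlation multiplied by sd(y) / sd(x), and R-squared equals the squared correlation. The sketch below uses invented numbers (not the case study's data) and base R only:

```r
# Hypothetical yearly values; not the case study's data.
year_data <- data.frame(
  co2 = c(5.0, 5.2, 5.1, 5.5, 5.6, 5.9),
  temp = c(52.1, 52.4, 52.0, 52.9, 53.1, 53.0)
)

# Pearson correlation coefficient (the default method of cor()).
r <- cor(year_data$co2, year_data$temp)

# Simple linear regression of temperature on CO2 emissions.
fit <- lm(temp ~ co2, data = year_data)
slope <- unname(coef(fit)["co2"])
r_squared <- summary(fit)$r.squared

# The slope rebuilt from the correlation and the two standard deviations.
slope_from_r <- r * sd(year_data$temp) / sd(year_data$co2)

# Both comparisons should be TRUE up to floating-point tolerance.
all.equal(slope, slope_from_r)
all.equal(r_squared, r^2)
```

Neither quantity says anything about causation; both only describe how closely the two variables move together in a linear fashion.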

Other notes and resources

Tidyverse
RStudio cheatsheets
Introduction to correlation
Correlation coefficient
Correlation does not imply causation
Regression
Locally estimated scatterplot smoothing
Local polynomial regression
Autocorrelation
Time series
Methods to account for autocorrelation
US Environmental Protection Agency (EPA) Inventory of U.S. Greenhouse Gas Emissions and Sinks 2020 Report
National Climate Assessment Report
Greenhouse gases
Climate change

Packages used in this case study:

| Package | Use in this case study |
|---|---|
| here | to easily load and save data |
| readxl | to import the Excel file data |
| readr | to import the CSV file data |
| dplyr | to view and wrangle the data, by modifying variables, renaming variables, selecting variables, creating variables, and arranging values within a variable |
| magrittr | to use and reassign data objects using the %<>% pipe operator |
| stringr | to select only the first 4 characters of date data |
| purrr | to apply a function on a list of tibbles (tibbles are the tidyverse version of a data frame) |
| tidyr | to drop rows with NA values from a tibble |
| forcats | to reorder the levels of a factor |
| ggplot2 | to make visualizations |
| directlabels | to add labels to plots easily |
| ggrepel | to add labels that don't overlap to plots |
| broom | to make the output from statistical tests easier to work with |
| patchwork | to combine plots |
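A few of the less familiar helpers listed above can be sketched briefly. The inputs are made up for illustration, and plain data frames stand in for tibbles:

```r
library(magrittr)
library(stringr)
library(purrr)

# stringr: keep only the first 4 characters of each date string.
dates <- c("2000-01-01", "2001-06-15")
years <- str_sub(dates, start = 1, end = 4)

# magrittr: %<>% pipes an object into a function and assigns the result
# back to the same name, so this is equivalent to
# years <- as.numeric(years)
years %<>% as.numeric()

# purrr: map() applies a function to every element of a list and
# returns a list of the results, keeping the element names.
tables <- list(a = data.frame(x = 1:2), b = data.frame(x = 1:3))
row_counts <- map(tables, nrow)
```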

For users

(If you do this... at a later stage...) There is a Makefile in this folder. Typing make knits the case study contained in index.Rmd to index.html and also knits README.Rmd to a markdown file (README.md).

For instructors

Instructors can start at the data visualization section or the data analysis section. However, if you choose to start at the data analysis section, you will need to remove the code for the main plot.

Target audience

This case study is appropriate for those new to R programming and new to statistics. It is also appropriate for more advanced R users who are new to the Tidyverse.

Suggested homework

Ask students to create a plot with labels showing the countries with the lowest CO2 emission levels.

Ask students to plot CO2 emissions and other variables (e.g. energy use) on a scatter plot, calculate the Pearson's correlation coefficient, and discuss results.

About

Case study on ancestry for Ripples project
