As part of the final project, you and your classmates will put together an annotated list of R functions.
-
Annotations first draft: Time on Date 1
-
Peer reviews: Time on Date 2
-
Annotations final draft: Time on Date 3
Read the section merging into the master branch for instructions on the peer review process.
Purpose — A significant portion of the course is dedicated to learning how to use R and the tidyverse
ecosystem to wrangle and explore data, and then analyze it using statistics. Remembering the different functions, how they are used, and what problems they solve requires consistent practice and review. This is why it's important to have a set of notes that explain how a function works in your own words, which can be referenced long after you've completed this course.
For your final project — Each student will be assigned a set of functions that we used during the course that you are responsible for annotating. Everyone's annotations will need to be merged into a single, uniform document that will ideally serve as a useful reference after the course is complete. To get the class started, this starter repository has been set up with a notes template containing a pre-filled example for the ggplot2
syntax and how to use geom_histogram()
. Use these examples as inspiration for how to put together notes on the other functions in R.
Unless told otherwise, when writing notes for a function, include the following:
-
The function's name
-
The important inputs
-
If the input was used in class or for a homework assignment, then it's important
-
For
ggplot2
functions, include a separate subsection for important aesthetic mappings (see example in your template file)
-
-
A summary — 1 or 2 sentences — of what the function does
-
An example showing how to use the function
- Your example cannot use the same dataset as the examples found in the help documentation or from class. For example, when explaining the
dplyr
function you should not use thepresidential
dataset, but you can use a dataset from one of the homework assignments.
- Your example cannot use the same dataset as the examples found in the help documentation or from class. For example, when explaining the
Your notes should discuss the following packages and functions:
-
The
ggplot2
package-
geom_histogram()
,geom_boxplot()
,geom_point()
,geom_bar()
,geom_col()
,stat_ecdf()
,geom_smooth()
,facet_wrap()
,facet_grid()
-
Basic instructions on how to change the axes labels and give the plot a title
-
-
The
readr
packageread_csv()
,read_rds()
,write_csv()
,write_rds()
-
The
dplyr
packageselect()
,slice()
,rename()
,recode()
,arrange()
,filter()
,mutate()
,group_by()
,summarize()
,count()
,cume_dist()
,combine()
,pull()
-
The
tibble
package- Instructions on manually creating a data frame using
data_frame()
- Instructions on manually creating a data frame using
-
The
tidyr
packagegather()
,spread()
,separate()
,unite()
-
The
rvest
package- Instead of notes for each function, describe the procedure for using the SelectorGadget extension with
read_html()
,html_nodes()
, andhtml_text()
to perform basic webscraping tasks
- Instead of notes for each function, describe the procedure for using the SelectorGadget extension with
-
Statistics functions
-
Summary statistics functions:
mean()
,median()
,sd()
,IQR()
,min()
, andmax()
-
Linear modeling:
lm()
-
Instructions on how to create a Q-Q plot using
geom_qq()
fromggplot2
and then plot the ideal reference line to check for deviations from normality
-
-
The
infer
package-
Describe the inputs for the four functions:
specify()
,hypothesize()
,generate()
, andcalculate()
, see the class 16 slides for guidance -
Instructions with an example on how you use
infer
to conduct a hypothesis test: how you simulate the null distribution, and how you calculate the p-value-
Test must be between two variables (one response variable and one explanatory variable), one explanatory variable must be categorical, the response can be either categorical or numerical
-
Example should show how we compute the p-value for a one-sided test and for a two-sided test
-
-
Instructions with an example on how you use
infer
to find a confidence interval using bootstrapping, including how you would find different size intervals, for example a 90% interval or a 95% interval-
Confidence interval must be computed for a difference of two values, such as means or proportions
-
Consult the class 17 slides for guidance on how to correctly compute the upper and lower bounds!
-
-
-
The
modelr
package-
An example of showing how to plot a simple linear model with one explanatory variable on top of the data in a scatter plot using
data_grid()
(orseq_range()
),add_predictions()
, andgeom_line()
fromggplot2
-
Instructions with an example on how to use
add_predictions()
to create observed versus predicted values plots -
Instructions with an example on how to use
add_residuals()
,add_predictions()
, andgeom_ref_line()
to create residual versus predicted values plots -
Instructions with an example on how you interpret the observed versus predicted and residual versus predicted plots to determine whether the conditions for using a linear model are met: linearity, nearly normal residuals, and constant variability
-
Remember, these notes are ultimately for you, so think about what you would find helpful when you refer back to these.
Students will complete their annotations in separate branches of the final project repository, writing their annotations under the appropriate header. When your annotations are complete, you will need to have your work merged into the master
branch of the repository. Before you can do this, you will need to have your work edited and reviewed by one of your classmates. This will be completing using Github's Pull Request functionality. The instructor will set up the Pull Requests and assign the peer reviewers. In order to indicate that you are ready for a review, you should post the following in the discussion box of the Pull Request:
My annotations are ready for review @username
You will need to replace @username
with your reviewer's actual Github username. You should also send your reviewer a direct message on Slack or email him or her to begin the review. After you receive your feedback, you will use the suggestions to update and correct your submission. After your edits are complete and commited to your branch, message the instructor on Slack to check to see that you've satisfied the reviewer's comments. If you have, the instructor will approve the merge and add your contribution to the master version of the document.
-
First drafts of your annotations must be commited to your branch by Time on Date 1
-
Peer reviews must be completed by Time on Date 2
-
Final edits should be commited no later than Time on Date 3