R-Cheat-Sheet

Clay Rowland 11 December, 2018

This cheat sheet was developed for a course project in MGSC 790 Data Resource Management, a Business Analytics elective in the PMBA program of the Darla Moore School of Business at the University of South Carolina.

It is intended as a quick reference to data structures, functions, and packages for the R language. Please see the RDocumentation site for a comprehensive reference.

This cheat sheet does not cover all available R packages. Over 13,000 packages have been developed to supplement the R language. Below are a handful that were covered during the Data Scientist with R track on DataCamp.

A little bit of Base R

Data structures

A vector is an ordered collection of values that all share the same (homogeneous) data type.
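To illustrate what a single shared type means in practice, here is a minimal sketch showing that c() coerces mixed inputs to one common type. The variable names are chosen only for this example.

```r
# A vector holds one type; mixing types triggers coercion to the most flexible one
nums  <- c(1, 2, 3)          # all numeric (double)
mixed <- c(1, "two", TRUE)   # every element is coerced to character

class(nums)    # "numeric"
class(mixed)   # "character"
mixed          # "1" "two" "TRUE"
```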

A list is similar to a vector but can contain heterogeneous datatypes.
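As a small sketch of this difference, the list below keeps each element's own type. The element names and values are hypothetical and used only for illustration.

```r
# A list can hold elements of different types, including whole vectors
person <- list(name = "Ada", age = 36, languages = c("R", "SQL"))

class(person$name)        # "character"
class(person$age)         # "numeric"
length(person$languages)  # 2

# [[ ]] extracts the element itself; [ ] returns a one-element sub-list
class(person[["age"]])    # "numeric"
class(person["age"])      # "list"
```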

A matrix is a two-dimensional vector of homogeneous datatypes.

An array is a vector with one or more dimensions. An array with one dimension is similar to a vector. An array with two dimensions is similar to a matrix. An array with three or more dimensions is an n-dimensional array.
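The following sketch builds a three-dimensional array and checks that a two-dimensional array is treated as a matrix. The dimensions are arbitrary choices for the example.

```r
# Build a 2 x 3 x 4 array from the integers 1..24 (filled column-first)
a <- array(1:24, dim = c(2, 3, 4))

dim(a)       # 2 3 4
a[1, 2, 3]   # 15: row 1, column 2 of the third 2 x 3 slice

# An array with exactly two dimensions is a matrix
is.matrix(array(1:6, dim = c(2, 3)))  # TRUE
```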

A data frame is similar to a table in a database. Each column in the data frame holds values of a single type, different columns may have different types, and the columns can (and should) be named.
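A minimal sketch of a data frame with three differently typed columns follows. The column names and values are placeholders made up for this example.

```r
# Each column is a vector of a single type; columns may differ from each other
df <- data.frame(
  id    = 1:3,
  label = c("a", "b", "c"),
  flag  = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE   # keep text as character (the default since R 4.0)
)

sapply(df, class)  # id: "integer", label: "character", flag: "logical"
names(df)          # "id" "label" "flag"
df$label           # a single column is an ordinary vector
```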

A factor is an enumerated type representing a discrete number of categorical values.
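To show how a factor stores categories, this sketch sets the level order explicitly and inspects the underlying integer codes. The category names are arbitrary.

```r
# Levels define the allowed categories and their order
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))

levels(sizes)      # "small" "medium" "large"
nlevels(sizes)     # 3
as.integer(sizes)  # 1 3 2 1: each value is stored as an index into the levels
table(sizes)       # counts per level: small 2, medium 1, large 1
```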

Base functions

c

# combine values into a vector or list
c("China", "India", "United States of America")

## [1] "China"                    "India"                   
## [3] "United States of America"

matrix

# creates a matrix from the given set of values.
matrix(1:9, nrow=3)

##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
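By default matrix() fills column by column, as the output above shows. As a brief sketch, the byrow argument switches to row-wise filling. The values are arbitrary.

```r
# Fill a 2 x 3 matrix row by row instead of column by column
m <- matrix(1:6, nrow = 2, byrow = TRUE)

m[1, ]   # 1 2 3: the first row holds the first three values
m[2, 1]  # 4
dim(m)   # 2 3
```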

data.frame

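# define the column vectors, then create a data frame from them
country <- c("China", "India", "United States of America")
population <- c(1409517397, 1339180127, 324459463)
worldPop <- data.frame(country, population)
worldPop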

##                    country population
## 1                    China 1409517397
## 2                    India 1339180127
## 3 United States of America  324459463

subset

# return subsets of vectors, matrices or data frames which meet conditions.
subset(worldPop, country=="China")

##   country population
## 1   China 1409517397
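subset() can also restrict the columns it returns. The sketch below uses the built-in mtcars dataset (an assumption made so the example runs on its own) and shows the equivalent bracket-indexing form.

```r
# Rows where mpg exceeds 30, keeping only the mpg and cyl columns
high_mpg <- subset(mtcars, mpg > 30, select = c(mpg, cyl))

# The same selection written with bracket indexing
high_mpg_idx <- mtcars[mtcars$mpg > 30, c("mpg", "cyl")]

nrow(high_mpg)                   # 4
all.equal(high_mpg, high_mpg_idx)  # TRUE
```

subset() is convenient interactively; bracket indexing is usually the safer choice inside functions, because subset() evaluates column names non-standardly.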

factor

# encode a vector as a factor
factor(c("A", "B", "B",  "C", "C", "C"))

## [1] A B B C C C
## Levels: A B C

paste

# concatenate vectors after converting them to character
paste("this", "is", "a", "concatenation")

## [1] "this is a concatenation"
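paste() separates its arguments with a space by default. A short sketch of the sep and collapse arguments, and of the paste0() shortcut, follows. The input strings are arbitrary.

```r
# sep controls the separator between arguments
joined <- paste("a", "b", sep = "-")               # "a-b"

# paste0() is paste() with sep = ""
ids <- paste0("x", 1:3)                            # "x1" "x2" "x3"

# collapse joins the elements of one vector into a single string
csv <- paste(c("a", "b", "c"), collapse = ", ")    # "a, b, c"
```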

lapply

# apply a function over a list or vector and return a list
lapply(c("a", "b", "c"), toupper)

## [[1]]
## [1] "A"
## 
## [[2]]
## [1] "B"
## 
## [[3]]
## [1] "C"
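When a plain vector is more convenient than a list, base R offers vapply(), which also checks the type of each result. The following is a minimal sketch on the same input. The variable names are chosen for the example.

```r
input <- c("a", "b", "c")

# lapply always returns a list
as_list <- lapply(input, toupper)

# vapply returns a vector and verifies that each result is a single string
as_vector <- vapply(input, toupper, character(1), USE.NAMES = FALSE)

is.list(as_list)   # TRUE
as_vector          # "A" "B" "C"
unlist(as_list)    # "A" "B" "C"
```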

identical

x <- 1000
y <- 1000

# test whether two objects are exactly equal (returns a single TRUE or FALSE)
identical(x, y)

## [1] TRUE
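identical() is strict about both value and type, which can be surprising with floating-point numbers. This sketch contrasts it with all.equal() and ==. The values are chosen to expose the difference.

```r
a <- 0.1 + 0.2
b <- 0.3

identical(a, b)          # FALSE: the two doubles differ in their last bits
isTRUE(all.equal(a, b))  # TRUE: equal within a small numeric tolerance

identical(1L, 1)         # FALSE: integer versus double
1L == 1                  # TRUE: == compares values after coercion
```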

Sys.Date

# return the current date
Sys.Date()

## [1] "2018-12-11"

Sys.time

# return the current date and time
Sys.time()

## [1] "2018-12-11 09:12:58 EST"

format

# format an object for printing
format(Sys.Date(), "%B %d %Y")

## [1] "December 11 2018"
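The format string uses strptime-style conversion codes. As a sketch, the example below applies purely numeric codes to a fixed date, so the result does not depend on the current day or locale, and then parses a string back into a Date.

```r
d <- as.Date("2018-12-11")

format(d, "%Y/%m/%d")  # "2018/12/11"
format(d, "%d.%m.%Y")  # "11.12.2018"

# The same codes describe the input layout when parsing text into a Date
parsed <- as.Date("12/11/2018", format = "%m/%d/%Y")
identical(parsed, d)   # TRUE
```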

summary

# summarize an object; for a numeric vector, return the minimum, quartiles, mean, and maximum
summary(mtcars$mpg)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.40   15.43   19.20   20.09   22.80   33.90

Data manipulation & visualization

dplyr

library(dplyr, warn.conflicts = FALSE)

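dplyr is a package for data manipulation. It also re-exports the pipe operator %>% from the magrittr package, which passes the result on its left as the first argument of the function on its right, allowing functions to be chained.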

# filtering and ordering
starwars %>%
  filter(homeworld=="Naboo") %>% # subset the data based on criteria
  mutate(taxa = ifelse(species == "Droid", "Machine", "Biological")) %>% # add a column to the data
  select(taxa, species, name) %>% # limit the columns to those provided
  na.omit() %>% # exclude records with NA as a value in a column
  arrange(taxa, species, name) # order the result

## # A tibble: 9 x 3
##   taxa       species name         
##   <chr>      <chr>   <chr>        
## 1 Biological Gungan  Jar Jar Binks
## 2 Biological Gungan  Roos Tarpals 
## 3 Biological Gungan  Rugor Nass   
## 4 Biological Human   Cordé        
## 5 Biological Human   Dormé        
## 6 Biological Human   Gregar Typho 
## 7 Biological Human   Padmé Amidala
## 8 Biological Human   Palpatine    
## 9 Machine    Droid   R2-D2

# aggregation
starwars %>%
  filter(homeworld=="Naboo") %>%
  group_by(species) %>% # aggregate by the columns provided
  tally() # count the rows in each group

## # A tibble: 4 x 2
##   species     n
##   <chr>   <int>
## 1 Droid       1
## 2 Gungan      3
## 3 Human       5
## 4 <NA>        2

ggplot2

library(ggplot2)
library(gapminder) # load gapminder dataset

ggplot2 is a package for visualizing data. The examples below also rely on filter() and %>% from dplyr, loaded above.

# scatter plot
gapminder %>% filter(year==2007) %>%
ggplot(aes(x=gdpPercap, y=lifeExp, color=continent, size=pop)) +
  geom_point() +  # produce a scatter plot
  scale_x_log10() + # use a logarithmic scale for the x axis
  theme_minimal() + # minimize the lines/color in the chart
  ggtitle("2007 Life Expectancy by GDP Per Capita") # provide a title

# bar plot
gapminder %>% filter(year==2007) %>%
ggplot(aes(x=continent, y=gdpPercap)) +
  geom_bar(stat="identity", fill="steelblue") + # produce a bar graph; each country's value is stacked, so bar height is the continent total
  expand_limits(y=0) + # set the y-axis to start at 0
  theme_minimal() +
  ggtitle("2007 GDP Per Capita by Continent")

# line plot
gapminder %>% filter(country=="China") %>%
ggplot(aes(x=year, y=gdpPercap)) +
  geom_line(color="blue") + # produce a line graph
  expand_limits(y=0) +
  theme_minimal() +
  ggtitle("GDP Per Capita by Year in China")

A handful of R packages

Package Name	Purpose
httr	Working with HTTP resources
jsonlite	Parsing and generating JSON
gdata	Manipulating data
readr	Importing tabular data file formats
readxl	Reading Excel spreadsheets
DBI	Database integration
haven	Working with statistical software package (SAS, SPSS, Stata) files
lubridate	Dealing with dates and times
stringr	Working with strings

R Markdown

R Markdown is a document format, supported by the rmarkdown package, for generating documents (PDF, HTML, Markdown) that combine prose with R code and its output.

This cheat sheet was produced using R Markdown. The source code for producing it can be found on GitHub.

References

Data structure and function descriptions have been taken from the RDocumentation site.
