Clay Rowland 11 December, 2018
This cheat sheet was developed for a course project in MGSC 790 Data Resource Management, a Business Analytics elective in the PMBA program of the Darla Moore School of Business at the University of South Carolina.
It is intended as a quick reference to data structures, functions, and packages for the R language. Please see the official RDocumentation site for a comprehensive reference.
This is not an exhaustive list of R packages. More than 13,000 packages have been developed to supplement the R language; the handful below were covered in the Data Scientist with R track on DataCamp.
A vector is an ordered sequence of elements that all share the same (homogeneous) data type.
A list is similar to a vector but can contain heterogeneous datatypes.
A matrix is a two-dimensional vector of homogeneous data types.
An array is a vector with one or more dimensions. An array with one dimension is similar to a vector. An array with two dimensions is similar to a matrix. An array with three or more dimensions is an n-dimensional array.
A data frame is similar to a table in a database. Each column in the data frame holds values of the same type, and the columns can (and should) be named.
A factor is an enumerated type representing a discrete number of categorical values.
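Lists and arrays are the two structures above that have no example elsewhere in this sheet, so here is a minimal sketch of both. The object names (`person`, `cube`) and their values are made up for illustration.

```r
# a list can mix types: character, numeric, and a character vector
person <- list(name = "Ada", age = 36, languages = c("R", "SQL"))
person$name             # access an element by name
person[["languages"]]   # same idea, using double brackets

# an array is a vector with a dim attribute: 2 rows x 3 columns x 4 layers
cube <- array(1:24, dim = c(2, 3, 4))
dim(cube)               # 2 3 4
cube[2, 3, 4]           # the last element, 24
```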
# combine values into a vector or list
c("China", "India", "United States of America")
## [1] "China" "India"
## [3] "United States of America"
# creates a matrix from the given set of values.
matrix(1:9, nrow=3)
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
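`matrix()` fills column by column by default, as the output above shows. A short sketch of filling by row instead, plus basic indexing (the name `m` is only for illustration):

```r
# fill by row instead of the default column-major order
m <- matrix(1:9, nrow = 3, byrow = TRUE)
m[2, 3]    # row 2, column 3: 6
m[, 1]     # first column as a vector: 1 4 7
dim(m)     # 3 3
t(m)       # transpose; equal to matrix(1:9, nrow = 3)
```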
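# create a data frame from two equal-length vectors
country <- c("China", "India", "United States of America")
population <- c(1409517397, 1339180127, 324459463)
worldPop <- data.frame(country, population)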
worldPop
## country population
## 1 China 1409517397
## 2 India 1339180127
## 3 United States of America 324459463
# return subsets of vectors, matrices or data frames which meet conditions.
subset(worldPop, country=="China")
## country population
## 1 China 1409517397
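The same kind of row filter can be written with bracket indexing, which is what `subset()` does for you. A small sketch, rebuilding the data frame so the block runs on its own (the name `big` is an illustrative choice):

```r
worldPop <- data.frame(
  country = c("China", "India", "United States of America"),
  population = c(1409517397, 1339180127, 324459463)
)

# logical indexing: keep rows where the condition is TRUE, and all columns
big <- worldPop[worldPop$population > 1e9, ]
big$country   # China and India

# subset() can also pick columns with its select argument
subset(worldPop, population > 1e9, select = country)
```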
# encode a vector as a factor
factor(c("A", "B", "B", "C", "C", "C"))
## [1] A B B C C C
## Levels: A B C
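By default the levels are sorted alphabetically. When the categories have a natural order, the `levels` argument can set it explicitly. A minimal sketch with made-up values:

```r
sizes <- factor(c("low", "high", "medium", "low"),
                levels = c("low", "medium", "high"))
levels(sizes)       # "low" "medium" "high"
as.integer(sizes)   # 1 3 2 1, the position of each value in levels()
table(sizes)        # counts per level: low 2, medium 1, high 1
```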
# concatenate vectors after converting them to character (separated by a space by default)
paste("this", "is", "a", "concatenation")
## [1] "this is a concatenation"
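The separator and the way vector elements are joined can both be controlled. A brief sketch of the `sep` and `collapse` arguments and of `paste0()` (variable names are illustrative):

```r
joined <- paste("a", "b", sep = "-")                 # "a-b"
files  <- paste0("file", 1:3, ".csv")                # "file1.csv" "file2.csv" "file3.csv"
one    <- paste(c("x", "y", "z"), collapse = ", ")   # "x, y, z"
```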
# apply a function over a list or vector and return a list
lapply(c("a", "b", "c"), toupper)
## [[1]]
## [1] "A"
##
## [[2]]
## [1] "B"
##
## [[3]]
## [1] "C"
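When a plain vector is more convenient than a list, `sapply()` and `vapply()` are the usual alternatives. A sketch of the difference (the name `letters3` is made up for this example):

```r
letters3 <- c("a", "b", "c")

# sapply simplifies the result to a vector; for character input it adds names
up_sapply <- sapply(letters3, toupper)

# vapply also simplifies, but checks that each result is a length-1 character
up_vapply <- vapply(letters3, toupper, character(1), USE.NAMES = FALSE)

# flattening the lapply result gives the same values
up_unlist <- unlist(lapply(letters3, toupper))
```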
x <- 1000
y <- 1000
# test whether two objects are exactly equal
identical(x, y)
## [1] TRUE
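`identical()` is strict, which matters for floating-point values and for type differences. A short sketch contrasting it with `all.equal()`, which compares within a numeric tolerance:

```r
a <- 0.1 + 0.2
b <- 0.3
identical(a, b)           # FALSE: the two doubles differ in their last bits
isTRUE(all.equal(a, b))   # TRUE: equal within a small tolerance
identical(1L, 1)          # FALSE: an integer is not identical to a double
```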
# return the current date
Sys.Date()
## [1] "2018-12-11"
# return the current date and time
Sys.time()
## [1] "2018-12-11 09:12:58 EST"
# format an object for printing
format(Sys.Date(), "%B %d %Y")
## [1] "December 11 2018"
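Dates can also be parsed from text and used in arithmetic. The sketch below uses a fixed date so the results do not depend on when it is run; month names produced by `%B` depend on the system locale, so only numeric format codes are used here.

```r
d <- as.Date("2018-12-11")      # parse an ISO 8601 date string
format(d, "%Y/%m/%d")           # "2018/12/11"
format(d, "%d.%m.%y")           # "11.12.18"
later <- d + 30                 # date arithmetic is in days: "2019-01-10"
gap <- as.numeric(as.Date("2018-12-25") - d)   # 14
```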
# summarize an object; for a numeric vector this returns the minimum, quartiles, mean, and maximum
summary(mtcars$mpg)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.40 15.43 19.20 20.09 22.80 33.90
library(dplyr, warn.conflicts = FALSE)
dplyr is a package for data manipulation. It also makes the pipe operator %>% available, which passes the result of one function call to the next so that calls can be chained.
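To make the chaining concrete, here is a small sketch comparing a nested call with the equivalent piped call on the built-in `mtcars` data. It assumes dplyr is installed; the names `nested` and `piped` are illustrative.

```r
library(dplyr, warn.conflicts = FALSE)

# nested call: read from the inside out
nested <- head(arrange(mtcars, desc(mpg)), 3)

# piped call: read top to bottom; x %>% f(y) is evaluated as f(x, y)
piped <- mtcars %>%
  arrange(desc(mpg)) %>%
  head(3)

identical(nested, piped)   # TRUE
```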
# filtering and ordering
starwars %>%
  filter(homeworld == "Naboo") %>%  # subset the data based on criteria
  mutate(taxa = ifelse(species == "Droid", "Machine", "Biological")) %>%  # add a column to the data
  select(taxa, species, name) %>%  # limit the columns to those provided
  na.omit() %>%  # exclude records with NA as a value in a column
  arrange(taxa, species, name)  # order the result
## # A tibble: 9 x 3
## taxa species name
## <chr> <chr> <chr>
## 1 Biological Gungan Jar Jar Binks
## 2 Biological Gungan Roos Tarpals
## 3 Biological Gungan Rugor Nass
## 4 Biological Human Cordé
## 5 Biological Human Dormé
## 6 Biological Human Gregar Typho
## 7 Biological Human Padmé Amidala
## 8 Biological Human Palpatine
## 9 Machine Droid R2-D2
# aggregation
starwars %>%
  filter(homeworld == "Naboo") %>%
  group_by(species) %>%  # aggregate by the columns provided
  tally()  # count the rows in each group
## # A tibble: 4 x 2
## species n
## <chr> <int>
## 1 Droid 1
## 2 Gungan 3
## 3 Human 5
## 4 <NA> 2
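`tally()` only counts rows. For other aggregates, `summarise()` computes one value per group. A sketch on the built-in `mtcars` data, assuming dplyr is installed (the column names `n` and `mean_mpg` are chosen here for illustration):

```r
library(dplyr, warn.conflicts = FALSE)

by_cyl <- mtcars %>%
  group_by(cyl) %>%                           # one group per cylinder count
  summarise(n = n(), mean_mpg = mean(mpg)) %>%  # row count and average mpg per group
  arrange(desc(mean_mpg))                     # highest average first
by_cyl
```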
library(ggplot2)
library(gapminder) # load gapminder dataset
ggplot2 is a package for visualizing data. A plot is built by adding layers to a base ggplot() call with the + operator.
# scatter plot
gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) +
  geom_point() +  # produce a scatter plot
  scale_x_log10() +  # use a logarithmic scale for the x axis
  theme_minimal() +  # minimize the lines/color in the chart
  ggtitle("2007 Life Expectancy by GDP Per Capita")  # provide a title
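# bar plot
# note: with stat = "identity" the bars stack one segment per country,
# so each bar's height is the sum of gdpPercap over that continent's countries
gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = continent, y = gdpPercap)) +
  geom_bar(stat = "identity", fill = "steelblue") +  # produce a bar graph
  expand_limits(y = 0) +  # set the y-axis to start at 0
  theme_minimal() +
  ggtitle("2007 GDP Per Capita by Continent")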
# line plot
gapminder %>%
  filter(country == "China") %>%
  ggplot(aes(x = year, y = gdpPercap, color = country)) +
  geom_line(stat = "identity", color = "blue") +  # produce a line graph
  expand_limits(y = 0) +
  theme_minimal() +
  ggtitle("GDP Per Capita by Year in China")
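The plots above are drawn immediately, but a plot can also be stored, extended, and written to disk. The sketch below uses the built-in `mtcars` data so it does not need gapminder; the object names and the temporary output file are illustrative, and it assumes ggplot2 is installed and a PNG graphics device is available.

```r
library(ggplot2)

# a plot is an ordinary R object; nothing is drawn until it is printed
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# add more layers to the stored object
p2 <- p +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # fitted straight line
  labs(title = "Fuel Economy by Weight",
       x = "Weight (1000 lbs)", y = "Miles per gallon")

# write the plot to a temporary PNG file
out_file <- tempfile(fileext = ".png")
ggsave(out_file, plot = p2, width = 6, height = 4)
```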
Package Name | Purpose |
---|---|
httr | Working with HTTP resources |
jsonlite | Parsing and generating JSON |
gdata | Manipulating data |
readr | Importing tabular data file formats |
readxl | Reading Excel spreadsheets |
DBI | Database integration |
haven | Working with statistical software package (SAS, SPSS, Stata) files |
lubridate | Dealing with dates and times |
stringr | Working with strings |
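As a rough taste of three of the packages in the table, the sketch below calls a few commonly used functions from stringr, lubridate, and jsonlite. It assumes all three are installed; the data values are made up.

```r
library(stringr)
library(lubridate, warn.conflicts = FALSE)
library(jsonlite)

# stringr: detect a pattern and change case
has_digit <- str_detect(c("Yoda", "R2-D2"), "[0-9]")   # FALSE TRUE
shout <- str_to_upper("naboo")                          # "NABOO"

# lubridate: parse a date and pull out its components
d <- ymd("2018-12-11")
yr <- year(d)    # 2018
mo <- month(d)   # 12

# jsonlite: round-trip a data frame through JSON text
df <- data.frame(name = c("a", "b"), value = c(1, 2))
json <- toJSON(df)
back <- fromJSON(json)
```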
R Markdown is a framework for generating documents (PDF, HTML, Markdown) that embed R code and its output.
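A minimal sketch of the workflow: write a small `.Rmd` source file, then render it with `rmarkdown::render()`. The file content here is hypothetical and is not the source of this cheat sheet. Rendering is skipped when the rmarkdown package or pandoc is not available.

```r
# build a tiny .Rmd source in a temporary file (hypothetical content)
fence <- strrep("`", 3)   # three backticks, the code chunk delimiter
rmd_lines <- c(
  "---",
  "title: \"Demo\"",
  "output: html_document",
  "---",
  "",
  "The mean mpg in mtcars is shown below.",
  "",
  paste0(fence, "{r}"),
  "mean(mtcars$mpg)",
  fence
)
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_file)

# render only if rmarkdown and pandoc are available on this machine
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html_file <- rmarkdown::render(rmd_file, quiet = TRUE)  # returns the output path
  file.exists(html_file)
}
```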
This cheat sheet was produced using R Markdown. The source code for producing it can be found on GitHub.
Data structure and function descriptions have been taken from the RDocumentation site.