rowlanch/my-R-cheat-sheet


R-Cheat-Sheet

Clay Rowland 11 December, 2018

This cheat sheet was developed for a course project in MGSC 790 Data Resource Management, a Business Analytics elective in the PMBA program of the Darla Moore School of Business at the University of South Carolina.

It is intended as a quick reference to data structures, functions, and packages for the R language. Please see the official RDocumentation site for a comprehensive reference.

This cheat sheet does not cover every available R package. More than 13,000 packages have been developed to supplement the R language; the handful below were covered in the Data Scientist with R track on DataCamp.


Table of Contents

A little bit of Base R

  Data structures

  Base functions

Data manipulation & visualization

  dplyr

  ggplot2

A handful of R packages

  httr

  jsonlite

  gdata

  readr

  readxl

  DBI

  haven

  RMarkdown


A little bit of Base R

Data structures

A vector is an ordered sequence of elements that all share the same (homogeneous) data type.

A list is similar to a vector but can contain heterogeneous datatypes.

A matrix is a two-dimensional vector of homogeneous data types.

An array is a vector with one or more dimensions. An array with one dimension is similar to a vector. An array with two dimensions is similar to a matrix. An array with three or more dimensions is an n-dimensional array.

A data frame is similar to a table in a database. Each column in the data frame holds values of a single type, and the columns can (and should) be named.

A factor is an enumerated type representing a fixed set of categorical values, called levels.
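
The structures described above can be sketched side by side in base R. The variable names and sample values here are illustrative only; `class()`, `dim()`, and `levels()` are used to inspect what each constructor returns.

```r
# Vector: every element has the same type (here, character)
countries <- c("China", "India", "United States of America")

# List: elements may have different types
record <- list(name = "China", population = 1409517397, inAsia = TRUE)

# Matrix: two dimensions, one type; values fill column by column
m <- matrix(1:6, nrow = 2)

# Array: any number of dimensions (here 2 x 3 x 4)
a <- array(1:24, dim = c(2, 3, 4))

# Data frame: named columns, each column holding a single type
df <- data.frame(country = countries,
                 population = c(1409517397, 1339180127, 324459463))

# Factor: categorical values drawn from a fixed set of levels
grades <- factor(c("A", "B", "B", "C"))

class(countries)  # "character"
class(record)     # "list"
dim(m)            # 2 3
dim(a)            # 2 3 4
class(df)         # "data.frame"
levels(grades)    # "A" "B" "C"
```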

Base functions

c

# combine values into a vector or list
c("China", "India", "United States of America")
## [1] "China"                    "India"                   
## [3] "United States of America"

matrix

# creates a matrix from the given set of values.
matrix(1:9, nrow=3)
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

data.frame

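# create a data frame from two vectors of equal length
country <- c("China", "India", "United States of America")
population <- c(1409517397, 1339180127, 324459463)
worldPop <- data.frame(country, population)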
worldPop
##                    country population
## 1                    China 1409517397
## 2                    India 1339180127
## 3 United States of America  324459463

subset

# return subsets of vectors, matrices or data frames which meet conditions.
subset(worldPop, country=="China")
##   country population
## 1   China 1409517397

factor

# encode a vector as a factor
factor(c("A", "B", "B",  "C", "C", "C"))
## [1] A B B C C C
## Levels: A B C

paste

# concatenate vectors after converting them to character (separated by a space by default)
paste("this", "is", "a", "concatenation")
## [1] "this is a concatenation"

lapply

# apply a function over a list or vector and return a list
lapply(c("a", "b", "c"), toupper)
## [[1]]
## [1] "A"
## 
## [[2]]
## [1] "B"
## 
## [[3]]
## [1] "C"
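
Because `lapply` always returns a list, a plain vector is often more convenient. As a small sketch (the variable names are illustrative), base R's `sapply` and `vapply` apply the same function but simplify the result, and `unlist` flattens an existing list.

```r
letters3 <- c("a", "b", "c")

as_list   <- lapply(letters3, toupper)                # list of length 3
as_vector <- sapply(letters3, toupper)                # character vector, named by the input
checked   <- vapply(letters3, toupper, character(1))  # like sapply, but the result type is checked

flat <- unlist(as_list)  # "A" "B" "C"
```

`vapply` raises an error if any result is not a single character value, which makes it the safer choice inside functions.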

identical

x <- 1000
y <- 1000

# test whether two objects are exactly equal (always returns a single TRUE or FALSE)
identical(x, y)
## [1] TRUE
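
`identical` is stricter than `==`. The following sketch, using only base R and illustrative variable names, contrasts it with `==` (element-wise, after type coercion) and `all.equal` (equality within a numeric tolerance).

```r
strict_types <- identical(1L, 1)                  # FALSE: integer vs. double
loose_types  <- 1L == 1                           # TRUE: == compares values after coercion
float_exact  <- 0.1 + 0.2 == 0.3                  # FALSE: floating-point rounding
float_near   <- isTRUE(all.equal(0.1 + 0.2, 0.3)) # TRUE: equal within a small tolerance
elementwise  <- c(1, 2) == c(1, 3)                # TRUE FALSE: one result per element
whole_object <- identical(c(1, 2), c(1, 3))       # FALSE: one result for the whole object
```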

Sys.Date

# return the current date
Sys.Date()
## [1] "2018-12-11"

Sys.time

# return the current date and time
Sys.time()
## [1] "2018-12-11 09:12:58 EST"

format

# format an object for printing
format(Sys.Date(), "%B %d %Y")
## [1] "December 11 2018"

summary

# summarize an object; for a numeric vector this gives the minimum, quartiles, median, mean, and maximum
summary(mtcars$mpg)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.40   15.43   19.20   20.09   22.80   33.90

Data manipulation & visualization


dplyr

library(dplyr, warn.conflicts = FALSE)

dplyr is a package for data manipulation. It also makes the pipe operator %>% (from the magrittr package) available, which allows function calls to be chained.

# filtering and ordering
starwars %>%
  filter(homeworld=="Naboo") %>% # subset the data based on criteria
  mutate(taxa = ifelse(species == "Droid", "Machine", "Biological")) %>% # add a column to the data
  select(taxa, species, name) %>% # limit the columns to those provided
  na.omit() %>% # exclude records with NA as a value in a column
  arrange(taxa, species, name) # order the result
## # A tibble: 9 x 3
##   taxa       species name         
##   <chr>      <chr>   <chr>        
## 1 Biological Gungan  Jar Jar Binks
## 2 Biological Gungan  Roos Tarpals 
## 3 Biological Gungan  Rugor Nass   
## 4 Biological Human   Cordé        
## 5 Biological Human   Dormé        
## 6 Biological Human   Gregar Typho 
## 7 Biological Human   Padmé Amidala
## 8 Biological Human   Palpatine    
## 9 Machine    Droid   R2-D2

# aggregation
starwars %>%
  filter(homeworld=="Naboo") %>%
  group_by(species) %>% # aggregate by the columns provided
  tally() # count the results
## # A tibble: 4 x 2
##   species     n
##   <chr>   <int>
## 1 Droid       1
## 2 Gungan      3
## 3 Human       5
## 4 <NA>        2
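
To make the role of the pipe concrete, here is a small sketch comparing nested calls with the piped form. It uses the built-in `mtcars` data set rather than `starwars` so the result does not depend on the dplyr version; the variable names `nested` and `piped` are illustrative.

```r
library(dplyr, warn.conflicts = FALSE)

# Nested calls are read from the inside out
nested <- arrange(select(filter(mtcars, cyl == 4), mpg, hp), desc(mpg))

# The same steps with the pipe are read from top to bottom
piped <- mtcars %>%
  filter(cyl == 4) %>%   # keep four-cylinder cars
  select(mpg, hp) %>%    # keep two columns
  arrange(desc(mpg))     # sort by mpg, highest first

identical(nested, piped)  # TRUE
```

`x %>% f(y)` is evaluated as `f(x, y)`, so both forms run exactly the same function calls.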

ggplot2

library(ggplot2)
library(gapminder) # load gapminder dataset

ggplot2 is a package for visualizing data.

# scatter plot
gapminder %>% filter(year==2007) %>%
  ggplot(aes(x=gdpPercap, y=lifeExp, color=continent, size=pop)) +
  geom_point() +  # produce a scatter plot
  scale_x_log10() + # use a logarithmic scale for the x axis
  theme_minimal() + # minimize the lines/color in the chart
  ggtitle("2007 Life Expectancy by GDP Per Capita") # provide a title

# bar plot
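gapminder %>% filter(year==2007) %>%
  ggplot(aes(x=continent, y=gdpPercap)) +
  # with stat="identity", rows sharing a continent are stacked,
  # so each bar's height is the sum of its countries' values
  geom_bar(stat="identity", fill="steelblue") + # produce a bar graph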
  expand_limits(y=0) + # set the y-axis to start at 0
  theme_minimal() +
  ggtitle("2007 GDP Per Capita by Continent")

# line plot
gapminder %>% filter(country=="China") %>%
  ggplot(aes(x=year, y=gdpPercap, color=country)) +
  geom_line(stat="identity", color="blue") + # produce a line graph
  expand_limits(y=0) +
  theme_minimal() +
  ggtitle("GDP Per Capita by Year in China")

A handful of R packages

| Package Name | Purpose |
| --- | --- |
| httr | Working with HTTP resources |
| jsonlite | Parsing and generating JSON |
| gdata | Manipulating data |
| readr | Importing tabular data file formats |
| readxl | Reading Excel spreadsheets |
| DBI | Database integration |
| haven | Working with statistical software package (SAS, SPSS, Stata) files |
| lubridate | Dealing with dates and times |
| stringr | Working with strings |
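
As a brief illustration of three of the packages listed above, the sketch below uses jsonlite, stringr, and lubridate on small inline values. It assumes the three packages are installed; the sample JSON and variable names are made up for the example.

```r
library(jsonlite)
library(stringr)
library(lubridate, warn.conflicts = FALSE)

# jsonlite: parse a JSON array of objects into a data frame, then serialize it again
json_text <- '[{"country":"China","population":1409517397},
               {"country":"India","population":1339180127}]'
pop <- fromJSON(json_text)
pop_json <- toJSON(pop)

# stringr: vectorized string helpers with a consistent str_ prefix
upper   <- str_to_upper("china")                   # "CHINA"
starts  <- str_detect(c("China", "India"), "^Ch")  # TRUE FALSE
swapped <- str_replace("GDP per capita", "GDP", "Income")

# lubridate: parse dates and do date arithmetic
d     <- ymd("2018-12-11")
yr    <- year(d)       # 2018
later <- d + days(30)  # "2019-01-10"
```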

R Markdown provides a framework for generating documents (PDF, HTML, Markdown) that combine prose with R code and its output.

This cheat sheet was produced using R Markdown. The source code for producing it can be found on GitHub.


References

Data structure and function descriptions have been taken from the RDocumentation site.

