1 Survey

  1. An In-class Survey: rmd | r | pdf | html
    • create a tibble dataset
    • draw 10 random students from 50 and build a survey
    • r: factor() + ifelse()
    • dplyr: group_by() + mutate() + summarise()
    • tibble: add_row()
    • readr: write_csv()
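
The sampling step in the survey item above can be sketched in base R. This is a minimal illustration, not the course's own code: the roster of IDs 1 to 50, the seed, and the `answer` column are all hypothetical placeholders.

```r
# Hypothetical class roster: student IDs 1 through 50
set.seed(123)                      # arbitrary seed, for reproducibility only
student_ids <- 1:50

# Draw 10 distinct students (sampling without replacement)
drawn <- sample(student_ids, size = 10, replace = FALSE)

# Build an empty survey table for the drawn students;
# 'answer' is a made-up yes/no question to be filled in later
survey <- data.frame(
  id = sort(drawn),
  answer = factor(rep(NA, 10), levels = c("yes", "no"))
)
print(survey)
```

The linked files use tibble and dplyr for the same task; base R is used here only to keep the sketch free of package dependencies.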

2 Dataset, Tables and Graphs

  1. Opening a Dataset: rmd | r | pdf | html
    • Opening a Dataset.
    • r: setwd()
    • readr: write_csv()
  2. One Variable Graphs and Tables: rmd | r | pdf | html
    • Frequency table, bar chart and histogram.
    • R function and lapply to generate graphs/tables for different variables.
    • r: c('word1','word2') + function() + for (ctr in c(1,2)) {} + lapply()
    • dplyr: group_by() + summarize() + n()
    • ggplot: geom_bar() + geom_histogram() + labs(title = 'title', caption = 'caption')
  3. Multiple Variables Graphs and Tables: rmd | r | pdf | html
    • Two-way frequency table, stacked bar chart and scatter-plot
    • r: interaction()
    • dplyr: group_by(var) + summarize(freq = n())
    • tidyr: spread(gender, freq)
    • ggplot: aes(x,y,fill) + geom_bar(stat='identity', fun.y='mean', position='dodge') + geom_point(size) + geom_text(size,hjust,vjust) + geom_smooth(method=lm) + labs(title,x,y,caption)
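
A two-way frequency table like the one in item 3 can also be sketched with base R's `table()`, swapped in here for the dplyr and tidyr pipeline listed above. The `gender` and `major` vectors are made-up data for illustration only.

```r
# Hypothetical survey responses (made up for illustration)
gender <- c("f", "m", "f", "f", "m", "m", "f", "m")
major  <- c("econ", "econ", "math", "econ", "math", "math", "math", "econ")

# Two-way frequency table: rows are majors, columns are genders
freq <- table(major, gender)
print(freq)

# Row shares: the distribution of gender within each major
row_share <- prop.table(freq, margin = 1)
print(row_share)
```

Each row of `row_share` sums to one, which is what a stacked bar chart of shares would display.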

3 Summarizing Data

  1. Mean and Standard Deviation: rmd | r | pdf | html
    • Mean and standard deviation from a dataset with city-month temperatures.
    • r: dim() + min() + ceiling() + lapply() + vector(mode="character",length) + substring(var, first, last) + func <- function(return(list))
    • dplyr: mutate() + select() + filter()
    • tidyr: gather(vara, val, -varb)
    • rlang: !!sym(str_var_name)
    • ggplot: aes(x, y, colour, linetype, shape) + facet_wrap(~var, scales='free_y') + geom_line() + geom_point() + geom_jitter(size, width) + scale_x_continuous(labels, breaks)
  2. Rescaling Standard Deviation and Covariance: rmd | r | pdf | html
    • Scatter-plot of a dataset with state-level wage and education data.
    • Coefficient of variation and standard deviation, correlation and covariance.
    • r: mean() + sd() + var() + cov() + cor()
    • ggplot: geom_point(size) + geom_text() + geom_smooth()
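
The rescaling ideas in item 2 can be checked numerically. The sketch below assumes two short, made-up vectors (`wage` and `educ`, not the course's state-level dataset) and shows that the coefficient of variation is the standard deviation divided by the mean, and that correlation is covariance divided by the product of the two standard deviations.

```r
# Hypothetical state-level data (made-up numbers, for illustration only)
wage <- c(18.5, 22.1, 25.3, 19.8, 30.2)   # average hourly wage
educ <- c(12.1, 13.0, 14.2, 12.5, 15.1)   # average years of schooling

# Coefficient of variation: standard deviation rescaled by the mean
cv_wage <- sd(wage) / mean(wage)
cv_educ <- sd(educ) / mean(educ)

# Correlation: covariance rescaled by both standard deviations
cor_manual <- cov(wage, educ) / (sd(wage) * sd(educ))

# The manual value should match the built-in cor()
print(all.equal(cor_manual, cor(wage, educ)))
```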

4 Basics of Probability

  1. Sample Space, Experimental Outcomes, Events, Probabilities: rmd | r | pdf | html
    • Sample Space, Experimental Outcomes, Events and Probability.
    • Union, intersection and complements
    • Conditional probability
  2. Examples of Sample Space and Probabilities: rmd | r | pdf | html
    • Throwing a quarter, four candidates for election, six-sided unfair dice, two basketball games
    • r: sample(size, replace, prob)
  3. Law of Large Numbers Unfair Dice: rmd | r | pdf | html
    • Throw an unfair die many times, law of large numbers.
    • r: head() + tail() + factor() + sample() + as.numeric() + paste0('dice=', var) + sprintf('%0.3f', 1.1234) + sprintf("P(S=1)=%0.3f, P(S=2)=%0.3f", 1.1, 1.2345)
    • stringr: str_extract() + as.numeric(str_extract(variable, "[^.n]+$"))
    • dplyr: mutate(!!str_mean_var := as.numeric(sprintf('%0.5f', freq / sum(freq))))
    • ggplot: geom_line() + scale_x_continuous(trans='log10', labels=c('n=100', 'n=1000'), breaks=c(100, 1000))
  4. Multiple-Step Experiment: Playing the Lottery Three Times: rmd | r | pdf | html
    • Paths after 1, 2 and 3 plays.
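
The law of large numbers item above can be illustrated with a short simulation. This is a sketch under stated assumptions: the loaded probabilities in `true_prob`, the seed, and the helper `empirical_share()` are hypothetical and not taken from the linked files.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Hypothetical unfair six-sided die: side 6 is loaded
true_prob <- c(0.1, 0.1, 0.1, 0.1, 0.1, 0.5)

# Empirical share of each side after n throws
empirical_share <- function(n) {
  throws <- sample(1:6, size = n, replace = TRUE, prob = true_prob)
  # factor() with fixed levels keeps sides that were never thrown as zero counts
  as.numeric(table(factor(throws, levels = 1:6))) / n
}

# The gap between empirical shares and true probabilities
# tends to shrink as the number of throws grows
for (n in c(100, 1000, 100000)) {
  max_gap <- max(abs(empirical_share(n) - true_prob))
  cat(sprintf("n=%d, largest gap from true probability = %0.3f\n", n, max_gap))
}
```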

5 Discrete Probability Distribution

  1. Discrete Random Variable and Binomial Experiment: rmd | r | pdf | html
    • Discrete Random Variable, expected value and variance.
    • Binomial Properties, examples using USA larceny clearance rate, WWII German soldier survival rate
    • r: dbinom() + pbinom() + sprintf(paste0('abc\n', 'efg = %s'), 'opq') + round(1.123, 2) + lapply()
    • ggplot: df %>% ggplot(aes(x)) + geom_bar(aes(y=prob), stat='identity', alpha=0.5, width=0.5, fill) + geom_text(aes(y=prob, label=paste0(sprintf('%2.1f', p), '%')), vjust, size, color, fontface) + labs(title, x, y, caption) + scale_y_continuous(sec.axis, name) + scale_x_continuous(labels, breaks) + theme(axis.text.y, axis.text.y.right, axis.text.y.left)
  2. Poisson Probability Distribution: rmd | r | pdf | html
    • Poisson Properties, Ladislaus Bortkiewicz and Prussian army horse-kick deaths.
    • r: dpois() + ppois()
    • ggplot: geom_bar() + geom_text() + geom_line() + geom_point() + geom_text() + labs() + scale_y_continuous() + scale_x_continuous() + theme()
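
The distribution functions listed in this section relate to each other in a way that is easy to verify. The sketch below assumes a hypothetical binomial experiment (`n = 10`, `p = 0.3`) and a hypothetical Poisson rate (`lambda = 0.7`); none of these values come from the linked examples.

```r
# Hypothetical binomial experiment: n = 10 trials, success probability p = 0.3
n <- 10
p <- 0.3
x <- 0:n
prob <- dbinom(x, size = n, prob = p)   # P(X = x) for each possible outcome

# Expected value and variance computed from their definitions;
# they should agree with the closed forms n*p and n*p*(1-p)
ev <- sum(x * prob)
variance <- sum((x - ev)^2 * prob)
print(c(expected_value = ev, variance = variance))

# pbinom() is the cumulative sum of dbinom()
print(all.equal(pbinom(3, n, p), sum(dbinom(0:3, n, p))))

# Poisson analogue: ppois() is the cumulative sum of dpois()
lambda <- 0.7   # hypothetical rate
print(all.equal(ppois(2, lambda), sum(dpois(0:2, lambda))))
```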