This course introduces the basic ideas of data science and implements them in the R programming language. We will use the Tidyverse, a collection of R packages that facilitate data import, manipulation, encoding, exploration and visualisation.
- To understand basic concepts of data science and how to implement them in R using the Tidyverse.
- To learn how to extract and communicate insights retrieved through data analysis.
Introduction to R and RStudio. Workflow. Tidy data. The Tidyverse ecosystem. Data import. Tibbles. dplyr basics. Pipes.
dplyr verbs. Numerical summaries. SQL and dplyr.
Factors. The forcats package. Modifying factor order. Modifying factor levels.
Mutating joins. Filtering joins. Set operations.
Introduction to ggplot2. Creating a ggplot. Aesthetic mappings. Geometric objects.
More geometric objects. Themes.
Visualising distributions. Typical vs unusual values. Missing values.
Covariation. A categorical and a continuous variable. Two categorical variables. Two continuous variables.