EESW2019_tutorial

Materials for the short course on Statistical Data Cleaning for Business Statistics at the European Establishment Statistics Workshop 2019

Course form

The course is highly hands-on. Each topic starts with a session of approximately 10-15 minutes in which you run and adapt some R code. Next, I will provide background and details on what you just did, followed by a more in-depth assignment. Depending on time and topic, we will then discuss the topic in more depth.

Prerequisites

Bring a laptop

Participants are expected to have a basic knowledge of R/RStudio, specifically:

Work with the R command line and R scripts
Read/write CSV data
Some basic data manipulations and plots

I highly recommend working with RStudio projects.
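
As a rough self-check of the skills listed above, the following sketch uses only base R and the built-in `women` dataset. The temporary file and the variable names are illustrative assumptions and are not part of the course materials.

```r
# Write a built-in dataset to a temporary CSV file and read it back
csv_file <- tempfile(fileext = ".csv")
write.csv(women, csv_file, row.names = FALSE)
dat <- read.csv(csv_file)

# Basic manipulation: add a derived column and select a subset of rows
dat$height_cm <- dat$height * 2.54
tall <- dat[dat$height_cm > 160, ]

# Basic plot: weight against height
plot(dat$height_cm, dat$weight, xlab = "Height (cm)", ylab = "Weight (lb)")
```

If each of these steps looks familiar, you probably have the R background the course assumes.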

Software needed for the course

R: see https://r-project.org
(Recommended) RStudio

Execute the following R code to install the necessary packages.

install.packages(c(
        "validate"
      , "errorlocate"
      , "simputation"
      , "rspa"
      , "daff"
      , "jsonlite"
      , "XML"
      , "readr"
      , "stringr"
      , "lumberjack")
  , dependencies=TRUE)
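
To confirm that the installation succeeded, a check along the following lines may help. It is a minimal sketch, not part of the original materials: it assumes the package list from the install command above, and the variable names are illustrative.

```r
# Packages named in the install command above
pkgs <- c("validate", "errorlocate", "simputation", "rspa", "daff",
          "jsonlite", "XML", "readr", "stringr", "lumberjack")

# requireNamespace() returns FALSE (instead of raising an error)
# when a package cannot be loaded
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) == 0) {
  message("All course packages are installed.")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}
```

Any package reported as missing can be installed again individually with `install.packages()`.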

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Topic	Time (min)
Introduction	20
Reading dirty data	30
Approximate matching	50
Data validation	50

Topic	Time (min)
Error localization	20
Imputation	50
Adjusting	20
Monitoring	30
Wrap-up	10
