Materials for the short course at the European Establishment Statistics Workshop 2019

EESW2019_tutorial

Materials for the short course on Statistical Data Cleaning for Business Statistics at the European Establishment Statistics Workshop 2019


Contents

Slot 1

Topic                  Time (min)
Introduction           20
Reading dirty data     30
Approximate matching   50
Data validation        50

Slot 2

Topic                  Time (min)
Error localization     20
Imputation             50
Adjusting              20
Monitoring             30
Wrap-up                10

Course form

The course is highly hands-on. Each topic starts with a session of approximately 10 to 15 minutes in which you run and adapt some R code. Next, I will provide background and details on what you just did, followed by a more in-depth assignment. Depending on time and topic, we will then discuss the topic further.

Prerequisites

Bring a laptop

Participants are expected to have basic knowledge of R and RStudio, specifically:

  • Working with the R command line and R scripts
  • Reading and writing CSV data
  • Some basic data manipulation and plotting

I highly recommend working with RStudio projects.
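
As a rough self-check of the prerequisites listed above, the following sketch uses only base R. The data frame, its column names, and the file name are made up for illustration and are not part of the course materials. If every line here looks familiar, your R knowledge is probably sufficient.

```r
# Hypothetical toy data; the course uses its own data sets.
companies <- data.frame(
  id       = 1:5,
  staff    = c(12, 7, NA, 30, 4),
  turnover = c(150, 80, 60, NA, 25)
)

# Write the data to a CSV file in a temporary directory and read it back.
csv_file <- file.path(tempdir(), "companies.csv")
write.csv(companies, csv_file, row.names = FALSE)
dat <- read.csv(csv_file)

# Basic manipulations: derive a column, select complete rows, summarise.
dat$turnover_per_staff <- dat$turnover / dat$staff
complete <- dat[complete.cases(dat), ]
mean_staff <- mean(dat$staff, na.rm = TRUE)

# A basic scatter plot (only drawn in an interactive session).
if (interactive()) {
  plot(dat$staff, dat$turnover, xlab = "staff", ylab = "turnover")
}
```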

Software needed for the course

  1. R, see https://r-project.org
  2. (Recommended) RStudio

Execute the following R code to install the necessary packages.

install.packages(c(
        "validate"
      , "errorlocate"
      , "simputation"
      , "rspa"
      , "daff"
      , "jsonlite"
      , "XML"
      , "readr"
      , "stringr"
      , "lumberjack")
  , dependencies=TRUE)
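
To confirm that the installation succeeded, a check along the following lines may help. This is a minimal sketch and not part of the original course materials. The variable names are arbitrary, and the package list simply repeats the one from the installation command above.

```r
# Packages listed in the installation command above.
pkgs <- c("validate", "errorlocate", "simputation", "rspa", "daff",
          "jsonlite", "XML", "readr", "stringr", "lumberjack")

# requireNamespace() returns TRUE when a package can be loaded.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report which packages, if any, still need to be installed.
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All course packages are installed.")
}
```

If any packages are reported as missing, rerunning install.packages() for just those names is usually enough.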

License

This work is licensed under a Creative Commons Attribution 4.0 International License.
