Course project for the Getting and Cleaning Data course
## Course project objectives
- Test the ability to collect data, check and verify the correctness of the provided data, and then prepare a tidy data set.
- Provide a codebook that explains the variables in the data, including the steps and transformations required to create the tidy data set.
- Provide a script that, when executed, produces the tidy data set.
### Testing the data file in R

Follow these steps to import (read) the text file, an exported version of the cleaned data, into R so that you can check the correctness of the data. This also shows whether the objective of the course was reached. The approach follows some of the ideas discussed in an article by Hadley Wickham (see Sources below).

1. Create a working directory for testing the R analysis script. These instructions were only tested on a Linux OS.
2. Download and extract the data files from the link provided.
3. Open R (or RStudio).
4. Set your working directory to the directory where the data was copied and extracted (from item 2).
5. Download run_analysis.R from the Git repository (https://github.com/mtbbiker/datacleaning/blob/master/run_analysis.R).
6. On the first line of run_analysis.R, change the path to the directory where you extracted the data. PLEASE NOTE: the directory structure specified in run_analysis.R could be different when run on a Windows machine.
7. Load the script in R (or RStudio) with `source("run_analysis.R")`. The script sets the paths to the data directory and creates a tidy data set, which it writes to a file named "avg_tidydata.txt" in your data directory.
8. To test the data, run `data <- read.table(file_path, header = TRUE)` in R (or RStudio) to read "avg_tidydata.txt" into a data frame, where `file_path` is the full path to "avg_tidydata.txt". Then run `View(data)` to inspect it.
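The reading and checking steps can be sketched in R as follows. This is a minimal sketch, not part of run_analysis.R: `data_dir` is a hypothetical stand-in for your extracted data directory (here a temporary directory), and because the real file only exists after `source("run_analysis.R")` has run, the sketch first writes a tiny stand-in table with made-up columns (`subject`, `activity`, `mean_X`) and values. In a real run, skip the `write.table()` call.

```r
# Hypothetical location: in practice, point data_dir at the directory
# where you extracted the data. file.path() joins parts with "/",
# which R accepts on both Linux and Windows.
data_dir  <- tempdir()
file_path <- file.path(data_dir, "avg_tidydata.txt")

# Stand-in for the output of source("run_analysis.R"): made-up columns
# and values, written only so the reading step can run on its own.
write.table(
  data.frame(subject = c(1, 2), activity = c("WALKING", "SITTING"),
             mean_X = c(0.28, 0.27)),
  file_path, row.names = FALSE
)

# Read the tidy data set back in; header = TRUE takes column names
# from the first line of the file.
data <- read.table(file_path, header = TRUE)

# Quick sanity checks before inspecting the data visually.
str(data)
cat("rows:", nrow(data), " columns:", ncol(data),
    " NAs:", sum(is.na(data)), "\n")

# View(data)  # opens the data viewer; only useful in an interactive session
```

Using `file.path()` instead of a hard-coded path string is one way to sidestep the Linux/Windows path differences mentioned in the steps.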
### What the script does

When executed, the script creates a new, wide tidy data set with the following characteristics.

#### Tidy data satisfies three conditions:
- Each variable forms a column. A 3-axis accelerometer and gyroscope were used to measure data in three dimensions, and in this data set each dimension (X, Y, Z) is treated as a separate variable with its own column. Melting these columns into a long format could potentially create NA values.
- Each observation forms a row.
- Each type of observational unit forms a table (in this case, avg_tidydata.txt).
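To make the first condition concrete, the sketch below contrasts the wide layout (one column per axis) with a melted long layout, using base R's `reshape()`. The data frame is a toy example: the column names `subject`, `activity`, `X`, `Y`, `Z` and all values are made up for illustration and are not taken from the real data set.

```r
# Toy wide data: one row per subject/activity pair, one column per axis.
wide <- data.frame(
  subject  = c(1, 1, 2),
  activity = c("WALKING", "SITTING", "WALKING"),
  X = c(0.28, 0.27, 0.29),
  Y = c(-0.02, -0.01, -0.02),
  Z = c(-0.11, -0.10, -0.12)
)

# Melt to long form: the axis name moves out of the column names and
# into an "axis" column, with the measurement in a single "value" column.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("X", "Y", "Z"),
  v.names   = "value",
  timevar   = "axis",
  times     = c("X", "Y", "Z")
)

print(wide)
print(long)
```

In the long form each wide row becomes three rows, and `reshape()` also adds an `id` column identifying the original row. Which layout counts as tidy depends on whether X, Y and Z are treated as separate variables, as this project does, or as values of a single axis variable.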
## Sources
- Codebook template adapted from the work at: https://gist.github.com/JorisSchut/dbc1fc0402f28cad9b41
- Hadley Wickham's article on tidy data: http://vita.had.co.nz/papers/tidy-data.pdf