
# Project Description

Course project for the Getting and Cleaning Data course.

## Course project objectives

- Test the ability to collect data, check and verify the correctness of the provided data, and then prepare a tidy data set.
- Provide a codebook that explains the variables in the data, including the steps and transformations required to create the tidy data set.
- Provide a script that, when executed, produces the tidy data set.

### Testing the data file in R

Follow the steps below to import (read) the text file, which is the exported version of the cleaned data file, into R so you can check the correctness of the data. This also shows whether the objectives of the course were reached. The approach follows some of the ideas discussed in an article by Hadley Wickham (see Sources below).

1. Create a working directory for testing the R analysis script. These instructions were only tested on Linux.
2. Download and extract the data files from the link provided.
3. Open R (or RStudio).
4. Set your working directory to the directory where the data was copied and extracted (from item 2).
5. Download `run_analysis.R` from the Git repo (https://github.com/mtbbiker/datacleaning/blob/master/run_analysis.R).
6. On the first line of `run_analysis.R`, change the path to where you extracted the data. Please note: the directory structure specified in `run_analysis.R` could be different if run on a Windows machine.
7. Load the script in R (or RStudio) with `source("run_analysis.R")`. The script sets the paths to the data directory and creates a tidy data set, which it writes to a file named "avg_tidydata.txt" in your data directory.
8. To test the data, run the following command in R (or RStudio) to read "avg_tidydata.txt" into a data frame: `data <- read.table(file_path, header = TRUE)`, where `file_path` is the path to the "avg_tidydata.txt" file in your data directory. Then inspect it with `View(data)`.
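Steps 6 to 8 can be wrapped in a small helper that builds the path portably and runs a few basic sanity checks. This is a minimal sketch, not part of the project: `read_tidy_output()` is a hypothetical name, and the checks (non-empty table, count of missing values) are illustrative assumptions rather than the course's grading criteria. `file.path()` joins components with "/", which R also accepts on Windows, so it sidesteps some of the path differences mentioned in step 6.

```r
# Sketch only: read_tidy_output() is a hypothetical helper, not part of run_analysis.R.
# Assumes run_analysis.R has already written "avg_tidydata.txt" into data_dir.
read_tidy_output <- function(data_dir, file_name = "avg_tidydata.txt") {
  # file.path() joins path components with "/", which R also accepts on Windows
  file_path <- file.path(data_dir, file_name)
  if (!file.exists(file_path)) {
    stop("Tidy data file not found: ", file_path)
  }
  data <- read.table(file_path, header = TRUE)
  # A usable tidy table needs at least one row (observation) and one column (variable)
  stopifnot(nrow(data) > 0, ncol(data) > 0)
  # Missing values may point to a problem in the cleaning steps
  message(sprintf("%d rows, %d columns, %d missing values",
                  nrow(data), ncol(data), sum(is.na(data))))
  data
}
```

For example, once step 4 has set the working directory to the data directory, `data <- read_tidy_output(getwd())` followed by `View(data)` reproduces step 8 with the extra checks.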

### What the script does

Executing the script creates a new, wide tidy data set with the following characteristics.

#### Tidy data satisfies three conditions:

- Each variable forms a column. A 3-axis accelerometer and a gyroscope were used to measure data in three dimensions, and for this data set each dimension (X, Y, Z) is treated as a separate variable. Melting the data into columns could potentially create NA values.
- Each observation forms a row.
- Each type of observational unit forms a table (in this case avg_tidydata.txt).
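To make the first two conditions concrete, the sketch below contrasts the wide layout described above (one column per axis) with a melted, long layout built with base R's `reshape()`. The column names (`subject`, `activity`, `mean_X`, `mean_Y`, `mean_Z`) and all values are hypothetical toy data, not taken from the project's data set or from `run_analysis.R`.

```r
# Hypothetical toy data in the wide layout: one column per axis (X, Y, Z).
wide <- data.frame(
  subject  = c(1, 1),
  activity = c("WALKING", "SITTING"),
  mean_X   = c(0.27, 0.28),
  mean_Y   = c(-0.02, -0.01),
  mean_Z   = c(-0.11, -0.12)
)

# "Each observation forms a row": no subject/activity pair may appear twice.
stopifnot(anyDuplicated(wide[c("subject", "activity")]) == 0)

# Melt the three axis columns into a long layout with an "axis" column.
long <- reshape(wide,
                direction = "long",
                varying   = c("mean_X", "mean_Y", "mean_Z"),
                v.names   = "mean",
                timevar   = "axis",
                times     = c("X", "Y", "Z"),
                idvar     = c("subject", "activity"))

# nrow(long) is 6: 2 observations x 3 axes
print(nrow(long))
```

The wide form keeps one row per subject and activity with each axis as its own column, which matches the layout this README describes; the long form is shown only for comparison.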

## Sources

- Codebook template based on work by: https://gist.github.com/JorisSchut/dbc1fc0402f28cad9b41
- Hadley Wickham's article about tidying data: http://vita.had.co.nz/papers/tidy-data.pdf
