Samsung Wearables Data Analysis

Project for Getting and Cleaning Data Course on Coursera

The purpose of the R script run_script.R is to collect and process data from the "UCI HAR Dataset" and present it as a "tidy" dataset.

The sequence of commands in the data processing script is summarized below:

The script starts (lines 5 and 6) by reading in the activity (e.g., walking, standing) and feature (e.g., body acceleration, gravity acceleration) definitions, both as character vectors.
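A minimal sketch of reading both definition files as character vectors. It assumes the real inputs are "UCI HAR Dataset/activity_labels.txt" and "UCI HAR Dataset/features.txt" (two whitespace-separated columns: a code and a label); to stay self-contained it reads small inline stand-ins in the same format instead. The name featuredefs is hypothetical; only activitydefs appears in this README.

```r
# Inline stand-ins for activity_labels.txt and features.txt (assumed file names).
# The feature lines are made-up examples in the real file's format.
activityText <- "1 WALKING
2 WALKING_UPSTAIRS
3 WALKING_DOWNSTAIRS
4 SITTING
5 STANDING
6 LAYING"
featureText <- "1 tBodyAcc-mean()-X
2 tBodyAcc-mean()-Y
3 tBodyAcc-std()-X
4 fBodyAcc-meanFreq()-X"

# Column 1 is the integer code, column 2 the label; [[2]] returns a plain
# character vector (stringsAsFactors = FALSE keeps older R versions from
# converting the labels to a factor).
activitydefs <- read.table(text = activityText, stringsAsFactors = FALSE)[[2]]
featuredefs  <- read.table(text = featureText,  stringsAsFactors = FALSE)[[2]]

print(activitydefs)
print(featuredefs)
```

With the real files, the text = argument would be replaced by a file path.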
Next, lines 8-29 perform the following sequence on both the test and training datasets:
Reads in the X file, which contains the time series for all of the accelerometer data, into a variable called (test/train)DataTimeSeries. The result is a 561-column data frame, one column per feature.
Reads in the subject file, which identifies the subject for each row of the time series. This is a 1-column data frame called (test/train)SubjectTimeSeries with many repeated values, because several time series points are recorded for each subject.
Reads in the y file, which contains the activity code for each row of the time series. This is a 1-column data frame called (test/train)ActivityCodeTimeSeries.
Accomplishing Objective 3, the codes in (test/train)ActivityCodeTimeSeries are translated into activity names by using them to index the activitydefs character vector.
Finally, all of the data is combined into one data frame called (test/train)TimeSeries using cbind. The [[1]] extracts the single column of each 1-column data frame so that a plain vector, rather than a data frame, is passed to cbind. The result is a 563-column data frame (the 561 features plus Subject and Activity).
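The per-dataset steps above can be sketched as follows for the test set (the training set is identical with "train" in the names). This is an illustration on toy data, not the script itself: the three small data frames are hypothetical stand-ins for X_test.txt, subject_test.txt and y_test.txt, with 3 feature columns instead of 561, and testActivities is a hypothetical intermediate name.

```r
activitydefs <- c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
                  "SITTING", "STANDING", "LAYING")

# Hypothetical stand-ins: 4 observations, 3 feature columns (V1..V3).
testDataTimeSeries <- data.frame(V1 = c(0.28, 0.27, 0.29, 0.26),
                                 V2 = c(-0.02, -0.01, -0.03, -0.02),
                                 V3 = c(-0.99, -0.98, -0.97, -0.99))
testSubjectTimeSeries <- data.frame(V1 = c(2, 2, 4, 4))
testActivityCodeTimeSeries <- data.frame(V1 = c(5, 5, 1, 6))

# Objective 3: index the character vector with the integer codes.
testActivities <- activitydefs[testActivityCodeTimeSeries[[1]]]

# [[1]] pulls each single column out as a plain vector so it can be named
# directly as Subject; the measurement columns follow.
testTimeSeries <- cbind(Subject = testSubjectTimeSeries[[1]],
                        Activity = testActivities,
                        testDataTimeSeries)
print(testTimeSeries)
```

Because one argument is a data frame, cbind dispatches to its data frame method here, so the result is a data frame with Subject and Activity as its first two columns.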
Line 35 combines the test and training data sets row-wise using rbind, accomplishing Objective 1.
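A minimal sketch of the row-wise merge, assuming both frames already have the same column names. The two small frames are hypothetical, as is the name allTimeSeries for the combined result.

```r
# Hypothetical, already-assembled frames with matching columns.
testTimeSeries <- data.frame(Subject = c(2, 4),
                             Activity = c("STANDING", "LAYING"),
                             V1 = c(0.28, 0.26),
                             stringsAsFactors = FALSE)
trainTimeSeries <- data.frame(Subject = c(1, 3, 3),
                              Activity = c("WALKING", "SITTING", "SITTING"),
                              V1 = c(0.27, 0.29, 0.25),
                              stringsAsFactors = FALSE)

# Objective 1: stack the rows. For data frames, rbind matches columns by name.
allTimeSeries <- rbind(testTimeSeries, trainTimeSeries)
print(allTimeSeries)
```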
Line 38 accomplishes Objective 2 by keeping only the columns whose names contain "mean" or "std" designations, selected with grep. The first two columns (Subject and Activity) are retained, and the column count is reduced to 68.
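A sketch of the grep-based column filter on a toy frame, assuming the feature definitions have already been applied as column names. The exact pattern used by run_script.R is not shown here; "mean\\(\\)|std\\(\\)" is an assumption for illustration, as are the names keep and meanStdTimeSeries.

```r
# Hypothetical frame with feature-style column names.
allTimeSeries <- data.frame(Subject = c(1, 1, 2),
                            Activity = c("WALKING", "WALKING", "LAYING"),
                            `tBodyAcc-mean()-X` = c(0.28, 0.27, 0.22),
                            `tBodyAcc-std()-X` = c(-0.99, -0.98, -0.95),
                            `fBodyAcc-meanFreq()-X` = c(-0.2, -0.1, 0.1),
                            `tBodyAcc-energy()-X` = c(-0.9, -0.8, -0.7),
                            check.names = FALSE, stringsAsFactors = FALSE)

# Keep columns 1-2 (Subject, Activity) plus every column whose name
# contains "mean()" or "std()"; grep returns the matching column positions.
keep <- c(1, 2, grep("mean\\(\\)|std\\(\\)", names(allTimeSeries)))
meanStdTimeSeries <- allTimeSeries[, keep]
print(names(meanStdTimeSeries))
```

Escaping the parentheses is a design choice: a bare "mean|std" pattern would also keep columns such as fBodyAcc-meanFreq()-X, which changes the final column count.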
Finally, line 41 accomplishes Objective 5 with an aggregate command. This produces MeanStdSummary, the final variable used to output the tidy data set, which averages each of the "mean" and "std" data columns by Subject and Activity.
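A sketch of the per-group averaging with aggregate on a hypothetical four-row input. MeanStdSummary is the name used in this README; the input frame and its values are made up.

```r
# Hypothetical filtered frame: Subject, Activity, then measurement columns.
meanStdTimeSeries <- data.frame(Subject = c(1, 1, 1, 2),
                                Activity = c("WALKING", "WALKING", "LAYING", "WALKING"),
                                `tBodyAcc-mean()-X` = c(0.20, 0.30, 0.10, 0.40),
                                `tBodyAcc-std()-X` = c(-0.90, -0.70, -0.50, -0.60),
                                check.names = FALSE, stringsAsFactors = FALSE)

# Objective 5: average every measurement column (all but the first two)
# for each Subject/Activity combination that occurs in the data.
MeanStdSummary <- aggregate(meanStdTimeSeries[, -(1:2)],
                            by = list(Subject = meanStdTimeSeries$Subject,
                                      Activity = meanStdTimeSeries$Activity),
                            FUN = mean)
print(MeanStdSummary)
```

The result has one row per Subject/Activity pair present in the input (three here), with the grouping columns first.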

It is my contention that Objective 4 is accomplished by using the supplied feature and activity definitions, together with the code book (CodeBook.md).
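One way the feature definitions can serve as descriptive variable names is to assign them as column names. This is a hedged sketch on hypothetical stand-ins (a 3-column frame and a matching featuredefs vector); it does not claim to reproduce the exact lines of run_script.R.

```r
# Hypothetical stand-ins: generic V1..V3 column names and matching definitions.
testDataTimeSeries <- data.frame(V1 = c(0.28, 0.27),
                                 V2 = c(-0.02, -0.01),
                                 V3 = c(-0.99, -0.98))
featuredefs <- c("tBodyAcc-mean()-X", "tBodyAcc-mean()-Y", "tBodyAcc-std()-X")

# Replace the generic names with the supplied feature definitions.
names(testDataTimeSeries) <- featuredefs
print(names(testDataTimeSeries))
```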

Masonavic/WearablesProject

Folders and files

Latest commit

History

Repository files navigation

Samsung Wearables Data Analysis

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages