The input data, the Human Activity Recognition Using Smartphones Data Set, is obtained from the UCI Machine Learning Repository. There are 2 data layers: raw sensor data and post-processed data.
The data was captured by the accelerometer and gyroscope sensors of a Samsung smartphone at a constant rate of 50 Hz, from 30 different individuals (subjects) aged 19 to 48 as they performed activities of daily living (ADL).
The sensor signals are first cleaned using a median filter and a 3rd order low-pass Butterworth filter with a corner frequency of 20 Hz. The resulting values are normalized and bounded within [-1,1].
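As a rough illustration of the cleaning step, the sketch below applies a running median filter to a synthetic noisy signal and rescales the result to [-1,1]. It uses only base R. The window width `k = 3`, the signal length and the min-max rescaling are assumptions made for illustration, and the Butterworth stage is left out because base R has no built-in equivalent.

```r
# Synthetic 50 Hz signal: a 1 Hz sine plus noise (illustrative data only)
set.seed(1)
fs <- 50
t <- seq(0, by = 1 / fs, length.out = 128)
raw <- sin(2 * pi * t) + rnorm(length(t), sd = 0.2)

# Running median filter; the width k = 3 is an assumed value
smoothed <- as.numeric(runmed(raw, k = 3))

# Min-max rescaling to [-1, 1]; the dataset's exact normalization may differ
rescale <- function(x) 2 * (x - min(x)) / (max(x) - min(x)) - 1
normalized <- rescale(smoothed)

range(normalized)
```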
Subsequently, the acceleration signal was separated into body and gravity acceleration signal components (`tBodyAcc-XYZ` and `tGravityAcc-XYZ`), and the following additional variables were derived:
- The body linear acceleration and angular velocity were differentiated with respect to time to obtain jerk signals (`tBodyAccJerk-XYZ` and `tBodyGyroJerk-XYZ`).
- The magnitudes of the above signals were calculated using the Euclidean norm (`tBodyAccMag`, `tGravityAccMag`, `tBodyAccJerkMag`, `tBodyGyroMag`, `tBodyGyroJerkMag`).
- Some frequency domain variables were synthesized via the Fast Fourier Transform (FFT) (`fBodyAcc-XYZ`, `fBodyAccJerk-XYZ`, `fBodyGyro-XYZ`, `fBodyAccJerkMag`, `fBodyGyroMag`, `fBodyGyroJerkMag`).
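The three derivations above can be sketched in base R on synthetic data. This is a minimal illustration, not the dataset authors' code: the variable names, the window length of 128 samples and the use of a first difference as the time derivative are all assumptions.

```r
# Synthetic 3-axis body acceleration sampled at 50 Hz (illustrative data only)
set.seed(42)
fs <- 50
n <- 128
acc <- data.frame(x = rnorm(n), y = rnorm(n), z = rnorm(n))

# Jerk: first difference scaled by the sampling rate, a discrete time derivative
jerk <- as.data.frame(lapply(acc, function(v) diff(v) * fs))

# Magnitude: Euclidean norm across the three axes, one value per sample
magnitude <- function(s) sqrt(s$x^2 + s$y^2 + s$z^2)
acc_mag <- magnitude(acc)
jerk_mag <- magnitude(jerk)

# Frequency domain: modulus of the FFT of one axis
acc_x_freq <- Mod(fft(acc$x))
```

Note that differencing shortens each signal by one sample, so `jerk` has `n - 1` rows.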
The set of variables estimated from these signals is:
- `mean()`: Mean value
- `std()`: Standard deviation
- `mad()`: Median absolute deviation
- `max()`: Largest value in array
- `min()`: Smallest value in array
- `sma()`: Signal magnitude area
- `energy()`: Energy measure. Sum of the squares divided by the number of values.
- `iqr()`: Interquartile range
- `entropy()`: Signal entropy
- `arCoeff()`: Autoregression coefficients with Burg order equal to 4
- `correlation()`: Correlation coefficient between two signals
- `maxInds()`: Index of the frequency component with largest magnitude
- `meanFreq()`: Weighted average of the frequency components to obtain a mean frequency
- `skewness()`: Skewness of the frequency domain signal
- `kurtosis()`: Kurtosis of the frequency domain signal
- `bandsEnergy()`: Energy of a frequency interval within the 64 bins of the FFT of each window.
- `angle()`: Angle between two vectors.
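Several of the simpler estimators map directly onto base R functions. The sketch below computes them for one synthetic window. The data is made up, the SMA formula shown is one common definition and may not match the dataset's, and R's `sd()` uses the n - 1 denominator, which may also differ from the original computation.

```r
# One synthetic window: a single-axis signal and a 3-axis signal (illustrative)
set.seed(7)
x <- rnorm(128)
xyz <- matrix(rnorm(128 * 3), ncol = 3)

features <- c(
  mean   = mean(x),
  std    = sd(x),
  mad    = mad(x, constant = 1),      # raw median absolute deviation, no scaling
  max    = max(x),
  min    = min(x),
  energy = sum(x^2) / length(x),      # sum of squares divided by number of values
  iqr    = IQR(x),
  sma    = mean(rowSums(abs(xyz)))    # assumed SMA definition over three axes
)

round(features, 3)
```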
Additional vectors are obtained by averaging the signals in a signal window sample. These are used in the `angle()` variable:
- `gravityMean`
- `tBodyAccMean`
- `tBodyAccJerkMean`
- `tBodyGyroMean`
- `tBodyGyroJerkMean`
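The angle between two such averaged vectors can be computed from their normalized dot product. The helper below is a hypothetical sketch (`vector_angle` and the example values are not part of the dataset or its tooling).

```r
# Angle in radians between two vectors, from the normalized dot product
vector_angle <- function(a, b) {
  cos_theta <- sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
  acos(max(-1, min(1, cos_theta)))  # clamp to guard against rounding error
}

# Hypothetical window-averaged vectors (values are made up)
gravity_mean <- c(0, 0, 1)
body_acc_mean <- c(1, 0, 0)
vector_angle(gravity_mean, body_acc_mean)  # perpendicular vectors: pi / 2
```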
A description of the stored input data can be found in the README file.
The transformation analysis is detailed in the README file.
There are 69 columns divided into 2 column types: category columns and sensor-derived columns (derived from the sensor observations).
There are 3 category columns
Name | Class | Values | Comment |
---|---|---|---|
type | character | train, test | Origin of observation |
activity | character | standing, sitting, laying, walking, walking_downstairs, walking_upstairs | |
subject | integer | 1-30 | Subject id |
There are 66 sensor-derived variables, ordered in the following table by their description and data domain (time domain or frequency domain). They are all of type numeric, and their values are normalized and bounded within [-1,1].
| Description | Time domain variable | Frequency domain variable |
|---|---|---|
| Body Acceleration | tBodyAcc-mean-XYZ | fBodyAcc-mean-XYZ |
| | tBodyAcc-std-XYZ | fBodyAcc-std-XYZ |
| Gravity Acceleration | tGravityAcc-mean-XYZ | |
| | tGravityAcc-std-XYZ | |
| Body Acceleration Jerk | tBodyAccJerk-mean-XYZ | fBodyAccJerk-mean-XYZ |
| | tBodyAccJerk-std-XYZ | fBodyAccJerk-std-XYZ |
| Body Angular Speed | tBodyGyro-mean-XYZ | fBodyGyro-mean-XYZ |
| | tBodyGyro-std-XYZ | fBodyGyro-std-XYZ |
| Body Angular Acceleration | tBodyGyroJerk-mean-XYZ | |
| | tBodyGyroJerk-std-XYZ | |
| Body Acceleration Magnitude | tBodyAccMag-mean | fBodyAccMag-mean |
| | tBodyAccMag-std | fBodyAccMag-std |
| Gravity Acceleration Magnitude | tGravityAccMag-mean | |
| | tGravityAccMag-std | |
| Body Acceleration Jerk Magnitude | tBodyAccJerkMag-mean | fBodyAccJerkMag-mean |
| | tBodyAccJerkMag-std | fBodyAccJerkMag-std |
| Body Angular Speed Magnitude | tBodyGyroMag-mean | fBodyGyroMag-mean |
| | tBodyGyroMag-std | fBodyGyroMag-std |
| Body Angular Acceleration Magnitude | tBodyGyroJerkMag-mean | fBodyGyroJerkMag-mean |
| | tBodyGyroJerkMag-std | fBodyGyroJerkMag-std |
Notes:
- `-XYZ` denotes 3-axial signals in the X, Y and Z directions, i.e. a variable described as `tBodyAcc-mean-XYZ` actually stands for 3 variables: `tBodyAcc-mean-X`, `tBodyAcc-mean-Y` and `tBodyAcc-mean-Z`.
- The value of a variable whose name contains the string `mean` is the average of all the mean observations for a specific subject, activity pair.
- The value of a variable whose name contains the string `std` is the average of the standard deviations of all observations for a specific subject, activity pair.
- There are 180 observation rows in the tidy dataset.
- The prefixes 'f' and 't' denote frequency domain and time domain respectively.
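The per-pair averaging described in these notes can be sketched with base R's `aggregate()`. The data frame below is a synthetic stand-in with only two measurement columns; it is not the project's actual script, and the object names `obs` and `tidy` are hypothetical.

```r
# Synthetic stand-in for the merged observations (values and sizes are made up)
set.seed(3)
obs <- data.frame(
  subject  = rep(1:30, each = 12),
  activity = rep(c("standing", "sitting", "laying", "walking",
                   "walking_downstairs", "walking_upstairs"), times = 60),
  check.names = FALSE
)
obs[["tBodyAcc-mean-X"]] <- runif(nrow(obs), -1, 1)
obs[["tBodyAcc-std-X"]] <- runif(nrow(obs), -1, 1)

# Average every measurement column for each subject, activity pair
tidy <- aggregate(obs[, c("tBodyAcc-mean-X", "tBodyAcc-std-X")],
                  by = list(subject = obs$subject, activity = obs$activity),
                  FUN = mean)

nrow(tidy)  # 30 subjects x 6 activities = 180 rows
```

Each subject contributes one row per activity, which is where the 180 rows come from.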
The data is output to a file named `tidy.txt` in the current working directory.
Example code to read it in the R language:
> View(read.table('tidy.txt', header = TRUE))
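One detail worth knowing when reading the file: by default `read.table()` rewrites header names into syntactically valid R names, so a hyphenated name such as `tBodyAcc-mean-X` would come back as `tBodyAcc.mean.X`. The sketch below demonstrates this on a tiny temporary file. It assumes the column names in `tidy.txt` are written with hyphens as in the table above, which may not match the actual file.

```r
# Write a tiny stand-in file so the example runs without the real tidy.txt
path <- tempfile(fileext = ".txt")
demo <- data.frame(subject = 1:2, check.names = FALSE)
demo[["tBodyAcc-mean-X"]] <- c(0.25, -0.5)
write.table(demo, path, row.names = FALSE)

# Default: header names are made syntactically valid, so '-' becomes '.'
default_names <- names(read.table(path, header = TRUE))

# check.names = FALSE keeps the names exactly as written in the file
exact_names <- names(read.table(path, header = TRUE, check.names = FALSE))

print(default_names)
print(exact_names)
```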