Skip to content
/ HAR Public

A solution of Human Activity Recognition with Smartphones challenge hosted on Kaggle

License

Notifications You must be signed in to change notification settings

nicofretti/HAR

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Human Activity Recognition with Smartphones (HAR)

This project contains my solution of the HAR problem hosted on Kaggle. The accuracy of the model is around 0.95.

Accuracy: 0.953

Analysis

The dataset has a total of 561 features, and it is divided into two sets:

  • Train: 7352 samples
  • Test: 2947 samples

The dataset is well-formed, and the activities are distributed among the samples.

The number of features is quite large, so an initial step is to try to reduce the number of features.

Dimensionality Reduction

In this case I have used the PCA algorithm to reduce the number of features. In the file tools.py there is the function PCA that performs the PCA's operation and returns the projection matrix that can be used to transform the data:

pca_proj = tools.PCA(x_train, n_eigenvectors)
pca_data = np.matmul(x_train, pca_proj.T)

The variable n_eigenvectory is the number of eigenvectors to be used, looking at the plot the number of eigenvectors for a correct coverage of 99% of the variance is around 154:

After applying the PCA algorithm the data are transformed into the new space where the activities are visibly separated:

After applying PCA to reduce even more the number of features, I have applied LDA, in order to reduce the number of features to C-1 where C is the number of classes:

lda_proj = tools.LDA(pca_data, y_train, n_classes=6)
lda_data = np.matmul(pca_data, lda_proj.T)

This way the data is transformed into the new space where the separability of the activities looks better:

Classification

For this step I have used the sklearn library to perform the classification. I choose the KNeighborsClassifier algorithm, because looking at the plot it is clear that there are some blobs where the classes are not well separated:

knn = KNeighborsClassifier(n_neighbors=20)
knn.fit(lda_data, y_train)

Conclusion

The number of features has been reduced from 561 to only 5 and the accuracy of the model is 0.95, looking at the confusion matrix it is clear that the model makes the wrong prediction with the classes SITTING and STANDING, as expected, because in the plot the two classes are still not well separated.

About

A solution of Human Activity Recognition with Smartphones challenge hosted on Kaggle

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Languages