Scalable PCA (sPCA) is a scalable implementation of Principal Component Analysis (PCA) on top of Spark and MapReduce. sPCA achieves scalability by employing efficient large-matrix operations, effectively leveraging matrix sparsity, and minimizing intermediate data. The repository contains two README files that walk you through running sPCA on Spark and on MapReduce, respectively: the sPCA-Spark README and the sPCA-mapreduce README.
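One general way a PCA implementation can leverage sparsity is to avoid explicitly mean-centering the input, since subtracting the column means turns a sparse matrix into a dense one. The identity (Y - 1·μᵀ)·C = Y·C - 1·(μᵀ·C) lets the product be computed from the nonzeros of Y plus a single shared correction vector. The sketch below illustrates only that general idea on a single machine. It is not taken from the sPCA code base, and the sparse row format (a list of `{column: value}` dicts) and the function names are hypothetical.

```python
# A minimal sketch, NOT sPCA's implementation: multiply a mean-centered
# sparse matrix by a dense matrix without materializing the dense,
# centered matrix. Uses (Y - 1*mu^T) C = Y C - 1*(mu^T C).

def column_means(rows, n_rows, n_cols):
    """Column means of a sparse matrix stored as a list of {col: value} dicts."""
    sums = [0.0] * n_cols
    for row in rows:
        for j, v in row.items():
            sums[j] += v
    return [s / n_rows for s in sums]


def centered_times_dense(rows, n_cols, C):
    """Compute (Y - 1*mu^T) @ C while touching only the nonzeros of Y.

    rows: sparse Y as a list of {col_index: value} dicts (hypothetical format).
    C:    dense n_cols x k matrix as a list of lists.
    """
    n_rows = len(rows)
    k = len(C[0])
    mu = column_means(rows, n_rows, n_cols)
    # mu^T C is a single 1 x k vector, computed once and shared by every row.
    mu_c = [sum(mu[j] * C[j][c] for j in range(n_cols)) for c in range(k)]
    out = []
    for row in rows:
        # Sparse row times C: cost proportional to the row's nonzeros.
        yc = [0.0] * k
        for j, v in row.items():
            for c in range(k):
                yc[c] += v * C[j][c]
        # Subtract the shared correction instead of centering the row itself.
        out.append([yc[c] - mu_c[c] for c in range(k)])
    return out


if __name__ == "__main__":
    Y = [{0: 1.0}, {1: 2.0}, {0: 3.0, 2: 4.0}]
    C = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(centered_times_dense(Y, 3, C))
```

For the 3 x 3 example above, the result matches multiplying the explicitly centered (dense) matrix by C, but the row loop never visits the zero entries of Y.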
- Ashraf Aboulnaga
- Mohamed Hefeeda
- Tarek Elgamal
- Maysam Yabandeh
- Waleed Mustafa
- T. Elgamal, M. Yabandeh, A. Aboulnaga, W. Mustafa, and M. Hefeeda. sPCA: Scalable Principal Component Analysis for Big Data on Distributed Platforms. In Proc. of ACM SIGMOD’15, Melbourne, Australia, May 2015. [pdf] [bibtex]
- T. Elgamal and M. Hefeeda. Analysis of PCA Algorithms in Distributed Environments. Technical Report arXiv:1503.05214. [pdf] [bibtex]
sPCA is released under the terms of the MIT License.
For any issues or enhancement requests, please use the GitHub issue pages or contact us. We will try our best to help you sort them out.