zonagit/HadoopSparkEigenfaces
There are two relevant packages in this project: test and eigenfaces.

a) Package test contains a single class, TestSVD. It starts by generating an n x m = 100 x 2000 matrix with a certain percentage of non-zero entries (5% by default; this can be changed through a command-line argument). It then computes the SVD of this matrix using:

a) Mahout's stochastic SVD solver (Hadoop based)
b) Mahout's sequential stochastic SVD solver
c) Spark MLlib's implementation of stochastic SVD

The singular values from a) and c) are compared against the singular values computed by b); if any of them differ by more than 1e-12, a message is printed out (see the sketch below for the general idea).

To run TestSVD from Eclipse, create a run/debug configuration and enter the command-line arguments there, or use the jar. (To create a jar file from this project within Eclipse: right-click the project -> Export -> Java JAR file -> enter the location and name of the jar.)

To run it from the jar on a local machine (rather than using HDFS, which is fine because the matrix is fairly small):

    java -cp RandomizedSVD.jar:lib/*:. test.TestSVD 15 some_local_folder

The first argument is the percentage of non-zero entries in the randomly generated matrix, and the second is the location where the matrix and the output of the Mahout SVD are written. RandomizedSVD.jar is *this* jar, and it is assumed that the same folder contains a subfolder lib holding all the jars from the lib folder of the git commit plus spark-assembly-1.1.1-hadoop2.4.0.jar, which is too big for git (it is available at /mnt/scratch/u0082100/RandomizedSVD/lib on apt023). The code outputs the top 100 singular values produced by each of the three methods.

To run using Hadoop, an extra argument is needed specifying the location of a remote (HDFS) folder. The initial matrix is written to the local path and then copied to the remote path for the SVD computation. Here is an example command:

    /usr/local/hadoop-2.5.0/bin/hadoop jar RandomizedSVD.jar test.TestSVD 5 /mnt/scratch/u0082100/RandomizedSVD/test /user/u0082100/test
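For orientation, the Spark side of this comparison boils down to building a RowMatrix from the random rows and calling computeSVD on it. The following is a minimal, self-contained sketch of that idea, assuming Spark 1.1.1's Java API; it is not the code of TestSVD, and the class name, the local[*] master, the random seed, and the placeholder referenceSigma array are all illustrative:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.mllib.linalg.Matrix;
    import org.apache.spark.mllib.linalg.SingularValueDecomposition;
    import org.apache.spark.mllib.linalg.Vector;
    import org.apache.spark.mllib.linalg.Vectors;
    import org.apache.spark.mllib.linalg.distributed.RowMatrix;

    public class SparkSvdSketch {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("SparkSvdSketch").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Build a small 100 x 2000 matrix with roughly 5% non-zero entries,
        // mirroring the random test matrix described above.
        int n = 100, m = 2000;
        double density = 0.05;
        Random rng = new Random(42);
        List<Vector> rows = new ArrayList<Vector>(n);
        for (int i = 0; i < n; i++) {
          double[] row = new double[m];
          for (int j = 0; j < m; j++) {
            if (rng.nextDouble() < density) {
              row[j] = rng.nextGaussian();
            }
          }
          rows.add(Vectors.dense(row));
        }
        RowMatrix mat = new RowMatrix(sc.parallelize(rows).rdd());

        // Top-k SVD; computeU = false because only the singular values are compared here.
        int k = 100;
        SingularValueDecomposition<RowMatrix, Matrix> svd = mat.computeSVD(k, false, 1e-9);
        double[] sparkSigma = svd.s().toArray();

        // referenceSigma would come from another solver (e.g. Mahout's sequential
        // stochastic SVD); here it is just a placeholder array of the same length.
        double[] referenceSigma = new double[sparkSigma.length];

        double tolerance = 1e-12;
        for (int i = 0; i < sparkSigma.length; i++) {
          if (Math.abs(sparkSigma[i] - referenceSigma[i]) > tolerance) {
            System.out.println("Singular value " + i + " differs by more than " + tolerance);
          }
        }
        sc.stop();
      }
    }

In the actual test the reference values come from Mahout's sequential stochastic SVD solver, as described above.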
b) Package eigenfaces contains several main programs that implement similar functionality but use different algorithms/implementations for computing the SVD.

b.1) EigenFacesMain.java uses Mahout's distributed version of the Lanczos algorithm to compute the eigenvectors of the covariance matrix. Here is an example run command:

    /usr/local/hadoop-2.5.0/bin/hadoop jar RandomizedSVD.jar eigenfaces.EigenFacesMain 10 /mnt/scratch/u0082100/RandomizedSVD /user/u0082100

The first argument (10 above) is the rank, the second is a local directory, and the third is a folder on the HDFS file system.

b.2) EigenFacesSSVDMain.java uses Mahout's stochastic SVD to compute the SVD decomposition of the covariance matrix and the right eigenvectors, which allow computation of the eigenfaces (the projection step is sketched below). Here is an example command:

    /usr/local/hadoop-2.5.0/bin/hadoop jar RandomizedSVD.jar eigenfaces.EigenFacesSSVDMain 10 /mnt/scratch/u0082100/RandomizedSVD /user/u0082100

The arguments have the same meaning as in b.1.

b.3) EigenFacesSparkSVD.java uses Spark MLlib's stochastic SVD implementation to compute the SVD decomposition of the covariance matrix and the right eigenvectors, which allow computation of the eigenfaces. Here is an example command:

    /usr/local/hadoop-2.5.0/bin/hadoop jar RandomizedSVD.jar eigenfaces.EigenFacesSparkMain 10 /mnt/scratch/u0082100/RandomizedSVD /user/u0082100

The arguments have the same meaning as in b.1. This program needs spark-assembly-1.1.1-hadoop2.4.0.jar in order for Spark to be able to run from within Hadoop.

The eigenfaces code assumes that two directories named training-set and testing-set exist under the local path, so in the examples above they would be at /mnt/scratch/u0082100/RandomizedSVD/training-set and /mnt/scratch/u0082100/RandomizedSVD/testing-set.
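Independently of which solver produces the decomposition, the eigenfaces projection step referenced in b.1-b.3 amounts to centering each image with the mean face and taking dot products with the top-k right singular vectors. The following plain-Java sketch illustrates that idea only; it is not taken from the repository, and the class name, method names, and array layout (one eigenface per row of eigenfaces[][]) are illustrative assumptions:

    public class EigenfaceProjectionSketch {

      /** Subtract the mean face (computed over the training images) from one image. */
      static double[] center(double[] image, double[] meanFace) {
        double[] centered = new double[image.length];
        for (int p = 0; p < image.length; p++) {
          centered[p] = image[p] - meanFace[p];
        }
        return centered;
      }

      /**
       * Project a mean-subtracted image onto k eigenfaces.
       * eigenfaces[j] holds the j-th eigenface as a pixel array (a right singular
       * vector stored as a row of this Java array), the same length as the image.
       */
      static double[] project(double[] centeredImage, double[][] eigenfaces) {
        double[] weights = new double[eigenfaces.length];
        for (int j = 0; j < eigenfaces.length; j++) {
          double dot = 0.0;
          for (int p = 0; p < centeredImage.length; p++) {
            dot += centeredImage[p] * eigenfaces[j][p];
          }
          weights[j] = dot;
        }
        return weights;
      }

      /** Squared Euclidean distance between two weight vectors in eigenface space. */
      static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
          double d = a[j] - b[j];
          sum += d * d;
        }
        return sum;
      }
    }

Recognition then reduces to projecting each testing-set image and finding the training-set image whose weight vector is closest.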
About
SVD computation via Hadoop and Spark for Eigenfaces face recognition