PageRank

Instructions for running the PageRank applications

To run PageRank on Hadoop:

open the VM with IP address 172.16.3.166
log in as the hadoop user -> ssh hadoop@172.16.3.166
start HDFS -> start-dfs.sh
start YARN -> start-yarn.sh
go into the pagerank folder -> cd pagerank
compile the code -> mvn clean package
run the application, passing the parameters "input name", "output name", "iterations" -> hadoop jar target/pagerank-1.0-SNAPSHOT.jar it.unipi.hadoop.PageRank synthetic.txt output 10
to read the output -> hadoop fs -cat output/sort/part* | head

To run PageRank on Spark:

open the VM with IP address 172.16.3.166
log in as the hadoop user -> ssh hadoop@172.16.3.166
start HDFS -> start-dfs.sh
start YARN -> start-yarn.sh
go into the folder of the Spark application -> cd pagerank_spark
run the application, passing the parameters "input name", "output name", "alpha", "iterations" -> spark-submit PageRankSpark.py synthetic.txt spark_output 0.15 10
to read the output (here spark_output, the output name passed above) -> hadoop fs -cat spark_output/part* | head
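
To make the "alpha" and "iterations" parameters more concrete, here is a minimal single-machine sketch of the PageRank iteration in plain Python. It is not the code in PageRankSpark.py or in the Hadoop job. It assumes that alpha is the random jump probability (which the value 0.15 suggests but these instructions do not state), that the graph is given as a dictionary of outgoing links, and that the rank of pages without outgoing links is spread evenly over all pages. The function name pagerank and the toy graph are hypothetical.

```python
def pagerank(graph, alpha=0.15, iterations=10):
    """Return a dict node -> rank.

    graph: dict mapping each node to the list of nodes it links to.
    alpha: ASSUMED to be the random jump probability.
    iterations: number of update rounds to perform.
    """
    # Collect every node, including those that only appear as link targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)

    # Start from a uniform distribution.
    ranks = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes with no outgoing links (ASSUMPTION: it is
        # redistributed evenly, so the ranks keep summing to 1).
        dangling = sum(ranks[node] for node in nodes if not graph.get(node))

        # Every node gets the random jump share plus the dangling share.
        new_ranks = {
            node: alpha / n + (1 - alpha) * dangling / n for node in nodes
        }

        # Each node passes the rest of its rank equally along its links.
        for node, targets in graph.items():
            if targets:
                share = (1 - alpha) * ranks[node] / len(targets)
                for target in targets:
                    new_ranks[target] += share

        ranks = new_ranks

    return ranks


if __name__ == "__main__":
    toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    result = pagerank(toy_graph, alpha=0.15, iterations=10)
    # Print pages from highest to lowest rank.
    for node, rank in sorted(result.items(), key=lambda kv: kv[1], reverse=True):
        print(node, rank)
```

A larger "iterations" value brings the ranks closer to their stable values at the cost of more passes over the data, and a larger alpha pulls all ranks toward the uniform value 1/N.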