PageRank

Instructions for running the PageRank applications (Hadoop and Spark)

For running the Hadoop PageRank application:

open the VM with IP 172.16.3.166
connect to the hadoop account with ssh -> ssh [email protected]
start HDFS -> start-dfs.sh
start YARN -> start-yarn.sh
go into the pagerank folder -> cd pagerank
compile the code -> mvn clean package
run the application, passing the parameters "input name", "output name", "iterations" -> hadoop jar target/pagerank-1.0-SNAPSHOT.jar it.unipi.hadoop.PageRank synthetic.txt output 10
to read the output -> hadoop fs -cat output/sort/part* | head
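The "iterations" parameter sets how many times the rank-update step is repeated. As a rough illustration of what a run computes, here is a minimal in-memory Python sketch; it is not the project's Java code. The function name pagerank, the dictionary-based adjacency list and the toy graph are hypothetical, since the format of synthetic.txt is not described here. The Hadoop command takes no damping argument, so the sketch assumes alpha = 0.15 (the value passed to the Spark version) as the random-jump probability; the value used inside it.unipi.hadoop.PageRank is not stated in this README. Pages without outgoing links have their rank spread evenly over all pages, which is one common convention and may differ from the project's handling. The final sort loosely corresponds to the sorted results read from output/sort.

```python
from typing import Dict, List, Tuple


def pagerank(graph: Dict[str, List[str]], alpha: float = 0.15,
             iterations: int = 10) -> List[Tuple[str, float]]:
    """Iterative PageRank over a hypothetical in-memory adjacency list.

    ASSUMPTION: alpha is the random-jump (teleport) probability, so the
    rank flowing along links is scaled by (1 - alpha).
    Returns (page, rank) pairs sorted by descending rank.
    """
    # Collect every page, including link targets with no adjacency entry.
    pages = set(graph)
    for targets in graph.values():
        pages.update(targets)
    if not pages:
        return []
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}  # uniform initial rank

    for _ in range(iterations):
        # Each page splits its current rank evenly among its outlinks.
        incoming = {page: 0.0 for page in pages}
        dangling_mass = 0.0
        for page in pages:
            targets = graph.get(page, [])
            if targets:
                share = ranks[page] / len(targets)
                for target in targets:
                    incoming[target] += share
            else:
                # ASSUMPTION: rank of pages without outlinks is spread
                # evenly over all pages (one common convention).
                dangling_mass += ranks[page]
        # New rank = teleport term + damped incoming rank.
        ranks = {
            page: alpha / n + (1 - alpha) * (incoming[page] + dangling_mass / n)
            for page in pages
        }

    # Order by descending rank (ties broken by page name), like a final sort step.
    return sorted(ranks.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical toy graph: page -> pages it links to.
    toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, rank in pagerank(toy_graph, alpha=0.15, iterations=10):
        print(f"{page}\t{rank:.4f}")
```

Because the teleport term and the redistributed dangling rank keep the total mass at 1, the ranks always sum to 1; page D, which nobody links to, ends every iteration with exactly alpha / N.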

For running the Spark PageRank application:

open the VM with IP 172.16.3.166
connect to the hadoop account with ssh -> ssh [email protected]
start HDFS -> start-dfs.sh
start YARN -> start-yarn.sh
go into the Spark application folder -> cd pagerank_spark
run the application, passing the parameters "input name", "output name", "alpha", "iterations" -> spark-submit PageRankSpark.py synthetic.txt spark_output 0.15 10
to read the output -> hadoop fs -cat output_name/part* | head (replace output_name with the output name passed to spark-submit, e.g. spark_output)
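The "alpha" argument (0.15 in the example) is not explained above. Assuming it is the random-jump probability of the usual PageRank formulation, which the conventional value 0.15 suggests, each iteration updates the rank of every page p as below, where N is the number of pages, M(p) the set of pages linking to p, and L(q) the number of outgoing links of page q. This is the textbook update, not necessarily the exact variant implemented in PageRankSpark.py (for example, in how pages without outgoing links are handled).

```latex
PR_{k+1}(p) = \frac{\alpha}{N} + (1 - \alpha) \sum_{q \in M(p)} \frac{PR_k(q)}{L(q)}
```

With alpha = 0.15, a page that no other page links to ends each iteration with rank 0.15 / N; in a 4-page graph that is 0.15 / 4 = 0.0375.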

Repository contents:

Page_rank_algorithm_hadoop
page_rank_algorithm_spark
PageRank_Stark_Group.pdf
README.md