Instructions for running the PageRank applications (Hadoop and Spark)
- open the VM with IP 172.16.3.166
- log in to the Hadoop machine with -> ssh [email protected]
- start HDFS -> start-dfs.sh
- start YARN -> start-yarn.sh
- go into the pagerank folder -> cd pagerank
- compile the code with -> mvn clean package
- run the application, passing the parameters "input name", "output name", "iterations" -> hadoop jar target/pagerank-1.0-SNAPSHOT.jar it.unipi.hadoop.PageRank synthetic.txt output 10
- to read the output -> hadoop fs -cat output/sort/part* | head
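- for the Spark application, open the same VM with IP 172.16.3.166
- log in to the Hadoop machine with -> ssh [email protected]
- start HDFS (if not already running) -> start-dfs.sh
- start YARN (if not already running) -> start-yarn.sh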
- go into the folder of the Spark application -> cd pagerank_spark
- run the application, passing the parameters "input name", "output name", "alpha", "iterations" -> spark-submit PageRankSpark.py synthetic.txt spark_output 0.15 10
- to read the output (replace spark_output with the output name passed to spark-submit) -> hadoop fs -cat spark_output/part* | head
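
To make the "alpha" and "iterations" parameters used above more concrete, here is a minimal single-machine sketch of iterative PageRank in plain Python. It is not the code of the Hadoop or Spark applications. It assumes that alpha is the random-jump (teleport) probability, that the graph is given as a list of (source, target) edges, and that pages without outgoing links spread their rank evenly over all pages. The function name pagerank and the toy graph are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def pagerank(edges, alpha=0.15, iterations=10):
    """Iterative PageRank over a list of (source, target) edges.

    Assumption: alpha is the random-jump probability, so each page
    passes on a fraction 1 - alpha of its rank through its links.
    """
    out_links = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        out_links[src].append(dst)
        nodes.update((src, dst))

    if not nodes:
        return {}

    n = len(nodes)
    # Start from a uniform distribution over all pages.
    ranks = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every page starts the round with the teleport share alpha / n.
        new_ranks = {node: alpha / n for node in nodes}
        for node in nodes:
            targets = out_links.get(node)
            if targets:
                # Split the page's rank evenly among its outgoing links.
                share = ranks[node] / len(targets)
                for dst in targets:
                    new_ranks[dst] += (1 - alpha) * share
            else:
                # Assumption: a page without outgoing links spreads
                # its rank evenly over all pages.
                share = ranks[node] / n
                for dst in nodes:
                    new_ranks[dst] += (1 - alpha) * share
        ranks = new_ranks
    return ranks


if __name__ == "__main__":
    # Hypothetical toy graph, not the content of synthetic.txt.
    toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
    result = pagerank(toy_edges, alpha=0.15, iterations=10)
    # Print pages from highest to lowest rank.
    for page, rank in sorted(result.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{page}\t{rank:.4f}")
```

In this sketch a larger "iterations" value brings the ranks closer to their steady state, while a larger alpha moves every rank toward the uniform value 1/n. Whether the two applications interpret alpha in the same way should be checked in their source code.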